This paper reviews recent advances in estimation and inference for nonparametric and semiparametric models with endogeneity. It first describes methods of sieves and penalization for estimating unknown functions identified via conditional moment restrictions. Examples include nonparametric instrumental variables regression (NPIV), nonparametric quantile IV regression and many more semi-nonparametric structural models. Asymptotic properties of the sieve estimators and of the sieve Wald and quasi-likelihood ratio (QLR) hypothesis tests of functionals with nonparametric endogeneity are presented. For sieve NPIV estimation, the rate-adaptive data-driven choices of sieve regularization parameters and the sieve score bootstrap uniform confidence bands are described. Finally, simple sieve variance estimation and the over-identification test for semiparametric two-step GMM are reviewed. Monte Carlo examples are included.
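As a minimal sketch of sieve NPIV estimation, the code below computes the standard sieve two-stage least squares estimator: the basis functions of the endogenous regressor are projected onto a basis in the instrument, and the outcome is then regressed on the projected basis. The power-series bases, the fixed sieve dimensions `J` and `K`, the simulated design and the names `poly_basis` and `sieve_npiv` are all illustrative assumptions, not taken from the paper; in particular, the data-driven choice of sieve dimension that the paper describes is not implemented here.

```python
import numpy as np

def poly_basis(x, degree):
    """Power-series sieve basis [1, x, ..., x^degree] (an illustrative choice)."""
    return np.vander(x, degree + 1, increasing=True)

def sieve_npiv(y, x, w, J=3, K=4):
    """Sieve 2SLS coefficients for h in E[Y - h(X) | W] = 0.

    J is the degree of the basis for h, K >= J the degree of the instrument basis.
    """
    Psi = poly_basis(x, J)                                # basis in the endogenous regressor
    B = poly_basis(w, K)                                  # basis in the instrument
    Psi_hat = B @ np.linalg.lstsq(B, Psi, rcond=None)[0]  # first stage: project Psi on B
    return np.linalg.lstsq(Psi_hat, y, rcond=None)[0]     # second stage: regress y on projection

# Simulated design (assumed): X is endogenous because the error e depends on v.
rng = np.random.default_rng(0)
n = 20_000
w = rng.uniform(-1.0, 1.0, n)           # instrument
v = rng.normal(0.0, 0.5, n)             # first-stage disturbance
x = w + v                               # endogenous regressor
e = 0.6 * v + 0.2 * rng.normal(size=n)  # structural error, correlated with x
y = x**2 + e                            # true structural function h0(x) = x^2

grid = np.linspace(-1.0, 1.0, 41)
G = poly_basis(grid, 3)
c_npiv = sieve_npiv(y, x, w)
c_ols = np.linalg.lstsq(poly_basis(x, 3), y, rcond=None)[0]  # ignores endogeneity

# Sup-norm errors on the grid for the IV and the naive series estimators.
npiv_err = np.max(np.abs(G @ c_npiv - grid**2))
ols_err = np.max(np.abs(G @ c_ols - grid**2))
print(f"sup error NPIV: {npiv_err:.3f}, series OLS: {ols_err:.3f}")
```

In this design the conditional expectation of a polynomial in X given W is again a polynomial in W, so the problem is well behaved; in general NPIV is ill-posed and the choice of sieve dimension matters much more.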
We propose new methods for estimating the bid-ask spread from observed transaction prices alone. Our methods are based on the empirical characteristic function rather than the sample autocovariance function used by the method of Roll (1984). As in Roll (1984), we have a closed form expression for the spread, but it uses only a limited subset of the model-implied identification restrictions. We also provide methods that take account of more identification information. We compare our methods theoretically and numerically with the Roll method as well as with its best known competitor, the Hasbrouck (2004) method, which uses a Bayesian Gibbs methodology under a Gaussian assumption. Our estimators are competitive with Roll’s and Hasbrouck’s when the latent true fundamental return distribution is Gaussian, and perform much better when this distribution is far from Gaussian. Our methods are applied to the E-mini futures contract on the S&P 500 during the Flash Crash of May 6, 2010. Extensions to models allowing for unbalanced order flow or hidden Markov trade direction indicators or trade direction indicators having general asymmetric support or adverse selection are also presented, without requiring additional data.
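For reference, the Roll (1984) benchmark mentioned above can be sketched in a few lines. Under Roll's model the first-order autocovariance of price changes equals minus one quarter of the squared spread, so the spread is estimated as twice the square root of the negated sample autocovariance. The simulated prices and the function names `roll_spread` and `simulate_prices` are assumptions for illustration only; the characteristic-function estimators proposed in the paper are not implemented here.

```python
import numpy as np

def roll_spread(prices):
    """Roll (1984) estimate 2*sqrt(-cov(dp_t, dp_{t-1})); NaN if the autocovariance is not negative."""
    dp = np.diff(np.asarray(prices, dtype=float))
    gamma1 = np.cov(dp[1:], dp[:-1])[0, 1]  # first-order sample autocovariance of price changes
    return 2.0 * np.sqrt(-gamma1) if gamma1 < 0 else float("nan")

def simulate_prices(n, spread, sigma, seed=0):
    """Roll-type model: random-walk efficient price plus half-spread times a trade direction."""
    rng = np.random.default_rng(seed)
    efficient = np.cumsum(rng.normal(0.0, sigma, n))  # latent fundamental price
    direction = rng.choice([-1.0, 1.0], size=n)       # buy (+1) or sell (-1), balanced and i.i.d.
    return efficient + 0.5 * spread * direction

prices = simulate_prices(200_000, spread=0.2, sigma=0.05)
est = roll_spread(prices)
print(f"estimated spread: {est:.4f}")
```

The estimator is undefined whenever the sample autocovariance is positive, which is why the sketch returns NaN in that case.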
This paper considers estimation of semi-nonparametric GARCH filtered copula models in which the individual time series are modelled by semi-nonparametric GARCH and the joint distributions of the multivariate standardized innovations are characterized by parametric copulas with nonparametric marginal distributions. The models extend those of Chen and Fan (2006) to allow for semi-nonparametric conditional means and volatilities, which are estimated via the method of sieves such as splines. The fitted residuals are then used to estimate the copula parameters and the marginal densities of the standardized innovations jointly via the sieve maximum likelihood (SML). We show that, even using nonparametrically filtered data, both our SML and the two-step copula estimator of Chen and Fan (2006) are still root-n consistent and asymptotically normal, and the asymptotic variances of both estimators do not depend on the nonparametric filtering errors. Even more surprisingly, our SML copula estimator using the filtered data achieves the full semiparametric efficiency bound as if the standardized innovations were directly observed. These nice properties lead to simple and more accurate estimation of Value-at-Risk (VaR) for multivariate financial data with flexible dynamics, contemporaneous tail dependence and asymmetric distributions of innovations. Monte Carlo studies demonstrate that our SML estimators of the copula parameters and the marginal distributions of the standardized innovations have smaller variances and smaller mean squared errors compared to those of the two-step estimators in finite samples. A real data application is presented.
This paper considers semiparametric two-step GMM estimation and inference with weakly dependent data, where unknown nuisance functions are estimated via sieve extremum estimation in the first step. We show that although the asymptotic variance of the second-step GMM estimator may not have a closed form expression, it can be well approximated by sieve variances that have simple closed form expressions. We present consistent or robust variance estimation, Wald tests and Hansen’s (1982) over-identification tests for the second step GMM that properly reflect the first-step estimated functions and the weak dependence of the data. Our sieve semiparametric two-step GMM inference procedures are shown to be numerically equivalent to the ones computed as if the first step were parametric. A new consistent random-perturbation estimator of the derivative of the expectation of the non-smooth moment function is also provided.
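To fix ideas about the over-identification test, the sketch below computes Hansen's (1982) J statistic for a purely parametric linear IV model with i.i.d. data. This is a deliberately simplified setting: there is no sieve first step and no correction for weak dependence, so it does not implement the paper's procedures. The simulated design and the function name `two_step_gmm_j` are illustrative assumptions.

```python
import numpy as np

def two_step_gmm_j(y, X, Z):
    """Two-step efficient GMM for E[Z(y - X'beta)] = 0 and Hansen's J statistic (i.i.d. case)."""
    n = len(y)

    def gmm(weight):
        A = X.T @ Z @ weight @ Z.T @ X
        b = X.T @ Z @ weight @ Z.T @ y
        return np.linalg.solve(A, b)

    beta1 = gmm(np.linalg.inv(Z.T @ Z / n))  # step 1: 2SLS weighting matrix
    u1 = y - X @ beta1
    Zu = Z * u1[:, None]
    S = Zu.T @ Zu / n                        # moment variance estimate (no HAC correction)
    W2 = np.linalg.inv(S)
    beta2 = gmm(W2)                          # step 2: efficient weighting matrix
    gbar = Z.T @ (y - X @ beta2) / n         # sample moments at the second-step estimate
    J = n * gbar @ W2 @ gbar                 # approx. chi-square with (cols of Z - cols of X) df
    return beta2, J

# Simulated, correctly specified design (assumed): 4 moments, 2 parameters, so 2 degrees of freedom.
rng = np.random.default_rng(1)
n = 5_000
z = rng.normal(size=(n, 3))
v = rng.normal(size=n)
x = z.sum(axis=1) + v                # endogenous regressor
u = 0.5 * v + rng.normal(size=n)     # error correlated with x through v
y = 1.0 + 2.0 * x + u
X = np.column_stack([np.ones(n), x])
Z = np.column_stack([np.ones(n), z])

beta_hat, J = two_step_gmm_j(y, X, Z)
print(f"slope: {beta_hat[1]:.3f}, J: {J:.3f}")
```

Large values of J relative to the chi-square critical value indicate that the moment restrictions are jointly rejected.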
In models defined by unconditional moment restrictions, specification tests are possible and estimators can be ranked in terms of efficiency whenever the number of moment restrictions exceeds the number of parameters. We show that a similar relationship between potential refutability of a model and semiparametric efficiency is present in a much broader class of settings. Formally, we show a condition we name local overidentification is required for both specification tests to have power against local alternatives and for the existence of both efficient and inefficient estimators of regular parameters. Our results immediately imply semiparametric conditional moment restriction models are typically locally overidentified, and hence their proper specification is locally testable. We further study nonparametric conditional moment restriction models and obtain a simple characterization of local overidentification in that context. As a result, we are able to determine when nonparametric conditional moment restriction models are locally testable, and when plug-in and two stage estimators of regular parameters are semiparametrically efficient.
In the unconditional moment restriction model of Hansen (1982), specification tests and more efficient estimators are both available whenever the number of moment restrictions exceeds the number of parameters of interest. We show a similar relationship between potential refutability of a model and existence of more efficient estimators is present in much broader settings. Specifically, a condition we name local overidentification is shown to be equivalent to both the existence of specification tests with nontrivial local power and the existence of more efficient estimators of some “smooth” parameters in general semi/nonparametric models. Under our notion of local overidentification, various locally nontrivial specification tests such as Hausman tests, incremental Sargan tests (or optimally weighted quasi-likelihood ratio tests) naturally extend to general semi/nonparametric settings. We further obtain simple characterizations of local overidentification for general models of nonparametric conditional moment restrictions with possibly different conditioning sets. The results are applied to determining when semi/nonparametric models with endogeneity are locally testable, and when nonparametric plug-in and semiparametric two-step GMM estimators are semiparametrically efficient. Examples of empirically relevant semi/nonparametric structural models are presented.
We show that spline and wavelet series regression estimators for weakly dependent regressors attain the optimal uniform (i.e., sup-norm) convergence rate (n/log n)^{-p/(2p+d)} of Stone (1982), where d is the number of regressors and p is the smoothness of the regression function. The optimal rate is achieved even for heavy-tailed martingale difference errors with finite (2 + (d/p))th absolute moment for d/p < 2. We also establish the asymptotic normality of t statistics for possibly nonlinear, irregular functionals of the conditional mean function under weak conditions. The results are proved by deriving a new exponential inequality for sums of weakly dependent random matrices, which is of independent interest.