Xiaohong Chen Publications

Discussion Paper
Abstract

We propose a new formulation of the maximum score estimator that uses compositions of rectified linear unit (ReLU) functions, instead of indicator functions as in Manski (1975, 1985), to encode the sign alignment restrictions. Since the ReLU function is Lipschitz, our new ReLU-based maximum score criterion function is substantially easier to optimize using standard gradient-based optimization packages. We also show that our ReLU-based maximum score (RMS) estimator can be generalized to an umbrella framework defined by multi-index single-crossing (MISC) conditions, to which the original maximum score estimator cannot be applied. We establish the n^{-s/(2s+1)} convergence rate and asymptotic normality for the RMS estimator under order-s Hölder smoothness. In addition, we propose an alternative estimator using a further reformulation of RMS as a special layer in a deep neural network (DNN) architecture, which allows the estimation procedure to be implemented via state-of-the-art software and hardware for DNN.
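The abstract does not spell out the RMS criterion itself, so the following is a minimal sketch of the idea under stated assumptions: Manski's indicator 1{x'β ≥ 0} is replaced by a ramp min(max(t, 0), 1) built from two ReLUs, which is Lipschitz and hence amenable to off-the-shelf gradient optimizers. The names relu_ramp and rms_objective and the smoothing scale are hypothetical, not from the paper.

```python
# A minimal, illustrative sketch (not the paper's exact RMS criterion):
# the hard indicator in Manski's maximum score objective is replaced by a
# piecewise-linear ramp composed of two ReLUs, so the sample criterion can
# be maximized with a standard gradient-based optimizer such as Adam.
import torch

def relu_ramp(t):
    # ReLU(t) - ReLU(t - 1) = min(max(t, 0), 1): a Lipschitz surrogate
    # for the indicator, built purely from ReLU compositions
    return torch.relu(t) - torch.relu(t - 1.0)

def rms_objective(beta, x, y, scale=0.1):
    # Sample analog of E[(2y - 1) * 1{x'beta >= 0}], with the indicator
    # smoothed over a window of width 'scale' (a hypothetical tuning choice)
    return ((2.0 * y - 1.0) * relu_ramp(x @ beta / scale)).mean()

# Toy usage on simulated binary-choice data
torch.manual_seed(0)
n, d = 500, 3
x = torch.randn(n, d)
y = ((x @ torch.tensor([1.0, -0.5, 0.25]) + 0.3 * torch.randn(n)) > 0).float()

beta = torch.randn(d, requires_grad=True)
opt = torch.optim.Adam([beta], lr=0.05)
for _ in range(300):
    opt.zero_grad()
    loss = -rms_objective(beta, x, y)  # minimize the negative criterion
    loss.backward()
    opt.step()
```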

Discussion Paper
Abstract

This paper proposes a novel framework for the global optimization of a continuous function in a bounded rectangular domain. Specifically, we show that: (1) global optimization is equivalent to optimal strategy formation in a two-armed decision problem with known distributions, based on the Strategic Law of Large Numbers we establish; and (2) a sign-based strategy based on the solution of a parabolic PDE is asymptotically optimal. Motivated by this result, we propose a class of Strategic Monte Carlo Optimization (SMCO) algorithms, which use a simple strategy that makes coordinate-wise two-armed decisions based on the signs of the partial gradient (or, in practice, the first difference) of the objective function, without the need to solve PDEs. While this simple strategy is not generally optimal, it is sufficient for our SMCO algorithm to converge to a local optimizer from a single starting point, and to a global optimizer under a growing set of starting points. Numerical studies demonstrate the suitability of our SMCO algorithms for global optimization well beyond the theoretical guarantees established herein. For a wide range of test functions with challenging landscapes (multi-modal, non-differentiable, and discontinuous), our SMCO algorithms perform robustly well, even in high-dimensional (d = 200 to 1000) settings. In fact, our algorithms outperform many state-of-the-art global optimizers, as well as local algorithms augmented with the same set of starting points as ours.
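A simplified reading of the coordinate-wise sign strategy described above, sketched under assumptions: the step-size schedule, the first-difference bandwidth, and the function name smco_sketch are illustrative choices, not the authors' implementation.

```python
# Illustrative sketch of a coordinate-wise two-armed sign strategy in the
# spirit of SMCO (not the authors' algorithm): each coordinate moves toward
# whichever arm the first difference of the objective favors, inside a
# bounded rectangle, with a shrinking step size.
import numpy as np

def smco_sketch(f, lower, upper, x0, n_iter=200, h=1e-3, step=0.1):
    x = np.array(x0, dtype=float)  # work on a copy of the starting point
    for t in range(n_iter):
        shrink = step / np.sqrt(t + 1.0)  # assumed step-size schedule
        for j in range(len(x)):
            e = np.zeros_like(x)
            e[j] = h
            diff = f(x + e) - f(x - e)  # first difference along coordinate j
            # two-armed decision: move toward the upper arm if the sign is
            # positive (maximization), toward the lower arm otherwise
            x[j] = np.clip(x[j] + shrink * np.sign(diff) * (upper[j] - lower[j]),
                           lower[j], upper[j])
    return x

# Toy usage on a multi-modal function, with several starting points as in the text
f = lambda x: -np.sum(x**2) + np.sum(np.cos(5.0 * x))
lo, hi = -2.0 * np.ones(2), 2.0 * np.ones(2)
starts = np.random.default_rng(0).uniform(lo, hi, size=(8, 2))
best = max((smco_sketch(f, lo, hi, s) for s in starts), key=f)
```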

Discussion Paper
Abstract

We propose SLIM (Stochastic Learning and Inference in overidentified Models), a scalable stochastic approximation framework for nonlinear GMM. SLIM forms iterative updates from independent mini-batches of moments and their derivatives, producing unbiased directions that ensure almost-sure convergence. It requires neither a consistent initial estimator nor global convexity and accommodates both fixed-sample and random-sampling asymptotics. We further develop an optional second-order refinement and inference procedures based on random scaling and plug-in methods, including plug-in, debiased plug-in, and online versions of the Sargan–Hansen J-test tailored to stochastic learning. In Monte Carlo experiments based on a nonlinear EASI demand system with 576 moment conditions, 380 parameters, and n = 10^5, SLIM solves the model in under 1.4 hours, whereas full-sample GMM in Stata on a powerful laptop converges only after 18 hours. The debiased plug-in J-test delivers satisfactory finite-sample inference, and SLIM scales smoothly to n = 10^6.
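A minimal sketch of one SLIM-style update under stated assumptions: drawing independent mini-batches for the moments and for their derivatives keeps the product of the two sample averages unbiased for the population direction; the function names, the fixed weighting matrix W, and the step size are hypothetical.

```python
# Sketch of a mini-batch stochastic GMM update in the spirit of SLIM
# (a simplification, not the authors' algorithm). g(z, theta) returns the
# moment vector for one observation z; G(z, theta) returns its Jacobian.
import numpy as np

def slim_step(theta, batch_g, batch_G, g, G, W, lr):
    # Averages over two independent mini-batches, so that the product of the
    # averages is an unbiased estimate of the population direction G' W g
    g_bar = np.mean([g(z, theta) for z in batch_g], axis=0)
    G_bar = np.mean([G(z, theta) for z in batch_G], axis=0)
    # Descent-type step on the GMM objective 0.5 * g' W g at the current theta
    return theta - lr * (G_bar.T @ W @ g_bar)
```

One would then iterate this update over fresh mini-batches with a decaying learning rate; no consistent initial estimator is assumed.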

Discussion Paper
Abstract

This paper studies nonparametric local (over-)identification, in the sense of Chen and Santos (2018), and the associated semiparametric efficiency in modern causal frameworks. We develop a unified approach that begins by translating structural models with latent variables into their induced statistical models of observables and then analyzes local overidentification through conditional moment restrictions. We apply this approach to three leading models: (i) the general treatment model under unconfoundedness, (ii) the negative control model, and (iii) the long-term causal inference model under unobserved confounding. The first design yields a locally just-identified statistical model, implying that all regular asymptotically linear estimators of the treatment effect share the same asymptotic variance, equal to the (trivial) semiparametric efficiency bound. In contrast, the latter two models involve nonparametric endogeneity and are naturally locally overidentified; consequently, some doubly robust orthogonal moment estimators of the average treatment effect are inefficient. Whereas existing work typically imposes strong conditions to restore just-identification before deriving the efficiency bound, we relax such assumptions and characterize the general efficiency bound, along with efficient estimators, in the overidentified models (ii) and (iii).

Discussion Paper
Abstract

This paper studies the semiparametric estimation and inference of integral functionals on submanifolds, which arise naturally in a variety of econometric settings. For linear integral functionals on a regular submanifold, we show that the semiparametric plug-in estimator attains the minimax-optimal convergence rate n^{-s/(2s+d-m)}, where s is the Hölder smoothness order of the underlying nonparametric function, d is the dimension of the first-stage nonparametric estimation, and m is the dimension of the submanifold over which the integral is taken. This rate coincides with the standard minimax-optimal rate for a (d − m)-dimensional nonparametric estimation problem, illustrating that integration over the m-dimensional manifold effectively reduces the problem’s dimensionality. We then provide a general asymptotic normality theorem for linear/nonlinear submanifold integrals, along with a consistent variance estimator. We provide simulation evidence in support of our theoretical results.

Discussion Paper
Abstract

This paper investigates efficient Difference-in-Differences (DiD) and Event Study (ES) estimation using short panel data sets within the heterogeneous treatment effect framework, free from parametric functional form assumptions and allowing for variation in treatment timing. We provide an equivalent characterization of the DiD potential outcome model using sequential conditional moment restrictions on observables, which shows that the DiD identification assumptions typically imply nonparametric overidentification restrictions. We derive the semiparametric efficient influence function (EIF) in closed form for DiD and ES causal parameters under commonly imposed parallel trends assumptions. The EIF is automatically Neyman orthogonal and yields the smallest variance among all asymptotically normal, regular estimators of the DiD and ES parameters. Leveraging the EIF, we propose simple-to-compute efficient estimators. Our results highlight how to optimally explore different pre-treatment periods and comparison groups to obtain the tightest (asymptotic) confidence intervals, offering practical tools for improving inference in modern DiD and ES applications even in small samples. Calibrated simulations and an empirical application demonstrate substantial precision gains of our efficient estimators in finite samples.

Discussion Paper
Abstract

We study quantile-optimal policy learning, where the goal is to find a policy whose reward distribution has the largest α-quantile for some α ∈ (0, 1). We focus on the offline setting, in which the data-generating process involves unobserved confounders. Such a problem suffers from three main challenges: (i) the nonlinearity of the quantile objective as a functional of the reward distribution, (ii) unobserved confounding, and (iii) insufficient coverage of the offline dataset. To address these challenges, we propose a suite of causal-assisted policy learning methods that provably enjoy strong theoretical guarantees under mild conditions. In particular, to address (i) and (ii), using causal inference tools such as instrumental variables and negative controls, we propose to estimate the quantile objectives by solving nonlinear functional integral equations. We then adopt a minimax estimation approach with nonparametric models to solve these integral equations, and construct conservative policy estimates that address (iii). The final policy is the one that maximizes these pessimistic estimates. In addition, we propose a novel regularized policy learning method that is more amenable to computation. Finally, we prove that the policies learned by these methods are Õ(n^{-1/2}) quantile-optimal under a mild coverage assumption on the offline dataset. Here, Õ(·) omits poly-logarithmic factors. To the best of our knowledge, these are the first sample-efficient policy learning algorithms for estimating the quantile-optimal policy in the presence of unmeasured confounding.

Journal of Financial Econometrics
Abstract

We introduce a new class of algorithms, stochastic generalized method of moments (SGMM), for estimation and inference on (overidentified) moment restriction models. Our SGMM is a novel stochastic approximation alternative to the popular Hansen (1982) (offline) GMM, and offers fast and scalable implementation with the ability to handle streaming datasets in real time. We establish the almost sure convergence and the (functional) central limit theorem for the inefficient online 2SLS and the efficient SGMM. Moreover, we propose online versions of the Durbin–Wu–Hausman and Sargan–Hansen tests that can be seamlessly integrated within the SGMM framework. Extensive Monte Carlo simulations show that as the sample size increases, the SGMM matches the standard (offline) GMM in estimation accuracy while gaining in computational efficiency, indicating its practical value for both large-scale and online datasets. We demonstrate the efficacy of our approach via a proof of concept using two well-known empirical examples with large sample sizes.
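A stripped-down online recursion in the spirit of the inefficient online 2SLS mentioned above, sketched under assumptions: the identity weighting, the running cross-moment estimate, and the step-size schedule are illustrative choices, not the paper's algorithm.

```python
# Illustrative online 2SLS-type recursion (not the paper's exact algorithm):
# observations (y_t, x_t, z_t) arrive as a stream and the estimate takes a
# single Robbins-Monro step per observation.
import numpy as np

def online_2sls(stream, d, gamma0=1.0, a=0.75):
    theta = np.zeros(d)
    S_zx = None  # running average of z x' (instrument-regressor cross moment)
    for t, (y, x, z) in enumerate(stream, start=1):
        outer = np.outer(z, x)
        S_zx = outer if S_zx is None else S_zx + (outer - S_zx) / t
        gamma = gamma0 / t**a  # slowly decaying step size (an assumption)
        # step in the direction S_zx' z (y - x'theta): the running estimate of
        # E[x z'] applied to the instrumented residual, with identity weighting
        theta = theta + gamma * S_zx.T @ (z * (y - x @ theta))
    return theta
```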

Review of Economic Studies
Abstract

We introduce two data-driven procedures for optimal estimation and inference in nonparametric models using instrumental variables. The first is a data-driven choice of sieve dimension for a popular class of sieve two-stage least-squares estimators. When implemented with this choice, estimators of both the structural function h0 and its derivatives (such as elasticities) converge at the fastest possible (i.e. minimax) rates in sup-norm. The second is for constructing uniform confidence bands (UCBs) for h0 and its derivatives. Our UCBs guarantee coverage over a generic class of data-generating processes and contract at the minimax rate, possibly up to a logarithmic factor. As such, our UCBs are asymptotically more efficient than UCBs based on the usual approach of undersmoothing. As an application, we estimate the elasticity of the intensive margin of firm exports in a monopolistic competition model of international trade. Simulations illustrate the good performance of our procedures in empirically calibrated designs. Our results provide evidence against common parameterizations of the distribution of unobserved firm heterogeneity.

Journal of Political Economy
Abstract

We develop a state-space model with a transition equation that takes the form of a functional vector autoregression (VAR) and stacks macroeconomic aggregates and a cross-sectional density. The measurement equation captures the error in estimating log densities from repeated cross-sectional samples. The log densities and their transition kernels are approximated by sieves, which leads to a finite-dimensional VAR for macroeconomic aggregates and sieve coefficients. With this model, we study the dynamics of technology shocks, GDP (gross domestic product), employment, and the earnings distribution. We find that spillovers between aggregate and distributional dynamics are generally small, that a positive technology shock tends to decrease inequality, and that a shock that raises earnings inequality leads to a small and insignificant GDP response.
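A minimal sketch of the finite-dimensional reduction described above, under assumptions: the cosine basis, the lag order, and the function names are hypothetical stand-ins for the paper's sieve and transition-kernel choices.

```python
# Sketch of the sieve reduction: project each period's log density on K basis
# functions, stack the coefficients with the macro aggregates, and fit a
# finite-dimensional VAR(1) to the stacked state (basis and lag order are
# illustrative assumptions, not the paper's choices).
import numpy as np

def sieve_coeffs(log_density, grid, K):
    # least-squares projection of a log density observed on 'grid' onto a
    # cosine basis with K terms
    basis = np.stack([np.cos(np.pi * k * grid) for k in range(K)], axis=1)
    coef, *_ = np.linalg.lstsq(basis, log_density, rcond=None)
    return coef

def fit_var1(states):
    # OLS estimate of A in s_t = A s_{t-1} + e_t, with 'states' shaped (T, dim)
    Y, X = states[1:], states[:-1]
    A, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return A.T

# Stacked state per period: macro aggregates alongside the sieve coefficients,
# e.g. states = np.hstack([aggregates, coeffs]) with one row per period.
```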

Econometrica
Abstract

We propose a new adaptive hypothesis test for inequality (e.g., monotonicity, convexity) and equality (e.g., parametric, semiparametric) restrictions on a structural function in a nonparametric instrumental variables (NPIV) model. Our test statistic is based on a modified leave-one-out sample analog of a quadratic distance between the restricted and unrestricted sieve two-stage least squares estimators. We provide computationally simple, data-driven choices of sieve tuning parameters and Bonferroni adjusted chi-squared critical values. Our test adapts to the unknown smoothness of alternative functions in the presence of an unknown degree of endogeneity and unknown strength of the instruments. It attains the adaptive minimax rate of testing in L2. That is, the sum of the supremum of type I error over the composite null and the supremum of type II error over nonparametric alternative models cannot be improved upon by any other test for NPIV models of unknown regularities. Confidence sets in L2 are obtained by inverting the adaptive test. Simulations confirm that, across different strengths of instruments and sample sizes, our adaptive test controls size and its finite-sample power greatly exceeds that of existing non-adaptive tests for monotonicity and parametric restrictions in NPIV models. Empirical applications to test for shape restrictions of differentiated products demand and of Engel curves are presented.

Discussion Paper
Abstract

Artificial Neural Networks (ANNs) can be viewed as nonlinear sieves that can approximate complex functions of high dimensional variables more effectively than linear sieves. We investigate the computational performance of various ANNs in nonparametric instrumental variables (NPIV) models of moderately high dimensional covariates that are relevant to empirical economics. We present two efficient procedures for estimation and inference on a weighted average derivative (WAD): an orthogonalized plug-in with optimally-weighted sieve minimum distance (OP-OSMD) procedure and a sieve efficient score (ES) procedure. Both estimators for WAD use ANN sieves to approximate the unknown NPIV function and are root-n asymptotically normal and first-order equivalent. We provide a detailed practitioner’s recipe for implementing both efficient procedures. This involves not only the choice of tuning parameters for the unknown NPIV function, the conditional expectations, and the optimal weighting function that are present in both procedures, but also the choice of tuning parameters for the unknown Riesz representer in the ES procedure. We compare their finite-sample performances in various simulation designs that involve smooth NPIV functions of up to 13 continuous covariates, different nonlinearities, and covariate correlations. Some Monte Carlo findings include: 1) tuning and optimization are more delicate in ANN estimation; 2) given proper tuning, both ANN estimators with various architectures can perform well; 3) ANN OP-OSMD estimators are easier to tune than ANN ES estimators; 4) stable inferences are more difficult to achieve with ANN (than spline) estimators; 5) there are gaps between current implementations and approximation theories. Finally, we apply ANN NPIV to estimate average partial derivatives in two empirical demand examples with multivariate covariates.

Discussion Paper
Abstract

We introduce computationally simple, data-driven procedures for estimation and inference on a structural function h0 and its derivatives in nonparametric models using instrumental variables. Our first procedure is a bootstrap-based, data-driven choice of sieve dimension for sieve nonparametric instrumental variables (NPIV) estimators. When implemented with this data-driven choice, sieve NPIV estimators of h0 and its derivatives are adaptive: they converge at the best possible (i.e., minimax) sup-norm rate, without having to know the smoothness of h0, degree of endogeneity of the regressors, or instrument strength. Our second procedure is a data-driven approach for constructing honest and adaptive uniform confidence bands (UCBs) for h0 and its derivatives. Our data-driven UCBs guarantee coverage for h0 and its derivatives uniformly over a generic class of data-generating processes (honesty) and contract at, or within a logarithmic factor of, the minimax sup-norm rate (adaptivity). As such, our data-driven UCBs deliver asymptotic efficiency gains relative to UCBs constructed via the usual approach of undersmoothing. In addition, both our procedures apply to nonparametric regression as a special case. We use our procedures to estimate and perform inference on a nonparametric gravity equation for the intensive margin of firm exports and find evidence against common parameterizations of the distribution of unobserved firm productivity.