Xiaohong Chen Publications

Discussion Paper
Abstract

Many economic parameters are identified by “thin sets” (submanifolds with Lebesgue measure zero) and are hence difficult to recover from data in an ambient space. This paper provides a unified theory for estimation and inference on such “thin-set” identified functionals. We show that thin sets are not equally thin: their intrinsic dimension m matters in a precise manner. For a nonparametric regression h0 with Hölder smoothness s and d-dimensional covariates in the ambient space, we show that n^{−s/(2s+d−m)} is the minimax optimal rate for estimating linear and nonlinear (e.g., quadratic, upper contour) integrals of h0 on an m-dimensional submanifold (0 ≤ m < d), i.e., the fastest attainable rate among all estimators. The minimax lower bound is generalized to estimating submanifold integrals when h0 is a nonparametric density or a nonparametric instrumental variable function. The asymptotic normality of t statistics is established via sieve Riesz representation, and the corresponding inference is computed using Sobol points.

Discussion Paper
Abstract

This paper studies nonparametric local (over-)identification and semiparametric efficiency in modern causal frameworks. We develop a unified approach that begins by translating structural models with latent variables into their induced statistical models of observables and then analyzes local overidentification through conditional moment restrictions. We apply this approach to three popular classes of causal models: (1) the general treatment model under unconfoundedness; (2) the negative control model; and (3) the long-term causal inference model under unobserved confounding. The first model yields a locally just-identified statistical model, implying that all regular asymptotically linear estimators of the treatment effect have the same asymptotic variance, which equals the (trivial) semiparametric efficient variance bound. In contrast, the latter two models involve nonparametric endogeneity and are naturally locally overidentified; consequently, some doubly robust orthogonal moment estimators of the average treatment effect are inefficient. Whereas existing work typically imposes strong conditions to restore local just-identification to justify the efficiency of their doubly robust orthogonal moment estimators, we characterize the semiparametric efficient variance bounds, along with efficient estimators, for the (locally) overidentified models (2) and (3). A small real data application, along with a simulation study, illustrates the semiparametric efficiency gains in model (3).

Proceedings of the National Academy of Sciences
Abstract

This paper proposes a framework for the global optimization of possibly multimodal continuous functions on bounded domains. The authors show that global optimization is equivalent to optimal strategy formation in a two-armed decision model with known distributions, based on a strategic law of large numbers. They establish asymptotically optimal strategies and introduce a class of Strategic Monte Carlo Optimization (SMCO) algorithms that rely on sign-based decisions rather than gradient magnitudes. Theoretical results provide local and global convergence guarantees, and extensive numerical experiments demonstrate strong performance of the proposed algorithms in high-dimensional and challenging optimization settings.

Review of Economic Studies
Abstract

We study identification and inference in first-price auctions with risk-averse bidders and selective entry, building on a flexible framework we call the Affiliated Signal with Risk Aversion (AS-RA) model. Assuming exogenous variation in either the number of potential bidders (N) or a continuous instrument (z) shifting opportunity costs of entry, we provide a sharp characterization of the nonparametric restrictions implied by equilibrium bidding. This characterization implies that risk neutrality is nonparametrically testable. In addition, with sufficient variation in both N and z, the AS-RA model primitives are nonparametrically identified (up to a bounded constant) on their equilibrium domains. Finally, we explore new methods for inference in set-identified auction models based on Chen et al. (2018, Econometrica, vol. 86, 1965–2018), as well as novel and fast computational strategies using Mathematical Programming with Equilibrium Constraints. Simulation studies reveal the good finite-sample performance of our inference methods, which can readily be adapted to other set-identified flexible equilibrium models with parameter-dependent support.

Discussion Paper
Abstract

We propose a new formulation of the maximum score estimator that uses compositions of rectified linear unit (ReLU) functions, instead of the indicator functions in Manski (1975, 1985), to encode the sign alignment restrictions. Since the ReLU function is Lipschitz, our new ReLU-based maximum score criterion function is substantially easier to optimize using standard gradient-based optimization packages. We also show that our ReLU-based maximum score (RMS) estimator can be generalized to an umbrella framework defined by multi-index single-crossing (MISC) conditions, to which the original maximum score estimator cannot be applied. We establish the n^{−s/(2s+1)} convergence rate and asymptotic normality for the RMS estimator under order-s Hölder smoothness. In addition, we propose an alternative estimator using a further reformulation of RMS as a special layer in a deep neural network (DNN) architecture, which allows the estimation procedure to be implemented via state-of-the-art software and hardware for DNNs.
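As one concrete reading of the ReLU construction (our own illustrative surrogate, not necessarily the paper's exact one), the indicator 1{x'β ≥ 0} in the maximum score criterion can be replaced by a Lipschitz ramp built from two ReLUs:

```python
import numpy as np

def ramp(t, tau=0.1):
    """Lipschitz surrogate for 1{t >= 0} built from two ReLUs:
    equals 0 for t <= -tau, 1 for t >= tau, linear in between."""
    relu = lambda u: np.maximum(u, 0.0)
    return (relu(t + tau) - relu(t - tau)) / (2.0 * tau)

def rms_objective(beta, x, y, tau=0.1):
    # smoothed maximum score criterion: mean of (2y - 1) * ramp(x'beta)
    return np.mean((2.0 * y - 1.0) * ramp(x @ beta, tau))

# simulated binary choice data with a median-zero error
rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=(n, 2))
beta0 = np.array([1.0, 1.0]) / np.sqrt(2.0)      # true direction
y = (x @ beta0 + 0.3 * rng.logistic(size=n) >= 0).astype(float)

# the scale of beta is not identified, so search over unit-circle directions
angles = np.linspace(0.0, np.pi, 721)
cands = np.column_stack([np.cos(angles), np.sin(angles)])
beta_hat = cands[int(np.argmax([rms_objective(b, x, y) for b in cands]))]
```

Because the smoothed criterion is differentiable almost everywhere, gradient-based optimizers apply directly; the grid search above is only for a transparent two-dimensional illustration.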

Discussion Paper
Abstract

This paper proposes a novel framework for the global optimization of a continuous function in a bounded rectangular domain. Specifically, we show that: (1) global optimization is equivalent to optimal strategy formation in a two-armed decision problem with known distributions, based on the Strategic Law of Large Numbers we establish; and (2) a sign-based strategy based on the solution of a parabolic PDE is asymptotically optimal. Motivated by this result, we propose a class of Strategic Monte Carlo Optimization (SMCO) algorithms, which uses a simple strategy that makes coordinate-wise two-armed decisions based on the signs of the partial gradient (or practically the first difference) of the objective function, without the need of solving PDEs. While this simple strategy is not generally optimal, it is sufficient for our SMCO algorithm to converge to a local optimizer from a single starting point, and to a global optimizer under a growing set of starting points. Numerical studies demonstrate the suitability of our SMCO algorithms for global optimization well beyond the theoretical guarantees established herein. For a wide range of test functions with challenging landscapes (multi-modal, non-differentiable and discontinuous), our SMCO algorithms perform robustly well, even in high-dimensional (d = 200 ∼ 1000) settings. In fact, our algorithms outperform many state-of-the-art global optimizers, as well as local algorithms augmented with the same set of starting points as ours.
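A minimal sketch of the coordinate-wise sign-based idea described above (hedged: this mimics only the two-armed sign decisions and multi-start scheme; the actual SMCO algorithms and their step-size rules are those specified in the paper, and all tuning constants here are our own):

```python
import numpy as np

def smco_sketch(f, lo, hi, starts, iters=80):
    """Coordinate-wise two-armed search: each coordinate moves left or right
    by the current step size based only on the sign of a first difference.
    The best end point over all starting points is returned."""
    lo, hi = np.asarray(lo, float), np.asarray(hi, float)
    best_x, best_v = None, np.inf
    for x0 in starts:
        x = np.clip(np.asarray(x0, float), lo, hi)
        for t in range(iters):
            h = (hi - lo) * 2.0 ** (-(t // 10) - 2)   # shrinking step size
            for j in range(len(x)):
                e = np.zeros_like(x); e[j] = h[j]
                # two-armed decision: only the sign of the first difference
                x = np.clip(x - np.sign(f(x + e) - f(x - e)) * e, lo, hi)
        v = f(x)
        if v < best_v:
            best_x, best_v = x.copy(), v
    return best_x, best_v

# usage on a simple smooth test objective over [-1, 1]^2
f = lambda x: np.sum((x - 0.3) ** 2)
xs, v = smco_sketch(f, [-1, -1], [1, 1], starts=[[-0.9, 0.9], [0.8, -0.8]])
```

Note that the inner update never uses gradient magnitudes, only signs, which is the feature the abstract emphasizes for robustness to non-differentiable and discontinuous landscapes.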

Discussion Paper
Abstract

We propose SLIM (Stochastic Learning and Inference in overidentified Models), a scalable stochastic approximation framework for nonlinear GMM. SLIM forms iterative updates from independent mini-batches of moments and their derivatives, producing unbiased directions that ensure almost-sure convergence. It requires neither a consistent initial estimator nor global convexity and accommodates both fixed-sample and random-sampling asymptotics. We further develop an optional second-order refinement and inference procedures based on random scaling, together with plug-in, debiased plug-in, and online versions of the Sargan–Hansen J-test tailored to stochastic learning. In Monte Carlo experiments based on a nonlinear EASI demand system with 576 moment conditions, 380 parameters, and n = 10⁵, SLIM solves the model in under 1.4 hours, whereas full-sample GMM in Stata on a powerful laptop converges only after 18 hours. The debiased plug-in J-test delivers satisfactory finite-sample inference, and SLIM scales smoothly to n = 10⁶.
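To illustrate the independent-mini-batch idea in the simplest overidentified setting we can write down (a linear IV model; the model, batch size, and step-size rule are our own illustrative choices, not SLIM's), one mini-batch can estimate the moment Jacobian and an independent mini-batch the moments, so that their product is an unbiased descent direction:

```python
import numpy as np

# linear IV model with 3 instruments and 2 parameters (overidentified)
rng = np.random.default_rng(2)
n, batch = 100_000, 50
theta0 = np.array([1.0, -0.5])
z = rng.normal(size=(n, 3))
v = rng.normal(size=n)
x = np.column_stack([z[:, 0] + 0.5 * v, z[:, 1] - 0.5 * v])  # endogenous
y = x @ theta0 + v + rng.normal(size=n)

theta = np.zeros(2)
for k, t in enumerate(range(0, n - 2 * batch + 1, 2 * batch)):
    i1 = slice(t, t + batch)              # mini-batch for the Jacobian
    i2 = slice(t + batch, t + 2 * batch)  # independent mini-batch for moments
    G = -(z[i1].T @ x[i1]) / batch        # Jacobian of the mean moment in theta
    g = (z[i2] * (y[i2] - x[i2] @ theta)[:, None]).mean(axis=0)
    theta -= (5.0 / (k + 50)) * (G.T @ g)  # unbiased direction, decaying step
```

Because the two mini-batches are independent, the expectation of G'g factors into the product of the population Jacobian and the population moment, i.e., the gradient of the (identity-weighted) GMM objective at the current iterate, which is the unbiasedness property the abstract refers to.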

Discussion Paper
Abstract

This paper studies nonparametric local (over-)identification, in the sense of Chen and Santos (2018), and the associated semiparametric efficiency in modern causal frameworks. We develop a unified approach that begins by translating structural models with latent variables into their induced statistical models of observables and then analyzes local overidentification through conditional moment restrictions. We apply this approach to three leading models: (i) the general treatment model under unconfoundedness, (ii) the negative control model, and (iii) the long-term causal inference model under unobserved confounding. The first design yields a locally just-identified statistical model, implying that all regular asymptotically linear estimators of the treatment effect share the same asymptotic variance, equal to the (trivial) semiparametric efficiency bound. In contrast, the latter two models involve nonparametric endogeneity and are naturally locally overidentified; consequently, some doubly robust orthogonal moment estimators of the average treatment effect are inefficient. Whereas existing work typically imposes strong conditions to restore just-identification before deriving the efficiency bound, we relax such assumptions and characterize the general efficiency bound, along with efficient estimators, in the overidentified models (ii) and (iii).

Discussion Paper
Abstract

This paper studies the semiparametric estimation and inference of integral functionals on submanifolds, which arise naturally in a variety of econometric settings. For linear integral functionals on a regular submanifold, we show that the semiparametric plug-in estimator attains the minimax-optimal convergence rate n^{−s/(2s+d−m)}, where s is the Hölder smoothness order of the underlying nonparametric function, d is the dimension of the first-stage nonparametric estimation, and m is the dimension of the submanifold over which the integral is taken. This rate coincides with the standard minimax-optimal rate for a (d − m)-dimensional nonparametric estimation problem, illustrating that integration over the m-dimensional manifold effectively reduces the problem’s dimensionality. We then provide a general asymptotic normality theorem for linear/nonlinear submanifold integrals, along with a consistent variance estimator. We provide simulation evidence in support of our theoretical results.
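Written out, the rate and its dimension-reduction reading (with effective dimension d_eff = d − m; the d_eff notation is ours) are:

```latex
\[
  n^{-s/(2s + d - m)} \;=\; n^{-s/(2s + d_{\mathrm{eff}})},
  \qquad d_{\mathrm{eff}} := d - m,
\]
% the classical minimax rate for nonparametric estimation with
% d_eff-dimensional covariates: m = 0 recovers the usual d-dimensional
% rate, and larger m (integration over a bigger submanifold) yields
% faster rates.
```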

Discussion Paper
Abstract

This paper investigates efficient Difference-in-Differences (DiD) and Event Study (ES) estimation using short panel data sets within the heterogeneous treatment effect framework, free from parametric functional form assumptions and allowing for variation in treatment timing. We provide an equivalent characterization of the DiD potential outcome model using sequential conditional moment restrictions on observables, which shows that the DiD identification assumptions typically imply nonparametric overidentification restrictions. We derive the semiparametric efficient influence function (EIF) in closed form for DiD and ES causal parameters under commonly imposed parallel trends assumptions. The EIF is automatically Neyman orthogonal and yields the smallest variance among all asymptotically normal, regular estimators of the DiD and ES parameters. Leveraging the EIF, we propose simple-to-compute efficient estimators. Our results highlight how to optimally explore different pre-treatment periods and comparison groups to obtain the tightest (asymptotic) confidence intervals, offering practical tools for improving inference in modern DiD and ES applications even in small samples. Calibrated simulations and an empirical application demonstrate substantial precision gains of our efficient estimators in finite samples.

Discussion Paper
Abstract

We study quantile-optimal policy learning, where the goal is to find a policy whose reward distribution has the largest α-quantile for some α ∈ (0, 1). We focus on the offline setting, in which the data-generating process involves unobserved confounders. Such a problem suffers from three main challenges: (i) nonlinearity of the quantile objective as a functional of the reward distribution, (ii) unobserved confounding, and (iii) insufficient coverage of the offline dataset. To address these challenges, we propose a suite of causal-assisted policy learning methods that provably enjoy strong theoretical guarantees under mild conditions. In particular, to address (i) and (ii), using causal inference tools such as instrumental variables and negative controls, we propose to estimate the quantile objectives by solving nonlinear functional integral equations. We then adopt a minimax estimation approach with nonparametric models to solve these integral equations, and construct conservative policy estimates that address (iii). The final policy is the one that maximizes these pessimistic estimates. In addition, we propose a novel regularized policy learning method that is more amenable to computation. Finally, we prove that the policies learned by these methods are Õ(n^{−1/2}) quantile-optimal under a mild coverage assumption on the offline dataset. Here, Õ(·) omits poly-logarithmic factors. To the best of our knowledge, these are the first sample-efficient policy learning algorithms for estimating the quantile-optimal policy in the presence of unmeasured confounding.

Journal of Financial Econometrics
Abstract

We introduce a new class of algorithms, stochastic generalized method of moments (SGMM), for estimation and inference on (overidentified) moment restriction models. Our SGMM is a novel stochastic approximation alternative to the popular Hansen (1982) (offline) GMM, and offers fast and scalable implementation with the ability to handle streaming datasets in real time. We establish almost-sure convergence and a (functional) central limit theorem for the inefficient online 2SLS and the efficient SGMM. Moreover, we propose online versions of the Durbin–Wu–Hausman and Sargan–Hansen tests that can be seamlessly integrated within the SGMM framework. Extensive Monte Carlo simulations show that, as the sample size increases, the SGMM matches the standard (offline) GMM in estimation accuracy while gaining in computational efficiency, indicating its practical value for both large-scale and online datasets. We demonstrate the efficacy of our approach by a proof of concept using two well-known empirical examples with large sample sizes.
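As a toy illustration of the streaming idea (a bare Robbins–Monro update for a just-identified scalar IV moment; the simulated model and learning-rate choice are ours, and the actual SGMM recursion, weighting, and averaging are as in the paper):

```python
import numpy as np

# scalar IV model: y = theta0*x + u, with x endogenous and z a valid instrument
rng = np.random.default_rng(1)
n, theta0 = 200_000, 2.0
z = rng.normal(size=n)
v = rng.normal(size=n)
x = 0.8 * z + 0.5 * v
y = theta0 * x + v + rng.normal(size=n)   # v enters both x and y: endogeneity

# one pass over the stream: stochastic update on the moment E[z(y - x*theta)] = 0
theta = 0.0
for t in range(n):
    gamma = 1.0 / (t + 10) ** 0.7         # slowly decaying learning rate
    theta += gamma * z[t] * (y[t] - x[t] * theta)
```

Each observation is touched exactly once, which is what makes the recursion suitable for streaming data; averaging the iterates (Polyak–Ruppert) is the standard route to the asymptotic normality that underlies the online tests.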

Review of Economic Studies
Abstract

We introduce two data-driven procedures for optimal estimation and inference in nonparametric models using instrumental variables. The first is a data-driven choice of sieve dimension for a popular class of sieve two-stage least-squares estimators. When implemented with this choice, estimators of both the structural function h0 and its derivatives (such as elasticities) converge at the fastest possible (i.e. minimax) rates in sup-norm. The second is for constructing uniform confidence bands (UCBs) for h0 and its derivatives. Our UCBs guarantee coverage over a generic class of data-generating processes and contract at the minimax rate, possibly up to a logarithmic factor. As such, our UCBs are asymptotically more efficient than UCBs based on the usual approach of undersmoothing. As an application, we estimate the elasticity of the intensive margin of firm exports in a monopolistic competition model of international trade. Simulations illustrate the good performance of our procedures in empirically calibrated designs. Our results provide evidence against common parameterizations of the distribution of unobserved firm heterogeneity.

Journal of Political Economy
Abstract

We develop a state-space model with a transition equation that takes the form of a functional vector autoregression (VAR) and stacks macroeconomic aggregates and a cross-sectional density. The measurement equation captures the error in estimating log densities from repeated cross-sectional samples. The log densities and their transition kernels are approximated by sieves, which leads to a finite-dimensional VAR for macroeconomic aggregates and sieve coefficients. With this model, we study the dynamics of technology shocks, GDP (gross domestic product), employment, and the earnings distribution. We find that spillovers between aggregate and distributional dynamics are generally small, that a positive technology shock tends to decrease inequality, and that a shock that raises earnings inequality leads to a small and insignificant GDP response.