We introduce two data-driven procedures for optimal estimation and inference in nonparametric models using instrumental variables. The first is a data-driven choice of sieve dimension for a popular class of sieve two-stage least-squares estimators. When implemented with this choice, estimators of both the structural function h0 and its derivatives (such as elasticities) converge at the fastest possible (i.e. minimax) rates in sup-norm. The second is for constructing uniform confidence bands (UCBs) for h0 and its derivatives. Our UCBs guarantee coverage over a generic class of data-generating processes and contract at the minimax rate, possibly up to a logarithmic factor. As such, our UCBs are asymptotically more efficient than UCBs based on the usual approach of undersmoothing. As an application, we estimate the elasticity of the intensive margin of firm exports in a monopolistic competition model of international trade. Simulations illustrate the good performance of our procedures in empirically calibrated designs. Our results provide evidence against common parameterizations of the distribution of unobserved firm heterogeneity.
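For readers unfamiliar with the estimator class, a minimal sketch of sieve two-stage least squares with a fixed sieve dimension is given here. The polynomial bases, the dimensions `J` and `K`, the simulated design, and all function names are illustrative assumptions. The data-driven choice of sieve dimension and the UCB construction are not reproduced.

```python
import numpy as np

def poly_basis(v, dim):
    """Power-series sieve basis with columns 1, v, ..., v^(dim-1) (an assumed basis)."""
    return np.vander(v, N=dim, increasing=True)

def sieve_tsls(y, x, w, J, K):
    """Sieve 2SLS: J basis terms approximate h0, K >= J terms span the instrument space."""
    Psi = poly_basis(x, J)   # basis in the endogenous regressor x
    B = poly_basis(w, K)     # basis in the instrument w
    # First stage: project each column of Psi onto the instrument basis.
    Psi_hat = B @ np.linalg.lstsq(B, Psi, rcond=None)[0]
    # Second stage: least squares of y on the projected basis.
    return np.linalg.lstsq(Psi_hat, y, rcond=None)[0]

# Simulated design (hypothetical): x is endogenous because it loads on the error u.
rng = np.random.default_rng(0)
n = 5000
w = rng.uniform(-1.0, 1.0, n)
u = rng.normal(0.0, 1.0, n)
x = 0.8 * w + 0.3 * u + 0.2 * rng.normal(size=n)
y = 1.0 + 2.0 * x - x**2 + u   # structural function h0(x) = 1 + 2x - x^2

coef = sieve_tsls(y, x, w, J=3, K=4)

# Estimated structural function on a grid, h_hat(x) = psi(x)' coef.
grid = np.linspace(-1.0, 1.0, 5)
h_hat = poly_basis(grid, 3) @ coef
```

In this sketch `J` is fixed by hand. The first procedure described in the abstract replaces that manual choice with a data-driven one.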
We analyze how market segmentation affects consumer welfare when a monopolist can engage in both second-degree price discrimination (through product differentiation) and third-degree price discrimination (through market segmentation). We characterize the consumer-optimal market segmentation and show that it has several striking properties: (1) the segmentation is monotone, in that higher-value customers always receive a higher-quality product than lower-value customers, whether they are in the same segment or in different segments; and (2) when aggregate demand elasticity exceeds a threshold determined by marginal costs, no segmentation maximizes consumer surplus. Our results demonstrate that strategic market segmentation can benefit consumers even when it enables price discrimination, but these benefits depend critically on demand elasticities and cost structures. The findings have implications for regulatory policy regarding price discrimination and market segmentation practices.
We study mechanism design when agents hold private information about both their preferences and a common payoff-relevant state. We show that standard message-driven mechanisms cannot implement socially efficient allocations when agents have multidimensional types, even under favorable conditions.
To overcome this limitation, we propose data-driven mechanisms that leverage additional post-allocation information, modeled as an estimator of the payoff-relevant state. Our data-driven mechanisms extend the classic Vickrey-Clarke-Groves class. We show that they achieve exact implementation in posterior equilibrium when the state is either fully revealed or the utility is linear in an unbiased estimator. We also show that they achieve approximate implementation with a consistent estimator, converging to exact implementation as the estimator converges, and present bounds on the convergence rate. We demonstrate applications to digital advertising auctions and large language model (LLM)-based mechanisms, where user engagement naturally reveals relevant information.
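As background on the class being extended, a minimal sketch of a classic Vickrey-Clarke-Groves mechanism with Clarke pivot payments over a finite set of allocations is given here. The function name `vcg`, the data layout, and the single-item example are illustrative assumptions. The data-driven extension that uses post-allocation information is not reproduced.

```python
def vcg(valuations):
    """Classic VCG with Clarke pivot payments.

    valuations[i][a] is agent i's reported value for allocation a.
    Returns the welfare-maximizing allocation and each agent's payment.
    """
    n_alloc = len(valuations[0])

    def welfare(a, exclude=None):
        # Total reported value at allocation a, optionally leaving one agent out.
        return sum(v[a] for i, v in enumerate(valuations) if i != exclude)

    best = max(range(n_alloc), key=welfare)
    payments = []
    for i in range(len(valuations)):
        # Welfare the others could obtain if agent i were absent ...
        others_best = max(welfare(a, exclude=i) for a in range(n_alloc))
        # ... minus the welfare the others obtain at the chosen allocation.
        payments.append(others_best - welfare(best, exclude=i))
    return best, payments

# Hypothetical single-item auction: allocation a gives the item to bidder a.
bids = [10.0, 7.0, 3.0]
valuations = [[b if a == i else 0.0 for a in range(len(bids))]
              for i, b in enumerate(bids)]
allocation, payments = vcg(valuations)
```

In the single-item case this reduces to a second-price auction: the highest bidder wins and pays the second-highest bid, while the losing bidders pay nothing.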
This research examines the determinants of entrepreneurship in China’s transition from agriculture to domestic production in the 1990s and the subsequent transition to exporting in the 2000s. The model that we develop and test to describe these transitions incorporates a productivity-enhancing role for community (birth county) networks, which emerge in response to market imperfections at early stages of economic development. Using administrative data covering the universe of registered firms over the 1994-2012 period and the universe of exporters over the 2002-2012 period, we provide causal evidence that these networks of firms were active and were effective at increasing the revenues of their members, both in domestic production and exporting. While this substantially increased the number of domestic producers in the first stage, the incumbent domestic networks created a disincentive to enter exporting in the second stage that dominated the positive effect of the export networks. Our analysis provides a novel characterization of the development process in which community-based networks emerge at each stage to facilitate the occupational mobility of their members, and pre-existing networks slow down the growth of the networks that follow.
This research provides a status-based explanation for the high rates of female labor force non-participation (FLFNP) and the sustained increase in these rates over time that have been documented in many developing economies. This explanation is based on the idea that households or ethnic groups can signal their wealth, and thereby increase their social status, by withdrawing women from the labor force. If the value of social status or the willingness to bear the signaling cost is increasing with economic development, then this would explain the persistent increase in FLFNP. To provide empirical support for this argument, we utilize two independent sources of exogenous variation – across Indian districts in the cross-section and within districts over time – to establish that status considerations determine rural FLFNP. Our status-based model, which is used to derive the preceding tests, is able to match the high levels and the increase in rural Indian FLFNP that motivate our analysis. Counterfactual simulations of the estimated model indicate that conventional development policies, such as a reduction in the cost of female education, could raise FLFNP by increasing potential household incomes and, hence, the willingness to compete for social status. The steep increase in female education in recent decades could paradoxically have increased FLFNP in India even further.
We study agents who are more likely to remember some experiences than others but update beliefs as if the experiences they remember are the only ones that occurred. To understand the long-run effects of selective memory, we propose selective-memory equilibrium. We show that if the agent’s behavior converges, their limit strategy is a selective-memory equilibrium, and we provide a sufficient condition for behavior to converge. We use this equilibrium concept to explore the consequences of several well-documented biases. We also show that there is a close connection between selective-memory equilibria and the outcomes of misspecified learning.
We study a setting in which a sender provides an information policy and a receiver, upon observing a realization of this policy, decides whether to take a particular action, such as making a purchase. The sender’s objective is to maximize the utility she derives from the receiver’s action, which she does by carefully selecting the information policy. Building on the work of Kleiner et al., we focus on information policies associated with power diagram partitions of the underlying domain. To address this problem, we employ entropy-regularized optimal transport, which enables us to develop an efficient algorithm for finding the optimal solution. We present numerical results that highlight the qualitative properties of the optimal configurations and provide insight into their structure. Furthermore, we extend our numerical investigation to derive optimal information policies for monopolists dealing with multiple products, where the sender discloses information about product qualities.
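To illustrate the computational tool named above, a minimal sketch of Sinkhorn iterations for entropy-regularized optimal transport between two discrete distributions is given here. The grids, the squared-distance cost, the regularization strength `eps`, and the iteration count are illustrative assumptions. The link to power diagram partitions and the sender's objective is not reproduced.

```python
import numpy as np

def sinkhorn(a, b, cost, eps=0.1, n_iter=500):
    """Entropy-regularized OT plan between histograms a and b via Sinkhorn scaling."""
    K = np.exp(-cost / eps)   # Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iter):
        u = a / (K @ v)       # rescale so the row sums match a
        v = b / (K.T @ u)     # rescale so the column sums match b
    return u[:, None] * K * v[None, :]

# Hypothetical example: uniform masses on two grids in [0, 1], squared-distance cost.
x = np.linspace(0.0, 1.0, 5)
y = np.linspace(0.0, 1.0, 4)
cost = (x[:, None] - y[None, :]) ** 2
a = np.full(5, 1.0 / 5)
b = np.full(4, 1.0 / 4)

plan = sinkhorn(a, b, cost)
transport_cost = float((plan * cost).sum())
```

Smaller values of `eps` bring the plan closer to the unregularized optimum but typically require more iterations and, for very small values, a log-domain implementation to avoid numerical underflow.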
It has become common practice for researchers to use AI-powered information retrieval algorithms or other machine learning methods to estimate variables of economic interest, then use these estimates as covariates in a regression model. We show both theoretically and empirically that naively treating AI- and ML-generated variables as “data” leads to biased estimates and invalid inference. We propose two methods to correct bias and perform valid inference: (i) an explicit bias correction with bias-corrected confidence intervals, and (ii) joint maximum likelihood estimation of the regression model and the variables of interest. Through several applications, we demonstrate that the common approach generates substantial bias, while both corrections perform well.
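The bias from treating generated variables as data can be illustrated with a small simulation. This sketch assumes the prediction error behaves like classical measurement error (independent, mean-zero noise added to the true covariate), which the abstract does not state. The sample size, variances, and variable names are hypothetical, and neither of the two proposed corrections is implemented.

```python
import random
import statistics

def ols_slope(x, y):
    """Slope from a simple least-squares regression of y on x with an intercept."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

random.seed(0)
n = 20000
beta = 1.0  # true regression coefficient (assumed)

x_true = [random.gauss(0.0, 1.0) for _ in range(n)]
# Generated covariate: the truth plus independent prediction error (assumption).
x_ml = [x + random.gauss(0.0, 1.0) for x in x_true]
y = [beta * x + random.gauss(0.0, 1.0) for x in x_true]

slope_true = ols_slope(x_true, y)   # regression on the unobserved true covariate
slope_naive = ols_slope(x_ml, y)    # regression treating the generated variable as data
```

Under these assumptions the naive slope is attenuated toward beta * Var(x) / (Var(x) + Var(error)), which is half of the true coefficient in this design. Other error structures can produce bias in either direction.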
We develop a state-space model with a transition equation that takes the form of a functional vector autoregression (VAR) and stacks macroeconomic aggregates and a cross-sectional density. The measurement equation captures the error in estimating log densities from repeated cross-sectional samples. The log densities and their transition kernels are approximated by sieves, which leads to a finite-dimensional VAR for macroeconomic aggregates and sieve coefficients. With this model, we study the dynamics of technology shocks, GDP (gross domestic product), employment, and the earnings distribution. We find that spillovers between aggregate and distributional dynamics are generally small, that a positive technology shock tends to decrease inequality, and that a shock that raises earnings inequality leads to a small and insignificant GDP response.
A common approach to estimating willingness-to-travel exploits variation in the relative proximity of consumers to supplier locations. The validity of these estimates relies on the exogeneity of consumer-supplier distance. We argue that distance to suppliers is endogenous because suppliers strategically choose locations to target consumers, and we introduce a novel instrument to address this form of endogeneity. Using geolocation data from millions of smartphones, we estimate consumer preferences for specific retail chains across income groups and regions. We show that accounting for distance endogeneity significantly alters willingness-to-travel measures. Contrary to the prevailing “retail apocalypse” narrative, we find that consumer surplus per trip to general merchandise stores did not significantly decline from 2010 to 2019. For the lowest-income consumers, the expansion of national chains, particularly dollar stores, nearly compensates for the closure of traditional department stores and regional chains. Notably, failing to account for distance endogeneity leads to the erroneous conclusion that lower-income households experienced statistically significant consumer surplus declines.