Center for Algorithms, Data, and Market Design at Yale (CADMY)

CADMY is an innovative research center working at the intersection of computer science, economics, and data science. The Center aims to support Yale faculty and students with their research in relevant areas and will serve as a platform to host visiting faculty and postdoctoral fellows, promoting ongoing academic engagement and advancement.

With the arrival of the Internet, including rapid increases in the capacity to transmit, communicate and process data and information, algorithms and data have become central objects of interest in computer science, data science, and economics. Data and digital information have become essential for the allocation and distribution of services and commodities worldwide, which includes the design of markets and resource allocation mechanisms.

From traffic navigation apps to social networks, algorithms and data have become essential. These developments have accelerated further with the recent arrival of large language models, which build algorithms on massive data sets. The question of how to collect, aggregate, and disseminate data among diverse individuals in a decentralized society is critical for the functioning of democracy, as well as for fair and efficient markets.

CADMY’s goal is to initiate and support research and teaching around the fundamental questions that arise at the intersection of computer science, data science, economics, and computational social sciences.

For more information about CADMY and research areas, please visit cadmy.yale.edu.

Latest Publications

Discussion Paper
Abstract

To meet voluntary climate targets, firms often complement internal decarbonization efforts by purchasing carbon credits in the voluntary carbon market (VCM), which finance projects that reduce emissions elsewhere. However, these emissions reductions are difficult to verify, and growing evidence of overcrediting has cast doubt on the VCM's potential to genuinely offset emissions. We investigate how the VCM's defining features shape its climate effectiveness. Our model captures three central elements: adverse selection, as high-quality projects that truly reduce emissions are costlier yet difficult to distinguish from low-quality ones; imperfect third-party certification, as projects are screened based on a noisy signal of quality; and buyer preferences for non-carbon attributes, as some firms value credits that generate observable social or economic co-benefits beyond reducing emissions. We show that the market fails to sustain trade if certification is sufficiently noisy, as quality uncertainty erodes buyer confidence and triggers a market-for-lemons collapse. However, demand for co-benefits can sustain markets that would otherwise collapse. Yet in such cases, the market remains active but yields limited carbon abatement, as most traded credits are low-quality. We then examine policy and market design interventions reflecting recent developments in practice, such as penalizing buyers for greenwashing and offering credit portfolios. We show that these measures can be counterproductive for carbon mitigation if certification remains inaccurate. Accordingly, we demonstrate that the certifier’s incentives for accuracy can be strengthened by modifying its fee structure so that its revenue is tied to the market value rather than the volume of credits.
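
The collapse result in this abstract can be illustrated with a back-of-the-envelope Bayes calculation. The sketch below is not the paper's model. It assumes a simplified setting in which a certifier approves high-quality projects with probability `pass_rate_high` and low-quality ones with probability `pass_rate_low`, low-quality credits are worth nothing to buyers, and co-benefits are ignored. All function names, parameters, and numbers are hypothetical.

```python
def posterior_high_quality(prior_high: float,
                           pass_rate_high: float,
                           pass_rate_low: float) -> float:
    """Probability that a certified credit is high quality (Bayes' rule)."""
    certified = prior_high * pass_rate_high + (1 - prior_high) * pass_rate_low
    if certified == 0:
        raise ValueError("no project is ever certified under these rates")
    return prior_high * pass_rate_high / certified


def high_quality_can_trade(prior_high: float,
                           pass_rate_high: float,
                           pass_rate_low: float,
                           value_high: float,
                           cost_high: float) -> bool:
    """Assumed participation check: buyers pay at most the expected value of a
    certified credit, and high-quality sellers stay in the market only if that
    price covers their cost."""
    belief = posterior_high_quality(prior_high, pass_rate_high, pass_rate_low)
    willingness_to_pay = belief * value_high
    return willingness_to_pay >= cost_high


# Hypothetical numbers: accurate certification sustains trade in high-quality
# credits, while noisy certification does not.
accurate = high_quality_can_trade(0.5, 0.9, 0.1, value_high=10.0, cost_high=7.0)
noisy = high_quality_can_trade(0.5, 0.6, 0.4, value_high=10.0, cost_high=7.0)
print(accurate, noisy)
```

Under these assumed numbers, a noisier signal lowers the buyer's belief that a certified credit is high quality, so the price buyers will pay falls below the cost of a high-quality project.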

Discussion Paper
Abstract

Conversational recommender systems powered by generative AI can enhance personalization by facilitating information elicitation through follow-up questions. However, engaging in these conversations imposes a communication cost on users. As platforms with different objectives and monetization models deploy these systems, a central question is: how do the platform’s objective and sellers’ strategic response shape the design of these systems in terms of their elicitation strategy? We develop a parsimonious model of conversational elicitation in which interaction generates noisy preference information and imposes a communication cost borne by the user. A user-welfare-maximizing platform elicits more information when accurate niche matching yields large gains, even when niche users are rare. In contrast, under a conversion objective, for the same setting, the optimal strategy is to immediately recommend the same mainstream option to all users with no or minimal preference elicitation because the incremental conversion benefit from improved matching is bounded, while communication costs are borne by all users. When prices are endogenous and the platform earns a commission, increased elicitation is again optimal because improved screening raises equilibrium prices and platform revenue; however, these price responses can counteract consumer benefits and reduce user welfare. The model also highlights that the optimal elicitation intensity increases with preference heterogeneity, helping explain why conversational systems ask more in highly differentiated categories than in low-heterogeneity ones. We complement the theory with a dataset of long-form product queries that vary in length and informational content. Using our dataset and LLM-based user simulation, we quantify how additional information impacts user decisions and demonstrate that the magnitude of this impact depends on the degree of preference heterogeneity. Additionally, this dataset provides a testbed for measuring the (incremental) value of preference elicitation and may be of independent interest.

Discussion Paper
Abstract

How does wartime rebel governance shape post-conflict institutions? We study this in Nepal, where the Maoist People's War (1996–2006) dismantled a 240-year caste-based monarchy and ended with Maoists entering democratic politics. During the conflict, Maoists established sub-national “People’s Governments” that administered justice, collected taxes, and delivered local services. Using a spatial regression-discontinuity design, we show that exposure to People's Governments increased political knowledge and participation, especially among historically marginalized indigenous groups (Janajatis). Exposure also reshaped party institutions and inter-party competition: candidate-selection committees in more exposed areas have 26% more Janajati members who, according to novel implicit-attitude data, exhibit less pro-upper-caste bias. Non-Maoist parties' Janajati nomination rates nearly double in fully exposed areas, consistent with competition for newly mobilized voters. Nearly two decades on, local governments in exposed areas score 0.2–0.3 standard deviations higher on state capacity indices and receive 13% more in conditional federal grants. These findings show that when rebel groups enter competitive democratic politics, wartime governance institutions can, through citizen mobilization, party gatekeeping, and cross-party competition, enable a more inclusive and capable post-war state.

Discussion Paper
Abstract

We compare how well agents aggregate information in two repeated social learning environments. In the first setting, agents have access to a public data set. In the second, they have access to the same data and also to the past actions of others. Although actions contain no additional payoff-relevant information, and despite potential herd behavior, free riding, and information overload, observing and imitating the actions of others leads agents to take the optimal action more often in the second setting. We also investigate the effect of group size, as well as a setting in which agents observe private data and others’ actions.

Discussion Paper
Abstract

We develop a quantitative macroeconomic theory of child mental health. The theory is grounded in child psychiatry, formalized in a life-cycle heterogeneous agent model of child development, and disciplined using micro data on mental health of children and parents. Intergenerational transmission of mental illness arises due to both biological factors and parental behavior. Parents experiencing mental illness have negative expectations and lose time due to rumination. As a result, they invest less in their child’s mental health. We use the model to evaluate policies designed to improve child mental health. We show that subsidizing mental health treatment for children generates sizable welfare gains.

Discussion Paper
Abstract

As AI systems shift from directing users to content toward consuming it directly, publishers need a new revenue model: charging AI crawlers for content access. This model, called pay-per-crawl, must solve a problem of mechanism selection at scale: content is too heterogeneous for a fixed pricing framework. Different sub-types warrant not only different price levels but different pricing rules based on different unstructured features, and there are too many to enumerate or design by hand. We propose the LM Tree, an adaptive pricing agent that grows a segmentation tree over the content library, using LLMs to discover what distinguishes high-value from low-value items and apply those attributes at scale, from binary purchase feedback alone. We evaluate the LM Tree on real content from a major German technology publisher, using 8,939 articles and 80,451 buyer queries with willingness-to-pay calibrated from actual AI crawler traffic. The LM Tree achieves a 65% revenue gain over a single static price and a 47% gain over two-category pricing, outperforming even the publisher’s own 8-segment editorial taxonomy by 40%—recovering content distinctions the publisher’s own categories miss.

Discussion Paper
Abstract

We study the design of efficient dynamic recommendation systems, such as AI shopping assistants, in which a platform interacts with a user over multiple rounds to identify the most suitable product among those offered by advertisers. Advertisers have multi-dimensional private information: their private value from a purchase and private information about the user’s preferences. In each round, the platform displays recommendations; the user learns product characteristics of the shown items and then chooses whether to purchase, exit without purchasing, or submit a new query. These actions generate a stream of feedback—purchase, exit, and follow-up queries—that is informative about the user’s preferences and can be used both to refine future recommendations and to design contingent transfers. We introduce a class of data-driven dynamic team mechanisms that condition payments on realized user feedback. Our main result shows that data-driven dynamic team mechanisms achieve periodic ex-post implementation of the efficient allocation rule. We then develop variants that guarantee participation and deliver budget surplus, and provide conditions under which these properties can be jointly attained.

Discussion Paper
Abstract

Bilateral bargaining under incomplete information provides a controlled testbed for evaluating large language model (LLM) agent capabilities. Bilateral trade demands individual rationality, strategic surplus maximization, and cooperation to realize gains from trade. We develop a structured bargaining environment in which LLMs negotiate via tool calls within an event-driven simulator, separating binding offers from natural-language messages to enable automated evaluation. The environment serves two purposes: as a benchmark for frontier models and as a training environment for open-weight models via reinforcement learning. In benchmark experiments, a round-robin tournament among five frontier models (15,000 negotiations) reveals that effective strategies implement price discrimination through sequential offers. Aggressive anchoring, calibrated concession, and temporal patience are associated with both the highest surplus share and the highest deal rate. Accommodating strategies that concede quickly disable price discrimination in the buyer role, yielding the lowest surplus capture and deal completion. Strategically competent models scale their behavior proportionally to item value, maintaining consistent performance across price tiers; weaker models perform well only when wide zones of possible agreement compensate for suboptimal strategies. In training experiments, we fine-tune Qwen3 (8B, 14B) via supervised fine-tuning (SFT) followed by Group Relative Policy Optimization (GRPO) against a fixed frontier opponent. The two stages optimize competing objectives: SFT approximately doubles surplus share but reduces deal rates, while RL recovers deal rates but erodes surplus gains—a tension traceable to the reward structure. SFT also compresses surplus variation across price tiers, and this compression generalizes to opponents unseen during training, suggesting that behavioral cloning instills proportional strategies rather than memorized price points.

Discussion Paper
Abstract

A soft-floor auction asks bidders to accept an opening price to participate in a second-price auction. If no bidder accepts, lower bids are considered using first-price rules. Soft floors are common in practice despite being irrelevant under standard assumptions. When bidders regret losing, soft-floor auctions are more efficient and profitable than standard optimal auctions. Revenue increases because bidders are inclined to accept the opening price to compete in a regret-free second-price auction. Efficiency improves because a soft floor allows for a lower hard reserve, reducing the frequency of no sale. Theory and experiment confirm these motivations from practice.
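
The allocation and payment rule described in this abstract can be sketched in a few lines. This is a minimal illustration, not the paper's formal mechanism. It assumes a static sealed-bid version in which a bid at or above the opening price (`soft_floor`) counts as accepting it, a `hard_reserve` below which no sale occurs, and ties broken in favor of the earliest bidder. The function name and parameters are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def soft_floor_auction(bids: Sequence[float],
                       soft_floor: float,
                       hard_reserve: float = 0.0) -> Optional[Tuple[int, float]]:
    """Return (winner_index, price), or None if there is no sale."""
    if not bids:
        return None
    if hard_reserve > soft_floor:
        raise ValueError("hard_reserve must not exceed soft_floor")

    # Highest bid wins; ties go to the earliest bidder (an arbitrary assumption).
    winner = max(range(len(bids)), key=lambda i: bids[i])
    top = bids[winner]

    if top >= soft_floor:
        # Second-price stage: the winner pays the larger of the opening price
        # and the best competing bid.
        others = [b for i, b in enumerate(bids) if i != winner]
        runner_up = max(others, default=soft_floor)
        return winner, max(soft_floor, runner_up)

    if top >= hard_reserve:
        # Nobody met the opening price: first-price rules, winner pays own bid.
        return winner, top

    # Highest bid is below the hard reserve: no sale.
    return None


print(soft_floor_auction([10.0, 7.0, 3.0], soft_floor=5.0))             # two bidders accept
print(soft_floor_auction([10.0, 4.0], soft_floor=5.0))                  # one bidder accepts
print(soft_floor_auction([4.0, 3.0], soft_floor=5.0, hard_reserve=2.0)) # nobody accepts
```

In this sketch the soft floor only changes which pricing rule applies. The hard reserve is what determines whether a sale happens at all.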

Discussion Paper
Abstract

Here we provide our solutions to the First Proof questions. We also discuss the best responses from publicly available AI systems that we were able to obtain in our experiments prior to the release of the problems on February 5, 2026. We hope this discussion will help readers with the relevant domain expertise to assess such responses.

Discussion Paper
Abstract

We develop a framework for the optimal pricing and product design of LLMs in which a provider sells menus of token budgets to users who differ in their valuations across a continuum of tasks. Under a homogeneous production technology, we show that users’ high-dimensional type profiles are summarized by a scalar index, reducing the seller’s problem to one-dimensional screening. The optimal mechanism takes the form of committed-spend contracts: buyers pay for a budget that they allocate across token classes priced at marginal cost. We extend the analysis to environments with multiple differentiated models and to competition between a proprietary leader and an open-source fringe, showing that competitive pressure reshapes both the intensive and extensive margins of compute provision. Each element of our theory (token-budget menus, maximum- and minimum-spend plans, multi-model versioning, and linear API pricing) has a direct counterpart in the observed pricing practices of providers such as Anthropic, OpenAI, and GitHub.

Discussion Paper
Abstract

This paper develops a framework in which a multiproduct ecosystem competes with multiple single-product firms in both price and innovation. The ecosystem can use data from one product to improve the quality of its other products. We use the framework to study three regulatory policies aimed at leveling the playing field. Restricting the ecosystem’s cross-product data usage, or forcing it to share data with single-product firms, benefits those firms and induces them to innovate more. However, these policies also dampen the ecosystem’s incentive to collect data and innovate, potentially raising prices. Consumers are better off only when single-product firms are sufficiently good at innovating. Facilitating data exchange between single-product firms via a data cooperative can backfire and harm them, because it induces the ecosystem to price more aggressively. For both the data-sharing and data-cooperative policies, there exist data-compensation schemes such that consumers are better off compared to no regulation.