Skip to main content

Chuanhao Li Publications

Discussion Paper
Abstract

Bilateral bargaining under incomplete information provides a controlled testbed for evaluating large language model (LLM) agent capabilities. Bilateral trade demands individual rationality, strategic surplus maximization, and cooperation to realize gains from trade. We develop a structured bargaining environment in which LLMs negotiate via tool calls within an event-driven simulator, separating binding offers from natural-language messages to enable automated evaluation. The environment serves two purposes: as a benchmark for frontier models and as a training environment for open-weight models via reinforcement learning. In benchmark experiments, a round-robin tournament among five frontier models (15,000 negotiations) reveals that effective strategies implement price discrimination through sequential offers. Aggressive anchoring, calibrated concession, and temporal patience are associated with both the highest surplus share and the highest deal rate. Accommodating strategies that concede quickly disable price discrimination in the buyer role, yielding the lowest surplus capture and deal completion. Strategically competent models scale their behavior proportionally to item value, maintaining consistent performance across price tiers; weaker models perform well only when wide zones of possible agreement compensate for suboptimal strategies. In training experiments, we fine-tune Qwen3 (8B, 14B) via supervised fine-tuning (SFT) followed by Group Relative Policy Optimization (GRPO) against a fixed frontier opponent. The two stages optimize competing objectives: SFT approximately doubles surplus share but reduces deal rates, while RL recovers deal rates but erodes surplus gains—a tension traceable to the reward structure. SFT also compresses surplus variation across price tiers, and this compression generalizes to opponents unseen during training, suggesting that behavioral cloning instills proportional strategies rather than memorized price points.

Discussion Paper
Abstract

Large Language Models (LLMs) like GPT-4 have revolutionized natural language processing, showing remarkable linguistic proficiency and reasoning capabilities. However, their application in strategic multi-agent decision-making environments is hampered by significant limitations including poor mathematical reasoning, difficulty in following instructions, and a tendency to generate incorrect information. These deficiencies hinder their performance in strategic and interactive tasks that demand adherence to nuanced game rules, long-term planning, exploration in unknown environments, and anticipation of opponents’ moves. To overcome these obstacles, this paper presents a novel LLM agent framework equipped with memory and specialized tools to enhance their strategic decision-making capabilities. We deploy the tools in a number of economically important environments, in particular bilateral bargaining and multi-agent and dynamic mechanism design. We employ quantitative metrics to assess the framework’s performance in various strategic decision-making problems. Our findings establish that our enhanced framework significantly improves the strategic decision-making capability of LLMs. While we highlight the inherent limitations of current LLM models, we demonstrate the improvements through targeted enhancements, suggesting a promising direction for future developments in LLM applications for interactive environments.