How AIs Handle Pressure: Round 2 of the Women’s World Cup Fantasy Auction

For the 2026 Women’s Cricket World Cup I’ve been pitting AIs against each other in my fantasy league.

This second round highlighted how each AI responded under pressure, with some adjusting their strategies and others doubling down on their initial styles.

Adaptation Under Pressure: Full Squads vs. Incomplete Plans

Two AI managers clearly emerged as the “closers” of Round 2: Claude and Gemini Extended Thinking. They were the only ones to assemble a complete 15-player roster by the end of the round.

Claude (from Anthropic) demonstrated dramatic adaptation. Notoriously cautious in Round 1 – where a conservative bidding strategy meant it finished with a paltry 2 players – Claude entered Round 2 as an underdog with by far the largest remaining budget ($9,230). Rather than panic, Claude capitalized on this war chest with an aggressive second-round strategy, submitting a staggering 25 bids – more than any other AI. This “spray-and-pray” approach paid off: Claude won 13 of those bids and filled every last roster slot (the round’s biggest gain in players).

Equally important, it balanced its team: Claude snagged the needed specialist batter, bowler, and a wicketkeeper (winning a bid for NZ’s Polly Inglis for $323 as its sole keeper) to meet all lineup requirements. Claude went from worst to fully prepared by learning from Round 1 – it recognized it had been outbid on top stars and needed a broader net. This pivot from ultra-conservative to decisively aggressive bidding showcased Claude’s adaptive capacity under pressure and a willingness to use up resources to recover ground.

Gemini Extended Thinking (Google’s slower, more analytical mode), by comparison, came into Round 2 in a relatively strong position, having already won 11 players in Round 1. With a still sizable budget of $5,337 left, it needed four more players. True to its methodical nature, Gemini Extended devised a safe, distributed plan: bid on 11 high-value targets – nearly triple the number of players it needed – to maximize the chance of filling all gaps. Where Claude staged a “comeback from behind”, Gemini Extended opted for a “cautious consolidation from ahead” approach suited to its position going into Round 2.

The six AIs that failed to fill their squads often exhibited over-optimism or inflexibility in Round 2. For example, OpenAI’s ChatGPT (base) and ChatGPT Deep Research both started Round 2 with only about a third of their squads filled after Round 1. Yet each bid on only 15 players in Round 2, leaving precariously slim margins for error. This turned out to be overly optimistic. Regular ChatGPT won only 5 of its 15 bids, and ChatGPT Deep Research won about the same number, leaving both with just 10 players in total at the end of Round 2. In effect, both ChatGPT models misjudged the level of bidding competition: they corrected some Round 1 missteps (each targeted more unsung players and addressed roster holes), but then underestimated how many bids would fail.
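The margin-for-error point can be made concrete with a little arithmetic. The sketch below is only an illustration: the `RoundTwo` class and its property names are made up for this example, and the figures are the ones quoted in this post (Gemini Extended’s 4 wins are inferred from its squad going from 11 to 15 players). It shows the share of bids each manager had to win in order to fill its squad.

```python
from dataclasses import dataclass


@dataclass
class RoundTwo:
    """Round 2 numbers for one manager (hypothetical helper for illustration)."""
    needed: int  # open roster slots at the start of Round 2
    bids: int    # bids submitted in Round 2
    won: int     # bids won in Round 2

    @property
    def required_win_rate(self) -> float:
        # Share of bids that had to succeed to fill every open slot
        return self.needed / self.bids

    @property
    def actual_win_rate(self) -> float:
        return self.won / self.bids

    @property
    def filled(self) -> bool:
        return self.won >= self.needed


# Figures taken from the article text
managers = {
    "Claude": RoundTwo(needed=13, bids=25, won=13),
    "Gemini Extended Thinking": RoundTwo(needed=4, bids=11, won=4),
    "ChatGPT": RoundTwo(needed=10, bids=15, won=5),
}

for name, r in managers.items():
    print(
        f"{name}: needed {r.required_win_rate:.0%} of bids to land, "
        f"won {r.actual_win_rate:.0%}, squad filled: {r.filled}"
    )
```

On these numbers, ChatGPT needed two of every three bids to succeed, while Gemini Extended needed only about one in three.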

ChatGPT realized it needed wicketkeepers (having had none after Round 1) and successfully added two keepers in Round 2. But it failed to draft a single specialist batter in either round, meaning its future playing XI would have a glaring gap. This underlines a certain brittleness in ChatGPT’s planning logic: even when alerted to roster requirements, it mis-prioritized needs and amongst its 15 bids only three were for specialist batters – and all were lost to Claude.

ChatGPT Deep Research spent significant time formulating a reasoned strategy and did adjust its focus after noticing it came out of Round 1 with only all-rounders and one keeper. In Round 2, it wisely shifted to bidding on lots of batters and bowlers to cover those gaps. However, in executing this plan, it submitted bids that totaled more than its entire original budget, let alone the $7,600 it had left, requiring me to once again “normalize” its bids downward. Claude and Grok made the same mistake.
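For readers wondering what “normalizing” bids downward might look like, here is one possible approach: scale every bid by the same factor so the total fits the remaining budget. This is a sketch under assumptions, not necessarily the exact adjustment used in the league. The function name `normalize_bids`, the placeholder player names, and the bid amounts are all invented for illustration. Only the $7,600 budget comes from the text above.

```python
def normalize_bids(bids: dict[str, int], budget: int) -> dict[str, int]:
    """Scale bids down proportionally so their total fits within the budget.

    Assumption: proportional scaling with rounding down to whole dollars.
    Bids that already fit are returned unchanged.
    """
    total = sum(bids.values())
    if total <= budget:
        return dict(bids)
    # Integer arithmetic avoids float rounding and never exceeds the budget
    return {player: amount * budget // total for player, amount in bids.items()}


# Hypothetical over-budget bid sheet: totals $10,000 against $7,600 remaining
raw_bids = {"Player A": 5000, "Player B": 3000, "Player C": 2000}
scaled = normalize_bids(raw_bids, budget=7600)
print(scaled, sum(scaled.values()))
```

Proportional scaling preserves the ranking of an AI’s own bids (its priorities stay in the same order) while making every individual bid less competitive.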

Fast vs. Thorough: How Paired Systems Diverged

Round 2 confirmed that giving an AI more time or context for “deep thinking” doesn’t always guarantee finishing strong. In several pairs of AI systems from the same family, the “fast” vs “deep” approaches produced contrasting outcomes.

Google’s Gemini duo

The base Gemini model and Gemini Extended Thinking had very different experiences in Round 2. The standard Gemini soared in Round 1 with a spree of 14 players, but its Round 2 bids completely struck out – none of the six fallback bids it submitted succeeded. Even a sizable $812 bid for India’s Jemimah Rodrigues wasn’t enough as Claude outgunned it at $893.

In contrast, the Extended Thinking version started Round 2 with 11 players, and ended with the squad of 15 it needed.
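The Rodrigues contest shows the basic mechanic at work. As a rough sketch, assuming each player simply goes to the highest bidder (tie-breaks and budget checks are not modeled here, and `resolve_bids` is a made-up name for this example):

```python
def resolve_bids(bids: dict[str, dict[str, int]]) -> dict[str, tuple[str, int]]:
    """Pick a winner for each player.

    bids maps player -> {manager: bid amount}. Assumption: highest bid wins;
    on a tie, the manager listed first wins (an arbitrary choice for this sketch).
    """
    results = {}
    for player, offers in bids.items():
        if not offers:
            continue  # nobody bid on this player
        winner = max(offers, key=offers.get)
        results[player] = (winner, offers[winner])
    return results


# The two bids mentioned above
round_two = {"Jemimah Rodrigues": {"Gemini": 812, "Claude": 893}}
winners = resolve_bids(round_two)
print(winners)
```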

OpenAI’s ChatGPT vs. ChatGPT Deep Research

Interestingly, both entered Round 2 with similar deficits (needing ~10 more players each) and both fell short.

Their strategies differed. The Deep Research model went big for players from top tier countries such as England, India, New Zealand, and South Africa, but Claude managed to outbid it five times.

Meanwhile, the base model branched out by opting for players from Scotland and Ireland, amongst others. It arguably overbid on Scotland’s Sarah Bryce, offering $701 when no other AI put in a bid. But this willingness to bet on players from underdog countries also meant that it beat out Claude, Grok, Copilot Base, and Gemini Extended Thinking to win Ireland’s Gaby Lewis and Scotland’s Ailsa Lister.

Microsoft’s Copilot vs. Copilot Researcher

Copilot (the standard version) had an average performance in Round 1 but still needed 7 more players. In Round 2 it peppered the field with bids (attempting 20 players), but nearly half of those were invalid because the AI mis-remembered the updated rosters and bid on players already taken by rivals. After those errors were eliminated, Copilot ended up pushing mostly for bowlers – rightly addressing a deficiency in its squad.

Copilot Researcher distinguished itself with a much more rigorous planning process: it was the only AI to pause and ask clarifying questions before bidding, verifying its Round 1 results and budget understanding.

Its second attempt carefully accounted for competitors’ budgets and tendencies, recognizing that some rivals (like Claude) had more cash and that some (like Gemini) favored certain players. It also explicitly factored in tournament mechanics, targeting players likely to reach later stages (where fantasy points multiply) – a sophisticated forward-looking tactic seemingly unique to Copilot Researcher.

Despite these advantages, the Researcher had only a middle-of-pack remaining budget ($5,790) and its moderate aggression (11 bids for 8 needed players) wasn’t enough to beat wealthier or more daring opponents on some key picks. It won 4 players (including a top-order batter in South Africa’s Tazmin Brits and two bowlers), boosting its total to 11 but leaving it still shy of a full squad.

Budget Utilization

With budgets dwindling after Round 1, some AIs spent heavily in Round 2, while others remained frugal. Claude and Gemini Extended not only topped up their rosters but also put their funds to work, each finishing with only around $3.2K left. On the flip side, incomplete teams like ChatGPT, ChatGPT Deep Research, Grok, and Copilot all ended Round 2 with a hefty $4K–$6K still unspent. These AIs had money left on the table that they failed to convert into players, either by aiming too low in certain bids or by not making enough bids to spend it. Lack of aggression can be as costly as overspending in an auction: unspent budget provides no advantage once the round is over.

Second-Order Reasoning

One valuable adaptation in Round 2 was the ability to think about not just one’s own needs but also the competitors’ moves. Some AIs clearly did this. Claude explicitly noted how it had been outbid by 15–30% in Round 1 and observed that other teams like Gemini were already near full strength. Armed with this knowledge, it adjusted by raising its bid amounts significantly in Round 2. Gemini Extended likewise reasoned that managers like Claude, Grok, and the ChatGPTs held large budgets and would be very aggressive, leading it to a more distributed strategy with a high number of bids. Microsoft’s Copilot (the base version) noted specific patterns like “Gemini’s tendency to overbid Australians” and “Grok’s preference for emerging talent” when explaining its second round plan.

In contrast, other AIs seemed more inward-focused – for instance, ChatGPT mainly emphasized its own roster gaps and undervalued players without delving much into others’ behavior. The AIs that most successfully executed Round 2 (Claude and Gemini Extended) were those that showed the greatest awareness of the competitive landscape, proactively reacting to the known budgets and bidding tendencies of their rivals.

What’s Next

There’ll be a third round of bidding to hopefully get our AI managers to the 15 players they need in their squads.

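| Team | Players in Squad | Remaining Budget | Batters in Squad | Bowlers in Squad | WK in Squad |
|---|---|---|---|---|---|
| Grok | 10 | $5,863 | 3 | 1 | 2 |
| Copilot | 11 | $5,248 | 4 | 3 | 3 |
| ChatGPT Deep Research | 10 | $5,081 | 1 | 2 | 2 |
| ChatGPT | 10 | $4,873 | 0 | 2 | 2 |
| Copilot Researcher | 11 | $3,790 | 3 | 2 | 2 |
| Gemini | 14 | $3,383 | 2 | 5 | 3 |
| Gemini Extended Thinking | 15 | $3,195 | 3 | 8 | 1 |
| Claude | 15 | $3,157 | 4 | 1 | 1 |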

(All-rounders in each squad are not shown in the table above.)