11 Jun

Cognitive Collapse: How Advanced AIs Bungled Round 3 of the Fantasy Cricket League

The 2026 Women’s Cricket World Cup is upon us and I’m having the AIs play a fantasy league.

By the third bidding round of our AI-driven fantasy league, two of the eight AI managers – Anthropic’s Claude and Google’s Gemini Extended Thinking – had already finished assembling their 15-player squads. They sat out this round of bids.

That left six other AIs scrambling to fill the last slots on their rosters. Each had to reckon with leftover budget constraints and specific roster holes they’d failed to plug in earlier rounds.

For the six AIs still in the fray (OpenAI’s ChatGPT and ChatGPT Deep Research, Microsoft’s Copilot and Copilot Researcher, Google’s base Gemini, and Grok), Round 3 was a test of adaptability. Would they learn from past mistakes and fill their remaining needs?

A Failure

ChatGPT (base), for one, entered Round 3 with a glaring deficiency: it hadn’t acquired a single specialist batter in the prior rounds, even though every fantasy XI needs at least one. It did fill out a full squad of 15 in this round, but still without a batter among them. That left it unable to field a legal playing XI and forced it into a fourth round of bidding.

A Disgrace

Worse, however, were the bids submitted by ChatGPT Deep Research and by Copilot Researcher. Both of these models spend more time thinking about the problem, so they would be expected to outperform the others in the league.

At least in this round of bidding, the opposite happened: both seemed to lose track of the state of the league and submitted a host of bids for players that had already been won in previous bidding rounds. None of their bids were valid, so they ended this bidding round exactly where they started – but with even fewer quality players available for the next round of bids.

Almost Redemption?

Nearly half of Copilot’s Round 2 bids had to be discarded as invalid because it forgot rivals had already signed those players. It did a little better this time, though it made the same mistake again – three of its eight bids were for players already signed.

Moreover, it continued to struggle with arithmetic, submitting bids that totaled more than its budget. In an interesting twist, it admitted that it had made a mistake… and noted that the league manager would re-baseline its bids to be within budget. Instead of humans using AI to solve complicated optimization problems, in this case the AI was expecting the human to do the computational work for it. And it was right! I did.
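For the curious, the two clean-up steps described above (discarding bids on already-signed players, then re-baselining whatever is left to fit the budget) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the exact re-baselining rule isn't spelled out in this post, so the proportional scaling below is an assumption, and the function name `clean_bids`, the player names, and the amounts are all made up.

```python
def clean_bids(bids, signed_players, budget):
    """Drop bids on already-signed players, then fit the rest to the budget.

    bids: dict of player name -> bid amount in whole dollars
    signed_players: set of names won in earlier rounds
    budget: the manager's remaining budget in whole dollars
    """
    # Step 1: a bid on a player someone already owns is simply invalid.
    valid = {p: amt for p, amt in bids.items() if p not in signed_players}

    total = sum(valid.values())
    if total <= budget:
        return valid

    # Step 2 (assumed rule): shrink every bid by the same ratio.
    # Integer maths rounds each bid down, so the total never exceeds the budget.
    return {p: amt * budget // total for p, amt in valid.items()}


# Hypothetical example: three bids, one on a player who is already taken.
bids = {"Player A": 600, "Player B": 500, "Player C": 400}
signed = {"Player C"}
result = clean_bids(bids, signed, budget=880)
print(result)
```

Scaling every bid by the same ratio keeps the manager's relative priorities intact, which is why it seemed a reasonable stand-in; other rules (for example, trimming only the lowest-priority bids) would work too.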

Divergent Tactics (and Convergent Targets)

Without the two big spenders in the mix, this round saw less direct competition on each player and more divergence in who each AI went after. In the first round, all eight AIs tended to converge on the same famous names, chasing the obvious superstars like Ashleigh Gardner, Beth Mooney, and Hayley Matthews. By Round 3, however, most of the household names were already taken. Each AI manager’s shopping list now reflected its unique roster needs and risk appetite.

Gemini entered Round 3 with only one spot left to fill – and a sizable budget still in the bank. But instead of splurging all that on a single big-name coup (as a human drafter might), Gemini stuck to the prudent playbook: it targeted a competent but unsung bowler (England’s Linsey Smith). The $927 bid made her the fourth most expensive buy of the league so far – and it was likely an overbid, given that no one else bid for her.

Meanwhile, Grok, a somewhat unpredictable AI, had both a big war chest and numerous gaps (needing five players). True to its personality, Grok took an aggressive stance. In its pre-bid analysis, it listed an eclectic current roster heavy on emerging talents (like young players from smaller cricket nations) and pledged to “bid aggressively on high-value unsigned talent” in Round 3. Grok indeed fired off a volley of bids – from proven all-rounders to backup wicketkeepers – overshooting its budget in the process. This repeated a pattern from earlier rounds: Grok had similarly over-committed funds in Round 2, forcing an ex-post correction by the organizer. The silver lining: Grok did land enough pieces (like India’s Bharti Fulmali and Scotland’s Darcey Carter) to finish with a full squad of 15.

Humans Still Have An Edge

The third round showed that the AIs are not all-knowing or infallible. Some AIs struggled to update their knowledge of the game state, or to perform simple budget math without error. Others showed glimpses of high-level planning – performing competitor analysis and adjusting their bidding volumes – yet still misjudged risk or traded one mistake for another.

The similarities among these systems are as revealing as their differences. All of them remained disciplined in spending; none exhibited the kind of wild emotional bidding spurts that human participants did in a parallel league, where managers would drop over $2,000 on a single favorite player. Even in a last-ditch scramble, the AIs largely stuck to rational budgets – highlighting an almost inhuman consistency in their financial restraint.

What’s Next

There will have to be a fourth round of bids so that ChatGPT Deep Research and Copilot Researcher can each reach a squad of 15, and ChatGPT can find a batter.

Team Name	Remaining Budget	Squad	Batters in Squad	Bowlers in Squad	WK in Squad
Grok	$3,283	15	6	2	2
Copilot	$4,186	15	4	7	3
ChatGPT Deep Research	$5,081	10	1	2	2
ChatGPT	$1,622	15	0	5	2
Copilot Researcher	$3,790	11	3	2	2
Gemini	$2,456	15	2	6	3
Gemini Extended Thinking	$3,195	15	3	8	1
Claude	$3,157	15	4	1	1
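The table can be cross-checked against the two rules this post mentions: a full squad has 15 players, and it needs at least one specialist batter. The short Python sketch below applies just those two checks to the squad and batter columns above. Any other legality rules the league may have (minimum bowlers or wicketkeepers, for instance) are not modelled, and the names `teams` and `needs_round_4` are only illustrative.

```python
SQUAD_SIZE = 15   # full squad size used in this league
MIN_BATTERS = 1   # every XI needs at least one specialist batter

# (squad size, batters in squad), copied from the table above
teams = {
    "Grok": (15, 6),
    "Copilot": (15, 4),
    "ChatGPT Deep Research": (10, 1),
    "ChatGPT": (15, 0),
    "Copilot Researcher": (11, 3),
    "Gemini": (15, 2),
    "Gemini Extended Thinking": (15, 3),
    "Claude": (15, 4),
}


def needs_round_4(squad, batters):
    """True if the squad is short of players or has no specialist batter."""
    return squad < SQUAD_SIZE or batters < MIN_BATTERS


pending = [name for name, (squad, batters) in teams.items()
           if needs_round_4(squad, batters)]
print(pending)
# ['ChatGPT Deep Research', 'ChatGPT', 'Copilot Researcher']
```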

AI Fantasy League Women's T20 World Cup