Why Advanced AI Isn’t Always Better: Evidence from a Fantasy League
We’re trying out a Women’s Cricket World Cup fantasy league for the world’s top AIs. We’re into the fourth round of bidding.
Two advanced models – Microsoft’s Copilot Researcher and OpenAI’s ChatGPT Deep Research – submitted nothing but invalid bids in Round 3, repeatedly targeting players who’d already been drafted. Meanwhile, ChatGPT (base model), though finishing Round 3 with a full 15-player roster, hadn’t managed to win any batters, making it unable to field a legal XI. These three AIs competed in this final round of bidding before the tournament kicked off.
With fewer rivals in play and much of the talent pool already claimed, Round 4 presented a unique mirror for the AIs’ cognitive quirks. How would they handle a simpler but high-stakes scenario?
Copilot Researcher: When Thinking Harder Isn’t Smarter
Shockingly, the advanced planner fumbled again, despite being the only AI in the league for which I have a paid subscription (through my company).
Nine of its ten bids were once again invalid – aimed at players already owned by other teams. This was all the more disappointing because in the initial prompt I had specifically asked it not to make the same mistake again.
I decided not to run a fifth round of bidding. The league rules give me the right, after a certain number of bids, to simply assign players to teams that aren’t able to build a squad of 15, and I was forced to exercise it.
The experience has revealed something interesting about AI cognition: more elaborate reasoning doesn’t guarantee better results. Copilot Researcher’s performance suggests that “thinking hard” can turn into overthinking or misdirected thinking. Perhaps it got tangled in its own complex approach or “stale” reference data. In humans too, overly complex reasoning can lead to maladaptive outcomes.
ChatGPT Deep Research: A Methodical Mind Redeems Itself
ChatGPT Deep Research is designed for slower, more detailed reasoning, and like Copilot Researcher, it botched Round 3 by fixating on players already off the board.
Its cerebral Round 4 response was insightful. It cited generic fantasy auction best practices (don’t spend more than ~22% of budget on one player, focus on mid-tier “value” picks) and recapped its own roster’s strengths and weaknesses. Acknowledging it already had plenty of all-rounders and wicketkeepers, it vowed to focus new bids on true batters and bowlers to achieve a balanced team.
The AI bid on a cluster of lesser-known but capable players from around the cricket world, successfully winning enough to form a full squad of 15.
ChatGPT (Base Model): Simple, Fast – and Finally in the Clear
For Round 4, I informed ChatGPT that I would only accept bids for unsigned batters, and that once it had won a batter it would have to decide which current player to drop to make room for the incoming batter.
With just two other AIs as competition and only one new player needed, ChatGPT reasoned it should go big on one or two promising batters. ChatGPT’s top bid was an assertive $401 for Kavisha Dilhari.
Now ChatGPT had to decide who it would drop from its squad to make room for Dilhari. It opted to cut Ailsa Lister, a backup wicketkeeper, despite her being its fourth most expensive buy, reasoning that carrying two specialized keepers was a luxury it could dispense with, especially since its primary keeper, Sarah Bryce, is considered stronger.
ChatGPT may have overlooked a nuance: there’s no formal limit on keepers in a lineup, but there is a cap on how many bowlers can play, and post-trade it still had a bowler-heavy squad.
Compared to the drama around its peers, ChatGPT’s Round 4 journey was refreshingly uneventful. It’s telling that the fast, base model got it done with minimal fuss.
What’s Next
The first match is less than a day away.
| Team Name | Remaining Budget | Batters in Squad | Bowlers in Squad | WK in Squad |
| --- | --- | --- | --- | --- |
| Copilot | $4,186 | 4 | 7 | 3 |
| Copilot Researcher | $3,517 | 4 | 5 | 2 |
| Grok | $3,283 | 6 | 2 | 2 |
| Gemini Extended Thinking | $3,195 | 3 | 8 | 1 |
| Claude | $3,157 | 4 | 1 | 1 |
| ChatGPT Deep Research | $3,011 | 3 | 5 | 2 |
| Gemini | $2,456 | 2 | 6 | 3 |
| ChatGPT | $1,221 | 1 | 5 | 1 |
