AI vs AI: Kicking Off My 2026 Women’s T20 Cricket World Cup Fantasy League Experiment
For the 2026 Women’s T20 Cricket World Cup, I’m running a fantasy cricket league for my friends. While setting it up, I decided it would be fun to see how today’s leading AI systems would perform if they played too, so I set up a parallel league where the AIs compete against one another.
Each AI team manager received the exact same instructions, the same player list, the same $10,000 budget, and the same constraints. Their job: submit 20 bids in Round 1.
How the League Works (The Short Version)
The full rules document is… substantial. Seven pages, to be precise. But the core mechanic is straightforward:
- Every team manager (the individual AI service) starts with a fixed budget.
- The league uses a multi‑round auction.
- In each round, managers submit bids for players.
- Only winning bids cost money; unsuccessful bids are returned to the budget.
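To make the core mechanic concrete, here’s a minimal Python sketch of how a single round could be resolved. It’s a simplification under stated assumptions: each player goes to the single highest bidder, tied top bids leave the player unsold, and the squad‑balance, tie‑break, and normalization rules from the full rules document aren’t modeled. The `resolve_round` function and the player IDs and amounts in the example are made up for illustration; they aren’t any AI’s actual Round 1 bids.

```python
from collections import defaultdict


def resolve_round(bids, budgets):
    """Resolve one hypothetical auction round.

    bids: manager name -> {player_id: bid amount}
    budgets: manager name -> remaining budget
    Returns (winners, new_budgets), where winners maps player_id -> manager.
    Assumption: the highest unique bid wins; tied top bids leave the player unsold.
    Budget limits and normalization are not enforced here.
    """
    # Gather every (amount, manager) offer for each player.
    offers = defaultdict(list)
    for manager, manager_bids in bids.items():
        for player_id, amount in manager_bids.items():
            offers[player_id].append((amount, manager))

    winners = {}
    new_budgets = dict(budgets)
    for player_id, player_offers in offers.items():
        top = max(amount for amount, _ in player_offers)
        top_bidders = [m for amount, m in player_offers if amount == top]
        if len(top_bidders) == 1:
            # Only the winning bid is deducted; losing bids cost nothing.
            winner = top_bidders[0]
            winners[player_id] = winner
            new_budgets[winner] -= top
    return winners, new_budgets


# Illustrative, made-up bids.
bids = {
    "ChatGPT": {"P1": 900, "P2": 400},
    "Claude": {"P1": 850, "P2": 450},
    "Gemini": {"P2": 450},
}
budgets = {"ChatGPT": 10_000, "Claude": 10_000, "Gemini": 10_000}
winners, remaining = resolve_round(bids, budgets)
# winners   -> {"P1": "ChatGPT"}  (P2 is a 450/450 tie, so it goes unsold)
# remaining -> {"ChatGPT": 9100, "Claude": 10000, "Gemini": 10000}
```

The tie case is exactly why a strategy like Gemini’s “unique bid values” can matter: in any rule set where ties are costly, a slightly odd number is cheap insurance.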
Strategy becomes a game of prediction:
- Who will others chase?
- How aggressively should you bid early?
- Do you go for marquee players or hunt for undervalued gems?
The AI Competitors
For Round 1, I invited a broad mix of AI systems.
ChatGPT (Base Model)
ChatGPT produced a reasonable list of 20 bids but immediately stumbled on the math. It claimed its bids totaled $9,480, but the real sum was $10,480, putting it $480 over budget. Fortunately, my rules include a normalization step for exactly this scenario, because humans make these mistakes too.
Its stated strategy: focus on elite all‑rounders and high‑impact players.
ChatGPT Deep Research
This version took its time and produced a more detailed explanation of its reasoning. It also included player names alongside the IDs. That made it easy to confirm the AI was mapping player IDs to the right names, but its output didn’t paste cleanly into Excel, so I had to use Excel’s built-in Copilot to make the data Excel-ready.
It said its strategy leaned heavily on historical WT20I stats and marquee performers.
Claude 4.6 (Sonnet)
Claude was the most disciplined of the bunch. It got the math right, stayed under budget, and provided a thoughtful breakdown of why it chose each player. It decided to keep a healthy reserve for later rounds.
Gemini 3.5 Flash
Gemini also made a math error, though in the opposite direction: it overestimated its own total. It claimed $9,985; the real number was $9,633. That’s still under budget, so I accepted it.
Its stated strategy emphasized marquee all‑rounders and unique bid values to avoid ties.
Gemini Extended Thinking
This version produced a more structured explanation and kept its bids safely under budget at $9,821. It also highlighted its compliance with squad‑balance constraints.
Grok (Free Tier)
Grok’s bids were the most chaotic. It claimed a total of $7,950, but the actual sum exceeded $10,000, so, like ChatGPT, it required normalization. Conspiracy theorists might wonder whether Grok was trying to cheat the system, given its reputation, but I suspect this is just another example of LLMs struggling with arithmetic.
Copilot (Consumer Version)
Copilot is an odd one because it draws on ChatGPT and Claude. Still, my understanding is that Microsoft’s own algorithms also play a role, so I decided to include it for evaluation purposes.
Copilot also miscalculated its total, claiming $9,970 when the real sum was $9,520, but at least it stayed under budget. Its explanation emphasized a balanced squad with strong wicket‑keepers and all‑rounders.
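The claimed-versus-actual gaps above are easy to check mechanically. Here’s a small Python sketch that uses only the totals reported in this post; Grok is left out because I only know its real total exceeded $10,000, not the exact figure. The `audit` helper is an illustrative name of my own, not part of the league rules, and it only flags problems; it doesn’t reproduce the normalization step.

```python
BUDGET = 10_000

# (claimed total, actual total) for each manager, as reported in this post.
reported = {
    "ChatGPT": (9_480, 10_480),
    "Gemini 3.5 Flash": (9_985, 9_633),
    "Copilot": (9_970, 9_520),
}


def audit(claimed, actual, budget=BUDGET):
    """Return (claim error, whether the real total is within budget)."""
    error = claimed - actual  # > 0: claim overstated the spend; < 0: understated it
    return error, actual <= budget


results = {name: audit(c, a) for name, (c, a) in reported.items()}
for name, (error, legal) in results.items():
    if error > 0:
        direction = "overstated"
    elif error < 0:
        direction = "understated"
    else:
        direction = "matched"
    status = "within budget" if legal else "over budget"
    print(f"{name}: {direction} by ${abs(error):,} ({status})")
```

Run as-is, it prints that ChatGPT understated its spend by $1,000 (over budget), while Gemini 3.5 Flash and Copilot overstated theirs by $352 and $450 respectively (both within budget).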
Copilot Researcher (Corporate Subscription)
My company gives me a Copilot account, so I thought I’d try the “Researcher” option, which I believe behaves a little like ChatGPT’s “Deep Research” mode.
This one behaved differently from all the others. Before bidding, it asked clarifying questions about my strategy preferences: risk tolerance, marquee prioritization, and squad balance. After I told it to “go ahead” without giving any preferences, it produced a clean, mathematically correct set of bids and a detailed rationale grounded in scoring mechanics and tournament progression.
It was the only model that said it proactively tried to optimize for the scoring system rather than relying on player reputation alone.
MetaAI
I couldn’t get past the email verification step because no verification code ever arrived. I suspect Meta would rather people sign in through Facebook or Instagram. I’ll revisit it for future leagues, but for now, MetaAI sits out the 2026 World Cup.
In the next post, I’ll break down the actual Round 1 bids from each AI system.
