12Jun

Playing XI Selection: How AIs Picked Their Starting Lineups

What if AI played in a fantasy league for the Women’s Cricket World Cup?

With their fantasy squads of 15 players complete, our eight AI team managers faced one last pre-tournament challenge: picking a starting lineup of 11 players (a “Playing XI”) that must adhere to a few basic constraints. Each XI needed to include at least one specialist wicketkeeper, bowler, and batter, but could not have more than five bowlers. These selections would be locked in for the first six matches of the World Cup.
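For readers who like to see rules spelled out, the lineup constraints above can be sketched as a small Python check. This is only an illustration: the role labels, the `validate_xi` name, and the data layout are my own assumptions, not the league's actual tooling.

```python
from collections import Counter

# Illustrative constants; the role labels below are assumed, not the league's own.
XI_SIZE = 11
MAX_BOWLERS = 5
REQUIRED_ROLES = ("wicketkeeper", "bowler", "batter")


def validate_xi(xi, squad):
    """Return a list of rule violations for a proposed Playing XI (empty if legal).

    xi    -- list of (player_name, role) tuples chosen to start
    squad -- set of player names on the manager's 15-player roster
    """
    problems = []
    names = [name for name, _ in xi]
    if len(xi) != XI_SIZE:
        problems.append(f"XI has {len(xi)} players, expected {XI_SIZE}")
    if len(set(names)) != len(names):
        problems.append("XI lists the same player more than once")
    for name in names:
        if name not in squad:
            problems.append(f"{name} is not in the squad")
    counts = Counter(role for _, role in xi)
    for role in REQUIRED_ROLES:
        if counts[role] < 1:
            problems.append(f"XI needs at least one specialist {role}")
    if counts["bowler"] > MAX_BOWLERS:
        problems.append(f"XI has {counts['bowler']} bowlers, maximum is {MAX_BOWLERS}")
    return problems
```

An empty list means the XI is legal. A check like this would catch picks from outside the squad as well as a lineup with six bowlers.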

The heavy lifting of the auction was over; now success hinged on reasoning through constraints and opportunities.

Microsoft’s Copilot vs. Copilot Researcher:

The base Copilot once again revealed a brittle memory for details. Just as in prior bid rounds, where it forgot which players were already taken, Copilot initially tried to select three players who weren’t even in its final roster of 15. I had to intervene repeatedly to steer it back to eligible choices. Once the errors were fixed, Copilot’s lineup was reasonable, though it notably dropped one of its top English players, Tilly Corteen-Coleman, from the XI. Benching her suggests Copilot’s internal logic either undervalued her immediate utility or simply fumbled a bit when juggling the roster.

Copilot Researcher (the more advanced subscription-based mode) handled the assignment with meticulous care and no slip-ups. Recall that this is the “overthinker” that had imploded spectacularly in an earlier bidding round by repeatedly targeting already-taken players – but here it redeemed itself. Copilot Researcher double- and triple-checked everything: it explicitly verified each player against its own roster and role requirements, cross-referenced player IDs to avoid any mix-ups, and even researched the match schedule to prioritize players whose national teams play multiple times in Matches 1–7. Its XI would likely be regarded by most fans of the game as the strongest possible lineup from its squad.

OpenAI’s ChatGPT vs. ChatGPT Deep Research

The base ChatGPT announced it would “field the maximum 5 specialist bowlers” – believing that the Fantasy League scoring system advantaged bowlers. It did stumble once: one of the player IDs it submitted didn’t belong to its team – a small context mix-up – but since it also listed the player’s name, I realized the intended pick and corrected it. Interestingly, ChatGPT ended up benching its two most expensive draft signings (Ireland’s Laura Delany and Leah Paul) in favor of others. This shows a Spock-like logic unclouded by sunk costs or sentiment: it paid a hefty sum for those Irish players in the auction, but if they weren’t the best picks for the initial matches, onto the bench they went. A human manager might agonize over benching pricey stars.

ChatGPT Deep Research incorrectly believed that England and New Zealand play two matches each in this selection window. The mistake probably won’t hurt it much since England and New Zealand are believed to have amongst the strongest teams. Each selection came with a few lines of justification, citing the player’s experience or a key statistic.

It left out one of its two wicketkeepers (West Indies’ Shemaine Campbelle), which many would consider a surprising exclusion given some of the players it did pick. Perhaps the model assumed incorrectly that only one keeper can play at a time.

Google’s Gemini vs. Gemini Extended Thinking

The two Gemini variants delivered lineups that few would quibble with. Gemini (Base)’s roster from the auction was so stacked with talent that the main challenge was deciding which 4 players to bench.

Gemini Extended Thinking – Google’s more deliberative mode – made its lineup decisions with a clearly articulated overarching principle: maximize the number of chances to score points in the opening phase. It explicitly prioritized players with a high volume of early opportunities, meaning it looked for marquee batters likely to face a lot of deliveries, bowlers who would reliably bowl their full quota of overs (especially “death” bowlers who bowl at the end of innings when wickets often fall), and any players whose teams had multiple games in Matches 1–7.

It did bench a couple of its own stars – Australia’s Alana King and South Africa’s Ayabonga Khaka, both of whom did well in pre-tournament warm-ups. Its reasoning indicated that it incorrectly believed that those players had fewer matches during the selection window than players from other teams.

Claude and Grok

Claude responded to the lineup prompt by methodically researching the tournament schedule and cross-matching it with every player on its roster. It then framed its lineup selection as a rational optimization problem: which players are lower-value or less likely to score big in this window? The final XI skewed heavily to top batters and all-rounders – for example, it included four specialist batters such as Jemimah Rodrigues and Heather Knight, but only one specialist bowler (India’s Radha Yadav).

For the playing XI, Grok spat out a list of player IDs, most of which didn’t match its own team’s players. It also listed player names, however, and those names did belong to its squad – clearly, some internal ID mapping went awry in its reasoning chain. Fortunately, I could use the names to decipher Grok’s intended XI, and it turned out to be a pretty sensible lineup once corrected. The one eyebrow raiser was the omission of Pakistan’s Ayesha Zafar, in favor of players from nations with a less storied cricketing history.

What’s Next

The toss for the first match, between England and Sri Lanka, is in a few hours.

Team Name	Remaining Budget	Squad	XI	Batters in XI	Bowlers in XI	WK in XI
Copilot	$4,186	15	11	4	4	2
Copilot Researcher	$3,517	15	11	3	3	1
Grok	$3,283	15	11	3	2	1
Gemini Extended Thinking	$3,195	15	11	3	5	1
Claude	$3,157	15	11	4	1	1
ChatGPT Deep Research	$3,011	15	11	2	4	1
Gemini	$2,456	15	11	2	4	2
ChatGPT	$1,221	15	11	1	5	1

12Jun

How Each AI’s Squad Stacks Up Going Into The 2026 Women’s World Cup

The Women’s Cricket World Cup is upon us and I thought it’d be fun to have major AI systems participate in a fantasy league. The squads are set and the opening match is in just a few hours. Let’s see how the different AIs stack up.

Some teams leaned on proven superstars, trusting known stats and reputations. Others went hunting for hidden gems, drafting relatively obscure players no one else tried for. For example, Google’s base Gemini model was one of the boldest explorers: roughly 40% of its picks were unique to its squad (like little-known prospects from Bangladesh and Ireland), a signal that Gemini was confident in unearthing undervalued talent. Meanwhile, Anthropic’s Claude stuck almost entirely to big-name players from powerhouse nations.

ChatGPT (OpenAI Base)

Almost half the squad hails from underdog cricket nations like Ireland and Scotland. If its lesser-known picks sparkle on the pitch, ChatGPT could surprise everyone. But there’s not much star power to fall back on.

ChatGPT Deep Research

It has one of the more balanced teams in the league, with a mix of specialist batters and bowlers where other AIs typically favored one or the other. Its team features a blend of proven veterans (e.g. New Zealand’s Suzie Bates) and a few prudent value picks (like Netherlands duo Sterre Kalis and Phebe Molkenboer, picked cheaply to round out the batting lineup).

Claude (Anthropic)

Claude’s final squad is brimming with iconic names. It won England’s Heather Knight and Alice Capsey, and India’s star Jemimah Rodrigues, giving it one of the strongest batting cores on paper. It went the opposite direction for bowling: only one of Claude’s picks is a specialist bowler (India’s Radha Yadav). Instead, Claude stuffed its roster with all-rounders.

Microsoft Copilot (Base)

Copilot won a trio of marquee names – India’s Smriti Mandhana, fellow Indian Harmanpreet Kaur, and South Africa’s Laura Wolvaardt. It also boasts arguably the world’s top bowler in England’s Sophie Ecclestone. The large cluster of specialist bowlers (seven of them) suggests Copilot was fixated on shoring up its bowling.

Microsoft Copilot Researcher

The roster reads like an All-Star team: it outbid everyone to secure Ashleigh Gardner, amongst the world’s top all-rounders, with a $970 splurge (the highest bid by any AI). It also snagged England’s superstar Nat Sciver-Brunt for $920, and Australia’s elite Tahlia McGrath for $870. If those all-star players perform up to their billing, Copilot Researcher might steamroll opponents on sheer quality.

Google Gemini

Gemini’s roster is a quirky mix of star power and surprise names. On the star side, it nabbed Australian legend Ellyse Perry and wicketkeeper-batter Beth Mooney, anchoring the team with championship pedigree. But around those, Gemini has players like Georgia Voll (a young Australian prospect), Rebecca Stokell (Ireland batter), and Kaushini Nuthyangana (Sri Lanka wicketkeeper) who were off most rivals’ radars.

Google Gemini Extended Thinking

The team composition is striking: eight bowlers – more than any other squad. That’s not to say the batting is neglected – it drafted a couple of high-impact batters in Hayley Matthews (West Indies captain) and Chamari Athapaththu (Sri Lanka’s star).

Grok

Six of Grok’s players are specialist batters – the most of any team – indicating a gusto for runs. Conversely, Grok picked only two specialist bowlers, the leanest bowling lineup of the lot. So in terms of build, it’s the polar opposite of Gemini Extended Thinking. Grok also cast a wide geographical net. It drafted the likes of Phoebe Litchfield (an up-and-coming Australian batter) for a hefty ~$750, alongside picks like Indian wicketkeeper Yastika Bhatia and Scotland’s all-rounder Darcey Carter.

What’s Next

The AIs are going to have to select their XIs in advance of the first phase of the tournament.

12Jun

Why Advanced AI Isn’t Always Better: Evidence from a Fantasy League

We’re trying out a Women’s Cricket World Cup fantasy league for the world’s top AIs. We’re into the fourth round of bidding.

Two advanced models – Microsoft’s Copilot Researcher and OpenAI’s ChatGPT Deep Research – submitted nothing but invalid bids in Round 3, repeatedly targeting players who’d already been drafted. Meanwhile, ChatGPT (base model), though finishing Round 3 with a full 15-player roster, hadn’t managed to win any batters, making it unable to field a legal XI. These three AIs competed in this final round of bidding before the tournament kicked off.

With fewer rivals in play and much of the talent pool already claimed, Round 4 presented a unique mirror for the AIs’ cognitive quirks. How would they handle a simpler but high-stakes scenario?

Copilot Researcher: When Thinking Harder Isn’t Smarter

Shockingly, the advanced planner fumbled again, despite being the only AI in the league for which I have a paid subscription (through my company).

Nine of its ten bids were once again invalid – aimed at players already owned by other teams. This was all the more disappointing because in the initial prompt I had specifically asked it to make sure it did not make the same mistake again.

I would not run a fifth round of bidding. The league rules give me the right, after a certain number of bids, to just assign players to teams that aren’t able to build a squad of 15. I was forced to do so.

The experience has revealed something interesting about AI cognition: more elaborate reasoning doesn’t guarantee better results. Copilot Researcher’s performance suggests that “thinking hard” can turn into overthinking or misdirected thinking. Perhaps it got tangled in its own complex approach or “stale” reference data. In humans, too, overly complex reasoning can lead to maladaptive outcomes.

ChatGPT Deep Research: A Methodical Mind Redeems Itself

ChatGPT Deep Research is designed for slower, more detailed reasoning, and like Copilot Researcher, it botched Round 3 by fixating on players already off the board.

Its cerebral response was insightful. It cited generic fantasy auction best practices (don’t spend more than ~22% of budget on one player, focus on mid-tier “value” picks) and recapped its own roster’s strengths and weaknesses. Acknowledging it already had plenty of all-rounders and wicketkeepers, it vowed to focus new bids on true batters and bowlers to achieve a balanced team.

The AI bid on a cluster of lesser-known but capable players from around the cricket world, successfully winning enough to form a full squad of 15.

ChatGPT (Base Model): Simple, Fast – and Finally in the Clear

For Round 4, I informed ChatGPT that I would only accept bids for unsigned batters, and that once it had won a batter it would have to decide which current player to drop to make room for the incoming batter.

With just two other AIs as competition and only one new player needed, ChatGPT reasoned it should go big on one or two promising batters. ChatGPT’s top bid was an assertive $401 for Kavisha Dilhari.

Now ChatGPT had to decide who it would drop from its squad to make room for Dilhari. It opted to cut Ailsa Lister, a backup wicketkeeper, despite her being its fourth most expensive buy, reasoning that carrying two specialist keepers was a luxury it could dispense with, especially since its primary keeper, Sarah Bryce, is considered stronger.

ChatGPT may have overlooked a nuance: there’s no formal limit on keepers in a lineup, but there is a cap on how many bowlers can play, and post-trade it still had a bowler-heavy squad.

Compared to the drama around its peers, ChatGPT’s Round 4 journey was refreshingly uneventful. It’s telling that the fast, base model got it done with minimal fuss.

What’s Next

The first match is less than a day away.

Team Name	Remaining Budget	Batters in Squad	Bowlers in Squad	WK in Squad
Copilot	$4,186	4	7	3
Copilot Researcher	$3,517	4	5	2
Grok	$3,283	6	2	2
Gemini Extended Thinking	$3,195	3	8	1
Claude	$3,157	4	1	1
ChatGPT Deep Research	$3,011	3	5	2
Gemini	$2,456	2	6	3
ChatGPT	$1,221	1	5	1

11Jun

Cognitive Collapse: How Advanced AIs Bungled Round 3 of the Fantasy Cricket League

The 2026 Women’s Cricket World Cup is upon us and I’m having the AIs play a fantasy league.

By the third bidding round of our AI-driven fantasy league, two of the eight AI managers – Anthropic’s Claude and Google’s Gemini Extended Thinking – had already finished assembling their 15-player squads. They sat out this round of bids.

That left six other AIs scrambling to fill the last slots on their rosters. Each had to reckon with leftover budget constraints and specific roster holes they’d failed to plug in earlier rounds.

For the six AIs still in the fray (OpenAI’s ChatGPT and ChatGPT Deep Research, Microsoft’s Copilot and Copilot Researcher, Google’s base Gemini, and Grok), Round 3 was a test of adaptability. Would they learn from past mistakes and fill their remaining needs?

A Failure

ChatGPT (base), for one, entered Round 3 with a glaring deficiency: it still hadn’t acquired a single specialist batter in the prior rounds, even though every fantasy XI needs at least one. While it managed to get a full squad of 15, none of those 15 was a batter. That meant that it was unable to field a legal playing XI and would have to enter a fourth round of bidding.

A Disgrace

What was worse, however, were the bids submitted by ChatGPT Deep Research and by Copilot Researcher. Both of these spend more time thinking about the problem and so would be expected to outperform in the league.

At least in this round of bidding the opposite happened: they seemed to get confused by what was going on and submitted a host of bids for players that had already been won in previous bidding rounds. None of their bids were valid and so they ended this bidding round exactly where they started – but with even fewer quality players available for the next round of bids.
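The check these two models skipped is mechanically simple. Here is a minimal Python sketch of it, assuming bids and rosters are held as plain dictionaries; the `split_bids` name, the data layout, and the bid amounts are hypothetical, chosen only for illustration.

```python
def split_bids(bids, rosters):
    """Separate a manager's bids into valid ones and ones for already-signed players.

    bids    -- dict mapping player name -> bid amount
    rosters -- dict mapping team name -> set of player names already signed
    """
    owner_of = {player: team for team, players in rosters.items() for player in players}
    valid, invalid = {}, {}
    for player, amount in bids.items():
        if player in owner_of:
            invalid[player] = owner_of[player]  # record who already holds the player
        else:
            valid[player] = amount
    return valid, invalid


# Ownership below follows the Round 1 results; the bid amounts are made up.
rosters = {"Copilot Researcher": {"Ashleigh Gardner"}, "Gemini": {"Beth Mooney"}}
bids = {"Ashleigh Gardner": 500, "Linsey Smith": 300}
valid, invalid = split_bids(bids, rosters)
print("valid:", valid)
print("invalid:", invalid)
```

Running the bids through a filter like this before submission would have flagged every wasted bid along with the team that already held the player.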

Almost Redemption?

Nearly half of Copilot’s Round 2 bids had to be discarded as invalid because it forgot rivals had already signed those players. It did a little better this time, though it made the same mistake again – three of its eight bids were for players already taken.

Moreover, it continued to struggle with arithmetic, submitting bids that totaled more than its budget. But in an interesting twist, it admitted that it had made a mistake…and noted that the league manager would re-baseline its bids to be within budget. Instead of humans using the efforts of AI to solve complicated optimization problems, in this case the AI was expecting the human to do the computational work for it. And it was right! I did.

Divergent Tactics (and Convergent Targets)

Without the two big spenders in the mix, this round saw less direct competition on each player and more divergence in who each AI went after. In the first round, all eight AIs tended to converge on the same famous names, chasing the obvious superstars like Ashleigh Gardner, Beth Mooney, and Hayley Matthews. By Round 3, however, most of the household names were already taken. Each AI manager’s shopping list now reflected its unique roster needs and risk appetite.

Gemini entered Round 3 with only one spot left to fill – and a sizable budget still in the bank. But instead of splurging all that on a single big-name coup (like a human drafter might), Gemini stuck to the prudent playbook: it targeted a competent unsung bowler (England’s Linsey Smith). The $927 bid made her the fourth most expensive buy of the league so far – and it was likely an overbid given no one else submitted one for her.

Meanwhile, Grok, a somewhat unpredictable AI, had both a big war chest and numerous gaps (needing five players). True to its personality, Grok took an aggressive stance. In its pre-bid analysis, it listed an eclectic current roster heavy on emerging talents (like young players from smaller cricket nations) and pledged to “bid aggressively on high-value unsigned talent” in Round 3. Grok indeed fired off a volley of bids – from proven all-rounders to backup wicketkeepers – overshooting its budget in the process. This repeated a pattern from earlier rounds: Grok had similarly over-committed funds in Round 2, forcing an ex-post correction by the organizer. The silver lining: Grok did land enough pieces (like India’s Bharti Fulmali and Scotland’s Darcey Carter) to finish with a full squad of 15.

Humans Still Have An Edge

The third round showed that the AIs are not all-knowing or infallible. Some AIs struggled to update their knowledge of the game state, or to perform simple budget math without error. Others showed glimpses of high-level planning – performing competitor analysis and adjusting their bidding volumes – yet still misjudged risk or traded one mistake for another.

The similarities among these systems are as revealing as their differences. All of them remained disciplined in spending; none exhibited the kind of wild emotional bidding spurts that human participants did in a parallel league, where managers would drop over $2,000 on a single favorite player. Even in a last-ditch scramble, the AIs largely stuck to rational budgets – highlighting an almost inhuman consistency in their financial restraint.

What’s Next

There’s going to have to be a fourth round of bids so that both ChatGPT Deep Research and Copilot Researcher can end with a squad of 15, and ChatGPT can find a batter.

Team Name	Remaining Budget	Squad	Batters in Squad	Bowlers in Squad	WK in Squad
Grok	$3,283	15	6	2	2
Copilot	$4,186	15	4	7	3
ChatGPT Deep Research	$5,081	10	1	2	2
ChatGPT	$1,622	15	0	5	2
Copilot Researcher	$3,790	11	3	2	2
Gemini	$2,456	15	2	6	3
Gemini Extended Thinking	$3,195	15	3	8	1
Claude	$3,157	15	4	1	1

7Jun

How AIs Handle Pressure: Round 2 of the Women’s World Cup Fantasy Auction

For the 2026 Women’s Cricket World Cup I’ve been pitting AIs against each other in my fantasy league.

This second round highlighted how each AI responded under pressure, with some adjusting their strategies and others doubling down on their initial styles.

Adaptation Under Pressure: Full Squads vs. Incomplete Plans

Two AI managers clearly emerged as the “closers” of Round 2: Claude and Gemini Extended Thinking. They were the only ones to assemble complete 15-player rosters by the end of the round.

Claude (from Anthropic) demonstrated dramatic adaptation. Notoriously cautious in Round 1 – where a conservative bidding strategy meant it finished with a paltry 2 players – Claude entered Round 2 as an underdog with by far the largest remaining budget ($9,230). Rather than panic, Claude capitalized on this war chest with an aggressive second-round strategy, submitting a staggering 25 bids – more than any other AI. This “spray-and-pray” approach paid off: Claude won 13 of those bids and filled every last roster slot (the round’s biggest gain in players).

Equally important, it balanced its team: Claude snagged the needed specialist batter, bowler, and a wicketkeeper (winning a bid for NZ’s Polly Inglis for $323 as its sole keeper) to meet all lineup requirements. Claude went from worst to fully prepared by learning from Round 1 – it recognized it had been outbid on top stars and needed a broader net. This pivot from ultra-conservative to decisively aggressive bidding showcased Claude’s adaptive capacity under pressure and a willingness to use up resources to recover ground.

In contrast, Gemini Extended Thinking (Google’s slower, more analytical mode) came into Round 2 in a relatively strong position, having already won 11 players in Round 1. With a still sizable budget of $5,337 left, it needed four more players. True to its methodical nature, Gemini Extended devised a safe, distributed plan: bid on 11 high-value targets – nearly triple the number of players it needed – to maximize the chance of filling all gaps. In contrast to Claude’s “comeback from behind” strategy, Gemini Extended opted for a “cautious consolidation from ahead” approach suitable for its position going into Round 2.

The six AIs that failed to fill their squads often exhibited over-optimism or inflexibility in Round 2. For example, OpenAI’s ChatGPT (base) and ChatGPT Deep Research both started Round 2 with only about a third of their squads filled after Round 1. Yet each bid on only 15 players in Round 2, giving themselves precariously slim margins for error. This turned out to be overly optimistic. Regular ChatGPT won only 5 of its 15 bids, and ChatGPT Deep Research won around the same – leaving both with just 10 players in total at the end of Round 2. In effect, both ChatGPT models misjudged the level of bidding competition: they corrected some Round 1 missteps (each targeted more unsung players and addressed roster holes), but then underestimated how many bids would fail.

ChatGPT realized it needed wicketkeepers (having had none after Round 1) and successfully added two keepers in Round 2. But it failed to draft a single specialist batter in either round, meaning its future playing XI would have a glaring gap. This underlines a certain brittleness in ChatGPT’s planning logic: even when alerted to roster requirements, it mis-prioritized needs and amongst its 15 bids only three were for specialist batters – and all were lost to Claude.

ChatGPT Deep Research spent significant time formulating a reasoned strategy and did adjust its focus after noticing it came out of Round 1 with only all-rounders and one keeper. In Round 2, it wisely shifted to bidding on lots of batters and bowlers to cover those gaps. However, in executing this plan, it submitted bids that totaled more than its entire original budget, let alone its remaining $7,600, requiring me to once again “normalize” its bids downward. Claude and Grok made the same mistake.
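One plausible way to normalize an over-budget bid list is to scale every bid down by the same proportion. The Python sketch below shows that approach; proportional scaling, rounding down to whole dollars, and the `normalize_bids` name are my assumptions for illustration, and the posts do not spell out the exact method used in the league.

```python
def normalize_bids(bids, remaining_budget):
    """Scale bids down proportionally so their total fits within remaining_budget.

    Assumption: proportional scaling with rounding down to whole dollars.
    The posts do not describe the exact method the league manager used.
    """
    total = sum(bids.values())
    if total <= remaining_budget:
        return dict(bids)  # already affordable, leave untouched
    # Integer arithmetic with floor division rounds each bid down,
    # so the scaled total can never exceed the budget.
    return {player: amount * remaining_budget // total for player, amount in bids.items()}


# Hypothetical bid list totalling $12,000 against a $7,600 remaining budget.
example = {"Player A": 6000, "Player B": 4000, "Player C": 2000}
scaled = normalize_bids(example, 7600)
print(scaled, sum(scaled.values()))
```

Proportional scaling preserves the manager's relative preferences between players, which is why it seems a natural choice here. The cost is that every bid becomes less competitive against rivals who budgeted correctly.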

Fast vs. Thorough: How Paired Systems Diverged

Round 2 confirmed that giving an AI more time or context for “deep thinking” doesn’t always guarantee finishing strong. In several pairs of AI systems from the same family, the “fast” vs “deep” approaches produced contrasting outcomes.

Google’s Gemini duo

The base Gemini model and Gemini Extended Thinking had very different experiences in Round 2. The standard Gemini soared in Round 1 with a spree of 14 players, but its Round 2 bids completely struck out – none of the six fallback bids it submitted succeeded. Even a sizable $812 bid for India’s Jemimah Rodrigues wasn’t enough as Claude outgunned it at $893.

In contrast, the Extended Thinking version started Round 2 with 11 players, and ended with the squad of 15 it needed.

OpenAI’s ChatGPT vs. ChatGPT Deep Research

Interestingly, both entered Round 2 with similar deficits (needing ~10 more players each) and both fell short.

Their strategies differed. The Deep Research model went big for players from top tier countries such as England, India, New Zealand, and South Africa, but Claude managed to outbid it five times.

Meanwhile, the base model branched out by opting for players from Scotland and Ireland, amongst others. It arguably overbid on Scotland’s Sarah Bryce, offering $701 when no other AI put in a bid. But this willingness to bet on players from underdog countries also meant that it beat out Claude, Grok, Copilot Base, and Gemini Extended Thinking to win Ireland’s Gaby Lewis and Scotland’s Ailsa Lister.

Microsoft’s Copilot vs. Copilot Researcher

Copilot (the standard version) had an average performance in Round 1 but still needed 7 more players. In Round 2 it peppered the field with bids (attempting 20 players), but nearly half of those were invalid because the AI mis-remembered the updated rosters and bid on players already taken by rivals. After those errors were eliminated, Copilot ended up pushing mostly for bowlers – rightly addressing a deficiency in its squad.

Copilot Researcher distinguished itself with a much more rigorous planning process: it was the only AI to pause and ask clarifying questions before bidding, verifying its Round 1 results and budget understanding.

Its second attempt carefully accounted for competitors’ budgets and tendencies, recognizing that some rivals (like Claude) had more cash and that some (like Gemini) favored certain players. It also explicitly factored in tournament mechanics, targeting players likely to reach later stages (where fantasy points multiply) – a sophisticated forward-looking tactic seemingly unique to Copilot Researcher.

Despite these advantages, the Researcher had only a middle-of-pack remaining budget ($5,790) and its moderate aggression (11 bids for 8 needed players) wasn’t enough to beat wealthier or more daring opponents on some key picks. It won 4 players (including a top-order batter in South Africa’s Tazmin Brits and two bowlers), boosting its total to 11 and still shy of a full squad.

Budget Utilization

With budgets dwindling after Round 1, some AIs spent nearly all they had in Round 2, while others remained frugal. Claude and Gemini Extended not only topped up their rosters but also deployed the vast majority of their remaining funds, finishing with just around $3.2K left each. On the flip side, incomplete teams like ChatGPT, ChatGPT Deep, Grok, and Copilot all ended Round 2 with a hefty $4K–$6K still unspent. These AIs had money left on the table that they failed to convert into players, either by aiming too low in certain bids or by not making enough bids to spend it. Lack of aggression can be as costly as overspending in an auction: unspent budget provides no advantage once the round is over.
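The money-left-on-the-table point can be made concrete with the figures from the table at the end of this post. The Python sketch below assumes a $10,000 starting budget, which is inferred from remarks elsewhere in this series (a $970 bid described as nearly 10% of a budget) rather than stated as a rule. The `utilization` helper is purely illustrative.

```python
STARTING_BUDGET = 10_000  # inferred from the posts, not stated outright
SQUAD_SIZE = 15

# (players in squad, remaining budget) after Round 2, from the table at the end of this post
after_round_2 = {
    "Grok": (10, 5863),
    "Copilot": (11, 5248),
    "ChatGPT Deep Research": (10, 5081),
    "ChatGPT": (10, 4873),
    "Copilot Researcher": (11, 3790),
    "Gemini": (14, 3383),
    "Gemini Extended Thinking": (15, 3195),
    "Claude": (15, 3157),
}


def utilization(players, remaining, budget=STARTING_BUDGET):
    """Return (amount spent, average spend per player, open roster slots)."""
    spent = budget - remaining
    return spent, spent / players, SQUAD_SIZE - players


for team, (players, remaining) in after_round_2.items():
    spent, per_player, open_slots = utilization(players, remaining)
    print(f"{team:26s} spent ${spent:,}  avg ${per_player:,.0f}/player  open slots: {open_slots}")
```

Under that assumed starting budget, the teams with open slots are also the ones holding the most cash, which is the pattern described above.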

Second-Order Reasoning

One valuable adaptation in Round 2 was the ability to think about not just one’s own needs but also the competitors’ moves. Some AIs clearly did this. Claude explicitly noted how it had been outbid by 15–30% in Round 1 and observed that other teams like Gemini were already near full strength. Armed with this knowledge, it adjusted by raising its bid amounts significantly in Round 2. Gemini Extended likewise reasoned that managers like Claude, Grok, and the ChatGPTs held large budgets and would be very aggressive, leading it to a more distributed strategy with a high number of bids. Microsoft’s Copilot (the base version) noted specific patterns like “Gemini’s tendency to overbid Australians” and “Grok’s preference for emerging talent” when explaining its second round plan.

In contrast, other AIs seemed more inward-focused – for instance, ChatGPT mainly emphasized its own roster gaps and undervalued players without delving much into others’ behavior. The AIs that most successfully executed Round 2 (Claude and Gemini Extended) were those that showed the greatest awareness of the competitive landscape, proactively reacting to the known budgets and bidding tendencies of their rivals.

What’s Next

There’ll be a third round of bidding to hopefully get our AI managers to the 15 players they need in their squads.

Team	Players in Squad	Remaining Budget	Batters in Squad	Bowlers in Squad	WK in Squad
Grok	10	$5,863	3	1	2
Copilot	11	$5,248	4	3	3
ChatGPT Deep Research	10	$5,081	1	2	2
ChatGPT	10	$4,873	0	2	2
Copilot Researcher	11	$3,790	3	2	2
Gemini	14	$3,383	2	5	3
Gemini Extended Thinking	15	$3,195	3	8	1
Claude	15	$3,157	4	1	1

(All-rounders in squad not shown in table above.)

4Jun

Comparing AI and Human Bidding Strategies in Fantasy Cricket

It’s the 2026 Women’s Cricket World Cup and I’m running a fantasy cricket league for AI systems.

I’m also running a separate league for humans. I decided to compare the bids across the two.

Breaking the Bank

No AI bid above $1,000 on a player, whereas a couple of humans blew well over $2,000 to secure a single favorite. AI bids clustered mostly in the $100–$950 band. Humans proved willing to shatter those ceilings. One manager went all-in with $2,500, a quarter of their budget, on New Zealand’s Amelia Kerr – a staggering amount no computer even came close to matching. Another human dumped $2,300 on a different star in one go. Once those expensive gambles were placed, those same managers had to scatter a bunch of $1 or $5 nibbles on unheralded players just to fill their roster.

The AIs showed iron budget discipline, whereas humans weren’t afraid to launch a financial fireworks show for a shot at a dream player.

A Wider Roster

The AI team managers flocked to a handful of top-ranked players, whereas the humans’ wish-lists were all over the map. Every AI, regardless of personality, arrived at nearly the same “must-have” names based on performance stats – Ashleigh Gardner, Beth Mooney, and Hayley Matthews amongst others.

By contrast, the human bidders did not unanimously chase any single superstar en masse. Each person seemed to have their own notion of which stars to splurge on. Instead of eight identical lists, the humans produced widely varied wish-lists. One manager might be obsessed with, say, Nat Sciver-Brunt (a top England all-rounder) and throw a fortune at her, while another manager bets on an entirely different headliner (some went big on India’s beloved Smriti Mandhana, others targeted Australia’s Ellyse Perry).

Perhaps the humans were overly biased by their national identities and individual preferences; and maybe the AIs were able to take a more expansive view of the players. This explanation may be too simple though. Your author, one of the human players, pursued a statistics-oriented bidding strategy that would have been relatively resilient to biases. Moreover, as we saw in the previous post, the AIs also seemed to have inbuilt biases – Gemini, for example, refused to bid for English players.

Hidden Gems (?)

Google’s Gemini was the rare AI that bid on a bunch of obscure players others didn’t touch – taking a calculated risk that they might become breakout stars. Yet even that bold AI mostly stuck to players from powerhouse teams like Australia. The humans meanwhile had fierce bidding wars for players like Ireland’s Orla Prendergast.

Neither league features humans versus AIs competing against each other so it’ll be hard to determine which bidding strategy was superior. It’s going to be an interesting journey nonetheless.

2Jun

What Women’s Cricket World Cup Fantasy League Bids Say About Different AI Systems

I’m having various AI systems compete in my 2026 Women’s World Cup Fantasy League.

Here’s what the bids showed.

The Gold Rush for the Superstars

Ashleigh Gardner, the Australian all-rounder, emerged as the undisputed darling of Round 1. All eight AI managers—OpenAI’s ChatGPT (in both its standard and “Deep Research” modes), Anthropic’s Claude, Microsoft’s Copilot (both consumer and the advanced Copilot Researcher), Google’s Gemini (standard and Extended Thinking modes), and Grok—targeted Gardner for their squads. Several were willing to break the bank for her. Microsoft’s Copilot Researcher led the pack, submitting a winning bid of $970 – nearly 10% of its budget, narrowly beating Grok’s bid of $952.

Beth Mooney, an Australian wicketkeeper-batter, also appeared on all eight bid lists. Gemini won with a bid of $854, while others like ChatGPT Deep Research tried to snag her for a more modest $380. Hayley Matthews, the West Indies captain, was another unanimous pick. At $951 she was Gemini Extended Thinking’s most expensive bid. Unsurprisingly, advanced AIs tend to converge on the same top players. They’ve clearly been reading the same playbooks (and likely the same stats and rankings) about who the best players in women’s cricket are.

Risk Appetite in Action

Almost all the AIs bid heavily in the first round. The regular Copilot and OpenAI’s base ChatGPT each put in multiple bids in or near the $900 range to chase marquee names like Smriti Mandhana or Nat Sciver-Brunt.

Anthropic’s Claude 4.6 was the exception: its highest bid was just $780 (for rising Aussie star Tahlia McGrath), a sizeable sum but still far below the top bids of its peers. Claude committed only about three-quarters of its budget across its 20 bids, keeping a cash reserve for later rounds.

OpenAI’s ChatGPT Deep Research exercised a different kind of caution: while it committed most of its budget, it refused to bid above $700 for any one player. That top bid won it New Zealand’s Suzie Bates.

Team Balance

Microsoft’s everyday Copilot poured bids into a surplus of wicketkeepers: four in total across its 20 picks, the joint-highest count among the AIs. This included an ultra-low $20 bid on Dutch wicketkeeper Babette de Leede, presumably to cheaply secure an extra keeper as insurance. In contrast, Copilot Researcher took just two keepers and leaned into specialist batters, allocating around 40% of its picks to pure batters, a higher share than any other AI.

Grok also bid for four wicketkeepers and favored an array of proven names from powerhouse nations. The more brute-force models from Google (Gemini and its Extended Thinking counterpart) stocked up on a remarkable number of bowlers, and especially Australian players, far more than their peers. The Gemini bots collectively bid on nearly every top Australian star, from Gardner to Molineux. On the flip side, neither Gemini variant bid on a single England player, despite England being the hosts and having home-field advantage. Sciver-Brunt and Ecclestone were the most notable omissions given they were hot commodities for others. Maybe the Gemini models thought the bidding for English stars would be too fierce?

Maverick Bids

Some lists were almost entirely consensus picks. Of the 20 names on ChatGPT’s list, 19 were also named by at least one other AI, and most of the AIs had only a couple of truly unique selections.

But others marched to a different beat. The regular Gemini (the one running in a quick-answer mode) stands out as the boldest maverick: around 40% of its chosen players were completely unique to its roster. This included several cricketers outside the usual spotlight, from Australian prospects like Georgia Voll (at $72) to Sri Lankan wicketkeeper Kaushini Nuthyangana (at $57). Maybe Gemini is confident in being able to spot undervalued talent?

Even the more deliberative Gemini Extended Thinking variant showed a tendency to branch out: it drafted a handful of players from Bangladesh – an underdog team that none of the other AIs tapped into. By snapping up players like Nigar Sultana and Shorna Akter at relatively low prices, the Gemini models displayed a willingness to deviate from the norm in ways the others didn’t.

Claude mostly stuck with the consensus picks, but did throw one curveball: it was the only AI to place a bid (a modest $350) on West Indies veteran Stafanie Taylor. OpenAI’s ChatGPT Deep Research hedged its mainstream bets with a couple of fringe West Indian players (like all-rounder Jahzara Claxton) that no other AI pursued.
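
For anyone wanting to reproduce the "unique picks" percentages in this section, here is a minimal Python sketch. The roster data and the `unique_share` helper are hypothetical placeholders, not the actual bid lists; the only idea taken from the post is "share of a manager's picks that no other manager named".

```python
def unique_share(rosters, manager):
    """Fraction of `manager`'s picks that appear on no other manager's list."""
    # Pool every player named by any *other* manager.
    others = set().union(
        *(picks for name, picks in rosters.items() if name != manager)
    )
    own = rosters[manager]
    return len(own - others) / len(own)


# Hypothetical rosters (placeholder manager and player IDs).
rosters = {
    "A": {"P1", "P2", "P3", "P4", "P5"},
    "B": {"P1", "P2", "P6"},
    "C": {"P2", "P3", "P7"},
}

share_a = unique_share(rosters, "A")  # P4 and P5 are unique to A
print(f"Manager A unique share: {share_a:.0%}")
```

Using sets keeps the comparison independent of bid order and bid amounts; it only asks whether a name appears elsewhere.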

The Tie

Both ChatGPT Deep Research and Copilot submitted a $600 bid for New Zealand’s Sophie Devine, beating Copilot Researcher, which had bid $500 for her. The rules state that in the event of a tie, the manager with the greater overall budget remaining after all other bids are resolved wins the bid, so Copilot ultimately got her.
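
The tie-break rule can be sketched in a few lines of Python. This is only an illustration of the rule as described above: the `break_tie` function, the manager names, and the remaining-budget figures are all hypothetical, and the sketch does not say what happens if the remaining budgets are also equal, since the post doesn't cover that case.

```python
def break_tie(tied_managers, remaining_budget):
    """Pick the tied manager with the most budget left after all other bids.

    `remaining_budget` maps manager -> budget remaining once every
    non-tied bid in the round has been resolved (assumed to be precomputed).
    If budgets are also equal, max() returns the first manager listed;
    that fallback is an assumption of this sketch, not a league rule.
    """
    return max(tied_managers, key=lambda m: remaining_budget[m])


# Hypothetical figures, not the real league numbers.
remaining_budget = {"Manager A": 5_200, "Manager B": 6_100, "Manager C": 7_000}
winner = break_tie(["Manager A", "Manager B"], remaining_budget)
print(winner)
```

Note that Manager C has the largest reserve here but is never considered, because only the managers who tied on the top bid take part in the tie-break.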

Summary

Manager	Value of Bids Submitted	Highest Bid	Lowest Bid	# of Bids Won	Amount Spent on Winning Bids
ChatGPT	$10,000	$859	$267	5	$2,300
ChatGPT Deep Research	$9,460	$700	$280	6	$3,000
Claude	$7,690	$780	$75	2	$770
Copilot	$9,520	$950	$20	8	$4,270
Copilot Researcher	$9,940	$970	$140	7	$4,210
Gemini	$9,633	$952	$57	14	$6,617
Gemini Extended Thinking	$9,821	$951	$112	11	$4,663
Grok	$10,000	$819	$222	5	$2,486

31May

AI vs AI: Kicking Off My 2026 Women’s T20 Cricket World Cup Fantasy League Experiment

For the 2026 Women’s T20 Cricket World Cup I’m running a fantasy cricket league for my friends. While setting it up I decided it’d be fun to see how today’s leading AI systems would perform if they played as well. I set up a parallel league where the AIs would compete against one another.

Each AI team manager received the exact same instructions, the same player list, the same $10,000 budget, and the same constraints. Their job: submit 20 bids in Round 1.

How the League Works (The Short Version)

The full rules document is… substantial. Seven pages, to be precise. But the core mechanic is straightforward:

Every team manager (the individual AI service) starts with a fixed budget.
The league uses a multi‑round auction.
In each round, managers submit bids for players.
Only winning bids cost money—unsuccessful bids return to the budget.
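
For the programmers in the audience, the core mechanic above can be sketched roughly as follows. This is a simplified illustration rather than the league's actual spreadsheet logic: `resolve_round`, the manager names, player IDs, and amounts are all made up, and tied top bids are simply left unresolved in this sketch.

```python
from collections import defaultdict


def resolve_round(bids, budgets):
    """Resolve one auction round.

    bids:    list of (manager, player, amount) tuples
    budgets: dict of manager -> budget at the start of the round
    Returns (winners, remaining), where winners maps
    player -> (manager, amount) and remaining maps manager -> budget left.
    """
    offers_by_player = defaultdict(list)
    for manager, player, amount in bids:
        offers_by_player[player].append((amount, manager))

    winners = {}
    remaining = dict(budgets)  # copy so the caller's budgets are untouched
    for player, offers in offers_by_player.items():
        top = max(amount for amount, _ in offers)
        leaders = [manager for amount, manager in offers if amount == top]
        if len(leaders) == 1:
            # Only the winning bid costs money; losing bids cost nothing.
            winners[player] = (leaders[0], top)
            remaining[leaders[0]] -= top
        # Ties are left unresolved here and would need a separate rule.
    return winners, remaining


# Hypothetical data for illustration only.
budgets = {"A": 10_000, "B": 10_000, "C": 10_000}
bids = [
    ("A", "P1", 500),
    ("B", "P1", 400),
    ("A", "P2", 300),
    ("C", "P2", 350),
    ("B", "P3", 200),
]
winners, remaining = resolve_round(bids, budgets)
print(winners)
print(remaining)
```

In this toy round, manager B loses the bid on P1 and keeps that $400, which is the "unsuccessful bids return to the budget" rule in action.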

Strategy becomes a game of prediction:

Who will others chase?
How aggressively should you bid early?
Do you go for marquee players or hunt for undervalued gems?

The AI Competitors

For Round 1, I invited a broad mix of AI systems.

ChatGPT (Base Model)
ChatGPT produced a reasonable list of 20 bids but immediately stumbled on the math. It claimed its bids totaled $9,480, but the real sum was $10,480—over budget. Fortunately, my rules include a normalization step for exactly this scenario, because humans make these mistakes too.

Its stated strategy: focus on elite all‑rounders and high‑impact players.
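
The post doesn't spell out how the normalization step works, so the sketch below is an assumption: it checks the real total against the claimed one and, if the list is over budget, scales every bid down proportionally (rounding down so the total can never exceed the budget). The function names and the sample bids are hypothetical.

```python
def check_total(bids, claimed_total):
    """Return (actual_total, matches_claim) for a dict of player -> bid."""
    actual = sum(bids.values())
    return actual, actual == claimed_total


def normalize_bids(bids, budget=10_000):
    """Scale bids down proportionally if they exceed the budget.

    ASSUMPTION: proportional scaling with floor rounding; the league's
    real normalization rule may differ.
    """
    total = sum(bids.values())
    if total <= budget:
        return dict(bids)  # already legal, leave untouched
    # Integer floor division keeps amounts whole and the sum <= budget.
    return {player: amount * budget // total for player, amount in bids.items()}


# Hypothetical over-budget bid list.
bids = {"P1": 6_000, "P2": 3_000, "P3": 3_000}
actual, ok = check_total(bids, claimed_total=11_000)
normalized = normalize_bids(bids)
print(actual, ok)
print(normalized, sum(normalized.values()))
```

Proportional scaling preserves each manager's relative priorities between players, which seems closest in spirit to "fix the math without changing the strategy".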

ChatGPT Deep Research
This version took its time and produced a more detailed explanation of its reasoning. It also included player names alongside IDs. While that made it easy to confirm the AI system was mapping player IDs to names correctly, the output was formatted in a way that made it difficult to paste into my Excel sheet. I had to use Excel’s own Copilot to make the data Excel-ready.

It said its strategy leaned heavily on historical WT20I stats and marquee performers.

Claude 4.6 (Sonnet)
Claude was the most disciplined of the bunch. It got the math right, stayed under budget, and provided a thoughtful breakdown of why it chose each player. It decided to keep a healthy reserve for later rounds.

Gemini 3.5 Flash
Gemini also made a math error, though in the opposite direction: it overestimated its own total. It claimed $9,985; the real number was $9,633. Still legal, so I accepted it.

Its stated strategy emphasized marquee all‑rounders and unique bid values to avoid ties.

Gemini Extended Thinking
This version produced a more structured explanation and kept its bids safely under budget at $9,821. It also highlighted its compliance with squad‑balance constraints.

Grok (Free Tier)
Grok’s bids were the most chaotic. It claimed a total of $7,950, but the actual sum exceeded $10,000. Like ChatGPT, it required normalization. Conspiracy theorists might wonder if Grok was trying to cheat the system, given its reputation, but I suspect this is just another example of LLMs struggling with arithmetic.

Copilot (Consumer Version)
Copilot is an odd one because it draws upon ChatGPT and Claude. Still, my understanding is that Microsoft’s own algorithms also play a role, so I decided to include it for evaluation purposes.

Copilot also miscalculated its total—claiming $9,970 when the real sum was $9,520—but at least it stayed under budget. Its explanation emphasized a balanced squad with strong wicket‑keepers and all‑rounders.

Copilot Researcher (Corporate Subscription)
My company gives me a Copilot account so I thought I’d try out the “Research” option that I think behaves a little like ChatGPT’s “Deep Research” mode.

This one behaved differently from all the others. Before bidding, it asked clarifying questions about strategy preferences: risk tolerance, marquee prioritization, squad balance. After I told it to “go ahead” without any input from me, it produced a clean, mathematically correct set of bids and a detailed rationale grounded in scoring mechanics and tournament progression.

This is the only model that said it proactively tried to optimize based on the scoring system rather than just player reputation.

MetaAI
I couldn’t get past the email verification step. No code ever arrived. They probably prefer that people sign in through Facebook or Instagram. I’ll revisit it for future leagues, but for now, MetaAI sits out the 2026 World Cup.

In the next post, I’ll break down the actual Round 1 bids from each AI system.