Playing XI Selection: How AIs Picked Their Starting Lineups
What if AI played in a fantasy league for the Women’s Cricket World Cup?
With their fantasy squads of 15 players complete, our eight AI team managers faced one last pre-tournament challenge: picking a starting lineup of 11 players (a “Playing XI”) that had to satisfy a few basic constraints. Each XI needed to include at least one specialist wicketkeeper, bowler, and batter, but could not have more than five bowlers. These selections would be locked in for the first six matches of the World Cup.
The heavy lifting of the auction was over; now success hinged on reasoning through constraints and opportunities.
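The lineup rules reduce to a handful of checks on role counts. The Python sketch below is a hypothetical validator, not the league’s actual tooling. It assumes each player in an XI has exactly one role, and that every player not counted as a batter, bowler, or wicketkeeper is an all-rounder. The role counts come from the standings table at the end of this post.

```python
from dataclasses import dataclass

XI_SIZE = 11
MAX_BOWLERS = 5


@dataclass(frozen=True)
class XIComposition:
    team: str
    batters: int
    bowlers: int
    wicketkeepers: int

    @property
    def all_rounders(self) -> int:
        # Assumption: anyone not listed as batter, bowler, or keeper is an all-rounder.
        return XI_SIZE - self.batters - self.bowlers - self.wicketkeepers


def violations(xi: XIComposition) -> list[str]:
    """Return the rule breaches for one XI (an empty list means the XI is legal)."""
    problems = []
    if xi.wicketkeepers < 1:
        problems.append("needs at least one wicketkeeper")
    if xi.bowlers < 1:
        problems.append("needs at least one bowler")
    if xi.batters < 1:
        problems.append("needs at least one batter")
    if xi.bowlers > MAX_BOWLERS:
        problems.append(f"more than {MAX_BOWLERS} bowlers")
    if xi.all_rounders < 0:
        problems.append(f"role counts exceed {XI_SIZE} players")
    return problems


# Batters, bowlers, and keepers per XI, copied from the standings table.
LINEUPS = [
    XIComposition("Copilot", 4, 4, 2),
    XIComposition("Copilot Researcher", 3, 3, 1),
    XIComposition("Grok", 3, 2, 1),
    XIComposition("Gemini Extended Thinking", 3, 5, 1),
    XIComposition("Claude", 4, 1, 1),
    XIComposition("ChatGPT Deep Research", 2, 4, 1),
    XIComposition("Gemini", 2, 4, 2),
    XIComposition("ChatGPT", 1, 5, 1),
]

if __name__ == "__main__":
    for xi in LINEUPS:
        status = "; ".join(violations(xi)) or "valid"
        print(f"{xi.team}: {xi.all_rounders} all-rounders, {status}")
```

Run against the table’s counts, every lineup passes. ChatGPT and Gemini Extended Thinking sit exactly at the five-bowler cap, and Grok and Claude lean hardest on all-rounders, with five each.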
Microsoft’s Copilot vs. Copilot Researcher
The base Copilot once again revealed a brittle memory for details. Just as in prior bidding rounds, where it forgot which players were already taken, Copilot initially tried to select three players who weren’t even in its final roster of 15. I had to intervene repeatedly to steer it back to eligible choices. Once the errors were fixed, Copilot’s lineup was reasonable, though it notably dropped one of its top English players, Tilly Corteen-Coleman, from the XI. Benching her suggests Copilot’s internal logic either undervalued her immediate utility or simply fumbled while juggling the roster.
Copilot Researcher (the more advanced subscription-based mode) handled the assignment with meticulous care and no slip-ups. Recall that this is the “overthinker” that had imploded spectacularly in an earlier bidding round by repeatedly targeting already-taken players – but here it redeemed itself. Copilot Researcher double- and triple-checked everything: it explicitly verified each player against its own roster and role requirements, cross-referenced player IDs to avoid any mix-ups, and even researched the match schedule to prioritize players whose national teams play multiple times in Matches 1–7. Its XI would likely be regarded by most fans of the game as the strongest possible lineup from its squad.
OpenAI’s ChatGPT vs. ChatGPT Deep Research
The base ChatGPT announced it would “field the maximum 5 specialist bowlers,” reasoning that the fantasy league’s scoring system favored bowlers. It did stumble once: one of the player IDs it submitted didn’t belong to its team (a small context mix-up), but since it also listed the player’s name, I could tell whom it intended and corrected the pick. Interestingly, ChatGPT ended up benching its two most expensive auction signings (Ireland’s Laura Delany and Leah Paul) in favor of others. This shows a Spock-like logic unclouded by sunk costs or sentiment: it paid a hefty sum for those Irish players in the auction, but if they weren’t the best picks for the opening matches, onto the bench they went. A human manager might agonize over benching pricey stars.
ChatGPT Deep Research incorrectly believed that England and New Zealand each play two matches in this selection window. The mistake probably won’t hurt it much, since England and New Zealand are widely regarded as among the strongest teams. Each selection came with a few lines of justification citing the player’s experience or a key statistic.
It left out one of its two wicketkeepers, West Indies’ Shemaine Campbelle, which many would consider surprising given some of the players it did pick. Perhaps the model wrongly assumed that only one keeper could play at a time.
Google’s Gemini vs. Gemini Extended Thinking
The two Gemini variants delivered lineups that few would quibble with. The base Gemini’s roster from the auction was so stacked with talent that its main challenge was deciding which four players to bench.
Gemini Extended Thinking, Google’s more deliberative mode, made its lineup decisions with a clearly articulated overarching principle: maximize the number of chances to score points in the opening phase. It explicitly prioritized players with a high volume of early opportunities, meaning it looked for marquee batters likely to face a lot of deliveries, bowlers who would reliably bowl their full quota of overs (especially “death” bowlers, who bowl at the end of an innings when wickets often fall), and any players whose teams had multiple games in Matches 1–7.
It did bench a couple of its own stars, Australia’s Alana King and South Africa’s Ayabonga Khaka, both of whom performed well in pre-tournament warm-ups. Its reasoning indicated that it incorrectly believed their teams had fewer matches in the selection window than other teams.
Claude and Grok
Claude responded to the lineup prompt by methodically researching the tournament schedule and cross-matching it with every player on its roster. It then framed its lineup selection as a rational optimization problem: which players are lower-value or less likely to score big in this window? The final XI skewed heavily toward top batters and all-rounders: it included four specialist batters, among them Jemimah Rodrigues and Heather Knight, but only one specialist bowler (India’s Radha Yadav).
For the Playing XI, Grok spat out a list of player IDs, most of which didn’t match its own team’s players. It also listed player names, however, and those names did belong to its squad; clearly, some internal ID mapping went awry in its reasoning chain. Fortunately, I could use the names to decipher Grok’s intended XI, and it turned out to be a pretty sensible lineup once corrected. The one eyebrow-raiser was the omission of Pakistan’s Ayesha Zafar in favor of players from nations with less storied cricketing histories.
What’s Next
The toss for the first match, between England and Sri Lanka, is in a few hours.
| Team Name | Remaining Budget | Squad Size | Players in XI | Batters in XI | Bowlers in XI | Wicketkeepers in XI |
|---|---:|---:|---:|---:|---:|---:|
| Copilot | $4,186 | 15 | 11 | 4 | 4 | 2 |
| Copilot Researcher | $3,517 | 15 | 11 | 3 | 3 | 1 |
| Grok | $3,283 | 15 | 11 | 3 | 2 | 1 |
| Gemini Extended Thinking | $3,195 | 15 | 11 | 3 | 5 | 1 |
| Claude | $3,157 | 15 | 11 | 4 | 1 | 1 |
| ChatGPT Deep Research | $3,011 | 15 | 11 | 2 | 4 | 1 |
| Gemini | $2,456 | 15 | 11 | 2 | 4 | 2 |
| ChatGPT | $1,221 | 15 | 11 | 1 | 5 | 1 |