TL;DR

I spent a few weeks burning through Claude Code's token limits building trading strategies. 450 attempts across 45 sessions. Then I realized my own brain kept pulling me back to the same approach - so I decided to let 80 AI expert personas compete against each other instead.

A few weeks ago I set out to answer a simple question: can you use AI to build a trading system that actually works?

Not a chatbot telling you to buy the dip. Not someone’s 10,000% backtest screenshot. A real, systematic strategy - tested on data it’s never seen, designed to survive crashes.

The tool: Claude Code. The platform: QuantConnect, a backtesting engine used by 200,000+ quants. The goal: a multi-strategy portfolio doing 20%+ a year. The method: burn through Claude Code’s max token limits as fast as humanly possible.

It went well. Then it went sideways. Then it got really interesting.

The Loop

Here’s how it works. I describe a strategy in plain English. Claude Code writes the Python. I push it to QuantConnect’s cloud, which runs it against years of market data. Results come back - Sharpe ratio, returns, max drawdown - and I decide: pass or fail.

The validation is brutal - and it’s not something I was making up on the fly. I’d built a testing framework beforehand with strict walk-forward rules: every strategy gets trained on one time period, then tested on a completely different one it’s never seen. No peeking. No curve-fitting. No “let me just tweak this parameter until the backtest looks good.” Only the out-of-sample results count. This is how you separate strategies that actually work from strategies that just memorized the past. Most memorized the past.
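The pass/fail gate can be sketched in a few lines. This is a minimal illustration, not the actual framework: the windows, threshold values, and names here are my own invented placeholders - the essential point is only that tuning happens in one window and the verdict comes from the other.

```python
from dataclasses import dataclass

# Hypothetical walk-forward split: parameters may be tuned only on the
# in-sample window; the out-of-sample window is never touched during tuning.
IN_SAMPLE = ("2015-01-01", "2019-12-31")   # tuning allowed here
OUT_SAMPLE = ("2020-01-01", "2023-12-31")  # verdict comes only from here

@dataclass
class BacktestResult:
    sharpe: float
    max_drawdown: float  # e.g. 0.25 means a 25% peak-to-trough loss

def passes(out_of_sample: BacktestResult,
           min_sharpe: float = 1.0,
           max_dd: float = 0.25) -> bool:
    """Pass/fail on out-of-sample metrics alone - in-sample numbers are ignored."""
    return (out_of_sample.sharpe >= min_sharpe
            and out_of_sample.max_drawdown <= max_dd)
```

A strategy that looks great in-sample but posts a 0.7 Sharpe with a 30% drawdown out-of-sample fails the gate - which is exactly what happened to most of the 450.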

Over 45 sessions - crammed into a few weeks of maxing out token limits - I tested roughly 450 strategies across everything: equity momentum, sector rotation, trend following, mean reversion, options selling, commodities, FX, factor investing, credit signals. Six survived. They made it into a live portfolio. Some were genuinely creative - one sells cash-secured puts on quality stocks while switching between risk-on and risk-off modes. Another profits from cross-sector relative value using dispersion signals.

But around session 30, I noticed something that bothered me.

The Same Damn Strategy

Every session, the research kept drifting to the same place: macro regime timing.

Is the yield curve inverted? What’s the VIX doing? Is the Fed hiking or cutting? Every new strategy started the same way: first, detect the macro regime. Then decide what to do. The regime timer became the load-bearing wall of every architecture.

I tested 60+ variations. The best hit a 0.89 Sharpe - genuinely impressive. But zoom out and the picture was ugly. There are a lot of angles to a trading system - what regime you’re in, what you’re actually trading, when you enter and exit, how you size your bets. I’d gone deep on one angle and barely looked at the rest.

This wasn’t Claude Code’s fault. It was mine. Every conversation starts from the researcher’s framing, and my framing kept returning to macro. Claude Code works within the boundaries you set. I was setting narrow ones without realizing it.

The Trick That Almost Worked

Around session 12, I’d found my favorite prompting technique: expert personas.

Instead of prompting Claude Code as a generic AI, you ask it to embody a specific expert - their philosophy, their biases, their personality. Not “what do you think?” but “you are Nassim Taleb, and you’ve just seen this strategy’s results. What’s your reaction?”

It works disturbingly well. Taleb-Claude Code doesn’t politely suggest a stop loss - he tears apart your hidden short-vol exposure and demands to know what happens on the five worst days in twenty years. Thorp-Claude Code runs Kelly criterion math. Druckenmiller-Claude Code asks why you’re diversifying when you should be concentrating on your highest-conviction bet.
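The Kelly math that Thorp-Claude Code keeps reaching for is the classic formula f* = p − q/b: the fraction of bankroll to risk that maximizes long-run log growth. The numbers below are an invented example, not from any of my strategies:

```python
def kelly_fraction(win_prob: float, win_loss_ratio: float) -> float:
    """Classic Kelly bet size: f* = p - q/b.

    win_prob: probability p of a winning trade
    win_loss_ratio: b, average win divided by average loss
    """
    q = 1.0 - win_prob  # probability of losing
    return win_prob - q / win_loss_ratio

# A 55%-accurate strategy whose average win equals its average loss (b = 1)
# should risk 10% of bankroll per bet at full Kelly:
print(round(kelly_fraction(0.55, 1.0), 2))  # 0.1
```

In practice many traders run half Kelly or less, since the formula assumes you know p and b exactly - which, out-of-sample, you never do.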

I built 57 expert personas across 10 categories - value investors, quants, macro traders, trend followers, risk specialists, behavioral scientists, options experts, contrarians. I’d assemble panels of 5-8 and let them argue. Like a hedge fund advisory board on demand.

It caught real blind spots. One panel identified that selling covered calls on momentum stocks is mathematically self-defeating - you’re selling away the right tail that makes momentum work. Another convinced me to kill a strategy that was hiding a horrific overfit behind great-looking numbers.
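The covered-call point is easy to see with a toy example. All numbers here are invented for illustration (and premium income is ignored): momentum P&L is concentrated in a few outsized winners, and a short call caps exactly those months.

```python
# Toy illustration: momentum's edge lives in the fat right tail,
# and a covered call truncates that tail.
monthly_returns = [-0.03, -0.02, 0.01, 0.02, 0.02, 0.04, 0.30]  # one big winner
cap = 0.05  # a call struck ~5% above spot caps upside there (premium ignored)

uncapped = sum(monthly_returns) / len(monthly_returns)
capped = sum(min(r, cap) for r in monthly_returns) / len(monthly_returns)

print(f"uncapped mean: {uncapped:.3f}")  # the single tail month drives most of the edge
print(f"capped mean:   {capped:.3f}")    # selling the call hands most of it back
```

The small premium collected each month rarely compensates for losing the one +30% month that was doing all the work.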

But there was a problem I didn’t see for a while.

The Bias Behind the Bias

Even with 57 expert personas, the research was still biased. Not because the personas were bad. Because I was choosing which ones to ask, what to ask them, and how to frame the question.

When I needed opinions on a macro strategy, I unconsciously picked macro thinkers - Dalio, Soros, Druckenmiller. When I asked about stock selection, I framed it as “given my macro timer, how should I pick stocks?” instead of “design the best stock selection system from scratch.”

The personas were independent. My curation of them was not.

450 strategies later, I had a portfolio that worked. But I couldn’t shake the feeling I’d explored a tiny corner of a vast space - and the corner was chosen by my cognitive habits, not by where the alpha actually lives.

The Lightbulb

What if the personas weren’t advisors? What if they were competitors?

What if I gave all of them the same starting information - data, platform constraints, scoring rules - and told them to independently design their single best strategy?

No guidance from me. No framing. No “given my existing research.” Clean slate.

And not 5 personas on a panel. All of them. Eighty. Competing blind. Ranked by one number: out-of-sample returns on a test period they don’t know in advance.
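The scoring mechanics reduce to something very simple. This is a hypothetical sketch of the leaderboard logic (the names and numbers are placeholders): each persona submits one strategy, every strategy runs on the same hidden test window, and the ranking is by out-of-sample return and nothing else.

```python
def rank_personas(results: dict[str, float]) -> list[tuple[str, float]]:
    """results maps persona name -> out-of-sample return; higher ranks first."""
    return sorted(results.items(), key=lambda kv: kv[1], reverse=True)

# Placeholder submissions - no qualitative judgment, no panel debate,
# just one number per persona from the hidden test period.
leaderboard = rank_personas({
    "Warren Buffett": 0.18,
    "Jim Simons": 0.31,
    "Nassim Taleb": 0.12,
})
print(leaderboard[0])  # ('Jim Simons', 0.31)
```

The point of the single-number ranking is that I never get to put my thumb on the scale: no reframing the question, no picking which experts to believe.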

Warren Buffett versus Jim Simons versus Nassim Taleb versus a fictional former Russian military intelligence analyst turned tail-risk specialist. All building their best system. All tested on the same hidden data.

If 45 sessions of directed research had produced a gravitational pull toward macro timing, maybe the fix wasn’t trying harder to be unbiased. Maybe it was to remove the researcher from the loop entirely and let 80 independent minds explore the full space unconstrained.

The Grand Strategy Competition was born.

Next: I had 57 personas, but way too many Buffett types. Time to expand the roster - and introduce some characters you won’t find in any investing textbook.