An LLM built a production-grade trading system but couldn't find alpha

I spent years running a quant fund, so I know what a real trading system looks like and how long it takes to build one. That made for an obvious experiment: I had Claude build one from scratch around a specific idea -- harvesting the lag between a catalyst appearing as text and the market repricing it, in micro-cap altcoins too small for big funds to bother with. It built the whole thing in a fraction of the time it took my team, and it still couldn't find alpha.

The HOW: it built the entire machine, fast. The pipeline is the architecture I'd have drawn on a whiteboard:

DataSource         prices + catalysts
   |
FeatureExtractor   unstructured text -> bounded signal (LLM)
   |
FeatureStore       aligned alpha matrix
   |
Optimizer          CVXPY, constrained portfolio
   |
Backtester         daily loop, net of trading costs
   |
Report             equity curve vs BTC / ETH

A data layer pulls prices and catalysts. An LLM feature extractor reads the unstructured catalyst text -- governance votes, tokenomics changes, listing and regulatory disclosures -- and turns it into a bounded directional signal and an expected-return vector. That feeds a CVXPY constrained optimizer that builds a portfolio under real account limits, which feeds a daily backtester that runs the strategy net of trading costs, benchmarks the equity curve against just holding BTC and ETH, and prints Sharpe, Sortino, drawdown, and turnover. This is production-grade quant infrastructure: the same kind of system, on the same tools (CVXPY included), that I paid experienced engineers to build. It came together in days.
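To make "net of trading costs" concrete, here is a minimal sketch of what that last stage does. It is not the code from my system. The function names, the flat 10 bps cost on traded notional, and the 365-day annualization are all assumptions for illustration, and it ignores intraday weight drift.

```python
import math
from statistics import mean, stdev

def backtest(weights, returns, cost_bps=10.0):
    """Daily loop. weights[t] is the portfolio held over day t and
    returns[t] holds that day's asset returns. Costs are charged on
    traded notional (sum of absolute weight changes) at cost_bps.
    Simplification: weights are assumed not to drift between rebalances."""
    prev = [0.0] * len(weights[0])
    net, turnover = [], []
    for w, r in zip(weights, returns):
        traded = sum(abs(a - b) for a, b in zip(w, prev))
        gross = sum(a * b for a, b in zip(w, r))
        net.append(gross - traded * cost_bps / 1e4)
        turnover.append(traded)
        prev = w
    return net, turnover

def sharpe(rets, periods=365):
    # Annualized mean over standard deviation; 365 because crypto trades daily.
    sd = stdev(rets)
    return 0.0 if sd == 0 else mean(rets) / sd * math.sqrt(periods)

def sortino(rets, periods=365):
    # Like Sharpe, but only downside moves count as risk.
    downside = math.sqrt(mean([min(r, 0.0) ** 2 for r in rets]))
    return 0.0 if downside == 0 else mean(rets) / downside * math.sqrt(periods)

def max_drawdown(rets):
    # Worst peak-to-trough loss of the compounded equity curve (<= 0).
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in rets:
        equity *= 1 + r
        peak = max(peak, equity)
        worst = min(worst, equity / peak - 1)
    return worst

if __name__ == "__main__":
    w = [[0.5, 0.5], [0.5, 0.5], [1.0, 0.0]]
    r = [[0.02, 0.0], [-0.01, 0.01], [0.0, 0.03]]
    net, turnover = backtest(w, r)
    print(net, turnover, max_drawdown(net))
```

The detail that matters is the `traded * cost_bps` term. A signal that looks fine gross can go flat once every rebalance pays for itself, and in thin micro-caps the real cost is likely far higher than a flat 10 bps.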

The WHAT: the edge wasn't there. The thesis was reasonable on paper. The machinery executed it correctly. And the signal came out flat -- no real edge once you net out costs. Nothing was broken, and the system did exactly what I asked. It just turns out that asking the question well and building the apparatus to test it is not the same as the inefficiency actually existing and being mine to capture.

Building the system is the HOW. Knowing which inefficiency is real, why it exists, and why it will survive other people noticing it is the WHAT, and the LLM is no help there. It will wire up any thesis you hand it with equal conviction, including a bad one. The judgment for which edge is real is still entirely human.

The HOW used to be a moat. For most of trading's history, building robust infrastructure -- clean data, a sane optimizer, an honest backtester that doesn't lie to you about costs -- was hard and slow, and a lot of edge could come from being better at the plumbing. LLMs are erasing that moat. Anyone can now stand up a credible backtester and production system in a weekend.

So when the HOW gets commoditized, the competition collapses onto the WHAT. A genuine, non-obvious, durable thesis about why a price is wrong becomes the only thing that separates one trader from another, and now everyone has the tooling to test theirs against yours instantly. LLMs made building a trading system easy and finding real alpha harder, for the same reason.

This is private research, not investment advice, and a flat backtest is the most honest result you can get -- it means the apparatus didn't fool me. The next step is the unglamorous human work: information-coefficient analysis, event studies, figuring out whether there's a real signal in there at all. The machine can't do that part for me.
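For readers who haven't run one, information-coefficient analysis usually means correlating the signal with the returns that follow it, one cross-section at a time, and asking whether the average is distinguishable from zero. A minimal sketch, assuming a Spearman rank IC per day and a simple t-statistic on the daily series; the function names are hypothetical and this is not my actual analysis code:

```python
import math
from statistics import mean, stdev

def ranks(xs):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    out = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out

def pearson(a, b):
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return 0.0 if va == 0 or vb == 0 else cov / math.sqrt(va * vb)

def rank_ic(signal, fwd_returns):
    """Spearman IC for one day: rank correlation between each asset's
    signal and its subsequent (forward) return."""
    return pearson(ranks(signal), ranks(fwd_returns))

def ic_summary(daily_ics):
    """Mean IC and a t-statistic against zero. Assumes the daily ICs are
    roughly independent, which overlapping return windows would violate."""
    m, s = mean(daily_ics), stdev(daily_ics)
    t = m / (s / math.sqrt(len(daily_ics))) if s > 0 else float("nan")
    return m, t

if __name__ == "__main__":
    print(rank_ic([0.9, -0.2, 0.4, 0.0], [0.03, -0.01, 0.00, 0.01]))
```

The code is the easy part. Deciding which forward horizon to test, which catalysts belong in the same bucket, and whether a small positive mean IC would survive costs is the judgment the paragraph above is about.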
