1. Watch a market come alive
A real price isn't decided by one person. It's the result of a crowd buying and selling at once. Below is a miniature version. Each dot is a trader who is buying, selling, or waiting. The line at the top is the price. Notice that no one sets it; it just falls out of everyone's trades. Press play and drag the sliders.
A simplified illustration that runs entirely in your browser. The real project simulates a full order book with many trader types — but the idea is exactly this: price is an emergent crowd phenomenon.
2. The honest experiment: can you beat the market?
It's tempting to believe you can find a pattern that prints money. So I tested eight different kinds of strategies on real market data — trend-following, “buy the dip,” momentum, calendar effects, and more.
The catch: if you try enough random strategies, one will look great by pure luck. To avoid fooling myself, every strategy had to pass a strict statistical “lie detector” (it corrects for how many things you tried) and still win after the real cost of trading.
None of the eight passed. That's not a failure; it's the truth. Liquid markets are efficient: the easy patterns are already traded away. Most people who think they have an edge simply haven't tested it honestly. Building the test that says “no” is the hard, valuable part.
3. The one real pattern: how wild tomorrow will be
You can't reliably predict which way the price goes. But there is something you can predict: volatility — how much prices swing around. Calm days tend to follow calm days; stormy days cluster together. (You can even see it in the demo above: turn up “Chaotic” and the swings stay big for a while.)
My forecast of this “choppiness” was competitive with the industry-standard models on real data, out-of-sample, across both stocks and crypto. You don't get rich betting on direction, but risk forecasts are exactly what banks and funds use to size positions and manage danger. So the project found a real, useful signal, just not the get-rich one.
4. Why this was worth building
The valuable part was never striking gold. It was building a research system honest enough to tell me “no” — with safeguards against the classic ways people fool themselves (curve-fitting, ignoring costs, peeking at the answer). Every result is logged in a tamper-evident record so nothing can be quietly re-spun.
- 🧱 A market simulator: order book, many trader types, emergent prices.
- 🔬 A validation pipeline with proper statistics and leakage controls.
- 🧾 An honest verdict: no easy edge, but volatility is forecastable.
Under the hood: what makes the results trustworthy
The hard part of quantitative research isn't writing a strategy — it's not fooling yourself. Markets are noisy enough that if you try a few hundred ideas, one will look brilliant by pure chance. Almost every "I beat the market" claim dies here. So most of the engineering went into a validation firewall: a stack of guardrails, each closing off a specific way people accidentally lie to themselves with data. Here's the machinery, in plain terms.
Prices that emerge from an order book, not a formula
Instead of drawing a random squiggle, the engine runs a real limit order book with price–time-priority matching — the same rule actual exchanges use to pair buyers and sellers. Heterogeneous agents (market makers, momentum traders, mean-reverters, large institutions working a hidden order, options dealers hedging their gamma) submit orders; the price is whatever those orders clear at. It's a closed loop: agents react to the price they collectively just created.
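To make price-time priority concrete, here is a minimal sketch of a matching loop. It is an illustration written for this page, not the project's engine: the `OrderBook` class, its `submit` method, and the integer tick prices are all assumptions. The best-priced resting order trades first, and among orders at the same price, the one that arrived earlier trades first.

```python
import heapq
from itertools import count


class OrderBook:
    """Toy limit order book with price-time priority (illustrative only)."""

    def __init__(self):
        self.bids = []        # heap of (-price, arrival_seq, [remaining_qty])
        self.asks = []        # heap of (price, arrival_seq, [remaining_qty])
        self.trades = []      # executed (price, qty) pairs, in order
        self._seq = count()   # arrival counter breaks ties at equal prices

    def submit(self, side, price, qty):
        """Match an incoming limit order, then rest any unfilled remainder."""
        is_buy = side == "buy"
        own, opposite = (self.bids, self.asks) if is_buy else (self.asks, self.bids)
        while qty > 0 and opposite:
            key, _, resting = opposite[0]
            best = key if is_buy else -key
            crosses = price >= best if is_buy else price <= best
            if not crosses:
                break
            fill = min(qty, resting[0])
            self.trades.append((best, fill))   # trade at the resting order's price
            qty -= fill
            resting[0] -= fill
            if resting[0] == 0:
                heapq.heappop(opposite)
        if qty > 0:
            heapq.heappush(own, (-price if is_buy else price, next(self._seq), [qty]))


book = OrderBook()
book.submit("sell", 101, 5)
book.submit("sell", 100, 3)   # better price: jumps ahead of the 101 ask
book.submit("sell", 100, 4)   # same price as above, but arrived later
book.submit("buy", 101, 8)    # sweeps the book by price, then by arrival time
print(book.trades)            # [(100, 3), (100, 4), (101, 1)]
```

The last entry in `book.trades` plays the role of "the price": nobody chose 101, it is simply where the most recent orders cleared.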
Why it's robust: realistic behavior, including volatility clustering, fat tails, and the leverage effect (down moves are choppier than up moves), emerges from the mechanics rather than being hard-coded. That makes it a fair sandbox for stress-testing ideas, not a model rigged to confirm them.
Deflated Sharpe Ratio & multiple-testing correction
A strategy's "Sharpe ratio" measures reward per unit of risk. The trap: test 100 strategies and the best one looks amazing even if all 100 are worthless — that's the multiple-comparisons problem. The Deflated Sharpe Ratio (Bailey & López de Prado) corrects the bar for how many strategies you tried, how long the track record is, and how fat-tailed / skewed the returns are. A result only counts if it clears that bar.
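For readers who want the formula, here is a rough sketch of the Deflated Sharpe Ratio as published by Bailey & López de Prado. The function names and argument layout are my own for illustration and are not taken from the project's code. It assumes a per-period (not annualized) Sharpe ratio, raw kurtosis (3 for a normal distribution), and that you supply the variance of Sharpe ratios across the strategies you tried.

```python
import math
from statistics import NormalDist

EULER_GAMMA = 0.5772156649015329
_STD_NORMAL = NormalDist()


def expected_max_sharpe(n_trials, sharpe_variance):
    """Approximate best Sharpe expected from n_trials worthless strategies."""
    if n_trials < 2:
        return 0.0   # a single trial needs no multiple-testing correction
    a = _STD_NORMAL.inv_cdf(1 - 1 / n_trials)
    b = _STD_NORMAL.inv_cdf(1 - 1 / (n_trials * math.e))
    return math.sqrt(sharpe_variance) * ((1 - EULER_GAMMA) * a + EULER_GAMMA * b)


def deflated_sharpe(sharpe, n_obs, skew, kurtosis, n_trials, sharpe_variance):
    """Probability that the observed Sharpe beats the luck-adjusted bar."""
    bar = expected_max_sharpe(n_trials, sharpe_variance)
    # Skewed or fat-tailed returns make the Sharpe estimate itself noisier.
    spread = math.sqrt(1 - skew * sharpe + (kurtosis - 1) / 4 * sharpe ** 2)
    return _STD_NORMAL.cdf((sharpe - bar) * math.sqrt(n_obs - 1) / spread)


# Same track record, different amounts of searching (made-up inputs).
few = deflated_sharpe(0.1, 1000, 0.0, 3.0, n_trials=2, sharpe_variance=0.01)
many = deflated_sharpe(0.1, 1000, 0.0, 3.0, n_trials=100, sharpe_variance=0.01)
print(few > many)   # True
```

The identical backtest looks convincing if it was one of two attempts and unconvincing if it was the best of a hundred, which is the whole point of the correction.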
Why it's robust: it converts "looks good" into "is statistically real after accounting for the search," which is exactly the step that kills false discoveries from data-snooping.
Walk-forward testing with purged gaps
Every model is trained only on the past and scored on later, unseen data — rolling forward through history the way you'd actually have lived it. A purge (a small gap between train and test) prevents a label that peeks slightly into the future from leaking backward across the boundary.
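The rolling scheme can be sketched in a few lines. This is a simplified illustration, not the project's splitter: `walk_forward_splits` and its parameters are hypothetical names, and it assumes samples are already in time order with a fixed-size training window.

```python
def walk_forward_splits(n_samples, train_size, test_size, purge):
    """Yield (train, test) index ranges that roll forward through time.

    `purge` samples between train and test are skipped entirely, so a label
    built from slightly-future data cannot straddle the boundary.
    """
    start = 0
    while start + train_size + purge + test_size <= n_samples:
        train_end = start + train_size
        test_start = train_end + purge
        yield range(start, train_end), range(test_start, test_start + test_size)
        start += test_size   # slide forward by one test block


for train, test in walk_forward_splits(n_samples=20, train_size=10, test_size=3, purge=2):
    print(list(train)[-1], "->", list(test))
# 9 -> [12, 13, 14]
# 12 -> [15, 16, 17]
```

In each fold the test block starts strictly after the training block ends, with the purge gap left unused by both.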
Why it's robust: it blocks look-ahead leakage, the single most common reason a backtest looks profitable and then collapses in live trading. Results reflect foresight, not hindsight.
Realistic costs and the right benchmark
Every result is reported both at zero cost and after realistic transaction costs (spread, fees, slippage), and is compared against the benchmark you actually have to beat — usually just buying and holding. A strategy that wins on paper but evaporates after costs is recorded as a loss.
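A toy version of this gate, under simplifying assumptions: costs are modeled as a single proportional charge on each change in position (the real pipeline's spread, fee, and slippage model is not shown here), and the function names and return figures are invented for illustration.

```python
def net_returns(asset_returns, positions, cost_per_turnover):
    """Per-period strategy returns after charging for each position change."""
    net, prev = [], 0.0
    for r, pos in zip(asset_returns, positions):
        turnover = abs(pos - prev)
        net.append(pos * r - turnover * cost_per_turnover)
        prev = pos
    return net


def total_return(returns):
    growth = 1.0
    for r in returns:
        growth *= 1.0 + r
    return growth - 1.0


def beats_buy_and_hold(asset_returns, positions, cost_per_turnover):
    """Compare the strategy with simply holding, both net of costs."""
    hold = [1.0] * len(asset_returns)   # buy once, never trade again
    strategy = total_return(net_returns(asset_returns, positions, cost_per_turnover))
    benchmark = total_return(net_returns(asset_returns, hold, cost_per_turnover))
    return strategy > benchmark


asset = [0.01, -0.02, 0.03, -0.01]      # made-up returns
timed = [1.0, 0.0, 1.0, 0.0]            # a "perfect" in-and-out strategy
print(beats_buy_and_hold(asset, timed, cost_per_turnover=0.0))    # True
print(beats_buy_and_hold(asset, timed, cost_per_turnover=0.02))   # False
```

Even a perfectly timed strategy loses to buy-and-hold here once every trade costs something, because it trades four times and the benchmark trades once.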
Why it's robust: an "edge" smaller than your trading costs isn't an edge. This gate is where most surviving candidates quietly die, as they should.
Does the synthetic world predict the real one?
A simulator is only useful if lessons learned in it transfer to reality. So the pipeline measures the rank correlation (Spearman) between how strategies are ordered on simulated data versus real data — with the sim's volatility matched to a disjoint slice of real data, never the held-out test set.
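Spearman correlation is just ordinary (Pearson) correlation computed on ranks. Below is a small from-scratch sketch in the spirit of the dependency-light approach; the helper names and the strategy scores are made up for illustration and do not come from the project.

```python
import math


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    return pearson(ranks(x), ranks(y))


# Hypothetical scores for four strategies, on simulated and on real data.
sim_scores = [0.8, 0.1, 0.5, -0.2]
real_scores = [0.3, 0.0, 0.4, -0.1]
print(round(spearman(sim_scores, real_scores), 3))   # 0.8
```

A value near 1 would mean the simulator orders strategies the way real data does; a value near 0 would mean lessons from the sim should not be trusted.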
Why it's robust: it tests the assumption that synthetic results are informative instead of taking it on faith, and quietly flags when the sim and reality disagree.
Negative controls on pure noise
The machine-learning harness is deliberately pointed at random noise and must report essentially zero skill; the volatility benchmark is guarded against numerical blow-ups. If a pipeline "discovers" signal where none exists, the pipeline is broken — not the market.
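Here is what such a negative control can look like in miniature. The predictor below is a deliberately simple one-lag linear fit standing in for the real machine-learning harness, which is not shown on this page; the function name and the 0.05 tolerance are assumptions chosen for the example.

```python
import random


def one_lag_oos_r2(series, split):
    """Fit y[t+1] = a + b*y[t] on series[:split]; return out-of-sample R^2."""
    x_tr, y_tr = series[:split - 1], series[1:split]
    mx, my = sum(x_tr) / len(x_tr), sum(y_tr) / len(y_tr)
    b = (sum((x - mx) * (y - my) for x, y in zip(x_tr, y_tr))
         / sum((x - mx) ** 2 for x in x_tr))
    a = my - b * mx
    x_te, y_te = series[split:-1], series[split + 1:]
    sse_model = sum((y - (a + b * x)) ** 2 for x, y in zip(x_te, y_te))
    sse_naive = sum((y - my) ** 2 for y in y_te)   # baseline: the training mean
    return 1.0 - sse_model / sse_naive


rng = random.Random(0)                              # seeded for repeatability
noise = [rng.gauss(0.0, 1.0) for _ in range(5000)]  # no signal by construction
r2 = one_lag_oos_r2(noise, split=2500)
assert abs(r2) < 0.05, "harness found 'signal' in pure noise: it is broken"
```

If that assertion ever fails, the bug is in the evaluation code, because the input was random by construction.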
Why it's robust: the tools that judge the strategies are themselves tested. A measuring instrument you've never checked against a known-zero is just a random-number generator with confidence.
An append-only research log with integrity hashing
Every experiment writes an immutable record (data span, symbols, targets, number of trials, cost assumptions, leakage controls, final metrics, the exact code commit, and whether it passed each gate) to an append-only log, each line sealed with a SHA-256 hash.
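One way such a log could work is sketched below. This is an assumption-heavy illustration, not the project's format: the field names are invented, the log is an in-memory list rather than a file opened in append mode, and chaining each line's hash to the previous line's hash is my own choice for making edits detectable.

```python
import hashlib
import json

GENESIS = "0" * 64   # placeholder "previous hash" for the very first entry


def _digest(prev_hash, record):
    # Canonical JSON so the same record always hashes to the same value.
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_record(lines, record):
    """Seal a record against the previous line's hash and append it."""
    prev_hash = json.loads(lines[-1])["sha256"] if lines else GENESIS
    entry = {"prev": prev_hash, "record": record, "sha256": _digest(prev_hash, record)}
    lines.append(json.dumps(entry, sort_keys=True))


def verify(lines):
    """Recompute every hash in order; any edited line breaks the chain."""
    prev_hash = GENESIS
    for line in lines:
        entry = json.loads(line)
        if entry["prev"] != prev_hash or entry["sha256"] != _digest(prev_hash, entry["record"]):
            return False
        prev_hash = entry["sha256"]
    return True


log = []
append_record(log, {"experiment": "demo-1", "n_trials": 8, "sharpe": 0.41, "passed": False})
append_record(log, {"experiment": "demo-2", "n_trials": 8, "passed": True})
print(verify(log))                          # True

log[0] = log[0].replace("0.41", "1.41")     # quietly "improve" an old result
print(verify(log))                          # False
```

The records here are made-up examples. Rewriting a past result changes its hash, so the edit shows up the next time the log is verified.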
Dependency-light, deterministic, and tested
The core is pure Python — even the statistics (normal CDF, inverse-CDF, the linear-algebra solver) are implemented from scratch rather than pulled from heavy libraries, so the math is transparent and the install is trivial. Runs are seeded for determinism, and a suite of ~350 automated tests plus linting and continuous integration guard every change.
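To give a flavor of what "statistics without heavy libraries" can mean, here is a small sketch of a normal CDF and its inverse using only Python's standard `math` module. It is not the project's implementation, which is not shown on this page; in particular, inverting by bisection is my own choice because it is easy to audit, and it is slower than the polynomial approximations often used in practice.

```python
import math


def norm_cdf(x):
    """Standard normal CDF; erfc keeps precision in the lower tail."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def norm_inv_cdf(p, tol=1e-12):
    """Invert norm_cdf by bisection (simple and transparent, not fast)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    lo, hi = -40.0, 40.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if norm_cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


print(norm_cdf(0.0))                    # 0.5
print(round(norm_inv_cdf(0.975), 4))    # 1.96
```

Because both functions fit on one screen, a reader can check the math directly rather than trusting a compiled dependency.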
Why it's robust: reproducibility and transparency. Anyone can read exactly how each number is computed and re-run it to get the same answer.
The verdict this machinery produced
Pointed at freely available market data across eight strategy families, the firewall said no to every directional, get-rich claim, which is the correct, honest answer for liquid, efficient markets. The one signal that survived every gate: volatility is forecastable, and the forecaster is competitive out-of-sample with the industry-standard HAR-RV and EWMA (RiskMetrics) models on both equities and crypto.
That's the real deliverable — not a money printer, but a research process disciplined enough to be trusted when it says "no."
Want the specifics? The full methodology, code, and locked results are in the repository linked below.