A Tiny Stock Market — and a Lie Detector for Trading Strategies

1. Watch a market come alive

A real price isn't decided by one person. It's the result of a crowd buying and selling at once. Below is a miniature version. Each dot is a trader who is buying, selling, or waiting. The line at the top is the price. Notice that no one sets it; it just falls out of everyone's trades. Press play and drag the sliders.

[Interactive demo. Live readouts: Price, Mood, Choppiness (volatility). Sliders: Number of traders; Trend-chasers ↔ Contrarians; Calm ↔ Chaotic.]

A simplified illustration that runs entirely in your browser. The real project simulates a full order book with many trader types — but the idea is exactly this: price is an emergent crowd phenomenon.

2. The honest experiment: can you beat the market?

It's tempting to believe you can find a pattern that prints money. So I tested eight different kinds of strategies on real market data — trend-following, “buy the dip,” momentum, calendar effects, and more.

The catch: if you try enough random strategies, one will look great by pure luck. To avoid fooling myself, every strategy had to pass a strict statistical “lie detector” (it corrects for how many things you tried) and still win after the real cost of trading.

8 strategy types tested

0 that honestly beat the market

That's not a failure — it's the truth. Liquid markets are efficient: the easy patterns are already traded away. Most people who think they have an edge simply haven't tested it honestly. Building the test that says “no” is the hard, valuable part.

3. The one real pattern: how wild tomorrow will be

You can't reliably predict which way the price goes. But there is something you can predict: volatility — how much prices swing around. Calm days tend to follow calm days; stormy days cluster together. (You can even see it in the demo above: turn up “Chaotic” and the swings stay big for a while.)

My forecast of this “choppiness” was competitive with the industry-standard models (HAR-RV and EWMA) on real data, out-of-sample, across both stocks and crypto. You don't get rich betting on direction, but risk forecasts are exactly what banks and funds use to size positions and manage danger. So the project found a real, useful signal, just not the get-rich one.

4. Why this was worth building

The valuable part was never striking gold. It was building a research system honest enough to tell me “no” — with safeguards against the classic ways people fool themselves (curve-fitting, ignoring costs, peeking at the answer). Every result is logged in a tamper-evident record so nothing can be quietly re-spun.

🧱 A market simulator: order book, many trader types, emergent prices.
🔬 A validation pipeline with proper statistics and leakage controls.
🧾 An honest verdict: no easy edge, but volatility is forecastable.

Under the hood: what makes the results trustworthy

The hard part of quantitative research isn't writing a strategy — it's not fooling yourself. Markets are noisy enough that if you try a few hundred ideas, one will look brilliant by pure chance. Almost every "I beat the market" claim dies here. So most of the engineering went into a validation firewall: a stack of guardrails, each closing off a specific way people accidentally lie to themselves with data. Here's the machinery, in plain terms.

The simulator

Prices that emerge from an order book, not a formula

Instead of drawing a random squiggle, the engine runs a real limit order book with price–time-priority matching — the same rule actual exchanges use to pair buyers and sellers. Heterogeneous agents (market makers, momentum traders, mean-reverters, large institutions working a hidden order, options dealers hedging their gamma) submit orders; the price is whatever those orders clear at. It's a closed loop: agents react to the price they collectively just created.

Why it's robust: realistic behavior such as volatility clustering, fat tails, and the leverage effect (volatility tends to rise after down moves) emerges from the mechanics rather than being hard-coded. That makes it a fair sandbox for stress-testing ideas, not a model rigged to confirm them.
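To make price-time priority concrete, here is a minimal sketch of a matching loop. It is not the project's engine: the names `Order`, `OrderBook`, and `submit` are hypothetical, and it omits cancellations, order types, and everything the agents do. It only shows the rule itself: the best price trades first, and among equal prices the earliest order trades first.

```python
import heapq
from dataclasses import dataclass
from itertools import count


@dataclass
class Order:
    side: str    # "buy" or "sell"
    price: float
    qty: int
    trader: str


class OrderBook:
    """Toy limit order book with price-time priority (illustrative only)."""

    def __init__(self):
        self._bids = []       # heap of (-price, arrival_seq, order): highest bid first
        self._asks = []       # heap of (price, arrival_seq, order): lowest ask first
        self._seq = count()   # arrival counter breaks ties between equal prices
        self.last_price = None

    def submit(self, order):
        """Match against the opposite side, then rest any unfilled remainder.

        Returns a list of trades as (price, qty, resting_trader, incoming_trader).
        """
        if order.side == "buy":
            opposite, own, key = self._asks, self._bids, -order.price
            crosses = lambda resting: resting.price <= order.price
        else:
            opposite, own, key = self._bids, self._asks, order.price
            crosses = lambda resting: resting.price >= order.price

        trades = []
        while order.qty > 0 and opposite and crosses(opposite[0][2]):
            resting = opposite[0][2]
            fill = min(order.qty, resting.qty)
            # Trades execute at the resting order's price.
            trades.append((resting.price, fill, resting.trader, order.trader))
            self.last_price = resting.price
            order.qty -= fill
            resting.qty -= fill
            if resting.qty == 0:
                heapq.heappop(opposite)

        if order.qty > 0:
            heapq.heappush(own, (key, next(self._seq), order))
        return trades


book = OrderBook()
book.submit(Order("sell", 101.0, 5, "A"))
book.submit(Order("sell", 100.0, 5, "B"))
book.submit(Order("sell", 100.0, 5, "C"))
trades = book.submit(Order("buy", 101.0, 12, "D"))
```

In this example the buyer is filled by B, then C (same price, later arrival), then partly by A at the worse price. The last traded price is simply whatever the final fill cleared at; nobody sets it directly.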

The lie detector

Deflated Sharpe Ratio & multiple-testing correction

A strategy's "Sharpe ratio" measures reward per unit of risk. The trap: test 100 strategies and the best one looks amazing even if all 100 are worthless — that's the multiple-comparisons problem. The Deflated Sharpe Ratio (Bailey & López de Prado) corrects the bar for how many strategies you tried, how long the track record is, and how fat-tailed / skewed the returns are. A result only counts if it clears that bar.

Why it's robust: it converts "looks good" into "is statistically real after accounting for the search," which is exactly the step that kills false discoveries from data-snooping.
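As a rough sketch of the calculation, the Deflated Sharpe Ratio can be written in a few lines. The formula follows the published Bailey and López de Prado definition; the function and parameter names are hypothetical, and the sketch uses the standard library's `statistics.NormalDist` for brevity rather than a from-scratch normal distribution. The Sharpe ratio here is per period (not annualized), `sharpe_std` is the standard deviation of Sharpe estimates across all trials, and `kurt` is ordinary kurtosis (3 for a normal distribution).

```python
import math
from statistics import NormalDist

EULER_GAMMA = 0.5772156649015329
_STD_NORMAL = NormalDist()


def expected_max_sharpe(n_trials, sharpe_std):
    """Sharpe ratio the best of n_trials worthless strategies is expected to show."""
    if n_trials < 2:
        raise ValueError("need at least 2 trials for the correction to apply")
    a = _STD_NORMAL.inv_cdf(1.0 - 1.0 / n_trials)
    b = _STD_NORMAL.inv_cdf(1.0 - 1.0 / (n_trials * math.e))
    return sharpe_std * ((1.0 - EULER_GAMMA) * a + EULER_GAMMA * b)


def deflated_sharpe_ratio(sharpe, n_obs, n_trials, sharpe_std, skew=0.0, kurt=3.0):
    """Probability that the true Sharpe exceeds the luck-adjusted bar."""
    bar = expected_max_sharpe(n_trials, sharpe_std)
    variance_term = 1.0 - skew * sharpe + (kurt - 1.0) / 4.0 * sharpe ** 2
    if variance_term <= 0.0:
        raise ValueError("skew/kurtosis inputs give a non-positive variance term")
    z = (sharpe - bar) * math.sqrt(n_obs - 1) / math.sqrt(variance_term)
    return _STD_NORMAL.cdf(z)


# Same observed Sharpe, same track record; only the size of the search differs.
few_trials = deflated_sharpe_ratio(0.1, n_obs=1000, n_trials=10, sharpe_std=0.05)
many_trials = deflated_sharpe_ratio(0.1, n_obs=1000, n_trials=1000, sharpe_std=0.05)
```

The identical result looks much less convincing once it is known to be the best of 1,000 attempts instead of 10, which is the whole point of the correction.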

No peeking

Walk-forward testing with purged gaps

Every model is trained only on the past and scored on later, unseen data, rolling forward through history the way you'd actually have lived it. A purge (a small gap between train and test) handles labels that are computed over a forward-looking window: without the gap, a label near the end of the training set would overlap the test period and leak information across the boundary.

Why it's robust: it blocks look-ahead leakage, the single most common reason a backtest looks profitable and then collapses in live trading. Results reflect foresight, not hindsight.
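A walk-forward split with a purge gap can be sketched as a small generator. This is an illustration under simple assumptions (fixed-size rolling windows, a gap measured in samples); the function name and parameters are hypothetical, and the project's actual windowing may differ.

```python
def walk_forward_splits(n_samples, train_size, test_size, purge=0):
    """Yield (train_indices, test_indices) ranges that roll forward in time.

    `purge` samples between the end of training and the start of testing are
    skipped, so labels built from a forward-looking window cannot overlap the
    test period.
    """
    if train_size <= 0 or test_size <= 0 or purge < 0:
        raise ValueError("train_size and test_size must be positive, purge non-negative")
    start = 0
    while True:
        train_end = start + train_size
        test_start = train_end + purge
        test_end = test_start + test_size
        if test_end > n_samples:
            break
        yield range(start, train_end), range(test_start, test_end)
        start += test_size  # slide the whole window forward by one test block


splits = list(walk_forward_splits(n_samples=20, train_size=8, test_size=4, purge=2))
```

With 20 samples this yields two folds: train on 0 to 7 and test on 10 to 13, then train on 4 to 11 and test on 14 to 17. Samples 8 and 9 (and later 12 and 13) are the purged gap.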

Economic honesty

Realistic costs and the right benchmark

Every result is reported both at zero cost and after realistic transaction costs (spread, fees, slippage), and is compared against the benchmark you actually have to beat — usually just buying and holding. A strategy that wins on paper but evaporates after costs is recorded as a loss.

Why it's robust: an "edge" smaller than your trading costs isn't an edge. This gate is where most surviving candidates quietly die — as they should.
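The cost gate is easy to illustrate with a toy calculation. The sketch below assumes a single proportional cost per unit of turnover standing in for spread, fees, and slippage combined; the function names and the numbers are made up for illustration and are not results from the project.

```python
def net_returns(positions, asset_returns, cost_per_turnover):
    """Per-period strategy returns after charging for every change in position.

    positions[t] is the position held over period t, decided before
    asset_returns[t] is known. The starting position is flat (0).
    """
    net = []
    previous = 0.0
    for position, ret in zip(positions, asset_returns):
        turnover = abs(position - previous)
        net.append(position * ret - cost_per_turnover * turnover)
        previous = position
    return net


def total_return(returns):
    """Compound a sequence of per-period returns."""
    growth = 1.0
    for ret in returns:
        growth *= 1.0 + ret
    return growth - 1.0


def beats_buy_and_hold(positions, asset_returns, cost_per_turnover):
    """True only if the strategy out-earns buy-and-hold under the same costs."""
    strategy = total_return(net_returns(positions, asset_returns, cost_per_turnover))
    hold = [1.0] * len(asset_returns)
    benchmark = total_return(net_returns(hold, asset_returns, cost_per_turnover))
    return strategy > benchmark


asset_returns = [0.002, -0.001, 0.002, -0.001]
perfect_timing = [1.0, 0.0, 1.0, 0.0]  # in for every up move, out for every down move

wins_at_zero_cost = beats_buy_and_hold(perfect_timing, asset_returns, 0.0)
wins_after_costs = beats_buy_and_hold(perfect_timing, asset_returns, 0.001)
```

Even this hindsight-perfect timing rule beats buy-and-hold only at zero cost. At 10 basis points per trade the constant switching eats the whole advantage, and the gate records a loss.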

Sim → real bridge

Does the synthetic world predict the real one?

A simulator is only useful if lessons learned in it transfer to reality. So the pipeline measures the rank correlation (Spearman) between how strategies are ordered on simulated data versus real data — with the sim's volatility matched to a disjoint slice of real data, never the held-out test set.

Why it's robust: it tests the assumption that synthetic results are informative instead of taking it on faith — and quietly flags when the sim and reality disagree.
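For reference, Spearman rank correlation is just the Pearson correlation of the ranks. Here is a dependency-free sketch; the helper names are hypothetical and the scores are invented purely to show the mechanics.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0.0 or var_y == 0.0:
        raise ValueError("rank correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical scores for four strategies, first in the sim, then on real data.
sim_scores = [0.9, 0.4, 0.1, -0.3]
real_scores = [0.5, 0.3, 0.2, -0.1]
agreement = spearman(sim_scores, real_scores)
```

A value near 1 means the simulator orders strategies the same way reality does, a value near 0 means the simulated ranking carries no information, and a negative value means the sim is actively misleading.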

Validate the validator

Negative controls on pure noise

The machine-learning harness is deliberately pointed at random noise and must report essentially zero skill; the volatility benchmark is guarded against numerical blow-ups. If a pipeline "discovers" signal where none exists, the pipeline is broken — not the market.

Why it's robust: the tools that judge the strategies are themselves tested. A measuring instrument you've never checked against a known-zero is just a random-number generator with confidence.
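A negative control of this kind can be sketched with a deliberately simple stand-in model. The example below uses a one-variable least-squares fit rather than the project's machine-learning harness, and the names, sample size, and tolerance are all assumptions chosen for illustration. Skill is measured as the out-of-sample improvement in squared error over always predicting the training mean.

```python
import random


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def out_of_sample_skill(x_train, y_train, x_test, y_test):
    """1 - MSE(model) / MSE(train-mean baseline), scored on unseen data."""
    slope, intercept = fit_line(x_train, y_train)
    baseline = sum(y_train) / len(y_train)
    mse_model = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(x_test, y_test))
    mse_base = sum((y - baseline) ** 2 for y in y_test)
    return 1.0 - mse_model / mse_base


def noise_control(seed=0, n=5000, tolerance=0.05):
    """Feed the pipeline pure noise; it passes only if it reports ~zero skill."""
    rng = random.Random(seed)  # seeded, so the control is reproducible

    def draw():
        return [rng.gauss(0.0, 1.0) for _ in range(n)]

    x_train, y_train, x_test, y_test = draw(), draw(), draw(), draw()
    skill = out_of_sample_skill(x_train, y_train, x_test, y_test)
    return skill, abs(skill) < tolerance


skill_on_noise, control_passed = noise_control()
```

It is worth pairing this with a positive control (data with a planted relationship) so that "zero skill on noise" cannot be achieved by a pipeline that never finds anything.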

Audit the sandbox

The simulator is tested against real market laws — and failures are published

A simulator you've never audited will happily confirm whatever you built into it. So the engine is probed with controlled experiments: inject one large "whale" order into the market and compare against a same-seed twin run where the whale stays silent. Because the two runs are otherwise identical, this is an exact counterfactual, and the difference in price is the order's impact. The measured curve is compared against the square-root law, a well-known empirical regularity of real markets in which price impact grows roughly with the square root of order size. The first audit failed (big orders left no permanent trace), and that failure was published in the research log, not hidden. It drove real mechanism upgrades (traders that update their beliefs from order flow, liquidity that takes time to replenish, depth spread across price levels), after which permanent impact genuinely emerges and grows with order size. The remaining gap is documented as an open finding.

Why it's robust: the rule is "fix the mechanism and re-measure — never tune the knob toward the answer." Three plausible explanations were experimentally falsified along the way; the audit reports whatever comes out, including the parts that still disagree with reality.
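One simple way to compare a measured impact curve with the square-root law is to fit a power law, impact = c * size^delta, and check how close delta is to 0.5. The sketch below assumes that approach; the function names are hypothetical, and the data are synthetic numbers generated from an exact square-root curve, not measurements from the simulator.

```python
import math


def twin_run_impact(prices_with_whale, prices_without_whale):
    """Impact path: step-by-step price difference between two same-seed runs."""
    return [a - b for a, b in zip(prices_with_whale, prices_without_whale)]


def fit_power_law(sizes, impacts):
    """Least-squares fit of log(impact) = log(c) + delta * log(size).

    Returns (c, delta). Sizes and impacts must all be positive.
    """
    log_size = [math.log(s) for s in sizes]
    log_impact = [math.log(i) for i in impacts]
    n = len(log_size)
    mean_x, mean_y = sum(log_size) / n, sum(log_impact) / n
    sxx = sum((a - mean_x) ** 2 for a in log_size)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(log_size, log_impact))
    delta = sxy / sxx
    c = math.exp(mean_y - delta * mean_x)
    return c, delta


# Synthetic check: impacts generated from a perfect square-root curve.
order_sizes = [100, 400, 1600, 6400]
synthetic_impacts = [0.3 * math.sqrt(size) for size in order_sizes]
scale, exponent = fit_power_law(order_sizes, synthetic_impacts)
```

On the synthetic curve the fit recovers an exponent of 0.5. On simulator output, an exponent near 0 would correspond to the "no permanent trace" failure described above, and the distance from 0.5 is one way to quantify the remaining gap.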

Tamper-evident

An append-only research log with integrity hashing

Every experiment writes an immutable record — data span, symbols, targets, number of trials, cost assumptions, leakage controls, final metrics, the exact code commit, and whether it passed each gate — to an append-only log, each line sealed with a SHA-256 hash.

Why it's robust: you can't quietly re-run an experiment until it "works" and present only the winner. The honest verdict is recorded and auditable; the firewall is enforced by the machine, not by willpower.
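A tamper-evident log along these lines can be sketched with the standard library. The source only says each line is sealed with a SHA-256 hash; chaining each hash to the previous line, the field names, and the in-memory list standing in for a file are assumptions made for this illustration.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash, record):
    """SHA-256 over the previous hash plus a canonical JSON form of the record."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_record(log_lines, record):
    """Seal a record and append it as one JSON line. Existing lines are never edited."""
    prev_hash = json.loads(log_lines[-1])["sha256"] if log_lines else GENESIS
    entry = {"prev": prev_hash, "record": record, "sha256": _digest(prev_hash, record)}
    log_lines.append(json.dumps(entry, sort_keys=True))


def verify_log(log_lines):
    """Recompute every hash in order; any edit, deletion, or reordering fails."""
    prev_hash = GENESIS
    for line in log_lines:
        entry = json.loads(line)
        if entry["prev"] != prev_hash:
            return False
        if entry["sha256"] != _digest(prev_hash, entry["record"]):
            return False
        prev_hash = entry["sha256"]
    return True


log = []
append_record(log, {"experiment": "momentum", "n_trials": 40, "passed": False})
append_record(log, {"experiment": "volatility", "n_trials": 12, "passed": True})

tampered = list(log)
tampered[0] = tampered[0].replace("false", "true")  # quietly flip a verdict
```

Here `verify_log(log)` succeeds while `verify_log(tampered)` fails, because the flipped verdict no longer matches its stored hash. With chaining, rewriting an old entry would also require recomputing every later hash, which is what makes silent edits detectable.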

Engineering

Dependency-light, deterministic, and tested

The core is pure Python — even the statistics (normal CDF, inverse-CDF, the linear-algebra solver) are implemented from scratch rather than pulled from heavy libraries, so the math is transparent and the install is trivial. Runs are seeded for determinism, and a suite of ~390 automated tests — including property-based invariant tests on the exchange and accounting — plus strict type-checking and continuous integration guard every change.

Why it's robust: reproducibility and transparency. Anyone can read exactly how each number is computed and re-run it to get the same answer.
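To give a flavor of what dependency-light statistics can look like, here is one possible sketch of a normal CDF and its inverse using only the `math` module. The project's own implementation is not shown in this text, so the approach here (the error function for the CDF, plain bisection for the inverse) is an assumption chosen for readability rather than speed.

```python
import math


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def norm_inv_cdf(p, tol=1e-12):
    """Inverse of norm_cdf by bisection: slow but transparent."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    low, high = -40.0, 40.0
    while high - low > tol:
        mid = (low + high) / 2.0
        if norm_cdf(mid) < p:
            low = mid
        else:
            high = mid
    return (low + high) / 2.0


median = norm_cdf(0.0)
upper_95 = norm_inv_cdf(0.975)  # the familiar two-sided 95% critical value
```

Because nothing here depends on random state or external libraries, the same inputs give the same outputs on any machine, which is the property the seeded runs and the test suite are meant to protect.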

The verdict this machinery produced

Pointed at freely available market data across eight strategy families, the firewall said no to every directional, get-rich claim, which is the correct, honest answer for liquid, efficient markets. The one signal that survived every gate: volatility is forecastable, and the forecaster is competitive out-of-sample with the industry-standard HAR-RV and EWMA (RiskMetrics) models on both equities and crypto.

That's the real deliverable — not a money printer, but a research process disciplined enough to be trusted when it says "no."
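For readers who want to see what one of those benchmarks looks like, the EWMA (RiskMetrics) variance forecast is a one-line recursion: tomorrow's variance is a weighted blend of today's forecast and today's squared return. The sketch below uses the conventional RiskMetrics daily decay of 0.94; the function name and the choice to seed with the first squared return are assumptions, and this is the benchmark, not the project's forecaster.

```python
def ewma_variance_forecasts(returns, decay=0.94):
    """One-step-ahead variance forecasts from an EWMA recursion.

    forecasts[t] is the forecast for period t + 1, made after observing
    returns[t], so no forecast ever uses a return from its own target period.
    """
    if not returns:
        raise ValueError("need at least one return")
    if not 0.0 < decay < 1.0:
        raise ValueError("decay must be strictly between 0 and 1")
    forecasts = [returns[0] ** 2]  # seed with the first squared return
    for ret in returns[1:]:
        forecasts.append(decay * forecasts[-1] + (1.0 - decay) * ret ** 2)
    return forecasts


# Two flat days followed by a 10% shock: the forecast jumps, then decays slowly.
calm_then_shock = ewma_variance_forecasts([0.0, 0.0, 0.10, 0.0, 0.0])
```

After the shock the forecast stays elevated and fades by a factor of 0.94 per day, which is the "stormy days cluster together" behavior in its simplest form.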

Want the specifics? The full methodology, code, and locked results are in the repository linked below.