§ Design decisions

What it does well — and where it doesn't.

This page covers what the framework does well, where its boundaries lie, and why certain trade-offs were made. It is published in full on the site because the right way to use any backtester is with a clear-eyed view of what it can and can't tell you.

The authoritative version lives alongside the code on GitHub ↗. If anything here disagrees with the repo, the repo wins.

§ 01

Data providers

Twelve equity providers, crypto via CCXT (110+ exchanges), and options via philippdubach. Yahoo Finance and CCXT are the free-tier providers most users will rely on.

Yahoo Finance (primary, free, no key)

  • Supports daily and intraday (1m, 5m, 15m, 1h)
  • Auto-adjusted for splits and dividends
  • Intraday limits: 1m data ~8 days, 5m/15m ~60 days, 1h ~730 days
  • Daily data: 20+ years
  • No API key required
  • Wraps yfinance (unofficial library) — could break if Yahoo changes their site, but yfinance is actively maintained and widely used
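
The intraday limits above can be enforced before a request is sent. The following is a minimal sketch, not the framework's own code: `INTRADAY_LOOKBACK_DAYS`, `clamp_start`, and `fetch` are hypothetical names, and the day counts are simply the approximate figures listed above. The only third-party call is `yfinance.download`.

```python
from datetime import datetime, timedelta

# Approximate Yahoo lookback limits in days, taken from the list above.
# Intervals not listed here (e.g. "1d") are treated as unrestricted.
INTRADAY_LOOKBACK_DAYS = {"1m": 8, "5m": 60, "15m": 60, "1h": 730}


def clamp_start(interval: str, start: datetime, now: datetime) -> datetime:
    """Move `start` forward if it is older than Yahoo serves for `interval`."""
    limit = INTRADAY_LOOKBACK_DAYS.get(interval)
    if limit is None:
        return start
    return max(start, now - timedelta(days=limit))


def fetch(symbol: str, interval: str, start: datetime, end: datetime):
    """Download auto-adjusted OHLCV. Needs `pip install yfinance` and network access."""
    import yfinance as yf  # imported lazily so clamp_start works without it

    safe_start = clamp_start(interval, start, datetime.now())
    return yf.download(symbol, start=safe_start, end=end,
                       interval=interval, auto_adjust=True, progress=False)
```

Calling `fetch("SPY", "1m", ...)` with a start date a month back would then request only the most recent ~8 days rather than failing outright.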

Other equity providers

Each has different free-tier limits — documented in the provider docstrings:

  • Tiingo · 20+ years, 1000 req/day — best depth
  • Polygon · ~2 years free, implements pagination and rate-limit sleep
  • Alpaca · 5–6 years, 200 req/min, paginated
  • Alpha Vantage · ~5 months on free compact mode
  • Twelve Data, Finnhub, FMP, MarketStack · Various limits

Design decision: no pre-validation of tickers

We let the API return an error rather than maintaining a ticker list that goes stale. The trade-off is slightly less helpful error messages in exchange for zero maintenance.

§ 02

Crypto (CCXT)

What works well

  • 110+ exchanges — far more than most comparable platforms
  • Public OHLCV is free without API keys on exchanges that expose a public candle endpoint
  • All intervals: 1m, 5m, 15m, 30m, 1h, 4h, 1d, 1w
  • Rate limiting with enableRateLimit=True plus sleep between pages
  • Retry logic: 3 retries with exponential backoff
  • Pagination with deduplication at chunk boundaries
  • BacktestConfig.for_crypto() sets 365 bars/year for correct annualization
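
The retry, pagination, and deduplication behaviour listed above can be sketched in plain Python. This is an illustration under assumptions, not the framework's implementation: `with_retries` and `fetch_all` are hypothetical helpers, and `fetch_page` stands in for whatever function returns one page of candles.

```python
import time
from typing import Callable, List

Candle = List[float]  # [timestamp_ms, open, high, low, close, volume]


def with_retries(call: Callable[[], list], retries: int = 3, base_delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> list:
    """Run `call`; on failure retry up to `retries` times with exponential backoff."""
    for attempt in range(retries + 1):
        try:
            return call()
        except Exception:
            if attempt == retries:
                raise
            sleep(base_delay * 2 ** attempt)  # waits 1s, 2s, 4s with the defaults


def fetch_all(fetch_page: Callable[[int, int], List[Candle]], since_ms: int,
              until_ms: int, limit: int = 500, pause: float = 0.0) -> List[Candle]:
    """Page forward from `since_ms`, dropping candles repeated at chunk boundaries."""
    out: List[Candle] = []
    cursor = since_ms
    while cursor < until_ms:
        page = with_retries(lambda: fetch_page(cursor, limit))
        last_ts = out[-1][0] if out else since_ms - 1
        fresh = [c for c in page if last_ts < c[0] < until_ms]
        if not fresh:
            break  # nothing new: the exchange has no more data in range
        out.extend(fresh)
        cursor = out[-1][0]  # next page overlaps by one candle; filtered out above
        time.sleep(pause)    # extra politeness on top of the exchange rate limiter
    return out
```

With CCXT, `fetch_page` could wrap `exchange.fetch_ohlcv(symbol, timeframe, since=since, limit=limit)` on an exchange object created with `enableRateLimit` set to `True`.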

Known quirks

  • Kraken uses XBT/USD internally. Passing BTC/USD works for daily data, but some intervals may need the native symbol
  • Binance is geo-blocked in some regions (US) — use Kraken, Coinbase, or another exchange
  • Historical depth varies by exchange (~3–5 years for most pairs)
  • Volume is reported in base currency (e.g., BTC not USD), which is standard for crypto

§ 03

Options

Options backtesting uses synthetic Black-Scholes pricing from the underlying's realized volatility. This is a deliberate design choice — it lets you test options strategies with just OHLCV data, no options data feed required.
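
As a rough illustration of that approach, the sketch below estimates annualized realized volatility from closing prices and feeds it into the standard European Black-Scholes formula without dividends. The function names are hypothetical and the framework's own pricing code may differ in detail (for example in the volatility window or day-count convention).

```python
import math
from statistics import stdev


def realized_vol(closes, bars_per_year=252):
    """Annualized sample standard deviation of log returns."""
    rets = [math.log(b / a) for a, b in zip(closes, closes[1:])]
    return stdev(rets) * math.sqrt(bars_per_year)


def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_price(spot, strike, t_years, rate, vol, kind="call"):
    """European Black-Scholes price for a non-dividend-paying underlying."""
    sqrt_t = math.sqrt(t_years)
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol ** 2) * t_years) / (vol * sqrt_t)
    d2 = d1 - vol * sqrt_t
    discount = math.exp(-rate * t_years)
    if kind == "call":
        return spot * norm_cdf(d1) - strike * discount * norm_cdf(d2)
    return strike * discount * norm_cdf(-d2) - spot * norm_cdf(-d1)
```

A typical call would be `bs_price(spot, strike, 30 / 365, 0.04, realized_vol(closes))`, where the 4% rate is only a placeholder.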

What works well

  • Greeks (delta, gamma, theta, vega, rho) verified against known values to 1e-10 precision
  • Multi-leg positions: verticals, iron condors, straddles, strangles via PositionFactory
  • Margin calculation: Reg-T and simplified portfolio margin
  • IV surface interpolation with cubic / linear / nearest fallback
  • Exercise and assignment handling at expiration
  • Bid-ask spread modeling with configurable width and market impact
  • Real historical chains available via philippdubach (104 tickers, 2008–2025, free)

Design decisions and boundaries

  • European-style pricing · we use Black-Scholes, not binomial trees. For the strategies we're testing (covered calls, spreads), the difference is small. American early exercise matters most for deep-ITM puts and for ITM calls just before an ex-dividend date, both of which are edge cases here.
  • Realized vol, not implied vol · we price from the underlying's own volatility, not from the options market. This means our prices don't capture supply/demand effects (skew, smile). For strategy logic testing this is fine; for P&L estimation it's approximate.
  • No dividend tracking · the covered call strategy uses holds_underlying=True to correctly track stock exposure, but we don't model dividend events or ex-div assignment risk.

§ 04

Synthetic data (Monte Carlo + GAN)

Six synthetic-data generators — more than any comparable free framework. Each serves a different testing purpose.

GBM (Geometric Brownian Motion)

  • Purpose · baseline sanity check. No strategy should profit on pure random walks.
  • Normal increments (no fat tails, no vol clustering) — this is by design. GBM is the null hypothesis.
  • If your strategy shows a consistently positive Sharpe on GBM paths (beyond any drift built into the paths), it's likely overfitting.
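
For reference, a GBM price path with normal log-return increments can be generated in a few lines. This is a generic textbook sketch with a hypothetical function name and parameters, not the framework's generator.

```python
import math
import random


def gbm_path(s0, mu, sigma, n_bars, bars_per_year=252, seed=None):
    """Simulate prices under dS = mu*S*dt + sigma*S*dW using the exact log-step."""
    rng = random.Random(seed)
    dt = 1.0 / bars_per_year
    path = [s0]
    for _ in range(n_bars):
        z = rng.gauss(0.0, 1.0)  # normal increment: no fat tails, no vol clustering
        step = (mu - 0.5 * sigma ** 2) * dt + sigma * math.sqrt(dt) * z
        path.append(path[-1] * math.exp(step))
    return path
```

Setting `mu=0.0` removes the drift, which makes any persistent profit on these paths easier to attribute to the strategy rather than to the simulated trend.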

Block bootstrap

  • Preserves the real distribution (fat tails, vol clustering) from historical data
  • Autocorrelation preserved within blocks (default size 5), breaks at boundaries
  • Trade-off: larger blocks preserve more structure but reduce scenario diversity
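
A simple moving-block bootstrap over a return series might look like the sketch below. The function name is hypothetical, and the framework may use a different block-sampling variant; the default block size of 5 matches the figure above.

```python
import random


def block_bootstrap(returns, n_out, block_size=5, seed=None):
    """Resample `returns` in contiguous blocks so short-range structure survives."""
    if len(returns) < block_size:
        raise ValueError("need at least one full block of history")
    rng = random.Random(seed)
    max_start = len(returns) - block_size
    out = []
    while len(out) < n_out:
        start = rng.randint(0, max_start)  # inclusive on both ends
        out.extend(returns[start:start + block_size])
    return out[:n_out]  # trim the final block if it overshoots
```

Within each block the original ordering is kept, so autocorrelation is preserved there and reset at every block boundary.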

Regime switching (Markov)

  • Three regimes (bull / sideways / bear) with a configurable transition matrix
  • Regime persistence ~2–3 weeks (realistic timescale)
  • Returns are normal within each regime — the regime switching itself creates fat tails in the aggregate distribution
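
The link between the transition matrix and regime persistence is worth making explicit: if a regime has probability p of continuing on the next bar, its expected duration is 1 / (1 - p) bars. The sketch below uses illustrative numbers, not the framework's defaults; with p = 0.92 on daily bars the expected duration is 12.5 trading days, about two and a half weeks.

```python
import random

REGIMES = ("bull", "sideways", "bear")

# Rows: current regime; columns: next regime. Illustrative values only.
TRANSITION = [
    [0.92, 0.06, 0.02],
    [0.04, 0.92, 0.04],
    [0.02, 0.06, 0.92],
]

# (mean, std) of per-bar returns in each regime. Also illustrative.
PARAMS = {"bull": (0.0008, 0.008), "sideways": (0.0, 0.006), "bear": (-0.001, 0.018)}


def expected_duration(stay_prob):
    """Mean number of bars spent in a regime before leaving it."""
    return 1.0 / (1.0 - stay_prob)


def simulate(n_bars, start=0, seed=None):
    """Return (regime labels, returns); returns are normal within each regime."""
    rng = random.Random(seed)
    state, labels, rets = start, [], []
    for _ in range(n_bars):
        mean, std = PARAMS[REGIMES[state]]
        labels.append(REGIMES[state])
        rets.append(rng.gauss(mean, std))
        state = rng.choices(range(3), weights=TRANSITION[state])[0]
    return labels, rets
```

Because the bear regime has a much larger standard deviation than the others, the pooled return distribution ends up with heavier tails than any single regime's normal distribution.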

Noise injection

  • Safest source — inherits all properties from the base data
  • Tests graceful degradation: robust strategies degrade smoothly, brittle ones break
  • OHLCV consistency maintained via clamping
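
One way to keep OHLCV bars consistent after adding noise is to perturb each price and then clamp High and Low so they still bound Open and Close. The sketch below assumes multiplicative Gaussian noise and a hypothetical `inject_noise` helper; the framework's noise model may differ.

```python
import random


def inject_noise(bars, sigma=0.002, seed=None):
    """bars: list of (open, high, low, close, volume). Returns a perturbed copy."""
    rng = random.Random(seed)
    out = []
    for o, h, l, c, v in bars:
        # Independent multiplicative noise on each price; volume is left untouched.
        o, h, l, c = (p * (1.0 + rng.gauss(0.0, sigma)) for p in (o, h, l, c))
        h = max(h, o, c)  # clamp: High must remain the bar's maximum
        l = min(l, o, c)  # clamp: Low must remain the bar's minimum
        out.append((o, h, l, c, v))
    return out
```

Running the same strategy across increasing `sigma` values then shows whether performance degrades smoothly or collapses.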

cGAN (trained conditional GAN)

  • Generates regime-conditioned paths (bullish, bearish, sideways, crash)
  • Trained on real SPY data from four distinct market periods
  • Trained on 30-step windows — longer paths stitch chunks, so autocorrelation resets every 30 bars. Adequate for regime stress testing; less ideal for multi-week momentum strategies.
  • Crash regime has limited training data (73 bars from COVID March 2020) — generated crashes resemble COVID-style drawdowns specifically
  • Could be improved by retraining on longer windows and more crash data

Heston Monte Carlo

  • Stochastic volatility with mean reversion and leverage effect
  • More realistic than GBM (captures vol clustering and the leverage effect)
  • Parameters are defaults, not calibrated to a specific asset — suitable for general stress testing
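
A compact way to simulate Heston dynamics is an Euler scheme with full truncation, where negative variance values are floored at zero whenever they are used. The sketch below is generic; the function name and default parameter values are illustrative and are not the framework's defaults.

```python
import math
import random


def heston_path(s0=100.0, v0=0.04, mu=0.0, kappa=2.0, theta=0.04, xi=0.5,
                rho=-0.7, n_bars=252, bars_per_year=252, seed=None):
    """Return (prices, variances) from an Euler full-truncation Heston simulation."""
    rng = random.Random(seed)
    dt = 1.0 / bars_per_year
    s, v = s0, v0
    prices, variances = [s0], [v0]
    for _ in range(n_bars):
        z1 = rng.gauss(0.0, 1.0)
        # Correlate the variance shock with the price shock (rho < 0: leverage effect).
        z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        v_pos = max(v, 0.0)  # full truncation
        s *= math.exp((mu - 0.5 * v_pos) * dt + math.sqrt(v_pos * dt) * z1)
        v += kappa * (theta - v_pos) * dt + xi * math.sqrt(v_pos * dt) * z2  # mean reversion
        prices.append(s)
        variances.append(v)
    return prices, variances
```

Here `kappa` controls the speed of mean reversion toward the long-run variance `theta`, and `xi` is the volatility of variance.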

§ 05

Backtester engines

Bar engine (fast, simple)

  • Market orders only — fills at next bar's open
  • Suitable for signal-based strategies, daily swing trading, indicator testing
  • Stop-loss and take-profit (SL/TP) checks use the current bar's High/Low, which is standard for bar-based backtesting (the same approach as Backtrader, bt, etc.). The event engine provides a more conservative alternative.

Event engine (realistic)

  • Supports market, limit, stop, and OCO bracket orders
  • Orders submitted on bar t become active on bar t+1 (no lookahead)
  • Stop / limit orders can rest across multiple bars
  • Stop-limit orders not supported (limit and stop orders work independently)
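
The activation rule above (submitted on bar t, active from bar t+1) can be illustrated with a small fill check. Everything in this sketch is an assumption made for illustration: the `Bar` and `Order` classes, the `try_fill` helper, and in particular the gap-handling rule that a limit or stop fills at the open when the bar opens through its level.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Bar:
    open: float
    high: float
    low: float
    close: float


@dataclass
class Order:
    kind: str             # "market", "limit" or "stop"
    side: str             # "buy" or "sell"
    price: float = 0.0    # limit/stop level; ignored for market orders
    submitted_at: int = 0  # index of the bar on which the order was submitted


def try_fill(order: Order, bar: Bar, bar_index: int) -> Optional[float]:
    """Return the fill price, or None if the order keeps resting."""
    if bar_index <= order.submitted_at:
        return None  # not active until the bar after submission: no lookahead
    if order.kind == "market":
        return bar.open
    # Buy limits and sell stops trigger when price trades down to the level.
    triggers_on_low = (order.kind, order.side) in {("limit", "buy"), ("stop", "sell")}
    if triggers_on_low:
        return min(bar.open, order.price) if bar.low <= order.price else None
    # Sell limits and buy stops trigger when price trades up to the level.
    return max(bar.open, order.price) if bar.high >= order.price else None
```

Calling `try_fill` once per bar until it returns a price reproduces the "resting order" behaviour: an untouched limit or stop simply carries over to the next bar.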

Running both engines on the same strategy

Running the same strategy through both engines is one of the framework's distinguishing features. If the bar and event engines agree (typically under 1 percentage point of divergence for market-order strategies), the strategy's edge doesn't depend on execution assumptions. If they diverge, the strategy is sensitive to fill timing, which is important to know before live trading.

Boundaries (both engines)

  • Single-asset · one symbol per backtest. Portfolio-level backtesting is a future goal.
  • Integer shares · no fractional share support. Minimal impact for stocks > $10.
  • Cash-only · no margin or leverage. Strategies are tested with the capital available.
  • No borrow costs · short selling works but doesn't model borrow fees. For short-heavy strategies, real costs could be 0.5–50%+ annually depending on the stock.
  • No corporate actions · Yahoo Finance data is pre-adjusted for splits/dividends, so historical price charts are correct. But we don't track dividend cash flows or model assignment risk around ex-div dates.
  • Fixed BPS slippage · not volume-aware. Adequate for liquid large-cap equities; may underestimate costs for small-caps or illiquid instruments.
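
Two of these boundaries, fixed-BPS slippage and integer share sizing, are easy to show in code. The helpers below are hypothetical and the 5 bps default is only an example value.

```python
import math


def fill_price(price, side, slippage_bps=5.0):
    """Apply fixed slippage: buys fill higher, sells fill lower, regardless of volume."""
    adjustment = price * slippage_bps / 10_000.0
    return price + adjustment if side == "buy" else price - adjustment


def max_shares(cash, price, slippage_bps=5.0):
    """Largest whole number of shares affordable with cash only (no margin)."""
    return math.floor(cash / fill_price(price, "buy", slippage_bps))
```

With $10,000 and a $100 stock at 5 bps, the buy fills at about $100.05 and the position is 99 shares, leaving a small cash remainder that fractional shares would otherwise absorb.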

Metrics

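  • Sharpe ratio · mean(returns) / std(returns) × √(bars_per_year) with risk-free = 0%. Industry-standard formula used by QuantConnect, Zipline, Backtrader, and most platforms. With the risk-free rate fixed at 0%, values are comparable across strategies run through this pipeline; rankings are generally stable, though they can shift between strategies with very different volatility once a non-zero risk-free rate is applied.
  • Sortino ratio · same formula using downside deviation (MAR = 0%). Standard implementation.
  • Calmar ratio · annualized return / max drawdown, with the drawdown taken as a positive fraction.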
  • All other metrics (profit factor, expectancy, win rate, SQN, Kelly, etc.) are standard calculations.
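
The three ratio definitions above can be written out as follows. This is a sketch of one common convention and may differ from the framework in details: it assumes sample standard deviation for Sharpe, downside deviation computed over all bars (with positive returns counted as zero) for Sortino, and compound annual growth for Calmar.

```python
import math
from statistics import mean, stdev


def sharpe(returns, bars_per_year=252):
    """mean / std * sqrt(bars_per_year), risk-free rate = 0%."""
    sd = stdev(returns)
    return mean(returns) / sd * math.sqrt(bars_per_year) if sd > 0 else 0.0


def sortino(returns, bars_per_year=252):
    """Like Sharpe, but divides by downside deviation relative to MAR = 0%."""
    downside = math.sqrt(mean([min(r, 0.0) ** 2 for r in returns]))
    return mean(returns) / downside * math.sqrt(bars_per_year) if downside > 0 else 0.0


def max_drawdown(equity):
    """Largest peak-to-trough decline as a positive fraction."""
    peak, worst = equity[0], 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst


def calmar(equity, bars_per_year=252):
    """Annualized (compound) return divided by maximum drawdown."""
    years = (len(equity) - 1) / bars_per_year
    annual_return = (equity[-1] / equity[0]) ** (1.0 / years) - 1.0
    dd = max_drawdown(equity)
    return annual_return / dd if dd > 0 else 0.0
```

Passing `bars_per_year=365` for daily crypto data instead of 252 scales Sharpe and Sortino by √(365/252), roughly 1.2, which is why the annualization factor has to match the market's trading calendar.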

§ 06

Scorecard

What works

  • Four-page report: bar backtest, Monte Carlo (GBM + GAN + noise), event-driven comparison, grades
  • Auto-runs 280+ backtests (50 GBM, 150 GAN across 3 regimes, 150 noise injection)
  • Automatic letter grades (A–F) on 12 dimensions
  • Dark and light theme support
  • Works on daily and intraday data

Boundaries

  • Monte Carlo page adds ~30–60 s to scorecard generation (280+ scenarios)
  • Options scorecard is single-page — no Monte Carlo or engine comparison for options strategies
  • GAN scenarios use the SPY-trained model — regime characteristics are SPY-specific

§ Built for
  • Strategy validation and comparison (same pipeline, same metrics, fair evaluation).
  • Overfitting detection (GBM noise test, GAN crash test, in/out-of-sample).
  • Execution robustness testing (bar vs event engine divergence).
  • Daily and intraday equity / crypto backtesting.
  • Options strategy prototyping with synthetic pricing.
  • Educational use and rapid prototyping.
§ Outside current scope
  • Live / paper trading (no broker integration).
  • Multi-asset portfolio allocation (single-symbol per backtest).
  • Margin / leverage strategies (cash-only).
  • Futures and forex (no data providers for these).
  • Tick-level data (minimum resolution is 1-minute).