What it does well — and where it doesn't.
What this framework does well, where it has boundaries, and why certain trade-offs were made. Published in full on the site because the right way to use any backtester is with a clear-eyed view of what it can and can't tell you.
The authoritative version lives alongside the code on GitHub. If anything here disagrees with the repo, the repo wins.
Data providers
Twelve equity providers, crypto via CCXT (110+ exchanges), and options via philippdubach. Most users will rely on the two primary free-tier providers, Yahoo Finance and CCXT.
Yahoo Finance (primary, free, no key)
- Supports daily and intraday (1m, 5m, 15m, 1h)
- Auto-adjusted for splits and dividends
- Intraday limits: 1m data ~8 days, 5m/15m ~60 days, 1h ~730 days
- Daily data: 20+ years
- No API key required
- Wraps yfinance (unofficial library) — could break if Yahoo changes their site, but yfinance is actively maintained and widely used
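The intraday limits above can be applied before a request is sent, so that an over-long range is shortened rather than rejected. A minimal sketch, assuming a hypothetical helper `clamp_start` (not part of the framework) and treating the approximate limits listed above as hard caps:

```python
from datetime import datetime, timedelta

# Approximate Yahoo Finance lookback limits from the list above, in days.
# These mirror the documented approximations; they are not read from Yahoo.
MAX_LOOKBACK_DAYS = {"1m": 8, "5m": 60, "15m": 60, "1h": 730}


def clamp_start(interval: str, start: datetime, now: datetime) -> datetime:
    """Return the earliest start date likely to be served for `interval`.

    Intervals without an entry (for example "1d") are returned unchanged.
    """
    limit = MAX_LOOKBACK_DAYS.get(interval)
    if limit is None:
        return start
    earliest = now - timedelta(days=limit)
    return max(start, earliest)
```

Passing `now` explicitly keeps the helper deterministic and easy to test.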
Other equity providers
Each has different free-tier limits — documented in the provider docstrings:
- Tiingo · 20+ years, 1000 req/day — best depth
- Polygon · ~2 years free, implements pagination and rate-limit sleep
- Alpaca · 5–6 years, 200 req/min, paginated
- Alpha Vantage · ~5 months on free compact mode
- Twelve Data, Finnhub, FMP, MarketStack · Various limits
Design decision: no pre-validation of tickers
We let the API return an error rather than maintaining a ticker list that goes stale. The trade-off is slightly less helpful error messages in exchange for zero maintenance.
Crypto (CCXT)
What works well
- 110+ exchanges — far more than most comparable platforms
- Public OHLCV is free and needs no API key on exchanges that expose it
- All intervals: 1m, 5m, 15m, 30m, 1h, 4h, 1d, 1w
- Rate limiting with enableRateLimit=True plus sleep between pages
- Retry logic: 3 retries with exponential backoff
- Pagination with deduplication at chunk boundaries
- BacktestConfig.for_crypto() sets 365 bars/year for correct annualization
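The pagination, deduplication, and retry behaviour listed above can be sketched without any exchange dependency. This is a minimal sketch, assuming a caller-supplied `fetch(since)` callable that returns candles in the CCXT layout `[timestamp_ms, open, high, low, close, volume]`. The names `fetch_with_retry` and `fetch_all` and the backoff delays are illustrative, not the framework's actual code:

```python
import time


def fetch_with_retry(fetch, since, retries=3, base_delay=1.0):
    """Call fetch(since); on failure retry up to `retries` times with exponential backoff."""
    for attempt in range(retries + 1):
        try:
            return fetch(since)
        except Exception:
            if attempt == retries:
                raise  # out of retries: surface the last error
            time.sleep(base_delay * 2 ** attempt)  # 1x, 2x, 4x base_delay


def fetch_all(fetch, since, until, page_pause=0.0, base_delay=1.0):
    """Page forward from `since` to `until`, dropping candles repeated at page edges."""
    out = []
    last_ts = None
    cursor = since
    while cursor <= until:
        page = fetch_with_retry(fetch, cursor, base_delay=base_delay)
        # Keep only candles newer than the last stored one (deduplication at
        # chunk boundaries) that also fall inside the requested range.
        fresh = [c for c in page
                 if (last_ts is None or c[0] > last_ts) and c[0] <= until]
        if not fresh:
            break  # no progress, so there is nothing more to fetch
        out.extend(fresh)
        last_ts = fresh[-1][0]
        cursor = last_ts  # next page overlaps by one candle, removed above
        time.sleep(page_pause)  # pause between pages
    return out
```

With CCXT, `fetch` would typically wrap `exchange.fetch_ohlcv(symbol, timeframe, since=since, limit=...)` on an exchange object created with `enableRateLimit` set to `True`.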
Known quirks
- Kraken uses XBT/USD internally. Passing BTC/USD works for daily data, but some intervals may need the native symbol
- Binance is geo-blocked in some regions (US) — use Kraken, Coinbase, or another exchange
- Historical depth varies by exchange (~3–5 years for most pairs)
- Volume is reported in base currency (e.g., BTC not USD) — standard for crypto
Options
Options backtesting uses synthetic Black-Scholes pricing from the underlying's realized volatility. This is a deliberate design choice — it lets you test options strategies with just OHLCV data, no options data feed required.
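To make the idea concrete, here is a minimal sketch of pricing an option from OHLCV closes alone. It assumes annualized realized volatility from log returns over whatever window is passed in, 252 bars per year, and no dividends. The function names are illustrative and the framework's own estimator and window may differ:

```python
import math
from statistics import NormalDist, stdev


def realized_vol(closes, bars_per_year=252):
    """Annualized sample standard deviation of log returns."""
    rets = [math.log(b / a) for a, b in zip(closes, closes[1:])]
    return stdev(rets) * math.sqrt(bars_per_year)


def bs_price(spot, strike, t_years, rate, vol, kind="call"):
    """European Black-Scholes price, no dividends."""
    n = NormalDist()
    sqrt_t = math.sqrt(t_years)
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol ** 2) * t_years) / (vol * sqrt_t)
    d2 = d1 - vol * sqrt_t
    disc = math.exp(-rate * t_years)  # discount factor for the strike
    if kind == "call":
        return spot * n.cdf(d1) - strike * disc * n.cdf(d2)
    return strike * disc * n.cdf(-d2) - spot * n.cdf(-d1)
```

A synthetic quote is then `bs_price(spot, strike, t_years, rate, realized_vol(recent_closes))`, where `recent_closes` is a trailing window of the underlying's closes.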
What works well
- Greeks (delta, gamma, theta, vega, rho) verified against known values to 1e-10 precision
- Multi-leg positions: verticals, iron condors, straddles, strangles via PositionFactory
- Margin calculation: Reg-T and simplified portfolio margin
- IV surface interpolation with cubic / linear / nearest fallback
- Exercise and assignment handling at expiration
- Bid-ask spread modeling with configurable width and market impact
- Real historical chains available via philippdubach (104 tickers, 2008–2025, free)
Design decisions and boundaries
- European-style pricing · we use Black-Scholes, not binomial trees. For the strategies we're testing (covered calls, spreads), the difference is small. American early exercise matters most for deep-ITM puts and for ITM calls just before an ex-dividend date, which is an edge case for our use case.
- Realized vol, not implied vol · we price from the underlying's own volatility, not from the options market. This means our prices don't capture supply/demand effects (skew, smile). For strategy logic testing this is fine; for P&L estimation it's approximate.
- No dividend tracking · the covered call strategy uses holds_underlying=True to correctly track stock exposure, but we don't model dividend events or ex-div assignment risk.
Synthetic data (Monte Carlo + GAN)
Six synthetic-data generators — more than any comparable free framework. Each serves a different testing purpose.
GBM (Geometric Brownian Motion)
- Purpose · baseline sanity check. No strategy should profit on pure random walks.
- Normal increments (no fat tails, no vol clustering) — this is by design. GBM is the null hypothesis.
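- If your strategy shows a consistently positive Sharpe across many GBM paths, beyond what any drift in the paths explains, treat it as a sign of overfitting or a lookahead bug.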
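A GBM path of the kind described above takes only a few lines to generate. This is a minimal sketch using the standard library; the function name, the 252 bars per year, and the parameters are assumptions for illustration, not the framework's generator:

```python
import math
import random


def gbm_path(s0, mu, sigma, n_bars, bars_per_year=252, seed=None):
    """Simulate one GBM close-price path with normal log-return increments."""
    rng = random.Random(seed)
    dt = 1.0 / bars_per_year
    drift = (mu - 0.5 * sigma ** 2) * dt   # per-bar log drift
    shock = sigma * math.sqrt(dt)          # per-bar log volatility
    prices = [s0]
    for _ in range(n_bars):
        prices.append(prices[-1] * math.exp(drift + shock * rng.gauss(0.0, 1.0)))
    return prices
```

Setting `mu=0.0` gives a driftless null, which is the cleanest baseline for the sanity check.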
Block bootstrap
- Preserves the real distribution (fat tails, vol clustering) from historical data
- Autocorrelation preserved within blocks (default size 5), breaks at boundaries
- Trade-off: larger blocks preserve more structure but reduce scenario diversity
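The block-size trade-off is easier to see in code. This is a minimal sketch of a moving (non-circular) block bootstrap over a list of historical returns. The function name is illustrative and the framework may use a different variant:

```python
import random


def block_bootstrap(returns, n_out, block_size=5, seed=None):
    """Resample `returns` by concatenating randomly chosen contiguous blocks."""
    if len(returns) < block_size:
        raise ValueError("need at least block_size returns")
    rng = random.Random(seed)
    out = []
    while len(out) < n_out:
        # Pick a block start so the whole block fits inside the history.
        start = rng.randrange(len(returns) - block_size + 1)
        out.extend(returns[start:start + block_size])
    return out[:n_out]  # trim the final block to the requested length
```

Each run of `block_size` consecutive values keeps its original order, so short-range autocorrelation survives inside a block and is broken at every join.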
Regime switching (Markov)
- Three regimes (bull / sideways / bear) with a configurable transition matrix
- Regime persistence ~2–3 weeks (realistic timescale)
- Returns are normal within each regime — the regime switching itself creates fat tails in the aggregate distribution
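The link between the transition matrix and regime persistence is that the expected length of a stay is 1 / (1 - p_stay). A minimal sketch follows; the probabilities and per-regime return parameters are illustrative numbers chosen to give a stay of about 12.5 bars, not the framework's defaults:

```python
import random

REGIMES = ["bull", "sideways", "bear"]

# Illustrative transition probabilities (each row sums to 1).
TRANSITION = {
    "bull":     {"bull": 0.92, "sideways": 0.06, "bear": 0.02},
    "sideways": {"bull": 0.04, "sideways": 0.92, "bear": 0.04},
    "bear":     {"bull": 0.02, "sideways": 0.06, "bear": 0.92},
}

# Illustrative (mean, std) of daily returns within each regime.
PARAMS = {"bull": (0.0008, 0.008), "sideways": (0.0, 0.006), "bear": (-0.0010, 0.020)}


def expected_duration(regime):
    """Expected number of consecutive bars spent in `regime`."""
    return 1.0 / (1.0 - TRANSITION[regime][regime])


def simulate(n_bars, start="bull", seed=None):
    """Return a list of (regime, return) pairs, one per bar."""
    rng = random.Random(seed)
    regime, path = start, []
    for _ in range(n_bars):
        mean, std = PARAMS[regime]
        path.append((regime, rng.gauss(mean, std)))  # normal within the regime
        row = TRANSITION[regime]
        regime = rng.choices(list(row), weights=list(row.values()))[0]
    return path
```

Mixing normal draws with different variances in this way is what produces fat tails in the pooled return distribution.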
Noise injection
- Safest source — inherits all properties from the base data
- Tests graceful degradation: robust strategies degrade smoothly, brittle ones break
- OHLCV consistency maintained via clamping
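Clamping here means restoring the bar invariants after each price has been perturbed independently. A minimal sketch, assuming bars are dicts with `open`, `high`, `low`, `close`, `volume` keys and multiplicative Gaussian noise; the function name and the noise model are assumptions for illustration:

```python
import random


def add_noise(bar, noise_pct=0.001, rng=random):
    """Perturb one OHLCV bar, then restore High >= max(O, C) and Low <= min(O, C)."""
    noisy = {k: bar[k] * (1.0 + rng.gauss(0.0, noise_pct))
             for k in ("open", "high", "low", "close")}
    # Clamp: the high and low must still bracket the open and close.
    noisy["high"] = max(noisy["high"], noisy["open"], noisy["close"])
    noisy["low"] = min(noisy["low"], noisy["open"], noisy["close"])
    noisy["volume"] = bar["volume"]  # volume is left untouched in this sketch
    return noisy
```

Sweeping `noise_pct` upward and re-running the backtest at each level shows whether performance degrades smoothly or collapses.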
cGAN (trained conditional GAN)
- Generates regime-conditioned paths (bullish, bearish, sideways, crash)
- Trained on real SPY data from four distinct market periods
- Trained on 30-step windows — longer paths stitch chunks, so autocorrelation resets every 30 bars. Adequate for regime stress testing; less ideal for multi-week momentum strategies.
- Crash regime has limited training data (73 bars from COVID March 2020) — generated crashes resemble COVID-style drawdowns specifically
- Could be improved by retraining on longer windows and more crash data
Heston Monte Carlo
- Stochastic volatility with mean reversion and leverage effect
- More realistic than GBM (captures vol clustering and the leverage effect)
- Parameters are defaults, not calibrated to a specific asset — suitable for general stress testing
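For readers unfamiliar with the model, the following is a minimal sketch of a Heston simulation using an Euler scheme with full truncation. The parameter values are common textbook-style illustrations, not the framework's defaults, and the discretization the framework uses may differ:

```python
import math
import random


def heston_path(s0=100.0, v0=0.04, mu=0.0, kappa=2.0, theta=0.04, xi=0.3,
                rho=-0.7, n_bars=252, bars_per_year=252, seed=None):
    """Return (prices, variances) from an Euler scheme with full truncation."""
    rng = random.Random(seed)
    dt = 1.0 / bars_per_year
    s, v = s0, v0
    prices, variances = [s], [v]
    for _ in range(n_bars):
        z1 = rng.gauss(0.0, 1.0)
        # z2 is correlated with z1; a negative rho gives the leverage effect.
        z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        v_pos = max(v, 0.0)  # full truncation: use max(v, 0) wherever v enters
        s *= math.exp((mu - 0.5 * v_pos) * dt + math.sqrt(v_pos * dt) * z1)
        # Variance mean-reverts toward theta at speed kappa.
        v += kappa * (theta - v_pos) * dt + xi * math.sqrt(v_pos * dt) * z2
        prices.append(s)
        variances.append(v)
    return prices, variances
```

Here `kappa` is the mean-reversion speed, `theta` the long-run variance, `xi` the volatility of variance, and `rho` the price/variance correlation.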
Backtester engines
Bar engine (fast, simple)
- Market orders only — fills at next bar's open
- Suitable for signal-based strategies, daily swing trading, indicator testing
- SL/TP checks use the current bar's High/Low — standard for bar-based backtesting (same approach as Backtrader, bt, etc.). The event engine provides a more conservative alternative.
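A High/Low check of this kind might look like the sketch below, for a long position. Two details are assumptions of this sketch rather than documented framework behaviour: when both levels fall inside one bar the stop is assumed to be hit first, and a bar that gaps through a level fills at the open.

```python
def check_exit(bar, stop_loss, take_profit):
    """Return (reason, fill_price) for a long position, or None if no level is touched.

    OHLC data cannot reveal which level was reached first inside a bar,
    so this sketch resolves the ambiguity pessimistically (stop first).
    """
    if bar["low"] <= stop_loss:
        # A gap below the stop fills at the open, not at the stop level.
        return "stop_loss", min(stop_loss, bar["open"])
    if bar["high"] >= take_profit:
        # A gap above the target fills at the better open.
        return "take_profit", max(take_profit, bar["open"])
    return None
```

The intrabar ordering assumption is exactly the kind of detail that can make bar-based results optimistic or pessimistic.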
Event engine (realistic)
- Supports market, limit, stop, and OCO bracket orders
- Orders submitted on bar t become active on bar t+1 (no lookahead)
- Stop / limit orders can rest across multiple bars
- Stop-limit orders not supported (limit and stop orders work independently)
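The t+1 activation rule and resting orders can be illustrated with a small fill function. This is a minimal sketch with a hypothetical `Order` type; the fill-price conventions (market at the open, gaps filling at the open) are assumptions, not the event engine's documented behaviour:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Order:
    side: str                      # "buy" or "sell"
    kind: str                      # "market", "limit", or "stop"
    price: Optional[float] = None  # limit or stop level; None for market orders
    submitted_at: int = 0          # index of the bar on which it was submitted


def try_fill(order: Order, bar_index: int, bar: dict) -> Optional[float]:
    """Return a fill price if `order` executes on this bar, else None."""
    if bar_index <= order.submitted_at:
        return None  # inactive until the bar after submission (no lookahead)
    o, h, l = bar["open"], bar["high"], bar["low"]
    if order.kind == "market":
        return o
    if order.kind == "limit":
        if order.side == "buy" and l <= order.price:
            return min(order.price, o)  # a gap down fills at the better open
        if order.side == "sell" and h >= order.price:
            return max(order.price, o)
    if order.kind == "stop":
        if order.side == "buy" and h >= order.price:
            return max(order.price, o)  # a gap up fills at the worse open
        if order.side == "sell" and l <= order.price:
            return min(order.price, o)
    return None  # the order keeps resting
```

Calling `try_fill` once per bar until it returns a price reproduces an order that rests across multiple bars.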
Running both engines on the same strategy
This is one of our unique features. If the bar and event engines agree (typically under 1 percentage point of divergence for market-order strategies), the strategy's edge doesn't depend on execution assumptions. If they diverge, the strategy is sensitive to fill timing, which is important to know before live trading.
Boundaries (both engines)
- Single-asset · one symbol per backtest. Portfolio-level backtesting is a future goal.
- Integer shares · no fractional share support. Minimal impact for stocks > $10.
- Cash-only · no margin or leverage. Strategies are tested with the capital available.
- No borrow costs · short selling works but doesn't model borrow fees. For short-heavy strategies, real costs could be 0.5–50%+ annually depending on the stock.
- No corporate actions · Yahoo Finance data is pre-adjusted for splits/dividends, so historical price charts are correct. But we don't track dividend cash flows or model assignment risk around ex-div dates.
- Fixed BPS slippage · not volume-aware. Adequate for liquid large-cap equities; may underestimate costs for small-caps or illiquid instruments.
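Two of the boundaries above reduce to simple arithmetic. A minimal sketch, assuming slippage is applied against the trader as a fixed fraction of price and that position size is floored to whole shares; the function names and the 5 bps default are illustrative:

```python
import math


def fill_price(price, side, slippage_bps=5.0):
    """Shift the fill against the trader by a fixed number of basis points."""
    adj = price * slippage_bps / 10_000.0
    return price + adj if side == "buy" else price - adj


def max_shares(cash, price):
    """Whole shares affordable with cash only (no margin, no fractional shares)."""
    return math.floor(cash / price)
```

With $10,000 and a $333 stock, `max_shares` returns 30 and leaves $10 idle. The idle fraction grows as the share price approaches the account size, which is why the impact is small for typical stock prices.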
Metrics
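- Sharpe ratio · mean(returns) / std(returns) × √(bars_per_year) with risk-free = 0%. Industry-standard formula used by QuantConnect, Zipline, Backtrader, and most platforms. Because every strategy is scored with the same formula, comparisons between strategies are consistent; absolute values read higher than a Sharpe computed against a positive risk-free rate.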
- Sortino ratio · same formula using downside deviation (MAR = 0%). Standard implementation.
- Calmar ratio · annualized return / max drawdown. Standard definition.
- All other metrics (profit factor, expectancy, win rate, SQN, Kelly, etc.) are standard calculations.
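The three ratio formulas above translate directly into code. This is a minimal sketch using the standard library. The sample standard deviation, the downside deviation taken over all bars, and the compound annualized return are assumptions of this sketch, since the text does not pin those details down:

```python
import math
from statistics import mean, stdev


def sharpe(returns, bars_per_year=252):
    """mean / std * sqrt(bars_per_year), risk-free rate = 0."""
    return mean(returns) / stdev(returns) * math.sqrt(bars_per_year)


def sortino(returns, bars_per_year=252, mar=0.0):
    """Like Sharpe, but divides by the downside deviation below `mar`."""
    downside = math.sqrt(mean([min(r - mar, 0.0) ** 2 for r in returns]))
    return (mean(returns) - mar) / downside * math.sqrt(bars_per_year)


def max_drawdown(equity):
    """Largest peak-to-trough decline, as a fraction of the peak."""
    peak, worst = equity[0], 0.0
    for x in equity:
        peak = max(peak, x)
        worst = max(worst, (peak - x) / peak)
    return worst


def calmar(equity, bars_per_year=252):
    """Compound annualized return divided by maximum drawdown."""
    years = (len(equity) - 1) / bars_per_year
    annual_return = (equity[-1] / equity[0]) ** (1.0 / years) - 1.0
    return annual_return / max_drawdown(equity)
```

For daily crypto data, pass `bars_per_year=365` to match what `BacktestConfig.for_crypto()` sets. The functions raise `ZeroDivisionError` on degenerate inputs such as constant returns or an equity curve with no drawdown, so guard those cases in real use.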
Scorecard
What works
- Four-page report: bar backtest, Monte Carlo (GBM + GAN + noise), event-driven comparison, grades
- Auto-runs 280+ backtests (50 GBM, 150 GAN across 3 regimes, 150 noise injection)
- Automatic letter grades (A–F) on 12 dimensions
- Dark and light theme support
- Works on daily and intraday data
Boundaries
- Monte Carlo page adds ~30–60 s to scorecard generation (280+ scenarios)
- Options scorecard is single-page — no Monte Carlo or engine comparison for options strategies
- GAN scenarios use the SPY-trained model — regime characteristics are SPY-specific
Best suited for
- Strategy validation and comparison (same pipeline, same metrics, fair evaluation).
- Overfitting detection (GBM noise test, GAN crash test, in/out-of-sample).
- Execution robustness testing (bar vs event engine divergence).
- Daily and intraday equity / crypto backtesting.
- Options strategy prototyping with synthetic pricing.
- Educational use and rapid prototyping.
Not designed for
- Live / paper trading (no broker integration).
- Multi-asset portfolio allocation (single-symbol per backtest).
- Margin / leverage strategies (cash-only).
- Futures and forex (no data providers for these).
- Tick-level data (minimum resolution is 1-minute).