Out-of-Sample Performance

Last refreshed: 2026-04-29 · OOS panel: 19 months

Every number on this page comes directly from the most recent bi-weekly pipeline run. No synthesized backtest, no carried-forward figures. The panel below is short — this is a live research process with a small OOS window, not a polished marketing backtest.

HYPOTHETICAL PERFORMANCE RESULTS have many inherent limitations. No representation is being made that any account will or is likely to achieve profits or losses similar to those shown. Past performance, whether actual or hypothetical, is not indicative of future results.

Significance Verdict
FAIL: no test clears the 5% significance level.
Nineteen OOS months is a short sample. The significance battery (Deflated Sharpe, stationary bootstrap, purged hold-out) is how we disclose statistical weakness, not how we hide it.
Scope of the backtest

Every CAGR, Sharpe, MaxDD, and Calmar figure below comes from a hypothetical long-short equities portfolio: long the top quintile of predictions, short the bottom quintile, equally weighted, rebalanced monthly. No leverage, no stock borrow cost, no slippage.
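One month of that portfolio can be sketched as follows. This is an illustrative sketch, not the pipeline's code: the function name is hypothetical, the quintile size is assumed to be `len(universe) // 5` (6 names per leg for a 30-ticker universe), and ties in the predictions are broken by Python's stable sort order.

```python
from statistics import mean


def quintile_long_short(predictions: dict[str, float], realised: dict[str, float]) -> float:
    """One month's equal-weighted Q5 - Q1 return. No leverage, costs, or borrow fees."""
    # Rank tickers by predicted return, lowest first (stable sort breaks ties).
    ranked = sorted(predictions, key=predictions.get)
    n_leg = len(ranked) // 5  # assumed quintile size; remainders are left out of both legs
    if n_leg == 0:
        raise ValueError("need at least 5 tickers to form quintiles")
    short_leg = ranked[:n_leg]   # bottom quintile (Q1)
    long_leg = ranked[-n_leg:]   # top quintile (Q5)
    # Equal weight within each leg: L/S return = mean(long leg) - mean(short leg).
    return mean(realised[t] for t in long_leg) - mean(realised[t] for t in short_leg)
```

Chaining these monthly values gives the L/S return series that the CAGR, Sharpe, MaxDD, and Calmar figures summarise.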

The options overlay shown per ticker on the Signal Dashboard (optROC, atmROC, otm20ROC, putCrROC) is a forward Monte Carlo from today's IV surface — not a historical backtest.

We attempted a real historical backtest against IBKR's historical options API on 2026-04-24. It was structurally blocked: IBKR's contract-resolution endpoint does not reliably return already-expired option contracts even with includeExpired=True, so we could not retrieve premiums for the historical entry dates that the walk-forward OOS panel needs. The options overlay is therefore a forward-only track: the first realised P&L row lands on live_vs_backtest.csv around 2026-05-23, and a full year of forward data is complete by 2027-04. No historical options CAGR / Sharpe / MaxDD is published here until forward data is deep enough to be statistically meaningful.

Walk-Forward OOS Panel — Equities L/S

Months OOS: 19 (570 stock-month obs.)
Mean IC: 0.027 (t = 0.41)
L/S monthly: 1.86% (Q5 − Q1 spread, eq-wt)
L/S Sharpe: 0.74 (annualized)
Hit rate: 53.7% (sign-of-return accuracy)
RMSE: 0.1123 (prediction error)
MAE · ridge: 0.0780 (per-model OOS)
MAE · enet: 0.0778 (per-model OOS)

Monthly OOS Information Coefficient

19 months · mean = 0.027
M1: 0.003 · M2: 0.158 · M3: 0.147 · M4: 0.126 · M5: -0.242 · M6: -0.329 · M7: 0.177
M8: 0.238 · M9: -0.191 · M10: -0.194 · M11: -0.397 · M12: 0.496 · M13: 0.298
M14: -0.269 · M15: -0.291 · M16: 0.003 · M17: -0.140 · M18: 0.295 · M19: 0.626

Information Coefficient = rank correlation between predicted 1-month forward return and realised cross-sectional return. Positive bars mean the ranking was directionally useful that month; negative bars mean it was not. With 19 data points, individual months carry little statistical weight.
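A monthly rank IC, and the t-statistic on the series of monthly ICs, can be sketched with the standard library. This is a minimal sketch under stated assumptions: the IC is taken as Spearman correlation (Pearson correlation of average ranks, ties sharing a rank), the t-statistic is the plain mean / (sd / √n) with no autocorrelation adjustment, and the function names are hypothetical.

```python
import math
from statistics import mean, stdev


def _ranks(xs):
    """1-based average ranks; tied values share the mean of their positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean 1-based position of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_ic(predicted, realised):
    """Rank IC for one month: Pearson correlation of the two rank vectors."""
    rp, rr = _ranks(predicted), _ranks(realised)
    mp, mr = mean(rp), mean(rr)
    cov = sum((a - mp) * (b - mr) for a, b in zip(rp, rr))
    var_p = sum((a - mp) ** 2 for a in rp)
    var_r = sum((b - mr) ** 2 for b in rr)
    return cov / math.sqrt(var_p * var_r)


def ic_t_stat(monthly_ics):
    """Naive t-statistic of the mean monthly IC: mean / (sample sd / sqrt(n))."""
    return mean(monthly_ics) / (stdev(monthly_ics) / math.sqrt(len(monthly_ics)))
```

Fed the 19 monthly ICs listed in the chart (rounded to three decimals), `ic_t_stat` comes out at roughly 0.41, in line with the t = 0.41 shown with the mean IC.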

Hypothetical L/S Equities Portfolio

Long top-quintile / short bottom-quintile of the model's cross-sectional ranking, equally weighted within each leg, rebalanced monthly. No leverage, no execution costs, no stock borrow cost. Stocks only — the options overlay is not in these numbers. Point estimates only; with 19 OOS months the confidence intervals on every number below are wide.

CAGR: 19.93% (annualised over 19 months)
Max drawdown: -17.58% (peak-to-trough on the cumulative path)
Sharpe: 0.74 (annualised, eq-wt L/S)
Calmar: 1.13 (CAGR / |MaxDD|)
Total return: 33.34% (cumulative over 19 months)
Mean monthly: 1.86% (σ = 8.68%)
Months: 19 (n = 570 obs)
Verdict: FAIL (significance battery)
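The four headline metrics follow from the monthly L/S return series. The sketch below is one plausible set of definitions, not the pipeline's code: it assumes a zero risk-free rate for the Sharpe ratio, uses the sample standard deviation, starts the drawdown path at 1.00, and the function names are hypothetical.

```python
import math
from statistics import mean, stdev


def cagr(total_return: float, months: int) -> float:
    """Compound annual growth rate from a cumulative return earned over `months` months."""
    return (1.0 + total_return) ** (12.0 / months) - 1.0


def annualised_sharpe(monthly_returns) -> float:
    """Mean / sample sd of monthly returns, scaled by sqrt(12); zero risk-free rate assumed."""
    return mean(monthly_returns) / stdev(monthly_returns) * math.sqrt(12)


def max_drawdown(monthly_returns) -> float:
    """Worst peak-to-trough decline of the compounded path starting at 1.00 (<= 0)."""
    level, peak, worst = 1.0, 1.0, 0.0
    for r in monthly_returns:
        level *= 1.0 + r
        peak = max(peak, level)
        worst = min(worst, level / peak - 1.0)
    return worst


def calmar(cagr_value: float, max_dd: float) -> float:
    """CAGR divided by the absolute max drawdown."""
    return cagr_value / abs(max_dd)
```

As a consistency check against the figures above, `cagr(0.3334, 19)` is about 0.1993 and `calmar(0.1993, -0.1758)` is about 1.13, and 1.86% / 8.68% × √12 ≈ 0.74.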

Cumulative Path

Cumulative growth of 1.00 from t₀ to 2026-03-31 (chart axis spans 0.86 to 1.33).

This path is the live walk-forward, not a smoothed backtest. Treat the headline numbers as point estimates with wide confidence intervals: the bootstrap 95% CI on Sharpe spans [−0.37, +1.62]. In the significance battery below, the Deflated Sharpe and stationary bootstrap tests both still fail at the 5% level over the 19-month panel.

Regime-Conditional Performance

Point-in-time VIX
Regime | Months | % of panel | Mean monthly L/S | Annualised Sharpe | Hit rate
Moderate | 19 | 100% | +1.86% | +0.74 | 53%

Each month is classified by the VIX regime at the end of its prediction month, not by today's snapshot. The headline Sharpe blends the regimes in proportion to how often each occurred, so forward-looking Sharpe depends on which regime you are actually in: if the next 12 months are mostly Moderate, the realised Sharpe should track the Moderate bucket more closely than the panel-wide headline. In this panel all 19 months fall in the Moderate bucket, so the two figures currently coincide.
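The regime breakdown amounts to bucketing each month's L/S return by its month-end VIX and summarising each bucket. The page does not publish its VIX cut-offs, so the thresholds below (15 and 25) are hypothetical placeholders, as are the function names; the per-bucket Sharpe uses the same √12 scaling as the headline figure, and hit rate is omitted because its exact definition here is not stated.

```python
import math
from collections import defaultdict
from statistics import mean, stdev

# HYPOTHETICAL regime cut-offs for illustration only; not the pipeline's thresholds.
VIX_CUTOFFS = [(15.0, "Low"), (25.0, "Moderate"), (float("inf"), "High")]


def vix_regime(vix_close: float) -> str:
    """Map a month-end VIX close to a regime label using the hypothetical cut-offs."""
    for upper, label in VIX_CUTOFFS:
        if vix_close < upper:
            return label
    return VIX_CUTOFFS[-1][1]


def regime_table(monthly_ls, month_end_vix):
    """Group monthly L/S returns by month-end VIX regime and summarise each bucket."""
    buckets = defaultdict(list)
    for r, v in zip(monthly_ls, month_end_vix):
        buckets[vix_regime(v)].append(r)
    n = len(monthly_ls)
    rows = {}
    for label, rets in buckets.items():
        sd = stdev(rets) if len(rets) > 1 else 0.0
        # Sharpe is undefined for a single month or a zero-variance bucket.
        sharpe = mean(rets) / sd * math.sqrt(12) if sd > 0 else float("nan")
        rows[label] = {"months": len(rets), "share": len(rets) / n,
                       "mean": mean(rets), "sharpe": sharpe}
    return rows
```

Because the buckets partition the months, their shares always sum to 100%; with a single populated bucket, as in this panel, the bucket row and the headline row are identical.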

Significance Battery

Deflated Sharpe

Result: Fail
p-value: 1.0000
SR (ann.): 0.32
SR threshold: 1.90
Trials assumed: 20

Bailey & López de Prado (2014). Corrects the Sharpe ratio for multiple-testing bias across the search space.
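The textbook form of the Deflated Sharpe Ratio can be sketched with `statistics.NormalDist` from the standard library. This is a minimal sketch, not the pipeline's implementation: it assumes a per-period (not annualised) Sharpe ratio, a caller-supplied variance of Sharpe ratios across the trials searched, and raw kurtosis (3 for normal returns); the function names are hypothetical. It returns the probability that the true Sharpe exceeds the expected maximum Sharpe of `n_trials` zero-skill strategies; one common convention reports 1 minus this value as the p-value.

```python
import math
from statistics import NormalDist

EULER_GAMMA = 0.5772156649015329
_STD_NORMAL = NormalDist()


def expected_max_sharpe(n_trials: int, sr_variance: float) -> float:
    """Expected maximum Sharpe across n_trials independent trials whose true SR is 0."""
    if n_trials < 2:
        raise ValueError("n_trials must be >= 2")
    return math.sqrt(sr_variance) * (
        (1 - EULER_GAMMA) * _STD_NORMAL.inv_cdf(1 - 1 / n_trials)
        + EULER_GAMMA * _STD_NORMAL.inv_cdf(1 - 1 / (n_trials * math.e))
    )


def deflated_sharpe(sr: float, n_obs: int, n_trials: int, sr_variance: float,
                    skew: float = 0.0, kurtosis: float = 3.0) -> float:
    """Probabilistic Sharpe ratio measured against the multiple-testing benchmark."""
    sr0 = expected_max_sharpe(n_trials, sr_variance)
    # Non-normality adjustment to the standard error of the Sharpe estimate.
    denom = math.sqrt(1 - skew * sr + (kurtosis - 1) / 4 * sr ** 2)
    return _STD_NORMAL.cdf((sr - sr0) * math.sqrt(n_obs - 1) / denom)
```

Raising `n_trials` raises the benchmark `sr0` and lowers the deflated probability, which is why a modest Sharpe measured over 19 months cannot clear a threshold set by a 20-trial search.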

Stationary Bootstrap

Result: Fail
p-value: 0.6317
95% CI: [-0.087, 0.148]
Resamples: 2000

Politis & Romano (1994). Two-sided test for mean monthly IC ≠ 0, robust to autocorrelation.
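A stationary bootstrap of the mean monthly IC can be sketched in pure Python. This is an illustrative sketch under stated assumptions: the mean block length of 3 months is a placeholder (the page does not state the value used), the two-sided p-value comes from re-centring the bootstrap distribution on zero, the CI is a simple percentile interval, and the function names are hypothetical.

```python
import random
from statistics import mean


def stationary_bootstrap_means(x, n_resamples=2000, mean_block=3.0, seed=0):
    """Politis-Romano resampling: blocks of geometric length that wrap around the series."""
    rng = random.Random(seed)
    n, p_new_block = len(x), 1.0 / mean_block
    means = []
    for _ in range(n_resamples):
        idx = rng.randrange(n)
        sample = []
        for _ in range(n):
            sample.append(x[idx])
            # With probability p_new_block jump to a fresh random start, else continue the block.
            idx = rng.randrange(n) if rng.random() < p_new_block else (idx + 1) % n
        means.append(mean(sample))
    return means


def bootstrap_test(x, **kwargs):
    """Observed mean, percentile 95% CI, and two-sided p-value for mean != 0."""
    m = mean(x)
    boots = sorted(stationary_bootstrap_means(x, **kwargs))
    lo = boots[int(0.025 * len(boots))]
    hi = boots[int(0.975 * len(boots)) - 1]
    # Shift the bootstrap distribution to the null (mean 0) and count draws at least as extreme.
    p_value = sum(abs(b - m) >= abs(m) for b in boots) / len(boots)
    return m, (lo, hi), p_value
```

Run on the 19 monthly ICs, this kind of test yields a CI that straddles zero and a p-value far above 0.05, the same qualitative result as the panel; the exact figures depend on the block length and seed.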

Purged Hold-Out

Result: Inconsistent
Hold-out: 3 months
Full IC: 0.027
Trimmed IC: -0.017
Magnitude ratio: -0.616

CPCV-lite: drop the last k months, re-run the walk-forward, and require the surviving IC to match the full-panel sign and ≥ 50% of its magnitude.
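The pass/fail rule can be written as a single ratio check: the trimmed IC must have the same sign as the full-panel IC and at least half its magnitude, which together mean `trimmed / full >= 0.5`. The sketch simplifies the procedure by trimming the existing monthly IC series instead of re-running the walk-forward, and the function name is hypothetical; the IC values are the rounded monthly figures from this page.

```python
from statistics import mean

# Monthly OOS ICs M1..M19 as listed on this page (rounded to 3 decimals).
MONTHLY_IC = [0.003, 0.158, 0.147, 0.126, -0.242, -0.329, 0.177, 0.238, -0.191,
              -0.194, -0.397, 0.496, 0.298, -0.269, -0.291, 0.003, -0.140, 0.295, 0.626]


def holdout_check(ics, k=3, min_ratio=0.5):
    """Drop the last k months and compare the trimmed mean IC with the full-panel mean IC."""
    full = mean(ics)
    trimmed = mean(ics[:-k])
    ratio = trimmed / full
    # A negative ratio means the sign flipped; a ratio below 0.5 means too much magnitude was lost.
    consistent = ratio >= min_ratio
    return full, trimmed, ratio, consistent
```

`holdout_check(MONTHLY_IC)` gives a full IC of about 0.027, a trimmed IC of about -0.017, and a ratio of about -0.62, so the check fails: the last three months (-0.140, 0.295, 0.626) carry the whole positive mean.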

Factor Information Coefficients

4 kept · 7 dropped
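Factor | Mean IC | t-stat | p-value | n months | Significant | Kept
low_vol_60d | -0.112 | -1.70 | 0.101 | 25 | no | yes
beta_residual | +0.102 | 1.84 | 0.079 | 25 | yes | yes
sector_neutral_momentum | +0.092 | 1.66 | 0.110 | 25 | no | yes
momentum_12_1 | +0.088 | 1.38 | 0.179 | 25 | no | yes
price_acceleration | -0.033 | -0.58 | 0.569 | 25 | no | no
volume_trend | +0.031 | 0.77 | 0.447 | 25 | no | no
ret_5d | -0.030 | -0.55 | 0.588 | 25 | no | no
momentum_3_1 | +0.028 | 0.47 | 0.641 | 25 | no | no
volume_shock | -0.022 | -0.53 | 0.603 | 25 | no | no
short_term_reversal | -0.022 | -0.37 | 0.711 | 25 | no | no
volatility_ratio | +0.021 | 0.74 | 0.468 | 25 | no | no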

Each factor is tested cross-sectionally per month over a 25-month panel. “Kept” factors survive the IC screen (p < 0.10 or |mean IC| ≥ 0.05) and enter the ensemble. “Significant” means p < 0.05 from a t-test on the monthly IC series. At this sample size, expect most factors to fall short of individual significance even when the ensemble has predictive value.
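The keep rule is simple enough to check directly against the table. The sketch below applies it to the published (mean IC, p-value) pairs; the function and variable names are hypothetical, and in practice the p-values would come from a one-sample t-test on each factor's monthly IC series (for example `scipy.stats.ttest_1samp(ics, 0.0)`).

```python
def keep_factor(mean_ic: float, p_value: float) -> bool:
    """IC screen: keep if p < 0.10 OR |mean IC| >= 0.05."""
    return p_value < 0.10 or abs(mean_ic) >= 0.05


# (mean IC, p-value) pairs copied from the factor table.
FACTORS = {
    "low_vol_60d": (-0.112, 0.101),
    "beta_residual": (0.102, 0.079),
    "sector_neutral_momentum": (0.092, 0.110),
    "momentum_12_1": (0.088, 0.179),
    "price_acceleration": (-0.033, 0.569),
    "volume_trend": (0.031, 0.447),
    "ret_5d": (-0.030, 0.588),
    "momentum_3_1": (0.028, 0.641),
    "volume_shock": (-0.022, 0.603),
    "short_term_reversal": (-0.022, 0.711),
    "volatility_ratio": (0.021, 0.468),
}

kept = [name for name, (ic, p) in FACTORS.items() if keep_factor(ic, p)]
dropped = [name for name in FACTORS if name not in kept]
```

This reproduces the 4 kept / 7 dropped split: only beta_residual passes on p-value, while low_vol_60d, sector_neutral_momentum, and momentum_12_1 pass on IC magnitude alone.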

Note on low_vol_60d: the factor is defined as the negative of 60-day realised vol, so under the classical low-vol anomaly high factor values would correspond to low-vol stocks and positive forward returns. The panel IC over 2022–2026 came out negative: in this mega-cap sample, high-vol names outperformed, driven by the post-2022 AI tech rally. The ensemble learns the realised sign from the panel. The IC screen kept the factor on the magnitude criterion (|IC| ≥ 0.05), not the significance criterion (its p = 0.101 just misses p < 0.10). Treat this as regime-specific: if the historical low-vol anomaly reasserts itself, the ensemble would need to retrain.

Current Ensemble

Model Weights

ridge: 0.334
enet: 0.335
gb: 0.331

Weighting: inverse out-of-sample MAE (inverse_oos_mae)
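Inverse-MAE weighting gives each model a weight proportional to 1 / MAE, normalised to sum to 1, so models with nearly equal errors end up with nearly equal weights. A minimal sketch, with a hypothetical function name; the ridge and enet MAEs are the per-model OOS figures shown above, while the gb MAE of 0.0787 is a hypothetical placeholder because this page does not publish it.

```python
def inverse_mae_weights(oos_mae: dict[str, float]) -> dict[str, float]:
    """Weight each model by 1 / MAE, normalised so the weights sum to 1."""
    inverse = {name: 1.0 / mae for name, mae in oos_mae.items()}
    total = sum(inverse.values())
    return {name: value / total for name, value in inverse.items()}


# ridge and enet MAEs from this page; gb MAE is a HYPOTHETICAL placeholder.
weights = inverse_mae_weights({"ridge": 0.0780, "enet": 0.0778, "gb": 0.0787})
```

Because enet's MAE is slightly lower than ridge's, enet receives slightly more weight; the spread between models stays small whenever their MAEs differ only in the fourth decimal place.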

Regime & Universe

Regime: Moderate
Universe size: 30 tickers
Pipeline: V12
Last refreshed: 2026-04-29

See the current signal snapshot, or read the methodology behind every stage.