Out-of-Sample Performance

Last refreshed: 2026-04-29 · OOS panel: 19 months

Every number on this page comes directly from the most recent bi-weekly pipeline run. No synthesized backtest, no carried-forward figures. The panel below is short — this is a live research process with a small OOS window, not a polished marketing backtest.

HYPOTHETICAL PERFORMANCE RESULTS have many inherent limitations. No representation is being made that any account will or is likely to achieve profits or losses similar to those shown. Past performance, whether actual or hypothetical, is not indicative of future results.


Significance Verdict

FAIL: No test clears at the 5% level.

19 OOS months is a short sample. The battery (Deflated Sharpe, stationary bootstrap, purged hold-out) is how we disclose statistical weakness — not how we hide it.

Scope of the backtest

Every CAGR, Sharpe, MaxDD, and Calmar number below is from a hypothetical long-short equities portfolio — long the top-quintile of predictions, short the bottom-quintile, equally weighted, monthly rebalance. No leverage, no stock borrow cost, no slippage.

The options overlay shown per ticker on the Signal Dashboard (optROC, atmROC, otm20ROC, putCrROC) is a forward Monte Carlo from today's IV surface — not a historical backtest.

We attempted a real historical backtest against IBKR's historical options API on 2026-04-24. It was structurally blocked: IBKR's contract-resolution endpoint does not reliably return already-expired option contracts even with includeExpired=True, so we could not retrieve premiums for the historical entry dates that the walk-forward OOS panel needs. The options overlay is therefore a forward-only track: the first realised P&L row lands on live_vs_backtest.csv around 2026-05-23, and a full year of forward data is complete by 2027-04. No historical options CAGR / Sharpe / MaxDD is published here until forward data is deep enough to be statistically meaningful.

Walk-Forward OOS Panel — Equities L/S

Metric	Value	Note
Months OOS	19	570 stock-month obs.
Mean IC	0.027	t = 0.41
L/S Monthly	1.86%	Q5 − Q1 spread, eq-wt
L/S Sharpe	0.74	Annualised
Hit Rate	53.7%	Sign-of-return accuracy
RMSE	0.1123	Prediction error
MAE · ridge	0.0780	Per-model OOS
MAE · enet	0.0778	Per-model OOS
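
The three error metrics in the panel (hit rate, RMSE, MAE) can be sketched as follows. This is a minimal illustration, not the pipeline's code: the function names and the sample numbers are hypothetical, and treating a zero return as "not positive" is an assumption.

```python
from math import sqrt

def hit_rate(predicted, realised):
    """Share of observations where prediction and outcome share a sign.
    Assumption: a value of exactly zero counts as 'not positive'."""
    hits = sum(1 for p, r in zip(predicted, realised) if (p > 0) == (r > 0))
    return hits / len(predicted)

def rmse(predicted, realised):
    """Root mean squared prediction error."""
    n = len(predicted)
    return sqrt(sum((p - r) ** 2 for p, r in zip(predicted, realised)) / n)

def mae(predicted, realised):
    """Mean absolute prediction error."""
    n = len(predicted)
    return sum(abs(p - r) for p, r in zip(predicted, realised)) / n

# Hypothetical 1-month forward returns, for illustration only.
pred = [0.02, -0.01, 0.03, -0.02]
real = [0.05, 0.01, -0.04, -0.03]
print(hit_rate(pred, real), rmse(pred, real), mae(pred, real))
```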

Monthly OOS Information Coefficient

19 months · mean = 0.027

Information Coefficient = rank correlation between predicted 1-month forward return and realised cross-sectional return. Positive bars mean the ranking was directionally useful that month; negative bars mean it was not. With 19 data points, individual months carry little statistical weight.
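
A rank correlation of this kind can be computed as a Pearson correlation on ranks (Spearman's rho). The sketch below assumes one list of predictions and one list of realised returns per month, aligned by ticker; the function names are illustrative and not taken from the pipeline.

```python
def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def information_coefficient(predicted, realised):
    """Spearman rank correlation for one cross-section (one month)."""
    return pearson(ranks(predicted), ranks(realised))
```

The monthly IC series is then one such value per OOS month, and the headline figure is its mean.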

Hypothetical L/S Equities Portfolio

Long top-quintile / short bottom-quintile of the model's cross-sectional ranking, equally weighted within each leg, rebalanced monthly. No leverage, no execution costs, no stock borrow cost. Stocks only — the options overlay is not in these numbers. Point estimates only; with 19 OOS months the confidence intervals on every number below are wide.
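
One month of this construction might look like the sketch below. It assumes predictions and realised returns are held in dictionaries keyed by ticker; the function name, the data layout, and the tickers are hypothetical, and costs are ignored as in the description above.

```python
def long_short_return(predicted, realised, quantile=0.2):
    """Equal-weighted realised return of the top-quantile names (long)
    minus the bottom-quantile names (short), ranked by prediction."""
    tickers = sorted(predicted, key=predicted.get)  # ascending by prediction
    k = max(1, int(len(tickers) * quantile))        # names per leg
    shorts, longs = tickers[:k], tickers[-k:]
    long_leg = sum(realised[t] for t in longs) / k
    short_leg = sum(realised[t] for t in shorts) / k
    return long_leg - short_leg

# Hypothetical 10-name universe: 2 names per leg at the quintile cut.
predicted = {f"T{i}": float(i) for i in range(10)}
realised = {f"T{i}": 0.01 * i for i in range(10)}
spread = long_short_return(predicted, realised)
```

With a 30-ticker universe the same cut gives 6 names per leg.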

Metric	Value	Note
CAGR	19.93%	Annualised, 19 months
Max Drawdown	-17.58%	Peak-to-trough on cum. path
Sharpe	0.74	Annualised, eq-wt L/S
Calmar	1.13	CAGR / |MaxDD|
Total Return	33.34%	Cumulative over 19 mo
Mean Monthly	1.86%	σ = 8.68%
Months	19	n = 570 obs
Verdict	FAIL	Significance battery
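
The headline figures are linked by simple arithmetic, which the sketch below reproduces from the published inputs. The function names are illustrative, and the Sharpe calculation assumes a zero risk-free rate and sqrt(12) annualisation, which the page does not state explicitly.

```python
from math import sqrt

def cagr(total_return, months):
    """Annualise a cumulative return earned over `months` months."""
    return (1.0 + total_return) ** (12.0 / months) - 1.0

def annualised_sharpe(mean_monthly, sigma_monthly):
    """Monthly mean over monthly sigma, scaled by sqrt(12).
    Assumption: risk-free rate of zero."""
    return mean_monthly / sigma_monthly * sqrt(12.0)

def max_drawdown(path):
    """Worst peak-to-trough decline on a cumulative path (negative number)."""
    peak, worst = path[0], 0.0
    for value in path:
        peak = max(peak, value)
        worst = min(worst, value / peak - 1.0)
    return worst

def calmar(cagr_value, max_dd):
    """CAGR divided by the absolute maximum drawdown."""
    return cagr_value / abs(max_dd)

c = cagr(0.3334, 19)                    # about 0.1993
s = annualised_sharpe(0.0186, 0.0868)   # about 0.74
k = calmar(c, -0.1758)                  # about 1.13
```

The published CAGR, Sharpe, and Calmar are consistent with the published total return, mean, σ, and drawdown under these assumptions.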

Cumulative Path

start = 1.00

This path is the live walk-forward, not a smoothed backtest. Treat the headline numbers as point estimates with wide confidence intervals: the bootstrap 95% CI on Sharpe spans [−0.37, +1.62]. The significance battery below shows DSR and stationary bootstrap still failing at 5% with n = 19.

Regime-Conditional Performance

Point-in-time VIX

Regime	Months	% of panel	Mean monthly L/S	Annualised Sharpe	Hit rate
Moderate	19	100%	+1.86%	+0.74	53%

VIX regime at the end of each prediction month (not just the current snapshot). The headline Sharpe pools all regimes, so the forward-looking expected Sharpe depends on which regime you are actually in. If the next 12 months are mostly Moderate, the realised Sharpe will be closer to the Moderate bucket's number than to the panel-wide headline. In the current panel all 19 months fall in the Moderate bucket, so the two numbers coincide.

Significance Battery

Deflated Sharpe

Fail

p-value: 1.0000

SR (ann.): 0.32

SR threshold: 1.90

trials assumed: 20

Bailey & López de Prado (2014). Corrects the Sharpe ratio for multiple-testing bias across the search space.
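
A minimal sketch of the Deflated Sharpe calculation from that paper follows. It is not the pipeline's implementation: the Sharpe ratio is taken in per-period (non-annualised) units, and the variance of Sharpe ratios across trials (`sr_variance`) is a hypothetical input that this page does not publish.

```python
from math import e, sqrt
from statistics import NormalDist

EULER_GAMMA = 0.5772156649015329
_NORMAL = NormalDist()

def expected_max_sharpe(n_trials, sr_variance):
    """Expected maximum Sharpe across n_trials (>= 2) independent trials
    whose true Sharpe is zero; this is the deflation threshold."""
    a = _NORMAL.inv_cdf(1.0 - 1.0 / n_trials)
    b = _NORMAL.inv_cdf(1.0 - 1.0 / (n_trials * e))
    return sqrt(sr_variance) * ((1.0 - EULER_GAMMA) * a + EULER_GAMMA * b)

def deflated_sharpe(sr, n_obs, n_trials, sr_variance, skew=0.0, kurt=3.0):
    """Probability that the true Sharpe exceeds the threshold, given the
    sample length and the skewness and kurtosis of returns."""
    sr0 = expected_max_sharpe(n_trials, sr_variance)
    denom = sqrt(1.0 - skew * sr + (kurt - 1.0) / 4.0 * sr * sr)
    return _NORMAL.cdf((sr - sr0) * sqrt(n_obs - 1) / denom)

# Hypothetical inputs: per-month Sharpe 0.2, 19 months, 20 trials.
dsr = deflated_sharpe(sr=0.2, n_obs=19, n_trials=20, sr_variance=0.05)
p_value = 1.0 - dsr
```

Raising `n_trials` raises the threshold, so the same observed Sharpe is less likely to pass.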

Stationary Bootstrap

Fail

p-value: 0.6317

95% CI: [-0.087, 0.148]

resamples: 2000

Politis & Romano (1994). Two-sided test for mean monthly IC ≠ 0, robust to autocorrelation.
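
The resampling scheme can be sketched as below: blocks start at random positions, have geometrically distributed lengths, and wrap around the end of the series. The mean block length, the seed, and the percentile interval are assumptions for illustration; the page only states the number of resamples.

```python
import random

def stationary_bootstrap_sample(series, mean_block, rng):
    """One resample of the same length as `series`."""
    n = len(series)
    p_new_block = 1.0 / mean_block
    idx = rng.randrange(n)
    out = []
    for _ in range(n):
        out.append(series[idx])
        if rng.random() < p_new_block:
            idx = rng.randrange(n)   # start a new block at a random position
        else:
            idx = (idx + 1) % n      # continue the block, wrapping circularly
    return out

def bootstrap_mean_ci(series, mean_block=3.0, n_resamples=2000,
                      alpha=0.05, seed=0):
    """Percentile confidence interval for the mean of `series`."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_resamples):
        sample = stationary_bootstrap_sample(series, mean_block, rng)
        means.append(sum(sample) / len(sample))
    means.sort()
    lo = means[int(alpha / 2 * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi

# Hypothetical monthly IC values, for illustration only.
monthly_ic = [0.10, -0.05, 0.02, 0.08, -0.12, 0.04, 0.01, -0.03]
ci = bootstrap_mean_ci(monthly_ic)
```

An interval that contains zero corresponds to failing the two-sided test at the matching level.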

Purged Hold-Out

Inconsistent

hold-out: 3 months

full IC: 0.027

trimmed IC: -0.017

magnitude ratio: -0.616

CPCV-lite: drop the last k months, re-run the walk-forward, and require the surviving IC to match the full-panel sign and ≥ 50% of its magnitude.
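
The acceptance rule reduces to a single ratio test, sketched below with a hypothetical function name. A ratio of at least 0.5 implies both the same sign and at least half the magnitude.

```python
def purged_holdout_verdict(full_ic, trimmed_ic, min_ratio=0.5):
    """Compare the trimmed-panel IC with the full-panel IC.
    Returns (verdict, ratio); ratio is trimmed_ic / full_ic."""
    if full_ic == 0:
        return "Inconsistent", float("nan")  # no reference sign or magnitude
    ratio = trimmed_ic / full_ic
    verdict = "Consistent" if ratio >= min_ratio else "Inconsistent"
    return verdict, ratio

# Rounded figures from the panel above; the published ratio is presumably
# computed from unrounded ICs, so the ratio here differs slightly.
verdict, ratio = purged_holdout_verdict(0.027, -0.017)
```

With the rounded inputs the ratio is negative (about −0.63), so the sign check alone is enough to fail.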

Factor Information Coefficients

4 kept · 7 dropped

Factor	Mean IC	t-stat	p-value	n months	Significant	Kept
low_vol_60d	-0.112	-1.70	0.101	25	no	✓
beta_residual	+0.102	1.84	0.079	25	no	✓
sector_neutral_momentum	+0.092	1.66	0.110	25	no	✓
momentum_12_1	+0.088	1.38	0.179	25	no	✓
price_acceleration	-0.033	-0.58	0.569	25	no	✗
volume_trend	+0.031	0.77	0.447	25	no	✗
ret_5d	-0.030	-0.55	0.588	25	no	✗
momentum_3_1	+0.028	0.47	0.641	25	no	✗
volume_shock	-0.022	-0.53	0.603	25	no	✗
short_term_reversal	-0.022	-0.37	0.711	25	no	✗
volatility_ratio	+0.021	0.74	0.468	25	no	✗

Each factor is tested cross-sectionally per month over a 25-month panel. “Kept” factors survive the IC screen (p < 0.10 or |mean IC| ≥ 0.05) and enter the ensemble. “Significant” means p < 0.05 from a t-test on the monthly IC series. At this sample size, expect most factors to fall short of individual significance even when the ensemble has predictive value.
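
The screen described above can be sketched as two small functions. The thresholds come from the text; the function names are hypothetical, and the p-value is taken as an input because the Python standard library has no Student-t distribution.

```python
from math import sqrt
from statistics import mean, stdev

def ic_t_stat(monthly_ic):
    """One-sample t-statistic for the mean of a monthly IC series."""
    return mean(monthly_ic) / (stdev(monthly_ic) / sqrt(len(monthly_ic)))

def screen_factor(mean_ic, p_value):
    """Return (kept, significant) under the rules in the text:
    kept if p < 0.10 or |mean IC| >= 0.05; significant if p < 0.05."""
    kept = p_value < 0.10 or abs(mean_ic) >= 0.05
    significant = p_value < 0.05
    return kept, significant

# Rows taken from the table above.
low_vol = screen_factor(-0.112, 0.101)      # kept on the magnitude criterion
price_accel = screen_factor(-0.033, 0.569)  # dropped on both criteria
```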

Note on low_vol_60d: the factor is defined as the negative of 60-day realised vol so that, under the classical low-vol anomaly, high factor values would map to low-vol stocks and positive forward returns. The panel IC over 2022–2026 came out negative — i.e. in this mega-cap sample high-vol names outperformed, driven by the post-2022 AI tech rally. The ensemble learns the realised sign from the panel; the IC screen kept the factor on the magnitude criterion (|IC| ≥ 0.05), not the direction criterion. Treat this as regime-specific: in a future period where low-vol mean-reverts to the historical anomaly, the ensemble would need to retrain.

Current Ensemble

Model Weights

ridge

0.334

enet

0.335

0.331

Source: inverse_oos_mae
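
Assuming `inverse_oos_mae` means each model's weight is proportional to the reciprocal of its OOS MAE, the weighting can be sketched as follows. The function name is hypothetical, and only the two MAEs published on this page are used, so the output will not match the three-model weights shown above.

```python
def inverse_mae_weights(mae_by_model):
    """Weights proportional to 1 / MAE, normalised to sum to 1."""
    inverse = {name: 1.0 / m for name, m in mae_by_model.items()}
    total = sum(inverse.values())
    return {name: v / total for name, v in inverse.items()}

# MAE values from the OOS panel on this page.
weights = inverse_mae_weights({"ridge": 0.0780, "enet": 0.0778})
```

Because the MAEs are nearly identical, the weights come out close to equal, with the lower-error model weighted slightly higher.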

Regime & Universe

Regime: Moderate

Universe size: 30 tickers

Pipeline: V12

Last refreshed: 2026-04-29

See the current signal snapshot, or read the methodology behind every stage.
