Methodology
This page gives an overview of the bi-weekly research pipeline. It openly documents how factors are constructed, how models are blended, how options strategies are priced, and how every fallback and NaN is logged.
Pipeline Overview
Universe & Data Collection
We select US mega-cap equities from the S&P 100, filtered for sufficient options liquidity. Price, volume, and fundamental data are sourced from standard market data providers and pass daily reconciliation checks.
Cross-Sectional Factor Construction
Multiple cross-sectional factors are computed per stock spanning momentum, mean-reversion, fundamental quality, analyst sentiment, and volatility metrics. Each factor undergoes information coefficient (IC) testing; weak factors are pruned. Remaining factors are winsorized and normalized.
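The IC test, pruning, winsorization, and normalization steps can be sketched as follows. This is an illustration only, not the pipeline's code: the choice of Spearman rank IC, the 5th/95th percentile clip, the `min_abs_ic` threshold, and every function name are assumptions made for the example.

```python
import statistics

def average_ranks(values):
    """Return 1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def rank_ic(factor_values, forward_returns):
    """Assumed IC definition: Spearman rank correlation with forward returns."""
    return pearson(average_ranks(factor_values), average_ranks(forward_returns))

def prune_factors(factors, forward_returns, min_abs_ic=0.02):
    """Keep factors whose |IC| meets a hypothetical threshold."""
    return {name: vals for name, vals in factors.items()
            if abs(rank_ic(vals, forward_returns)) >= min_abs_ic}

def winsorize(values, lower_pct=0.05, upper_pct=0.95):
    """Clip values to approximate lower/upper percentiles of the cross-section."""
    s = sorted(values)
    lo = s[int(lower_pct * (len(s) - 1))]
    hi = s[int(upper_pct * (len(s) - 1))]
    return [min(max(v, lo), hi) for v in values]

def zscore(values):
    """Cross-sectional z-score; assumes the input is not constant."""
    mu = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mu) / sd for v in values]
```

Using ranks for the IC keeps a single extreme factor value from dominating the correlation, which is one reason the rank version is a common choice before winsorization is applied.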
Adaptive Ensemble Estimation
Multiple machine learning models are trained on the factor matrix. Model weights are determined dynamically based on out-of-sample performance and adjusted for the current VIX regime. Shrinkage is applied to blended estimates to reduce overconfidence.
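One way to picture performance-based weighting and shrinkage is the sketch below. The softmax weighting, the `temperature` parameter, shrinkage toward zero, and the default `shrinkage=0.5` are all assumptions for illustration; the regime adjustment is not modeled here.

```python
import math

def performance_weights(oos_scores, temperature=1.0):
    """Softmax over out-of-sample scores (assumed weighting scheme).

    oos_scores maps a model name to a score where higher is better.
    """
    best = max(oos_scores.values())  # subtract the max for numerical stability
    exps = {m: math.exp((s - best) / temperature) for m, s in oos_scores.items()}
    total = sum(exps.values())
    return {m: e / total for m, e in exps.items()}

def blend(predictions, weights, shrinkage=0.5):
    """Weighted average of model predictions, shrunk toward zero.

    shrinkage=0 keeps the raw blend; shrinkage=1 collapses it to zero.
    """
    raw = sum(weights[m] * predictions[m] for m in weights)
    return (1.0 - shrinkage) * raw

# Hypothetical usage with two made-up model names.
weights = performance_weights({"model_a": 0.04, "model_b": 0.01})
estimate = blend({"model_a": 0.02, "model_b": 0.04}, weights, shrinkage=0.5)
```

Shrinking toward zero pulls every blended estimate closer to "no view", which is a simple guard against overconfident outputs.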
Monte Carlo Simulation & Vol Surface
A large-scale Monte Carlo simulation is run per stock to produce a model-implied volatility. That figure is compared against market-quoted implied volatility to compute a volatility edge metric that informs options strategy selection.
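As a toy illustration of the comparison, the sketch below simulates log-return paths, annualizes the dispersion of the simulated horizon returns, and subtracts the market-quoted figure. The constant-volatility Gaussian dynamics, the 252-day year, the sign convention of the edge, and all names are assumptions; a production simulation would presumably use richer dynamics.

```python
import math
import random
import statistics

def model_implied_vol(daily_vol, horizon_days=10, n_paths=5000,
                      trading_days=252, seed=0):
    """Annualized vol implied by simulated horizon log returns.

    Each path sums horizon_days independent Gaussian daily log returns
    (a deliberately simple, assumed return model).
    """
    rng = random.Random(seed)
    horizon_returns = [
        sum(rng.gauss(0.0, daily_vol) for _ in range(horizon_days))
        for _ in range(n_paths)
    ]
    horizon_sd = statistics.pstdev(horizon_returns)
    return horizon_sd * math.sqrt(trading_days / horizon_days)

def vol_edge(model_vol, market_iv):
    """Assumed convention: positive when the model expects more vol than the market."""
    return model_vol - market_iv

# Hypothetical inputs: 2% daily vol versus a 28% market-quoted implied vol.
edge = vol_edge(model_implied_vol(0.02), 0.28)
```

With these toy dynamics the simulation simply recovers `daily_vol * sqrt(252)` up to sampling noise; the structure of the comparison is the point, not the number.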
Options Strategy Selection
Multiple options strategies are evaluated per stock and re-priced using an evidence-based spread model trained on real options chain data. Strategies that look profitable before execution costs but turn unprofitable once those costs are applied are eliminated.
Portfolio Construction
Position sizing uses a drawdown-adjusted methodology with sector diversification constraints. Options allocation is capped based on conviction tier. The result is a fully specified portfolio with conviction-tiered overlays.
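The sizing idea can be sketched as inverse-drawdown weights followed by a sector cap. This is a simplified stand-in rather than the documented method: the inverse-drawdown rule, the `floor`, the 30% default `sector_cap`, and leaving the capped excess in cash are all assumptions.

```python
def max_drawdown(prices):
    """Largest peak-to-trough decline, as a fraction of the peak."""
    peak = prices[0]
    worst = 0.0
    for p in prices:
        peak = max(peak, p)
        worst = max(worst, (peak - p) / peak)
    return worst

def drawdown_adjusted_weights(drawdowns, floor=0.05):
    """Weight each ticker by 1 / max drawdown, normalized to sum to 1.

    The floor stops a near-zero drawdown from taking the whole book.
    """
    inverse = {t: 1.0 / max(dd, floor) for t, dd in drawdowns.items()}
    total = sum(inverse.values())
    return {t: v / total for t, v in inverse.items()}

def cap_sectors(weights, sectors, sector_cap=0.30):
    """Scale down any sector above the cap; the excess is left unallocated."""
    sector_total = {}
    for t, w in weights.items():
        sector_total[sectors[t]] = sector_total.get(sectors[t], 0.0) + w
    capped = {}
    for t, w in weights.items():
        total = sector_total[sectors[t]]
        scale = min(1.0, sector_cap / total) if total > 0 else 1.0
        capped[t] = w * scale
    return capped
```

Leaving the excess in cash keeps the example to a single pass; redistributing it to other sectors would need an iterative scheme, because the redistribution can push another sector over its cap.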
What Makes This Different
Real Execution Costs
Many backtests assume zero spread costs. We train a spread model on real options chain data and charge half the modeled bid-ask spread on each leg. This eliminates a significant fraction of apparently profitable strategies.
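A minimal sketch of the half-spread-per-leg charge and the resulting filter is shown below. The `Leg` structure, the 100-share contract multiplier, and the strategy names and P&L figures are hypothetical; the spread values stand in for whatever the spread model would output.

```python
from dataclasses import dataclass

@dataclass
class Leg:
    spread: float   # bid-ask width per share, e.g. from a spread model
    quantity: int   # signed contract count: positive long, negative short

def execution_cost(legs, multiplier=100):
    """Half the spread on every leg, per contract, in currency units."""
    return sum(0.5 * leg.spread * abs(leg.quantity) * multiplier for leg in legs)

def filter_after_costs(strategies):
    """Keep strategies whose expected P&L stays positive after costs.

    strategies maps a name to (gross_expected_pnl, list_of_legs).
    """
    survivors = {}
    for name, (gross_pnl, legs) in strategies.items():
        net_pnl = gross_pnl - execution_cost(legs)
        if net_pnl > 0:
            survivors[name] = net_pnl
    return survivors

# Hypothetical example: the call spread looks profitable gross (6.0)
# but costs 8.0 to cross, so it is dropped.
candidates = {
    "call_spread": (6.0, [Leg(0.10, 1), Leg(0.06, -1)]),
    "long_call": (20.0, [Leg(0.10, 1)]),
}
survivors = filter_after_costs(candidates)
```

Multi-leg structures pay the charge on each leg, so strategies with more legs need proportionally more gross edge to survive the filter.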
Regime Awareness
Model behavior changes across VIX regimes. Rather than using fixed weights, our ensemble dynamically adjusts which models receive more influence based on the current volatility environment.
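The regime adjustment might look something like the sketch below. The VIX cut-offs (15 and 25), the regime labels, the model names, and the tilt multipliers are placeholders chosen for illustration, not values from the pipeline configuration.

```python
def vix_regime(vix_level, calm_below=15.0, stressed_above=25.0):
    """Bucket a VIX level into one of three assumed regimes."""
    if vix_level < calm_below:
        return "calm"
    if vix_level > stressed_above:
        return "stressed"
    return "normal"

# Hypothetical per-regime multipliers applied to base ensemble weights.
REGIME_TILTS = {
    "calm": {"momentum_model": 1.2, "mean_reversion_model": 0.8},
    "normal": {},
    "stressed": {"momentum_model": 0.7, "mean_reversion_model": 1.3},
}

def regime_adjusted_weights(base_weights, vix_level, tilts=REGIME_TILTS):
    """Scale base weights by the regime's tilts, then renormalize to sum to 1."""
    tilt = tilts[vix_regime(vix_level)]
    scaled = {m: w * tilt.get(m, 1.0) for m, w in base_weights.items()}
    total = sum(scaled.values())
    return {m: w / total for m, w in scaled.items()}
```

Models without an entry for the current regime keep a multiplier of 1.0, so the tilt table only needs to list the exceptions.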
Volatility Edge
Beyond directional estimates, we compare model-implied vol against market-implied vol. This vol-edge metric informs hypothetical strategy selection between directional options and premium-selling strategies.
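To make the selection logic concrete, here is a hedged sketch. The mapping is an assumption based on common practice (options that look cheap relative to the model favor buying them for directional exposure, options that look rich favor selling premium), and the `min_edge` dead band of two vol points is hypothetical.

```python
def choose_strategy_family(model_vol, market_iv, min_edge=0.02):
    """Pick a strategy family from the gap between model and market vol.

    Both inputs are annualized volatilities expressed as decimals.
    """
    edge = model_vol - market_iv
    if edge >= min_edge:
        # Market vol looks cheap relative to the model: buy options.
        return "directional_long_options"
    if edge <= -min_edge:
        # Market vol looks rich relative to the model: sell premium.
        return "premium_selling"
    return "no_vol_preference"
```

The dead band avoids flipping between families when the two volatility figures differ by less than the estimation noise.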
Conviction Tiers
Model outputs are categorized by estimate strength and model confidence. Higher-conviction outputs receive hypothetical options overlays; lower-conviction outputs are equity-only with smaller position sizes.
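A two-tier version of this rule could be sketched as follows. The thresholds, the tier names, and the size multipliers are invented for the example; the actual tier thresholds live in the pipeline manual.

```python
def conviction_tier(estimate, confidence, min_estimate=0.03, min_confidence=0.7):
    """Return "high" only when both estimate strength and confidence clear their bars."""
    if abs(estimate) >= min_estimate and confidence >= min_confidence:
        return "high"
    return "low"

# Hypothetical treatment per tier: overlays only for high conviction,
# smaller equity-only positions otherwise.
TIER_RULES = {
    "high": {"options_overlay": True, "size_multiplier": 1.0},
    "low": {"options_overlay": False, "size_multiplier": 0.5},
}

def position_plan(estimate, confidence):
    """Combine the tier label with its sizing and overlay treatment."""
    tier = conviction_tier(estimate, confidence)
    return {"tier": tier, **TIER_RULES[tier]}
```

Requiring both conditions means a large estimate from a low-confidence model still lands in the equity-only tier.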
Full Pipeline Documentation
The complete pipeline manual lives in the public repository alongside the code that produces every signal. It covers factor definitions, model hyperparameters, ensemble weights, tier thresholds, the spread model specification, and the significance battery (Deflated Sharpe, stationary bootstrap, purged hold-out).
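Of the tests in the significance battery, the stationary bootstrap is the simplest to sketch. The version below follows the standard Politis-Romano construction (blocks of geometrically distributed length, resampled with wrap-around) and uses it to put an interval around a per-period Sharpe ratio. The mean block length, the number of resamples, and the function names are assumptions, not the repository's settings.

```python
import random
import statistics

def stationary_bootstrap_sample(returns, mean_block=5.0, rng=None):
    """Resample a series in random-length blocks, wrapping around at the end."""
    rng = rng or random.Random()
    n = len(returns)
    restart_prob = 1.0 / mean_block   # gives geometric block lengths
    idx = rng.randrange(n)
    sample = []
    for _ in range(n):
        sample.append(returns[idx])
        if rng.random() < restart_prob:
            idx = rng.randrange(n)    # start a new block at a random point
        else:
            idx = (idx + 1) % n       # continue the current block
    return sample

def sharpe(returns):
    """Per-period Sharpe ratio (not annualized); 0.0 if the series is constant."""
    sd = statistics.stdev(returns)
    return statistics.fmean(returns) / sd if sd > 0 else 0.0

def bootstrap_sharpe_interval(returns, n_boot=1000, mean_block=5.0,
                              alpha=0.05, seed=0):
    """Percentile interval for the Sharpe ratio across bootstrap resamples."""
    rng = random.Random(seed)
    stats = sorted(
        sharpe(stationary_bootstrap_sample(returns, mean_block, rng))
        for _ in range(n_boot)
    )
    lower = stats[int((alpha / 2) * (n_boot - 1))]
    upper = stats[int((1 - alpha / 2) * (n_boot - 1))]
    return lower, upper
```

Resampling in blocks rather than single observations preserves short-range autocorrelation in the return series, which an ordinary i.i.d. bootstrap would destroy.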
Repository
- pipeline/MANUAL.md
- pipeline/README.md
- pipeline/config/pipeline_config.yaml
Reproducibility
- Pinned lockfile (requirements-lock.txt)
- Per-run artifacts under pipeline/output/<run_id>/
- Permanent snapshots
Honesty
- Every run writes significance.json
- Live-vs-backtest tracker
- All fallbacks logged, all NaNs preserved
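The "log the fallback, keep the NaN" policy can be illustrated with a small helper. The logger name, the function names, and the example field are hypothetical; the point is only that an undefined value is recorded and passed through as NaN rather than silently filled.

```python
import logging
import math

logger = logging.getLogger("pipeline.fallbacks")  # hypothetical logger name

def safe_ratio(numerator, denominator, field_name):
    """Return numerator / denominator, or NaN with a logged warning if undefined."""
    if math.isnan(numerator) or math.isnan(denominator) or denominator == 0:
        logger.warning(
            "fallback: %s undefined (num=%r, den=%r); keeping NaN",
            field_name, numerator, denominator,
        )
        return math.nan
    return numerator / denominator

def nan_report(row):
    """List the fields of a record that are NaN, sorted by name."""
    return sorted(k for k, v in row.items()
                  if isinstance(v, float) and math.isnan(v))

# Hypothetical usage: a zero denominator yields NaN plus a log line.
row = {"earnings_yield": safe_ratio(1.5, 0.0, "earnings_yield"), "momentum": 0.12}
missing = nan_report(row)
```

Keeping the NaN visible lets downstream steps and readers see exactly which inputs were missing, instead of inheriting an imputed value with no trace of its origin.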
This methodology document describes quantitative research techniques for educational purposes only. It is NOT investment advice and should not be used as the basis for any investment decision. Model outputs are hypothetical and past performance is not indicative of future results.