Methodology

An overview of the bi-weekly research pipeline, with open documentation of how factors are constructed, how models are blended, how options strategies are priced, and how every fallback and NaN is logged.

Pipeline Overview

Universe & Data Collection

We select US mega-cap equities from the S&P 100, filtered for sufficient options liquidity. Price, volume, and fundamental data are sourced from standard market data providers, with daily reconciliation checks.
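The options-liquidity screen could look like the following minimal sketch. The field names and the thresholds (minimum option volume, minimum open interest, maximum relative spread) are hypothetical placeholders and are not taken from the pipeline's configuration.

```python
from dataclasses import dataclass


@dataclass
class OptionsLiquidity:
    """Hypothetical per-ticker liquidity summary built from options chain data."""
    ticker: str
    avg_daily_option_volume: float  # contracts per day, all expiries
    total_open_interest: float      # contracts, all expiries
    median_rel_spread: float        # median (ask - bid) / mid of near-the-money contracts


def liquid_universe(candidates, min_volume=5_000, min_open_interest=50_000,
                    max_rel_spread=0.05):
    """Keep tickers that clear every (hypothetical) liquidity threshold."""
    return sorted(
        c.ticker
        for c in candidates
        if c.avg_daily_option_volume >= min_volume
        and c.total_open_interest >= min_open_interest
        and c.median_rel_spread <= max_rel_spread
    )
```

Each threshold is a hard filter, so a name with deep open interest but wide quotes is still excluded.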

Cross-Sectional Factor Construction

Multiple cross-sectional factors are computed for each stock, spanning momentum, mean reversion, fundamental quality, analyst sentiment, and volatility metrics. Each factor undergoes information coefficient (IC) testing, and weak factors are pruned. The remaining factors are winsorized and normalized.
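A minimal sketch of these per-factor steps on a single cross-section follows. It assumes rank (Spearman) IC against forward returns as the test statistic, a fixed `min_abs_ic` pruning cutoff, percentile winsorization, and z-score normalization; these choices and all function names are illustrative assumptions. A production version would average IC over many historical dates rather than one.

```python
import math
import statistics


def rank(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_ic(factor, fwd_returns):
    """Rank IC: Pearson correlation between the two rank vectors."""
    rx, ry = rank(factor), rank(fwd_returns)
    mx, my = statistics.fmean(rx), statistics.fmean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy) if sx and sy else 0.0


def winsorize(values, lower_pct=0.05, upper_pct=0.95):
    """Clip values to the given lower/upper percentiles of the cross-section."""
    s = sorted(values)
    n = len(s)
    lo = s[int(lower_pct * (n - 1))]
    hi = s[int(math.ceil(upper_pct * (n - 1)))]
    return [min(max(v, lo), hi) for v in values]


def zscore(values):
    mu = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mu) / sd if sd else 0.0 for v in values]


def build_factors(raw_factors, fwd_returns, min_abs_ic=0.02):
    """Prune factors with |IC| below the cutoff; winsorize and z-score the rest."""
    kept = {}
    for name, values in raw_factors.items():
        if abs(spearman_ic(values, fwd_returns)) >= min_abs_ic:
            kept[name] = zscore(winsorize(values))
    return kept
```

Using the absolute IC keeps factors that predict reliably in either direction; their sign can be flipped before blending.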

Adaptive Ensemble Estimation

Multiple machine learning models are trained on the factor matrix. Model weights are determined dynamically based on out-of-sample performance and adjusted for the current VIX regime. Shrinkage is applied to blended estimates to reduce overconfidence.
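One way to make weights depend on both out-of-sample performance and the VIX regime is inverse-MSE weighting computed only from errors observed in the current regime, followed by linear shrinkage of the blend toward a neutral prior. The sketch below assumes exactly that; the weighting rule, the shrinkage form, the parameter values, and the function names are hypothetical.

```python
def regime_weights(oos_errors, regime):
    """Inverse-MSE weights from out-of-sample errors recorded in `regime`.

    oos_errors: {model_name: {regime_label: [forecast errors, ...]}}
    Falls back to equal weights if no model has history in this regime.
    """
    inv_mse = {}
    for model, by_regime in oos_errors.items():
        errors = by_regime.get(regime, [])
        if errors:
            mse = sum(e * e for e in errors) / len(errors)
            inv_mse[model] = 1.0 / max(mse, 1e-12)  # guard against zero error
    if not inv_mse:
        return {m: 1.0 / len(oos_errors) for m in oos_errors}
    total = sum(inv_mse.values())
    return {m: w / total for m, w in inv_mse.items()}


def blended_estimate(predictions, weights, shrinkage=0.5, prior=0.0):
    """Weighted blend of model predictions, shrunk toward `prior`."""
    raw = sum(w * predictions[m] for m, w in weights.items())
    return (1.0 - shrinkage) * raw + shrinkage * prior
```

With `shrinkage=0.5` and a zero prior, every blended estimate is halved, which damps overconfident forecasts at the cost of some signal.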

Monte Carlo Simulation & Vol Surface

A large-scale Monte Carlo simulation for each stock produces a model-implied volatility. This is compared against market-quoted implied volatility to compute a volatility-edge metric that informs options strategy selection.
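As a hedged illustration, the sketch below assumes the simulation resamples a stock's historical daily log returns with replacement (an i.i.d. bootstrap), takes the standard deviation of the simulated horizon returns, and annualizes with 252 trading days. The pipeline's actual simulation model is not specified on this page; the bootstrap, horizon, path count, and the sign convention `vol_edge = model_vol - market_iv` are all assumptions.

```python
import math
import random
import statistics


def mc_model_vol(daily_log_returns, horizon_days=21, n_paths=20_000, seed=0):
    """Annualized vol of simulated `horizon_days` log returns.

    Each path draws daily returns i.i.d. with replacement from history.
    """
    rng = random.Random(seed)
    horizon_returns = [
        sum(rng.choices(daily_log_returns, k=horizon_days)) for _ in range(n_paths)
    ]
    return statistics.stdev(horizon_returns) * math.sqrt(252 / horizon_days)


def vol_edge(model_vol, market_iv):
    """Positive when the model expects more volatility than the market prices."""
    return model_vol - market_iv
```

A fixed seed makes each run's simulated vol reproducible, which matters when results are compared across runs.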

Options Strategy Selection

Multiple options strategies are evaluated per stock and re-priced using an evidence-based spread model trained on real options chain data. Strategies that appear profitable before execution costs but are unprofitable after are eliminated.
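The elimination step can be sketched as follows, assuming each candidate strategy carries a pre-cost expected P&L and a predicted bid-ask spread for every leg. The cost rule charges half the spread per contract on each leg at entry; whether exits are also charged is an assumption exposed as the `round_trip` parameter, and all class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Leg:
    spread: float  # predicted bid-ask spread, same units as expected_pnl
    qty: int       # signed contract count; cost uses the absolute value


@dataclass
class Strategy:
    name: str
    expected_pnl: float  # pre-cost expected P&L from the pricing model
    legs: list[Leg]


def execution_cost(strategy, round_trip=False):
    """Half-spread per contract on every leg; doubled if exits are also charged."""
    entry = sum(0.5 * leg.spread * abs(leg.qty) for leg in strategy.legs)
    return 2 * entry if round_trip else entry


def survivors_after_costs(strategies, round_trip=False):
    """Drop strategies whose expected P&L does not survive execution costs."""
    kept = []
    for s in strategies:
        net = s.expected_pnl - execution_cost(s, round_trip)
        if net > 0:
            kept.append((s.name, net))
    return kept
```

Because the cost scales with the number of legs, a four-leg structure pays four half-spreads, so a thin pre-cost edge on a multi-leg strategy is the first to disappear.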

Portfolio Construction

Position sizing uses a drawdown-adjusted methodology with sector diversification constraints. Options allocation is capped based on conviction tier. The result is a fully specified portfolio with conviction-tiered overlays.
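A minimal sketch of drawdown-adjusted sizing with sector caps, assuming long-only positive scores, a linear scale-down as the current drawdown approaches a drawdown budget, and proportional trimming of any sector above its cap (trimmed weight stays in cash). The drawdown budget, the sector cap, and the per-tier options caps are hypothetical values.

```python
from collections import defaultdict

# Hypothetical options allocation caps, as a fraction of each position's weight.
OPTIONS_CAP_BY_TIER = {1: 0.30, 2: 0.15, 3: 0.0}


def size_positions(scores, sectors, current_drawdown, dd_budget=0.20, sector_cap=0.25):
    """Score-proportional weights, scaled by drawdown and trimmed per sector.

    scores: {ticker: positive score}; sectors: {ticker: sector}.
    """
    # Linear de-risking: full size at zero drawdown, zero size at the budget.
    scale = max(0.0, 1.0 - current_drawdown / dd_budget)
    total = sum(scores.values())
    weights = {t: scale * s / total for t, s in scores.items()}

    sector_sum = defaultdict(float)
    for t, w in weights.items():
        sector_sum[sectors[t]] += w
    # Scale every name in an over-cap sector down proportionally.
    for t in weights:
        s = sector_sum[sectors[t]]
        if s > sector_cap:
            weights[t] *= sector_cap / s
    return weights


def options_budget(weight, tier):
    """Maximum options allocation for a position of the given tier."""
    return weight * OPTIONS_CAP_BY_TIER.get(tier, 0.0)
```

Leaving trimmed weight in cash, rather than redistributing it, keeps the sector constraint from pushing extra exposure into the remaining sectors.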

What Makes This Different

Real Execution Costs

Most backtests assume zero spread costs. We train a spread model on real options chain data and charge half the bid-ask spread on every leg of each strategy. This eliminates a significant fraction of apparently profitable strategies.
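The spread model's exact form is not documented on this page. As a hedged stand-in, the sketch below fits a simple empirical model: the median relative spread, (ask - bid) / mid, observed in each moneyness and days-to-expiry bucket, with the overall median as a fallback for unseen buckets. Half of the predicted spread is then the per-leg cost. The bucket edges, quote fields, and class name are hypothetical.

```python
import statistics
from collections import defaultdict


def bucket(moneyness, dte):
    """Hypothetical buckets: strike/spot distance and days to expiry."""
    dist = abs(moneyness - 1.0)
    m = "atm" if dist <= 0.05 else ("near" if dist <= 0.15 else "far")
    d = "short" if dte <= 30 else "long"
    return m, d


class BucketSpreadModel:
    """Median relative bid-ask spread per (moneyness, DTE) bucket."""

    def fit(self, quotes):
        by_bucket = defaultdict(list)
        all_rel = []
        for q in quotes:
            if q["bid"] <= 0 or q["ask"] <= q["bid"]:
                continue  # skip no-bid, locked, or crossed quotes
            mid = 0.5 * (q["bid"] + q["ask"])
            rel = (q["ask"] - q["bid"]) / mid
            by_bucket[bucket(q["moneyness"], q["dte"])].append(rel)
            all_rel.append(rel)
        self.table = {k: statistics.median(v) for k, v in by_bucket.items()}
        self.fallback = statistics.median(all_rel)  # needs at least one valid quote
        return self

    def predict_spread(self, mid, moneyness, dte):
        rel = self.table.get(bucket(moneyness, dte), self.fallback)
        return rel * mid

    def half_spread_cost(self, mid, moneyness, dte, contracts=1):
        """Cost of crossing half the predicted spread on one leg."""
        return 0.5 * self.predict_spread(mid, moneyness, dte) * abs(contracts)
```

Medians are used instead of means so that a handful of stale, very wide quotes do not dominate a bucket.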

Regime Awareness

Model behavior changes across VIX regimes. Rather than using fixed weights, our ensemble dynamically adjusts which models receive more influence based on the current volatility environment.
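Mapping the VIX level to a discrete regime label could look like this minimal sketch. The two breakpoints (15 and 25) and the hysteresis band are hypothetical; the band is one common way to stop the label, and therefore the ensemble weights, from flipping back and forth when VIX hovers near a boundary.

```python
import bisect

VIX_BREAKS = [15.0, 25.0]  # hypothetical regime boundaries
REGIMES = ["low_vol", "normal", "high_vol"]


def vix_regime(vix_level):
    """Regime label for a single VIX reading."""
    return REGIMES[bisect.bisect_right(VIX_BREAKS, vix_level)]


def regime_path(vix_history, band=1.0):
    """Regime per day, switching only when VIX clears a boundary by `band` points."""
    current = REGIMES.index(vix_regime(vix_history[0]))
    path = [REGIMES[current]]
    for v in vix_history[1:]:
        up = REGIMES.index(vix_regime(v - band))    # must exceed a break by `band`
        down = REGIMES.index(vix_regime(v + band))  # must fall below a break by `band`
        if up > current:
            current = up
        elif down < current:
            current = down
        path.append(REGIMES[current])
    return path
```

For example, a VIX series of 20, 25.5, 26.5, 24.5, 23.9 stays "normal" at 25.5, switches to "high_vol" at 26.5, holds at 24.5, and only returns to "normal" at 23.9.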

Volatility Edge

Beyond directional estimates, we compare model-implied vol against market-implied vol. This vol-edge metric informs hypothetical strategy selection between directional options and premium-selling strategies.
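The choice can be expressed as a small decision rule. This sketch assumes `vol_edge` is model-implied vol minus market-implied vol (annualized, in decimal) and that the directional estimate is a signed score; both thresholds and the strategy labels are hypothetical. A positive edge suggests options are cheap relative to the model, so premium is bought in the estimate's direction, while a negative edge suggests they are rich, so premium is sold.

```python
def choose_strategy(vol_edge, direction, edge_threshold=0.02, direction_threshold=0.3):
    """Map (vol edge, signed directional score) to a hypothetical strategy label."""
    bullish = direction > direction_threshold
    bearish = direction < -direction_threshold
    if vol_edge > edge_threshold:
        # Model vol above market IV: options look cheap, so buy directional premium.
        if bullish:
            return "long_call"
        if bearish:
            return "long_put"
        return "no_options"
    if vol_edge < -edge_threshold:
        # Market IV above model vol: options look rich, so sell premium.
        if bullish:
            return "short_put_spread"
        if bearish:
            return "short_call_spread"
        return "iron_condor"
    # Edge too small to justify paying option spreads.
    return "no_options"
```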

Conviction Tiers

Model outputs are categorized by estimate strength and model confidence. Higher-conviction outputs receive hypothetical options overlays; lower-conviction outputs are equity-only with smaller position sizes.
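Tier assignment might be sketched as below, assuming estimate strength is the absolute blended expected return and model confidence is a score in [0, 1] (for example, the fraction of ensemble members that agree on sign). The thresholds, the size multipliers, and the choice to give overlays to the top two tiers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TierDecision:
    tier: int
    options_overlay: bool
    size_multiplier: float  # applied to the base equity position size


def assign_tier(estimate, confidence, strong=0.03, high_conf=0.7, min_conf=0.5):
    """Bucket an output by |estimate| and model confidence."""
    strength = abs(estimate)
    if strength >= strong and confidence >= high_conf:
        return TierDecision(1, True, 1.0)
    if strength >= strong / 2 and confidence >= min_conf:
        return TierDecision(2, True, 0.75)
    # Low conviction: equity-only, smaller position.
    return TierDecision(3, False, 0.5)
```

Requiring both conditions means a large estimate from models that disagree still lands in a lower tier.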

Full Pipeline Documentation

The complete pipeline manual — including factor definitions, model hyperparameters, ensemble weights, tier thresholds, spread model specification, and the significance battery (Deflated Sharpe, stationary bootstrap, purged hold-out) — lives in the public repository alongside the code that produces every signal.
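Of the significance battery, the Deflated Sharpe Ratio is compact enough to sketch. The version below follows the published formula of Bailey and López de Prado (2014): the observed per-period Sharpe ratio is tested against the expected maximum Sharpe ratio of `n_trials` zero-skill strategies, with a correction for the skewness and raw (non-excess) kurtosis of returns. The function names are hypothetical, the inputs (in particular the variance of Sharpe ratios across trials) would come from the pipeline's own trial log, and the repository's implementation may differ.

```python
import math
from statistics import NormalDist

EULER_GAMMA = 0.5772156649015329
_Z = NormalDist()  # standard normal


def expected_max_sharpe(sr_variance, n_trials):
    """Expected max Sharpe ratio among n_trials (> 1) zero-skill trials."""
    return math.sqrt(sr_variance) * (
        (1 - EULER_GAMMA) * _Z.inv_cdf(1 - 1 / n_trials)
        + EULER_GAMMA * _Z.inv_cdf(1 - 1 / (n_trials * math.e))
    )


def deflated_sharpe(sr_hat, n_obs, skew, kurtosis, sr_variance, n_trials):
    """Probability that the true Sharpe exceeds the selection-bias benchmark.

    sr_hat: observed per-period (not annualized) Sharpe ratio
    n_obs: number of return observations
    kurtosis: raw kurtosis (3.0 for normal returns)
    """
    sr0 = expected_max_sharpe(sr_variance, n_trials)
    denom = math.sqrt(1 - skew * sr_hat + (kurtosis - 1) / 4 * sr_hat ** 2)
    return _Z.cdf((sr_hat - sr0) * math.sqrt(n_obs - 1) / denom)
```

Holding everything else fixed, raising `n_trials` raises the benchmark `sr0` and lowers the deflated value, which is how the test penalizes strategies selected from a large search.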

Repository

pipeline/MANUAL.md
pipeline/README.md
pipeline/config/pipeline_config.yaml

Reproducibility

Pinned lockfile (requirements-lock.txt)
Per-run artifacts under pipeline/output/<run_id>/
Permanent snapshots

Honesty

Every run writes significance.json
Live-vs-backtest tracker
All fallbacks logged, all NaNs preserved

This methodology document describes quantitative research techniques for educational purposes only. It is NOT investment advice and should not be used as the basis for any investment decision. Model outputs are hypothetical and past performance is not indicative of future results.