AutoTrader

An ML-Powered Stock Prediction & Trading System

Personal Project · 2024–Present · Live in Production

The Story

AutoTrader started the way a lot of side projects do: with a question I couldn't let go of. I'd been working with recommendation systems at my day job, and it struck me that the core problem—predicting what a person will want next based on noisy, incomplete signals—isn't that different from predicting where a stock will move next based on noisy, incomplete market data.

So I started small. A few tickers, a basic feature set, a model that ran on my M3 laptop. But as I dug in, the scope grew naturally. A single ticker needed multi-timeframe analysis. Multi-timeframe analysis needed richer features. Richer features needed a real data pipeline. A real data pipeline needed cloud infrastructure. And before long, I was building a system that ingests data for 600+ tickers every night, engineers 400+ features from eight distinct sources, trains over 1,800 models, and delivers ranked predictions to subscribers before the opening bell.

Every component was designed and built by me from scratch. It runs autonomously on Google Cloud for about $55 a month, and it's become the most technically satisfying project I've worked on—a place where I get to combine ML modeling, data engineering, infrastructure design, and product thinking all in one system.

1,800+ Trained Models
400+ Engineered Features
600+ Tickers Covered
~$55/mo Total Infrastructure Cost

System Architecture

The system is split across two GCP virtual machines, coordinated through Google Cloud Storage and PostgreSQL. The separation isn't arbitrary: feature engineering is I/O-bound (lots of API calls and database writes), while model training is CPU-bound (lots of number crunching). Putting them on different VMs means I can right-size each machine's resources without overpaying for either workload.

Everything is orchestrated by cron jobs that hand off data downstream in sequence. There are no manual steps in the daily workflow—from raw market data to delivered email predictions, the system runs end-to-end without intervention.
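
As an illustration, the whole hand-off fits in a handful of crontab entries across the two VMs. The schedule mirrors the workflow below; the script paths are hypothetical:

    # VM2 -- nightly data collection & feature engineering (12:10 AM EST)
    10 0 * * * /opt/autotrader/collect_and_engineer.sh

    # VM3 -- comprehensive retrain on Saturdays, incremental on weekdays (3:00 AM)
    0 3 * * 6   /opt/autotrader/train_comprehensive.sh
    0 3 * * 1-5 /opt/autotrader/train_incremental.sh

    # VM3 -- inference at 5:00 AM, tiered email delivery at 5:30 AM (weekdays)
    0 5 * * 1-5  /opt/autotrader/run_inference.sh
    30 5 * * 1-5 /opt/autotrader/send_emails.sh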

Daily Workflow (EST)

12:10 AM
Data Collection
VM2 pulls OHLCV data for 600+ tickers from the EODHD API, computes 400+ features per ticker, and writes everything to PostgreSQL and GCS. Takes about 45–60 minutes.
3:00 AM (Sat)
Comprehensive Training
VM3 retrains all ~1,800 models with Optuna hyperparameter optimization and walk-forward validation. Incremental training (100–200 models) runs on weekdays.
5:00 AM
Inference
VM3 loads every active model and generates predictions for the upcoming trading day. Each prediction combines a directional call with a magnitude estimate and a confidence score.
5:30 AM
Email Delivery
Tiered emails go out to subscribers with ranked predictions, market sentiment context, and analysis reports—all before the 9:30 AM open.

Infrastructure

VM2: Feature Engineering

2 vCPU, 8 GB RAM · $15/mo


EODHD API data collection
Feature computation (400+)
VWAP label generation
Auxiliary data (PCR, VIX, sentiment)

GCS + PostgreSQL

Shared data layer · $0.30/mo


Raw OHLCV data & feature store
Trained model artifacts
Model registry & predictions
Subscriber management

VM3: Training & Inference

4 vCPU, 16 GB RAM · $40/mo


Dual model training (XGBoost)
Optuna hyperparameter search
Daily prediction generation
Tiered email delivery

Data flow: EODHD API → VM2 (collect & engineer) → GCS / PostgreSQL (store) → VM3 (train & predict) → Subscribers (email)

Pipeline Details

Data Collection & Feature Engineering

This is where the raw ingredients come from. Every night, the pipeline pulls fresh market data from the EODHD API for all S&P 500 constituents plus 184 ETFs, then transforms that data into a rich set of 400+ engineered features. The features come from eight distinct sources, each capturing a different dimension of market behavior.

How It Works

  • Check daily API usage against the 100K call limit (and queue any overage for backfill)
  • Prioritize high-signal tickers (SPY, QQQ, AAPL processed first)
  • Fetch OHLCV data across three timeframes (daily, weekly, monthly)
  • Store raw data to both PostgreSQL and GCS for redundancy
  • Run the HybridFeatureComputationPipeline to compute 400+ features per ticker
  • Collect auxiliary signals: CBOE Put/Call ratio, CNN Fear & Greed Index, ApeWisdom social sentiment, Forex Factory economic calendar
  • Compute VWAP-based labels that serve as training targets for downstream models
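
Of those steps, label generation is the one everything downstream depends on. Here's a minimal sketch of a VWAP-style label, assuming daily OHLCV bars in a pandas DataFrame; the pipeline's actual label definition may differ:

    import pandas as pd

    def vwap_labels(df: pd.DataFrame) -> pd.DataFrame:
        # df holds daily OHLCV bars with columns: high, low, close, volume
        vwap_proxy = (df["high"] + df["low"] + df["close"]) / 3  # typical price as a single-bar VWAP proxy
        fwd_move = vwap_proxy.shift(-1) / vwap_proxy - 1         # next day's move, aligned to today's features
        out = pd.DataFrame(index=df.index)
        out["magnitude"] = fwd_move.abs() * 100                  # regressor target: absolute % move
        out["direction"] = (fwd_move > 0).astype(int)            # classifier target: 1 = bullish
        return out.dropna()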

Feature Sources (8 Families)

  • TA-Lib Technical Indicators (~200): trend (EMA, MACD, ADX), momentum (RSI, Stochastic), volatility (ATR, Bollinger), volume (OBV, CMF), plus 61 candlestick patterns
  • EODHD Fundamentals (~100): valuation ratios (P/E, P/B, P/S), dividend yield, EPS estimates, and news sentiment polarity scores
  • Cross-Industry Signals (~100): sector rotation strength, defensive vs. cyclical momentum, market breadth, and capitulation detection
  • Kinematics (~80): derivatives of price movement (velocity, acceleration, jerk), turning point patterns, and momentum extremes
  • All-Time-High Analysis (~45): days since ATH, drought severity and frequency, statistical significance of proximity to highs
  • Enhanced VWAP (~38): calendar and event flags (FOMC meetings, earnings dates, options expiration), institutional activity signals
  • Auxiliary & Sentiment (~30): economic calendar events, Fear & Greed index, Reddit trending stocks (ApeWisdom), CBOE Put/Call ratios
  • Intraday (~20): 5-minute VWAP, intraday momentum profiles, hour-of-day effects
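
To give a flavor of the TA-Lib family, several of the indicators above take a single call each. A sketch on synthetic data, not the pipeline's code:

    import numpy as np
    import talib

    # Stand-in daily close series; the real pipeline feeds in EODHD data
    close = 100.0 + np.cumsum(np.random.default_rng(0).normal(0, 1, 252))

    rsi = talib.RSI(close, timeperiod=14)                      # momentum
    macd, macd_signal, macd_hist = talib.MACD(close)           # trend
    upper, middle, lower = talib.BBANDS(close, timeperiod=20)  # volatility (Bollinger)
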
Model Training

The core insight behind the training architecture is that direction and magnitude are fundamentally different prediction tasks and benefit from being modeled separately. Every ticker/timeframe combination gets two XGBoost models: a classifier that predicts whether the stock goes up or down, and a regressor that predicts by how much.

Dual Model Architecture

Training two models per ticker lets each be optimized for what it's best at:

  • Classifier (XGBClassifier): Predicts direction (bullish or bearish). Optimized on AUC. Trained on filtered data with low-movement noise days removed (bottom 20% by absolute VWAP move), so it focuses on days when the market is actually making a call.
  • Regressor (XGBRegressor): Predicts magnitude (% expected move). Optimized on directional accuracy. Trained on the full dataset to capture the complete range of outcomes, including quiet days.

At inference time, the two predictions are combined:
    signal = (direction * 2 - 1) * magnitude
    confidence = probability * (1 + magnitude)

This produces a single confidence-ranked score that captures both conviction and expected size of the move.
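
In code, the fusion step is tiny. A sketch of the formulas above, with one assumption made explicit: "probability" is read as the winning class's probability:

    def combine(prob_up: float, magnitude: float) -> tuple[float, float]:
        # prob_up: classifier's bullish probability; magnitude: regressor's predicted absolute % move
        direction = int(prob_up >= 0.5)
        probability = prob_up if direction else 1 - prob_up  # winning-class probability (my assumption)
        signal = (direction * 2 - 1) * magnitude             # signed expected move
        confidence = probability * (1 + magnitude)           # conviction weighted by expected size
        return signal, confidence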

Training Process

  • Priority queue: Models are queued by strategy (worst-performing first, new tickers first, etc.) so training time is spent where it has the most impact
  • Data loading: Features and VWAP labels pulled from PostgreSQL/GCS
  • Noise filtering: Bottom 20th percentile of movement days removed for classifier training
  • Walk-forward validation: 5 expanding-window folds that respect temporal ordering (no future data leakage)
  • Hyperparameter optimization: Optuna runs 30–50 trials per model, searching across learning rate, max depth, subsample, and regularization
  • Evaluation: AUC, directional accuracy, and separate bullish/bearish accuracy tracked per fold
  • Model storage: Artifacts saved to GCS, metadata registered in PostgreSQL
  • Lifecycle management: Top 3 model versions retained per ticker/timeframe; older versions pruned automatically
Comprehensive training (all ~1,800 models) runs every Saturday and takes 2–4 hours. Incremental training (100–200 models) runs on weekdays in 20–60 minutes, focusing on new tickers and underperformers.
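
To make the Optuna/walk-forward interaction concrete, here's a stripped-down sketch of one classifier's tuning loop. Noise filtering, the full search space, and model persistence are omitted, and sklearn's TimeSeriesSplit stands in for the expanding-window folds:

    import numpy as np
    import optuna
    import xgboost as xgb
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import TimeSeriesSplit

    def objective(trial: optuna.Trial, X: np.ndarray, y: np.ndarray) -> float:
        params = {
            "learning_rate": trial.suggest_float("learning_rate", 0.01, 0.3, log=True),
            "max_depth": trial.suggest_int("max_depth", 3, 9),
            "subsample": trial.suggest_float("subsample", 0.5, 1.0),
            "reg_lambda": trial.suggest_float("reg_lambda", 1e-3, 10.0, log=True),
        }
        aucs = []
        # Expanding-window folds: each fold trains strictly on the past
        for train_idx, test_idx in TimeSeriesSplit(n_splits=5).split(X):
            model = xgb.XGBClassifier(**params, n_estimators=200)
            model.fit(X[train_idx], y[train_idx])
            aucs.append(roc_auc_score(y[test_idx], model.predict_proba(X[test_idx])[:, 1]))
        return float(np.mean(aucs))

    # study = optuna.create_study(direction="maximize")
    # study.optimize(lambda t: objective(t, X, y), n_trials=50)
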
Inference & Prediction

Every weekday morning at 5:00 AM, the inference pipeline loads all active models and generates a prediction for each ticker/timeframe pair. The output is a ranked list of the day's highest-confidence predictions, ready for delivery.

How It Works

  • Query the model registry for all active models (status = active)
  • For each ticker/timeframe: load the classifier and regressor from GCS
  • Load the most recent features for the current prediction date
  • Generate a direction prediction (bullish/bearish) with probability
  • Generate a magnitude prediction (% expected move)
  • Combine into a single confidence-ranked score
  • Store all predictions in PostgreSQL and upload a snapshot to GCS
  • Rank by confidence and split into top bullish and top bearish lists
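
The ranking step at the end is deliberately simple. A sketch, assuming the day's predictions are already in a DataFrame with signal and confidence columns:

    import pandas as pd

    def rank_predictions(preds: pd.DataFrame, top_n: int = 10) -> tuple[pd.DataFrame, pd.DataFrame]:
        # Sort by conviction, then split into the day's best long and short ideas
        ranked = preds.sort_values("confidence", ascending=False)
        bullish = ranked[ranked["signal"] > 0].head(top_n)
        bearish = ranked[ranked["signal"] < 0].head(top_n)
        return bullish, bearish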

Current Production Scale

937 Predictions Generated Daily
358 Tickers with Active Models
~32/min Prediction Throughput

Email Delivery & Subscriptions

The delivery system takes predictions and wraps them in context: market sentiment, economic calendar events, and analysis reports. Subscribers receive content matched to their tier, delivered as polished HTML emails with optional attachments.

Delivery Workflow

  • Validate PostgreSQL tunnel connectivity (auto-start if needed)
  • Check data freshness via TradingDayValidator—trigger a sync if data is stale
  • Generate or load analysis reports for the current trading day
  • Load predictions from PostgreSQL
  • Collect market context: Put/Call ratio, Fear & Greed index, social sentiment (ApeWisdom), Forex Factory economic calendar
  • Load subscriber list and filter by tier
  • Render tier-specific HTML emails with appropriate attachments
  • Send via SMTP with a lock file to prevent duplicate sends
  • SMS notification to admin on success or failure
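
The duplicate-send guard is a classic exclusive-create lock file. A minimal sketch with a hypothetical lock path and credentials elided:

    import os
    import smtplib
    from email.message import EmailMessage

    LOCK = "/tmp/autotrader_email.lock"  # hypothetical path; one lock per trading day

    def send_once(msg: EmailMessage, host: str = "smtp.gmail.com") -> bool:
        try:
            fd = os.open(LOCK, os.O_CREAT | os.O_EXCL | os.O_WRONLY)  # atomic: fails if lock exists
        except FileExistsError:
            return False  # another run already delivered today's emails
        try:
            with smtplib.SMTP_SSL(host, 465) as smtp:
                # smtp.login(user, password)  # credentials elided
                smtp.send_message(msg)
            return True
        finally:
            os.close(fd)  # the lock file itself stays in place to block re-sends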

Subscription Tiers

Content scales with tier: everyone gets predictions, but the depth of analysis and the number of picks increase as you move up.

Tier          Predictions  Analysis              Attachments
Basic         Top 5        Summary               None
Premium       Top 10       Market context        TXT
Professional  All          Full reports          TXT + PDF
Secret        All          Full + econ calendar  TXT + PDF

Design Decisions

A system like this involves hundreds of small choices. Here are the ones that shaped the architecture most significantly—and the reasoning behind each.

Why dual models instead of one?

Early on I tried a single model that predicted signed returns directly. It was mediocre at both direction and magnitude. Splitting the problem into a classifier ("which way?") and a regressor ("how far?") lets each model focus on what it does best. The classifier trains on filtered data with noise days removed; the regressor sees the full distribution. The combined signal is stronger than either alone.

Why walk-forward validation?

Standard K-fold cross-validation would happily train the model on Thursday's data and then evaluate it on the previous Tuesday. In financial data, that's cheating: any time-series pattern, regime change, or structural break gets leaked across the boundary. Walk-forward validation with expanding windows respects temporal ordering, which means the performance estimates I get are realistic rather than flattering.

Why VWAP labels instead of close-to-close returns?

A stock can gap up 2% at the open, sell off all day, and still show a positive daily return. Close-to-close labels miss the intraday story. Volume-Weighted Average Price captures where trading activity actually concentrated, giving a more honest signal about directional intent. It takes more work to compute, but it produces better labels for training.
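
For reference, the standard session definition is VWAP = Σ(price_i × volume_i) / Σ volume_i, with the sum running over the session's bars or trades.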

Why separate VMs?

Feature engineering spends most of its time waiting on API responses and writing to databases (I/O-bound). Model training spends most of its time in XGBoost's gradient computations (CPU-bound). Running both on a single VM would mean paying for 16 GB of RAM during data collection when I only need 8, or paying for beefy CPUs during the data pipeline when they'd sit idle. The 2-VM split lets me right-size each workload and keeps total costs under $55/month.

Why filter low-movement days for the classifier?

On days when a stock barely moves (less than 0.1%), predicting "up" or "down" is basically a coin flip—and training on those coin flips adds noise without signal. Removing the bottom 20% of movement days by absolute VWAP change lets the classifier focus on days when the market is actually making a directional commitment. The regressor still sees all data, because even small-movement days carry information about magnitude distributions.
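
The filter itself is one line of pandas. A sketch, reusing the magnitude label from the data-collection section:

    import pandas as pd

    def drop_noise_days(labels: pd.DataFrame, pct: float = 0.20) -> pd.DataFrame:
        # Keep only days whose absolute VWAP move clears the bottom-20% threshold
        threshold = labels["magnitude"].quantile(pct)
        return labels[labels["magnitude"] > threshold]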

Why build everything from scratch?

Partly because I wanted to understand every piece of the system at a level that using off-the-shelf solutions wouldn't give me. But also because the constraints of a personal project—tight budget, single maintainer, zero tolerance for pager fatigue—reward simplicity. Cron jobs, PostgreSQL, and GCS are boring, well-understood technologies. That's the point. I'd rather spend my engineering time on feature research and model architecture than debugging Kubernetes manifests.

Tech Stack

Machine Learning

XGBoost · Optuna · Scikit-learn · TA-Lib · FAISS · Pandas · NumPy · SciPy

Data Sources

EODHD API · CBOE · CNN Fear & Greed · ApeWisdom · Forex Factory · AAII Sentiment

Infrastructure

GCP Compute Engine · Google Cloud Storage · PostgreSQL · SQLite · Cron · SSH Tunnels

Trading & Delivery

Alpaca API · SMTP / Gmail · SMS Alerts · Stripe

Languages & Tools

Python · Bash · SQL · Git · Selenium

What's Next

AutoTrader is a living system—it runs in production daily, but it's also my primary playground for exploring new ideas. A few things on the roadmap:

  • Ensemble methods: Exploring how to combine predictions across timeframes (daily, weekly, monthly) into a single multi-horizon signal, weighted by each model's recent accuracy.
  • Transformer-based models: The current XGBoost approach works well on tabular features, but I'm curious whether attention mechanisms over raw price sequences could capture patterns that hand-engineered features miss.
  • Portfolio optimization: Moving beyond individual ticker predictions to portfolio-level allocation—factoring in correlation, sector exposure, and risk constraints.
  • Real-time inference: Currently predictions run once daily. Exploring whether intraday feature updates and streaming inference could capture opportunities that the overnight pipeline misses.