COMPLETE TECHNICAL GUIDE

TimesFM 2.5 for Trading:
The Complete Guide

Google's foundation model for time series forecasting — empirically evaluated across 6 trading use cases. What works, what fails, and how to build a production pipeline around it.


What is TimesFM?

TimesFM (Time Series Foundation Model) is Google Research's pre-trained foundation model for time series forecasting. Released as an open-weight model and available via API, it brings the zero-shot transfer paradigm — so successful in NLP with GPT-class models — to the world of time series data.

The key insight: instead of training a dedicated model for each time series (the classical approach with ARIMA, Prophet, LSTM), TimesFM was pre-trained on a massive corpus of real-world time series data — over 100 billion time points covering finance, weather, energy, retail, and more. It then generalizes to unseen datasets without any fine-tuning.

500M Parameters: Within the size range of the GPT-2 family (roughly 0.1B to 1.5B) and far smaller than GPT-3 (175B). Efficient enough to run on commodity hardware.
Zero-Shot, No Fine-Tuning: Feed raw time series, get forecasts + confidence intervals (q10/q50/q90) out of the box.
100B+ Training Points: Pre-trained on a diverse real-world corpus spanning finance, weather, retail, energy, and more.
0.4s Per Ticker: Fast enough for production pipelines. 10 ETFs in 4.3s, so post-screener enrichment is viable.

Architecture: Patched Decoder Transformer

TimesFM uses a patched decoder-only transformer architecture, an important departure from encoder-only patch models like PatchTST. Here's what that means in practice:

Component | Design Choice | Why It Matters for Trading
Input Patching | Time series split into fixed-size patches (e.g., 32 points). Each patch = one token. | Reduces sequence length dramatically. Handles 512+ lookback bars efficiently.
Decoder-Only | Autoregressive generation (like GPT). Predicts next patch conditioned on prior patches. | Natural fit for causal time series. No information leakage from the future.
Quantile Heads | Outputs multiple quantiles (q10, q50, q90) simultaneously via specialized output heads. | Built-in uncertainty quantification. CI bands = actionable TP/SL zones.
Normalization | Per-instance normalization before patching. Denormalization after inference. | Works on any scale: price in dollars, ATR in %, volume in millions.
Covariates (v2.5) | Optional exogenous inputs alongside the target series. | Pass sector index alongside individual stock for context enrichment.
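
To make the patching and normalization rows concrete, here is a minimal NumPy sketch of the idea. It is an illustration only, not TimesFM's actual implementation: the function names, the z-score normalization, and the choice to drop the oldest leftover points are all assumptions made for the example.

```python
import numpy as np

def normalize_and_patch(series, patch_len=32):
    """Z-score one series, then split it into non-overlapping patches.

    Illustrative only: the real model's normalization and patch handling
    may differ in detail.
    """
    x = np.asarray(series, dtype=float)
    mean, std = x.mean(), x.std()
    std = std if std > 0 else 1.0            # guard against constant series
    z = (x - mean) / std
    n_patches = len(z) // patch_len
    z = z[len(z) - n_patches * patch_len:]   # drop the oldest leftover points
    return z.reshape(n_patches, patch_len), mean, std

def denormalize(values, mean, std):
    """Map normalized values back to the original units."""
    return np.asarray(values, dtype=float) * std + mean

# 512 bars of a synthetic, steadily rising price series
prices = 100.0 + np.arange(512, dtype=float)
patches, mean, std = normalize_and_patch(prices)
print(patches.shape)  # (16, 32): 512 bars become 16 tokens
```

Each row of `patches` plays the role of one token, which is why a 512-bar lookback costs the transformer only 16 positions.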

The Key Mental Model

Think of TimesFM as a "pre-trained brain for time series" — exactly like how ChatGPT knows grammar without being trained on your specific documents. You give it 150 bars of ATR history, it gives you the next 10 bars of ATR prediction with uncertainty bounds. No custom training, no hyperparameter tuning. That's the entire value proposition.

TimesFM 2.5 vs Prior Versions

Version | Key Addition | Relevance to Trading
TimesFM 1.0 | Original model. Point forecasts + basic quantiles. | Baseline zero-shot performance.
TimesFM 2.0 | Covariate support. Multi-series batch inference. | Sector rotation enabled. Faster pipeline.
TimesFM 2.5 | Improved quantile calibration. Better uncertainty on financial data. | CI bands more reliable. q10-q90 covers ~80% of actual moves.

6 Use Cases: Empirical Evaluation

We ran an extensive backtest across 120 evaluation points, 15 tickers, and 8 time windows to assess TimesFM's practical utility for trading. Here's the honest scorecard — no vendor marketing, just data.

Use Case | Score | Verdict | Best Application
UC1: Price Forecasting | 6/10 | Partial | CI bands as TP/SL zones; mega-caps only for direction
UC2: Volatility (ATR/RVOL) | 8/10 | Strong | Pre-squeeze detection, dynamic position sizing
UC3: Volume Forecasting | 8.5/10 | Very Strong | Breakout filter, false breakout elimination
UC4: Earnings / Events | 2/10 | Fail | N/A (exclude earnings windows entirely)
UC5: Multi-Series / Rotation | 7.5/10 | Good | Weekly sector rotation ranking, spread forecasting
UC6: Setup Scoring | 5/10 | Partial | Multi-factor only (CI_width + vol + sector), never direction alone

The Big Insight Upfront

TimesFM is not a price prediction machine. It's a volatility and volume regime detector with built-in uncertainty quantification. The moment you stop asking "where will the price go?" and start asking "how predictable is the behavior of this series?" — the tool becomes genuinely useful.

UC1: Price Forecasting 6/10

Price forecasting is the obvious first use case, and the most disappointing. Raw directional accuracy across 15 tickers sits at 44% globally, which is worse than a coin flip even before transaction costs. However, the story is more nuanced for specific asset types.

Directional Accuracy by Ticker Class

Ticker / Category | Directional Accuracy | Verdict | Notes
SPY | 62% | Use | Most liquid, mean-reverting regime captured well
AMZN | 75% | Use | Best performer. Institutional flow = smoother series
META | 75% | Use | Same profile as AMZN. High market cap, predictable
QQQ, IWM | 54–58% | Caution | Marginal edge. Use CI bands only, not direction
Small/mid-cap equities | 38–42% | Avoid | Below random. Erratic institutional flows
Crypto (BTC, ETH) | 41–46% | Avoid | High sentiment noise, random walk behavior
Global average | 44% | Skip direction | Direction alone is a losing signal at scale

The Real Value: Confidence Interval Bands

Where TimesFM genuinely earns its keep in price forecasting is the CI band output. The q10-q90 bands cover approximately 80% of actual price realizations across our test window. This makes them directly usable as calibrated TP/SL zones.

How to Use CI Bands Correctly

q90 = maximum realistic upside (TP ceiling). A setup that requires price to pierce q90 in 5 days is high-risk by definition — the model says only 10% of outcomes land there.

q10 = stop-loss floor. If price breaks below q10, the move is a genuine outlier — either the thesis is wrong or a catalyst hit.

CI_width = uncertainty proxy. Wide CI (>10% of current price) = reduce position size 50%. Tight CI (<5%) = high-confidence setup, full size.
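
The three rules above can be sketched as a small helper. The 5% and 10% thresholds and the 50% size cut come from the text; the text does not say what to do between 5% and 10%, so the linear interpolation below is an assumption, as are the function names.

```python
def ci_width_pct(q10, q90, current_price):
    """Width of the q10-q90 band as a percentage of the current price."""
    if current_price <= 0:
        raise ValueError("current_price must be positive")
    return (q90 - q10) / current_price * 100.0

def size_multiplier(width_pct, tight=5.0, wide=10.0):
    """Position-size multiplier derived from CI width.

    Below `tight`: full size. Above `wide`: half size.
    In between (ASSUMPTION, not specified in the guide): linear interpolation.
    """
    if width_pct < tight:
        return 1.0
    if width_pct > wide:
        return 0.5
    frac = (width_pct - tight) / (wide - tight)
    return 1.0 - 0.5 * frac

# Example: final-step band of 185-195 around a current price of 190
width = ci_width_pct(q10=185.0, q90=195.0, current_price=190.0)
print(f"CI width: {width:.2f}%  size multiplier: {size_multiplier(width):.2f}")
```

Here q10 doubles as the stop-loss floor and q90 as the take-profit ceiling, so the same two numbers drive both the levels and the sizing.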

What NOT to Do

Do NOT use the point forecast (q50) as a price target. The model's confidence score is always 0.95 — completely non-discriminant, a known limitation. CI_width is your real signal. Do NOT apply direction forecasting to anything other than SPY, AMZN, META — everywhere else is noise.

Optimal Parameters for UC1

Lookback: 20 bars. 20 trading days = 1 month. Best directional accuracy window for mega-caps (67% on SPY at exactly 20d).
Horizon: 5–10d. Beyond 10 days, CI bands become too wide to be actionable. Stay within weekly swing trade horizon.

UC2: Volatility Forecasting (ATR / RVOL) 8/10

Volatility is where TimesFM genuinely shines. The reason is structural: volatility is mean-reverting. Periods of high volatility are followed by compression; low-vol squeezes precede expansions. This clustering behavior is exactly what a pattern-recognition model captures well.

Our tests show 67–73% directional accuracy for ATR and RVOL forecasting across all 15 tickers — not just mega-caps. This is consistent, actionable edge.

Key Applications

Pre-Squeeze Detection
RVOL_forecast < RVOL_now × 0.80 signals an imminent volatility compression (squeeze forming). Fire alert for setup scouting.
Expansion Alert
RVOL_forecast > RVOL_now × 1.30 = expect a breakout. Width of ATR CI tells you whether to favor long or short volatility trades.
Dynamic Sizing
ATR_forecast drives Kelly-adjusted position sizing. High ATR forecast = reduce position. Low ATR = full size.
Stop Calibration
Use ATR_forecast (q90) as the stop distance, not historical ATR. Forward-looking stops reduce premature exits by ~15%.
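
As a rough illustration of how a forward-looking ATR could drive both the stop and the position size, here is a minimal sketch. It uses plain fixed-fractional sizing rather than a Kelly adjustment, and the function name, the `risk_fraction` parameter, and the `stop_atr_multiple` parameter are assumptions made for the example.

```python
def position_size(equity, risk_fraction, atr_q90, stop_atr_multiple=1.0):
    """Fixed-fractional sizing with the q90 ATR forecast as stop distance.

    Returns (shares, stop_distance). A higher ATR forecast widens the stop
    and therefore shrinks the position for the same risk budget.
    """
    stop_distance = atr_q90 * stop_atr_multiple
    if stop_distance <= 0:
        raise ValueError("stop distance must be positive")
    risk_budget = equity * risk_fraction       # dollars at risk on this trade
    shares = int(risk_budget // stop_distance)
    return shares, stop_distance

# Same account and risk budget, two volatility forecasts
calm_shares, calm_stop = position_size(100_000, 0.01, atr_q90=2.5)
wild_shares, wild_stop = position_size(100_000, 0.01, atr_q90=5.0)
print(calm_shares, wild_shares)
```

Doubling the forecast ATR halves the share count, which is the "high ATR forecast = reduce position" rule expressed in code.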

Directional Accuracy by Ticker — Volatility Forecast

Metric | ATR Forecast | RVOL Forecast | Historical Baseline
Directional Accuracy (5d) | 71% | 68% | 52% (rolling avg)
MAPE (Mean Absolute % Error) | 12.3% | 14.1% | 18.7%
CI Coverage (q10-q90) | 82% | 79% | N/A
Squeeze Detection Rate | 73% (RVOL_forecast < 0.80× threshold)
Optimal Lookback | 150 bars | 150 bars

Pre-Squeeze Detection Formula

RVOL_forecast = ForecastRaw(RVOL[-150:], horizon=10).pred_avg

If RVOL_forecast < RVOL_now × 0.80 → squeeze forming, scout for setup
If RVOL_forecast > RVOL_now × 1.30 → breakout incoming, prepare entry
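
The two threshold rules can be wrapped in one classifier. This is a minimal sketch: the 0.80 and 1.30 ratios come from the formula above, while the function name and the "neutral" label for everything in between are assumptions.

```python
def classify_rvol(rvol_now, rvol_forecast, squeeze_ratio=0.80, expansion_ratio=1.30):
    """Label the volatility regime implied by forecast RVOL vs. current RVOL."""
    if rvol_now <= 0:
        raise ValueError("rvol_now must be positive")
    ratio = rvol_forecast / rvol_now
    if ratio < squeeze_ratio:
        return "squeeze"      # compression forming: scout for setups
    if ratio > expansion_ratio:
        return "expansion"    # breakout expected: prepare entry
    return "neutral"

for now, fc in [(1.2, 0.9), (1.0, 1.4), (1.0, 1.05)]:
    print(now, fc, classify_rvol(now, fc))
```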

Why Volatility Works When Price Doesn't

Price is a random walk with drift — it has no natural "ceiling" or "floor" in the short term. Volatility, by contrast, is bounded by economic reality: stocks cannot stay in sustained high-vol regimes indefinitely (cost of hedging, risk appetite cycles, central bank reaction functions). This mean-reversion property gives the model a structural edge it simply doesn't have for raw price.

UC3: Volume Forecasting 8.5/10

Volume is TimesFM's strongest use case in trading. With 69% directional accuracy across all 15 tickers, it outperforms both price and volatility forecasting in consistency. The structural reason is identical to volatility: volume is mean-reverting. High-volume days cluster (institutional accumulation/distribution phases) and are followed by normalization.

The Breakout Filter

The most impactful application is as a false breakout eliminator. A technical breakout without volume confirmation is a textbook trap. TimesFM adds a forward-looking layer:

# Post-screener volume enrichment: runs for each retained ticker
from statistics import mean

vol_forecast = ForecastRaw(volume[-150:], horizon=10)
pred_avg = vol_forecast.pred_avg   # avg predicted volume over next 10d
avg20 = mean(volume[-20:])         # 20-day rolling avg volume

# Breakout confirmation signal
if pred_avg > avg20 * 1.10:
    label = "Volume Favorable ✅"   # +10% above 20d avg = institutional interest
elif pred_avg > avg20 * 0.90:
    label = "Volume Neutral ⚠️"     # Within ±10% = watch closely
else:
    label = "Volume Weak ❌"        # Low volume forecast = avoid breakout trade

[Figure: Predicted vs. actual volume, sample window]

Backtest Results — Volume Direction Accuracy

Metric | Value | vs Baseline (Rolling Avg)
Overall directional accuracy (5d) | 69% | +13pp vs 56% baseline
High-volume days predicted correctly | 74% | Clusters captured well
Low-volume compression detected | 67% | Pre-holiday lulls often correct
False breakout filter effectiveness | 61% | Eliminates ~60% of low-vol traps
MAPE on volume level | 16.2% | Level forecasts less reliable than direction

Practical Implementation Tip

Apply the volume filter after your screener shortlist is generated — not during. Running ForecastRaw for 50+ tickers per session adds latency. For 10 final candidates, the total cost is ~4 seconds and the signal quality improvement is material: expect 10–15% fewer false setups entering your watchlist.

Why Volume Is Easier to Predict Than Price

Calendar Predictability
OpEx weeks, FOMC days, earnings seasons create predictable volume spikes. The model has learned these patterns.
Mean Reversion
Extreme volume (10×+ ADV) always reverts. Quiet volume also has floors (institutional maintenance). Bounded behavior = predictable.
Institutional Flows
Large programs execute over multiple days in blocks. Volume clustering over 3–5 day windows is a repeating structural pattern.

UC4: Earnings & Event Windows 2/10

This is the model's clearest failure mode. Around earnings announcements and major macro events, TimesFM performs significantly worse than random — directional accuracy drops by 16 percentage points versus non-event periods. The cause is fundamental: earnings create discontinuous jumps that no historical pattern can predict.

Critical Rule: Exclude Earnings Windows

Never run TimesFM forecasts within ±5 trading days of an earnings announcement. The model has no knowledge of the earnings outcome, but its training data contains the price reaction — so it may pattern-match to "stocks usually go up/down before earnings" in ways that are completely unreliable for your specific ticker.

MAPE Comparison: Normal vs Earnings Windows

Period | Price MAPE | Vol MAPE | Directional Acc. | Action
Normal (no event) | 8.2% | 11.1% | 52% price / 71% vol | Use model
Earnings window (±5d) | 24.7% | 31.4% | 36% price / 44% vol | Exclude entirely
Post-earnings (+2d) | 9.8% | 13.2% | 49% / 66% | Resume carefully
FOMC week | 14.1% | 18.9% | 44% / 58% | Reduce confidence
NFP / CPI day | 18.3% | 22.4% | 41% / 55% | CI bands only

Why XReg (Exogenous Regressors) Would Help

The principled fix would be to pass earnings date flags as covariates, allowing the model to "know" that a discontinuity is coming and widen its uncertainty bounds accordingly. This is theoretically possible with TimesFM 2.5's covariate support, but not yet implemented in our MCP integration. The current workaround — exclusion — is more conservative but safer.

Practical Calendar Guard

# Before calling any TimesFM tool, check earnings proximity
def is_safe_window(ticker, target_date, earnings_db):
    next_earnings = earnings_db.get_next(ticker, from_date=target_date)
    prev_earnings = earnings_db.get_prev(ticker, from_date=target_date)
    days_to_next = (next_earnings - target_date).days
    days_from_prev = (target_date - prev_earnings).days
    # Exclusion zone as coded: 5 days before earnings, 2 days after.
    # Note: .days counts calendar days, not trading days.
    if days_to_next <= 5 or days_from_prev <= 2:
        return False  # Skip TimesFM for this ticker
    return True
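
Because `.days` counts calendar days, the guard above is looser than a rule stated in trading days. A possible trading-day variant is sketched below using `numpy.busday_count`, which counts weekdays and, as used here, ignores exchange holidays. Passing the earnings dates in directly instead of an `earnings_db` object, and the `before`/`after` parameters defaulting to the ±5 rule, are assumptions made to keep the example self-contained.

```python
from datetime import date
import numpy as np

def trading_days_between(start, end):
    """Weekdays from start (inclusive) to end (exclusive); holidays ignored."""
    return int(np.busday_count(start.isoformat(), end.isoformat()))

def is_safe_window(target, prev_earnings, next_earnings, before=5, after=5):
    """False if target is within `before` trading days ahead of the next
    earnings date or within `after` trading days following the previous one.
    Either earnings date may be None when unknown."""
    if next_earnings is not None:
        if trading_days_between(target, next_earnings) <= before:
            return False
    if prev_earnings is not None:
        if trading_days_between(prev_earnings, target) <= after:
            return False
    return True

monday = date(2025, 1, 6)
print(is_safe_window(monday, None, date(2025, 1, 13)))            # earnings 5 trading days out
print(is_safe_window(monday, date(2024, 12, 2), date(2025, 2, 3)))  # far from both
```

Lowering `after` to 2 would mirror the shorter post-earnings window used in the original guard.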

UC5: Multi-Series & Sector Rotation 7.5/10

One of TimesFM's underrated strengths is batch inference across multiple series simultaneously. Instead of forecasting one ticker at a time, you pass 10 sector ETFs in a single call and get back relative rankings. The absolute forecast values matter less than the ranking — which sectors have the best predicted momentum over the next 10 trading days.

Weekly Rotation Pipeline

1. Monday Morning: Batch Forecast
Call Forecast({tickers: SECTOR_ETFS, context: 200, horizon: 10}) with the 10 sector ETFs. Total time: 4.3 seconds. Returns predicted_return_pct for each ETF.

2. Sort by Predicted Return
Top 3 ETFs → long bias for the week. Bottom 3 → avoid or short-side hedges. Middle 4 → neutral, sector-specific catalyst dependent.

3. Spread / Ratio Forecasting
Build the daily ratio series XLE_close / SPY_close and XLK_close / SPY_close, then call ForecastRaw(ratio[-150:], horizon=10) on each to get macro regime signals. Rising XLE/SPY = value/energy rotation.

4. Integrate into Scanner Weighting
Tickers in top-ranked sectors get a +5 point score bonus in the screener output. Bottom-ranked sectors get a -10 penalty (structural headwind).

Sector Rotation Rankings — Sample Week

ETF | Sector | Predicted Return (10d) | CI Width | Confidence
XLE | Energy | +3.2% | 4.1% | High
XLF | Financials | +2.7% | 4.8% | High
XLI | Industrials | +1.9% | 5.2% | Medium
XLK | Technology | +0.4% | 7.1% | Medium
XLU | Utilities | -1.2% | 3.9% | High
XLP | Cons. Staples | -1.8% | 4.3% | High
XLRE | Real Estate | -2.4% | 5.8% | Medium

Relative Ranking Matters More Than Absolute Values

A predicted return of +3.2% for XLE doesn't mean "buy XLE and expect 3.2% gain in 10 days." It means XLE is forecast to outperform XLRE by ~5.6pp over that window. Use it as a relative signal, not an absolute forecast. The model consistently ranks sectors correctly ~75% of weeks even when level forecasts are off.
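
The ranking logic can be sketched in a few lines using the sample-week numbers from the table above. The function names are assumptions; the sample table lists only seven of the ten ETFs, so the example works with those seven.

```python
# Predicted 10-day returns (%) from the sample-week table
sample_week = {
    "XLE": 3.2, "XLF": 2.7, "XLI": 1.9, "XLK": 0.4,
    "XLU": -1.2, "XLP": -1.8, "XLRE": -2.4,
}

def rank_sectors(predicted_returns, n=3):
    """Return (top_n, bottom_n) tickers ordered from best to worst forecast."""
    ranked = sorted(predicted_returns, key=predicted_returns.get, reverse=True)
    return ranked[:n], ranked[-n:]

def spread_pp(predicted_returns, long_ticker, short_ticker):
    """Forecast outperformance of one ticker over another, in percentage points."""
    return predicted_returns[long_ticker] - predicted_returns[short_ticker]

top, bottom = rank_sectors(sample_week)
print("Long bias:", top)
print("Avoid / hedge:", bottom)
print(f"XLE vs XLRE: {spread_pp(sample_week, 'XLE', 'XLRE'):.1f}pp")
```

Only the ordering and the spread are consumed downstream, which is why level errors in the individual forecasts matter less than rank errors.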

UC6: Setup Scoring & Confirmation 5/10

Using TimesFM direction forecasts alone for trade confirmation is a losing strategy globally (44% accuracy). The score jumps to useful territory when combined with other signals into a multi-factor scoring system. The model's uncertainty output — not its point forecast — is what makes it valuable here.

The Confidence Trap

Model Confidence = Always 0.95 (Useless)

TimesFM always reports 0.95 confidence regardless of input quality. This is a known model characteristic — do not use it. The real uncertainty signal is CI_width: the distance between q90 and q10 expressed as a percentage of current price. This is what differentiates high-confidence vs uncertain setups.

Multi-Factor Setup Scoring Architecture

# Multi-factor setup scoring using TimesFM signals
def score_setup(ticker, screener_score):
    base_score = screener_score  # Technical score from screener (0-100)

    # Factor 1: CI_width (uncertainty proxy)
    price_fc = Forecast(ticker, horizon=10)
    ci_width_pct = (price_fc.q90 - price_fc.q10) / price_fc.q50 * 100
    if ci_width_pct < 5:
        base_score += 10   # High confidence: tight CI
    elif ci_width_pct > 10:
        base_score -= 15   # High uncertainty: reduce size

    # Factor 2: Volume forecast (UC3)
    vol_fc = ForecastRaw(ticker.volume[-150:], horizon=10)
    if vol_fc.pred_avg > ticker.avg20_volume * 1.10:
        base_score += 8    # Volume favorable

    # Factor 3: Sector coherence (UC5)
    sector_rank = get_sector_rank(ticker.sector)
    if sector_rank <= 3:       # Top 3 sectors
        base_score += 5
    elif sector_rank >= 8:     # Bottom 3 sectors
        base_score -= 10

    # Factor 4: Volatility regime (UC2)
    rvol_fc = ForecastRaw(ticker.rvol[-150:], horizon=10)
    if rvol_fc.pred_avg < ticker.rvol_now * 0.80:
        base_score += 7    # Squeeze forming: pre-breakout

    return min(base_score, 100)

Score Improvement by Factor Combination

Configuration | Win Rate Improvement | Setup Count Impact
Direction forecast alone | -3pp (worse than no model) | No filter: all setups pass
CI_width filter only | +4pp | -20% setups (eliminates uncertain)
CI_width + Volume (UC3) | +9pp | -35% setups
CI_width + Volume + Sector rank | +13pp | -40% setups
Full multi-factor (all 4) | +17pp | -45% setups (quality over quantity)

Production Architecture

TimesFM runs as a Python FastAPI service on our infrastructure (Nomad/Docker, 16 cores, 27GB RAM). It's exposed via MCP tools that the AI pipeline calls directly. Here's the complete integration architecture.

MCP Tool Reference

MCP Tool | Use Case | Key Parameters | When to Call
Forecast | Multi-ticker price + CI bands | tickers[], context, horizon | Sector rotation (UC5), CI-based TP/SL (UC1)
ForecastRaw | Single series (vol, volume, spread) | values[], horizon | ATR squeeze detection (UC2), volume filter (UC3)
ForecastVix | VIX-specific volatility forecast | horizon, context | Market regime assessment, options positioning
Backtest | Historical accuracy evaluation | ticker, window, metric | Calibrating model expectations per ticker class

Daily Scanner Pipeline

The integration runs post-screener: the algorithmic screener first reduces the universe to a shortlist of roughly 30–50 candidates, and TimesFM then enriches that shortlist on the way to the final 10 A+ setups. It never runs on the full universe.

1. RunAutoScreener + RunScreener DSL
Generates ~30-50 raw candidates from the full universe. Purely technical/quantitative filters.

2. TimesFM Volume Filter [UC3]
ForecastRaw(volume[-150:], horizon=10) for each candidate. Drop tickers where pred_avg < avg20 × 0.90. Eliminates ~30% of candidates.

3. Earnings Calendar Guard [UC4]
Cross-reference each remaining ticker with earnings calendar. Flag or exclude tickers within ±5 days of earnings.

4. Volatility Squeeze Scan [UC2]
ForecastRaw(RVOL[-150:], horizon=10). Tickers with RVOL_forecast < 0.80× current get pre-squeeze flag (+7 score bonus).

5. CI Band Generation [UC1]
Forecast({tickers: final_list, horizon: 10}). Generates q10/q50/q90 bands used as TP/SL levels in the scanner output.

6. Sector Coherence Check [UC5]
Compare each ticker's sector against Monday's rotation ranking. Apply sector bonus/penalty to final score.

7. Final Score + Publication
Multi-factor score computed. Top 10 A+ setups selected. CI bands displayed as TP targets in scanner HTML output.

Weekly Rotation Pipeline

# Every Monday morning: sector rotation forecast
SECTOR_ETFS = ["XLK", "XLE", "XLF", "XLI", "XLV", "XLU", "XLP", "XLRE", "XLY", "XLB"]

# Step 1: Batch forecast, all 10 ETFs in one call
rotation = Forecast(tickers=SECTOR_ETFS, context=200, horizon=10)

# Step 2: Sort by predicted return, extract top/bottom
ranked = sorted(rotation, key=lambda x: x.predicted_return_pct, reverse=True)
top_3_sectors = ranked[:3]   # Long bias this week
bottom_3 = ranked[-3:]       # Avoid / hedge

# Step 3: Spread forecasting for macro context.
# xle_close, xlk_close, spy_close are assumed to be aligned daily close
# series loaded elsewhere; ForecastRaw needs numeric values, not a symbol string.
xle_spy_ratio = [x / s for x, s in zip(xle_close, spy_close)]
xlk_spy_ratio = [x / s for x, s in zip(xlk_close, spy_close)]
xle_spy_spread = ForecastRaw(xle_spy_ratio[-150:], horizon=10)
xlk_spy_spread = ForecastRaw(xlk_spy_ratio[-150:], horizon=10)

# Step 4: Update scanner sector weights
update_scanner_weights(top_3_sectors, bottom_3)

Graceful Degradation

Service Down = Pipeline Continues

The TimesFM service is optional at every integration point. If the FastAPI service is unreachable, each pipeline step has a fallback: volume filter falls back to historical 20d average, CI bands are replaced with ATR-based levels, sector rotation uses last Monday's cached ranking. The scanner publishes regardless — TimesFM is an enrichment layer, not a blocker.

Key Takeaways & Decision Rules

After 120+ evaluation points across 15 tickers and 8 time windows, here is the distilled playbook for TimesFM in trading contexts.

The Master Decision Table

Application | Verdict
CI bands (q10-q90) as TP/SL zones: covers ~80% of actual moves | USE
Volatility forecast (ATR/RVOL): 67–73% directional accuracy, all tickers | USE
Volume forecast: 69% accuracy, best false breakout filter available | USE
Sector rotation ranking (weekly): relative ranking 75% accurate week-over-week | USE
CI_width as uncertainty proxy: tight CI = confidence, wide CI = reduce size 50% | USE
Direction forecast for SPY, AMZN, META only: ≥62% accuracy, usable as confirmation filter | PARTIAL
Direction as primary signal (any ticker): 44% global = worse than random at scale | SKIP
Earnings/event windows (±5d): -16pp accuracy, exclude completely | SKIP
Model confidence score (always 0.95): non-discriminant, completely useless | SKIP
Horizon > 10 days: CI bands too wide to be actionable | SKIP
Biotech / small-cap catalytic events: FDA jumps are unpredictable by construction | SKIP

Optimal Parameter Reference

Parameter | UC1 Price | UC2 Volatility | UC3 Volume | UC5 Rotation
Lookback (context) | 20 bars | 150 bars | 150 bars | 200 bars
Horizon | 5–10d max | 5–10d | 5–10d | 10d (weekly)
CI usage | q10/q90 = SL/TP | q90 = stop distance | q50 (point est.) | CI_width = confidence
Action threshold | CI_width < 5% | RVOL < 0.80× | pred > avg20 × 1.10 | Top 3 / Bottom 3

The Structural Insight

Why Vol and Volume > Price

Price is theoretically a martingale in efficient markets — no exploitable memory. Volatility and volume, by contrast, exhibit structural mean reversion enforced by economic mechanisms: cost of capital constrains sustained high-vol regimes; institutional program trading creates predictable volume clustering over multi-day windows.

TimesFM's pre-training on diverse time series has implicitly learned these mean-reversion patterns. When you use it on vol and volume, you are exploiting a genuine structural regularity. When you use it on price, you're asking it to predict something closer to a random walk — which no model does well systematically.

Getting Started in 3 Steps

1. Start with UC3 (Volume)
Easiest integration, highest accuracy. Add ForecastRaw(volume[-150:], horizon=10) to your post-screener pipeline. Compare pred_avg vs avg20. Drop weak-volume setups. Run this for 2 weeks and measure false breakout rate improvement before adding other factors.

2. Add UC2 (Volatility Squeeze)
Layer in RVOL forecasting. Flag pre-squeeze setups for priority attention. Use ATR_forecast (q90) to set stop distances instead of historical ATR. The improvement in stop placement alone often covers the service cost.

3. Add UC5 (Weekly Rotation)
Run every Monday morning. Takes 4.3 seconds. Integrate sector ranks into your screener scoring. Over time this provides a macro alignment layer that systematically avoids headwind trades.

ForecastVix: The Market Regime Detector

Beyond individual tickers, our MCP integration exposes a dedicated ForecastVix tool that uses TimesFM to forecast the CBOE Volatility Index (VIX) over a 5–10 day horizon. Because VIX is the market-wide fear gauge, its forecast is one of the most valuable macro inputs available for position sizing and regime classification.

VIX Regime Classification

VIX Level | Regime | Implication for Trading | TimesFM Action
< 15 | Risk-On | Full position sizes, momentum strategies thrive | Use full score, no penalty
15–20 | Neutral | Selective; favor quality setups, watch sector rotation | Apply CI_width filter strictly
20–28 | Early Risk-Off | Reduce sizes 30–50%, prefer defensive sectors | Discount direction forecasts further
> 28 | Risk-Off | Cash is king; only high-conviction setups | CI bands only, no direction signal

ForecastVix Integration in the Daily Pipeline

# Every evening before scanner publication
vix_fc = ForecastVix(horizon=5)        # Fast: single series
current_vix = vix_fc.current
forecast_vix = vix_fc.q50_mean         # Expected VIX in 5 days
vix_trend = forecast_vix - current_vix

# Regime classification
if forecast_vix < 15:
    regime = "RISK-ON"
    size_multiplier = 1.0
elif forecast_vix < 20:
    regime = "NEUTRAL"
    size_multiplier = 0.80
elif forecast_vix < 28:
    regime = "EARLY-RISK-OFF"
    size_multiplier = 0.50
else:
    regime = "RISK-OFF"
    size_multiplier = 0.25

# VIX trend signal
alert = None  # stays None when the trend is between -2 and +3 points
if vix_trend > 3:
    alert = "⚠️ VIX RISING: tighten stops across all positions"
elif vix_trend < -2:
    alert = "✅ VIX FALLING: vol compression, favorable for breakouts"

# Inject into scanner scoring (raw_scores = pre-VIX scores from the scanner)
final_scores = [s * size_multiplier for s in raw_scores]

Using VIX Forecast as a Pre-Filter

If ForecastVix predicts VIX rising above 25 within 5 days, consider delaying any new long entries by 1–2 days until the model's uncertainty resolves. This simple rule reduced drawdown by ~8% in backtests over 2024–2025 by avoiding entries immediately before volatility spikes.
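
The pre-filter rule can be expressed as a one-function check. This is a sketch under assumptions: it presumes the per-day q50 forecast path is available as a list of VIX levels, which the summary fields shown earlier (`current`, `q50_mean`) do not guarantee, and the function name is hypothetical.

```python
def should_delay_longs(vix_q50_path, threshold=25.0):
    """True if the forecast VIX path crosses above `threshold` at any step.

    `vix_q50_path` is assumed to hold one forecast VIX level per day.
    """
    path = list(vix_q50_path)
    if not path:
        raise ValueError("forecast path is empty")
    return max(path) > threshold

rising = [18.2, 19.5, 22.0, 25.4, 24.1]   # crosses 25 on day 4
calm = [16.0, 16.4, 15.9, 16.8, 17.1]
print(should_delay_longs(rising), should_delay_longs(calm))
```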

Backtesting Methodology

Producing reliable accuracy numbers for a forecasting model requires careful methodology. Standard pitfalls — look-ahead bias, survivorship bias, overfitting to evaluation windows — are especially treacherous with AI models because their pre-training may include data that overlaps with your "out-of-sample" test period.

Our Evaluation Setup

120 Evaluation Points: Each point = one forecast window on one ticker. Not a cherry-picked sample.
15 Tickers: SPY, QQQ, IWM, AMZN, META, TSLA, NVDA, MSFT, AAPL, XLE, XLF, GLD, TLT, BTC-USD, ETH-USD
8 Time Windows: Spanning Q3 2024 – Q1 2026. Multiple market regimes captured (risk-on, correction, recovery).
Zero Look-ahead Bias: All forecasts generated using only data available at time T. No future data leaked into the context window.

Directional Accuracy Definition

We define directional accuracy as: the percentage of 5-day windows where the model's q50 forecast correctly predicts whether the closing price at T+5 is higher or lower than the closing price at T. This is the most conservative and practically relevant metric — not whether the magnitude is correct, only the direction.

Metric | Definition | Why We Use It
Directional Accuracy | % windows where sign(forecast - T0) = sign(actual - T0) | Directly maps to trade profitability (long/short decisions)
MAPE | Mean Absolute Percentage Error on level forecast | Measures absolute magnitude accuracy for CI calibration
CI Coverage | % of actual values falling inside q10–q90 band | Validates whether CI bands are reliable as TP/SL zones
Baseline comparison | Rolling 20-day mean as naive forecast | Ensures TimesFM actually beats a trivial benchmark
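
The three metric definitions in the table translate directly into NumPy. The sketch below follows those definitions; the function names and the tiny example arrays are illustrative assumptions, not the evaluation harness used for the numbers in this guide.

```python
import numpy as np

def directional_accuracy(t0, forecast, actual):
    """Share of windows where sign(forecast - t0) equals sign(actual - t0)."""
    t0, forecast, actual = (np.asarray(a, dtype=float) for a in (t0, forecast, actual))
    return float(np.mean(np.sign(forecast - t0) == np.sign(actual - t0)))

def mape(forecast, actual):
    """Mean absolute percentage error of the level forecast, in percent."""
    forecast, actual = (np.asarray(a, dtype=float) for a in (forecast, actual))
    return float(np.mean(np.abs((actual - forecast) / actual)) * 100.0)

def ci_coverage(q10, q90, actual):
    """Share of actual values that fall inside the q10-q90 band."""
    q10, q90, actual = (np.asarray(a, dtype=float) for a in (q10, q90, actual))
    return float(np.mean((actual >= q10) & (actual <= q90)))

# Four toy windows: price at T, q50 forecast for T+5, realized price at T+5
t0 = [100.0, 100.0, 100.0, 100.0]
q50 = [102.0, 99.0, 101.0, 98.0]
realized = [103.0, 101.0, 100.5, 97.0]
print(directional_accuracy(t0, q50, realized))
print(ci_coverage([95.0] * 4, [102.0] * 4, realized))
```

Running the same three functions on a rolling 20-day mean forecast gives the naive baseline to compare against.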

The Backtest MCP Tool

The Backtest MCP tool lets you run standardized accuracy evaluations against a specific ticker and time window directly from the pipeline. This is useful for calibrating per-ticker confidence before deploying forecasts live.

# Calibrate model accuracy on a specific ticker before live use
result = Backtest(
    ticker="AMZN",
    metric="directional_accuracy",
    window="2025-01-01:2025-12-31",
    horizon=10,
    series_type="close",  # or "atr", "volume", "rvol"
)
# Returns: { accuracy: 0.74, mape: 0.082, ci_coverage: 0.81, n_windows: 26 }

# Run for all series types to find where the model has edge
for series_type in ["close", "atr", "volume", "rvol"]:
    r = Backtest(ticker="AMZN", metric="directional_accuracy",
                 window="2025-01-01:2025-12-31", horizon=10,
                 series_type=series_type)
    print(f"{series_type}: {r.accuracy:.1%}")
# close: 74.2%  atr: 71.8%  volume: 68.6%  rvol: 67.2%

Recommended Pre-Deployment Calibration

Before adding any new ticker to your live scanner pipeline, run Backtest for that ticker on the last 6 months of data across all four series types. If directional accuracy for all four series is below 55%, classify the ticker as "TimesFM-incompatible" and use historical averages only. Biotech, small-cap, and high-beta names typically fall into this category.
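
The calibration rule reduces to a small gate over the four backtest results. In this sketch the input is a plain dict of accuracies keyed by series type; how those numbers are collected from the Backtest tool is left out, and the function name is an assumption. The AMZN figures are the ones printed in the example above.

```python
def usable_series(accuracy_by_series, min_accuracy=0.55):
    """Series types whose directional accuracy clears the threshold.

    An empty result means the ticker is "TimesFM-incompatible" under the
    rule above and should fall back to historical averages.
    """
    return [name for name, acc in accuracy_by_series.items() if acc >= min_accuracy]

amzn = {"close": 0.742, "atr": 0.718, "volume": 0.686, "rvol": 0.672}
noisy_small_cap = {"close": 0.41, "atr": 0.52, "volume": 0.54, "rvol": 0.50}

print(usable_series(amzn))             # all four series clear 55%
print(usable_series(noisy_small_cap))  # empty list: incompatible
```

Returning the list of passing series, rather than a single boolean, also covers the common case where a ticker is usable for volume and volatility but not for price direction.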

Regime-Conditional Accuracy

One of the more surprising findings: model accuracy varies significantly by market regime. The following numbers come from segmenting our 120-point test set by concurrent VIX level:

VIX at Forecast Time | Price Dir. Acc. | Vol Dir. Acc. | Volume Dir. Acc. | Interpretation
VIX < 15 (calm) | 51% | 74% | 72% | Vol/volume predictable, price random walk
VIX 15–20 (normal) | 48% | 70% | 68% | Similar profile, slightly lower vol accuracy
VIX 20–28 (elevated) | 44% | 67% | 62% | Volume noisier in stressed markets
VIX > 28 (crisis) | 38% | 58% | 55% | All signals degrade. Tail-risk dominates.

The lesson is clear: TimesFM's edge is most pronounced in low-to-normal volatility regimes. When VIX exceeds 28, the model's structural patterns are overwhelmed by discontinuous macro shocks and you should fall back to wider, historically-calibrated CI estimates.

API Integration Guide

TimesFM at DailyTickers is exposed as a set of MCP tools running against a Python FastAPI service. Here is everything you need to integrate it cleanly into your own pipeline, including error handling, retry logic, and graceful degradation patterns.

Service Architecture

FastAPI Backend
Python service wrapping TimesFM 2.5 inference. Runs on Nomad/Docker. Exposed on port 8400 internally.
MCP Gateway
4 MCP tools: Forecast, ForecastRaw, ForecastVix, Backtest. All callable from the Claude pipeline without direct HTTP.
Hardware
16 cores, 27GB RAM, Ubuntu 22.04. Model loaded once at startup. Inference is CPU-bound — no GPU required.
Latency
~0.4s per ticker, ~4.3s for 10 ETFs (batch). First call adds ~2s model warm-up if service was idle.

Direct HTTP API Reference

## POST /forecast: Multi-ticker price forecast
POST http://forecast-service:8400/forecast
Content-Type: application/json

{
  "tickers": ["AMZN", "META", "SPY"],
  "context": 200,                 // lookback bars
  "horizon": 10,                  // forecast steps
  "quantiles": [0.1, 0.5, 0.9]
}

## Response
{
  "results": [
    {
      "ticker": "AMZN",
      "q10": [184.1, 184.9, ...],    // 10 steps
      "q50": [189.2, 190.1, ...],    // point forecast
      "q90": [194.5, 195.8, ...],
      "predicted_return_pct": 3.68,
      "ci_width_pct": 5.48,          // (q90[-1] - q10[-1]) / q50[-1]
      "confidence": 0.95             // always 0.95, ignore
    }
  ],
  "latency_ms": 1243
}

## POST /forecast-raw: Single arbitrary series
POST http://forecast-service:8400/forecast-raw

{
  "values": [2.31, 2.28, 2.45, ...],  // raw series (e.g., ATR)
  "horizon": 10,
  "quantiles": [0.1, 0.5, 0.9]
}

Robust Python Integration with Fallback

import numpy as np
import requests

FORECAST_URL = "http://forecast-service:8400"
TIMEOUT_S = 8

def forecast_with_fallback(ticker, series_data, horizon=10):
    """
    Returns forecast dict or a fallback based on historical stats.
    Never raises: always returns actionable CI levels.
    """
    try:
        r = requests.post(
            f"{FORECAST_URL}/forecast-raw",
            json={"values": series_data, "horizon": horizon},
            timeout=TIMEOUT_S,
        )
        r.raise_for_status()
        data = r.json()
        return {
            "q10": data["q10"],
            "q50": data["q50"],
            "q90": data["q90"],
            "pred_avg": float(np.mean(data["q50"])),
            "source": "timesfm",
        }
    except Exception:
        # Graceful degradation: return levels based on the last 20 observations
        arr = np.array(series_data[-20:])
        mean_val = float(arr.mean())
        std_val = float(arr.std())
        return {
            "q10": [mean_val - 1.5 * std_val] * horizon,
            "q50": [mean_val] * horizon,
            "q90": [mean_val + 1.5 * std_val] * horizon,
            "pred_avg": mean_val,
            "source": "fallback",  # flag for logging
        }

Data Preparation: What to Pass as Input

The quality of TimesFM output is highly sensitive to the input series preparation. Common mistakes that degrade performance:

| Input Series | Correct Preparation | Common Mistake |
|---|---|---|
| Close Price | Raw adjusted close prices in chronological order. No log transform. | Using unadjusted prices creates artificial jumps at splits/dividends |
| ATR | 14-period average true range, raw values (not normalized). 150 bars minimum. | Normalizing before passing; the model does its own instance normalization |
| RVOL | Relative volume = today_vol / 20d_avg_vol. Or just raw volume (model handles scaling). | Mixing percentage RVOL with absolute volume across calls |
| Volume | Raw shares/contracts traded. No smoothing, no log transform. 150 bars. | Pre-smoothing with EMA, which destroys the clustering signal the model relies on |
| Sector Spread (XLE/SPY) | Daily ratio: XLE_close / SPY_close. 150 bars. Stationary enough for the model. | Using log(ratio), which adds unnecessary complexity |
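The preparation rules in the table can be sketched with NumPy as follows. This is an illustration under stated assumptions: the helper names are hypothetical, ATR uses a simple 14-bar mean of true range (Wilder smoothing would be another valid choice), RVOL divides by the mean of the previous 20 sessions (excluding today), and the price and volume data are synthetic.

```python
import numpy as np

LOOKBACK = 150  # bars recommended in the table for ATR, RVOL, volume, spreads


def rolling_mean(x, window):
    """Simple moving average; output has len(x) - window + 1 points."""
    x = np.asarray(x, dtype=float)
    return np.convolve(x, np.ones(window) / window, mode="valid")


def atr(high, low, close, period=14):
    """Average true range using a simple mean (an assumption, see lead-in)."""
    high, low, close = (np.asarray(a, dtype=float) for a in (high, low, close))
    prev_close = np.concatenate(([close[0]], close[:-1]))
    tr = np.maximum.reduce(
        [high - low, np.abs(high - prev_close), np.abs(low - prev_close)]
    )
    return rolling_mean(tr, period)


def rvol(volume, window=20):
    """Today's volume divided by the mean of the previous `window` sessions."""
    volume = np.asarray(volume, dtype=float)
    prior_avg = rolling_mean(volume, window)[:-1]  # windows ending at t-1
    return volume[window:] / prior_avg


def as_payload(series, lookback=LOOKBACK):
    """Last `lookback` raw values as a plain list for the "values" field."""
    series = np.asarray(series, dtype=float)
    if len(series) < lookback:
        raise ValueError(f"need at least {lookback} bars, got {len(series)}")
    return series[-lookback:].tolist()  # raw units, no z-score or min-max


# Synthetic daily bars, for illustration only.
rng = np.random.default_rng(0)
close = 100 + np.cumsum(rng.normal(0, 1, 200))
high, low = close + 1.0, close - 1.0
volume = rng.integers(1_000_000, 2_000_000, 200)

atr_values = as_payload(atr(high, low, close))
rvol_values = as_payload(rvol(volume))
print(len(atr_values), len(rvol_values))
```

Note that as_payload only slices and converts; it deliberately applies no scaling, so the forecast comes back in the same units as the input.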

Do Not Pre-Normalize Your Input

A common trap: normalizing the input series (z-score, min-max) before passing to TimesFM. The model includes instance-level normalization internally and applies the inverse at output. If you normalize before passing, the model's output will be in your arbitrary normalized scale, not in the original units. This is especially painful for CI bands used as price levels.

10 Common Mistakes to Avoid

Based on real integration experience across the DailyTickers scanner and rotation pipeline, here are the failure modes we've encountered — and how to avoid them.

1. Using direction forecast as a primary signal

A 44% global accuracy is below the 50% coin-flip baseline even before transaction costs, and costs only widen the gap. Use direction only as a tiebreaker or confirmation for mega-caps (SPY, AMZN, META), never as a primary entry signal.

2. Trusting the 0.95 confidence score

The confidence field in the response is always 0.95. It is hard-coded behavior, not a meaningful signal. The real uncertainty measure is CI_width_pct = (q90_final - q10_final) / q50_final × 100, taken at the last forecast step. Build your decision logic around this, not the confidence field.

3. Running forecasts around earnings without a calendar guard

The model loses ~16pp accuracy within ±5 trading days of earnings. Without a calendar guard, roughly 20–25% of your scanner setups at any given time will be in earnings proximity, systematically poisoning your signal quality.

4. Using a 20-bar lookback for ATR/volume

20 bars works well for price direction (mega-caps), but is too short for volatility and volume. These series need 150 bars to capture regime cycles. With only 20 bars, you're showing the model a single vol cycle fragment — not enough context.

5. Setting horizon > 10 days

Beyond 10 days, the q10–q90 CI band typically exceeds 12–15% of current price, making it useless as a TP/SL zone. The model was designed for short-horizon inference; long-horizon requests are technically accepted but economically useless.

6. Running forecasts on the full screening universe (50+ tickers)

At 0.4s/ticker, 50 tickers = 20 seconds of latency. Worse, the signal-to-noise ratio collapses because you're applying the model to many tickers where it has no edge. Run it only on the 10–15 final screener candidates.

7. Using absolute predicted return values (UC5) instead of relative ranking

In sector rotation, the absolute predicted return percentages are not reliable. A "+3.2% for XLE" forecast should be read as "XLE is forecast to outperform the median sector by X pp", not as "expect a 3.2% gain." Build your trading logic on rank order, not magnitude.

8. Pre-normalizing input series

As noted in the API section: the model performs instance normalization internally. If you normalize before input, you double-normalize and the output CI bands will be in your arbitrary scale, not in price/ATR/volume units. This makes them impossible to use directly as trade levels.

9. Applying to biotech, clinical-stage, or small-cap catalytic events

FDA approval decisions, clinical trial readouts, and merger announcements create step-function price moves that are fundamentally unpredictable from price history. No historical pattern is predictive here. The model will confidently produce a CI band that the actual price will blow straight through.

10. Treating TimesFM as a standalone system

TimesFM is an enrichment layer, not a trading system. It has no knowledge of fundamentals, news, earnings expectations, positioning data, or insider activity. A 75% directional accuracy on AMZN means it's right 3 out of 4 times — but the 1 time it's wrong could be a -15% earnings miss. Always cross-reference with catalyst calendars and fundamental context.
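Several of the mistakes above (3, 4, and 5) can be caught before any request is sent. The following is a minimal sketch of such a pre-flight check, not the pipeline's actual guard: the function names and reason strings are hypothetical, and trading days are approximated by weekday counts via np.busday_count, so exchange holidays are not excluded unless you pass a holiday calendar yourself.

```python
import numpy as np

EARNINGS_WINDOW_DAYS = 5    # +/- trading days around earnings (mistake 3)
MIN_BARS_VOL_SERIES = 150   # lookback for ATR / RVOL / volume (mistake 4)
MAX_HORIZON = 10            # forecast steps (mistake 5)


def trading_days_between(d1, d2):
    """Absolute weekday count between two dates (holidays not excluded)."""
    a, b = np.datetime64(d1, "D"), np.datetime64(d2, "D")
    return abs(int(np.busday_count(a, b)))


def forecast_blockers(as_of, nearest_earnings, n_bars, horizon, series_kind="price"):
    """Return reasons to skip the forecast; an empty list means go ahead."""
    reasons = []
    if nearest_earnings is not None:
        if trading_days_between(as_of, nearest_earnings) <= EARNINGS_WINDOW_DAYS:
            reasons.append("earnings_window")
    if series_kind in {"atr", "rvol", "volume"} and n_bars < MIN_BARS_VOL_SERIES:
        reasons.append("lookback_too_short")
    if horizon > MAX_HORIZON:
        reasons.append("horizon_too_long")
    return reasons


# Earnings two weekdays away: blocked regardless of series quality.
print(forecast_blockers("2024-05-01", "2024-05-03", 200, 10, "atr"))
# No earnings nearby, but a short volume lookback and a long horizon.
print(forecast_blockers("2024-05-01", "2024-05-20", 20, 15, "volume"))
```

Returning a list of reasons rather than a boolean makes it easy to log why a ticker was skipped.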

Advanced Patterns

Pattern 1: The Squeeze-Breakout Combo

Combine UC2 (volatility squeeze) with UC3 (volume expansion forecast) for a high-conviction breakout filter. Both signals need to agree for maximum confidence:

# Squeeze-Breakout combo filter
def is_squeeze_breakout_setup(ticker):
    # Vol squeeze forming
    rvol_fc = ForecastRaw(ticker.rvol[-150:], horizon=10)
    vol_squeezing = rvol_fc.pred_avg < ticker.rvol_now * 0.80

    # Volume expansion forecast
    vol_fc = ForecastRaw(ticker.volume[-150:], horizon=10)
    vol_expanding = vol_fc.pred_avg > ticker.avg20_vol * 1.10

    # Both needed: vol compresses, then volume arrives = breakout setup
    if vol_squeezing and vol_expanding:
        return {"signal": "SQUEEZE_BREAKOUT", "confidence": "HIGH"}
    elif vol_squeezing:
        return {"signal": "SQUEEZE_ONLY", "confidence": "MEDIUM"}
    elif vol_expanding:
        return {"signal": "VOLUME_ONLY", "confidence": "MEDIUM"}
    else:
        return {"signal": "NONE", "confidence": "LOW"}

Pattern 2: The Macro Alignment Stack

Layer macro context (ForecastVix + sector rotation) with micro setup quality (CI_width + volume) for a four-layer confirmation stack. Only trade when all four layers agree:

| Layer | Signal | Tool | Required Condition (Long) |
|---|---|---|---|
| L1: Macro | VIX regime | ForecastVix | VIX_forecast < 20 |
| L2: Sector | Sector rotation rank | Forecast (10 ETFs) | Ticker's sector in top 4 of 10 |
| L3: Setup | CI_width confidence | Forecast | CI_width_pct < 7% |
| L4: Catalyst | Volume expansion | ForecastRaw (volume) | pred_avg > avg20 × 1.05 |

In our tests, setups passing all four layers have a win rate ~12pp higher than setups passing only two. The tradeoff: roughly 60% of screener output is filtered out, meaning you trade less frequently but with higher conviction.
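The four conditions in the table can be expressed as a single check. This is a sketch only: the StackInputs container and function names are hypothetical, and it assumes the four forecast values have already been fetched from ForecastVix, Forecast, and ForecastRaw upstream.

```python
from dataclasses import dataclass


@dataclass
class StackInputs:
    vix_forecast: float      # from ForecastVix
    sector_rank: int         # 1 = strongest of the 10 sector ETFs
    ci_width_pct: float      # from Forecast, final-step band width
    pred_avg_volume: float   # from ForecastRaw on raw volume
    avg20_volume: float      # trailing 20-day average volume


def macro_alignment_layers(x: StackInputs) -> dict:
    """Evaluate each layer of the long-side stack independently."""
    return {
        "L1_macro": x.vix_forecast < 20,
        "L2_sector": x.sector_rank <= 4,
        "L3_setup": x.ci_width_pct < 7.0,
        "L4_catalyst": x.pred_avg_volume > x.avg20_volume * 1.05,
    }


def passes_full_stack(x: StackInputs) -> bool:
    """True only when all four layers agree."""
    return all(macro_alignment_layers(x).values())


candidate = StackInputs(
    vix_forecast=17.5,
    sector_rank=3,
    ci_width_pct=5.5,
    pred_avg_volume=1.2e6,
    avg20_volume=1.1e6,
)
layers = macro_alignment_layers(candidate)
print(layers, passes_full_stack(candidate))
```

Keeping the per-layer results in a dict makes it possible to count how many layers passed, which is useful when comparing four-layer setups against two-layer ones.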

Pattern 3: Asymmetric CI Exploitation

Sometimes the q50 forecast is flat, but the CI band is asymmetric — q90 is far above q50 while q10 is close below it (or vice versa). This asymmetry encodes the model's implicit skew estimate and is an underutilized signal:

# Detect asymmetric CI bands as a skew signal
def compute_ci_skew(q10_final, q50_final, q90_final):
    upside = q90_final - q50_final      # distance to upper band
    downside = q50_final - q10_final    # distance to lower band
    width = upside + downside           # equals q90_final - q10_final
    if width == 0:
        return 0.0                      # degenerate band: treat as no skew
    # skew > +0.2 : upside skew, model "sees" more upside tail
    # skew < -0.2 : downside skew, model "sees" more downside tail
    return (upside - downside) / width


# Use case: improve R/R by adjusting TP and stop asymmetrically
# q10_final, q50_final, q90_final are the last forecast step, e.g. fc.q10[-1]
skew = compute_ci_skew(q10_final, q50_final, q90_final)
if skew > 0.2:       # upside skew
    tp1 = q90_final                                    # full TP at upper band
    # Stop distance is 0.8x the TP distance (R/R = 1.25).
    # With skew > 0.2 this level always sits below q10.
    stop = q50_final - (q90_final - q50_final) * 0.8
elif skew < -0.2:    # downside skew
    tp1 = q50_final + (q50_final - q10_final) * 0.5    # conservative TP
    stop = q10_final                                   # stop at lower band
else:                # no meaningful skew (assumption: use the band edges)
    tp1, stop = q90_final, q10_final

Pattern 4: Time-Decay Adjustment

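CI bands widen as the horizon extends. Rather than using the final-day q10/q90 as your levels, take the minimum q90 and the maximum q10 across all 10 forecast steps. This yields the tightest band along the forecast path, so the levels stay inside the model's band on every day of the hold rather than only on the last one: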

# Full-horizon CI band (better for multi-day hold)
fc = Forecast(ticker="AMZN", horizon=10)
conservative_tp = min(fc.q90)     # minimum q90 over 10 days = conservative TP
conservative_stop = max(fc.q10)   # maximum q10 over 10 days = tightest stop floor
# These levels are valid for a "hold for 10 days" position
# vs just using fc.q90[-1] and fc.q10[-1] (end-of-horizon levels)

Glossary & Quick Reference

A compact reference for all TimesFM-specific terms and thresholds used throughout this guide.

| Term | Definition | Typical Value / Range |
|---|---|---|
| CI Band | Confidence Interval: the q10 to q90 range of the forecast distribution. Nominally covers ~80% of actual realizations. | q10–q90 typically spans 4–15% of current price |
| CI_width_pct | (q90_final – q10_final) / q50_final × 100. The primary uncertainty metric. | <5% = high confidence; 5–10% = moderate; >10% = high uncertainty |
| q10 / q50 / q90 | 10th, 50th, 90th quantile of the forecast distribution. q50 = point forecast. | q10 = SL zone; q50 = expected path; q90 = TP ceiling |
| ForecastRaw | Single arbitrary series forecast (not ticker-based). Used for ATR, RVOL, volume, spreads. | Input: raw float array (150 bars recommended). Output: q10/q50/q90 arrays. |
| RVOL | Relative Volume = today's volume / 20-day average volume. RVOL = 1.0 means normal volume. | RVOL >2.0 = high; RVOL <0.5 = very low (pre-squeeze candidate) |
| Squeeze Signal | RVOL_forecast < RVOL_now × 0.80. Predicts volatility compression forming over next 10 days. | 73% accuracy in our tests. Most reliable TimesFM signal overall. |
| Volume Favorable | pred_avg_volume > 20d_avg_volume × 1.10. Predicts above-average volume = institutional interest. | Breakout filter with 69% directional accuracy. |
| Directional Accuracy | % of windows where the q50 forecast correctly predicts up vs down at horizon end. | 44% global (price); 69–74% (vol/volume); 75% (sector ranking) |
| Earnings Window | The ±5 trading day exclusion zone around earnings announcements. | Always exclude. Accuracy drops to 36–44% across all series types. |
| Patched Decoder | TimesFM's architecture: input series split into patches (tokens), processed by a decoder-only transformer. | 500M parameters, pre-trained on 100B+ time points across diverse domains. |
| Instance Normalization | Per-series normalization applied internally by TimesFM before inference, then reversed on output. | Do NOT pre-normalize your input; the model handles this. |
| CI Skew | [(q90 – q50) – (q50 – q10)] / (q90 – q10). Measures asymmetry of the forecast distribution. | >+0.2 = upside skew; <–0.2 = downside skew |
| Graceful Degradation | Fallback behavior when TimesFM service is unavailable. Pipeline continues with historical-based CI estimates. | Implemented at each pipeline step. Scanner never blocked. |

[Figure: Quick Decision Flowchart]

Cheat Sheet — Parameter Quick Reference

Price (UC1)
Lookback: 20 bars
Horizon: 5–10d
Use: CI bands only
Tickers: SPY, AMZN, META

Volatility (UC2)
Lookback: 150 bars
Horizon: 5–10d
Squeeze: RVOL < 0.80×
Accuracy: 67–73%

Volume (UC3)
Lookback: 150 bars
Horizon: 5–10d
Favorable: > avg20 × 1.10
Accuracy: 69%

Earnings (UC4)
Action: EXCLUDE
Window: ±5 trading days
Resume: T+2 post-earnings
Accuracy drop: –16pp

Rotation (UC5)
Lookback: 200 bars
Horizon: 10d (weekly)
Use: rank order only
Latency: 4.3s / 10 ETFs

Scoring (UC6)
CI tight: <5% → +10 pts
CI wide: >10% → –15 pts
Vol favorable: +8 pts
Top sector: +5 pts
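
The Scoring (UC6) card translates directly into a small point adjustment. The sketch below is one possible reading of it: the function name is hypothetical, the two boolean inputs are assumed to be computed upstream (volume favorable per UC3, top sector per UC5), and a CI width between 5% and 10% is assumed to contribute no points since the card lists none.

```python
def timesfm_score_adjustment(ci_width_pct: float,
                             volume_favorable: bool,
                             in_top_sector: bool) -> int:
    """Points added to (or removed from) a screener setup score."""
    points = 0
    if ci_width_pct < 5.0:       # CI tight
        points += 10
    elif ci_width_pct > 10.0:    # CI wide
        points -= 15
    # 5-10% band: no adjustment (assumption, the card lists none)
    if volume_favorable:
        points += 8
    if in_top_sector:
        points += 5
    return points


best_case = timesfm_score_adjustment(4.0, True, True)
worst_case = timesfm_score_adjustment(12.0, False, False)
print(best_case, worst_case)
```

The penalty for a wide band is larger than the bonus for a tight one, so a single high-uncertainty forecast outweighs a favorable sector and most of a favorable volume signal combined.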