We read what Powell says. Markets price what they think he meant. We price the gap.
A speaker-corpus probability engine for FOMC and central-bank communication. 8 years of Powell transcripts indexed at the utterance level, validated walk-forward on 43 FOMC meetings. Generalizes cleanly to Lagarde at the ECB (n=46, validated). Draghi DAX edge replicates at +16.0pp on n=41 (2014–2019, including QE-launch era) — second ECB chair, same architecture, same direction. Early-evidence on BoE Bailey and Carney. BoJ per-chair honestly weak — disclosed below.
Drill down — per speaker
Every walk-forward prediction, the full cross-asset table, the words that move equities.→
71 pressers parsed. Last 10 walk-forward predictions with hits and misses. 9-asset cross-asset edge table. Top bullish / bearish words.
Lagarde · ECB · n=46
The cross-bank generalization test. EuroStoxx Sharpe 0.74, DAX +12.2pp. →
52 pressers parsed (Dec 2019–Apr 2026). Cross-asset table for EuroStoxx, DAX, EUR/USD with honest weak read on FX. Last 8 EuroStoxx walk-forward predictions, top DAX bullish / bearish words.
Bailey · BoE · n=17
Early-evidence sterling signal. GBP/USD +39.3pp, Sharpe 0.92. →
22 pressers parsed (Nov 2020–Feb 2026). Cross-asset table for GBP/USD, FTSE 250, FTSE 100. Last 10 GBP/USD walk-forward predictions. Wide CIs, treated as early-evidence not validated.
Methodology
Architecture, validation protocol, honest weaknesses, roadmap. →
Partner-readable seven-section summary. Walk-forward protocol, lexicon construction, calibration, the cut-tail under-confidence we disclose, and what we won’t claim until n ≥ 30.
Live call — Next FOMC (June 16–17, 2026)
based on April 29 2026 presser
FedWatch's path-implied curve prices a 0% cut probability for June. The model's features from the April 29 presser assign a 6.5% cut probability: small in absolute terms, but a non-trivial gap on a tail FedWatch can't see. That asymmetric cut tail is the trade. We're long that tail; the call resolves June 17.
Four central banks, one architecture
Same lexicon engine, same walk-forward backtest protocol, same cross-asset position-sizing. Fed (Powell, n=43) + ECB (Lagarde n=46, Draghi n=41) + BoE (Bailey n=17, Carney n=10) + BoJ (Kuroda+Ueda pooled n=130). The verbal-alpha pattern reproduces on Powell and Lagarde at validated scale. Draghi DAX replicates at +16.0pp on n=41 — second ECB chair, same direction (Sharpe CI still crosses zero, treated as early-evidence). Bailey and Carney show early-evidence edges at small n. BoJ per-chair honestly weak — disclosed in the table below.
| Central bank | Target | Edge vs momentum | Pred ↔ actual ρ | Sharpe (ann) | n | Status |
|---|---|---|---|---|---|---|
| Federal Reserve · Powell | S&P 500 | +12.9pp | +0.40 | 0.66 | 43 | validated |
| Federal Reserve · Powell | 10y Treasury | +5.4pp | +0.41 | 1.32 | 43 | validated |
| Federal Reserve · Powell | 5y Treasury | +0.6pp | +0.46 | 1.62 | 43 | validated |
| ECB · Lagarde | EuroStoxx 50 | +14.2pp | +0.06 | 0.74 | 46 | validated |
| ECB · Lagarde | DAX | +12.2pp | +0.04 | 0.56 | 46 | validated |
| ECB · Draghi | DAX | +16.0pp | +0.02 | 0.20 | 41 | early |
| Bank of England · Bailey | FTSE 250 | +27.2pp | -0.20 | 0.31 | 17 | early |
| Bank of England · Bailey | GBP/USD | +39.3pp | +0.28 | 0.92 | 17 | early |
| Bank of England · Carney | GBP/USD | +36.7pp | +0.56 | 1.06 | 10 | early |
| Bank of Japan · Pooled (Kuroda+Ueda) | Nikkei | +5.8pp | +0.06 | 0.57 | 130 | research |
| Bank of Japan · Kuroda | Nikkei | +20.6pp | -0.14 | -0.63 | 18 | weak |
| Bank of Japan · Ueda | Nikkei | -16.9pp | -0.12 | -0.79 | 21 | weak |
Status legend: validated = n ≥ 30 with stable CI · early = n = 10–20, point estimates positive but CIs wide (Draghi, n=41, is also held at early because its Sharpe CI still crosses zero) · thin = n < 10, suggestive only · research = pooled signal with weak per-chair components; a modest edge survives but we don't claim more than that · weak = per-chair signal honestly does not separate from noise. The Fed-built lexicon does not transfer cleanly to BoJ at the per-chair level.
Why isn't this a single “speech → SPX” product?
We tested it. The result: Powell → SPX is real (Sharpe 0.66, n=43). Lagarde → SPX is also real (Sharpe 0.91, n=43, edge +4.7pp). But BoE chairs and BoJ chairs do not predict SPX — every BoE/BoJ-on-SPX backtest is null or actively negative.
That's the architecture working as designed: BoE words move sterling and FTSE, not SPX. BoJ words move Nikkei and yen, not SPX. The right product is “right asset per bank” — not “everything → SPX.” Forcing a unified SPX target would throw away ~70% of the cross-bank signal we've measured.
The edge — right tool per asset class
Different asset reactions need different feature sets. Rates pricing is dominated by macro regime × verbal interaction (gradient-boosted on 76 features incl. CPI / NFP / unemployment + verbal). Equity and FX reaction is dominated by verbal cadence alone (linear on 22 verbal features). Honest split, principled, both validated walk-forward.
Edge over momentum baseline (predict same sign as last meeting). Walk-forward, no leakage, n=43 Powell meetings (2018–2026), n=46 Lagarde meetings (2020–2026). Sharpe 95% CI from 5,000-sample bootstrap on per-meeting signal-driven P&L.
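The edge figures are quoted against a momentum baseline that predicts the same sign as the last meeting. Below is a minimal sketch of how such a number could be computed, assuming the edge is the model's directional hit rate minus the baseline's hit rate, in percentage points (the page does not spell out the exact formula). The function names and the toy series are hypothetical.

```python
def sign(x: float) -> int:
    """Return +1, -1, or 0 for the direction of a move."""
    return (x > 0) - (x < 0)


def hit_rate(predicted: list, realized: list) -> float:
    """Fraction of meetings where the predicted direction matched the realized one."""
    hits = sum(p == r for p, r in zip(predicted, realized))
    return hits / len(realized)


def edge_over_momentum_pp(model_pred: list, realized: list) -> float:
    """Model hit rate minus momentum hit rate, in percentage points.

    The momentum baseline predicts the sign of the previous meeting's move,
    so the first meeting has no baseline and is dropped from both sides.
    """
    realized_sign = [sign(r) for r in realized]
    momentum_pred = realized_sign[:-1]   # last meeting's sign...
    target = realized_sign[1:]           # ...scored against this meeting
    model_sign = [sign(p) for p in model_pred[1:]]
    return 100.0 * (hit_rate(model_sign, target) - hit_rate(momentum_pred, target))


# Toy numbers, purely illustrative (not taken from the backtest).
toy_realized = [0.5, -0.3, 0.8, -0.2, 0.4, 0.6]
toy_model = [0.1, -0.2, 0.3, 0.1, 0.2, 0.5]
toy_edge = edge_over_momentum_pp(toy_model, toy_realized)
print(f"edge over momentum: {toy_edge:+.1f}pp")
```

On the toy series the model is right on 4 of 5 scored meetings and the baseline on 1 of 5, so the edge is 60pp. Any real use would need the same scoring window for both predictors, as here.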
Per-speaker signal heterogeneity — pooled fails, individual speakers work
We built a corpus of 520 Fed governor speeches across 2018–2026 (1.47M tokens, 7 governors), and ran the same-day cross-asset model both pooled and per-speaker. The architectural finding: pooling all governors into one signal fails (negative edges across all targets — speakers are not interchangeable). Per-speaker models recover real but modest verbal alpha, with the strongest signal on Waller's rate predictions.
Pearson correlation between Waller's speech-day lexicon score and the realized 5y move = +0.260 on n=63 walk-forward predictions. That's real but modest — what desk research has long suggested about Waller specifically. An earlier 2024-2026-only sample (n=28) showed a stronger +27pp/+0.605 reading; backfilling to the full 8-year history reduced it to the +0.260 above. We report the more conservative number.
The product is per-speaker. Every voting member gets their own corpus + their own model. The desk subscribes to a ranked feed of speaker signals. Pooled-committee aggregation is the wrong architecture and the data confirms it.
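A toy illustration of why pooling can erase signal that per-speaker models keep. The data is synthetic and deliberately extreme: two hypothetical speakers whose lexicon scores relate to the realized move with opposite signs. It is not drawn from the 520-speech corpus, and `speaker_a` / `speaker_b` are placeholder names.

```python
import math
from collections import defaultdict


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


# (speaker, lexicon score, realized move): synthetic values only.
records = [
    ("speaker_a", 1, 1), ("speaker_a", 2, 2), ("speaker_a", 3, 3), ("speaker_a", 4, 4),
    ("speaker_b", 1, 4), ("speaker_b", 2, 3), ("speaker_b", 3, 2), ("speaker_b", 4, 1),
]

# Pooled: treat every speech as if it came from one interchangeable speaker.
pooled = pearson([s for _, s, _ in records], [m for _, _, m in records])

# Per-speaker: group first, then correlate within each speaker.
grouped = defaultdict(lambda: ([], []))
for speaker, score, move in records:
    grouped[speaker][0].append(score)
    grouped[speaker][1].append(move)
per_speaker = {name: pearson(xs, ys) for name, (xs, ys) in grouped.items()}

print("pooled:", pooled)
print("per speaker:", per_speaker)
```

In this constructed case the pooled correlation is zero while each speaker individually correlates perfectly, one positively and one negatively. Real speakers will be far less clean, but the mechanism is the same.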
vocmarkets — the full verbal-alpha portfolio
The four headline targets above aren't the whole story. Same engine, run against the macro complex: 9 distinct asset classes show positive verbal-alpha edge from Powell's prior-presser features. Credit (LQD/HYG), oil, and BTC join the original rates + equities + FX. Sector rotation (XLF/XLE/XLK) and gold are null — those moves require fundamentals or supply shocks the verbal signal can't see.
| Asset | Class | Edge over momentum | Sharpe (ann) | 95% CI |
|---|---|---|---|---|
| 5y treasury (FVX) | Rates | +0.6pp | 1.62 | [0.94, 2.38] |
| 10y treasury (TNX) | Rates | +5.4pp | 1.32 | [0.49, 2.26] |
| S&P 500 | Equities | +12.9pp | 0.66 | [-0.18, 1.42] |
| DXY | FX | +3.1pp | 0.71 | [-0.12, 1.77] |
| LQD (IG credit) | Credit | +7.9pp | 0.74 | [-0.08, 1.66] |
| HYG (HY credit) | Credit | +5.8pp | 0.53 | [-0.32, 1.30] |
| WTI Oil | Commod | +10.6pp | 0.38 | [-0.44, 1.27] |
| BTC | Crypto | +8.4pp | 0.05 | [-0.86, 0.86] |
| VIX | Vol | +3.5pp | 0.45 | [-0.47, 1.05] |
Honest null results — what the engine doesn't predict
| Asset | Class | Edge over momentum | Note |
|---|---|---|---|
| Energy ETF (XLE) | Sectors | -10.6pp | verbal alpha doesn't predict sector rotation |
| Tech ETF (XLK) | Sectors | -1.3pp | null |
| Gold | Commod | +3.3pp | edge present but Sharpe ~0 — too noisy |
Each row is a separate walk-forward backtest, n=43, no leakage. The expanded universe converts a single-asset signal into a portfolio of edges — and the honest null results signal which moves the engine isn't designed to capture.
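The 95% CI column is described earlier as a 5,000-sample bootstrap on per-meeting signal-driven P&L. The sketch below shows one way to produce such an interval, using a percentile bootstrap. The annualization factor of 8 meetings per year, the percentile method, and the synthetic P&L series are all assumptions for illustration, not the production pipeline.

```python
import math
import random

MEETINGS_PER_YEAR = 8  # assumption: eight scheduled FOMC meetings per year


def annualized_sharpe(pnl):
    """Mean over sample standard deviation of per-meeting P&L, scaled to a year."""
    n = len(pnl)
    mean = sum(pnl) / n
    var = sum((x - mean) ** 2 for x in pnl) / (n - 1)
    if var == 0:
        return 0.0  # degenerate resample with no variation
    return mean / math.sqrt(var) * math.sqrt(MEETINGS_PER_YEAR)


def bootstrap_sharpe_ci(pnl, n_boot=5000, level=0.95, seed=0):
    """Percentile bootstrap: resample meetings with replacement, recompute Sharpe."""
    rng = random.Random(seed)
    n = len(pnl)
    draws = sorted(
        annualized_sharpe([pnl[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_boot)
    )
    lo_idx = max(0, round((1 - level) / 2 * n_boot))
    hi_idx = min(n_boot - 1, round((1 + level) / 2 * n_boot) - 1)
    return draws[lo_idx], draws[hi_idx]


# Synthetic per-meeting P&L for 43 meetings (illustrative only).
toy_rng = random.Random(1)
toy_pnl = [toy_rng.gauss(0.2, 1.0) for _ in range(43)]
low, high = bootstrap_sharpe_ci(toy_pnl)
print(f"Sharpe {annualized_sharpe(toy_pnl):.2f}, 95% CI [{low:.2f}, {high:.2f}]")
```

With only 43 observations the interval is wide, which is consistent with most rows in the table having a lower bound below zero.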
Direction vs magnitude — two distinct trading-desk products
A direction signal tells you which way to lean. A magnitude prediction tells you how big to size. We measured both honestly. Result: rates predictions are tradeable on magnitude (real MAE improvement over a naive zero baseline). Equity/credit/commodity predictions are tradeable on direction only — magnitudes are correctly-signed but ~2x too aggressive, requiring trader-side shrinkage.
SEER predicts the bp move of the day. MAE is meaningfully below naive zero baseline.
| Asset | SEER MAE | Naive MAE | Lift |
|---|---|---|---|
| 10y treasury (TNX) | 4.19bp | 5.13bp | +18.3% |
| 5y treasury (FVX) | 5.81bp | 6.67bp | +12.8% |
A trading desk receives a numerical bp forecast they can size against directly.
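The Lift column can be reproduced from the two MAE columns. The sketch below assumes lift is the relative MAE reduction versus the naive zero-move forecast; the helper names are hypothetical.

```python
def mean_abs_error(predicted, actual):
    """Mean absolute error between forecast and realized moves (same units, e.g. bp)."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def naive_zero_mae(actual):
    """MAE of the naive baseline that always forecasts a zero move."""
    return mean_abs_error([0.0] * len(actual), actual)


def mae_lift_pct(model_mae, naive_mae):
    """Relative MAE improvement over the naive baseline, in percent."""
    return 100.0 * (naive_mae - model_mae) / naive_mae


# 10y row from the table above: 4.19bp model MAE vs 5.13bp naive MAE.
lift_10y = mae_lift_pct(4.19, 5.13)
print(f"10y lift: {lift_10y:+.1f}%")
```

Recomputing the 5y row from the rounded MAEs shown gives roughly 12.9% rather than the published +12.8%, which presumably reflects unrounded inputs.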
Direction is correct (see edge table above) but magnitude shrinkage required. Calibration factor ~0.5 fixes it post-hoc.
| Asset | Direction edge | Magnitude verdict |
|---|---|---|
| S&P 500 | see table above | direction +12.9pp; magnitudes ~2x too aggressive |
| DXY | see table above | direction +3.1pp; magnitudes over-predicted |
| LQD (IG credit) | see table above | direction +7.9pp; magnitudes over-predicted |
| WTI Oil | see table above | direction +10.6pp; magnitudes over-predicted |
A trading desk receives a directional signal + a recommended position-size cap.
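The page quotes a post-hoc calibration factor of about 0.5 without saying how it is estimated. One common approach, shown here as an assumption rather than the method actually used, is a least-squares fit through the origin of realized moves on predicted moves. The toy numbers are constructed so that predictions are exactly twice the realized moves.

```python
def shrinkage_factor(predicted, realized):
    """Least-squares slope through the origin: the k minimizing sum((realized - k * predicted)^2)."""
    denom = sum(p * p for p in predicted)
    if denom == 0:
        raise ValueError("all predictions are zero; the factor is undefined")
    return sum(p * r for p, r in zip(predicted, realized)) / denom


def apply_shrinkage(predicted, k):
    """Scale raw magnitude forecasts by the fitted factor."""
    return [k * p for p in predicted]


# Toy data: correctly signed forecasts that are 2x too aggressive.
toy_predicted = [2.0, -4.0, 6.0]
toy_realized = [1.0, -2.0, 3.0]
k = shrinkage_factor(toy_predicted, toy_realized)
print(k, apply_shrinkage(toy_predicted, k))
```

To stay walk-forward, the factor would have to be fitted only on meetings that precede the one being forecast.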
We disclose where the model is well-calibrated and where it isn't. The two products map to different desk subscribers: the rates desk consumes magnitude forecasts; the equity / credit / FX desk consumes direction signals.
Which words actually move equities
For every word in the lexicon, we correlate Powell’s usage frequency at the prior presser with the realized FOMC-day SPX move. The pattern is durable: confident / committed language predicts rallies, hedge / topic-naming language predicts selloffs.
Pearson correlation between word frequency at Powell’s prior presser and the realized FOMC-day S&P 500 move (n=63 meetings).
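A sketch of the per-word ranking described above: correlate each word's frequency at the prior presser with the next FOMC-day move, then sort. The three words and all numbers are made-up placeholders, not entries from the actual lexicon.

```python
import math


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def rank_words(freq_by_word, next_moves):
    """Return (word, correlation) pairs, most bullish first."""
    scored = [(word, pearson(freqs, next_moves)) for word, freqs in freq_by_word.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical per-presser frequencies, aligned with the following meeting's move.
toy_freqs = {
    "confident": [1, 2, 3, 4],
    "uncertain": [4, 3, 2, 1],
    "labor": [2, 2, 2, 2],
}
toy_spx_moves = [-0.5, 0.0, 0.4, 0.9]
ranking = rank_words(toy_freqs, toy_spx_moves)
for word, rho in ranking:
    print(f"{word:10s} {rho:+.2f}")
```

Each frequency series must be lagged so that only the prior presser feeds the correlation; aligning same-day text with same-day moves would leak information.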
Call ledger — 3 live, 4 walk-forward backtests
Three live pre-registered calls — BoE Bailey May 8, ECB Lagarde June 5, and Fed Powell June 17 — plus four walk-forward backtest replays of prior pressers. Backtest replays use only data available at the source presser's timestamp — no leakage. They are not pre-registered live calls; they are honest model replays, regenerated from the same `python -m seer_fomc.model.posterior` pipeline. Walk-forward backtest on these 4 entries: 2 hits, 2 misses. The model's broader walk-forward record across all n=43 FOMC meetings is 74.4% argmax-correct (see Calibration section). Bailey resolves in 3 days, Lagarde in 1 month, Powell in 6 weeks.
Live · pre-registered
FOMC June 16–17, 2026
2026-05-02 21:51 UTC (pre-registered)
cut 7% · hold 84% · hike 9%
FedWatch prices a 0% cut tail. The model's prior-presser features assign 6.5% — a non-trivial cut tail FedWatch's path-implied curve cannot price. Resolves June 17.
FedWatch at the time: cut 0.0% · hold 90.8% · hike 9.2% | SEER cut tail = +6.5pp gap
Target: next FOMC decision
Walk-forward backtest · ✗ miss
FOMC March 18 → April 29, 2026
Backtest replay (used only data ≤ March 18)
cut 83% · hold 17% · hike 0%
March 18 presser features pushed the model toward predicting a cut. The committee held — model was over-confident on the cut tail given the still-contested dissent split.
Outcome (April 29): HOLD (with 4 dissents)
MISS — predicted cut (83%), outcome hold.
Target: April 29 outcome from March 18 presser features
Walk-forward backtest · ✓ hit
FOMC January 28 → March 18, 2026
Backtest replay (used only data ≤ January 28)
cut 46% · hold 54% · hike 0%
Hold edged out cut on the lexicon features; the model carried real cut-tail uncertainty (46%) that resolved in favor of hold.
Outcome (March 18): HOLD
HIT — argmax hold (54%), outcome hold.
Target: March 18 outcome from January 28 presser features
Walk-forward backtest · ✓ hit
FOMC December 10, 2025 → January 28, 2026
Backtest replay (used only data ≤ December 10)
cut 47% · hold 52% · hike 0%
Tightly-bracketed cut/hold call — hold won by 5pp. The dissent split visible in the December presser left real ambiguity that the model captured honestly.
Outcome (January 28): HOLD
HIT — argmax hold (52%), outcome hold.
Target: January 28 outcome from December 10 presser features
Walk-forward backtest · ✗ miss
FOMC September 17 → October 29, 2025
Backtest replay (used only data ≤ September 17)
cut 8% · hold 92% · hike 0%
Model predicted hold (92%) from the September features. The committee cut. This is the cut-tail under-confidence problem disclosed in the calibration section — the model needed more cut-cycle history than it had at this point.
Outcome (October 29): CUT (25bp)
MISS — predicted hold (92%), outcome cut.
Target: October 29 outcome from September 17 presser features
The June 17 call is committed in this deploy and resolves publicly when the meeting happens. The four backtest entries are walk-forward replays of the model on prior pressers using only data available at the source timestamp (no future leakage); they are not pre-registered live calls. Going forward, every Lagarde / Bailey / Powell call will be pre-registered here before the relevant meeting.
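The no-leakage rule for the replays can be sketched as an expanding-window loop: at each meeting the model is refit only on earlier events, then asked for one prediction. A single-feature least-squares line stands in here for the real models (the page mentions Ridge, linear, and gradient-boosted variants); that simplification, the `min_train` parameter, and the toy data are assumptions.

```python
def fit_line(xs, ys):
    """Ordinary least squares with one feature; returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return 0.0, my
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def walk_forward(features, outcomes, min_train=3):
    """Predict outcome t from feature t using a model fit on events 0..t-1 only.

    features[t] is known before meeting t (prior-presser features);
    outcomes[t] is only revealed afterwards and never enters its own fit.
    """
    predictions = []
    for t in range(min_train, len(features)):
        slope, intercept = fit_line(features[:t], outcomes[:t])
        predictions.append((t, slope * features[t] + intercept))
    return predictions


# Toy series where the outcome is exactly 2 * feature + 1.
toy_features = [0, 1, 2, 3, 4, 5]
toy_outcomes = [1, 3, 5, 7, 9, 11]
toy_preds = walk_forward(toy_features, toy_outcomes)
print(toy_preds)
```

A quick leakage check is to perturb outcomes at or after index t and confirm the prediction for t does not move.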
BoE Bailey, May 8: cross-asset directional call from the Ridge model (fit on n=21 prior events), predicting from Feb 5 2026 presser features. Resolves on May 8 close.
| Target | Predicted move | Note |
|---|---|---|
| FTSE 100 | -1.03% | model's strongest signal |
| FTSE 250 | -0.42% | early-evidence edge target (walk-forward +27.2pp, n=17) |
| GBP/USD | -0.04% | model uncertain — prediction near flat |
ECB Lagarde, June 5: cross-asset directional call from the Ridge model (fit on n=51 prior events), predicting from April 30 2026 presser features. Resolves on June 5 close.
| Target | Predicted move | Note |
|---|---|---|
| DAX | -2.01% | validated edge target (walk-forward +12.2pp, n=46) |
| EuroStoxx 50 | -1.69% | validated edge target (walk-forward +14.2pp, n=46) |
| EUR/USD | +0.04% | model uncertain — prediction near flat |
Lagarde decision posterior: combined Draghi+Lagarde multinomial logistic, n=101 training events. Walk-forward backtest details and per-class calibration are on the methodology page.
Caveat: the Bailey and Lagarde calls above are cross-asset directional calls. The Lagarde June 5 call also includes a decision posterior (cut/hold/hike) trained on combined Draghi+Lagarde history. The BoE Bailey decision posterior is next-session work; the May 8 call remains cross-asset only.
- May 8, 2026 · BoE Bailey · pre-registered above (cross-asset)
- June 5, 2026 · ECB Lagarde · pre-registered above (cross-asset)
- June 16–17, 2026 · Fed Powell · pre-registered (decision posterior + 6.5% cut tail)
Calibration — when the model says 84% hold, is it actually 84%?
Brier score and log loss measure probability calibration directly. Walk-forward, n=43 FOMC meetings, scored against a trailing base-rate baseline (predict the empirical decision frequency over the prior 12 meetings).
| Outcome | n | Argmax acc | Avg P(correct outcome) | Calibration |
|---|---|---|---|---|
| cut | 6 | 33.3% | 0.361 | under-confident on cuts (model needs more cut-cycle history) |
| hold | 26 | 76.9% | 0.682 | well calibrated |
| hike | 11 | 90.9% | 0.772 | well calibrated |
Honest weakness: the model is well-calibrated on hold and hike (the dominant regime in 2018–2024) but under-confident on cuts. It missed the start of the September / October 2025 cut cycle — predicted hold both times when the outcome was a 25bp cut. The cut tail is the part of the distribution that needs more training history, and it's the part of the live June 17 call most worth watching. See every walk-forward prediction →
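For reference, a sketch of the two scoring rules and the trailing base-rate baseline named above. The multiclass (sum-over-classes) form of the Brier score and the uniform fallback for an empty history are assumptions; the page does not say which variants it uses. The two posteriors are the rounded backtest entries from the call ledger.

```python
import math

CLASSES = ("cut", "hold", "hike")


def brier(probs, outcome):
    """Multiclass Brier score for one meeting (0 is perfect, 2 is worst)."""
    return sum((probs[c] - (1.0 if c == outcome else 0.0)) ** 2 for c in CLASSES)


def log_loss(probs, outcome, eps=1e-12):
    """Negative log probability assigned to the realized outcome."""
    return -math.log(max(probs[outcome], eps))


def trailing_base_rate(past_outcomes, window=12):
    """Empirical decision frequencies over the prior `window` meetings."""
    recent = past_outcomes[-window:]
    if not recent:
        return {c: 1.0 / len(CLASSES) for c in CLASSES}  # assumed fallback
    return {c: recent.count(c) / len(recent) for c in CLASSES}


march_call = {"cut": 0.83, "hold": 0.17, "hike": 0.00}    # resolved HOLD (miss)
january_call = {"cut": 0.46, "hold": 0.54, "hike": 0.00}  # resolved HOLD (hit)

print("miss:", brier(march_call, "hold"), log_loss(march_call, "hold"))
print("hit: ", brier(january_call, "hold"), log_loss(january_call, "hold"))

# Baseline forecast from a hypothetical history of 9 holds and 3 cuts.
toy_history = ["hold"] * 9 + ["cut"] * 3
baseline = trailing_base_rate(toy_history)
print("baseline:", baseline, brier(baseline, "hold"))
```

The confident miss scores far worse than the marginal hit under both rules, which is the behavior that makes them useful for judging probabilities rather than argmax accuracy alone.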
For each predicted-probability bucket, what fraction of the time was the model actually right? Perfect calibration = predicted equals empirical. Pooled across all three classes (one-vs-rest), 43 walk-forward meetings × 3 classes = 129 (predicted, outcome) pairs.
| Predicted bucket | n | Mean predicted | Empirical | Calibration bar |
|---|---|---|---|---|
| 0–10% | 67 | 2.4% | 10.4% | |
| 10–30% | 11 | 16.1% | 18.2% | |
| 30–60% | 14 | 47.7% | 42.9% | |
| 60–90% | 15 | 80.1% | 86.7% | |
| 90–100% | 22 | 95.2% | 68.2% | |
Grey bar = mean predicted probability in the bucket. Amber line = empirical frequency (where the actual outcome landed). Mid-range buckets (10–90%) are well-calibrated within ±10pp. The 90–100% bucket is over-confident: when the model assigns 95% confidence, the empirical hit rate is 68%. This is the same cut-tail miss problem visible in the per-class table above.
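The bucket table can be built by pooling one-vs-rest pairs, as in the sketch below. Bucket edges follow the table; treating them as half-open intervals is an assumption. The input here is just the four rounded backtest posteriors from the call ledger (12 pairs), not the full 129-pair set.

```python
CLASSES = ("cut", "hold", "hike")
BUCKETS = [(0.0, 0.1), (0.1, 0.3), (0.3, 0.6), (0.6, 0.9), (0.9, 1.0)]


def reliability_rows(posteriors, outcomes):
    """Pool one-vs-rest (probability, hit) pairs and summarize them per bucket."""
    pairs = [
        (probs[c], 1 if c == outcome else 0)
        for probs, outcome in zip(posteriors, outcomes)
        for c in CLASSES
    ]
    rows = []
    for lo, hi in BUCKETS:
        # Half-open buckets [lo, hi); the top bucket also includes exactly 1.0.
        members = [(p, hit) for p, hit in pairs if lo <= p < hi or (hi == 1.0 and p == 1.0)]
        n = len(members)
        rows.append({
            "bucket": (lo, hi),
            "n": n,
            "mean_pred": sum(p for p, _ in members) / n if n else None,
            "empirical": sum(hit for _, hit in members) / n if n else None,
        })
    return rows


ledger_posteriors = [
    {"cut": 0.83, "hold": 0.17, "hike": 0.00},  # Mar 18 -> Apr 29, outcome hold
    {"cut": 0.46, "hold": 0.54, "hike": 0.00},  # Jan 28 -> Mar 18, outcome hold
    {"cut": 0.47, "hold": 0.52, "hike": 0.00},  # Dec 10 -> Jan 28, outcome hold
    {"cut": 0.08, "hold": 0.92, "hike": 0.00},  # Sep 17 -> Oct 29, outcome cut
]
ledger_outcomes = ["hold", "hold", "hold", "cut"]
rows = reliability_rows(ledger_posteriors, ledger_outcomes)
for row in rows:
    print(row)
```

Four meetings are far too few to read calibration from; the point is only to show how each meeting contributes three pairs and how the per-bucket means are formed.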
The product roadmap
FOMC engine validated walk-forward (n=43). ECB Lagarde validated at scale (n=46). Draghi DAX (n=41) and BoE (Bailey n=17, Carney n=10) at early evidence. BoJ per-chair honestly weak. June 17 FOMC pre-registered live.
June 17 (Powell), June 5 (Lagarde), May 8 (Bailey) all pre-registered before their meetings. Public timestamps.
Williams (NY Fed, permanent voter, separate corpus). Regional Fed presidents (Daly, Logan, Goolsbee, Kashkari, Bostic). Each speaker is its own corpus + walk-forward test.
Live, priced on every screen, next to everything. Powell, Lagarde, Bailey, Ueda. Tim Cook on earnings. Trump at a rally. Any speaker. Any market. The probability of what’s actually going to happen, calibrated, in real time, in plain English.