How SEER reads central bankers, and how we know it works.
A short technical summary intended for partners, advisors, and desk reviewers. Every claim on this page is reproducible from the public corpus (Powell 2018–2026 transcripts, ECB press conferences, BoE inflation reports, BoJ kaiken summaries).
01
Architecture in one sentence
For each central-bank speaker, build a vocabulary-feature corpus of their public communication, fit a walk-forward model that maps the features at presser t to the cross-asset reaction at meeting t+1, and report directional accuracy and the Sharpe ratio of position-sized P&L against a momentum baseline that assumes the same direction as the last meeting.
02
What goes into a corpus
- Source. Official transcripts only. Fed FOMC press conferences (2018–present), ECB monetary-policy press conferences (Draghi 2017–19, Lagarde 2020–present), BoE inflation report / Monetary Policy Report press conferences (Carney 2015–19, Bailey 2020–present), BoJ kaiken summaries (Kuroda + Ueda).
- Tokenization. Standard whitespace + word-character regex. Per-1000-token normalization applied to every term frequency to make different transcripts comparable.
- Lexicon. 165 hand-curated terms grouped into hawkish, dovish, patient, confident, concerned, hedge, and topic-naming categories. Same lexicon applied to all chairs — the architecture transfers if and only if a Fed-built lexicon picks up signal at non-Fed central banks.
- Features. Per event: lexicon counts per 1000 tokens, plus aggregated category sums, plus a net_hawk_dove spread. The model sees only prior_* (event t-1) and d_* (event t-1 − event t-2) for predicting event t. No future leakage by construction.
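The feature construction above can be sketched in a few lines. This is a minimal illustration, not the production pipeline: the three-term `LEXICON`, the `sum_` prefix on category totals, and the function names `event_features` and `model_inputs` are all hypothetical, and the tokenizer is assumed to be a plain `\w+` regex over lowercased text.

```python
import re
from collections import Counter

# Hypothetical three-term lexicon; the real one has 165 hand-curated terms.
LEXICON = {
    "hawkish": ["inflation", "tighten"],
    "dovish": ["patient"],
}
TOKEN_RE = re.compile(r"\w+")  # assumed word-character regex


def event_features(text):
    """Lexicon counts per 1000 tokens, category sums, and net_hawk_dove."""
    tokens = TOKEN_RE.findall(text.lower())
    counts = Counter(tokens)
    scale = 1000.0 / max(len(tokens), 1)  # per-1000-token normalization
    feats = {}
    for category, terms in LEXICON.items():
        for term in terms:
            feats[term] = counts[term] * scale
        feats[f"sum_{category}"] = sum(feats[t] for t in terms)
    feats["net_hawk_dove"] = feats["sum_hawkish"] - feats["sum_dovish"]
    return feats


def model_inputs(events, t):
    """Inputs for predicting event t: prior_* from t-1, d_* = (t-1) minus (t-2)."""
    if t < 2:
        raise ValueError("need two prior events to build d_* features")
    prev, prev2 = events[t - 1], events[t - 2]
    row = {f"prior_{k}": v for k, v in prev.items()}
    row.update({f"d_{k}": prev[k] - prev2[k] for k in prev})
    return row
```

Because `model_inputs` only ever indexes `events[t - 1]` and `events[t - 2]`, the transcript of event t itself cannot reach the model, which is the sense in which leakage is excluded by construction.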
03
Validation protocol
- Walk-forward only. For each test row at index i, the model is re-fit on rows [0, i) and produces a single prediction for row i. Standard time-series cross-validation. No re-fitting on test data, no peek-ahead features.
- Two estimators per target. Linear (Ridge, alpha=1.0) on 22 verbal features for SPX / FX / equity targets. Gradient-boosted (HistGradientBoostingRegressor) on 76 features including macro regime variables (CPI, NFP, unemployment, prior-meeting actions) for US treasury yield targets. Each estimator is CV-tuned separately per target.
- Metrics reported. Direction accuracy vs momentum + majority baselines; predicted-vs-actual Pearson correlation; cumulative position-sized P&L (clip-and-cap at 75th-percentile absolute prediction); annualized Sharpe; bootstrap 95% CI on Sharpe (5,000 samples).
- Decision posterior. For Powell only (so far): multinomial logistic regression on prior-presser features, blended with a 12-meeting trailing base-rate prior (alpha=0.15). Calibration scored via Brier and log loss against the same base-rate baseline.
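The protocol above, reduced to a runnable sketch. Several things here are assumptions made for illustration: a one-feature closed-form ridge (`fit_ridge_1d`) stands in for the real estimators, the 75th-percentile cap is computed over the predictions passed in, the Sharpe ratio is annualized with 8 events per year, and the CI is a simple percentile bootstrap. None of these details are confirmed by the text above.

```python
import math
import random
import statistics


def fit_ridge_1d(xs, ys, alpha=1.0):
    """Closed-form ridge slope for a single feature with no intercept (toy stand-in)."""
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    denom = sxx + alpha
    return sxy / denom if denom else 0.0


def walk_forward(xs, ys, min_train=3, alpha=1.0):
    """For each test index i, fit on rows [0, i) and predict row i only."""
    out = []
    for i in range(min_train, len(xs)):
        beta = fit_ridge_1d(xs[:i], ys[:i], alpha)
        out.append((i, beta * xs[i]))
    return out


def direction_accuracy(preds, actuals):
    """Share of events where predicted and realised moves have the same sign."""
    hits = sum((p > 0) == (a > 0) for p, a in zip(preds, actuals))
    return hits / len(preds)


def position_sized_pnl(preds, actuals):
    """Clip each prediction at the 75th-percentile |prediction|, scale to [-1, 1]."""
    cap = statistics.quantiles([abs(p) for p in preds], n=4)[2]
    if cap == 0:
        return [0.0 for _ in preds]
    return [max(-cap, min(cap, p)) / cap * a for p, a in zip(preds, actuals)]


def sharpe(pnl, periods_per_year=8):
    """Annualised Sharpe; 8 events per year is an assumption, not from the text."""
    sd = statistics.stdev(pnl)
    return 0.0 if sd == 0 else statistics.mean(pnl) / sd * math.sqrt(periods_per_year)


def bootstrap_sharpe_ci(pnl, n_samples=5000, seed=0):
    """Percentile bootstrap 95% CI on the Sharpe ratio."""
    rng = random.Random(seed)
    draws = sorted(sharpe(rng.choices(pnl, k=len(pnl))) for _ in range(n_samples))
    return draws[int(0.025 * n_samples)], draws[int(0.975 * n_samples) - 1]


# Toy data, hypothetical: one feature and the next-meeting move it should predict.
xs = [1.0, -0.5, 2.0, 0.5, -1.5, 1.0, -2.0, 0.8]
ys = [0.4, -0.1, 0.9, -0.2, -0.6, 0.3, -0.7, 0.5]
results = walk_forward(xs, ys)
preds = [p for _, p in results]
actuals = [ys[i] for i, _ in results]
pnl = position_sized_pnl(preds, actuals)
accuracy = direction_accuracy(preds, actuals)
ci_low, ci_high = bootstrap_sharpe_ci(pnl, n_samples=1000)
```

The slice `xs[:i]` is what enforces the walk-forward constraint: row i is never in its own training set. Note that computing the cap over the full prediction series is a simplification; a stricter variant would compute it from predictions available before each event.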
04
What we found, honestly
- Powell. n=43 walk-forward. Decision argmax accuracy 74.4%. Brier 0.456 vs 0.623 base-rate (−27%). Cross-asset edges on rates (Sharpe 1.32–1.62, CIs tight), SPX (Sharpe 0.66, CI [−0.18, 1.42]), credit, and oil. Sector rotation (XLE/XLK) is null — an honest disclosed limit.
- Lagarde. n=46 walk-forward. EuroStoxx +14.2pp / Sharpe 0.74. DAX +12.2pp / Sharpe 0.56. The architecture transfers cleanly to a non-Fed central bank without retraining the lexicon.
- Bailey + Carney. Early evidence (Bailey n=17, Carney n=10). Bailey GBP/USD +39.3pp / Sharpe 0.92, FTSE 250 +27.2pp / Sharpe 0.31. Carney GBP/USD +36.7pp / Sharpe 1.06 / corr +0.56. CIs are wide given the small samples. We treat these as suggestive until n=30.
- Draghi. Walk-forward n=41 (2014–2019, including QE-launch era). DAX edge +16.0pp / Sharpe 0.20 / corr +0.02 / direction accuracy 61.0% vs momentum 45.0%. The signal lives in the 2014–2016 QE-launch period; the original 2017–2019 sub-sample (n=18) showed null. The point estimate is positive at scale and the direction edge is large, but the Sharpe CI still crosses zero, so we treat this as early evidence, not validated. Two ECB chairs now both show positive DAX edges, which strengthens the role-level architecture claim.
- BoJ. Honestly weak per-chair. Pooled (Kuroda+Ueda, n=130) shows a modest +5.8pp edge on Nikkei, but the per-chair signals are negative-to-noisy. The Fed-built lexicon does not transfer cleanly to BoJ. The remaining options are a Japan-specific recalibration or an honest "vocmarkets does not work on BoJ" disclosure.
- Pooling fails. 520 Fed-governor speeches pooled into a single signal: negative edges across all targets. Same with Carney + Bailey pooled. Speakers are individuals; the right architecture is per-speaker, not role-level. This is a structural finding, not a hyperparameter problem.
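For readers checking the Powell decision numbers, the sketch below shows one way the blended posterior and the Brier comparison could be computed. It rests on assumptions: alpha=0.15 is taken to be the weight on the 12-meeting base-rate prior, the Brier score is the multiclass form (sum of squared errors across cut/hold/hike, averaged over events), and the function names are hypothetical.

```python
CLASSES = ("cut", "hold", "hike")


def blend_with_base_rate(model_probs, past_outcomes, alpha=0.15, window=12):
    """Mix model probabilities with the trailing base rate of decisions.

    Assumption: alpha is the weight placed on the base-rate prior.
    """
    recent = past_outcomes[-window:]
    base = {c: recent.count(c) / len(recent) for c in CLASSES}
    return {c: (1 - alpha) * model_probs[c] + alpha * base[c] for c in CLASSES}


def brier(prob_rows, outcomes):
    """Multiclass Brier score: mean over events of sum_c (p_c - y_c)^2."""
    total = 0.0
    for probs, outcome in zip(prob_rows, outcomes):
        total += sum(
            (probs[c] - (1.0 if c == outcome else 0.0)) ** 2 for c in CLASSES
        )
    return total / len(outcomes)


def relative_change(model_score, baseline_score):
    """Negative values mean the model scores lower (better) than the baseline."""
    return (model_score - baseline_score) / baseline_score


# Hypothetical single event: raw model output and the last 12 decisions.
raw = {"cut": 0.2, "hold": 0.7, "hike": 0.1}
history = ["hold"] * 9 + ["cut"] * 3
posterior = blend_with_base_rate(raw, history)

# The reported Powell figures: 0.456 against a 0.623 base-rate Brier.
powell_change = relative_change(0.456, 0.623)
```

With the reported figures, `powell_change` is about -0.268, which matches the −27% quoted in the Powell bullet.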
05
Honest weaknesses
- Cut-tail under-confidence. Powell decision posterior missed the September / October 2025 cut cycle (predicted hold both times, outcome cut). Per-class avg P(correct) on cuts is 0.36 vs 0.68 on holds and 0.77 on hikes. The model needs more cut-cycle history to calibrate the cut tail; we expect this gap to close as 2025–26 cuts accumulate.
- Decision posterior: extended to ECB. The first attempt on Lagarde-only at n=26 walk-forward showed Brier improvement only +1.6% vs base rate. After backfilling Draghi 2014–2019 (+27 events), the combined Draghi+Lagarde decision posterior is trained on the full ECB rate-decision history. Specific argmax-accuracy and Brier numbers from this combined run are pending pipeline verification and will be published here once they trace cleanly to a saved ecb_posterior_combined.json artifact (single-source-of-truth discipline applies). BoE Bailey decision posterior remains next-session work (Bailey n=22 is below the n≥30 validation threshold we apply).
- Sample size on non-Fed chairs. ECB n=46 (Lagarde) / 41 (Draghi). BoE n=17 (Bailey) / 10 (Carney). BoJ n=18 (Kuroda) / 21 (Ueda). Most non-Fed chair Sharpe CIs still cross zero. Time fixes this; we are not claiming validation at n<30 with CI crossing zero.
- Live track record is single-digit days old. There are three live pre-registered calls (Bailey May 8, Lagarde June 5, Powell June 17). The four other ledger entries (2 hits, 2 misses) come from the walk-forward backtest and are not pre-registered live calls; that distinction is preserved on the public ledger. The broader walk-forward record across all n=43 FOMC meetings is 74.4% argmax-correct.
- No real-money validation. All P&L is paper, position-sized 1x notional with no slippage / costs / financing. Realistic desk costs would compress most of the borderline Sharpes. We seek a desk partner to run a live shadow book in 2026 H2.
06
What the product looks like
- For rates desks. A magnitude forecast: predicted bp move on the meeting day, with MAE 18% lower than a naive zero-move baseline on TNX and FVX. Sized directly off the model output.
- For equity / FX / credit desks. A direction signal + recommended position-size cap. Magnitudes are correctly signed but ~2x too aggressive on the un-shrunk model output; apply a 0.5x calibration factor or use the sign only.
- For event traders. A pre-registered probability call deployed to /fomc before each meeting, resolving publicly with the meeting outcome. Calibration and Brier improvements are tracked across all resolutions.
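The two magnitude statements above (MAE relative to a zero-move baseline, and the 0.5x calibration factor) can be made concrete with a small sketch. The helper names and the toy numbers are hypothetical; the only inputs taken from the text are the zero baseline and the 0.5 factor.

```python
def mae(preds, actuals):
    """Mean absolute error between predicted and realised moves."""
    return sum(abs(p - a) for p, a in zip(preds, actuals)) / len(preds)


def improvement_vs_zero(preds, actuals):
    """Fractional MAE reduction relative to always predicting a 0 bp move."""
    baseline = mae([0.0] * len(actuals), actuals)
    return 1.0 - mae(preds, actuals) / baseline


def shrink(preds, factor=0.5):
    """Apply a multiplicative calibration factor; signs are unchanged."""
    return [factor * p for p in preds]


# Toy case, hypothetical: predictions are correctly signed but twice too large.
raw_preds = [4.0, -2.0]
realised = [2.0, -1.0]
raw_gain = improvement_vs_zero(raw_preds, realised)
shrunk_gain = improvement_vs_zero(shrink(raw_preds), realised)
```

In this toy case the un-shrunk predictions are no better than the zero baseline on MAE despite having the right sign, while the shrunk ones match the realised moves exactly. That is the practical reason to either apply the factor or trade the sign only.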
07
Forward roadmap
- May–July 2026. Resolve Bailey May 8, Lagarde June 5, Powell June 17. Each becomes a public data point on the live track record.
- Q3 2026. Williams + regional Fed presidents (each its own per-speaker corpus). ECB and BoE decision posteriors (cut/hold/hike probabilities, not just cross-asset).
- H2 2026. Live desk shadow run in partnership with a friendly proprietary desk. Anthropic-API-backed live thesis extraction (currently the page's thesis field is hand-written from model features).
Questions, sanity-checks, or desk-partner inquiries: hello@goseer.ai