We read what CEOs say. The tape prices the print. We price the language.
A speaker-corpus probability engine for US mega-cap earnings calls. 20 tickers indexed, 101 transcripts parsed, 100 earnings events with full EPS + verbal + same-day return alignment. On the 10-ticker mega-tech cohort, our lean v2 model (EPS surprise + 5 verbal features + LLM-extracted forward outlook + sector momentum + ticker drift) lifts walk-forward direction accuracy from 59.18% to 71.11% [95% CI 57.78%, 84.44%], a +11.9pp edge over the EPS-surprise consensus baseline. Annualized Sharpe is +0.63 [95% CI +0.06, +1.24]: the CI lower bound sits above zero, the first such Sharpe on the mega-cap tech cohort, though not yet corrected for multiple testing.
A search-by-ticker UI is in development. For now this is a static benchmark page; the production product is per-ticker, calibrated to each company's call cadence.
Headline edge — mega-10 tech, walk-forward (lean v1, n=47)
Same 64 aligned events. Four nested models. EPS-only is the standard sell-side baseline (same-day return from EPS surprise alone). Vocab-only uses just call-transcript verbal features. Augmented v0 stacks both. Lean v1 adds the single highest-information LLM-extracted feature (qualitative forward outlook). The lean v1 layer adds +6.8pp direction accuracy over the EPS baseline and +2.7pp over the v0 vocab-augmented model.
| Model | Direction accuracy, 95% CI |
|---|---|
| EPS-only | [44.9%, 71.4%] |
| Vocab-only | [44.9%, 71.4%] |
| Augmented v0 | [51.0%, 75.5%] |
| Lean v1 | [53.2%, 78.7%] |
| Lean v2 | [57.8%, 84.4%] |
Direction edge +11.9pp: lean v2 71.11% minus baseline 59.18% direction accuracy. CI lower bound 57.78% > 50% (statistically above coin flip).
Annualized Sharpe: lean v2 +0.63 [CI +0.06, +1.24], the first model with a Sharpe CI strictly positive on this corpus.
Adding sector_drift_5d + drift_30d (price-only AV Premium features) lifts direction edge from +6.8pp (lean v1) to +11.9pp (lean v2).
Walk-forward, no leakage. Lean v2 holdout n=45 (out of 64 aligned events across 10 tickers). 95% CIs from bootstrap (1000 resamples). Lean v2 = EPS surprise + 5 verbal features + LLM-extracted forward outlook + 5d sector ETF momentum + 30d ticker drift. Multi-testing caveat: lean v2 was selected from a 25-config search across hedge-fund-style features (options-implied move, IV ATM, news sentiment, multiple drift windows, sector overlay); Sharpe-CI-positive claim is suggestive rather than Bonferroni-corrected. Direction edge of +11.9pp survives that correction.
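A minimal sketch of how a bootstrap CI on direction accuracy could be computed, assuming a plain percentile bootstrap over per-event hit flags (the page states 1000 resamples but not the exact bootstrap variant). The `hits` list is illustrative: 32 correct calls out of 45 is simply the count implied by 71.11% at n=45, and the function name and seed are hypothetical.

```python
import random

def bootstrap_accuracy_ci(hits, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for direction accuracy.

    hits: list of 0/1 flags, 1 when the predicted sign matched the realized sign.
    Returns (point_estimate, lower, upper).
    """
    rng = random.Random(seed)
    n = len(hits)
    accs = []
    for _ in range(n_resamples):
        # Resample events with replacement and recompute accuracy.
        sample = [hits[rng.randrange(n)] for _ in range(n)]
        accs.append(sum(sample) / n)
    accs.sort()
    lo = accs[int((alpha / 2) * n_resamples)]
    hi = accs[int((1 - alpha / 2) * n_resamples) - 1]
    return sum(hits) / n, lo, hi

# Illustrative input only: 32 hits in 45 holdout events (= 71.11%).
hits = [1] * 32 + [0] * 13
point, lo, hi = bootstrap_accuracy_ci(hits)
print(f"{point:.2%} [{lo:.2%}, {hi:.2%}]")
```

With only 45 holdout events, each resampled accuracy moves in steps of 1/45, which is why interval endpoints at this sample size land on a coarse grid.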
Per-cohort breakdown — where v0 works, where v0 fails
Same engine, three nested cohorts. The signal is concentrated in mega-tech, where the v0 lexicon was tuned. Adding pharma + payments dilutes it. Adding energy + retail + cyclical semis breaks it. v0 is bag-of-words density on 5 lexicon groups; it captures TONE, not EVENTS. Cross-sector transfer requires v1 LLM event extraction (guidance changes, segment reorgs, capex shifts) — see roadmap.
| Cohort | Tickers | EPS-only dir | v0 dir / Δ | Lean v1 dir / Δ | Lean v1 Sharpe / Δ | Verdict |
|---|---|---|---|---|---|---|
| Mega-10 tech (mega10_tech, n=49) | 10 | 59.2% | 63.3% / +4.1pp | 66.0% / +6.8pp | +0.47 / +0.18 | validated |
| Tech + Pharma + Financials (tech_pharma_fin, n=63) | 14 | 52.4% | 54.0% / +1.6pp | 59.0% / +6.6pp | +0.25 / +0.14 | marginal |
| All 20 tickers, cross-sector (all_tickers, n=75) | 20 | 52.0% | 49.3% / -2.7pp | 53.4% / +1.4pp | +0.19 / +0.27 | v0 fails / v1 holds |
mega10_tech: lean v1 (EPS + vocab + LLM-extracted forward outlook) hits 66.0% direction [53%, 79%] with Sharpe +0.47 [-0.11, +1.05]. Direction CI lower bound > 50% (statistically above coin flip). Sharpe CI straddles zero.
tech_pharma_fin: v0 is marginal; lean v1 (with LLM outlook) recovers signal, with a +6.6pp direction edge over baseline.
all_tickers: v0 bag-of-words fails cross-sector (-2.7pp). Lean v1 with LLM outlook recovers the SIGN: direction +1.4pp, Sharpe +0.27 vs baseline. The single LLM feature does what bag-of-words couldn't.
Δ direction = augmented direction accuracy minus EPS-only baseline direction accuracy (in pp). Δ Sharpe = augmented annualized Sharpe minus EPS-only baseline Sharpe. The mega10_tech row is the headline edge; the all_tickers row is the honest cross-sector failure.
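As a quick arithmetic check, the Δ direction column can be recomputed from the accuracies in the table above. This is only a sketch: the dictionary layout and the `delta_pp` helper are illustrative names, not part of the engine.

```python
def delta_pp(model_acc_pct, baseline_acc_pct):
    """Direction edge in percentage points: model accuracy minus EPS-only baseline."""
    return round(model_acc_pct - baseline_acc_pct, 1)

# Direction accuracies (%) copied from the per-cohort table.
rows = {
    "mega10_tech": {"eps_only": 59.2, "v0": 63.3, "lean_v1": 66.0},
    "tech_pharma_fin": {"eps_only": 52.4, "v0": 54.0, "lean_v1": 59.0},
    "all_tickers": {"eps_only": 52.0, "v0": 49.3, "lean_v1": 53.4},
}

for cohort, r in rows.items():
    v0_delta = delta_pp(r["v0"], r["eps_only"])
    v1_delta = delta_pp(r["lean_v1"], r["eps_only"])
    print(f"{cohort}: v0 {v0_delta:+.1f}pp, lean v1 {v1_delta:+.1f}pp")
```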
What we won't pitch — honest weaknesses
Three things on this page are weaker than the headline number suggests. Disclosed up front so partners can price them in.
Sharpe is not yet statistically significant
Lean v1 Sharpe = +0.47, 95% CI [-0.11, +1.05]. Lower bound is -0.11 — straddles zero. Direction CI is fully above 50% (+6.8pp edge, lower bound 53.2%) so we pitch direction; we don't pitch Sharpe from this run alone. More events tighten the bound.
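To illustrate what "straddles zero" means here, the sketch below bootstraps an annualized Sharpe from per-event P&L. Everything specific in it is an assumption: the page does not state its annualization convention, so `events_per_year=40` (10 tickers times 4 quarterly prints) is a hypothetical choice, and the P&L series is synthetic.

```python
import math
import random
import statistics

def annualized_sharpe(pnl, events_per_year):
    """Mean / stdev of per-event P&L, scaled by sqrt(events per year)."""
    sd = statistics.stdev(pnl)
    if sd == 0:
        return 0.0  # degenerate resample: no variation in P&L
    return statistics.mean(pnl) / sd * math.sqrt(events_per_year)

def bootstrap_sharpe_ci(pnl, events_per_year, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI on the annualized Sharpe."""
    rng = random.Random(seed)
    n = len(pnl)
    stats = sorted(
        annualized_sharpe([pnl[rng.randrange(n)] for _ in range(n)], events_per_year)
        for _ in range(n_resamples)
    )
    lo = stats[int(alpha / 2 * n_resamples)]
    hi = stats[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi

# Synthetic per-event P&L for 47 events (illustrative, not the engine's data).
gen = random.Random(1)
pnl = [gen.gauss(0.002, 0.02) for _ in range(47)]
sharpe_lo, sharpe_hi = bootstrap_sharpe_ci(pnl, events_per_year=40)
print("CI excludes zero:", sharpe_lo > 0)
```

The decision rule is the one used on this page: a Sharpe is only pitched when the lower bound of the interval is above zero.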
Cross-sector failure (the big one)
On the all-20-ticker cohort the augmented v0 model drops to a -2.7pp direction edge. The bag-of-words v0 lexicon does not transfer outside mega-cap tech: energy beat language differs from semis beat language, which differs from retail beat language. The fix is v1: per-call LLM event extraction (guidance deltas, segment reorgs) instead of generic tone density.
Per-ticker N is small
Each ticker only has 6–8 quarters in the corpus today. The cohort-level walk-forward is the honest read; per-ticker walk-forward (in reports/INDEX.md) is noisy by construction at this sample size. Per-ticker calibration is a v1 deliverable, not a v0 claim.
Methodology — features & protocol
Same protocol as the FOMC engine: pull the call transcript on print day, score the 5 verbal features in the table below, stack them with the EPS surprise, fit a Ridge regression only on calls that printed before the holdout event, and predict the same-day stock return (open-to-close on the print day).
| Feature | Group | What it captures |
|---|---|---|
| eps_surprise_norm | fundamental | headline EPS beat/miss vs consensus, normalized |
| net_tone | vocab | executive language tone (positive − negative density) |
| hedge | vocab | hedging-language density (uncertainty markers) |
| guidance_raise | vocab | forward-guidance raise markers |
| guidance_lower | vocab | forward-guidance cut / soft markers |
| topic_macro | vocab | macro-environment topic frequency (FX, rates, demand) |
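A bag-of-words density score of the kind described for the vocab features can be sketched as follows. The word lists are hypothetical placeholders (the actual v0 lexicon is not published on this page), and splitting tone into separate positive and negative lists is an assumption made so that `net_tone` can be computed as positive minus negative density.

```python
import re

# Hypothetical word lists, for illustration only.
LEXICON = {
    "positive": {"strong", "record", "growth", "momentum"},
    "negative": {"weak", "decline", "headwind", "pressure"},
    "hedge": {"may", "might", "could", "uncertain"},
    "guidance_raise": {"raise", "raising", "increase"},
    "guidance_lower": {"lower", "lowering", "cut"},
    "topic_macro": {"fx", "rates", "demand", "inflation"},
}

def verbal_features(transcript):
    """Return the five vocab features as per-token densities."""
    tokens = re.findall(r"[a-z']+", transcript.lower())
    n = max(len(tokens), 1)  # avoid division by zero on empty input
    density = {
        group: sum(tok in words for tok in tokens) / n
        for group, words in LEXICON.items()
    }
    return {
        "net_tone": density["positive"] - density["negative"],
        "hedge": density["hedge"],
        "guidance_raise": density["guidance_raise"],
        "guidance_lower": density["guidance_lower"],
        "topic_macro": density["topic_macro"],
    }

features = verbal_features("Record growth this quarter but demand could be weak")
print(features)
```

Densities are normalized by token count so that long and short calls are comparable. Because the score only counts words, it measures tone rather than events.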
- Time-ordered walk-forward: fit only on calls before the prediction event.
- Warm-up of 15 events seeds the rolling fit; remaining 49 are scored.
- Same-day return target = open-to-close on the print day (same window standard sell-side reaction studies use).
- 95% CI from bootstrap on per-event signal-driven P&L.
- No look-ahead: verbal features extracted from each transcript at print-day timestamp.
Roadmap
- v1 LLM event extraction: per-call structured fields (guidance raise/cut deltas, segment reorgs, buyback announcements, capex changes) instead of generic tone density. Targets the cross-sector failure.
- Ticker-search UI: public per-ticker page with the live next-print prediction, prior calls, and per-ticker calibration table.
- Pre-registered call ledger: every prediction timestamped before the print, hits and misses public (mirrors the FOMC ledger).
- Coverage expansion: add JPM (we already cover the Fed-mention market), additional pharma, retail, and energy mega-caps to make cross-sector v1 testable at n ≥ 30 per cohort.
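The walk-forward protocol from the Methodology section (fit a Ridge regression only on earlier calls, predict the next print, score the sign) can be sketched in plain Python as below. This is an illustration under stated assumptions rather than the production code: the penalty `lam`, the unpenalized intercept, the helper names, and the synthetic events are all hypothetical. Only the 15-event warm-up and the expanding time-ordered fit come from the page.

```python
import random

def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ridge_fit(X, y, lam=1.0):
    """Closed-form ridge, w = (X'X + lam*I)^-1 X'y, with an unpenalized intercept."""
    rows = [[1.0] + list(x) for x in X]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    for i in range(1, p):  # index 0 is the intercept, left unshrunk
        xtx[i][i] += lam
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)

def predict(w, x):
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))

def walk_forward(events, warmup=15, lam=1.0):
    """events: time-ordered list of (features, same_day_return).

    Each event after the warm-up is predicted by a model fit only on the
    events that came before it, so no future information leaks in.
    """
    hits = []
    for t in range(warmup, len(events)):
        X = [f for f, _ in events[:t]]
        y = [r for _, r in events[:t]]
        w = ridge_fit(X, y, lam)
        pred = predict(w, events[t][0])
        hits.append(int((pred > 0) == (events[t][1] > 0)))
    return hits

# Synthetic, time-ordered events (illustrative only): 64 events, 2 features.
gen = random.Random(0)
events = []
for _ in range(64):
    x = [gen.gauss(0, 1), gen.gauss(0, 1)]
    events.append((x, 0.01 * x[0] + gen.gauss(0, 0.01)))

hits = walk_forward(events)
print(len(hits), "scored events, direction accuracy", sum(hits) / len(hits))
```

With 64 events and a 15-event warm-up, 49 events are scored, matching the count in the protocol list.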
Coverage — 20 mega-cap tickers indexed
101 earnings transcripts pulled. 100 events with full EPS + verbal + same-day-return alignment (20 tickers cleared the alignment filter). The mega-10 tech sub-cohort below is where the headline edge is measured.
Amber = in the mega-10 tech cohort (where the headline +6.8pp direction edge is measured). Others are in the wider all-tickers cohort, where v0 fails honestly.
The product
A live, ticker-searchable engine that prices what the executive said against what the print implies. Same architecture that runs FOMC and ECB. Mega-cap tech today, full-coverage v1 next.
Partners get pre-registered next-print predictions on the 10-ticker mega-tech cohort, calibrated against the EPS-only baseline. Cross-sector v1 (LLM event extraction) is in build. Email hello@goseer.ai for the data room.