We read what CEOs say. The tape prices the print. We price the language.

A speaker-corpus probability engine for US mega-cap earnings calls: 20 tickers indexed, 101 transcripts parsed, 100 earnings events with full EPS + verbal + same-day-return alignment. On the 10-ticker mega-tech cohort, our lean v2 model (EPS surprise + 5 verbal features + LLM-extracted forward outlook + sector momentum + ticker drift) lifts walk-forward direction accuracy from 59.18% (EPS-surprise consensus baseline) to 71.11% [95% CI 57.78%, 84.44%], a +11.9pp edge. Annualized Sharpe is +0.63 [95% CI +0.06, +1.24]: the first Sharpe CI on the mega-cap tech cohort with a lower bound strictly above zero (not corrected for the 25-config search that selected lean v2).

A search-by-ticker UI is in development. For now this is a static benchmark page; the production product is per-ticker, calibrated to each company's call cadence.

Headline edge: mega-10 tech, walk-forward (lean v2, n=45)

Same 64 aligned events, five models. EPS-only is the standard sell-side baseline (same-day return predicted from EPS surprise alone). Vocab-only uses just the call-transcript verbal features. Augmented v0 stacks both. Lean v1 adds the single highest-information LLM-extracted feature (qualitative forward outlook). Lean v2 adds 5d sector ETF momentum and 30d ticker drift. Lean v1 adds +6.8pp direction over the EPS baseline and +2.7pp over the v0 vocab-augmented model; lean v2 reaches +11.9pp over the baseline.

| Model | Features | Direction accuracy | 95% CI | Sharpe (ann.) | Sharpe CI | Pearson ρ | Pearson CI |
|---|---|---|---|---|---|---|---|
| EPS-only baseline | eps_surprise_norm | 59.2% | [44.9%, 71.4%] | 0.29 | [-0.29, 0.84] | -0.02 | [-0.31, 0.29] |
| Vocab-only | 5 verbal features | 59.2% | [44.9%, 71.4%] | 0.29 | [-0.27, 0.80] | +0.52 | [-0.14, 0.75] |
| Augmented v0 (EPS + vocab) | 6 features stacked | 63.3% | [51.0%, 75.5%] | 0.37 | [-0.19, 0.96] | +0.52 | [-0.16, 0.74] |
| Lean v1 (EPS + vocab + LLM outlook) | 7 features, LLM signal | 66.0% | [53.2%, 78.7%] | 0.47 | [-0.11, 1.05] | +0.53 | [-0.15, 0.77] |
| Lean v2 (+ sector + 30d drift) | 9 features, hedge-fund overlay | 71.1% | [57.8%, 84.4%] | 0.63 | [0.06, 1.24] | +0.55 | [0.00, 0.78] |

Direction edge, lean v2 vs baseline
+11.9pp

Lean v2 71.11% − baseline 59.18% direction accuracy. Lean v2's direction CI lower bound (57.78%) is above 50%, i.e. statistically above a coin flip.

Sharpe lift, lean v2 vs baseline
+0.34

Lean v2 +0.63 [CI +0.06, +1.24] vs baseline +0.29: the first model with a Sharpe CI strictly above zero on this cohort (not Bonferroni-corrected; see the multiple-testing caveat below).

v1 → v2 improvement
+5.1pp

Adding sector_drift_5d + drift_30d (price-only AV Premium features) lifts the direction edge from +6.8pp (lean v1) to +11.9pp (lean v2).
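The two price-only features can be sketched as trailing simple returns. This is a minimal sketch under assumptions the page does not state: windows count trading days, returns are simple rather than log, and both windows end at the close before the print day so nothing from the print session leaks into the features. The function names are hypothetical.

```python
import numpy as np


def trailing_return(closes, end_idx, window):
    """Simple return over `window` trading days ending at closes[end_idx]."""
    start = end_idx - window
    if start < 0:
        raise ValueError("not enough price history for this window")
    return float(closes[end_idx] / closes[start] - 1.0)


def drift_features(ticker_closes, sector_etf_closes, print_idx):
    """Price-only features for the event whose print-day bar sits at print_idx.

    Assumption: both windows end at the prior day's close, so no print-day
    price enters the features (the target is the print-day open-to-close move).
    """
    prior = print_idx - 1
    return {
        "drift_30d": trailing_return(ticker_closes, prior, 30),
        "sector_drift_5d": trailing_return(sector_etf_closes, prior, 5),
    }


# Synthetic daily closes, oldest first (illustration only, not real prices).
ticker = np.arange(100.0, 141.0)   # 41 closes: 100, 101, ..., 140
sector_etf = np.full(41, 50.0)     # flat sector ETF
features = drift_features(ticker, sector_etf, print_idx=40)
```

Here drift_30d compares the close at index 39 with the close 30 sessions earlier (index 9), and the flat ETF gives a sector_drift_5d of exactly zero.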

Walk-forward, no leakage. Lean v2 holdout n=45 (out of 64 aligned events across 10 tickers; the EPS-only baseline is scored on n=49). 95% CIs from bootstrap (1000 resamples). Lean v2 = EPS surprise + 5 verbal features + LLM-extracted forward outlook + 5d sector ETF momentum + 30d ticker drift. Multiple-testing caveat: lean v2 was selected from a 25-config search across hedge-fund-style features (options-implied move, IV ATM, news sentiment, multiple drift windows, sector overlay), so the positive-Sharpe-CI claim is suggestive rather than Bonferroni-corrected. The +11.9pp direction edge survives that correction.
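The bootstrap CIs and the annualized Sharpe can be sketched as follows, on synthetic data. Assumptions not stated on this page: per-event P&L is sign(prediction) × realized return, the CI is a percentile bootstrap over events, and Sharpe is annualized with a hypothetical EVENTS_PER_YEAR = 40 (10 tickers × 4 prints). The last line shows what a Bonferroni-adjusted interval for a 25-config search looks like (per-test level 1 − 0.05/25 = 99.8%).

```python
import numpy as np

# Assumption: events per year used to annualize a per-event Sharpe. The page
# does not say how it annualizes; 40 = 10 tickers x 4 prints is hypothetical.
EVENTS_PER_YEAR = 40


def annualized_sharpe(pnl, events_per_year=EVENTS_PER_YEAR):
    """Mean / sample std of per-event P&L, scaled by sqrt(events per year)."""
    pnl = np.asarray(pnl, dtype=float)
    sd = pnl.std(ddof=1)
    if not np.isfinite(sd) or sd == 0.0:
        return float("nan")
    return float(pnl.mean() / sd * np.sqrt(events_per_year))


def bootstrap_ci(values, stat, n_resamples=1000, level=0.95, seed=0):
    """Percentile bootstrap CI: resample events with replacement, recompute stat."""
    values = np.asarray(values, dtype=float)
    rng = np.random.default_rng(seed)
    n = len(values)
    stats = np.array(
        [stat(values[rng.integers(0, n, size=n)]) for _ in range(n_resamples)]
    )
    tail = (1.0 - level) / 2.0
    lo, hi = np.nanquantile(stats, [tail, 1.0 - tail])
    return float(lo), float(hi)


# Synthetic holdout of 45 events (illustration only, not this page's data).
rng = np.random.default_rng(42)
realized = rng.normal(0.0, 0.04, size=45)               # open-to-close returns
predicted = realized + rng.normal(0.0, 0.04, size=45)   # noisy model forecasts
hits = (np.sign(predicted) == np.sign(realized)).astype(float)
pnl = np.sign(predicted) * realized                     # 1 unit long/short per event

acc_ci = bootstrap_ci(hits, np.mean)
sharpe_ci = bootstrap_ci(pnl, annualized_sharpe)
# Bonferroni across a 25-config search: per-test level 1 - 0.05/25 = 99.8%.
acc_ci_bonf = bootstrap_ci(hits, np.mean, level=1.0 - 0.05 / 25)
```

With 1000 resamples, a 99.8% interval rests on roughly one resample in each tail, so a Bonferroni-level bound needs many more resamples to be stable.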

Per-cohort breakdown — where v0 works, where v0 fails

Same engine, three nested cohorts. The signal is concentrated in mega-tech, where the v0 lexicon was tuned. Adding pharma + payments dilutes it. Adding energy + retail + cyclical semis breaks it. v0 is bag-of-words density on 5 lexicon groups; it captures TONE, not EVENTS. Cross-sector transfer requires v1 LLM event extraction (guidance changes, segment reorgs, capex shifts) — see roadmap.

| Cohort | Tickers | EPS-only dir. | v0 dir. | v0 Δ | Lean v1 dir. | Lean v1 Δ | Lean v1 Sharpe | Sharpe Δ | Verdict |
|---|---|---|---|---|---|---|---|---|---|
| Mega-10 tech (mega10_tech, n=49) | 10 | 59.2% | 63.3% | +4.1pp | 66.0% | +6.8pp | +0.47 | +0.18 | validated |
| Tech + Pharma + Financials (tech_pharma_fin, n=63) | 14 | 52.4% | 54.0% | +1.6pp | 59.0% | +6.6pp | +0.25 | +0.14 | marginal |
| All 20 tickers, cross-sector (all_tickers, n=75) | 20 | 52.0% | 49.3% | -2.7pp | 53.4% | +1.4pp | +0.19 | +0.27 | v0 fails / v1 holds |

Mega-10 tech

Lean v1 (EPS + vocab + LLM-extracted forward outlook) hits 66.0% direction accuracy [53%, 79%] with Sharpe +0.47 [-0.11, +1.05]. The direction CI lower bound is above 50% (statistically above a coin flip); the Sharpe CI straddles zero.

Tech + Pharma + Financials

v0 is marginal (+1.6pp); lean v1 (with LLM outlook) recovers the signal: +6.6pp direction edge over the baseline.

All 20 tickers (cross-sector)

v0 bag-of-words fails cross-sector (-2.7pp). Lean v1 with LLM outlook recovers the SIGN: +1.4pp direction and +0.27 Sharpe vs baseline. The single LLM feature does what bag-of-words couldn't.

Δ direction = a model's direction accuracy minus the EPS-only baseline's direction accuracy (in pp). Δ Sharpe = that model's annualized Sharpe minus the EPS-only baseline's annualized Sharpe. The mega10_tech row is the validated cohort; the all_tickers row is the honest cross-sector failure.
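The Δ definitions are plain differences, so the table can be checked directly. This sketch re-derives the Δ columns from the table's own values and inverts Δ Sharpe to get the implied EPS-only Sharpe per cohort; the helper names are hypothetical.

```python
def delta_pp(model_pct, baseline_pct):
    """Δ direction: model accuracy minus EPS-only accuracy, in pp (1 decimal)."""
    return round(model_pct - baseline_pct, 1)


def implied_baseline_sharpe(model_sharpe, delta_sharpe):
    """Invert Δ Sharpe = model Sharpe - EPS-only Sharpe (2 decimals)."""
    return round(model_sharpe - delta_sharpe, 2)


# (cohort, EPS-only dir %, v0 dir %, lean v1 dir %, lean v1 Sharpe, Δ Sharpe),
# copied from the per-cohort table.
TABLE = [
    ("mega10_tech", 59.2, 63.3, 66.0, 0.47, 0.18),
    ("tech_pharma_fin", 52.4, 54.0, 59.0, 0.25, 0.14),
    ("all_tickers", 52.0, 49.3, 53.4, 0.19, 0.27),
]

summary = {
    cohort: {
        "v0_delta_pp": delta_pp(v0, eps),
        "v1_delta_pp": delta_pp(v1, eps),
        "eps_only_sharpe": implied_baseline_sharpe(v1_sharpe, d_sharpe),
    }
    for cohort, eps, v0, v1, v1_sharpe, d_sharpe in TABLE
}

# Headline lean v2 edge from the stat cards: 71.11% - 59.18%.
headline_edge = delta_pp(71.11, 59.18)  # 11.9
```

On mega10_tech the implied EPS-only Sharpe (0.47 − 0.18 = 0.29) matches the EPS-only baseline card; on all_tickers it comes out at −0.08, which is why lean v1's modest +0.19 Sharpe still shows a +0.27 Δ.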

What we won't pitch — honest weaknesses

Three things on this page are weaker than the headline number suggests. Disclosed up front so partners can price them in.

Disclosure

Sharpe is not yet statistically significant

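Lean v1 Sharpe = +0.47, 95% CI [-0.11, +1.05]: the CI straddles zero. Lean v2's Sharpe CI [+0.06, +1.24] clears zero, but lean v2 was picked from a 25-config search, so that bound is not multiple-testing corrected. Direction CIs sit fully above 50% (lean v1: +6.8pp edge, lower bound 53.2%; lean v2: +11.9pp, lower bound 57.8%), so we pitch direction; we don't pitch Sharpe from this run alone. More events will tighten the bound.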

Disclosure

Cross-sector failure (the big one)

On the all-20-ticker cohort the augmented v0 model drops to a -2.7pp direction edge. The bag-of-words v0 lexicon does not transfer outside mega-cap tech: beat language in energy differs from beat language in semis, which differs again from retail. The fix is v1: per-call LLM event extraction (guidance deltas, segment reorgs) instead of generic tone density.

Disclosure

Per-ticker N is small

Each ticker only has 6–8 quarters in the corpus today. The cohort-level walk-forward is the honest read; per-ticker walk-forward (in reports/INDEX.md) is noisy by construction at this sample size. Per-ticker calibration is a v1 deliverable, not a v0 claim.

Methodology — features & protocol

Same protocol as the FOMC engine: pull the call transcript on print day, score the 5 verbal features in the table below, stack them with the EPS surprise, fit a Ridge regression only on calls that printed before the holdout event, and predict the same-day stock return (open-to-close on the print day). Lean v1 and lean v2 append the LLM outlook feature and the two price-drift features to this stack.

| Feature | Group | What it captures |
|---|---|---|
| eps_surprise_norm | fundamental | headline EPS beat/miss vs consensus, normalized |
| net_tone | vocab | executive language tone (positive − negative density) |
| hedge | vocab | hedging-language density (uncertainty markers) |
| guidance_raise | vocab | forward-guidance raise markers |
| guidance_lower | vocab | forward-guidance cut / soft markers |
| topic_macro | vocab | macro-environment topic frequency (FX, rates, demand) |
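v0 is described as bag-of-words density on 5 lexicon groups. A minimal sketch of that scoring, assuming density = lexicon hits / total tokens. The word lists are hypothetical toys (the production lexicons are not published here), and the eps_surprise_norm normalization (surprise over |consensus| with a floor) is an assumption, since the table says only "normalized".

```python
import re

# Hypothetical toy lexicons for illustration; the production lexicon groups
# are not published on this page.
LEXICONS = {
    "positive": {"strong", "record", "growth", "robust"},
    "negative": {"weak", "decline", "headwind"},
    "hedge": {"may", "might", "uncertain", "approximately"},
    "guidance_raise": {"raise", "raising", "raised"},
    "guidance_lower": {"lower", "lowering", "cut", "soft"},
    "topic_macro": {"fx", "rates", "demand", "inflation"},
}


def tokenize(text):
    """Lowercase word tokens; punctuation is dropped."""
    return re.findall(r"[a-z']+", text.lower())


def verbal_features(transcript):
    """Bag-of-words density per lexicon group (hits / total tokens)."""
    tokens = tokenize(transcript)
    n = max(len(tokens), 1)
    density = {
        group: sum(tok in words for tok in tokens) / n
        for group, words in LEXICONS.items()
    }
    return {
        "net_tone": density["positive"] - density["negative"],
        "hedge": density["hedge"],
        "guidance_raise": density["guidance_raise"],
        "guidance_lower": density["guidance_lower"],
        "topic_macro": density["topic_macro"],
    }


def eps_surprise_norm(actual, consensus, floor=0.01):
    """Assumed normalization: surprise over |consensus|, floored near zero."""
    return (actual - consensus) / max(abs(consensus), floor)


feats = verbal_features("Record growth, weak FX.")
# 4 tokens: 2 positive, 1 negative, 1 macro -> net_tone 0.25, topic_macro 0.25
```

Density scoring like this sees only word counts, which is the TONE-not-EVENTS limitation the per-cohort section describes.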
Validation protocol
  • Time-ordered walk-forward: fit only on calls before the prediction event.
  • Warm-up of 15 events seeds the rolling fit; the remaining 49 are scored.
  • Same-day return target = open-to-close on the print day (the same window standard sell-side reaction studies use).
  • 95% CI from bootstrap on per-event signal-driven P&L.
  • No look-ahead: verbal features are extracted from each transcript at the print-day timestamp.
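The walk-forward protocol can be sketched with a closed-form ridge in NumPy. Assumptions not fixed by the page: an expanding window (every prior event, not a fixed-length rolling window), features standardized on the training rows only, and a penalty alpha=1.0. The data are synthetic; only the shape (64 events, 15 warm-up, 49 scored) mirrors the mega-10 setup.

```python
import numpy as np


def open_to_close(open_px, close_px):
    """Same-day target: print-day open-to-close simple return."""
    return np.asarray(close_px, dtype=float) / np.asarray(open_px, dtype=float) - 1.0


def ridge_fit_predict(X_train, y_train, x_new, alpha=1.0):
    """Closed-form ridge on in-window standardized features; intercept unpenalized."""
    mu = X_train.mean(axis=0)
    sd = X_train.std(axis=0)
    sd[sd == 0.0] = 1.0                       # guard constant columns
    Z = (X_train - mu) / sd                   # scaling uses training rows only
    y_bar = y_train.mean()
    k = Z.shape[1]
    beta = np.linalg.solve(Z.T @ Z + alpha * np.eye(k), Z.T @ (y_train - y_bar))
    return float(((x_new - mu) / sd) @ beta + y_bar)


def walk_forward(X, y, warmup=15, alpha=1.0):
    """Expanding-window walk-forward: event t is predicted from events 0..t-1 only."""
    return np.array(
        [ridge_fit_predict(X[:t], y[:t], X[t], alpha) for t in range(warmup, len(y))]
    )


# Synthetic stand-in for 64 time-ordered events x 6 features (EPS + 5 verbal).
rng = np.random.default_rng(7)
X = rng.normal(size=(64, 6))
opens = 100.0 + rng.normal(0.0, 1.0, size=64)
closes = opens * (1.0 + 0.01 * X[:, 0] + rng.normal(0.0, 0.02, size=64))
y = open_to_close(opens, closes)

preds = walk_forward(X, y, warmup=15)         # 64 - 15 = 49 scored events
direction_accuracy = float(np.mean(np.sign(preds) == np.sign(y[15:])))
```

Because the mean, scale, and coefficients are all computed inside each training window, the prediction for event t never touches the target of event t or any later event.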
Roadmap to v1
  • v1 LLM event extraction: per-call structured fields (guidance raise/cut deltas, segment reorgs, buyback announcements, capex changes) instead of generic tone density. Targets the cross-sector failure.
  • Ticker-search UI: public per-ticker page with the live next-print prediction, prior calls, and a per-ticker calibration table.
  • Pre-registered call ledger: every prediction timestamped before the print, hits and misses public (mirrors the FOMC ledger).
  • Coverage expansion: add JPM (we already cover the Fed-mention market), plus additional pharma, retail, and energy mega-caps, to make cross-sector v1 testable at n ≥ 30 per cohort.
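The roadmap's structured fields could be captured in a per-call record like the one below. This is a hypothetical schema: the field names, units, and missing-value encoding are illustrative assumptions, not the v1 design.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class CallEvents:
    """Hypothetical per-call record for v1 LLM event extraction."""
    guidance_delta_pct: Optional[float] = None     # change vs prior guidance midpoint, %
    segment_reorg: bool = False                    # call announced a segment reorganization
    buyback_announced_usd: Optional[float] = None  # new authorization size, USD
    capex_change_pct: Optional[float] = None       # change vs prior capex plan, %

    def to_features(self) -> dict:
        """Flatten to model inputs; None becomes 0.0 plus an explicit *_missing flag."""
        out = {}
        for name, value in asdict(self).items():
            if isinstance(value, bool):
                out[name] = float(value)
            elif value is None:
                out[name] = 0.0
                out[name + "_missing"] = 1.0
            else:
                out[name] = float(value)
                out[name + "_missing"] = 0.0
        return out


call = CallEvents(guidance_delta_pct=2.5, segment_reorg=True)
features = call.to_features()
```

The explicit missing flag lets a linear model separate "no change" from "not mentioned on the call", which a bare 0.0 would conflate.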

Coverage — 20 mega-cap tickers indexed

101 earnings transcripts pulled. 100 events with full EPS + verbal + same-day-return alignment (20 tickers cleared the alignment filter). The mega-10 tech sub-cohort below is where the headline edge is measured.

Tickers in the corpus
AAPL, AMD, AMZN, AVGO, COST, CSCO, GOOG, INTC, LLY, MA, META, MSFT, MU, NFLX, NVDA, ORCL, TSLA, V, WMT, XOM

Amber = in the mega-10 tech cohort (where the headline +11.9pp lean v2 direction edge is measured). Others are in the wider all-tickers cohort, where v0 fails honestly.

The product

A live, ticker-searchable engine that prices what the executive said against what the print implies. Same architecture that runs FOMC and ECB. Mega-cap tech today, full-coverage v1 next.

Partners get pre-registered next-print predictions on the 10-ticker mega-tech cohort, calibrated against the EPS-only baseline. Cross-sector v1 (LLM event extraction) is in build. Email hello@goseer.ai for the data room.