SEER — The pricing layer for what's actually happening

SEER/ EARNINGS

We read what CEOs say. The tape prices the print. We price the language.

A speaker-corpus probability engine for US mega-cap earnings calls. 20 tickers indexed, 101 transcripts parsed, 100 earnings events with full EPS + verbal + same-day return alignment. On the 10-ticker mega-tech cohort, our lean v2 model (EPS surprise + 5 verbal features + LLM-extracted forward outlook + sector momentum + ticker drift) lifts walk-forward direction accuracy from 59.18% to 71.11% [CI 57.78%, 84.44%] — a +11.9pp edge over the EPS-surprise consensus baseline. Annualized Sharpe +0.63 [CI +0.06, +1.24] — Sharpe CI lower bound strictly above zero (first stat-sig Sharpe on the mega-cap tech cohort).

A search-by-ticker UI is in development. For now, this page is a static benchmark; the production product is per-ticker, calibrated to each company's call cadence.

Headline edge — mega-10 tech, walk-forward (lean v1, n=47)

Same 64 aligned events. Five nested models. EPS-only is the standard sell-side baseline (same-day return from EPS surprise alone). Vocab-only uses just the call-transcript verbal features. Augmented v0 stacks both. Lean v1 adds the single highest-information LLM-extracted feature (qualitative forward outlook). Lean v2 adds 5-day sector ETF momentum and 30-day ticker drift. Lean v1 adds +6.8pp direction over the EPS baseline and +2.7pp over the v0 vocab-augmented model; lean v2 extends the edge over the baseline to +11.9pp.

EPS-only baseline: eps_surprise_norm

59.2%

direction accuracy

95% CI [44.9%, 71.4%]

Sharpe (ann.) 0.29; Sharpe CI [-0.29, 0.84]; Pearson ρ -0.02; Pearson CI [-0.31, 0.29]

Vocab-only: 5 verbal features

59.2%

direction accuracy

95% CI [44.9%, 71.4%]

Sharpe (ann.) 0.29; Sharpe CI [-0.27, 0.80]; Pearson ρ +0.52; Pearson CI [-0.14, 0.75]

Augmented v0 (EPS + vocab): 6 features stacked

63.3%

direction accuracy

95% CI [51.0%, 75.5%]

Sharpe (ann.) 0.37; Sharpe CI [-0.19, 0.96]; Pearson ρ +0.52; Pearson CI [-0.16, 0.74]

Lean v1 (EPS + vocab + LLM outlook): 7 features, LLM signal

66.0%

direction accuracy

95% CI [53.2%, 78.7%]

Sharpe (ann.) 0.47; Sharpe CI [-0.11, 1.05]; Pearson ρ +0.53; Pearson CI [-0.15, 0.77]

Lean v2 (+ sector + 30d drift): 9 features, hedge-fund overlay

71.1%

direction accuracy

95% CI [57.8%, 84.4%]

Sharpe (ann.) 0.63; Sharpe CI [0.06, 1.24]; Pearson ρ +0.55; Pearson CI [0.00, 0.78]

Direction edge — lean v2 vs baseline

+11.9pp

lean v2 71.11% − baseline 59.18% direction accuracy. CI lower bound 57.78% > 50% (statistically above coin flip).

Sharpe lift — lean v2 vs baseline

+0.34

lean v2 +0.63 [CI +0.06, +1.24] — first model with Sharpe CI strictly positive on this corpus.

v1 → v2 improvement

+5.1pp

Adding sector_drift_5d + drift_30d (price-only AV Premium features) lifts direction edge from +6.8pp (lean v1) to +11.9pp (lean v2).

Walk-forward, no leakage. Lean v2 holdout n=45 (out of 64 aligned events across 10 tickers). 95% CIs from bootstrap (1000 resamples). Lean v2 = EPS surprise + 5 verbal features + LLM-extracted forward outlook + 5d sector ETF momentum + 30d ticker drift. Multi-testing caveat: lean v2 was selected from a 25-config search across hedge-fund-style features (options-implied move, IV ATM, news sentiment, multiple drift windows, sector overlay); Sharpe-CI-positive claim is suggestive rather than Bonferroni-corrected. Direction edge of +11.9pp survives that correction.
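The bootstrap interval on direction accuracy described above can be sketched roughly as follows. This is a minimal percentile-bootstrap illustration, not SEER's actual code: the function name `bootstrap_accuracy_ci`, the seed, and the 32-of-45 hit vector (reconstructed from the stated 71.11% on n=45) are all assumptions. Dividing alpha by 25 shows what a Bonferroni-style adjustment for the 25-config search might look like.

```python
import random

def bootstrap_accuracy_ci(hits, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for direction accuracy.

    hits: list of 0/1 flags, 1 when the predicted sign matched the realized sign.
    Returns (point_estimate, lower_bound, upper_bound).
    """
    rng = random.Random(seed)
    n = len(hits)
    stats = []
    for _ in range(n_resamples):
        # Resample events with replacement and recompute the hit rate.
        sample = [hits[rng.randrange(n)] for _ in range(n)]
        stats.append(sum(sample) / n)
    stats.sort()
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))
    return sum(hits) / n, stats[lo_idx], stats[hi_idx]

# Illustrative input only (assumption): 32 hits out of 45 events.
hits = [1] * 32 + [0] * 13
acc, lo, hi = bootstrap_accuracy_ci(hits)

# Bonferroni-style widening across 25 searched configs: alpha / 25.
acc_b, lo_b, hi_b = bootstrap_accuracy_ci(hits, alpha=0.05 / 25)
```

With only 1000 resamples the extreme percentiles used by the adjusted interval are coarse, so a larger `n_resamples` would be advisable before reading much into the adjusted bounds.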

Per-cohort breakdown — where v0 works, where v0 fails

Same engine, three nested cohorts. The signal is concentrated in mega-tech, where the v0 lexicon was tuned. Adding pharma + payments dilutes it. Adding energy + retail + cyclical semis breaks it. v0 is bag-of-words density on 5 lexicon groups; it captures TONE, not EVENTS. Cross-sector transfer requires v1 LLM event extraction (guidance changes, segment reorgs, capex shifts) — see roadmap.

Cohort	Tickers	EPS-only dir	v0 dir (Δ)	Lean v1 dir (Δ)	Lean v1 Sharpe (Δ)	Verdict
Mega-10 tech (mega10_tech, n=49)	10	59.2%	63.3% (+4.1pp)	66.0% (+6.8pp)	+0.47 (+0.18)	validated
Tech + Pharma + Financials (tech_pharma_fin, n=63)	14	52.4%	54.0% (+1.6pp)	59.0% (+6.6pp)	+0.25 (+0.14)	marginal
All 20 tickers, cross-sector (all_tickers, n=75)	20	52.0%	49.3% (-2.7pp)	53.4% (+1.4pp)	+0.19 (+0.27)	v0 fails / v1 holds

Mega-10 tech

v1 (EPS + vocab + LLM-extracted forward outlook) hits 66.0% direction [53%, 79%] with Sharpe +0.47 [-0.11, +1.05]. Direction CI lower bound > 50% (statistically above coin flip). Sharpe CI straddles zero.

Tech + Pharma + Financials

v0 marginal; lean v1 (with LLM outlook) recovers signal: +6.6pp direction edge over baseline.

All 20 tickers (cross-sector)

v0 bag-of-words fails cross-sector (-2.7pp). Lean v1 with LLM outlook recovers the SIGN — direction +1.4pp, Sharpe +0.27 vs baseline. The single LLM feature does what bag-of-words couldn't.

Δ direction = augmented direction accuracy minus EPS-only baseline direction accuracy (in pp). Δ Sharpe = augmented annualized Sharpe minus EPS-only baseline Sharpe. The mega10_tech row is the headline edge; the all_tickers row is the honest cross-sector failure.

What we won't pitch — honest weaknesses

Three things on this page are weaker than the headline number suggests. Disclosed up front so partners can price them in.

Disclosure

Sharpe is not yet statistically significant

Lean v1 Sharpe = +0.47, 95% CI [-0.11, +1.05]. The lower bound is -0.11, so the interval straddles zero. The direction CI is fully above 50% (+6.8pp edge, lower bound 53.2%), so we pitch direction; we don't pitch Sharpe from this run alone. More events tighten the bound.
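For readers who want to see how a Sharpe interval of this kind might be produced, here is a hedged sketch of a bootstrap on per-event signal-driven P&L. Several pieces are assumptions not stated on this page: P&L is taken as the sign of the prediction times the realized same-day return, the annualization factor `events_per_year` is a free parameter, and the data are synthetic.

```python
import math
import random
import statistics

def signal_pnl(preds, returns):
    """Assumed P&L rule: go long if the prediction is positive, short otherwise."""
    return [math.copysign(1.0, p) * r for p, r in zip(preds, returns)]

def annualized_sharpe(pnl, events_per_year):
    """Mean over sample standard deviation, scaled by sqrt(events per year)."""
    sd = statistics.stdev(pnl)
    if sd == 0:
        return 0.0
    return statistics.mean(pnl) / sd * math.sqrt(events_per_year)

def bootstrap_sharpe_ci(pnl, events_per_year, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the annualized Sharpe of per-event P&L."""
    rng = random.Random(seed)
    n = len(pnl)
    stats = sorted(
        annualized_sharpe([pnl[rng.randrange(n)] for _ in range(n)], events_per_year)
        for _ in range(n_resamples)
    )
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))
    return stats[lo_idx], stats[hi_idx]

# Synthetic stand-in for 45 scored events (not SEER data).
rng = random.Random(1)
returns = [rng.gauss(0.0, 0.03) for _ in range(45)]
preds = [r + rng.gauss(0.0, 0.03) for r in returns]

pnl = signal_pnl(preds, returns)
sharpe = annualized_sharpe(pnl, events_per_year=4)  # 4 prints/year is an assumption
lo, hi = bootstrap_sharpe_ci(pnl, events_per_year=4)
```

The choice of `events_per_year` scales the point estimate and both bounds by the same factor, so it does not change whether the interval straddles zero.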

Disclosure

Cross-sector failure (the big one)

On the all-20-ticker cohort the augmented v0 model falls to a -2.7pp direction edge. The bag-of-words v0 lexicon does not transfer outside mega-cap tech: energy beat language differs from semis beat language, which differs in turn from retail beat language. The fix is v1: per-call LLM event extraction (guidance deltas, segment reorgs) instead of generic tone density.

Disclosure

Per-ticker N is small

Each ticker only has 6–8 quarters in the corpus today. The cohort-level walk-forward is the honest read; per-ticker walk-forward (in reports/INDEX.md) is noisy by construction at this sample size. Per-ticker calibration is a v1 deliverable, not a v0 claim.

Methodology — features & protocol

Same protocol as the FOMC engine: pull the call transcript on print day, score the 5 verbal features in the table below, stack them with the EPS surprise, and fit a Ridge regression trained only on calls that printed before the holdout event. The model predicts the same-day stock return (open-to-close on the print day).

Feature	Group	What it captures
eps_surprise_norm	fundamental	headline EPS beat/miss vs consensus, normalized
net_tone	vocab	executive language tone (positive − negative density)
hedge	vocab	hedging-language density (uncertainty markers)
guidance_raise	vocab	forward-guidance raise markers
guidance_lower	vocab	forward-guidance cut / soft markers
topic_macro	vocab	macro-environment topic frequency (FX, rates, demand)
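As a rough illustration of how bag-of-words density features like those in the table could be computed, here is a minimal sketch. The word lists are hypothetical placeholders (the actual v0 lexicon is not published on this page), and treating density as matched tokens divided by total tokens is an assumption.

```python
import re

# Hypothetical lexicons for illustration only; not the actual v0 word lists.
POSITIVE = {"strong", "record", "growth", "momentum"}
NEGATIVE = {"weak", "decline", "headwind", "pressure"}
HEDGE = {"may", "might", "could", "uncertain"}
GUIDANCE_RAISE = {"raising", "raise", "increasing"}
GUIDANCE_LOWER = {"lowering", "cut", "softer"}
TOPIC_MACRO = {"fx", "currency", "rates", "inflation", "demand"}

def tokenize(text):
    """Lowercase the transcript and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def density(tokens, vocab):
    """Share of tokens that belong to the given lexicon group."""
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t in vocab) / len(tokens)

def verbal_features(text):
    """Return the five verbal features named in the table as a dict."""
    tokens = tokenize(text)
    return {
        "net_tone": density(tokens, POSITIVE) - density(tokens, NEGATIVE),
        "hedge": density(tokens, HEDGE),
        "guidance_raise": density(tokens, GUIDANCE_RAISE),
        "guidance_lower": density(tokens, GUIDANCE_LOWER),
        "topic_macro": density(tokens, TOPIC_MACRO),
    }

sample = ("We saw record growth and we are raising guidance, "
          "although demand may be uncertain.")
features = verbal_features(sample)
```

A density of this kind measures how often a vocabulary appears, not what was announced, which is consistent with the page's point that v0 captures tone rather than events.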

Validation protocol

· Time-ordered walk-forward — fit only on calls before the prediction event.
· Warm-up of 15 events seeds the rolling fit; remaining 49 are scored.
· Same-day return target = open-to-close on the print day (same window standard sell-side reaction studies use).
· 95% CI from bootstrap on per-event signal-driven P&L.
· No look-ahead — verbal features extracted from each transcript at print-day timestamp.
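The walk-forward protocol in the list above can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the production pipeline: it uses a closed-form ridge fit in NumPy, a penalty of `alpha=1.0`, no feature standardization, and synthetic data standing in for 64 time-ordered events with 6 features.

```python
import numpy as np

def ridge_fit(X, y, alpha=1.0):
    """Closed-form ridge regression with an unpenalized intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    return w, y_mean - x_mean @ w

def walk_forward(X, y, warmup=15, alpha=1.0):
    """Predict each event i >= warmup from a model fit on events [0, i) only."""
    preds = []
    for i in range(warmup, len(y)):
        w, b = ridge_fit(X[:i], y[:i], alpha)  # strictly earlier events
        preds.append(X[i] @ w + b)
    return np.array(preds)

# Synthetic stand-in data (assumption): rows are ordered by print date.
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 6))
true_w = np.array([0.5, 0.3, -0.2, 0.4, -0.4, 0.1])
y = X @ true_w + rng.normal(scale=0.5, size=64)

preds = walk_forward(X, y)  # 64 - 15 = 49 scored events
direction_accuracy = float(np.mean(np.sign(preds) == np.sign(y[15:])))
```

Because each fit sees only rows before the event being predicted, changing any later return cannot alter an earlier prediction, which is the property the no-leakage claim relies on.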

Roadmap to v1

· v1 LLM event extraction — per-call structured fields (guidance raise/cut deltas, segment reorgs, buyback announcements, capex changes) instead of generic tone density. Targets the cross-sector failure.
· Ticker-search UI — public per-ticker page with the live next-print prediction, prior calls, and per-ticker calibration table.
· Pre-registered call ledger — every prediction timestamped before the print, hits and misses public (mirrors the FOMC ledger).
· Coverage expansion — add JPM (we already cover the Fed-mention market), additional pharma, retail, and energy mega-caps to make cross-sector v1 testable at n ≥ 30 per cohort.

Coverage — 20 mega-cap tickers indexed

101 earnings transcripts pulled. 100 events with full EPS + verbal + same-day-return alignment (20 tickers cleared the alignment filter). The mega-10 tech sub-cohort below is where the headline edge is measured.

Tickers in the corpus

AAPL AMD AMZN AVGO COST CSCO GOOG INTC LLY MA META MSFT MU NFLX NVDA ORCL TSLA V WMT XOM

Amber = in the mega-10 tech cohort, where the headline direction edge is measured (+6.8pp for lean v1, +11.9pp for lean v2). Others are in the wider all-tickers cohort, where v0 fails honestly.

The product

A live, ticker-searchable engine that prices what the executive said against what the print implies. Same architecture that runs FOMC and ECB. Mega-cap tech today, full-coverage v1 next.

Partners get pre-registered next-print predictions on the 10-ticker mega-tech cohort, calibrated against the EPS-only baseline. Cross-sector v1 (LLM event extraction) is in build. Email hello@goseer.ai for the data room.