Architecting the Signal Layer: Why GTM Engines Fail at Context
Intent data is a commodity. Every competitor has the same feeds, the same technographics, the same news alerts. The moat isn't finding the signal anymore. It's what you do with it. Here's how to build a GTM engine that reasons, not just reacts.
Causal modeling across roughly 500 public companies found that about 53% of enterprise GTM spend is ineffective. Among private startups and scaleups, that number climbs above 70%. Not survey data. Not vibes. Actual causal analysis of what moves buyers versus what generates activity that looks like progress.
Most AI GTM accelerates this problem rather than fixing it.
Here’s why. Most “AI-powered outreach” runs on the same broken causal model: exposure produces outcomes. Someone attends a webinar, downloads a guide, visits a pricing page. Signal fires. Agent drafts email. Email sends. The actual causal path, as Mark Stouse at Proof Analytics shows, runs through earned validation and confidence, not through exposure volume. AI automates that broken model right alongside everything else.
Jon Miller put the signal-specific version of this plainly in his 2026 GTM predictions: “When every team has access to the same intent data, that data stops being an advantage. Job changes, funding rounds, website visits, G2 activity. These signals are now available to anyone willing to pay for them.” Generic public signals fully commoditized within 24 months, he predicted. We’re already there.
The edge is reasoning about the signals you already have.
What’s Actually Happening in Most AI GTM Stacks
LinkedIn’s B2B Institute, drawing on Ehrenberg-Bass Institute research, puts only 5% of any B2B audience in-market at a given time. The 95-5 rule. Think about what that means for a trigger-based outreach system: you can have a perfectly real signal (someone visited a competitor comparison page, attended a webinar, downloaded a whitepaper) and still be triggering outreach at a 95% noise rate because the signal confirms interest, not readiness.
Agents acting on low-fidelity triggers make this worse. Cold email reply rates dropped from 6.8% in 2023 to 5.8% in 2024, and B2B tech regularly sits below 2%. Inbox saturation is a signal quality problem that more volume compounds.
Caroline Hodson put a name to the capability gap in her breakdown of B2B lead management systems: “Signal orchestration: aggregating multi-signal data to determine account readiness and trigger timely sales engagement.” Useful definition. It describes what the system should do at the capability level, but says nothing about how to make the reasoning inside that layer reliable, how to ensure the agent is actually making a qualified decision rather than firing a trigger and calling it intelligence.
That gap is the architecture problem.
The Stack
I built a competitive intelligence pipeline on top of public market data: stock prices and volatility alongside short interest ratios, insider transaction filings, institutional holder changes and news feeds. The hypothesis was that these signals telegraph operational changes before they surface in customer reviews or public announcements. Pricing moves, support cuts, product pivots. Executives don’t sell shares because things are going well.
The inputs go deeper than price movement. Short interest ratio captures when informed money is betting against a company, often on information that hasn’t surfaced publicly yet, and insider transactions, especially clustered executive selling, tend to precede operational bad news by weeks. Institutional position reductions signal conviction shifts at a scale that’s hard to fake. Price is the lagging indicator. Everything else here is what leads it.
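For concreteness, here is roughly what that ingestion step can look like with the yfinance library. A minimal sketch, assuming recent yfinance field names (like Yahoo’s shortRatio); availability varies by ticker, so everything is fetched defensively rather than presented as the production pipeline.

```python
import yfinance as yf

def pull_signals(symbol: str) -> dict:
    """Collect the raw leading indicators for one company."""
    t = yf.Ticker(symbol)
    info = t.info or {}
    return {
        "symbol": symbol,
        # Informed money betting against the company
        "short_ratio": info.get("shortRatio"),
        # Clustered executive selling tends to lead operational bad news
        "insider_transactions": t.insider_transactions,
        # Conviction shifts at institutional scale
        "institutional_holders": t.institutional_holders,
        # Headlines feed the sentiment/triage layer downstream
        "news": t.news,
        # Price and volatility: the lagging indicators
        "history": t.history(period="6mo"),
    }
```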
Why I Didn’t Use FinBERT
The default choice for financial sentiment analysis is FinBERT, a model fine-tuned on financial corpora. I tried it. For classifying clean, short financial headlines it works reasonably well.
The signals worth acting on are neither clean nor short. An earnings call discussing “strategic portfolio optimization” and “right-sizing our enterprise support tier” requires contextual reasoning to decode as support cuts and repricing. FinBERT recognizes the financial vocabulary without reasoning about what it operationally implies. A 2024 study in Big Data and Cognitive Computing confirmed this: GPT-4o, optimized through prompt engineering, outperformed FinBERT by up to 10% depending on sector, and fine-tuned GPT-4o mini outperformed FinBERT by 6% on the TRC2 dataset. The domain-specific advantage collapses when the task requires multi-sentence contextual reasoning rather than single-sentence classification. That’s precisely the case here.
So: gpt-oss:20b running locally via Ollama for initial sentiment analysis and relevance triage. OpenAI’s open-weight 20B parameter model, Apache 2.0 licensed, runs within 16GB of memory with chain-of-thought reasoning included. Fast, on-premise, no per-token API cost at this volume.
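A minimal sketch of that triage call against a local Ollama server on its default port; the JSON contract is illustrative, and production code would validate the model’s reply rather than trust it.

```python
import json
import requests

OLLAMA_URL = "http://localhost:11434/api/chat"

def ask_local(prompt: str, model: str = "gpt-oss:20b") -> dict:
    """One non-streaming chat call to the local model, expecting JSON back."""
    resp = requests.post(OLLAMA_URL, json={
        "model": model,
        "stream": False,
        "messages": [{"role": "user", "content": prompt}],
    }, timeout=120)
    resp.raise_for_status()
    # Assumes the model complied with the ask-for-JSON instruction
    return json.loads(resp.json()["message"]["content"])
```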
Tiered Routing
Two layers. The logic matters beyond this specific project.
Layer 1 (Local/Fast): Ollama handles high-volume filtering around one question: is this signal idiosyncratic to this company, or is it sector-wide noise? A stock dropping 15% in a sector that’s down 12% is a different signal from a stock dropping 15% while peers are flat. Binary classification at scale, running locally, zero API cost.
Layer 2 (Strategic/Deep): Signals that pass Layer 1 route to a SOTA model (GPT, Gemini, or Claude) for full strategic analysis and recommendation generation.
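Sketched together, reusing ask_local() from above. escalate_to_sota() is a hypothetical stand-in for whichever frontier API handles Layer 2, and the excess-move and confidence thresholds are illustrative, not tuned values.

```python
def route_signal(signal: dict, stock_move: float, sector_move: float):
    # Numeric pre-filter: down 15% in a sector that's down 12% is mostly
    # beta. Only the move in excess of the sector is interesting.
    excess = stock_move - sector_move
    if abs(excess) < 3.0:
        return None  # sector-wide noise, never touches a model

    # Layer 1: local binary classification, zero API cost
    verdict = ask_local(
        "Classify this signal as IDIOSYNCRATIC (company-specific) or "
        "SECTOR (sector-wide). Reply as JSON: "
        '{"label": "...", "confidence": 0.0}\n\n'
        f"Signal: {signal['headline']}\nContext: {signal['context']}"
    )
    if verdict.get("label") != "IDIOSYNCRATIC" or verdict.get("confidence", 0) < 0.7:
        return None

    # Layer 2: only survivors pay for deep strategic analysis, because
    # this output drives campaign triggers and CRM workflows downstream
    return escalate_to_sota(signal, excess)  # hypothetical Layer 2 helper
```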
The routing exists because the Layer 2 output feeds downstream automations: campaign triggers, CRM updates, messaging workflows. Stanford’s FrugalGPT research formalized the principle: route to the smallest capable model first and escalate only when the task demands it, an approach that demonstrated up to 98% cost reduction while matching GPT-4 performance. The cost argument is real, but it’s actually secondary. What matters more is what happens when output is trusted to drive action: a weak analysis at this stage doesn’t produce a bad email, it produces a bad email that routes into a sequence that routes into a CRM workflow, and the error compounds across every downstream step.
Decision quality has to match inference cost because the stakes at each layer are asymmetric.
Context Injection
This is where most implementations break, and it’s the part that can’t be automated without human input first.
An LLM doesn’t know your strategy. It doesn’t know you’re a challenger targeting mid-market accounts being underserved by an incumbent in restructuring, or that “pricing change” in that competitor’s earnings call means an opportunity to lead with stability messaging to their customer base. Without that context, a technically accurate analysis produces a generically correct output that does nothing useful for your specific GTM motion.
Domenic Venuto at Horizon Media described the failure mode from the talent side: “Clients receive strategic counsel, not just a technically correct but context-poor answer from the most convenient tool.” The gap between technically correct and strategically useful is context.
In the same 2026 predictions piece, Jon Miller calls this “context engineering”: the discipline of structuring the knowledge that makes AI useful rather than generic. His framing: “An LLM doesn’t know your strategy. You have to programmatically inject your identity.” Worth pushing further, though. Most teams that attempt context engineering inject boilerplate. A company description, a value prop paragraph, maybe a persona snippet. The model gets a company overview; it still doesn’t know that this competitor’s restructuring announcement typically signals support cuts in their enterprise tier within 60 days, or that their mid-market customers have a known sensitivity to service continuity. Generic context produces generic reasoning, and the encoding has to be specific to the competitive dynamic you’re actually trying to exploit.
In practice, it means encoding three things before the agent reasons (a minimal sketch follows the list):
- Your identity: The Challenger. What you’re positioned against, who you’re positioned for, where you win and where you don’t.
- Their reality: The incumbent’s segment profile. What specific signals mean given their business model, their customer segments, their operational structure.
- The interpretive logic: Given signal X about competitor Y, what does it imply about their segment Z, and what GTM response follows?
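A minimal sketch of that encoding as a context block prepended to the Layer 2 prompt. “IncumbentCo” and every field value here are hypothetical placeholders; the point is that identity, reality, and interpretive logic are stated explicitly rather than left for the model to guess.

```python
CONTEXT = {
    "identity": (
        "Challenger positioned against IncumbentCo for mid-market accounts; "
        "we win on service continuity, we lose on breadth of integrations."
    ),
    "their_reality": (
        "IncumbentCo restructuring announcements have historically preceded "
        "enterprise support-tier cuts within ~60 days; their mid-market "
        "customers are sensitive to service continuity."
    ),
    "interpretive_logic": (
        "Given a signal about IncumbentCo, infer the affected customer "
        "segment and recommend a GTM response we can actually execute."
    ),
}

def build_system_prompt(ctx: dict) -> str:
    """Assemble the context block injected before any Layer 2 reasoning."""
    return (
        f"YOUR IDENTITY: {ctx['identity']}\n"
        f"THEIR REALITY: {ctx['their_reality']}\n"
        f"INTERPRETIVE LOGIC: {ctx['interpretive_logic']}\n"
        "Analyze the signal below and return a strategic recommendation, "
        "not a summary."
    )
```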
The output difference:
| Signal | Generic Response | Strategic Response |
|---|---|---|
| Competitor stock down 15% amid restructuring | “Competitor is struggling financially.” | “Restructuring suggests service tier consolidation. Mid-market accounts are highest churn risk in next 90 days. Initiate stability-focused outreach to that segment.” |
| Earnings call: “right-sizing enterprise support” | “Company is optimizing support operations.” | “Support cuts typically precede price increases for enterprise tier. Identify their enterprise accounts in pipeline and accelerate with support continuity positioning.” |
Same model. Better inputs. The strategy was already there; it just needed to be encoded before the agent ran.
Greg Kihlstrom wrote about the same structural problem in the GEO context, arguing that brands need to become the source of truth for LLMs that answer buyer queries. The same principle applies internally. If your GTM agent doesn’t have your positioning, it reasons about you the way an unconfigured LLM would. Generically.
What This Changes About the Human Role
If the context injection layer requires humans to encode strategy before the agent runs, the question becomes: what does the human role look like once that encoding is done?
The job changes. Most current AI GTM keeps humans in the loop on everything: reviewing every output, approving every send. That compresses execution time without changing the underlying work. The AI moves faster; the human is still clearing the queue.
A properly architected signal layer changes the job. Humans tune the interpretive logic. They update the context layer when positioning shifts. They evaluate which signal types are generating false positives and adjust routing thresholds. That work builds the system’s accuracy over time rather than just clearing its output.
As AI tools commoditize output generation, what remains scarce is contextual framing. Context injection is how you engineer that framing into the agent itself, so it doesn’t require a strategist to review every result before acting.
Connecting Back to Targeting
The ICP scoring work was about targeting precision: identifying which accounts to watch before you act. This is about when and why to act on them. Both share the same root problem, which is that generic inputs produce generic outputs, and whether you’re building a scoring model or a signal pipeline, the value lives in how specifically you’ve encoded what “fit” and “opportunity” mean for your business.
Off-the-shelf intent data plus a generic prompt produces the same result as firmographic filters plus a template sequence. The architecture matters because the inputs matter. Generic inputs at the top of the funnel are how 53% of GTM spend quietly disappears without anyone being able to explain where it went.
This system was built using Python, the Yahoo Finance API, Ollama (gpt-oss:20b for local inference), and SOTA model APIs for final analysis. The full pipeline (ingestion, enrichment logic, tiered routing, and context injection) is available for review. Connect with me on LinkedIn.
References
Shobayo, O., Adeyemi-Longe, S., Popoola, O., & Ogunleye, B. (2024). Innovative Sentiment Analysis and Prediction of Stock Price Using FinBERT, GPT-4 and Logistic Regression: A Data-Driven Approach. Big Data and Cognitive Computing, 8(11), 143. https://doi.org/10.3390/bdcc8110143