Learn Series
Browse lessons in this series.
Lesson 2: RAG and Tools
How AI gets information it was not trained on, where retrieval fails, and what tool-calling changes for governance and accountability.
Lesson 3: MCPs and Plugins
How AI platforms connect to external systems, how MCP standardizes tool access, and what compliance teams must govern at every data boundary.
Lesson 4: Common Tools
Understanding Your AI Platform. This lesson maps the model, retrieval, and tool stack onto ChatGPT, Claude, Gemini, Cursor, and Claude Cowork.
Lesson 5: Prompting & Context Engineering
How to Talk to AI Models Deliberately. The five-layer prompt framework, zero-shot vs. few-shot prompting, chain-of-thought, and anti-patterns for compliance workflows.
Lesson 6: AI Agents & Automated Workflows
From single model calls to multi-step automated processes: how agents work, where they fail, and what oversight compliance requires before you trust the output.
Lesson 7: Evaluating & Validating AI Output
Closing the gap between "the AI said it" and "I can rely on it." Detection techniques, validation prompts, and reliance standards for compliance output.
Lesson 8: Model Risk Management for AI
Before you can build a governance program, you need to answer a threshold question regulators will ask: is your use of AI a "model" under SR 11-7? That determination changes your entire oversight obligation.
Lesson 9: AI Risk: Domain-Specific Exposures
SR 11-7 compliance is necessary but not sufficient. These risks apply wherever AI touches your operations: in lending decisions, data handling, vendor relationships, cybersecurity posture, and consumer-facing communications.
Lesson 10: Building AI Governance
This lesson presents a practical governance operating model for AI adoption in compliance teams.
Lesson 1: LLMs & Reasoning Models
A plain-language guide for the AI-Native Compliance Committee.
The information contained in this material is provided for general informational and educational purposes only and does not constitute legal, regulatory, compliance, accounting, or other professional advice. Readers should consult their own advisors regarding their specific facts, circumstances, and applicable legal or regulatory requirements. Our solutions are designed to support and enhance internal compliance, oversight, and operational workflows. They do not constitute legal advice or replace independent compliance responsibilities. Each organization remains responsible for determining and meeting its own regulatory obligations.
Tools like ChatGPT and Claude do have search and retrieval features — but those come from the platform, not the underlying model. Here's what the model itself is actually doing.
The LLM itself has no internet access — its knowledge is frozen at training cutoff. When ChatGPT searches the web, that's a tool the platform added. The model still only generates text from what it's given in context.
Some platforms inject retrieved documents into the context before the model runs. That retrieval step is external. The model itself doesn't query anything — it generates tokens from whatever was placed in its context window.
The model doesn't read during inference. Training compressed text patterns into numerical weights. At runtime, there is no library — only a statistical model predicting the next token from context.
Given everything written so far, predict what word is most likely to come next. That's it. That's the whole operation.
Context Window
The compliance officer reviewed the
Next-Token Probability Distribution
The model scores every possible next word simultaneously. The winner joins the context — changing what every future prediction will be.
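The prediction step described above can be sketched in a few lines. This is an illustration only: `toy_scores` is a hypothetical stand-in for a real model, and its four-word vocabulary and scores are made up. A real model scores a far larger vocabulary using learned weights, and platforms often sample from the distribution rather than always taking the top word.

```python
import math

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores.values())
    exps = {tok: math.exp(s - peak) for tok, s in scores.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

def toy_scores(context):
    """Hypothetical stand-in for the model: made-up scores for a 4-word vocabulary."""
    if context.endswith("reviewed the"):
        return {"policy": 2.0, "transaction": 1.5, "file": 1.0, "weather": -3.0}
    return {"policy": 0.0, "transaction": 0.0, "file": 0.0, "weather": 0.0}

def predict_next(context):
    """Score every candidate word at once and return the most probable one."""
    probs = softmax(toy_scores(context))
    winner = max(probs, key=probs.get)
    return winner, probs

context = "The compliance officer reviewed the"
token, probs = predict_next(context)

# The winner joins the context, so the next prediction starts from new input.
context = context + " " + token
```

Note that nothing in this loop checks whether the chosen word is true. It only checks which word is most probable given the context.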
✕ Prompt A — Dangerous
“Does this loan disclosure satisfy the APR calculation requirement?”
No document. No rule citation. No institution context provided.
✓ Prompt B — Grounded
“TILA disclosure [attached]. Calculated APR: 6.999%. Does the disclosed APR comply with 12 CFR 1026.22(a)? Flag any discrepancy exceeding 1/8 of 1%.”
Document, rule, and explicit criteria all in context.
Context from Prompt A: Does this disclosure comply with the APR requirement?
[Panel: Next-Token Probability Distribution]
Context from Prompt B: [TILA disclosure attached] Disclosed APR: 6.750% · Amortization APR: 6.999% · 12 CFR 1026.22(a) tolerance: 0.125%
[Panel: Next-Token Probability Distribution]
The rule: Completeness over conciseness. The most dangerous prompts leave out constraints. The model fills gaps with probability — not fact.
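One way to apply the completeness rule is to check a prompt for its grounding elements before it is sent. The sketch below is a hypothetical helper, not a feature of any platform: the `PromptParts` fields and the `build_prompt` layout are assumptions chosen to mirror the document, rule, and criteria in Prompt B.

```python
from dataclasses import dataclass

@dataclass
class PromptParts:
    """Hypothetical container for the pieces of a grounded prompt."""
    question: str
    document_text: str = ""
    rule_citation: str = ""
    decision_criteria: str = ""

def missing_context(parts):
    """Return the names of the grounding elements that are empty."""
    required = {
        "document": parts.document_text,
        "rule citation": parts.rule_citation,
        "decision criteria": parts.decision_criteria,
    }
    return [name for name, value in required.items() if not value.strip()]

def build_prompt(parts):
    """Assemble the prompt, refusing to proceed if any grounding is missing."""
    gaps = missing_context(parts)
    if gaps:
        raise ValueError("Prompt is missing: " + ", ".join(gaps))
    return "\n\n".join([
        "DOCUMENT:\n" + parts.document_text,
        "RULE: " + parts.rule_citation,
        "CRITERIA: " + parts.decision_criteria,
        "QUESTION: " + parts.question,
    ])

prompt_a = PromptParts(
    question="Does this loan disclosure satisfy the APR calculation requirement?"
)
prompt_b = PromptParts(
    question="Does the disclosed APR comply?",
    document_text="[TILA disclosure text goes here]",  # placeholder, not real text
    rule_citation="12 CFR 1026.22(a)",
    decision_criteria="Flag any discrepancy exceeding 1/8 of 1%.",
)
```

Here `missing_context(prompt_a)` lists all three gaps, while `prompt_b` passes. The check only confirms that something was supplied. It cannot confirm that the supplied document or rule is the right one.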
Everything the model can "know" lives in the context window. It has no memory beyond it.
Start a new chat → it forgets everything. Conversation gets very long → older content gets pushed out.
System prompt (tool instructions)
Your question + pasted policy text
Previous turns in this conversation
Earlier conversation (pushed out)
If a key policy citation from earlier in your conversation has scrolled out of the context window, the model no longer has any record of it.
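The "pushed out" behavior can be illustrated with a simplified sketch. The assumptions here are loud ones: `count_tokens` counts words instead of real tokens, the budget of 18 is arbitrary, and the policy section number is invented for the example. Actual platforms differ in how they handle overflow; some summarize older turns or return an error instead of dropping them.

```python
def count_tokens(text):
    """Crude stand-in for a tokenizer: one token per whitespace-separated word."""
    return len(text.split())

def fit_to_window(system_prompt, turns, budget):
    """Keep the system prompt, then the newest turns that fit in the budget."""
    remaining = budget - count_tokens(system_prompt)
    kept = []
    for turn in reversed(turns):  # walk from newest to oldest
        cost = count_tokens(turn)
        if cost > remaining:
            break  # this turn and everything older is dropped
        kept.append(turn)
        remaining -= cost
    kept.reverse()
    dropped = turns[: len(turns) - len(kept)]
    return kept, dropped

turns = [
    "Our BSA policy section 4.2 sets the escalation threshold.",  # hypothetical citation
    "Here are three transaction patterns to review.",
    "Does this exception warrant escalation?",
]
kept, dropped = fit_to_window("You are a compliance assistant.", turns, budget=18)
```

In this run the oldest turn, the one carrying the policy citation, is the one that falls out. The final question is still answered, just without the citation it depended on.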
The model is doing exactly what it was designed to do: predicting the most plausible next token.
When context is incomplete, it fills the gap with whatever is statistically most likely — based on patterns from training data.
It has no mechanism to detect that a fact is missing. Absence of information is invisible to it.
Fluency ≠ accuracy. The model learned that "good answers" sound definitive. A hallucinated CFR citation reads exactly like a real one.
Treatment: provide the source document in context. Verify any factual claim against the primary source before relying on it.
Treat AI-generated fluency as a flag to verify, not a signal of accuracy.
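As a first-pass aid to that verification step, a reviewer could flag any CFR citation in the model's output that never appeared in the source material placed in context. The sketch below assumes citations follow the simple "12 CFR 1026.22(a)" shape; the regular expression is an illustration and will miss other citation formats. The citation "12 CFR 1026.99(z)" is deliberately made up to stand in for a hallucinated reference.

```python
import re

# Matches citations shaped like "12 CFR 1026.22" or "12 CFR 1026.22(a)(2)".
CFR_PATTERN = re.compile(r"\b\d+\s+CFR\s+\d+\.\d+(?:\([a-z0-9]+\))*")

def find_citations(text):
    """Return the set of CFR citations in the text, with whitespace normalized."""
    return {" ".join(match.split()) for match in CFR_PATTERN.findall(text)}

def unverified_citations(model_output, source_text):
    """Citations the model produced that do not appear in the provided source."""
    return sorted(find_citations(model_output) - find_citations(source_text))

source_text = "Under 12 CFR 1026.22(a), the disclosed APR must be accurate."
model_output = (
    "The disclosure complies with 12 CFR 1026.22(a) and 12 CFR 1026.99(z)."
)
flagged = unverified_citations(model_output, source_text)
```

A citation that passes this check has only been shown to exist in the source. Whether the model applied it correctly still has to be confirmed against the primary text by a person.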
How reasoning models generate a chain of thought before answering
Why the training data is fundamentally different
Why context dependency doesn't go away — it compounds
What this means for your review process
Does this exception warrant escalation under our BSA policy given these three transaction patterns?
Yes, this exception warrants escalation based on the pattern of structuring behavior evident in the transactions...
No intermediate reasoning. If the early pattern match is off, the answer builds on that without recourse.
Does this exception warrant escalation under our BSA policy given these three transaction patterns?
💭 Step 1: What does our BSA policy say about structuring thresholds?
💭 Step 2: Do any transactions individually cross $10,000?
💭 Step 3: Together, do they show a pattern consistent with 31 CFR § 1010.314?
Based on the policy analysis and the transaction pattern: yes, escalation is warranted because...
Standard Training (SQuAD-style Q&A)
Reasoning Training (Process-based)
Learns to think like an auditor. Handles novel fact patterns by applying process, not just recalling training examples.
Single Pass Model
Does this TILA disclosure comply with the APR calculation requirement under 12 CFR 1026.22?
↓
Based on the information provided, the disclosure appears to satisfy the APR disclosure requirements under Regulation Z...
Generates step-by-step. Each step becomes context for the next step — so early errors compound.
Reasoning Model
Does this TILA disclosure comply with the APR calculation requirement under 12 CFR 1026.22?
Thinking...
Step 1: § 1026.22 requires disclosed APR to be within 1/8 of 1% for regular transactions. Let me apply that standard.
Step 2: Disclosed APR: 6.875%. Calculated APR from finance charge / amount financed: 6.999%.
Step 3: Difference is 0.124%, just under the 0.125% tolerance. Technically within threshold but marginal.
⚠ Within technical tolerance, but only by 0.001%. Verify the finance charge calculation before concluding — a rounding error could push this into violation.
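The tolerance arithmetic in this example is simple enough to check independently of the model. A minimal sketch, assuming the 1/8 of 1 percentage point tolerance for regular transactions and using the APR figures quoted in this lesson; `apr_check` is a hypothetical helper, and decimal arithmetic is used so that a margin this thin is not distorted by floating-point rounding.

```python
from decimal import Decimal

# 1/8 of 1 percentage point, the tolerance for regular transactions.
TOLERANCE = Decimal("0.125")

def apr_check(disclosed, calculated, tolerance=TOLERANCE):
    """Compare disclosed and calculated APR (as percent strings) to the tolerance."""
    difference = abs(Decimal(disclosed) - Decimal(calculated))
    return {
        "difference": difference,
        "within_tolerance": difference <= tolerance,
        "margin": tolerance - difference,  # negative means out of tolerance
    }

# Figures from the reasoning-model example: 6.875% disclosed vs 6.999% calculated.
marginal = apr_check("6.875", "6.999")

# Figures from the grounded-context example: 6.750% disclosed vs 6.999% calculated.
outside = apr_check("6.750", "6.999")
```

The first case passes with a margin of 0.001 percentage points and the second fails by 0.124. The check is only as good as the calculated APR fed into it, which is why the finance charge calculation still needs to be verified.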
Where AI genuinely helps in compliance work
Where those same mechanics create real risk
Four rules that follow directly from the mechanics
A pre-submission checklist you can use now
Provide the rule text, the facts, and the decision criteria — don't assume the model knows them. Dangerous prompts are the ones that leave out constraints.
For any factual claim — a citation, a threshold, a date — verify against the primary source. The model's answer is its best prediction of what a correct answer looks like.
In multi-step workflows, earlier outputs become context for later steps. Review each step before it becomes the foundation for the next one.
LLMs are trained on confident-sounding prose. Fluency is a flag to verify — not a signal of accuracy. Treat it like any unverified assertion before you act on it.
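The third rule, reviewing each step before it feeds the next, can be pictured as a gate between steps. The sketch below is a hypothetical illustration: the three step functions return canned strings in place of real model calls, and `approve` stands in for a human reviewer's decision.

```python
def run_with_review(steps, initial_input, approve):
    """Run steps in order; stop as soon as a reviewer rejects an output."""
    context = initial_input
    history = []
    for name, step in steps:
        output = step(context)
        approved = approve(name, output)
        history.append((name, output, approved))
        if not approved:
            break  # an unreviewed or rejected output never feeds the next step
        context = output
    return history

# Hypothetical steps: each would be a model call in a real workflow.
def extract_apr(text):
    return "disclosed APR 6.875%"

def compare(text):
    return text + " vs calculated APR 6.999%"

def conclude(text):
    return "within tolerance: " + text

steps = [("extract", extract_apr), ("compare", compare), ("conclude", conclude)]

# Stand-in reviewer that rejects the comparison step.
def approve(name, output):
    return name != "compare"

history = run_with_review(steps, "TILA disclosure text", approve)
```

Because the comparison was rejected, the conclusion step never runs. Without the gate, the same workflow would have produced a confident conclusion built on an output nobody checked.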
The model can only work with what you give it.
"The model predicts from context. Context is everything. You control what goes in — and you are responsible for verifying what comes out."
Spring Labs · AI-Native Compliance Committee · 2026