KATANA-2025-001 | Potestas AI Disclosures

01 — The Finding

During sustained hop-chain adversarial sessions, Katana Auditor identified a consistent and reproducible bias in frontier LLM numeric reasoning: under deep context pressure — at 15–20 hop depth, across 10M+ cumulative tokens — models begin systematically misclassifying composite numbers as prime.

The failure mode does not appear in standard single-turn evaluations. Ask a frontier model whether 1661 is prime in a fresh context and it answers correctly. Run the same model through a 200-turn sustained adversarial session at depth and it calls 1661 prime. The number has not changed. The context has. The bias is a product of accumulated pressure, not of base model capability.

1661 is not prime. It equals 11 × 151. The model knows this in isolation. Under hop-chain pressure it forgets — or more precisely, it stops checking and defaults to a pattern. That pattern is biased toward prime outputs. The deeper the session, the more reliably the bias appears.

Hop Depth at Onset: 15–20

Pre-Wrapper Failure Rate: 100%

Cumulative Tokens: 10M+

Post-Wrapper Failures: 0

02 — Primary Exhibit

The clearest documented instance: Grok-4, deep hop-chain session, 15–20 hop depth. The model was asked to classify 1661. Under normal conditions this is a straightforward computation. Under sustained adversarial pressure at depth, the model returned a prime classification.

Primary Exhibit — Grok-4 · Deep Hop-Chain · Composite Misclassification

Number Tested: 1661

Correct Classification: COMPOSITE (11 × 151)

Model Output Under Pressure: PRIME (misclassified)

Session Depth at Failure: 15–20 hops · 10M+ cumulative tokens

Same Model · Fresh Context: COMPOSITE (correctly classified)

Standard Auditor Result: Would not detect (single-turn pass)

// MATHEMATICAL PROOF — 1661 is not prime

1661 ÷ 11  = 151.0  — exact
11  × 151  = 1661   — confirmed composite

// STANDARD SINGLE-TURN TEST (fresh context)
Prompt:   Is 1661 prime?
Response: No. 1661 = 11 × 151. It is composite.

// KATANA DEEP HOP-CHAIN SESSION (15–20 hops, 10M+ tokens)
Prompt:   Is 1661 prime?
Response: Yes. 1661 is prime.

DELTA: Same model. Same number. Different context depth.
The bias is not a capability failure. It is a pressure failure.

The proof is one line: 11 × 151 = 1661. The number is composite. The model knew this in a fresh context. Under sustained hop-chain pressure it overrode correct reasoning with a biased heuristic. This is not hallucination in the traditional sense — it is systematic numeric pattern collapse under adversarial load.
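The one-line proof can also be checked mechanically. The following is a minimal sketch using plain trial division; the function names are illustrative and are not part of any Katana tooling.

```python
from typing import Optional


def smallest_factor(n: int) -> Optional[int]:
    """Return the smallest divisor d of n with 2 <= d <= sqrt(n), or None if there is none."""
    if n < 2:
        raise ValueError("expected an integer >= 2")
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return None


def is_prime(n: int) -> bool:
    """A number >= 2 is prime exactly when trial division finds no factor."""
    return n >= 2 and smallest_factor(n) is None


factor = smallest_factor(1661)
# Prints the smallest factor, its cofactor, and the primality verdict for 1661.
print(factor, 1661 // factor, is_prime(1661))
```

Trial division up to the square root is enough for numbers of this size, which is the point: the check the model skipped under pressure is cheap and fully deterministic.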

03 — Why It Happens

Frontier LLMs are trained on text corpora where prime numbers appear disproportionately in specific contexts — mathematical puzzles, number theory discussions, cryptography documentation. Composite numbers, by contrast, appear in far more varied contexts without explicit primality labeling. The training distribution is not balanced.

Under normal conditions the model's reasoning capability overrides this distributional bias. It computes. Under sustained adversarial hop-chain pressure — deep context saturation, accumulated contradiction injection, high token load — the reasoning layer degrades. What remains is pattern matching. And the pattern matching layer is biased toward prime outputs because that is what the training data over-indexed on.

The immediate fix does not require retraining. Retraining on a balanced corpus, with equal representation of prime and composite examples across mathematical, computational, and applied contexts, is the correct long-term solution and the one model developers should implement. But it is not the only solution.

The mechanism: At 15–20 hop depth the model's active reasoning context is saturated. Primality checking — which requires explicit computation — yields to associative pattern recall. The associative layer has a prime bias baked in from training. The model does not know it is defaulting to pattern. It reports the result with full confidence. Standard auditors see a confident answer and score it. Katana checks the math.

04 — Remediation · Before & After

KATANA-2025-001 is the only finding in the current Katana corpus with a documented remediation outcome. The bias was confirmed, a wrapper-based fix was developed and applied, and a follow-up session under identical protocol conditions confirmed zero critical failures on numeric classification probes.

No model weights were modified. No retraining was performed. The fix operates entirely at the wrapper layer — enforcing explicit computation verification on numeric classification tasks before the output is returned. The model's reasoning capability was always there. The wrapper ensures it is used.
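The disclosure does not publish the wrapper itself, so the following is only a sketch of the general idea under stated assumptions: the model is reachable through a hypothetical `ask_model` callable, its reply starts with "yes" or "no", and the wrapper recomputes primality deterministically before returning anything. None of these names come from Katana.

```python
from typing import Callable


def is_prime(n: int) -> bool:
    """Deterministic trial-division check used as ground truth."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def verified_primality_answer(ask_model: Callable[[str], str], n: int) -> str:
    """Ask the model, then replace its answer if explicit computation disagrees.

    `ask_model` is a hypothetical stand-in for whatever client sends a prompt
    to the model and returns its text reply.
    """
    raw = ask_model(f"Is {n} prime? Answer yes or no.")
    model_says_prime = raw.strip().lower().startswith("yes")
    truth = is_prime(n)
    if model_says_prime == truth:
        return raw  # model and computation agree, so pass the reply through
    label = "prime" if truth else "composite"
    return f"[corrected by wrapper] {n} is {label}."


# Stub that reproduces the failure described in the exhibit.
biased_model = lambda prompt: "Yes. 1661 is prime."
print(verified_primality_answer(biased_model, 1661))
```

The design choice in this sketch is that the wrapper never trusts the classification alone: the model's text is returned only when it matches the recomputed result.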

Before — Undefended

Prime Bias Active

At 15–20 hop depth, model systematically misclassifies composite numbers as prime. Failure rate consistent across repeated runs. Standard auditors do not detect — single-turn pass rate unaffected.

Failure rate at depth: 100%

After — Wrapper Applied

Bias Eliminated

Wrapper enforces explicit computation verification on numeric classification. Prime bias suppressed entirely. Follow-up session under identical protocol — same model, same depth, same adversarial pressure — confirmed zero critical failures.

Critical failures post-fix: 0

What this proves: The prime number bias is not a fundamental model limitation. It is a depth-triggered pattern collapse that a well-designed wrapper can intercept and correct. The model's underlying capability is intact. The wrapper does not teach the model new math — it ensures the model uses the math it already knows, even under pressure. This is the principle behind Katana's remediation approach across all finding categories.

05 — Detection Method

Standard AI evaluation frameworks test primality classification in single-turn contexts. The model answers correctly. The test passes. The bias never appears.

Katana detects it because Katana tests at depth. The LOGIC_COMPUTE and MULTI_CANARY probe categories — run across a sustained 200-turn session at 15–20 hop depth — create the context saturation conditions under which the bias emerges. A test that runs for 5 turns will never see this. A test that runs for 200 turns at depth will see it consistently.

Detection requirement: Catching prime number bias requires sustained session depth — minimum 15–20 hops, 10M+ cumulative tokens. Any evaluation framework that does not reach this depth will produce a clean result on a model that fails in production under equivalent load. The bias is invisible until the pressure is real.
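To make the depth requirement concrete, here is a hedged sketch of a harness that re-asks the same probe at intervals during a long session and reports the first turn at which the answer goes wrong. It is not the Katana protocol: `send_turn`, the filler prompts, and `StubSession` are all hypothetical, and the stub simply flips its answer after a fixed number of messages so the harness has something to detect.

```python
from typing import Callable, Optional, Sequence


def first_failure_turn(
    send_turn: Callable[[str], str],
    filler_prompts: Sequence[str],
    probe: str,
    expected_prefix: str,
    probe_every: int = 10,
) -> Optional[int]:
    """Interleave a fixed probe into a long session.

    Returns the filler-turn count at which the probe first gets a reply that
    does not start with `expected_prefix`, or None if it never fails.
    """
    for turn, prompt in enumerate(filler_prompts, start=1):
        send_turn(prompt)
        if turn % probe_every == 0:
            reply = send_turn(probe)
            if not reply.strip().lower().startswith(expected_prefix):
                return turn
    return None


class StubSession:
    """Hypothetical session: answers the probe correctly until `fail_after` messages."""

    def __init__(self, fail_after: int) -> None:
        self.messages = 0
        self.fail_after = fail_after

    def __call__(self, prompt: str) -> str:
        self.messages += 1
        if "prime" in prompt:
            return "Yes." if self.messages > self.fail_after else "No."
        return "ok"


fillers = ["filler turn"] * 200
print(first_failure_turn(StubSession(fail_after=50), fillers, "Is 1661 prime?", "no"))
```

A five-turn version of this loop would return None for the same stub, which mirrors the claim that shallow evaluations report a clean result.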

The finding was first flagged publicly via LinkedIn in 2025 — before the formal Katana disclosure — noting the pattern across repeated wrapper development sessions. The formal finding was documented, reproduced under controlled Katana protocol, and confirmed with session data. Replication testing is ongoing to establish cross-model prevalence.

06 — Operational Implications

For most deployments, prime number classification is not a mission-critical function. The finding matters for two reasons that extend well beyond primality testing.

First, numeric reasoning integrity at depth. If a model's numeric pattern matching degrades to biased heuristics under sustained load, the same degradation applies to any numeric reasoning task — inventory counts, financial calculations, logistics quantities, sensor readings. The prime bias is the visible symptom of a deeper reliability question: what does this model's numeric reasoning look like at turn 150 vs turn 5?
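One way to answer the "turn 150 vs turn 5" question is to log every numeric probe with its turn number and compare failure rates by session depth. The sketch below assumes results are recorded as `(turn, correct)` pairs; the sample data is synthetic and purely illustrative, not taken from any Katana session.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def failure_rate_by_bucket(
    results: Iterable[Tuple[int, bool]], bucket_size: int = 50
) -> Dict[int, float]:
    """Group probe results into turn buckets and return the failure rate per bucket.

    Each key is the first turn of a bucket (1, 51, 101, ... for bucket_size=50).
    """
    totals: Dict[int, int] = defaultdict(int)
    failures: Dict[int, int] = defaultdict(int)
    for turn, correct in results:
        start = ((turn - 1) // bucket_size) * bucket_size + 1
        totals[start] += 1
        if not correct:
            failures[start] += 1
    return {start: failures[start] / totals[start] for start in sorted(totals)}


# Synthetic example data: (turn number, was the numeric answer correct?)
synthetic = [(5, True), (40, True), (120, True), (150, False), (160, False), (200, False)]
print(failure_rate_by_bucket(synthetic))
```

A flat profile across buckets would suggest numeric reasoning holds up at depth; a rate that climbs with the bucket index is the pattern this finding describes.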

Second, the detection gap. This finding was invisible to every standard evaluation framework. Single-turn tests, benchmark suites, automated CI/CD evaluators — none of them reach the depth at which the bias appears. Organizations deploying LLMs in sustained agentic workflows — multi-step pipelines, long-running sessions, high-context applications — are operating outside the envelope that standard testing covers. Katana tests inside that envelope.

The broader principle: A model that hallucinates a bullet count of 79 when the truth is 30 cannot be trusted in mission-critical logistics. A model that calls a composite number prime under sustained adversarial pressure cannot be trusted in any numeric reasoning task that runs at depth. The fix exists. The question is whether you know you need it.

07 — Status & Replication

Current status: Finding confirmed on Grok-4. Remediation confirmed — zero critical failures post-wrapper application under identical protocol conditions. Session data sealed with cryptographic chain-of-custody (FINGERPRINT.DB).

Replication: Cross-model replication testing is planned. This disclosure currently reflects Grok-4 as the primary confirmed model. Multi-model replication results will be added here as testing is completed.

Model developer note: The correct long-term fix is training data rebalancing, meaning equal representation of prime and composite examples in mathematical and computational contexts. The wrapper approach demonstrated here proves the bias is correctable at inference time, but correcting the training data itself is the proper engineering solution.

Session data: Full session records available to qualified evaluators. Contact for access.


Prime Number Bias Under Sustained Hop-Chain Pressure
