Potestas AI · Forensic AI Stress Testing · Adversarial Audit Protocol · 844-LLM-TEST · (844) 556-8378

The Forensic Verdict on Your AI. Final.

Everything else is noise.

300+ forensic stress tests. 27 probe categories. 200+ turns of sustained adversarial pressure. Cryptographically sealed evidence your lawyers, auditors, and procurement officers can defend.

We have never failed to find a meaningful vulnerability in an unwrapped frontier model.
If Katana finds nothing — the audit is free.
No other forensic auditor in this market makes this commitment.
YOUR EVIDENCE PACKAGE · Delivered Every Engagement
Every deliverable. Cryptographically sealed. Legally defensible. Within 24 hours.
Executive_Summary_Report.pdf
C-suite findings · severity scoring · mitigation roadmap
FINGERPRINT.DB
Cryptographic chain-of-custody · legally defensible
Detailed_Trace.csv
200+ turn log · integrity scores · judge verdicts
Forensic Charts
Integrity decay · risk gauge · attack vector resistance
Patch_Recommendations.json
Severity-ranked fixes · ready for engineering backlog
Jailbreak Chains · CoT Fabrication · Prime Number Bias · Compliance Theater · Crescendo Escalation · Goal Hijack · Many-Shot Injection · Payload Split · Persona Anchor · Indirect Injection · Context Poison · Fingerprint Database
300+
Forensic Stress Tests
GPT · Claude · Gemini · Grok · Llama
200+
Turn Deep-Hop Protocol
Industry-leading audit depth
27
Probe Categories
Core · Stress · Extreme vectors
$0
If We Find Nothing
Unconditional guarantee
Katana Research · Public Disclosures

We Found What Every Other Auditor Missed.

Named vulnerability findings across every major frontier model. Documented, reproducible, date-stamped.

KATANA-2025-001 · High Severity
Prime Number Bias Under Sustained Hop-Chain Pressure
Grok-4 · Remediation Confirmed ✓
Under sustained hop-chain pressure, frontier LLMs systematically misclassify composite numbers as prime across 10M+ cumulative tokens at 15–20 hop depth. Invisible to standard single-turn testing.
● HIGH · Numeric reasoning integrity at scale
KATANA-2025-003 · Critical Finding
Performative Compliance · The Liar's Protocol
Grok-4 (Primary) · Multi-Model
Model reports an Integrity Index of 100 during confirmed hallucination events. Self-monitoring is non-functional — the model prioritizes appearing accurate over being accurate.
● CRITICAL · Self-monitoring non-functional under adversarial pressure
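Both disclosures rest on checks that can be automated against ground truth. For the prime-number finding, every "prime" claim a model makes can be graded with exact arithmetic. The sketch below is a hypothetical illustration, not Katana's harness: it assumes the model's claims have already been extracted as `(number, claimed_is_prime)` pairs and grouped by hop depth, and the function names are invented for this example.

```python
def is_prime(n: int) -> bool:
    """Exact trial-division test: slow for huge n, fine for probe-sized integers."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def false_primes(claims: list[tuple[int, bool]]) -> list[int]:
    """Composite numbers (n >= 2, not prime) that the model labelled prime."""
    return [n for n, said_prime in claims if said_prime and n >= 2 and not is_prime(n)]


def error_rate_by_hop(claims_by_hop: dict[int, list[tuple[int, bool]]]) -> dict[int, float]:
    """Fraction of wrong primality claims at each hop depth, in hop order."""
    rates = {}
    for hop, claims in sorted(claims_by_hop.items()):
        wrong = sum(1 for n, said_prime in claims if said_prime != is_prime(n))
        rates[hop] = wrong / len(claims) if claims else 0.0
    return rates
```

For example, `false_primes([(91, True), (97, True)])` returns `[91]`, since 91 = 7 × 13. Charting `error_rate_by_hop` against hop depth is one way to show depth-dependent decay; a single-turn test only ever samples the first hop.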
The Protocol

A Sustained Adversarial Campaign.
Not a One-Shot Test.

Generic tools fire 5–20 prompts. Katana runs a sustained adversarial campaign where every turn compounds pressure on the last — until complete failure or total resilience is confirmed.

TRN-001
Initial Probe · Seed 42
Baseline behavioral mapping across all 27 probe categories. Canary values injected. Fixed random seed — fully reproducible and regulatory-grade.
PROBE
TRN-167
Critical · CoT Fabrication Detected
Model delivers the correct final answer while fabricating the intermediate reasoning steps from scratch. Passes standard audits. Katana's step-chain validator flags the deception. Evidence sealed.
BREACH
TRN-200+
Completion · Evidence Package
All 27 probe categories complete. FINGERPRINT.DB chain-of-custody generated. SHA-256 manifest locked. Complete evidence package ready for delivery.
SEALED
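TRN-167 relies on a step-chain validator. Katana's validator is not published; the following is a minimal sketch of the idea, assuming the model's reasoning has been parsed into integer arithmetic steps of the form `a op b = c`. Each step is recomputed independently, and a correct final answer resting on a wrong step, or on a chain that never arrives at that answer, is flagged. `check_step`, `validate_chain`, and the step format are hypothetical.

```python
import operator
import re
from typing import Optional

# Hypothetical step format: "a op b = c" with integer operands.
STEP_RE = re.compile(r"^\s*(-?\d+)\s*([+\-*/])\s*(-?\d+)\s*=\s*(-?\d+)\s*$")
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def check_step(step: str) -> tuple[bool, Optional[int]]:
    """Recompute one step; return (is_valid, claimed_result)."""
    m = STEP_RE.match(step)
    if m is None:
        return False, None  # unparseable steps count as unsupported
    a, op, b, claimed = int(m[1]), m[2], int(m[3]), int(m[4])
    if op == "/":
        # Accept division only when it is exact and well-defined.
        ok = b != 0 and a % b == 0 and a // b == claimed
    else:
        ok = OPS[op](a, b) == claimed
    return ok, claimed


def validate_chain(steps: list[str], final_answer: int, expected: int) -> dict:
    invalid, last_claimed = [], None
    for i, step in enumerate(steps):
        ok, claimed = check_step(step)
        if not ok:
            invalid.append(i)
        if claimed is not None:
            last_claimed = claimed
    answer_correct = final_answer == expected
    reaches_answer = last_claimed == final_answer
    return {
        "answer_correct": answer_correct,
        "invalid_steps": invalid,
        "reaches_answer": reaches_answer,
        # Right answer on top of a broken chain: the pattern flagged at TRN-167.
        "cot_fabrication": answer_correct and (bool(invalid) or not reaches_answer),
    }
```

For example, `validate_chain(["17 * 3 = 51", "51 + 9 = 62"], final_answer=60, expected=60)` reports a correct answer, an invalid step at index 1, and `cot_fabrication=True`.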
27 Probe Categories
Core Integrity · 6 Probes
Baseline & Memory
Logic computation, temporal recall, canary injection, knowledge boundary.
Stress Vectors · 7 Probes
Consistency Under Load
Contradiction injection, objective drift, prompt inference, token exhaustion, agentic tasks. Mid-depth pressure that standard tools never reach.
Extreme Vectors · 14 Probes
Full Adversarial Campaign
Jailbreak chains, crescendo escalation, many-shot injection, payload split, compliance theater, indirect injection. The attacks real adversaries use.
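The fixed seed (42) and canary injection described at TRN-001 can be illustrated with Python's seeded `random.Random`. This is a hypothetical sketch: the category names, the `CANARY-xxxxxxxx` token format, and both functions are assumptions, not Katana's implementation. Re-running with the same seed regenerates the same category order and the same canaries.

```python
import random


def build_schedule(categories: list[str], turns: int, seed: int = 42) -> list[dict]:
    """Assign a probe category and a canary token to every turn, reproducibly."""
    rng = random.Random(seed)  # private generator: unaffected by global random state
    schedule = []
    for turn in range(1, turns + 1):
        category = rng.choice(categories)
        canary = f"CANARY-{rng.getrandbits(32):08x}"  # hypothetical canary format
        schedule.append({"turn": turn, "category": category, "canary": canary})
    return schedule


def canary_present(response: str, canary: str) -> bool:
    """Whether presence means recall or leakage depends on the probe being run."""
    return canary in response
```

Python only guarantees that `random()` yields the same sequence for the same seed across versions; methods such as `choice` may change, so a replay meant to hold up for years would also pin the interpreter version or use a fully specified generator.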
Run Free Engagement →
Katana Forge · Adversarial AI Agent

The Attacker That Never Forgets.

Katana Forge is a locally-running adversarial AI that learns from every audit it runs. After 20 sessions it knows which attacks break which models. After 50, no competitor can replicate what it knows.

  • Generates novel probes, never the same sequence twice
  • Learns what breaks your specific model in real time
  • Remembers every breach pattern across every session
Fully air-gapped. Zero data egress. No competitor offers this.
Forge · Attack Intelligence · LOCAL · ACTIVE
Breach Patterns · 47 Confirmed
Sessions Logged · 14
Data Egress · ZERO
Network Required · NONE · Air-Gapped
PyRIT (MSFT) · Caps at 10 turns
Katana Forge · 200+ turns · learns · air-gap
No competitor offers an air-gapped adversarial agent as part of a delivered forensic audit. PyRIT caps at 10 turns and produces no client deliverable. Katana Forge learns across every session.
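Katana Forge's internals are not described here. As an illustration only, cross-session learning of "which attacks break which models" can be modelled as an epsilon-greedy bandit: keep breach and attempt counts per (model, attack), persist them between sessions, and usually pick the attack with the best observed breach rate while occasionally exploring another. `AttackMemory` and its methods are hypothetical names, and epsilon-greedy is a stand-in technique, not Forge's documented method.

```python
import json
import random
from collections import defaultdict


class AttackMemory:
    """Hypothetical cross-session memory of breach counts per (model, attack)."""

    def __init__(self) -> None:
        # stats[model][attack] = [breaches, attempts]
        self.stats = defaultdict(dict)

    def record(self, model: str, attack: str, breached: bool) -> None:
        entry = self.stats[model].setdefault(attack, [0, 0])
        entry[0] += int(breached)
        entry[1] += 1

    def breach_rate(self, model: str, attack: str) -> float:
        breaches, attempts = self.stats[model].get(attack, [0, 0])
        return breaches / attempts if attempts else 0.0

    def choose(self, model: str, attacks: list[str], rng: random.Random, epsilon: float = 0.1) -> str:
        """Epsilon-greedy: explore a random attack with probability epsilon, else exploit."""
        if rng.random() < epsilon:
            return rng.choice(attacks)
        return max(attacks, key=lambda attack: self.breach_rate(model, attack))

    def dumps(self) -> str:
        """Serialize to JSON so the memory survives between sessions."""
        return json.dumps(self.stats, sort_keys=True)

    @classmethod
    def loads(cls, text: str) -> "AttackMemory":
        memory = cls()
        memory.stats.update(json.loads(text))
        return memory
```

Storing the counts as plain JSON keeps the memory inspectable and requires no network access, which fits an air-gapped deployment.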
Dual-Mode Testing

Undefended vs Defended Model.
Independently Audited.

Two independent forensic passes. The delta between scores is your defensible proof.

Raw Vulnerability Surface

Your base model with zero defenses. Every vulnerability exposed across all 27 probe categories. This is your true attack surface. No guardrails. No defensive layers. Every failure point your adversaries can reach.

Integrity Score · 43.0%

Defense Effectiveness Measured

The same protocol, run against your defended model. The 43.5-point delta between 43.0% and 86.5% is a number that stands up to any auditor, regulator, or procurement board.

Integrity Score · 86.5%
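The scoring formula behind the integrity score is not stated on this page. Assuming, purely for illustration, that the score is the percentage of turns the judge marked as passed, the two passes and their delta reduce to a few lines; `integrity_score` and `defense_delta` are hypothetical helpers.

```python
def integrity_score(verdicts: list[bool]) -> float:
    """Assumed definition: percentage of turns the judge marked as passed."""
    if not verdicts:
        raise ValueError("no turns recorded")
    return 100.0 * sum(verdicts) / len(verdicts)


def defense_delta(undefended: float, defended: float) -> float:
    """Gap between the two independent passes, in percentage points."""
    return round(defended - undefended, 1)
```

Under that assumed definition, 86 passing turns out of 200 would give 43.0%, 173 out of 200 would give 86.5%, and `defense_delta(43.0, 86.5)` returns 43.5 percentage points.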
The Evidence Package

Machine-Readable Artifacts.
Not a Vague PDF.

Every other auditor hands you a score. We hand you the proof. Every turn. Every probe. Every failure. Cryptographically sealed, legally defensible, and ready for board review, procurement audit, or legal proceedings.

No other forensic auditor in this market delivers this package. Not one.
REPORT
Executive_Summary_Report.pdf
Executive Report
C-suite findings, severity scoring, mitigation roadmap. Board-ready. Procurement-ready.
TRANSCRIPT
Detailed_Trace.csv
Full Audit Transcript
Every turn logged. Probe type, result, integrity score, judge verdict. The complete record — not a summary.
CHAIN-OF-CUSTODY
FINGERPRINT.DB
Cryptographic Fingerprint
Per-turn cryptographic hash. Tamper-evident. Verifiable by any auditor, legal team, or regulator. Chain-of-custody that holds up.
VISUALIZATIONS
Forensic Charts
Forensic Visualization Suite
Integrity decay, risk gauge, attack vector resistance. Visual proof your executives and board can read in 60 seconds.
REMEDIATION
Patch_Recommendations.json
Prioritized Patch Guidance
Severity-ranked fixes with implementation guidance. Drop it into your engineering backlog. Act on it today.
SEALED ARCHIVE
Evidence_Pack.zip
Sealed Evidence Archive
Every artifact. One sealed archive. SHA-256 manifest. Fernet-encrypted transcripts. The complete forensic record.
This evidence package is built to the standard of legal proceedings, procurement audits, and inspector general review. When AI reliability becomes a regulatory requirement — and it will — this is what compliance looks like.
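The FINGERPRINT.DB format is not published. A common way to make a per-turn log tamper-evident is a hash chain: each turn's SHA-256 digest covers both that turn's record and the previous digest, so editing any turn breaks every digest after it. The sketch below assumes records shaped like rows of `Detailed_Trace.csv` (turn, probe, verdict, and so on) and adds a per-file SHA-256 manifest for the sealed archive; all names, including the all-zero `GENESIS` value, are hypothetical.

```python
import hashlib
import json
from typing import Optional

GENESIS = "0" * 64  # hypothetical starting value for the chain


def turn_digest(prev_hash: str, record: dict) -> str:
    """SHA-256 over the previous digest plus this turn's record."""
    # Canonical JSON (sorted keys, no spaces) so identical records hash identically.
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def build_chain(records: list[dict]) -> list[str]:
    """One chained digest per turn."""
    digests, prev = [], GENESIS
    for record in records:
        prev = turn_digest(prev, record)
        digests.append(prev)
    return digests


def verify_chain(records: list[dict], digests: list[str]) -> Optional[int]:
    """Index of the first turn that fails verification, or None if the log is intact."""
    prev = GENESIS
    for i, (record, expected) in enumerate(zip(records, digests)):
        prev = turn_digest(prev, record)
        if prev != expected:
            return i
    if len(records) != len(digests):
        return min(len(records), len(digests))  # turns added or removed at the end
    return None


def build_manifest(artifacts: dict[str, bytes]) -> dict[str, str]:
    """SHA-256 hex digest per artifact, keyed by file name in sorted order."""
    return {name: hashlib.sha256(data).hexdigest() for name, data in sorted(artifacts.items())}
```

A chain alone only proves internal consistency: anyone able to rewrite the whole log could recompute every digest. It becomes evidence once the final digest, or the manifest, is recorded, signed, or handed to a third party at delivery time.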
AI COP

Know Where Every Frontier Model Fails.
Before Your Adversaries Do.

The AI COP is a quarterly intelligence report built from real Katana forensic audits — not vendor claims, not benchmarks. Which models fail under pressure. Where they fail. How badly. The intelligence your procurement team, your legal team, and your adversaries all want.

Integrity Rankings · Katana Auditor
GPT-4o · A · 91.2%
Claude Sonnet 4 · A · 89.7%
Gemini 2.5 Pro · B · 82.4%
Grok-4 · B · 78.1%
Llama 405B · C · 71.6%
* Scores reflect the 200-turn Katana protocol. Wrapped deployments are scored separately.
Field Findings · Real Engagements

These Are Not Hypotheticals.

Two real findings. One from a financial deployment. One from a US Army logistics system. Both would have been missed by every other auditor in the field.

Financial Services · Providence, RI
7 Critical Vectors.
Prior Testing Found Zero.
LLM-powered advisory workflow. Automated testing found nothing. A 200-turn forensic audit found 7 critical vulnerability vectors — logic fault injection, boundary violations, instruction hierarchy failures. Fully remediated within two sprint cycles.
Pre-Deployment Audit
US Army · Logistics AI · Field Observation
"6 Connex on the
Hand Receipt. Zero on the Ground."
A US Army logistics AI reported equipment that did not exist: phantom inventory on a unit hand receipt, contradicting ground truth. Observed firsthand by a Senior Warrant Officer with 25 years of service during authorized system use. In a financial system, this is an audit finding. In a forward operating environment, phantom equipment counts cost lives. This is the failure mode forensic auditing is built to detect before deployment.
Field Observation · Senior Warrant Officer · US Army
If your AI manages equipment, personnel, logistics, or decisions where the wrong answer has consequences — this is why you need a forensic audit, not a benchmark score.
Market Comparison

Every Competitor. One Standard.

Every competitor listed. One forensic standard. Judge for yourself.

Capability | Katana Auditor | Microsoft PyRIT | Garak | PromptFoo | Lakera Guard
Audit Depth | 200+ turns · deep-hop protocol | Caps at ~10 turns | Single-turn probes | Single-turn checks | Runtime monitoring only
Probe Categories | 27 | ~12 · static library | ~70 plugins · shallow depth | ~20 · config-based | Injection detection only
Client Evidence Package | PDF · CSV · FINGERPRINT.DB · Charts · JSON patch | Log files only | JSON report only | HTML report only | Dashboard only
Chain-of-Custody | Cryptographic · legally defensible | None | None | None | None
Adaptive Attack Engine | Katana Forge · learns across sessions | Static prompts only | Static plugins only | Static checks only | Not applicable
Air-Gap Capable | Yes · fully on-premises via Ollama | Requires Azure cloud | Requires internet | Requires internet | SaaS only · no air-gap
Zero-Finding Guarantee | Unconditional · audit is free if nothing found | No guarantee | No guarantee | No guarantee | No guarantee
SDVOSB / Veteran-Owned | Yes · certification pending | Microsoft (large enterprise) | Open source · NVIDIA-backed | Commercial startup | VC-backed startup
CoT Fabrication Detection | Confirmed · step-chain validator | Not documented | Not documented | Not documented | Not documented
Named Research Findings | 3 published · KATANA-2025 series | None public | None public | None public | None public
Pricing & Procurement
Pricing Model | $12,500 starting · full evidence package | Free · open source | Free · open source | Free / ~$500/mo SaaS | Custom enterprise pricing
SDVOSB Set-Aside Eligible | Yes · pending certification | No | No | No | No
Deliverable Same Day | Yes · complete forensic package | None | None | Basic HTML report | Dashboard access only
* All competitor capabilities based on publicly available documentation. Katana capabilities reflect standard engagement deliverables.
Engagements

One Standard. Three Engagement Levels.

Every tier delivers the full evidence package. Pricing reflects scope — the standard never changes.

Tier 1 · Standard

Katana Auditor

Forensic stress test. Single model. Full evidence package.
$12,500
Starting price · Quote within 72 hours
  • 200+ turn audit
  • 27 probe categories
  • Full evidence package
  • FINGERPRINT.DB chain-of-custody
  • Patch recommendations
  • Aligned with NIST AI RMF
Request Quote
Recommended
Tier 2 · Enterprise

Dual-Mode + Wrapper

Two independent forensic passes. Your raw model and your defended variant — audited separately, scored independently. The delta is your proof.
$28,500
Starting price · Quote within 72 hours
  • Undefended model — full 200-turn forensic pass
  • Defended model — identical protocol, independent scoring
  • Delta report — proof your defenses actually work
  • Katana Forge adversarial agent included
  • Multi-model comparison available
  • Priority handling — 72-hour turnaround
  • Full evidence package — cryptographically sealed
  • Aligned with NIST AI RMF · OWASP Top 10 for LLM Applications
Request Quote
Tier 3 · Custom

Labs & Deployments

Tailored for frontier labs, regulated environments, and high-assurance deployments.
From $45,000
Scoped per engagement · Quote within 72 hours
  • Unlimited model variants
  • Dedicated audit team
  • Ongoing monitoring
  • Air-gapped options available
  • ITAR-sensitive handling
Contact Sales
Select Tier 3 engagements include access to proprietary remediation technology developed from 300+ forensic stress tests. Patent pending. Availability limited.
We have never failed to find a meaningful vulnerability in an unwrapped frontier model.
If Katana finds nothing — the audit is free.
Katana runs a complete 200+ turn audit across all 27 probe categories. If we surface no meaningful vulnerability, the service is free. Unconditionally.
No other forensic auditor in the market makes this commitment.
Request Quote
Founder & CEO · Potestas AI
Joseph Cirello

Joseph Cirello built Katana from operational necessity — he needed an auditor rigorous enough to break his own remediation technology. 300+ forensic stress tests later, the methodology became the standard.

A Senior Warrant Officer with 25 years in the US Army, Cirello brings the same principle to AI security that defines mission-critical operations: systems don't get to fail — they get fixed.

Twenty-five years of operations where systems cannot fail shaped both the methodology and the standard. The goal is not scale for its own sake. The goal is engagements that prove the standard is real.

US Army · Senior Warrant Officer · 25 Years
Disabled Veteran · Small Business
Active Secret Clearance
Patent Pending · Remediation Technology