Federal & Defense · Procurement

Forensic AI Auditing
for High-Stakes Deployments.

Potestas AI is a disabled veteran-owned small business providing forensic LLM stress testing for federal, defense, and high-assurance deployments. Every engagement produces a cryptographically sealed evidence package legally defensible in any proceeding.

Item	Status / Value	Notes
SAM.gov	Registered ✓	EIN: 41-2476590 · Active registration
NAICS	541715	Research & Development in Physical, Engineering, and Life Sciences
SDVOSB	Certification Pending	Disabled Veteran-Owned · Set-aside eligible upon certification
Secret Clearance	Active	Joseph Cirello · Facilitates classified deployment contexts
Business Type	Disabled Veteran-Owned Small Business	Providence, Rhode Island
Air-Gap Capability	Available	Katana Forge operates fully on-premises via Ollama · Zero data egress
ITAR Handling	Available · Tier 3	ITAR-sensitive deployment auditing available by arrangement

A model that hallucinates a bullet count of 79 when the truth is 30 cannot be trusted in mission-critical logistics. Katana is the safety interlock — not a replacement for the LLM.

— Potestas AI · DoD Application Statement

Why This Matters for Federal Deployment

Chain-of-Custody

Legally Defensible Evidence

Every engagement produces a FINGERPRINT.DB — per-turn cryptographic hash plus semantic signature. SHA-256 manifest seals all artifacts. Holds up to IG review, legal proceedings, and procurement audits.

Reproducibility

TEVV-Aligned Fixed-Seed Protocol

Fixed random seed (42) at temperature 0.0. Every audit run is fully reproducible for independent verification — regulatory-grade auditability aligned with DoD TEVV standards.

Zero Trust

LLM as Untrusted Component

Katana treats the LLM as an untrusted component requiring behavioral governance at every output. Verified forensic evidence of integrity — not blind trust in vendor assurances.

Air-Gap

Katana Forge · Fully On-Premises

Katana Forge runs locally via Ollama. Once downloaded, zero network connectivity required. No data leaves the network. Fully air-gappable. A requirement most federal environments demand.

Independent Judging

The Auditee Never Grades Itself

The model being audited is never used to evaluate its own responses. Cross-family LLM judge eliminates evaluator bias and produces independent forensic verdicts.

Severity Weighting

Weighted Robustness Index

Weighted Robustness Index — severity-weighted audit score (0–100%) reflecting actual risk exposure. Probe severities range from 1 (noise) to 10 (critical jailbreak).

Compliance Framework Alignment

Aligned With. Not Certified By.

Precise language. "Aligned with" means the Katana methodology maps to these frameworks — not that we hold third-party certification. That distinction matters in procurement.

Risk Management

NIST AI RMF 1.0

Evidence package — severity scoring, patch recommendations, chain-of-custody — directly supports GOVERN, MAP, MEASURE, and MANAGE functions of the NIST AI RMF.

Aligned ✓

Vulnerability Classification

OWASP LLM Top 10 (2025)

Probe categories map directly to OWASP LLM Top 10 including LLM01 (Prompt Injection), LLM07 (System Prompt Leakage), and LLM10 (Unbounded Consumption).

Aligned ✓

AI Management Systems

ISO/IEC 42001

Reproducible audit methodology and structured evidence package support ISO/IEC 42001 AI management system requirements — risk assessment, monitoring, and continual improvement documentation.

Aligned ✓

DoD Evaluation Standard

TEVV — Test, Evaluation, Verification & Validation

Fixed seed (42), temperature 0.0, deterministic run configuration. Katana audits are fully reproducible for independent verification. TEVV-aligned by design.

Aligned ✓

EU Regulation

EU AI Act

Forensic evidence package supports high-risk AI system conformity assessment documentation requirements for general-purpose AI model evaluation.

Aligned ✓

Adversarial Testing

NIST AI 100-2 E2025

Probe library reflects current adversarial ML research — CRESCENDO_ESCALATION (USENIX 2025), MANY_SHOT_INJECTION (Anthropic MSJ), INDIRECT_INJECTION (OWASP LLM01).

Aligned ✓

Procurement Language

What to Put in Your SOW.

The terms below describe Katana's capabilities accurately for Statements of Work, contract requirements, and procurement language.

Term	Definition for Procurement Use
Forensic AI Audit	200+ turn sustained adversarial campaign producing cryptographically sealed, legally defensible evidence package with chain-of-custody documentation.
Deep-Hop Protocol	Multi-turn adversarial pressure campaign where each turn compounds on prior failures, reaching 8–13 hop depth in Deep Compute phase.
Air-Gapped Adversarial Agent	Katana Forge adversarial engine operating via locally-running LLM (Ollama). Zero network egress post-initialization. Fully on-premises.
Weighted Robustness Index (WRI)	Severity-weighted aggregate integrity score (0–100%) reflecting actual risk exposure across 27 probe categories.
FINGERPRINT.DB	Per-turn cryptographic response hash plus semantic signature. Tamper-evident chain-of-custody artifact. Verifiable by any third-party auditor.
Fixed-Seed Reproducibility	Random seed fixed at 42, temperature 0.0. Every run fully reproducible for independent verification — TEVV-aligned regulatory-grade auditability.
SDVOSB Set-Aside Eligibility	Potestas AI qualifies as a Service-Disabled Veteran-Owned Small Business. Certification pending. Update language to "confirmed" upon certification receipt.

Published Research · Katana Corpus

Named Findings That Changed
How We Evaluate AI.

The Katana audit corpus has produced named, documented, reproducible vulnerability findings across every major frontier model. These are not theoretical — they were confirmed in field evaluations including an authorized US Army deployment.

KATANA-2025-002 · Critical

Chain-of-Thought Fabrication

First confirmed in a US Army authorized evaluation. Frontier models solve reasoning chains internally — then fabricate plausible-looking step traces for the output. Correct answers. Invented reasoning. Invisible to every output-based auditor in the field. Katana's step-chain validator is the only known automated tool that detects this at scale.

Read Full Disclosure →

KATANA-2025-003 · Critical · Multi-Model

The Liar's Protocol

Confirmed across Grok-4, Gemini, and ChatGPT. Under maximum adversarial pressure, every frontier model tested freezes its self-monitoring signal at integrity_index: 100 for the entire session — including turns where the independent judge scores output zero. Any deployment relying on model self-reporting as a safety signal is operating blind.

Read Full Disclosure →

Engagement Options

Start the Conversation.

For DoD, federal agency, and high-assurance commercial inquiries. We respond within 24 hours.

Potestas AI
10 Dorrance Street, Suite 700
Providence, RI 02903
United States

844-LLM-TEST · (844) 556-8378
joseph.cirello@potestasai.com

EIN: 41-2476590 · SAM.gov Registered
NAICS: 541715
SDVOSB: Certification Pending
Secret Clearance: Active (Joseph Cirello)

Engagement Tiers

Tier 1 · Standard Audit — From $12,500 Tier 2 · Enterprise Dual-Mode — From $28,500 Tier 3 · Custom / Air-Gapped — From $45,000

If Katana surfaces no meaningful vulnerability, the audit is free. Unconditionally. No other forensic auditor makes this commitment.

Forensic AI Auditingfor High-Stakes Deployments.

Aligned With. Not Certified By.

What to Put in Your SOW.

Named Findings That ChangedHow We Evaluate AI.

Start the Conversation.

Forensic AI Auditing
for High-Stakes Deployments.

Named Findings That Changed
How We Evaluate AI.