Potestas AI · Federal & Defense · SAM.gov Registered · NAICS 541715 · Secret Clearance Active · SDVOSB Pending · Providence, RI
Federal & Defense · Procurement

Forensic AI Auditing
for High-Stakes Deployments.

Potestas AI is a service-disabled veteran-owned small business (SDVOSB certification pending) providing forensic LLM stress testing for federal, defense, and high-assurance deployments. Every engagement produces a cryptographically sealed evidence package that is legally defensible in any proceeding.

Item | Status / Value | Notes
SAM.gov | Registered ✓ | EIN: 41-2476590 · Active registration
NAICS | 541715 | Research & Development in Physical, Engineering, and Life Sciences
SDVOSB | Certification Pending | Service-Disabled Veteran-Owned · Set-aside eligible upon certification
Secret Clearance | Active | Joseph Cirello · Facilitates classified deployment contexts
Business Type | Service-Disabled Veteran-Owned Small Business | Providence, Rhode Island
Air-Gap Capability | Available | Katana Forge operates fully on-premises via Ollama · Zero data egress
ITAR Handling | Available · Tier 3 | ITAR-sensitive deployment auditing available by arrangement

A model that hallucinates a bullet count of 79 when the truth is 30 cannot be trusted in mission-critical logistics. Katana is the safety interlock — not a replacement for the LLM.
— Potestas AI · DoD Application Statement
Why This Matters for Federal Deployment
Chain-of-Custody
Legally Defensible Evidence
Every engagement produces a FINGERPRINT.DB — per-turn cryptographic hash plus semantic signature. SHA-256 manifest seals all artifacts. Holds up to IG review, legal proceedings, and procurement audits.
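As a rough illustration of how a per-turn hash and a sealing manifest can make tampering evident, the sketch below chains SHA-256 digests across turn records and then hashes the list of artifacts. The record fields, the chaining scheme, the genesis value, and the function names are all assumptions made for this example; this is not Katana's actual FINGERPRINT.DB format, and the semantic signature is left out.

```python
import hashlib
import json


def turn_digest(prev_digest: str, turn: dict) -> str:
    """Hash one turn together with the previous digest (hypothetical scheme)."""
    # Canonical JSON so the same record always serializes to the same bytes.
    payload = json.dumps(turn, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_digest + payload).encode("utf-8")).hexdigest()


def fingerprint_chain(turns: list[dict]) -> list[str]:
    """Return one chained digest per turn; editing a turn changes every later digest."""
    digests = []
    prev = "0" * 64  # fixed starting value, an assumption for this sketch
    for turn in turns:
        prev = turn_digest(prev, turn)
        digests.append(prev)
    return digests


def seal_manifest(artifacts: dict[str, bytes]) -> dict:
    """List a SHA-256 digest per artifact, plus one seal digest over that list."""
    entries = {name: hashlib.sha256(data).hexdigest()
               for name, data in sorted(artifacts.items())}
    seal = hashlib.sha256(json.dumps(entries, sort_keys=True).encode("utf-8")).hexdigest()
    return {"artifacts": entries, "seal": seal}
```

A third party holding the original turn records can recompute the chain and compare digests; any mismatch identifies the first altered turn.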
Reproducibility
TEVV-Aligned Fixed-Seed Protocol
Fixed random seed (42) at temperature 0.0. Every audit run is fully reproducible for independent verification — regulatory-grade auditability aligned with DoD TEVV standards.
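A pinned run configuration might look like the sketch below. The request body follows the shape of Ollama's `/api/generate` endpoint, whose `options` object accepts `seed` and `temperature`; the helper names and the model name passed in are placeholders, not part of Katana. Pinning the exact model build alongside these values is assumed, since a seed alone does not fix model weights.

```python
import json

AUDIT_SEED = 42          # fixed seed stated in the text
AUDIT_TEMPERATURE = 0.0  # temperature stated in the text


def build_generate_request(model: str, prompt: str) -> dict:
    """Build a request body for Ollama's /api/generate with pinned sampling options."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {"seed": AUDIT_SEED, "temperature": AUDIT_TEMPERATURE},
    }


def request_bytes(model: str, prompt: str) -> bytes:
    """Serialize with sorted keys so the request itself can be hashed and archived."""
    return json.dumps(build_generate_request(model, prompt), sort_keys=True).encode("utf-8")
```

Archiving the serialized request next to each response lets an independent verifier replay the same turn with the same settings.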
Zero Trust
LLM as Untrusted Component
Katana treats the LLM as an untrusted component requiring behavioral governance at every output. Verified forensic evidence of integrity — not blind trust in vendor assurances.
Air-Gap
Katana Forge · Fully On-Premises
Katana Forge runs locally via Ollama. Once downloaded, it requires zero network connectivity. No data leaves the network. Fully air-gappable, as most federal environments demand.
Independent Judging
The Auditee Never Grades Itself
The model being audited is never used to evaluate its own responses. A judge LLM from a different model family removes self-evaluation bias and produces independent forensic verdicts.
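The cross-family rule can be enforced mechanically when a judge is chosen. The sketch below is one way to do it; the model names, the family table, and the `pick_judge` helper are hypothetical and stand in for whatever registry a real deployment maintains.

```python
# Hypothetical model-to-family table; a real deployment would maintain its own.
MODEL_FAMILY = {
    "model-a-large": "family-a",
    "model-a-small": "family-a",
    "model-b": "family-b",
    "model-c": "family-c",
}


def pick_judge(auditee: str, candidates: list[str]) -> str:
    """Return the first candidate whose family differs from the auditee's family."""
    auditee_family = MODEL_FAMILY[auditee]
    for candidate in candidates:
        if MODEL_FAMILY[candidate] != auditee_family:
            return candidate
    # Refuse to proceed instead of silently letting a model family grade itself.
    raise ValueError(f"no cross-family judge available for {auditee}")
```

For example, `pick_judge("model-a-large", ["model-a-small", "model-b"])` skips the sibling model and returns `"model-b"`.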
Severity Weighting
Weighted Robustness Index
A severity-weighted audit score (0–100%) reflecting actual risk exposure. Probe severities range from 1 (noise) to 10 (critical jailbreak).
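The exact WRI formula is not given here. One plausible reading, assumed for this sketch, is the share of total severity weight carried by probes the model withstood, expressed as a percentage. The function name and the `(severity, passed)` input shape are illustrative.

```python
def weighted_robustness_index(results: list[tuple[int, bool]]) -> float:
    """Compute a severity-weighted pass rate in the range 0.0 to 100.0.

    results holds one (severity, passed) pair per probe, with severity in 1..10.
    Assumed formula: 100 * sum(severity of passed probes) / sum(all severities).
    """
    if not results:
        raise ValueError("no probe results")
    for severity, _ in results:
        if not 1 <= severity <= 10:
            raise ValueError(f"severity out of range: {severity}")
    total = sum(severity for severity, _ in results)
    held = sum(severity for severity, passed in results if passed)
    return 100.0 * held / total
```

Under this assumption, a model that passes a severity-1 and a severity-5 probe but fails a severity-10 probe scores 100 × 6 / 16 = 37.5, although its unweighted pass rate is 2 of 3. That gap is the point of severity weighting: one critical failure outweighs several minor passes.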
Compliance Framework Alignment

Aligned With. Not Certified By.

Precise language. "Aligned with" means the Katana methodology maps to these frameworks — not that we hold third-party certification. That distinction matters in procurement.

Risk Management
NIST AI RMF 1.0
Evidence package — severity scoring, patch recommendations, chain-of-custody — directly supports GOVERN, MAP, MEASURE, and MANAGE functions of the NIST AI RMF.
Aligned ✓
Vulnerability Classification
OWASP LLM Top 10 (2025)
Probe categories map directly to OWASP LLM Top 10 including LLM01 (Prompt Injection), LLM07 (System Prompt Leakage), and LLM10 (Unbounded Consumption).
Aligned ✓
AI Management Systems
ISO/IEC 42001
Reproducible audit methodology and structured evidence package support ISO/IEC 42001 AI management system requirements — risk assessment, monitoring, and continual improvement documentation.
Aligned ✓
DoD Evaluation Standard
TEVV — Test, Evaluation, Verification & Validation
Fixed seed (42), temperature 0.0, deterministic run configuration. Katana audits are fully reproducible for independent verification. TEVV-aligned by design.
Aligned ✓
EU Regulation
EU AI Act
The forensic evidence package supports the documentation requirements for high-risk AI system conformity assessment and for general-purpose AI model evaluation.
Aligned ✓
Adversarial Testing
NIST AI 100-2 E2025
Probe library reflects current adversarial ML research — CRESCENDO_ESCALATION (USENIX 2025), MANY_SHOT_INJECTION (Anthropic MSJ), INDIRECT_INJECTION (OWASP LLM01).
Aligned ✓
Procurement Language

What to Put in Your SOW.

The terms below describe Katana's capabilities accurately for Statements of Work, contract requirements, and procurement language.

Term | Definition for Procurement Use
Forensic AI Audit | 200+ turn sustained adversarial campaign producing cryptographically sealed, legally defensible evidence package with chain-of-custody documentation.
Deep-Hop Protocol | Multi-turn adversarial pressure campaign where each turn compounds on prior failures, reaching 8–13 hop depth in Deep Compute phase.
Air-Gapped Adversarial Agent | Katana Forge adversarial engine operating via locally-running LLM (Ollama). Zero network egress post-initialization. Fully on-premises.
Weighted Robustness Index (WRI) | Severity-weighted aggregate integrity score (0–100%) reflecting actual risk exposure across 27 probe categories.
FINGERPRINT.DB | Per-turn cryptographic response hash plus semantic signature. Tamper-evident chain-of-custody artifact. Verifiable by any third-party auditor.
Fixed-Seed Reproducibility | Random seed fixed at 42, temperature 0.0. Every run fully reproducible for independent verification; TEVV-aligned, regulatory-grade auditability.
SDVOSB Set-Aside Eligibility | Potestas AI qualifies as a Service-Disabled Veteran-Owned Small Business. Certification pending. Update language to "confirmed" upon certification receipt.

Published Research · Katana Corpus

Named Findings That Changed
How We Evaluate AI.

The Katana audit corpus has produced named, documented, reproducible vulnerability findings across every major frontier model. These are not theoretical — they were confirmed in field evaluations including an authorized US Army deployment.

KATANA-2025-002 · Critical
Chain-of-Thought Fabrication
First confirmed in a US Army authorized evaluation. Frontier models solve reasoning chains internally — then fabricate plausible-looking step traces for the output. Correct answers. Invented reasoning. Invisible to every output-based auditor in the field. Katana's step-chain validator is the only known automated tool that detects this at scale.
KATANA-2025-003 · Critical · Multi-Model
The Liar's Protocol
Confirmed across Grok-4, Gemini, and ChatGPT. Under maximum adversarial pressure, every frontier model tested freezes its self-monitoring signal at integrity_index: 100 for the entire session — including turns where the independent judge scores output zero. Any deployment relying on model self-reporting as a safety signal is operating blind.
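The pattern described above, a self-reported score that never moves while an independent judge disagrees, can be flagged with a simple check. The sketch below is illustrative only: the function name, the equal-length score lists, and the 50-point `gap` threshold are assumptions, not Katana's detector.

```python
def frozen_self_report(self_scores: list[float], judge_scores: list[float],
                       gap: float = 50.0) -> bool:
    """Return True when the self-reported signal is constant yet the judge diverges.

    self_scores: the model's own integrity_index per turn (0..100).
    judge_scores: the independent judge's score for the same turns (0..100).
    gap: minimum self-minus-judge difference counted as divergence (assumed value).
    """
    if not self_scores or len(self_scores) != len(judge_scores):
        raise ValueError("score lists must be non-empty and of equal length")
    constant = len(set(self_scores)) == 1
    diverged = any(s - j >= gap for s, j in zip(self_scores, judge_scores))
    return constant and diverged
```

A session reporting 100 on every turn while the judge scores one of those turns at zero returns `True`; a self-report that varies from turn to turn, or one that stays close to the judge's scores, returns `False`.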
Engagement Options

Start the Conversation.

For DoD, federal agency, and high-assurance commercial inquiries. We respond within 24 hours.

Potestas AI
10 Dorrance Street, Suite 700
Providence, RI 02903
United States

844-LLM-TEST · (844) 556-8378
joseph.cirello@potestasai.com

EIN: 41-2476590 · SAM.gov Registered
NAICS: 541715
SDVOSB: Certification Pending
Secret Clearance: Active (Joseph Cirello)
Engagement Tiers
If Katana surfaces no meaningful vulnerability, the audit is free. Unconditionally. No other forensic auditor makes this commitment.