:: PUBLIC EVIDENCE REPOSITORY

Vulnerability Disclosures

Real-world adversarial test results from our frontier model audits. 70+ disclosures across active engagements. Anonymized and verifiable.

Upcoming Vulnerability Reports

DISCLOSURE #002

Claude (Anthropic)

Multi-session adversarial audit evaluating instruction adherence, refusal boundary consistency, and long-context integrity.

DISCLOSURE #003

Meta Llama

Open-weight model stress testing across quantization tiers, measuring output stability and hallucination rate divergence.

DISCLOSURE #004

GPT-4o (OpenAI)

Comprehensive multimodal audit targeting vision-language alignment failures and tool-use exploitation vectors.

70+ DISCLOSURES
12+ DATASETS PER MODEL
7.5M+ ADVERSARIAL TURNS

DISCLOSURE #001

Grok-4 Audit Results

Published April 2026 · Potestas AI Independent Audit

PUBLISHED
SUMMARY METRICS

Metric                    Value
Integrity Score           68.0%
Critical Failure Rate     19.3%
Token Input Volume        6.1M+ tokens
Logic Failure Rate        38.5%
Temporal Recall Collapse  4 failures
Confident Hallucinations  2 instances

Failure Breakdown

Confident Hallucinations — 2 Instances

Model asserted factually incorrect information with high confidence scores.

Logic Failures — 38.5%

Multi-hop reasoning failures under sustained adversarial pressure.

Temporal Recall Collapse — 4 Failures

The model lost coherent recall of earlier context at 4 distinct points.

Applied Mitigations

Context summarization at 32K boundaries, combined with CRC checksums. Temporal recall failures fell to 0 after GLBM-X™ patching.
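
As a rough illustration of the mitigation described above, the sketch below splits a token stream at a fixed boundary, replaces each segment with a summary, and stores a CRC-32 checksum of the original segment so that later drift or corruption can be detected. The function names, the placeholder summarizer, and the reading of "32K" as a token count are all assumptions made for this example; the source does not describe how summarization or GLBM-X™ patching actually work.

```python
import zlib
from dataclasses import dataclass

# Assumption: "32K boundaries" is read here as 32,000 tokens per segment.
BOUNDARY = 32_000


@dataclass
class Segment:
    index: int
    summary: str
    crc32: int  # checksum of the original, unsummarized segment


def summarize(tokens: list[str]) -> str:
    """Hypothetical placeholder summarizer: keeps the first and last 20 tokens."""
    if len(tokens) <= 40:
        return " ".join(tokens)
    return " ".join(tokens[:20]) + " ... " + " ".join(tokens[-20:])


def checksum(tokens: list[str]) -> int:
    """CRC-32 of the space-joined tokens, encoded as UTF-8."""
    return zlib.crc32(" ".join(tokens).encode("utf-8"))


def compact_context(tokens: list[str], boundary: int = BOUNDARY) -> list[Segment]:
    """Split tokens at each boundary and keep a summary plus checksum per segment."""
    segments = []
    for i, start in enumerate(range(0, len(tokens), boundary)):
        chunk = tokens[start:start + boundary]
        segments.append(Segment(i, summarize(chunk), checksum(chunk)))
    return segments


def verify_segment(segment: Segment, original_tokens: list[str]) -> bool:
    """Return True if the original tokens still match the stored checksum."""
    return segment.crc32 == checksum(original_tokens)
```

A checksum only detects that a stored segment has changed; it does not say whether the summary is faithful to the original, so a real pipeline would need a separate check for that.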

METHODOLOGY

How We Conduct Audits

Every disclosure begins with a structured adversarial session inside the Katana forensic workflow. Sessions run 200+ turns, targeting known and emergent failure modes across reasoning, recall, and instruction adherence.

Results are cross-validated by AI COP monitoring agents and published as machine-readable JSON datasets. All findings are anonymized where required and made available for independent verification by third-party researchers.

01

Katana Auditor initiates structured adversarial sessions (200+ turns)

02

Full forensic logging at each turn — token distribution, integrity score, contradiction detection

03

AI COP agents validate findings and cross-reference against known failure signatures

04

Results anonymized, reviewed, and published as JSON datasets for independent verification
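
To make the per-turn logging and JSON publication steps above more concrete, here is a minimal sketch of what a machine-readable record might look like and how a third party could recompute a simple statistic from it. The field names, the `TurnRecord` type, and the 0.0 to 1.0 score range are hypothetical; the source does not publish the actual dataset schema.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class TurnRecord:
    turn: int
    token_count: int              # simplified stand-in for "token distribution"
    integrity_score: float        # assumed range: 0.0 to 1.0
    contradiction_detected: bool


def to_dataset(session_id: str, records: list[TurnRecord]) -> str:
    """Serialize one session's per-turn records as a JSON document."""
    payload = {
        "session_id": session_id,
        "turns": [asdict(r) for r in records],
    }
    return json.dumps(payload, indent=2, sort_keys=True)


def contradiction_rate(dataset_json: str) -> float:
    """Recompute the share of turns flagged as contradictions from the JSON alone."""
    turns = json.loads(dataset_json)["turns"]
    if not turns:
        return 0.0
    flagged = sum(1 for t in turns if t["contradiction_detected"])
    return flagged / len(turns)


# Example with made-up values, for illustration only.
records = [
    TurnRecord(1, 512, 0.95, False),
    TurnRecord(2, 640, 0.90, False),
    TurnRecord(3, 700, 0.60, True),
    TurnRecord(4, 580, 0.85, False),
]
dataset = to_dataset("example-session", records)
print(contradiction_rate(dataset))
```

Because the rate is derived only from the published JSON, an independent researcher could rerun the same calculation without access to the original session.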

Contribute to the Repository

Submit anonymized scenarios or request a custom adversarial audit of your own system.