Gauntlet — Cortex Harness Benchmark

A production-oriented benchmark of coding-agent harnesses (Codex, Claude Code, OpenCode) and of the value that Cortex + Synapse add. It has three tracks: security and jailbreak resistance, code quality and output security, and long-horizon generative capability. Each track reports the harness delta, backed by real static analysis (Bandit/Semgrep) and dynamic analysis (pytest). Results are public; no login is required.
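As a rough illustration of what a per-track "harness delta" could look like, the sketch below assumes the delta is the change in pass rate, in percentage points, between a harness run on its own and the same harness run with Cortex + Synapse. That definition is an assumption, not something stated here, and `TrackResult` and `harness_delta` are hypothetical names rather than part of Gauntlet. The numbers are placeholders, not benchmark results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrackResult:
    """Aggregate outcome for one track under one configuration (hypothetical shape)."""
    passed: int  # tasks whose static and dynamic checks all succeeded
    total: int   # tasks attempted

    @property
    def pass_rate(self) -> float:
        # Guard against an empty track so the delta stays well defined.
        return self.passed / self.total if self.total else 0.0


def harness_delta(baseline: TrackResult, augmented: TrackResult) -> float:
    """Assumed definition: pass-rate change in percentage points, rounded to 2 places."""
    return round(100.0 * (augmented.pass_rate - baseline.pass_rate), 2)


# Placeholder counts for one track; not real Gauntlet results.
baseline = TrackResult(passed=12, total=20)   # harness alone
augmented = TrackResult(passed=15, total=20)  # harness with Cortex + Synapse
print(f"harness delta: {harness_delta(baseline, augmented):+.1f} pp")
```

A positive value would mean the augmented configuration passed more tasks than the baseline; a real report would presumably break this down per harness and per track.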

Security & Safety

Code Quality & Output Security

Generative Capability