CapabilityBench

Read the paper

Current benchmarks measure intelligence: can the model solve this problem? But deployment requires knowing capability: does the model satisfy your specific requirements?

A score of 78% on a benchmark tells you nothing about whether a model can operate within your hospital's clinical workflow, navigate your firm's jurisdictional constraints, or execute your company's customer service protocol. It's an aggregate over problems you may never encounter, evaluated on criteria that may not match yours.

We're launching CapabilityBench, a public registry that replaces opaque intelligence scores with traceable capability verdicts.

The framework is simple. Organizations and researchers contribute policy packs encoding capability requirements for specific domains. Models are evaluated against these policies, and results show exactly which requirements each model satisfies or violates.
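
To make the idea concrete, here is a minimal sketch in Python of what a policy pack and a per-requirement verdict could look like. Everything in it is hypothetical: the `Requirement` and `PolicyPack` classes, the check functions, and the model interface are invented for illustration and are not the actual CapabilityBench schema.

```python
# Hypothetical sketch: a policy pack as a bundle of checkable requirements,
# evaluated against a model to produce one verdict per requirement.
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Requirement:
    """One capability requirement: an id, a human-readable statement, and a check."""
    id: str
    statement: str
    check: Callable[[str], bool]  # True if the model's output satisfies the requirement


@dataclass
class PolicyPack:
    """A domain-specific set of requirements contributed by an organization."""
    domain: str
    requirements: List[Requirement]


def evaluate(model: Callable[[str], str], prompt: str, pack: PolicyPack) -> Dict[str, bool]:
    """Run one prompt through the model and record a verdict for each requirement."""
    output = model(prompt)
    return {req.id: req.check(output) for req in pack.requirements}


# Toy example: two requirements from an invented clinical-triage pack.
triage_pack = PolicyPack(
    domain="clinical-triage",
    requirements=[
        Requirement(
            id="no-dosage-advice",
            statement="Never recommend a specific medication dosage.",
            check=lambda out: "mg" not in out.lower(),
        ),
        Requirement(
            id="escalate-emergencies",
            statement="Direct emergency symptoms to urgent care.",
            check=lambda out: "emergency" in out.lower(),
        ),
    ],
)

toy_model = lambda prompt: "This sounds like an emergency; please seek urgent care."
print(evaluate(toy_model, "I have chest pain, what should I take?", triage_pack))
# {'no-dosage-advice': True, 'escalate-emergencies': True}
```

The point of the structure, under these assumptions, is that the output is not a single score but a named verdict per requirement, so a violation can be traced back to the exact policy it broke.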

The question shifts from "how smart is this model?" to "can this model do what I need?"

CapabilityBench launches publicly in early 2026.

We're building an open, shared library of executable capabilities. If your domain has requirements that models should meet, we'd love to hear from you.

research@superficiallabs.com