Technology

The architecture behind the claim

Three things that distinguish Skippy from every other AI system in this space — and why having one without the others gets you something weaker than what already ships.

Verifier-gated output

Hard contract, not a filter

Every response passes through a verifier before it reaches the user. If the output cannot be grounded in evidence, it is rejected: not hedged, not softened, rejected. The verifier adds less than 5ms of latency and writes a hash-signed, versioned audit record on every call.

vs. the alternative

Most AI systems compute a post-hoc confidence score or append a disclaimer. Skippy's verifier sits upstream of the response, not downstream. That is the difference between a safety gate and a warning label.

Calibrated confidence

Independently validated, not self-reported

Every finding in Skippy carries a confidence score validated against held-out data, not generated by the model itself. Skippy distinguishes convergent evidence, contested findings, and sparse data, and surfaces that distinction explicitly in every output.
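To make the three-way distinction concrete, here is a minimal sketch that labels a finding from the effect directions its sources report. It assumes each source reports one direction (for example "increase", "decrease", or "null"); `evidence_pattern`, `MIN_SOURCES`, and `AGREEMENT` are hypothetical names with illustration values, not Skippy's published rules.

```python
from collections import Counter

# Hypothetical thresholds, for illustration only.
MIN_SOURCES = 3   # fewer independent sources than this is "sparse"
AGREEMENT = 0.8   # share of sources that must agree for "convergent"


def evidence_pattern(directions: list[str]) -> str:
    """Label a finding from per-source effect directions."""
    # Too little evidence to call either agreement or disagreement.
    if len(directions) < MIN_SOURCES:
        return "sparse"
    # Size of the largest group of sources reporting the same direction.
    _, top_count = Counter(directions).most_common(1)[0]
    if top_count / len(directions) >= AGREEMENT:
        return "convergent"
    return "contested"


print(evidence_pattern(["increase", "increase", "increase", "increase"]))  # convergent
print(evidence_pattern(["increase", "decrease", "increase", "decrease"]))  # contested
print(evidence_pattern(["increase"]))                                      # sparse
```

Sparsity is checked first, so a single strong study can never be reported as convergent.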

vs. the alternative

LLM confidence is self-reported: the model says it's confident because that's what the training signal rewarded. Skippy's confidence is a measurement, not a posture.
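One standard way to turn "a measurement, not a posture" into a number is expected calibration error (ECE). Held-out predictions are binned by confidence, and each bin's average confidence is compared with its observed accuracy. The sketch below assumes binary outcomes and ten equal-width bins. It illustrates the metric, not Skippy's internal evaluation harness.

```python
def expected_calibration_error(confidences, outcomes, n_bins=10):
    """ECE over equal-width confidence bins; outcomes are 1 (correct) or 0."""
    if not confidences or len(confidences) != len(outcomes):
        raise ValueError("need non-empty inputs of equal length")
    bins = [[] for _ in range(n_bins)]
    for conf, outcome in zip(confidences, outcomes):
        # A confidence of exactly 1.0 falls into the top bin.
        index = min(int(conf * n_bins), n_bins - 1)
        bins[index].append((conf, outcome))
    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(o for _, o in members) / len(members)
        # Each bin contributes its confidence/accuracy gap, weighted by its share of samples.
        ece += len(members) / total * abs(accuracy - mean_conf)
    return ece


held_out_conf = [0.9] * 10
held_out_correct = [1] * 9 + [0]
print(round(expected_calibration_error(held_out_conf, held_out_correct), 3))  # 0.0
```

A scorer that says 0.9 and is right 90% of the time scores near 0. One that says 0.9 but is right a quarter of the time scores about 0.65.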

Auditable to the source

Evidence citation, not URL citation

The citation target in Skippy is a verified, versioned finding — not a URL, not a document, not a paragraph chunk. Each finding carries its source lineage, confidence tier, and the evidence that produced it. Every downstream product inherits this audit trail by construction.

vs. the alternative

Frontier systems cite documents. Stanford SourceCheckup (Nature Communications, Apr 2025) found that 50–90% of medical AI responses are not fully supported by their own cited sources. Citing a verified finding is a different claim from citing a document that might support it.

Request Lifecycle

What happens when you call the API

Every API call follows the same deterministic path. No request bypasses the verifier. No response is issued without an audit record.

Stage	What happens	Latency
Client request	POST /v1/ground/verify · API key scoped	< 1ms
Auth + rate limit	Token validated, quota checked	< 1ms
Verifier	11 violation codes evaluated per output span	< 5ms
Knowledge lookup	Finding retrieved, confidence scored, lineage resolved	3–6ms
Audit record	Hash-signed, appended to Merkle tree	< 1ms
Response envelope	Verdict + confidence + lineage + audit_id	~12ms total

If the verifier raises a critical- or error-severity violation, the request is blocked at that stage. No response is issued, the error is returned to the caller, and the failure is logged with the specific violation code and the evidence that triggered it.
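The early-exit rule can be sketched as an ordered list of stages that stops at the first blocking violation. This sketch assumes each stage is a callable returning `(code, severity)` pairs. `run_pipeline` and `Outcome` are hypothetical names, and the toy stages stand in for real auth, verification, and lookup work.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

Violation = Tuple[str, str]  # (code, severity), e.g. ("V002", "error")
Stage = Tuple[str, Callable[[dict], List[Violation]]]

BLOCKING = {"critical", "error"}


@dataclass
class Outcome:
    blocked: bool = False
    blocked_at: Optional[str] = None
    violations: List[Violation] = field(default_factory=list)
    timings_ms: dict = field(default_factory=dict)


def run_pipeline(request: dict, stages: List[Stage]) -> Outcome:
    """Run stages in order and stop at the first blocking violation."""
    outcome = Outcome()
    for name, stage in stages:
        start = time.perf_counter()
        found = stage(request)
        outcome.timings_ms[name] = (time.perf_counter() - start) * 1000
        outcome.violations.extend(found)
        if any(severity in BLOCKING for _, severity in found):
            # No response is produced; the caller receives the violations instead.
            outcome.blocked = True
            outcome.blocked_at = name
            break
    return outcome


stages = [
    ("auth", lambda req: []),
    ("verifier", lambda req: [("V002", "error")] if req.get("uncited") else []),
    ("lookup", lambda req: []),
]
print(run_pipeline({"uncited": True}, stages).blocked_at)  # verifier
```

Because blocking is decided per stage, later stages (and their latency) are skipped entirely for a blocked request.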

Knowledge boundaries

What happens when evidence is absent, contested, or uncertain

Most AI systems express uncertainty as hedging language. Skippy expresses it as structured output — the same contract every time.

Scenario	General AI	Skippy
Evidence is absent	Hallucination or "I'm not sure" with no precision	Returns NOT_COVERED — explicit knowledge boundary, no confabulation
Evidence is contested	Model picks a side or averages conflicting signals	Returns both positions with quantified confidence divergence and source lineage for each side
Confidence is low	"May", "might", "could" — hedges that vary by phrasing	Returns evidence_pattern: "sparse" with calibrated confidence score below threshold
Any call is made	Log entry, if any	Immutable audit record: cryptographic hash, sources used, confidence at time of call
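A minimal sketch of that contract, assuming findings arrive as dicts with `claim`, `confidence`, `pattern`, and `sources` keys. `NOT_COVERED` and `evidence_pattern` come from the table. The other status names, the function name, and the 0.6 floor are hypothetical.

```python
CONFIDENCE_FLOOR = 0.6  # hypothetical minimum for this response context


def boundary_response(findings: list[dict]) -> dict:
    """Return a structured result for absent, contested, low-confidence, or supported evidence."""
    if not findings:
        # Explicit knowledge boundary instead of generated text.
        return {"status": "NOT_COVERED"}

    contested = [f for f in findings if f["pattern"] == "contested"]
    if len(contested) >= 2:
        # Report every position with its own confidence and sources.
        confidences = [f["confidence"] for f in contested]
        return {
            "status": "CONTESTED",
            "positions": [
                {"claim": f["claim"], "confidence": f["confidence"], "sources": f["sources"]}
                for f in contested
            ],
            "confidence_divergence": round(max(confidences) - min(confidences), 2),
        }

    best = max(findings, key=lambda f: f["confidence"])
    if best["confidence"] < CONFIDENCE_FLOOR:
        # Low confidence is reported as a sparse evidence pattern, not as hedged prose.
        return {"status": "LOW_CONFIDENCE", "evidence_pattern": "sparse", "confidence": best["confidence"]}

    return {
        "status": "SUPPORTED",
        "evidence_pattern": best["pattern"],
        "confidence": best["confidence"],
        "sources": best["sources"],
    }
```

Every branch returns the same kind of object, so callers branch on `status` rather than parsing hedging language.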

Citation target

What Skippy cites vs. what everyone else cites

The citation target is a verified, versioned finding — not a document, not a paragraph, not a URL. Every finding carries its confidence score, source lineage, and the evidence that produced it.

This distinction matters for regulatory submissions, clinical audit trails, and any workflow where "the AI cited a source" is not sufficient — you need to know what the source actually supports.

// Frontier AI
cite → paper URL
cite → document chunk

// Skippy
cite → VerifiedFinding {
  confidence: 0.94,
  pattern: "convergent",
  sources: [n sources],
  lineage: full chain
}
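The `VerifiedFinding` target in the snippet can be written as an immutable Python dataclass that refuses to construct an unauditable finding. Field names mirror the snippet, plus a source ID and version date per source. `SourceRef`, the validation rules, and representing lineage as an ordered tuple of steps are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Tuple

PATTERNS = {"convergent", "contested", "sparse"}


@dataclass(frozen=True)
class SourceRef:
    source_id: str
    version_date: str  # ISO 8601 date of the source version used


@dataclass(frozen=True)
class VerifiedFinding:
    finding_id: str
    confidence: float
    pattern: str
    sources: Tuple[SourceRef, ...]
    lineage: Tuple[str, ...]  # ordered steps from raw source to this finding

    def __post_init__(self):
        # Refuse to construct a finding that could not be audited.
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be in [0, 1]")
        if self.pattern not in PATTERNS:
            raise ValueError(f"unknown evidence pattern: {self.pattern}")
        if not self.sources:
            raise ValueError("a verified finding needs at least one source")


finding = VerifiedFinding(
    finding_id="F-0001",
    confidence=0.94,
    pattern="convergent",
    sources=(SourceRef("S-1", "2025-01-15"), SourceRef("S-2", "2024-11-02")),
    lineage=("ingest:S-1", "ingest:S-2", "synthesis", "verify"),
)
```

`frozen=True` matters here: a citation target that can be mutated after verification is no longer the thing that was verified.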

Verifier Behavior

What the verifier actually checks

"Verifier-gated" is not a single binary test. The verifier evaluates every span in the output against 11 specific failure modes, each with a severity level. Critical and error violations block the response; warnings are written to the audit record.

Code	Severity	Trigger
V001	error	Unsupported claim — cited evidence does not entail the output span
V002	error	Uncited claim — output span makes a claim with no citation
V003	warning	Irrelevant citation — evidence is technically accurate but irrelevant to the question
V004	warning	Low-utility response — accurate and supported but does not address the user's decision
V005	critical	Finding not found — cited finding ID does not resolve
V006	critical	Retracted source — cited primary source carries a retraction notice
V007	warning	Outdated lineage — source evidence older than domain freshness threshold
V008	info	Contested evidence — cited finding's evidence pattern is contested; flagged for user awareness
V009	warning	Below-threshold confidence — confidence score below minimum for the response context
V010	critical	PII in response — output contains apparent patient-level identifiers
V011	critical	Safety gate bypassed — a required safety check was not applied

Severity	Behavior
critical	Request blocked. Error returned to caller. Violation code and span logged.
error	Request blocked. Structured error response with code and evidence span.
warning	Request proceeds. Flag written to audit record. Not surfaced in end-user output.
info	Request proceeds. Observation logged only. No flag in output or audit record.
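The severity table and its handling rules reduce to a lookup plus one decision function. The code-to-severity mapping below is copied from the table. `apply_policy` and the shape of its return value are hypothetical.

```python
# Code-to-severity mapping, copied from the violation table.
SEVERITY = {
    "V001": "error", "V002": "error",
    "V003": "warning", "V004": "warning",
    "V005": "critical", "V006": "critical",
    "V007": "warning", "V008": "info", "V009": "warning",
    "V010": "critical", "V011": "critical",
}


def apply_policy(codes: list[str]) -> dict:
    """Decide what happens to a response given the violation codes raised for it."""
    blocking = [c for c in codes if SEVERITY[c] in ("critical", "error")]
    return {
        # Any critical or error violation blocks the request.
        "blocked": bool(blocking),
        "blocking_codes": blocking,
        # Warnings let the request proceed but are written to the audit record.
        "audit_flags": [c for c in codes if SEVERITY[c] == "warning"],
        # Info observations are logged only.
        "info_log": [c for c in codes if SEVERITY[c] == "info"],
    }


print(apply_policy(["V007", "V008"]))
```

Keeping the mapping as data means a severity change is a one-line edit that the decision logic never has to know about.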

Open source

The verifier integration harness, audit-log JSON Schema, and violation taxonomy are available now as skippy-verifier: Apache 2.0 licensed, with pluggable NLI backends.

pip install skippy-verifier


Competitive Landscape

What every product actually cites

The citation target is the difference. Citing a document is not the same as citing a verified finding. One passes a retrieval test. The other passes a regulatory audit.

Product	What it cites	Evidence verified?	Lineage exposed?
GPT-4o / OpenAI	Document chunks returned by search	No	None
Gemini / Google	Web pages, document snippets	No	None
Claude / Anthropic	Retrieved documents, URLs	No	None
Perplexity	Web search results, page URLs	No	None
OpenEvidence	Indexed medical articles	Retrieval only — no cross-validation	None
Microsoft GraphRAG	Graph-extracted text chunks	No	None
Atropos Alexandria	Real-world evidence records	No — no calibration published	None
Skippy	Verified findings — not documents	Yes — calibrated against held-out data	Full: source ID, version date, confidence

Assessed against public documentation as of May 2026.

Data Handling

Zero-retention. Audit-log only.

Skippy does not retain request content. Every audit record contains only metadata — no patient data, no query text, no response payload — hash-signed and stored for traceability.
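A metadata-only record can still prove which payload it describes without storing that payload. The sketch below assumes a server-held secret key and uses HMAC-SHA256 from the Python standard library. Field names follow the audit-log fields this section lists (hash, timestamp, violation codes, finding IDs, confidence tier). `audit_record`, `verify_record`, and `payload_mac` are hypothetical names.

```python
import hashlib
import hmac
import json
from datetime import datetime, timezone


def _mac(key: bytes, data: bytes) -> str:
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def audit_record(key, query_text, response_text, violation_codes, finding_ids, confidence_tier):
    """Build an audit record that holds metadata only, never the query or response text."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        # Keyed digest identifies the payload without storing it.
        "payload_mac": _mac(key, (query_text + "\x00" + response_text).encode("utf-8")),
        "violation_codes": sorted(violation_codes),
        "finding_ids": sorted(finding_ids),
        "confidence_tier": confidence_tier,
    }
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")
    record["record_mac"] = _mac(key, canonical)
    return record


def verify_record(key, record):
    """Recompute the record MAC; any edited field makes this return False."""
    body = {k: v for k, v in record.items() if k != "record_mac"}
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.compare_digest(record["record_mac"], _mac(key, canonical))
```

A keyed MAC rather than a bare SHA-256 matters for short clinical queries. An unkeyed hash of a short text can be reversed by hashing candidate guesses until one matches.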

Area	Policy	Details
PHI retention	Zero	Request content is not stored after the response is issued. No training on customer data.
Audit log	Metadata only	Hash, timestamp, violation codes, finding IDs, confidence tier; no payload content.
Encryption	TLS 1.3 / AES-256	In transit and at rest. Audit logs are hash-signed and append-only.
Compliance	HIPAA · GDPR · SOC 2	BAA available. FedRAMP authorization in progress. Contact us for documentation.
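The request lifecycle appends each audit record to a Merkle tree. A minimal sketch of computing a root over audit-record digests is below. It uses the RFC 6962 (Certificate Transparency) construction as a stand-in, with distinct prefixes for leaf and interior hashes. Skippy's actual tree layout is not documented here.

```python
import hashlib


def merkle_root(leaves: list[bytes]) -> bytes:
    """Merkle tree hash in the style of RFC 6962, section 2.1."""
    n = len(leaves)
    if n == 0:
        return hashlib.sha256(b"").digest()
    if n == 1:
        # 0x00 prefix marks a leaf hash.
        return hashlib.sha256(b"\x00" + leaves[0]).digest()
    # Split at the largest power of two strictly smaller than n.
    k = 1
    while k * 2 < n:
        k *= 2
    # 0x01 prefix marks an interior node, so leaf and node hashes cannot collide.
    return hashlib.sha256(b"\x01" + merkle_root(leaves[:k]) + merkle_root(leaves[k:])).digest()


records = [b"audit-record-1", b"audit-record-2", b"audit-record-3"]
print(merkle_root(records).hex())
```

Publishing the root periodically lets an auditor detect any later edit to an earlier record, because changing one leaf changes the root.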

Integration Patterns

Three ways to integrate

Skippy is designed to fit existing workflows — not replace them. Choose the pattern that matches your deployment context.

Synchronous REST

Product teams, API-first builds

POST to /v1/ground/verify inline with your existing response pipeline. Verdict + audit_id returned in the same call. Adds ~12ms.

POST /v1/ground/verify
Authorization: Bearer <key>
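A minimal sketch of building that call with the Python standard library. The host is a placeholder. The JSON body (a `spans` array of `text`/`finding_id` objects) is an assumption, since the synchronous request schema isn't spelled out here. Only the path and the Bearer header come from the snippet.

```python
import json
import urllib.request

BASE_URL = "https://api.example.com"  # placeholder host, not Skippy's real endpoint


def build_verify_request(api_key: str, spans: list[dict]) -> urllib.request.Request:
    """Build (but do not send) a synchronous verification call."""
    # Hypothetical body shape: the request schema is not documented here.
    body = json.dumps({"spans": spans}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/v1/ground/verify",
        data=body,
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method="POST",
    )


req = build_verify_request("sk-test", [{"text": "Example output span.", "finding_id": "F-0001"}])
print(req.get_method(), req.full_url)  # POST https://api.example.com/v1/ground/verify
```

Sending it is one call to `urllib.request.urlopen(req, timeout=...)`; the verdict and `audit_id` come back in the same response.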

Batch processing

Compliance, document review, QA

Submit arrays of output spans for batch verification. Results returned as a structured report with per-span verdict, confidence, and lineage. No latency constraint.

POST /v1/ground/batch
{ spans: [...], context: ... }
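For large review jobs, a client might split spans into fixed-size batches and tally the per-span verdicts from each returned report. The batch size, the `results`/`verdict` report shape, and both function names are hypothetical. Only the `spans`/`context` payload keys come from the snippet.

```python
from collections import Counter
from typing import Iterator

MAX_SPANS_PER_BATCH = 500  # hypothetical per-request limit


def batches(spans: list[dict], context: dict, size: int = MAX_SPANS_PER_BATCH) -> Iterator[dict]:
    """Yield request bodies of at most `size` spans, each carrying the shared context."""
    for start in range(0, len(spans), size):
        yield {"spans": spans[start:start + size], "context": context}


def summarize(reports: list[dict]) -> Counter:
    """Count per-span verdicts across all returned batch reports."""
    return Counter(item["verdict"] for report in reports for item in report["results"])


payloads = list(batches([{"text": f"span {i}"} for i in range(1201)], {"domain": "oncology"}))
print([len(p["spans"]) for p in payloads])  # [500, 500, 201]
```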

CDS Hooks (EHR-native)

Clinical decision support, EHR integrations

Implements the CDS Hooks 2.0 spec. Drop into existing EHR hook endpoints without custom middleware. FHIR-compatible response format.

hook: patient-view
fhirServer: <endpoint>
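A CDS Hooks service answers with a JSON object containing `cards`. Each card carries a required `summary`, an `indicator` (`info`, `warning`, or `critical`), and a `source`. A minimal sketch of mapping a verification verdict onto one card follows. The verdict's shape, the severity-to-indicator mapping, and the "Skippy" source label are assumptions.

```python
# Hypothetical mapping from the highest violation severity to a CDS Hooks indicator.
INDICATOR = {"critical": "critical", "error": "critical", "warning": "warning", "info": "info"}


def to_cds_response(verdict: dict) -> dict:
    """Wrap a verification verdict as a CDS Hooks response with a single card."""
    card = {
        # The spec asks for a summary under 140 characters.
        "summary": verdict["summary"][:139],
        "indicator": INDICATOR[verdict.get("max_severity", "info")],
        "source": {"label": "Skippy"},
        "detail": f"audit_id: {verdict['audit_id']}, confidence: {verdict['confidence']:.2f}",
    }
    return {"cards": [card]}
```

Putting `audit_id` in the card detail keeps the EHR-side alert traceable to the same audit record the API wrote.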


Performance

Verification at production latency

The verifier adds less than 5ms. The full response — evidence retrieval, confidence scoring, source lineage, and audit record — averages under 12ms end-to-end.

Metric	Value
Avg response time	11.6ms
P95 latency	11ms
Verifier overhead	<5ms
Max throughput	4,500/s
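Average and P95 are computed from the same latency samples. The sketch below uses the nearest-rank percentile definition with integer arithmetic so the rank is exact. Skippy's benchmark methodology may use a different percentile estimator.

```python
def latency_summary(samples_ms: list[float]) -> dict:
    """Mean and nearest-rank P95 of a list of latency samples."""
    if not samples_ms:
        raise ValueError("no samples")
    ordered = sorted(samples_ms)
    n = len(ordered)
    # Nearest-rank: the smallest value with at least 95% of samples at or below it.
    rank = (95 * n + 99) // 100  # ceil(0.95 * n) in exact integer arithmetic
    return {"avg_ms": sum(ordered) / n, "p95_ms": ordered[rank - 1]}


print(latency_summary([float(x) for x in range(1, 101)]))  # {'avg_ms': 50.5, 'p95_ms': 95.0}
```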

Knowledge Architecture

Structured ontologies + empirical evidence

Skippy's medical knowledge is grounded in two distinct layers that reinforce each other. Formal ontology relationships, drawn from OBO Foundry ontologies, define what concepts mean and how they relate. Empirical evidence from literature and clinical data defines what the research shows. A query can walk both layers simultaneously, producing answers that neither layer could produce alone.
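A toy illustration of walking both layers: climb is-a edges in the ontology, then collect empirical findings attached to any concept on the path. The concept names, edges, and finding IDs are invented. Whether a finding about a parent concept applies to a child is a modeling decision a real system would have to gate explicitly.

```python
from collections import deque

# Toy data invented for illustration; not real ontology IDs or findings.
IS_A = {
    "type_2_diabetes": ["diabetes_mellitus"],
    "diabetes_mellitus": ["metabolic_disease"],
    "metabolic_disease": [],
}
FINDINGS = {
    "diabetes_mellitus": ["F-101"],
    "metabolic_disease": ["F-202"],
}


def ancestors(term: str) -> list[str]:
    """Breadth-first walk up is-a edges (ontology layer), starting with the term itself."""
    order, seen, queue = [], {term}, deque([term])
    while queue:
        current = queue.popleft()
        order.append(current)
        for parent in IS_A.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return order


def evidence_for(term: str) -> list[tuple[str, str]]:
    """Empirical findings (evidence layer) attached to the term or any ancestor."""
    return [(concept, f) for concept in ancestors(term) for f in FINDINGS.get(concept, [])]


print(evidence_for("type_2_diabetes"))  # [('diabetes_mellitus', 'F-101'), ('metabolic_disease', 'F-202')]
```

Neither dictionary alone answers the query: the ontology has no findings, and the evidence layer has nothing filed under `type_2_diabetes`.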

Symbolic AI projects like Cyc attempted this kind of rigorous structured knowledge base, but without the empirical grounding that makes one useful in practice. Skippy has both.

Gene Ontology

~50K terms

Formal biological process, molecular function, and cellular component relationships. Over 250K structured edges used in genomics and drug-mechanism queries.

Disease Ontology

~12K concepts

Standardized disease classification with is-a and related-to hierarchies. Aligns with DisGeNET and PrimeKG upstream sources already in the evidence layer.

Human Phenotype Ontology

~16K terms

Phenotype-to-disease mappings with formal axioms. Enables rare-disease reasoning and Orphanet-linked clinical workflows.

500K+

Structured ontology relationships

All three OBO Foundry ontologies (CC-licensed, public) contribute over 500,000 structured concept relationships — formal axioms that constrain reasoning the way curated knowledge bases were always supposed to, but seldom did at this scale.

How knowledge grows

Two epistemic regimes. One growing evidence base.

Most knowledge systems have one way findings enter the evidence base: ingest more data. Skippy has two. Every finding carries an epistemology property that records how it was produced — and the evidence base can be queried across both regimes simultaneously.

A claim supported by both a clinical trial result and a first-principles derivation from mechanism is a stronger claim than either alone. Skippy surfaces that distinction explicitly.

Empirical

epistemology: "empirical"

Produced by the 24-agent ingestion pipeline from 169.4M ingested source documents. Grounded in observed data — clinical trials, adverse event reports, genomic studies, systematic reviews. Confidence is calibrated against held-out data (ECE < 0.10).

Derivational

epistemology: "derivational"

Produced by the growth-era pipeline from axioms and formal definitions — first-principles reasoning that doesn't require a study to exist. 608 knowledge phases are ingest-ready today across mathematics, logic, pharmacology foundations, and biomedical ontologies.

Dual-grounded

epistemology: "both"

When a finding is supported by both empirical evidence and a derivational reasoning chain, it is marked "both" — the strongest epistemic standing in the evidence base. Queries can filter to dual-grounded claims only, returning only what evidence and reason agree on.
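The three labels follow from which kinds of support a finding has, and filtering to dual-grounded claims is then a simple predicate. A minimal sketch, assuming each finding records whether it has empirical and derivational support. The function names are hypothetical.

```python
def epistemology(has_empirical: bool, has_derivation: bool) -> str:
    """Label a finding by the kinds of support it has."""
    if has_empirical and has_derivation:
        return "both"
    if has_empirical:
        return "empirical"
    if has_derivation:
        return "derivational"
    raise ValueError("a finding needs empirical or derivational support")


def dual_grounded(findings: list[dict]) -> list[dict]:
    """Keep only findings where evidence and reasoning agree."""
    return [f for f in findings if f["epistemology"] == "both"]


corpus = [
    {"id": "F-1", "epistemology": epistemology(True, False)},
    {"id": "F-2", "epistemology": epistemology(True, True)},
    {"id": "F-3", "epistemology": epistemology(False, True)},
]
print([f["id"] for f in dual_grounded(corpus)])  # ['F-2']
```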

Autonomous knowledge loop

Step	What happens
Query gap detected	Conductor identifies a question Skippy couldn't fully answer
Goal commissioned	New knowledge acquisition goal created and prioritized
Research executed	Sources retrieved, synthesized, and schema-validated before write
Finding written	New empirical or derivational finding added to the evidence base with full lineage
Gate re-evaluated	PCCP gates run on next release; calibration must still pass

The conductor runs as a background process — no human in the loop for knowledge acquisition. Every write is schema-validated. Every new finding runs through the same PCCP gates as the existing corpus before deployment.
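A minimal sketch of two pieces of that loop: a priority queue of commissioned goals, and a write path that validates against a schema before anything is stored. The schema fields, priorities, and names (`GoalQueue`, `write_finding`) are assumptions for illustration. The real conductor's scheduling, schema, and release gates are not documented here.

```python
import heapq
from itertools import count

# Hypothetical schema for a new finding: field name -> required type.
SCHEMA = {"finding_id": str, "epistemology": str, "confidence": float, "lineage": list}
EPISTEMOLOGIES = {"empirical", "derivational", "both"}


def validate(finding: dict) -> None:
    """Raise ValueError if the finding does not match the schema."""
    for name, expected in SCHEMA.items():
        if not isinstance(finding.get(name), expected):
            raise ValueError(f"schema violation: {name}")
    if finding["epistemology"] not in EPISTEMOLOGIES:
        raise ValueError("schema violation: epistemology")


class GoalQueue:
    """Knowledge-acquisition goals, highest priority first (ties in arrival order)."""

    def __init__(self):
        self._heap = []
        self._order = count()

    def commission(self, gap: str, priority: int) -> None:
        # heapq is a min-heap, so priority is negated.
        heapq.heappush(self._heap, (-priority, next(self._order), gap))

    def next_goal(self) -> str:
        return heapq.heappop(self._heap)[2]


def write_finding(store: list, finding: dict) -> None:
    """Validate before the write, so nothing invalid ever lands in the store."""
    validate(finding)
    store.append(finding)
```

Validating before the append, rather than cleaning up afterward, is what keeps an unsupervised writer from ever exposing a malformed finding to queries.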

Metric	Value
Source documents (L0)	169.4M
Empirical findings integrated	~8M
Derivational phases ingest-ready	608
Total growth-era phases planned	1,174

Want to go deeper?

We can walk through the architecture, the verifier integration, and what it means for your specific use case.
