Failure Taxonomy

Categorizing signals to ensure the right response.

01. Why Failures Are Categorized

When a test fails to meet the engine's requirements, simply reporting it as "failed" provides insufficient context. Without understanding why the system withdrew its trust, any attempt to resolve the issue is a guess.

Failures are categorized to show exactly which trust boundary the test's proof could not cross. Identifying the specific category of rejection helps you prioritize your response and prevents the common mistake of fixing a symptom while leaving the underlying evidentiary gap unresolved.

“A failure without context invites the wrong action.”

02. Structural Failures

A structural failure indicates that the engine cannot interpret the shape of the test. This occurs when a test is unparsable or lacks an executable structure that the engine recognizes.

These failures are total blockers. Because the engine cannot map the test's intent to its internal logic, no behavioral analysis can proceed. Restoring structural integrity is the first priority.

03. Safety Failures

The engine prioritizes the protection of the execution environment. A safety failure occurs when a test compromises isolation, exhibits uncontrolled side effects, or engages in unsafe background behavior that could contaminate subsequent analysis.

Analysis must stop when the environment is compromised. The engine treats these failures as a rejection of the test's place in the suite, because a test that contaminates the environment puts the integrity of the entire knowledge base at risk.
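The engine's safety checks are not specified here. The Python sketch below shows two plausible isolation checks, under assumptions of our own: a test fails if it leaves a background thread running or if it mutates the process environment. The function `run_isolated` is hypothetical. It restores the environment regardless of outcome, so a misbehaving test cannot contaminate whatever runs next.

```python
import os
import threading
from typing import Callable, List


def run_isolated(test: Callable[[], None]) -> List[str]:
    """Run `test` and return isolation violations (empty list if none).

    Hypothetical checks: new background threads still alive after the test,
    and any change to the process environment.
    """
    threads_before = set(threading.enumerate())  # enumerate() lists live threads only
    env_before = dict(os.environ)
    try:
        test()
    finally:
        env_after = dict(os.environ)
        # Restore the environment so later analysis starts from a clean state.
        os.environ.clear()
        os.environ.update(env_before)

    violations = []
    leaked = [t for t in threading.enumerate() if t not in threads_before]
    if leaked:
        names = sorted(t.name for t in leaked)
        violations.append(f"leaked background threads: {names}")
    if env_after != env_before:
        violations.append("mutated process environment")
    return violations
```

A real harness would likely check more channels (files, sockets, global state). Threads and environment variables are used here only because the standard library can observe them directly.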

04. Observability Failures

An observability failure means the test ran to completion, but the engine detected no external or persistent evidence of behavior. It found no proof that the test's act changed the system.

This typically indicates that the assertions measure transient, in-memory artifacts rather than persistent, externally observable state.
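To make the distinction concrete, here is a minimal Python sketch using the standard `sqlite3` module. The system under test (`create_user`) and both assertion functions are hypothetical. The first assertion inspects only the returned in-memory record. The second re-reads the database through a fresh connection, which is the kind of persistent evidence described above.

```python
import os
import sqlite3
import tempfile
from contextlib import closing


def create_user(db_path: str, name: str) -> dict:
    """Hypothetical system under test: store a user, return an in-memory record."""
    with closing(sqlite3.connect(db_path)) as conn:
        with conn:  # commits the transaction if the block succeeds
            conn.execute("CREATE TABLE IF NOT EXISTS users (name TEXT)")
            conn.execute("INSERT INTO users (name) VALUES (?)", (name,))
    return {"name": name}


def weak_assertion(db_path: str) -> None:
    # Observes only the returned dict. This would still pass if the INSERT
    # were deleted, so it proves nothing about persisted state.
    user = create_user(db_path, "ada")
    assert user["name"] == "ada"


def observable_assertion(db_path: str) -> None:
    create_user(db_path, "ada")
    # Re-read through an independent connection: evidence that outlives the call.
    with closing(sqlite3.connect(db_path)) as conn:
        rows = conn.execute("SELECT name FROM users").fetchall()
    assert rows == [("ada",)]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        weak_assertion(os.path.join(tmp, "weak.db"))
        observable_assertion(os.path.join(tmp, "observable.db"))
        print("both assertions passed")
```

Both functions pass against a correct implementation. Only the second one would fail if the write never reached the database.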

“Something may have happened. Nothing was observed.”

05. Causality Failures

Causality failures occur when evidence exists, but its origin is ambiguous. The Auditor has observed a change, but it cannot prove that the change was uniquely caused by the act defined in the test.

If the system's state could have been produced by an alternative source, or if the change was already present before the test began, the causal link is broken.
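One way to guard against the "already present" case is to observe the state both before and after the act. The Python helper below is a hypothetical sketch, not the Auditor's mechanism: `assert_caused_by` and `CausalityError` are illustrative names. It covers only pre-existing state. Ruling out alternative sources such as concurrent writers still requires isolation on top of it.

```python
from typing import Any, Callable


class CausalityError(AssertionError):
    """The expected state cannot be attributed to the act under test."""


def assert_caused_by(
    observe: Callable[[], Any],
    act: Callable[[], None],
    expected: Any,
) -> None:
    """Require `expected` to be absent before `act` and present after it."""
    before = observe()
    if before == expected:
        raise CausalityError(
            f"{expected!r} was already present before the act; "
            "the change cannot be attributed to it"
        )
    act()
    after = observe()
    if after != expected:
        raise AssertionError(f"expected {expected!r} after the act, observed {after!r}")


if __name__ == "__main__":
    account = {"status": "pending"}
    assert_caused_by(
        observe=lambda: account["status"],
        act=lambda: account.update(status="active"),
        expected="active",
    )
    try:
        # Same expectation again: "active" already exists before a no-op act.
        assert_caused_by(lambda: account["status"], lambda: None, "active")
    except CausalityError as err:
        print(f"rejected: {err}")
```

The second call is rejected even though the final state matches. A passing final state alone is not evidence that the act produced it.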

“Evidence without a traceable cause is not trusted.”

06. Strictness Failures

A strictness failure indicates that while some proof exists, it is too fragile or coincidental to meet audit-grade standards. The engine requires independent confirmation that the behavior actually occurred.

This category measures the depth of verification rather than the surface area of coverage. It ensures that critical behaviors are anchored by multiple, complementary signals, reducing the risk of a "pass" that masks an underlying flaw.
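The following Python sketch shows one way to express "multiple, complementary signals". The names `check_signals` and `SignalReport` and the default threshold of two confirming signals are assumptions made for illustration. The code cannot verify that probes are independent. That depends on each probe reading a different channel, such as a persisted row versus an emitted event.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class SignalReport:
    confirmed: List[str] = field(default_factory=list)
    contradicted: List[str] = field(default_factory=list)

    def is_strict(self, minimum: int = 2) -> bool:
        # Strict only if no signal contradicts and enough signals confirm.
        return not self.contradicted and len(self.confirmed) >= minimum


def check_signals(signals: Dict[str, Callable[[], bool]]) -> SignalReport:
    """Evaluate each named probe, in insertion order."""
    report = SignalReport()
    for name, probe in signals.items():
        (report.confirmed if probe() else report.contradicted).append(name)
    return report


if __name__ == "__main__":
    # Hypothetical evidence left behind by placing order 7.
    orders = [{"id": 7, "total": 30}]
    events = [("order_placed", 7)]
    report = check_signals({
        "row persisted": lambda: any(o["id"] == 7 for o in orders),
        "event emitted": lambda: ("order_placed", 7) in events,
    })
    print(report.is_strict())  # True
```

With only one probe, the same evidence would fail `is_strict()`. This is the "pass that masks an underlying flaw" the category is meant to catch.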

07. Non-Learning Classification

A test may be valid and perfectly anchored, yet still provide no informative signal to the system's learning engine. This occurs when the engine already understands the pattern the test exercises, or when the pattern offers nothing that generalizes.

This is not a rejection of the test's value to your suite; it is a classification of its educational utility.

“Not all valid tests teach the system.”

08. What to Do After a Failure

Effective resolution requires prioritization. Do not attempt to fix every failure in a single pass.

Start with the earliest, most blocking categories. A structural or safety failure often masks subtler observability or causality gaps that only become visible once it is resolved.
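The prioritization above can be sketched as a simple ordering. In this Python sketch, `FailureCategory`, `triage`, and `next_to_fix` are hypothetical names. The priority order is an assumption that follows the order in which this taxonomy lists the categories. Non-learning is left out because it classifies a test rather than rejecting it.

```python
from enum import IntEnum
from typing import Iterable, List, Optional


class FailureCategory(IntEnum):
    """Blocking failure categories; lower value = more blocking (assumed order)."""
    STRUCTURAL = 1
    SAFETY = 2
    OBSERVABILITY = 3
    CAUSALITY = 4
    STRICTNESS = 5


def triage(reported: Iterable[FailureCategory]) -> List[FailureCategory]:
    """Deduplicate reported categories and order them most blocking first."""
    return sorted(set(reported))


def next_to_fix(reported: Iterable[FailureCategory]) -> Optional[FailureCategory]:
    """Return the one category to address in this pass, or None if nothing failed."""
    ordered = triage(reported)
    return ordered[0] if ordered else None


if __name__ == "__main__":
    reported = [FailureCategory.CAUSALITY, FailureCategory.SAFETY, FailureCategory.CAUSALITY]
    print(next_to_fix(reported).name)  # SAFETY
```

Returning a single category reflects the advice not to fix everything in one pass. After the safety issue is resolved, a new run may report a different set of categories.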

Read the failure message as a trust-boundary report. It is a precise technical signal indicating exactly where the engine stopped believing your proof. Use this signal to strengthen the causal chain, providing the observable evidence required to earn the Auditor's trust.

Verification is a conversation between rigor and strictness.