Published February 3, 2026

Deterministic Verification Beyond Reproducibility

For years, reproducibility has been treated as the gold standard of trust in numerical systems. Run the experiment again. Get the same result. Assume it's correct. That assumption is convenient. It is also incomplete.

Reproducibility only tells us that a system can repeat itself under similar conditions. It does not tell us whether the computation was well-defined, whether the inputs were complete, or whether the result could have been something else without anyone noticing.

This post is the third piece in a broader line of thought:

  • Numerical failures are usually definition failures.
  • Failure should be designed for, not avoided.
  • And now: what it actually means to verify a numerical result.

Reproducibility Is a Replay, Not a Guarantee

Reproducibility is fundamentally a reenactment. You recreate an environment, rerun a process, and compare outputs. If they match, confidence increases.

But that confidence rests on hidden assumptions:

  • All relevant inputs were captured.
  • The environment was faithfully reconstructed.
  • No implicit state influenced the computation.
  • The rules themselves were correct.

These assumptions are rarely explicit. They live in scripts, defaults, habits, or institutional memory.

The uncomfortable reality is simple: a result can be perfectly reproducible and still be wrong, underspecified, or misleading. Reproducibility confirms stability. It says nothing about correctness.

A Concrete Example

Imagine a training pipeline for a model. You fix the seed. You pin the dependencies. You snapshot the dataset. You rerun it and get identical metrics. Reproducible? Yes.

But:

  • Was preprocessing well-defined?
  • Were missing values handled explicitly?
  • Did silent defaults affect behavior?
  • Could different but still "valid" configurations have produced different outputs?

If those answers are unclear, the system isn't verified -- it's just replayable.
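To make the gap concrete, here is a minimal sketch (the config keys and the imputation default are invented for illustration): a pipeline that is perfectly reproducible under a fixed seed, yet whose output depends on a silent default that nothing ever recorded.

```python
import random

def train(config: dict) -> float:
    # Seeded: rerunning with the same config gives identical output.
    random.seed(config.get("seed", 0))
    # Silent default: missing values are imputed with 0.0 unless the
    # config says otherwise. No record is kept that this choice was made.
    fill = config.get("fill_missing", 0.0)
    data = [1.0, None, 3.0]
    cleaned = [x if x is not None else fill for x in data]
    noise = random.random() * 1e-6  # stands in for stochastic training
    return sum(cleaned) / len(cleaned) + noise

base = {"seed": 42}
# Identical reruns: the pipeline is reproducible.
assert train(base) == train(base)
# But another equally "valid" config quietly produces a different result.
assert train(base) != train({"seed": 42, "fill_missing": 2.0})
```

Both runs replay flawlessly. Neither tells you which imputation rule the result actually depends on.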

Determinism Is Not Sameness, It Is Commitment

Determinism is often reduced to: same input, same output. That sounds precise. It isn't.

A useful notion of determinism is stronger and more intentional. A deterministic system makes explicit commitments about:

  • What inputs exist.
  • Which rules are applied.
  • What outputs are derived.
  • What variability is allowed -- and what is not.

Determinism is not a property of execution. It is a property of definition.

This mirrors a recurring pattern: when numerical validation fails, it usually fails before any numbers are produced.
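As a sketch of what such a commitment can look like in code (the field names are hypothetical), a frozen spec with no defaults turns every unstated input into a construction error instead of a silent fallback:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Spec:
    # Every input the computation may read, committed to up front.
    # No field has a default: omitting one fails at construction time,
    # rather than being quietly filled in at run time.
    fill_missing: float  # hypothetical: how gaps in the data are imputed
    seed: int            # hypothetical: the only randomness allowed

def run(spec: Spec, data: list) -> float:
    # The only rule applied is the one the spec commits to.
    cleaned = [x if x is not None else spec.fill_missing for x in data]
    return sum(cleaned) / len(cleaned)

# Spec(fill_missing=0.0) raises TypeError: the system refuses to
# guess a seed rather than defaulting one behind your back.
```

The point is not the dataclass; it is that the definition, not the execution, is where the commitments live.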

Verification Should Happen Before Trust

Most verification today is result-centric. We inspect outputs and ask:

  • Do they look plausible?
  • Are they reproducible?
  • Are they statistically acceptable?

But by then, we're already too late.

A stronger question is simpler: could this result have been anything else without breaking the system's declared rules? If the answer is yes, the system is underdefined -- no matter how reproducible it looks.

Deterministic verification shifts the focus away from replaying computations and toward validating that:

  • All claims are explicit.
  • All transformations are constrained.
  • All derived values follow inevitably from declared inputs.

In such a system, verification doesn't require rerunning the experiment or trusting the executor. It requires checking that alternative outcomes are impossible by design.
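One way to sketch that kind of check (names are illustrative, and a real version would also reject cyclic derivations): treat the declaration as a closed world and verify, without executing anything, that no derived value can depend on something outside it.

```python
def check_inevitable(declared: set, rules: dict) -> bool:
    """Each key in `rules` is a derived value; its value is the set of
    names it depends on. The result is inevitable by design only if
    every dependency is a declared input or another derived value --
    nothing can enter the computation from outside the declaration."""
    closed = set(declared) | set(rules)
    return all(deps <= closed for deps in rules.values())

# 'metric' follows from 'loss' and 'seed', both inside the declaration.
ok = check_inevitable({"data", "seed"},
                      {"loss": {"data"}, "metric": {"loss", "seed"}})
# An undeclared dependency ('env_flag') makes the outcome contingent.
bad = check_inevitable({"data"}, {"loss": {"data", "env_flag"}})
```

Nothing is rerun here. The check passes or fails on the declaration alone.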

Failure-First Applies to Verification Too

In a failure-first mindset, errors aren't exceptional. They are signals. Verification should behave the same way.

A well-designed system fails early when:

  • An input is missing.
  • A rule is ambiguous.
  • A claim is unverifiable.

These are not bugs. They are the system refusing to pretend it knows more than it does.
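A minimal sketch of that refusal (the rule-table shape is hypothetical): each of the three failure modes above becomes an explicit error, raised before any computation runs.

```python
class VerificationError(Exception):
    """Raised before any result exists -- a refusal, not a crash."""

def verify_declaration(inputs: set, rules: dict) -> None:
    """Fail early when the declaration itself is deficient.

    `rules` maps each claimed output to a list of candidate
    derivations, each a set of input names it depends on.
    """
    for claim, candidates in rules.items():
        if not candidates:
            raise VerificationError(
                f"claim '{claim}' is unverifiable: nothing derives it")
        if len(candidates) > 1:
            raise VerificationError(
                f"rule for '{claim}' is ambiguous: "
                f"{len(candidates)} candidate derivations")
        missing = candidates[0] - inputs
        if missing:
            raise VerificationError(
                f"input(s) missing for '{claim}': {sorted(missing)}")

verify_declaration({"data", "seed"}, {"metric": [{"data", "seed"}]})  # ok
```

A declaration that survives this check has already ruled out the quiet failure modes a replay would never surface.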

Contrast that with many "reproducible" pipelines, which succeed silently while encoding hidden assumptions that only surface later -- often when trust is already on the line.

Beyond Reproducibility, Not Against It

This isn't an argument against reproducibility. Reproducibility is useful. Often necessary. Frequently informative. But it is a tool -- not a foundation.

As numerical systems become more automated, composable, and externally audited, trust cannot rely on reenactment alone. It must rely on inevitability. Not just results that can be repeated, but results that could not have been anything else.

Reproducibility asks: can you do it again? Deterministic verification asks: could you have done anything else? That distinction is where trust actually begins.