Provenance
Active
Tamper-evident proof that a piece of writing was created by a human, through an authentic human process.
The Vision
AI can now generate text that matches human writing in quality. Existing detectors are unreliable: they analyze the output, but the output isn't what makes human writing human.
The analogy I keep coming back to is live concerts. Studio recordings are technically perfect, but there's something irreplaceable about proof that a human actually performed it — in time, with friction. As AI floods the zone with perfect output, the process of creation becomes the differentiator.
Speed and single-burst generation are what make AI effective. Mistakes, creative detours, and rumination are what make us human. Provenance captures and proves those human characteristics — not by analyzing what you wrote, but by recording how you wrote it.
The Event Recorder
Provenance runs a CodeMirror 6 editor with a thin bridge layer (editorRecorder.js) that intercepts every document change event and pipes it into the core recorder. Nothing is sampled or batched at write-time — every atomic edit is captured.
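The page doesn't show editorRecorder.js itself, so what follows is only a minimal sketch of how a CodeMirror 6 bridge could turn one editor update into recorder events. It relies on CodeMirror's `ChangeSet.iterChanges` callback `(fromA, toA, fromB, toB, inserted)` and on `Transaction.isUserEvent("input.paste")`; the helper name `changesToEvents`, the `recorder.record` call, and the event field names (borrowed from the file-format excerpt further down) are assumptions for illustration.

```javascript
// Hypothetical bridge helper (not the actual editorRecorder.js).
// `iterChanges` is any function that calls back with CodeMirror's
// (fromA, toA, fromB, toB, inserted) arguments, for example
// (cb) => update.changes.iterChanges(cb) inside an update listener.
function changesToEvents(iterChanges, isPaste, timestamp) {
  const events = [];
  iterChanges((fromA, toA, fromB, toB, inserted) => {
    // fromB is where this change starts once all earlier changes in the same
    // update are applied, so replaying the events in order stays consistent.
    const deletedLength = toA - fromA;
    if (deletedLength > 0) {
      events.push({ type: "delete", timestamp, position: fromB, length: deletedLength });
    }
    const content = inserted.toString(); // CodeMirror passes a Text; plain strings also work
    if (content.length > 0) {
      events.push({ type: isPaste ? "paste" : "insert", timestamp, position: fromB, content });
    }
  });
  return events;
}

// Wiring sketch (needs @codemirror/view; `recorder.record` is hypothetical):
// EditorView.updateListener.of((update) => {
//   if (!update.docChanged) return;
//   const isPaste = update.transactions.some((tr) => tr.isUserEvent("input.paste"));
//   const events = changesToEvents((cb) => update.changes.iterChanges(cb), isPaste, Date.now());
//   for (const event of events) recorder.record(event);
// });
```

Splitting a replacement into a delete followed by an insert at the same position keeps each recorded event atomic, which matches the event types listed below.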
Event Types Captured
- insert: character(s) typed
- delete: character(s) removed
- paste: content pasted from the clipboard
- session_start: a new writing session began
- session_end: a writing session ended
Per-Event Metadata
- Millisecond-precision Unix timestamp
- Cursor position in document
- Content inserted or length deleted
- SHA-256 hash (chained from prior event)
All recorder operations (startSession, recordInsert, endSession) are async and serialized through an internal operation queue. Hashing through the Web Crypto API is asynchronous, so without the queue two events fired in rapid succession could both read the same previous hash or finish out of order. Serializing the operations guarantees that the hash chain is always computed in strict event order, even when several events share the same millisecond timestamp.
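A promise chain is one common way to get this kind of serialization in JavaScript. The sketch below is an assumption about how such a queue could look, not Provenance's actual code; the class name `OperationQueue` is hypothetical.

```javascript
// Hypothetical serialized queue: each operation starts only after the
// previous one has settled, so async hashing steps cannot interleave.
class OperationQueue {
  constructor() {
    this.tail = Promise.resolve(); // settles when the last queued operation finishes
  }

  enqueue(operation) {
    const result = this.tail.then(() => operation());
    // Swallow failures on the tail only, so one rejected operation does not
    // block later ones; the caller still sees the rejection through `result`.
    this.tail = result.catch(() => {});
    return result;
  }
}
```

A recorder method such as recordInsert would then wrap its body in `queue.enqueue(async () => { ... })` and return the resulting promise, so callers can still await each operation individually.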
How the Proof Is Generated
Each event's hash is computed over its own content plus the previous event's hash, forming a rolling SHA-256 chain in which every hash depends on the entire history before it.
When you're done writing, everything gets serialized into a .provenance file — a portable, self-contained JSON artifact with all sessions, all events, the final content, and a SHA-256 hash of the final document. No server. No Provenance infrastructure. Anyone can verify independently by replaying the chain.
File Format (excerpt)
{
  "version": "1.0.0",
  "sessions": [{
    "id": "session-uuid",
    "startTime": "2024-01-15T10:00:00Z",
    "events": [
      { "type": "insert", "timestamp": 1705312800000,
        "position": 0, "content": "H", "hash": "abc123..." },
      { "type": "delete", "timestamp": 1705312805000,
        "position": 0, "length": 1, "hash": "def456..." }
    ]
  }],
  "finalContent": "The complete document...",
  "contentHash": "sha256-of-final-content"
}
Why Forgery Is Hard
Rolling Hash Chain
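Modifying a single event invalidates its own hash and every hash that follows it, so any edit made without recomputing the rest of the chain is detected immediately. Verification requires no central authority; the math is self-contained in the file. Because the chain as described is not keyed, a determined forger could recompute it from the altered event onward, which is why the cost of forgery also rests on the behavioral evidence below and why cryptographic anchoring is listed under What's Next.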
Behavioral Capture
Timing patterns, pause durations, and correction sequences are recorded at millisecond precision. Human typing has statistically distinctive bursts, hesitations, and backtrack patterns that are difficult to fabricate convincingly.
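As a rough illustration of what those timing patterns look like as numbers, here is a hypothetical summary function over a list of events. The 2,000 ms pause threshold, the use of deletes as a proxy for corrections, and the output field names are all assumptions chosen for the example, not Provenance's scoring.

```javascript
// Hypothetical timing summary over recorder events (timestamps in ms).
// pauseThresholdMs = 2000 is an illustrative default, not a Provenance setting.
function timingSummary(events, pauseThresholdMs = 2000) {
  const gaps = [];
  for (let i = 1; i < events.length; i++) {
    gaps.push(events[i].timestamp - events[i - 1].timestamp);
  }
  const n = gaps.length || 1; // avoid dividing by zero for 0 or 1 events
  const mean = gaps.reduce((sum, g) => sum + g, 0) / n;
  const variance = gaps.reduce((sum, g) => sum + (g - mean) ** 2, 0) / n;
  const edits = events.filter((e) => e.type === "insert" || e.type === "delete");
  const deletes = edits.filter((e) => e.type === "delete").length;
  return {
    meanGapMs: mean,
    gapStdDevMs: Math.sqrt(variance), // spread of inter-event gaps
    pauses: gaps.filter((g) => g >= pauseThresholdMs).length,
    correctionRatio: edits.length ? deletes / edits.length : 0,
  };
}
```

A script that emits keystrokes at a fixed interval would show a gap standard deviation of zero and no pauses, which is the kind of contrast that statistical fingerprinting could quantify.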
Multi-Day Sessions
Forging a proof means performing the writing process in real time, across multiple actual sittings. The longer the document, the higher the cost. A forged 3,000-word essay would require hours of simulated typing spread over multiple days.
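For scale: at an assumed 40 words per minute, typing 3,000 words takes 75 minutes of keystrokes alone (3,000 / 40 = 75), before any pausing, rereading, or revision. A verifier could summarize how much recorded time and how many calendar days a file spans with something like the hypothetical `sessionSpan` function below, which uses each session's first and last event timestamps and counts days in UTC. Since the file is produced locally with no server involved, these timestamps show what was recorded rather than proving that the time actually passed.

```javascript
// Hypothetical summary of how long and over how many days a file was written.
// Uses each session's first and last event timestamps; days are UTC calendar days.
function sessionSpan(sessions) {
  const days = new Set();
  let activeMs = 0;
  let sittings = 0;
  for (const session of sessions) {
    const events = session.events;
    if (events.length === 0) continue; // an empty session contributes nothing
    const first = events[0].timestamp;
    const last = events[events.length - 1].timestamp;
    sittings += 1;
    activeMs += last - first;
    days.add(new Date(first).toISOString().slice(0, 10));
    days.add(new Date(last).toISOString().slice(0, 10));
  }
  return { sittings, activeHours: activeMs / 3_600_000, distinctDays: days.size };
}
```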
Paste Detection
Paste events are recorded separately and flagged in the replay. The post-processing pipeline distinguishes external pastes from internal rearrangements (cut+paste within the document) by replaying events against a parallel character-origin tracker.
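The page doesn't describe how the character-origin tracker decides what counts as internal, so this is only a sketch under an explicit assumption: a paste is labeled internal when its exact text was previously deleted from the document (a cut) or already appears in it (a copy), and external otherwise. Every character in the replayed document carries an origin tag, which lets the replay report how much of the final text arrived from outside. The function name `classifyPastes` and the tag names are hypothetical.

```javascript
// Hypothetical character-origin tracker. origins[i] tags doc[i] as
// "typed", "internal" (moved or copied within the document) or "external".
// ASSUMPTION: internal = exact text previously deleted from, or present in, the document.
function classifyPastes(events) {
  let doc = "";
  let origins = [];
  const removedTexts = new Set();
  const pastes = [];

  for (const e of events) {
    if (e.type === "delete") {
      removedTexts.add(doc.slice(e.position, e.position + e.length)); // remember cut text
      doc = doc.slice(0, e.position) + doc.slice(e.position + e.length);
      origins.splice(e.position, e.length);
    } else if (e.type === "insert" || e.type === "paste") {
      let origin = "typed";
      if (e.type === "paste") {
        const seenBefore = removedTexts.has(e.content) || doc.includes(e.content);
        origin = seenBefore ? "internal" : "external";
        pastes.push({ position: e.position, length: e.content.length, origin });
      }
      doc = doc.slice(0, e.position) + e.content + doc.slice(e.position);
      // concat avoids spreading a very large paste into function arguments
      origins = origins
        .slice(0, e.position)
        .concat(new Array(e.content.length).fill(origin), origins.slice(e.position));
    }
  }
  const externalChars = origins.filter((o) => o === "external").length;
  return { pastes, externalChars, finalLength: doc.length };
}
```

Exact string matching is deliberately simple here: a short external paste that happens to match existing text, such as a common word, would be misclassified as internal, so a sturdier tracker would need finer-grained provenance per character.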
Tech Stack
- Editor: CodeMirror 6 — extensible, excellent event access
- Runtime: Node.js + Express
- Frontend: Vanilla JS + Vite
- Storage: File System Access API (Chrome/Edge) → local vault folder
- Hashing: SHA-256 via Web Crypto API
- Tests: Vitest + jsdom
What's Next
- Statistical fingerprinting — WPM variance, pause distributions, quantitative authenticity signal
- Verification badge — embeddable widget linking to the proof
- Cryptographic anchoring — RFC 3161 timestamping to prove the proof wasn't fabricated retroactively
- Privacy modes — statistical proof without full replay