Content Hashing
Content hashing enables integrity verification: confirming that content has not been modified since it was generated.
Hash format
content_hash = "{algorithm}:{hex-encoded-hash}"
Example: "sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
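A minimal sketch of splitting and building a value in this format. The function names are hypothetical, and the strict pattern (lowercase identifier, lowercase hex digits) is an assumption based on the example above, not something the format definition states.

```python
import re

# Assumed shape: "<identifier>:<hex digits>", lowercase only (an assumption
# inferred from the example value, not a stated rule).
_CONTENT_HASH_RE = re.compile(r"(?P<algorithm>[a-z0-9]+):(?P<digest>[0-9a-f]+)")


def parse_content_hash(value: str) -> tuple[str, str]:
    """Split a content_hash string into (algorithm identifier, hex digest)."""
    match = _CONTENT_HASH_RE.fullmatch(value)
    if match is None:
        raise ValueError(f"malformed content_hash: {value!r}")
    return match.group("algorithm"), match.group("digest")


def format_content_hash(algorithm: str, digest: bytes) -> str:
    """Build a content_hash string from an identifier and raw digest bytes."""
    return f"{algorithm}:{digest.hex()}"
```

Keeping the algorithm identifier inside the value means a verifier can pick the right hash function without any out-of-band information.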
Supported algorithms
| Algorithm | Identifier | Status |
|---|---|---|
| SHA-256 | sha256 | REQUIRED (default) |
| SHA-384 | sha384 | OPTIONAL |
| SHA-512 | sha512 | OPTIONAL |
Implementations MUST support sha256.
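One way an implementation might map the identifiers in the table to hash functions, sketched with Python's standard `hashlib`. The names `SUPPORTED_ALGORITHMS` and `compute_content_hash` are illustrative, and rejecting unknown identifiers with an error is an assumed policy.

```python
import hashlib

# Identifiers from the table above, mapped to hashlib constructors.
SUPPORTED_ALGORITHMS = {
    "sha256": hashlib.sha256,  # REQUIRED (default)
    "sha384": hashlib.sha384,  # OPTIONAL
    "sha512": hashlib.sha512,  # OPTIONAL
}
DEFAULT_ALGORITHM = "sha256"


def compute_content_hash(data: bytes, algorithm: str = DEFAULT_ALGORITHM) -> str:
    """Hash bytes and return the result as "{algorithm}:{hex-encoded-hash}"."""
    try:
        constructor = SUPPORTED_ALGORITHMS[algorithm]
    except KeyError:
        raise ValueError(f"unsupported hash algorithm: {algorithm!r}") from None
    return f"{algorithm}:{constructor(data).hexdigest()}"
```

Hashing empty input with the default algorithm yields the example value shown under "Hash format", which is the well-known SHA-256 digest of zero bytes.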
Canonicalization (JSON content)
For JSON content, the hash MUST be computed over the canonical representation:
- Remove the _ai_provenance key (if present)
- Sort all object keys recursively in lexicographic order
- Remove all insignificant whitespace
- Encode as UTF-8
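The steps above can be sketched with Python's standard `json` module. This is an illustration under several assumptions that the steps leave open: the `_ai_provenance` key is removed only at the top level, "lexicographic order" is taken to mean Python's default string ordering, and non-ASCII characters are emitted as literal UTF-8 rather than `\u` escapes. Number formatting is whatever `json.dumps` produces.

```python
import hashlib
import json


def canonicalize(content: dict) -> bytes:
    """Return an assumed canonical UTF-8 encoding of a JSON object."""
    # Step 1: drop the provenance key (top level only, an assumption).
    stripped = {k: v for k, v in content.items() if k != "_ai_provenance"}
    # Steps 2 and 3: sort keys at every nesting level and emit no
    # whitespace between tokens.
    text = json.dumps(
        stripped,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,  # assumption: keep non-ASCII characters literal
    )
    # Step 4: encode as UTF-8.
    return text.encode("utf-8")


def json_content_hash(content: dict) -> str:
    """Compute the sha256 content hash over the canonical form."""
    return "sha256:" + hashlib.sha256(canonicalize(content)).hexdigest()
```

With this sketch, two documents that differ only in key order, whitespace, or the presence of `_ai_provenance` produce the same hash. Two implementations will agree only if they make the same choices on the open points listed above.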
content_hash = "sha256:" + hex(SHA-256(canonicalize(content_without_app_key)))
Non-JSON content
For non-JSON content (plain text, HTML, etc.), the hash is computed over the raw bytes of the content.
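For large non-JSON content, the raw bytes can be hashed incrementally instead of being loaded into memory at once. The sketch below assumes sha256 and a binary stream; `hash_raw_stream` and the chunk size are illustrative choices.

```python
import hashlib
import io
from typing import BinaryIO


def hash_raw_stream(stream: BinaryIO, chunk_size: int = 65536) -> str:
    """Hash a binary stream chunk by chunk and return a sha256 content hash."""
    hasher = hashlib.sha256()
    # read() returns b"" at end of stream, which stops the iteration.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        hasher.update(chunk)
    return "sha256:" + hasher.hexdigest()


# Usage: the bytes are hashed exactly as stored, with no normalization.
html_hash = hash_raw_stream(io.BytesIO(b"<p>hello</p>"))
```

Because no normalization is applied, any byte-level difference (line endings, text encoding, a byte order mark) produces a different hash, so text should be opened in binary mode rather than decoded and re-encoded.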
Use cases
- Tamper detection: Verify that content hasn’t been modified since generation
- Content matching: Find the provenance record for a piece of content via Level 2 verification
- Audit trails: Prove that specific content was produced at a specific time by a specific system
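As an illustration of the tamper-detection use case, a verifier could recompute the hash and compare it with the recorded value. This is a sketch only: `verify_content_hash` is a hypothetical name, the comparison is assumed to be an exact string match, and `data` must already be the bytes the hash was computed over (the canonical form for JSON, the raw bytes otherwise).

```python
import hashlib

_ALLOWED = {"sha256", "sha384", "sha512"}


def verify_content_hash(data: bytes, recorded: str) -> bool:
    """Return True if data hashes to the recorded "{algorithm}:{hex}" value."""
    algorithm, separator, recorded_digest = recorded.partition(":")
    if not separator or algorithm not in _ALLOWED:
        raise ValueError(f"unsupported or malformed content_hash: {recorded!r}")
    # hashlib.new selects the hash function by name.
    actual_digest = hashlib.new(algorithm, data).hexdigest()
    return actual_digest == recorded_digest
```

A `False` result means the content differs from what was hashed at generation time. It does not indicate what changed or who changed it.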