Content Hashing

Content hashing enables integrity verification: a recomputed hash that matches the recorded one confirms that the content has not been modified since generation.

Hash format

content_hash = "{algorithm}:{hex-encoded-hash}"

Example: "sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
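
As a rough illustration of the format, the sketch below splits a content_hash string into its two parts. The function name parse_content_hash is hypothetical, and restricting the digest to lowercase hex is an assumption: the format above does not say whether uppercase digits are allowed.

```python
import re

# Pattern for "{algorithm}:{hex-encoded-hash}".
# Assumption: lowercase identifiers and lowercase hex digits only.
_CONTENT_HASH_RE = re.compile(r"([a-z0-9]+):([0-9a-f]+)")


def parse_content_hash(value: str) -> tuple[str, str]:
    """Split a content_hash string into (algorithm, hex_digest)."""
    match = _CONTENT_HASH_RE.fullmatch(value)
    if match is None:
        raise ValueError(f"malformed content_hash: {value!r}")
    return match.group(1), match.group(2)


algorithm, digest = parse_content_hash(
    "sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
)
```

The digest in the example above is the SHA-256 hash of empty input, which makes it a convenient fixture for tests.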

Supported algorithms

Algorithm   Identifier   Status
SHA-256     sha256       REQUIRED (default)
SHA-384     sha384       OPTIONAL
SHA-512     sha512       OPTIONAL

Implementations MUST support sha256.
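
One possible way to map the identifiers above onto Python's standard hashlib module is shown below. The names SUPPORTED_ALGORITHMS, DEFAULT_ALGORITHM and new_hasher are illustrative, not part of the specification.

```python
import hashlib

# Identifiers from the table above mapped to hashlib constructors.
SUPPORTED_ALGORITHMS = {
    "sha256": hashlib.sha256,  # REQUIRED (default)
    "sha384": hashlib.sha384,  # OPTIONAL
    "sha512": hashlib.sha512,  # OPTIONAL
}
DEFAULT_ALGORITHM = "sha256"


def new_hasher(algorithm: str = DEFAULT_ALGORITHM):
    """Return a fresh hash object for a supported identifier."""
    try:
        return SUPPORTED_ALGORITHMS[algorithm]()
    except KeyError:
        raise ValueError(f"unsupported algorithm: {algorithm!r}") from None
```

An implementation that supports only the required algorithm could keep just the sha256 entry and reject the rest.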

Canonicalization (JSON content)

For JSON content, the hash MUST be computed over the canonical representation:

  1. Remove the _ai_provenance key (if present)
  2. Sort all object keys recursively in lexicographic order
  3. Remove all insignificant whitespace
  4. Encode as UTF-8

content_hash = "sha256:" + hex(SHA-256(canonicalize(content_without_app_key)))
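
The four steps can be sketched with Python's json module as follows. This is a minimal sketch under several assumptions that the text above does not settle: the _ai_provenance key is removed only at the top level, "lexicographic order" is taken to be Python's default string ordering (by Unicode code point), non-ASCII characters are written literally rather than as \u escapes, and numbers are serialized however json.dumps formats them. The function names are hypothetical.

```python
import hashlib
import json


def canonicalize(content: dict) -> bytes:
    """Return canonical UTF-8 bytes for a JSON object (see assumptions above)."""
    # Step 1: drop the top-level _ai_provenance key, if present.
    stripped = {k: v for k, v in content.items() if k != "_ai_provenance"}
    # Step 2: sort_keys=True sorts object keys at every nesting level.
    # Step 3: compact separators remove all insignificant whitespace.
    text = json.dumps(
        stripped, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    # Step 4: encode as UTF-8.
    return text.encode("utf-8")


def json_content_hash(content: dict) -> str:
    """Compute the sha256 content_hash of a JSON object."""
    return "sha256:" + hashlib.sha256(canonicalize(content)).hexdigest()


doc = {"b": 1, "_ai_provenance": {"id": "x"}, "a": {"d": 2, "c": 3}}
canonical = canonicalize(doc)  # b'{"a":{"c":3,"d":2},"b":1}'
```

Because the provenance key is excluded and keys are sorted, the same hash results whether or not the record is embedded and regardless of the key order in which the producer serialized the object.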

Non-JSON content

For non-JSON content (plain text, HTML, etc.), the hash is computed over the raw bytes of the content.
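
For the non-JSON case, a sketch might look like the following. The name raw_content_hash is hypothetical. Note that the function takes bytes: for text content the caller has to hash the exact bytes as stored or transmitted, since the rule above does not prescribe an encoding or any normalization.

```python
import hashlib


def raw_content_hash(data: bytes, algorithm: str = "sha256") -> str:
    """Hash raw content bytes and return an "{algorithm}:{hex}" string."""
    if algorithm not in ("sha256", "sha384", "sha512"):
        raise ValueError(f"unsupported algorithm: {algorithm!r}")
    hasher = hashlib.new(algorithm)
    hasher.update(data)
    return f"{algorithm}:{hasher.hexdigest()}"


html = "<p>Hello</p>".encode("utf-8")  # encoding chosen by the caller
html_hash = raw_content_hash(html)
```

Any byte-level change, including a different line ending or a byte order mark, produces a different hash.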

Use cases

  • Tamper detection: Verify that content hasn’t been modified since generation
  • Content matching: Find the provenance record for a piece of content via Level 2 verification
  • Audit trails: Prove that specific content was produced at a specific time by a specific system
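
The tamper-detection use case amounts to recomputing the hash and comparing it with the recorded value. The sketch below does this for raw (non-JSON) bytes. verify_content is a hypothetical helper, and the case-insensitive comparison of hex digits is an assumption rather than something the specification states.

```python
import hashlib
import hmac

_ALGORITHMS = {"sha256", "sha384", "sha512"}


def verify_content(data: bytes, content_hash: str) -> bool:
    """Return True if data matches the recorded content_hash."""
    algorithm, sep, expected_hex = content_hash.partition(":")
    if not sep or algorithm not in _ALGORITHMS:
        raise ValueError(f"unsupported or malformed content_hash: {content_hash!r}")
    actual_hex = hashlib.new(algorithm, data).hexdigest()
    # compare_digest compares in constant time; the inputs are encoded to
    # bytes so that a malformed recorded value cannot raise TypeError.
    return hmac.compare_digest(
        actual_hex.encode("ascii"), expected_hex.lower().encode("utf-8")
    )


recorded = "sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
unchanged = verify_content(b"", recorded)          # empty content matches
modified = verify_content(b"tampered", recorded)   # any other content does not
```

A match shows only that the bytes are the ones that were hashed. Linking that hash to a time and a producing system depends on the provenance record itself.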