Validate Document
POST /api/validate_stream/
curl --request POST \
  --url https://api.example.com/api/validate_stream/ \
  --header 'x-api-key: YOUR_API_KEY' \
  --header 'Content-Type: application/json' \
  --data '
{
  "document_bytes": "<string>",
  "document_category": "<string>",
  "document_metadata": {
    "audience_hint": "<string>",
    "jurisdiction": "<string>",
    "product_class": "<string>",
    "document_type": "<string>",
    "product_types": [
      {}
    ]
  }
}
'

Submit a base64-encoded document for asynchronous compliance validation.

Request Body

document_bytes (string, required): Base64-encoded document content (PDF, DOCX, etc.)
document_category (string): Predefined category for the document. Required if document_metadata is not provided. Options: retail_investor_letter, retail_fact_sheet_registered_fund, retail_fact_sheet_non_registered, pitch_book_registered_fund, pitch_book_non_registered, scenario_retail_investor_letter, scenario_retail_fact_sheet_registered_fund, scenario_retail_fact_sheet_non_registered, scenario_pitch_book_registered_fund, scenario_pitch_book_non_registered
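document_metadata (object): Detailed metadata for precise rule matching. Required if document_category is not provided. The request sample shows its fields: audience_hint, jurisdiction, product_class, document_type (all strings), and product_types (an array of objects).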
At least one of document_category or document_metadata must be provided.
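The request body rules above can be sketched as a small shell helper. `build_payload` is a hypothetical name, not part of the API: it base64-encodes a file, enforces the "at least one of document_category or document_metadata" rule on the client side, and prints the JSON body so it can be streamed to curl on stdin.

```shell
# Hypothetical helper (not part of the API): build the JSON request body for
# POST /api/validate_stream/ and print it on stdout.
# Usage: build_payload <file> <document_category> [<document_metadata_json>]
# Pass "" as the category to send document_metadata only.
build_payload() {
  local file="$1" category="$2" metadata="${3:-}" b64
  if [ ! -r "$file" ]; then
    echo "error: cannot read $file" >&2
    return 1
  fi
  # Enforce the documented rule: at least one of the two must be provided.
  if [ -z "$category" ] && [ -z "$metadata" ]; then
    echo "error: provide document_category or document_metadata" >&2
    return 1
  fi
  # Single-line base64 (GNU base64 wraps output at 76 columns by default).
  b64=$(base64 < "$file" | tr -d '\n')
  printf '{"document_bytes": "%s"' "$b64"
  if [ -n "$category" ]; then
    printf ', "document_category": "%s"' "$category"
  fi
  if [ -n "$metadata" ]; then
    # The metadata object is inserted verbatim; it is not validated here.
    printf ', "document_metadata": %s' "$metadata"
  fi
  printf '}\n'
}
```

For example, `build_payload document.pdf retail_investor_letter | curl -X POST "https://{api-url}/api/validate_stream/" -H "x-api-key: YOUR_API_KEY" -H "Content-Type: application/json" --data-binary @-` submits a categorized document. To send metadata instead, pass an empty category and a JSON object such as `'{"jurisdiction": "<string>", "document_type": "<string>"}'` as the third argument; the field values are placeholders, and which metadata fields are required is not stated on this page.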

Scanned PDF Support (OCR)

The validation service automatically handles scanned PDFs using AWS Textract OCR. No additional parameters are needed; OCR is triggered transparently for any page where standard text extraction yields too little text.
How it works:
  1. The service first attempts standard text extraction via pypdf
  2. If a page yields fewer than 50 characters of extracted text, it is classified as a scanned/image page
  3. Scanned pages are automatically sent to AWS Textract for OCR
  4. The OCR text is merged with any text-extracted pages before validation
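Step 2 is a per-page threshold test. The sketch below approximates it in the shell. `page_kind` is a hypothetical helper; it classifies whatever text you pipe into it rather than running pypdf, and it assumes whitespace does not count toward the 50 characters (the exact counting rule is not documented).

```shell
# Hypothetical helper mirroring step 2: read one page's extracted text on
# stdin and print "scanned" if it has fewer than 50 characters, else "text".
# Assumption: whitespace is stripped before counting.
page_kind() {
  local n
  # wc pads its output on some platforms, so strip spaces before comparing.
  n=$(tr -d '[:space:]' | wc -m | tr -d ' ')
  if [ "$n" -lt 50 ]; then
    echo scanned
  else
    echo text
  fi
}
```

For example, with poppler-utils installed, `pdftotext -f 3 -l 3 document.pdf - | page_kind` classifies page 3. pdftotext and pypdf can extract different text from the same page, so this only approximates what the service sees.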
Three PDF cases:
| Case | Behavior |
|---|---|
| Text-only PDF | Standard text extraction, no OCR |
| Fully scanned PDF | All pages sent to Textract OCR |
| Mixed PDF (text + scanned pages) | Only scanned pages are OCR’d; text pages are kept as-is |
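The three cases follow directly from the per-page classification. A sketch, assuming each page has already been labeled `text` or `scanned` (for example with the 50-character threshold from step 2); `pdf_case` and `ocr_pages` are hypothetical helper names.

```shell
# Hypothetical helpers: given per-page labels ("text" or "scanned") in page
# order, report which documented case applies and which pages go to OCR.
pdf_case() {
  local kind text=0 scanned=0
  if [ "$#" -eq 0 ]; then
    echo "error: no pages" >&2
    return 1
  fi
  for kind in "$@"; do
    case "$kind" in
      text) text=$((text + 1)) ;;
      scanned) scanned=$((scanned + 1)) ;;
      *) echo "error: unknown page label: $kind" >&2; return 1 ;;
    esac
  done
  if [ "$scanned" -eq 0 ]; then
    echo "Text-only PDF"        # standard extraction, no OCR
  elif [ "$text" -eq 0 ]; then
    echo "Fully scanned PDF"    # every page goes to OCR
  else
    echo "Mixed PDF"            # only the scanned pages go to OCR
  fi
}

# Print the 1-based numbers of the pages labeled "scanned", space-separated.
ocr_pages() {
  local kind i=0 out=""
  for kind in "$@"; do
    i=$((i + 1))
    if [ "$kind" = scanned ]; then
      out="$out $i"
    fi
  done
  echo "${out# }"
}
```

For a four-page document labeled `text scanned text scanned`, `pdf_case` reports a mixed PDF and `ocr_pages` prints `2 4`.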
Limits:
| Workflow | OCR limit |
|---|---|
| POST /api/validate_stream/ (direct base64 upload) | Up to 50 pages, 10 MB per page (synchronous, page by page) |
| Presigned URL + POST /api/validate_stream_start/ | Up to 3,000 pages, 500 MB total (asynchronous, via S3) |
Scanned PDFs may take longer to process because of OCR. For direct uploads to this endpoint, OCR runs synchronously, page by page. For large scanned documents (more than 50 pages), use the presigned URL workflow, which enables asynchronous Textract processing with higher limits.
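The limits above amount to a routing decision for scanned documents. A sketch: `choose_workflow` is a hypothetical helper that takes sizes in whole megabytes and only names the workflow to use; how the presigned URL is obtained is not covered on this page.

```shell
# Hypothetical helper: choose an upload workflow for a scanned PDF from the
# documented OCR limits. Sizes are whole megabytes.
# Usage: choose_workflow <pages> <largest_page_mb> <total_mb>
choose_workflow() {
  local pages="$1" page_mb="$2" total_mb="$3"
  if [ "$pages" -le 50 ] && [ "$page_mb" -le 10 ]; then
    # Within the direct-upload limits: 50 pages, 10 MB per page.
    echo "POST /api/validate_stream/"
  elif [ "$pages" -le 3000 ] && [ "$total_mb" -le 500 ]; then
    # Within the asynchronous limits: 3,000 pages, 500 MB total.
    echo "presigned URL + POST /api/validate_stream_start/"
  else
    echo "error: exceeds the documented OCR limits" >&2
    return 1
  fi
}
```

A 12-page scan stays on this endpoint, while a 400-page scan, or a short document with a page larger than 10 MB, is routed to the presigned URL workflow.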

Response

job_id (string): Unique identifier for the validation job
status (string): Job status. The documented value is accepted.
message (string): Status message
timestamp (string): ISO 8601 timestamp
{
  "job_id": "abc123def456",
  "status": "accepted",
  "message": "Validation request queued for processing",
  "timestamp": "2026-01-08T10:30:00Z"
}
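To keep the job_id for later use, the response can be parsed in the shell without extra dependencies. A minimal sketch: `extract_job_id` and `is_accepted` are hypothetical helpers, and because they work line by line they assume each key and its value sit on one line, as in the sample response. If jq is installed, `jq -r '.job_id'` is the more robust choice.

```shell
# Hypothetical helper: read the JSON response on stdin and print job_id.
# Line-based, so it assumes "job_id": "<value>" is not split across lines.
extract_job_id() {
  sed -n 's/.*"job_id"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Hypothetical helper: exit status 0 only if the response reports "accepted".
is_accepted() {
  grep -q '"status"[[:space:]]*:[[:space:]]*"accepted"'
}
```

For instance, appending `| extract_job_id` to the curl command in the example below prints the job ID instead of the full response.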

Example

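# Encode the document as a single line of base64. GNU base64 wraps output at
# 76 columns, and raw newlines inside a JSON string make the body invalid.
DOC_BASE64=$(base64 < document.pdf | tr -d '\n')

# Build the JSON body with the shell's printf builtin and stream it to curl
# on stdin, so large documents don't hit command-line argument size limits.
printf '{"document_bytes": "%s", "document_category": "retail_investor_letter"}' "$DOC_BASE64" |
  curl -X POST "https://{api-url}/api/validate_stream/" \
    -H "x-api-key: YOUR_API_KEY" \
    -H "Content-Type: application/json" \
    --data-binary @-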