# Getting Started with NLDS API

Welcome to the Natural Language Document Search (NLDS) API. This guide walks you through integrating NLDS into your system step by step, explaining every concept before diving into the reference documentation.

## Table of Contents

- [1. Overview](#1-overview)
- [2. Authentication](#2-authentication)
    - [Token Request](#token-request)
    - [Using the Token](#using-the-token)
    - [Token Lifecycle](#token-lifecycle)
    - [Security Notes](#security-notes)
- [3. Your Company ID](#3-your-company-id)
- [4. Products](#4-products)
    - [List Available Products](#list-available-products)
    - [Why Products Matter](#why-products-matter)
- [5. Cases](#5-cases)
    - [Create a Case](#create-a-case)
    - [Get Case Details](#get-case-details)
- [6. Uploading Documents](#6-uploading-documents)
    - [6.1 Check Before You Upload (Deduplication)](#61-check-before-you-upload-deduplication)
    - [6.2 Batch Register Documents](#62-batch-register-documents)
    - [6.3 Upload to Blob Storage](#63-upload-to-blob-storage)
    - [6.4 Signal Upload Complete](#64-signal-upload-complete)
    - [6.5 Validation](#65-validation)
- [7. Ingestion (Automatic)](#7-ingestion-automatic)
- [8. Querying Documents](#8-querying-documents)
- [9. Getting Results](#9-getting-results)
    - [9.1 Polling](#91-polling)
    - [9.2 Webhook (Recommended for Production)](#92-webhook-recommended-for-production)
- [10. Opening a Document](#10-opening-a-document)
- [11. Complete Flow Summary](#11-complete-flow-summary)
- [12. Polling Code Example](#12-polling-code-example)
- [Next Steps](#next-steps)

## 1. Overview

NLDS is a platform that ingests mortgage documents (PDFs, Excel) from your system, indexes them using OCR and AI extraction, and lets you query them with natural language.

**What you can do:**
- Upload mortgage documents (closing disclosures, promissory notes, title policies, appraisals, and more) from your system
- Query indexed documents using natural language: "latest signed Closing Disclosure for Mike Johnson", "all W-9 forms signed in 2024"
- Retrieve search results with extracted metadata (borrower name, document date, signature status) and direct file access links

**Two integration flows:**

1. **Ingest flow:** You upload documents via the case upload API; NLDS validates and indexes them asynchronously
2. **Query flow:** You submit natural language queries against a case's indexed documents; NLDS searches and returns results asynchronously

**Async design:** Both upload validation and query processing run asynchronously in the backend. Your API calls return immediately with job IDs; you then poll or use webhooks to retrieve results.

---

## 2. Authentication

NLDS uses OAuth 2.0 machine-to-machine (M2M) authentication. Your integration system acts as a confidential OAuth client, obtaining a token from the AuthX service and passing it in the `Authorization` header for all API calls.

### Token Request

Request a token from AuthX using the `client_credentials` grant type.

**Endpoint:** `POST https://authx.digilytics.com/oauth2/token`

**Request (form-encoded):**

```
grant_type=client_credentials
&client_id=YOUR_CLIENT_ID
&client_secret=YOUR_CLIENT_SECRET
&scope=nldocsearch.api
```

**cURL example:**

```bash
curl -X POST https://authx.digilytics.com/oauth2/token \
  -d "grant_type=client_credentials" \
  -d "client_id=YOUR_CLIENT_ID" \
  -d "client_secret=YOUR_CLIENT_SECRET" \
  -d "scope=nldocsearch.api"
```

**Token Response (JSON):**

```json
{
  "access_token": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...",
  "token_type": "Bearer",
  "expires_in": 1800
}
```

| Field | Meaning |
|-------|---------|
| `access_token` | JWT token to use in all API calls |
| `token_type` | Always `Bearer` |
| `expires_in` | Token lifetime in seconds (1800 = 30 minutes) |

### Using the Token

Include the token in the `Authorization` header of every API call:

```
Authorization: Bearer {access_token}
```

Example:

```bash
curl -X GET https://api-nlds.digilytics.solutions/api/v1/123456/products \
  -H "Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9..."
```

### Token Lifecycle

- Token lifetime: **30 minutes**
- No refresh token is issued; when the token expires, request a new one using the same `client_credentials` flow
- Always cache the token locally; reuse it for all requests until expiry
- Store `client_id`, `client_secret`, and tokens in secure secret storage (e.g., environment variables, HashiCorp Vault, Azure Key Vault) — **never** store them in source code
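
The caching advice above can be sketched as a small wrapper around whatever function performs the token request. This is a minimal sketch, not part of the NLDS SDK: `TokenCache`, `fetch`, `skew_seconds`, and `clock` are hypothetical names, and the 60-second safety margin is an assumption rather than a documented requirement.

```python
import time
from typing import Callable, Optional


class TokenCache:
    """Caches a bearer token and re-fetches it shortly before it expires."""

    def __init__(
        self,
        fetch: Callable[[], dict],
        skew_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch        # returns the parsed token response JSON
        self._skew = skew_seconds  # assumed margin: refresh this early
        self._clock = clock        # injectable so the cache can be tested
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        now = self._clock()
        if self._token is None or now >= self._expires_at:
            body = self._fetch()
            self._token = body["access_token"]
            # expires_in is a lifetime in seconds, so convert it to a deadline
            self._expires_at = now + float(body["expires_in"]) - self._skew
        return self._token
```

Passing the HTTP call in as `fetch` keeps the cache independent of any particular HTTP library; every API call then asks `cache.get()` for a token instead of requesting one directly.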

### Security Notes

- Only confidential systems should store `client_secret`. Public/browser-based clients cannot use this flow
- For token inspection and validation, you can verify the JWT signature using the AuthX JWKS endpoint (`/.well-known/jwks.json`). See the AuthX developer guide for details
- Tokens are bearer tokens; anyone with the token can make API calls on your behalf. Treat tokens as sensitive as passwords

---

## 3. Your Company ID

Your company is identified by a `companyId` — a long integer provisioned during onboarding.

**Example:** `companyId = 123456`

Every API path includes your `companyId` as a required segment:

```
/api/v1/{companyId}/products
/api/v1/{companyId}/cases
/api/v1/{companyId}/query
```

The API validates that the `companyId` in the path matches the `companyId` claim in your JWT token. A mismatched or missing `companyId` returns `403 Forbidden`.
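
When a call unexpectedly returns `403 Forbidden`, it can help to inspect the token locally before contacting support. The sketch below decodes the JWT payload without verifying the signature, so it is suitable for diagnostics only. The helper names are hypothetical, and the exact claim spelling (`companyId`) and its type (number or string) are assumptions based on the description above.

```python
import base64
import json


def unverified_claims(token: str) -> dict:
    """Decode a JWT payload WITHOUT verifying its signature (diagnostics only)."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected a JWT with three dot-separated segments")
    # JWT segments are base64url without padding, so restore the padding first
    payload = parts[1] + "=" * (-len(parts[1]) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))


def token_matches_company(token: str, company_id: int) -> bool:
    # Assumption: the claim is named "companyId" and may be a number or a string
    claim = unverified_claims(token).get("companyId")
    return claim is not None and str(claim) == str(company_id)
```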

---

## 4. Products

A **product** is a logical grouping (e.g., `"MORTGAGE"`, `"HELOC"`) that defines the document context and validation policies for a case.

Products are created by NLDS operators during setup; you cannot create new products via the API. Your job is to discover which products are available for your company and use them when creating cases.

### List Available Products

**Endpoint:** `GET /api/v1/{companyId}/products`

**Request:**

```bash
curl -X GET "https://api-nlds.digilytics.solutions/api/v1/123456/products" \
  -H "Authorization: Bearer {access_token}"
```

**Response (200 OK):**

```json
{
  "meta": {
    "code": 200,
    "message": "success",
    "type": "success"
  },
  "data": [
    {
      "productCode": "MORTGAGE",
      "displayName": "Residential Mortgage",
      "description": "Mortgage origination product"
    },
    {
      "productCode": "HELOC",
      "displayName": "Home Equity Line of Credit",
      "description": "HELOC origination product"
    }
  ]
}
```

| Field | Meaning |
|-------|---------|
| `productCode` | Stable string identifier; use this in case creation |
| `displayName` | Human-readable product name |
| `description` | Brief product description |

### Why Products Matter

- **Document validation scope:** Different products have different acceptable document types and extraction schemas
- **Deduplication scope:** A document with ID `DOC-2024-001` is unique *per product*. You can upload the same file under multiple products; NLDS treats them as separate documents
- **Query scope:** When you query a case, NLDS searches only documents belonging to that case's product

---

## 5. Cases

A **case** (identified by `casePublicId` — a UUID) is a product-scoped loan application or file that groups all uploaded documents.

A case is your entry point for the upload and query flows. You create a case, upload documents to it, let NLDS index them, and then query against the case.

### Create a Case

**Endpoint:** `POST /api/v1/{companyId}/cases`

**Request:**

```bash
curl -X POST "https://api-nlds.digilytics.solutions/api/v1/123456/cases" \
  -H "Authorization: Bearer {access_token}" \
  -H "Content-Type: application/json" \
  -d '{
    "productCode": "MORTGAGE",
    "externalCaseId": "LOS-998877"
  }'
```

**Request Body:**

```json
{
  "productCode": "MORTGAGE",
  "externalCaseId": "LOS-998877"
}
```

| Field | Required | Meaning |
|-------|----------|---------|
| `productCode` | Yes | Code from the products list (e.g., `"MORTGAGE"`) |
| `externalCaseId` | No | Your upstream system's case identifier (e.g., loan origination system ID). When supplied, case creation is idempotent: reusing the same `externalCaseId` returns the existing case instead of creating a duplicate. **Recommended.** |

**Response (201 Created):**

```json
{
  "meta": {
    "code": 201,
    "message": "success",
    "type": "success"
  },
  "data": {
    "casePublicId": "8d5a3e7c-9f21-4a2c-b8e1-d7c6f5a4e3b2",
    "productCode": "MORTGAGE",
    "externalCaseId": "LOS-998877",
    "caseStatus": "open",
    "createdAt": "2024-12-15T14:32:00Z",
    "documentSummary": {
      "total": 0,
      "indexed": 0,
      "validation_pending": 0,
      "validation_failed": 0,
      "ingestion_queued": 0
    }
  }
}
```

| Field | Meaning |
|-------|---------|
| `casePublicId` | Your stable reference for this case in all subsequent upload and query calls |
| `caseStatus` | `"open"` = ready for uploads; `"ingesting"` = validation/ingestion in progress; `"ready"` = queryable; `"closed"` = no more uploads allowed |
| `documentSummary` | Status counts for all documents in the case |

### Get Case Details

**Endpoint:** `GET /api/v1/{companyId}/cases/{casePublicId}`

**Request:**

```bash
curl -X GET "https://api-nlds.digilytics.solutions/api/v1/123456/cases/8d5a3e7c-9f21-4a2c-b8e1-d7c6f5a4e3b2" \
  -H "Authorization: Bearer {access_token}"
```

**Response (200 OK):** Same shape as case creation response, with current document status counts.

---

## 6. Uploading Documents

The upload flow is designed for bulk, efficient uploads with built-in deduplication. Documents are uploaded directly to Azure Blob Storage using short-lived SAS URLs; the NLDS API never proxies file bytes.

### 6.1 Check Before You Upload (Deduplication)

Before registering documents, check if they are already known to NLDS. This prevents redundant uploads and saves time.

**What is a document ID?**

`externalDocumentId` is your system's identifier for a document (e.g., `"DOC-CD-2024-001"`). It is **scoped to the product, not to the case**: uniqueness is enforced on the tuple `(companyId, productCode, externalDocumentId)`.

You can upload the same document to multiple cases, but you cannot have two documents with the same `externalDocumentId` within the same product.

**Check if documents are known:**

**Endpoint:** `POST /api/v1/{companyId}/documents:lookup`

**Request:**

```bash
curl -X POST "https://api-nlds.digilytics.solutions/api/v1/123456/documents:lookup" \
  -H "Authorization: Bearer {access_token}" \
  -H "Content-Type: application/json" \
  -d '{
    "externalDocumentIds": [
      "DOC-CD-2024-001",
      "DOC-PN-2024-002",
      "DOC-APPRAISAL-2024-001"
    ]
  }'
```

**Response (200 OK):**

```json
{
  "meta": {
    "code": 200,
    "message": "success",
    "type": "success"
  },
  "data": [
    {
      "externalDocumentId": "DOC-CD-2024-001",
      "documentPublicId": "f2e1d3c4-a5b6-4c7d-8e9f-0a1b2c3d4e5f",
      "uploadStatus": "indexed",
      "validationStatus": "passed",
      "ingestionStatus": "complete"
    },
    {
      "externalDocumentId": "DOC-PN-2024-002",
      "documentPublicId": null,
      "uploadStatus": null,
      "validationStatus": null,
      "ingestionStatus": null
    },
    {
      "externalDocumentId": "DOC-APPRAISAL-2024-001",
      "uploadStatus": "validation_failed",
      "validationStatus": "failed",
      "validationDiagnostics": [
        {
          "code": "FILE_TOO_LARGE",
          "message": "File exceeds 500 MB limit"
        }
      ]
    }
  ]
}
```

**Decision Matrix:**

| Condition | Action |
|------|--------|
| `uploadStatus` is `indexed` | **Skip** — document is fully processed and queryable. No need to re-upload. |
| `uploadStatus` is `validation_failed` | **Re-upload** — the document was rejected. Fix the issue and upload again. |
| `uploadStatus` is `upload_pending` or `validation_pending` or `ingestion_queued` | **Wait** — document is in flight. Check back in a few minutes. |
| `documentPublicId` is `null` | **Register** — document is unknown. Proceed to batch registration (Section 6.2). |
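
The decision matrix can be expressed as one small function applied to each entry of the lookup response. This is a sketch under stated assumptions: `lookup_action` and the returned labels are hypothetical names, and any status not listed in the matrix is reported as `"unknown"` rather than guessed at.

```python
from typing import Any, Dict

# Statuses the matrix describes as "in flight"
IN_FLIGHT = {"upload_pending", "validation_pending", "ingestion_queued"}


def lookup_action(entry: Dict[str, Any]) -> str:
    """Map one documents:lookup entry to skip / re-upload / wait / register."""
    status = entry.get("uploadStatus")
    if status == "indexed":
        return "skip"
    if status == "validation_failed":
        return "re-upload"
    if status in IN_FLIGHT:
        return "wait"
    if entry.get("documentPublicId") is None:
        return "register"
    # Anything else is not covered by the matrix, so surface it to the caller
    return "unknown"
```

Checking `uploadStatus` before `documentPublicId` matters because, as in the sample response above, a failed entry may not carry a `documentPublicId` at all.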

### 6.2 Batch Register Documents

Once you've identified which documents need uploading, batch register them to receive SAS URLs.

**Endpoint:** `POST /api/v1/{companyId}/cases/{casePublicId}/documents:register`

**Request:**

```bash
curl -X POST "https://api-nlds.digilytics.solutions/api/v1/123456/cases/8d5a3e7c-9f21-4a2c-b8e1-d7c6f5a4e3b2/documents:register" \
  -H "Authorization: Bearer {access_token}" \
  -H "Content-Type: application/json" \
  -d '{
    "documents": [
      {
        "originalFilename": "Closing_Disclosure_signed.pdf",
        "externalDocumentId": "DOC-CD-2024-001",
        "contentLength": 1024576,
        "declaredContentType": "application/pdf",
        "sourceSystemPath": "closing-package/CD_signed_final.pdf"
      },
      {
        "originalFilename": "Promissory_Note.pdf",
        "externalDocumentId": "DOC-PN-2024-002",
        "contentLength": 512000,
        "declaredContentType": "application/pdf"
      }
    ]
  }'
```

**Request Body:**

```json
{
  "documents": [
    {
      "originalFilename": "string (required)",
      "externalDocumentId": "string (optional but recommended)",
      "contentLength": "number (optional)",
      "declaredContentType": "string (optional, e.g., 'application/pdf')",
      "sourceSystemPath": "string (optional, e.g., 'folder/file.pdf')"
    }
  ]
}
```

| Field | Meaning |
|-------|---------|
| `originalFilename` | The file name (used for display and traceability) |
| `externalDocumentId` | Your system's document ID. **Recommended** to enable deduplication. If omitted, NLDS assigns an internal ID. |
| `contentLength` | File size in bytes (optional but helpful for validation) |
| `declaredContentType` | MIME type (optional; e.g., `"application/pdf"`, `"application/vnd.ms-excel"`) |
| `sourceSystemPath` | Human-readable path in your system (e.g., `"closing-package/CD_final.pdf"`). Useful for traceability but not used for deduplication. |

**Batch limits:** Up to 100 documents per call. For 200 documents, split into two calls.
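
Splitting a large document list into registration calls is a plain chunking problem. A minimal sketch follows; `batches` and `MAX_BATCH` are hypothetical names, and the limit of 100 is taken from the paragraph above.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")
MAX_BATCH = 100  # batch limit stated above


def batches(items: Sequence[T], size: int = MAX_BATCH) -> Iterator[List[T]]:
    """Yield consecutive slices of at most `size` items, preserving order."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])
```

Each yielded list becomes the `documents` array of one `documents:register` request.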

**Response (200 OK):**

```json
{
  "meta": {
    "code": 200,
    "message": "success",
    "type": "success"
  },
  "data": [
    {
      "documentPublicId": "f2e1d3c4-a5b6-4c7d-8e9f-0a1b2c3d4e5f",
      "externalDocumentId": "DOC-CD-2024-001",
      "sasUploadUrl": "https://nldsstorage.blob.core.windows.net/uploads/f2e1d3c4-a5b6-4c7d-8e9f-0a1b2c3d4e5f?sv=2021-06-08&sig=...",
      "sasExpiresAt": "2024-12-15T14:47:00Z"
    },
    {
      "documentPublicId": "a1b2c3d4-e5f6-4a7b-8c9d-0e1f2a3b4c5d",
      "externalDocumentId": "DOC-PN-2024-002",
      "sasUploadUrl": "https://nldsstorage.blob.core.windows.net/uploads/a1b2c3d4-e5f6-4a7b-8c9d-0e1f2a3b4c5d?sv=2021-06-08&sig=...",
      "sasExpiresAt": "2024-12-15T14:47:00Z"
    }
  ]
}
```

| Field | Meaning |
|-------|---------|
| `documentPublicId` | Your stable reference for this document in polling and query calls |
| `sasUploadUrl` | Short-lived Azure Blob Storage SAS URL (valid for ~15 minutes) |
| `sasExpiresAt` | Timestamp when the SAS URL expires; start your upload before this time |

### 6.3 Upload to Blob Storage

Use the `sasUploadUrl` from the registration response to upload the file directly to Azure Blob Storage. This is a standard Azure Blob PUT operation; the NLDS API is not involved.

**Upload Protocol:**

```
PUT {sasUploadUrl}
x-ms-blob-type: BlockBlob
Content-Type: application/pdf
[file bytes]
```

**cURL example:**

```bash
curl -X PUT \
  -H "x-ms-blob-type: BlockBlob" \
  -H "Content-Type: application/pdf" \
  --data-binary @Closing_Disclosure_signed.pdf \
  "https://nldsstorage.blob.core.windows.net/uploads/f2e1d3c4-a5b6-4c7d-8e9f-0a1b2c3d4e5f?sv=2021-06-08&sig=..."
```

**Python example:**

```python
from azure.storage.blob import BlobClient

sas_url = "https://nldsstorage.blob.core.windows.net/uploads/f2e1d3c4-a5b6-4c7d-8e9f-0a1b2c3d4e5f?sv=2021-06-08&sig=..."
blob_client = BlobClient.from_blob_url(sas_url)

with open("Closing_Disclosure_signed.pdf", "rb") as f:
    blob_client.upload_blob(f, overwrite=True)
```

**Important notes:**

- **Direct upload:** Your system connects directly to Azure Blob Storage; NLDS does not proxy the bytes
- **Parallel uploads:** You can upload multiple documents concurrently (one PUT per document)
- **MIME type:** Set `Content-Type` to match the file type (`application/pdf`, `application/vnd.ms-excel`, etc.)
- **SAS URL expiry:** Upload must start before the SAS URL expires (usually ~15 minutes). If the upload is slow, request new SAS URLs for the same documents
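
Before starting a PUT, a client can check whether the SAS URL still has enough lifetime left, using the `sasExpiresAt` value from the registration response. This is a sketch only: the function names and the two-minute safety margin are assumptions, not NLDS requirements.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def parse_utc(ts: str) -> datetime:
    """Parse a timestamp such as '2024-12-15T14:47:00Z' into an aware datetime."""
    # Older Python versions do not accept a trailing "Z" in fromisoformat
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def sas_still_usable(
    sas_expires_at: str,
    margin: timedelta = timedelta(minutes=2),  # assumed safety margin
    now: Optional[datetime] = None,
) -> bool:
    """True if the SAS URL expires later than `now + margin`."""
    now = now or datetime.now(timezone.utc)
    return now + margin < parse_utc(sas_expires_at)
```

If the check fails, request a fresh SAS URL for that document instead of attempting the upload.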

### 6.4 Signal Upload Complete

After the blob PUT succeeds, tell NLDS that the upload is done. This triggers validation.

**Endpoint:** `POST /api/v1/{companyId}/cases/{casePublicId}/documents/{documentPublicId}:complete-upload`

**Request:**

```bash
curl -X POST "https://api-nlds.digilytics.solutions/api/v1/123456/cases/8d5a3e7c-9f21-4a2c-b8e1-d7c6f5a4e3b2/documents/f2e1d3c4-a5b6-4c7d-8e9f-0a1b2c3d4e5f:complete-upload" \
  -H "Authorization: Bearer {access_token}" \
  -H "Content-Type: application/json" \
  -d '{
    "confirmedContentLength": 1024576,
    "confirmedContentType": "application/pdf"
  }'
```

**Request Body (optional):**

```json
{
  "confirmedContentLength": 1024576,
  "confirmedContentType": "application/pdf"
}
```

**Response (200 OK):**

```json
{
  "meta": {
    "code": 200,
    "message": "success",
    "type": "success"
  },
  "data": {
    "documentPublicId": "f2e1d3c4-a5b6-4c7d-8e9f-0a1b2c3d4e5f",
    "uploadStatus": "upload_pending",
    "validationStatus": "pending"
  }
}
```

The `uploadStatus` changes to `upload_pending`, and NLDS queues the document for validation.

### 6.5 Validation

NLDS validates each document asynchronously. Validation checks:
- MIME type vs. magic bytes (file signature)
- File size bounds (typically < 500 MB)
- PDF page count (typically < 2,000 pages)
- Encrypted PDF detection
- Excel sanity checks (for .xlsx files)

Poll the document status to monitor validation progress:

**Endpoint:** `GET /api/v1/{companyId}/cases/{casePublicId}/documents/{documentPublicId}`

**Request:**

```bash
curl -X GET "https://api-nlds.digilytics.solutions/api/v1/123456/cases/8d5a3e7c-9f21-4a2c-b8e1-d7c6f5a4e3b2/documents/f2e1d3c4-a5b6-4c7d-8e9f-0a1b2c3d4e5f" \
  -H "Authorization: Bearer {access_token}"
```

**Response (200 OK):**

```json
{
  "meta": {
    "code": 200,
    "message": "success",
    "type": "success"
  },
  "data": {
    "documentPublicId": "f2e1d3c4-a5b6-4c7d-8e9f-0a1b2c3d4e5f",
    "uploadStatus": "uploaded",
    "validationStatus": "passed",
    "ingestionStatus": "queued"
  }
}
```

**Validation status values:**

| Status | Meaning | Next Action |
|--------|---------|-------------|
| `pending` | Validation is running | Poll again in 5–10 seconds |
| `passed` | Validation succeeded | Document will be auto-queued for ingestion; you're done |
| `failed` | Validation rejected the document | See `validationDiagnostics` for specific errors; fix and re-upload |
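
The polling behaviour in the table can be sketched as a loop with a timeout. The function below is an illustration, not an official client: `fetch_document` is a hypothetical callable that performs the GET above and returns the `data` object, and the 300-second timeout is an assumed default.

```python
import time
from typing import Any, Callable, Dict


def poll_until_validated(
    fetch_document: Callable[[], Dict[str, Any]],
    interval: float = 5.0,     # within the 5-10 second range suggested above
    timeout: float = 300.0,    # assumed upper bound, adjust to your workload
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, Any]:
    """Call fetch_document until validationStatus is no longer 'pending'."""
    deadline = clock() + timeout
    while True:
        doc = fetch_document()
        if doc.get("validationStatus") != "pending":
            return doc  # either "passed" or "failed"; the caller decides
        if clock() >= deadline:
            raise TimeoutError("validation still pending after timeout")
        sleep(interval)
```

Injecting `sleep` and `clock` keeps the loop testable without real waiting.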

**If validation fails:**

```json
{
  "meta": {
    "code": 200,
    "message": "success",
    "type": "success"
  },
  "data": {
    "documentPublicId": "f2e1d3c4-a5b6-4c7d-8e9f-0a1b2c3d4e5f",
    "uploadStatus": "uploaded",
    "validationStatus": "failed",
    "validationDiagnostics": [
      {
        "code": "FILE_TOO_LARGE",
        "message": "File size (751 MB) exceeds 500 MB limit"
      }
    ]
  }
}
```

Common `validationDiagnostics` codes:
- `FILE_TOO_LARGE` — File exceeds size limit
- `INVALID_PDF_STRUCTURE` — PDF is corrupted or unreadable
- `MIME_MISMATCH` — File extension doesn't match content
- `ENCRYPTED_PDF` — PDF requires a password
- `PAGE_COUNT_EXCEEDED` — PDF has too many pages
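
Some of these rejections can be caught locally before any bytes are uploaded. The sketch below checks file size and the leading file signature. It is not the NLDS validator: `preflight` is a hypothetical helper, the 500 MB constant mirrors the "typically" bound above and may differ per product, and only two well-known signatures are covered (PDF files begin with `%PDF-`, and `.xlsx` files are ZIP containers beginning with `PK\x03\x04`).

```python
from typing import List

MAX_BYTES = 500 * 1024 * 1024  # assumption: mirrors the "typically < 500 MB" bound

# Well-known leading bytes for the two formats this sketch covers
SIGNATURES = {
    "application/pdf": b"%PDF-",
    "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet": b"PK\x03\x04",
}


def preflight(head: bytes, size: int, declared_type: str) -> List[str]:
    """Return diagnostic codes the file would likely trigger; empty if none."""
    problems = []
    if size > MAX_BYTES:
        problems.append("FILE_TOO_LARGE")
    magic = SIGNATURES.get(declared_type)
    # Types without a known signature are skipped rather than rejected
    if magic is not None and not head.startswith(magic):
        problems.append("MIME_MISMATCH")
    return problems
```

`head` only needs the first few bytes of the file, so the check does not require reading the whole document into memory.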

---

## 7. Ingestion (Automatic)

After a document passes validation, NLDS automatically schedules it for ingestion. **You do not need to call an ingestion endpoint.**

Here's what happens:

1. Document validation passes → status becomes `"uploaded"`
2. NLDS schedules a case-level ingestion job automatically
3. The job may wait a few seconds (debounce window) to batch any other documents uploaded concurrently
4. Ingestion runs: OCR, boundary detection, classification, extraction
5. Document status changes: `ingestion_queued` → `indexed`

Monitor ingestion progress via the case status endpoint:

**Endpoint:** `GET /api/v1/{companyId}/cases/{casePublicId}`

**Response excerpt:**

```json
{
  "data": {
    "casePublicId": "8d5a3e7c-9f21-4a2c-b8e1-d7c6f5a4e3b2",
    "caseStatus": "ingesting",
    "documentSummary": {
      "total": 3,
      "indexed": 1,
      "ingestion_queued": 2,
      "validation_pending": 0,
      "validation_failed": 0
    }
  }
}
```

The `documentSummary` shows how many documents are in each state. Poll every 10–30 seconds to track progress.
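
A client can reduce `documentSummary` to a simple "is ingestion finished?" answer. This is a sketch that assumes the counters shown in the excerpt are the only ones that matter; `ingestion_progress` is a hypothetical helper, and a real case may report additional states.

```python
from typing import Any, Dict


def ingestion_progress(summary: Dict[str, int]) -> Dict[str, Any]:
    """Summarize documentSummary counts into done / indexed / failed / pending."""
    total = summary.get("total", 0)
    indexed = summary.get("indexed", 0)
    failed = summary.get("validation_failed", 0)
    pending = summary.get("validation_pending", 0) + summary.get("ingestion_queued", 0)
    return {
        # Done only when every document is accounted for as indexed or failed
        "done": total > 0 and pending == 0 and indexed + failed == total,
        "indexed": indexed,
        "failed": failed,
        "pending": pending,
    }
```

Requiring `indexed + failed == total` keeps the check conservative: documents in a state this sketch does not know about prevent it from reporting completion.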

---

## 8. Querying Documents

Once documents are indexed, you query them by submitting a structured checklist of document types with optional attribute filters.

**Endpoint:** `POST /api/v1/{companyId}/query`

**Request:**

```bash
curl -X POST "https://api-nlds.digilytics.solutions/api/v1/123456/query" \
  -H "Authorization: Bearer {access_token}" \
  -H "Content-Type: application/json" \
  -d '{
    "casePublicId": "8d5a3e7c-9f21-4a2c-b8e1-d7c6f5a4e3b2",
    "checklist": [
      { "docType": "closing_disclosure", "filters": { "isSigned": true, "borrowerName": "Mike Johnson" } },
      { "docType": "promissory_note", "filters": { "isSigned": true } }
    ]
  }'
```

**Request Body:**

```json
{
  "casePublicId": "8d5a3e7c-9f21-4a2c-b8e1-d7c6f5a4e3b2",
  "checklist": [
    {
      "docType": "closing_disclosure",
      "filters": {
        "isSigned": true,
        "borrowerName": "Mike Johnson"
      }
    },
    {
      "docType": "promissory_note",
      "filters": {
        "isSigned": true
      }
    }
  ],
  "previousJobId": null
}
```

| Field | Required | Meaning |
|-------|----------|---------|
| `casePublicId` | Yes | The case whose documents to search |
| `checklist` | Yes | Array of 1–20 document types to find (see below) |
| `checklist[].docType` | Yes | Document type string (e.g., `"closing_disclosure"`, `"promissory_note"`) |
| `checklist[].filters` | No | Attribute filters for this doc type: `isSigned`, `borrowerName`, `docDateFrom`, `docDateTo`, `executionStatus` |
| `previousJobId` | No | Pass the `jobId` from a prior query to carry indexing context forward |
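
Validating the request body on the client side catches mistakes before a query job is created. The sketch below enforces the constraints from the table; `build_query` is a hypothetical helper, and treating the listed filter names as exhaustive is an assumption.

```python
from typing import Any, Dict, List, Optional

# Assumption: the filter names listed in the table are the complete set
ALLOWED_FILTERS = {"isSigned", "borrowerName", "docDateFrom", "docDateTo", "executionStatus"}


def build_query(
    case_public_id: str,
    checklist: List[Dict[str, Any]],
    previous_job_id: Optional[str] = None,
) -> Dict[str, Any]:
    """Return a request body for the query endpoint, or raise ValueError."""
    if not 1 <= len(checklist) <= 20:
        raise ValueError("checklist must contain 1 to 20 items")
    for item in checklist:
        if not item.get("docType"):
            raise ValueError("every checklist item needs a docType")
        unknown = set(item.get("filters", {})) - ALLOWED_FILTERS
        if unknown:
            raise ValueError(f"unsupported filters: {sorted(unknown)}")
    body: Dict[str, Any] = {"casePublicId": case_public_id, "checklist": checklist}
    if previous_job_id is not None:
        body["previousJobId"] = previous_job_id
    return body
```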

**Response (202 Accepted):**

```json
{
  "meta": {
    "code": 202,
    "message": "accepted",
    "type": "success"
  },
  "data": {
    "jobId": "b4c7d9e2-f5a8-4c1d-9e3f-5a2b8c4d7e6f",
    "casePublicId": "8d5a3e7c-9f21-4a2c-b8e1-d7c6f5a4e3b2",
    "status": "processing",
    "createdAt": "2024-12-15T15:00:00Z"
  }
}
```

| Field | Meaning |
|-------|---------|
| `jobId` | Your stable reference for polling or webhook delivery |
| `status` | Initially `"processing"` |
| `createdAt` | Query submission timestamp |

**What happens next:**

The query pipeline runs asynchronously — OCR, extraction, indexing, and semantic search all run in parallel. For a cold case (no documents indexed yet), this can take up to 14 minutes.

Use polling or webhooks to retrieve results (see Section 9).

---

## 9. Getting Results

### 9.1 Polling

Poll the query status endpoint to check progress:

**Endpoint:** `GET /api/v1/{companyId}/query/{jobId}/status`

**Request:**

```bash
curl -X GET "https://api-nlds.digilytics.solutions/api/v1/123456/query/b4c7d9e2-f5a8-4c1d-9e3f-5a2b8c4d7e6f/status" \
  -H "Authorization: Bearer {access_token}"
```

**Response (200 OK):**

```json
{
  "meta": {
    "code": 200,
    "message": "success",
    "type": "success"
  },
  "data": {
    "jobId": "b4c7d9e2-f5a8-4c1d-9e3f-5a2b8c4d7e6f",
    "status": "complete",
    "resultCount": 3,
    "isPartial": false,
    "completedAt": "2024-12-15T15:02:30Z"
  }
}
```

**Status values:**

| Value | Meaning | Next Action |
|-------|---------|-------------|
| `processing` | Pipeline running | Poll again in 5–10 seconds |
| `complete` | All documents indexed; full results available | Fetch results or wait for webhook delivery |
| `partial` | 14-minute timeout fired before all documents indexed | Fetch results; `isPartial: true` indicates timeout occurred |
| `failed` | Unrecoverable error | Inspect error details; retry query if appropriate |

Once status is `complete` or `partial`, fetch results:

**Endpoint:** `GET /api/v1/{companyId}/query/{jobId}/results`

**Request:**

```bash
curl -X GET "https://api-nlds.digilytics.solutions/api/v1/123456/query/b4c7d9e2-f5a8-4c1d-9e3f-5a2b8c4d7e6f/results" \
  -H "Authorization: Bearer {access_token}"
```

**Response (200 OK):**

```json
{
  "meta": {
    "code": 200,
    "message": "success",
    "type": "success"
  },
  "data": {
    "jobId": "b4c7d9e2-f5a8-4c1d-9e3f-5a2b8c4d7e6f",
    "isPartial": false,
    "results": [
      {
        "docType": "closing_disclosure",
        "found": true,
        "resultCount": 1,
        "results": [
          {
            "docId": "doc-12345",
            "docType": "closing_disclosure",
            "fileId": "file-001",
            "externalDocumentId": "DOC-CD-2024-001",
            "sourceSystemPath": "closing-package/CD_signed_final.pdf",
            "pageRanges": ["5-11", "13-20"],
            "needsReview": false,
            "matchedOn": {
              "isSigned": true,
              "borrowerName": "Mike Johnson"
            }
          }
        ]
      },
      {
        "docType": "promissory_note",
        "found": false,
        "resultCount": 0,
        "results": []
      }
    ]
  }
}
```

**Result fields:**

| Field | Meaning |
|-------|---------|
| `results[].docType` | Document type from the checklist request |
| `results[].found` | `true` if at least one matching document was found |
| `results[].resultCount` | Number of matched documents for this checklist item |
| `results[].results` | Array of matched documents (empty array when `found: false`) |
| `results[].results[].docId` | Internal NLDS document identifier (used in the file-access endpoint path; see Section 10) |
| `results[].results[].fileId` | Internal file identifier |
| `results[].results[].externalDocumentId` | Your system's document ID as provided at upload time (null if not supplied) |
| `results[].results[].sourceSystemPath` | Folder/path in your system as provided at upload time (null if not supplied) |
| `results[].results[].pageRanges` | Array of page ranges where this document was found (e.g., `["5-11", "13-20"]`). Multiple entries when the document spans non-contiguous pages. |
| `results[].results[].needsReview` | `true` if extraction confidence was low and manual review is recommended |
| `results[].results[].matchedOn` | The filter values that were evaluated for this match — contains only the keys you specified in `checklist[].filters` (e.g., `isSigned`, `borrowerName`). Empty object if no filters were provided. |
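
Because results are grouped per checklist item, most clients will want to separate matched documents from checklist items that found nothing. A minimal sketch, where `summarize_results` is a hypothetical helper that takes the `data` object of the results response:

```python
from typing import Any, Dict, List, Tuple


def summarize_results(data: Dict[str, Any]) -> Tuple[List[Dict[str, Any]], List[str]]:
    """Return (matched documents, docTypes with no match) from a results payload."""
    matches: List[Dict[str, Any]] = []
    missing: List[str] = []
    for item in data.get("results", []):
        if item.get("found"):
            matches.extend(item.get("results", []))
        else:
            missing.append(item["docType"])
    return matches, missing
```

With the sample response above, this yields one matched `closing_disclosure` document and reports `promissory_note` as missing.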

### 9.2 Webhook (Recommended for Production)

For production systems, webhooks are more efficient than polling. NLDS POSTs query results directly to your endpoint when the query completes.

**Configure your webhook:**

**Endpoint:** `PUT /api/v1/{companyId}/webhooks/query-result`

**Request:**

```bash
curl -X PUT "https://api-nlds.digilytics.solutions/api/v1/123456/webhooks/query-result" \
  -H "Authorization: Bearer {access_token}" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://your-system.example.com/webhooks/nlds-query-result",
    "secret": "your-webhook-signing-secret"
  }'
```

**Request Body:**

```json
{
  "url": "https://your-system.example.com/webhooks/nlds-query-result",
  "secret": "your-webhook-signing-secret"
}
```

| Field | Meaning |
|-------|---------|
| `url` | Your HTTPS endpoint that will receive POST requests |
| `secret` | A shared secret used to sign webhook payloads; use it to verify authenticity |

**Webhook event:**

When a query reaches terminal state (`complete`, `partial`, or `failed`), NLDS POSTs an `NldsQueryResultEvent` to your URL:

```json
{
  "eventType": "NLDOCSEARCH_QUERY_COMPLETED",
  "jobId": "b4c7d9e2-f5a8-4c1d-9e3f-5a2b8c4d7e6f",
  "casePublicId": "8d5a3e7c-9f21-4a2c-b8e1-d7c6f5a4e3b2",
  "isPartial": false,
  "results": [
    {
      "docId": "doc-12345",
      "docType": "closing_disclosure",
      "attributes": {
        "borrowerName": "Mike Johnson",
        "docDate": "2024-11-15",
        "isSigned": true
      }
    }
  ]
}
```

**Webhook headers:**

```
X-Webhook-Signature: sha256=abcdef0123456789abcdef0123456789abcdef0123456789abcdef0123456789
X-Delivery-ID: delivery-12345
```

**Verify the signature:**

Use the `secret` to verify the payload:

```python
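import hashlib
import hmac
import json


def verify_and_parse(raw_body: bytes, signature_header, secret: str):
    """Return the parsed event, or None if the signature is missing or invalid.

    raw_body must be the exact bytes received, before any JSON parsing.
    signature_header is the X-Webhook-Signature value (None if the header is absent).
    """
    # Compute the expected signature over the raw request body
    expected_sig = "sha256=" + hmac.new(
        secret.encode(),
        raw_body,
        hashlib.sha256,
    ).hexdigest()

    # Reject a missing header up front; compare_digest avoids timing leaks
    if not signature_header or not hmac.compare_digest(
        signature_header.encode(), expected_sig.encode()
    ):
        return None  # the caller should respond with HTTP 401

    # Signature valid; parse and process the webhook event
    return json.loads(raw_body)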
```

**Webhook requirements:**

- Your endpoint must return HTTP `200` within 10 seconds
- NLDS retries on non-200 responses or timeout (exponential backoff, max 3 attempts)
- Treat the webhook payload as the source of truth; no need to poll the results endpoint after webhook delivery
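
Because NLDS retries deliveries, a handler may see the same event more than once, for example when its `200` response is lost or arrives late. The sketch below skips repeats by remembering delivery IDs. It rests on an assumption the guide does not confirm: that a retried delivery carries the same `X-Delivery-ID` as the original attempt. `handle_delivery` and `process` are hypothetical names, and the in-memory set would need to be replaced by a shared store in a multi-instance deployment.

```python
from typing import Any, Callable, Dict, Set


def handle_delivery(
    seen: Set[str],
    delivery_id: str,
    event: Dict[str, Any],
    process: Callable[[Dict[str, Any]], None],
) -> int:
    """Process an event at most once per delivery ID; return the HTTP status."""
    if delivery_id not in seen:
        # If process raises, the ID is not recorded, so a retry is handled again
        process(event)
        seen.add(delivery_id)
    # Acknowledge repeats as well so NLDS stops retrying
    return 200
```

Heavy work is best handed to a background queue inside `process`, so the endpoint can still respond within the 10-second window.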

---

## 10. Opening a Document

Search results include document IDs, but not direct URLs. To access a document, request a short-lived access URL:

**Endpoint:** `GET /api/v1/{companyId}/documents/{docId}/file-access`

**Request:**

```bash
curl -X GET "https://api-nlds.digilytics.solutions/api/v1/123456/documents/doc-12345/file-access" \
  -H "Authorization: Bearer {access_token}"
```

**Response (200 OK):**

```json
{
  "meta": {
    "code": 200,
    "message": "success",
    "type": "success"
  },
  "data": {
    "accessUrl": "https://nldsstorage.blob.core.windows.net/documents/doc-12345?sv=2021-06-08&sig=...",
    "expiresAt": "2024-12-15T15:15:00Z"
  }
}
```

| Field | Meaning |
|-------|---------|
| `accessUrl` | Short-lived SAS URL to the document (valid for ~15 minutes) |
| `expiresAt` | When the URL expires; request a new one if needed |

**Important notes:**

- Do not persist or share `accessUrl` — it is short-lived and single-use
- Request a new URL every time the user wants to open a document
- The URL can be used in a browser `<a href>` or downloaded directly with curl/wget

---

## 11. Complete Flow Summary

Here is the entire integration journey from start to finish, using the LOS-998877 mortgage application scenario:

1. **Get a token** (`POST https://authx.digilytics.com/oauth2/token`) using the `client_credentials` grant; the token is valid for 30 minutes
2. **List products** (`GET /api/v1/123456/products`) — discover available products (e.g., `MORTGAGE`)
3. **Create a case** (`POST /api/v1/123456/cases`) with `productCode: "MORTGAGE"` and `externalCaseId: "LOS-998877"` — receive `casePublicId`
4. **Check documents before upload** (`POST /api/v1/123456/documents:lookup`) with list of `externalDocumentIds` — identify which are new
5. **Register new documents** (`POST /api/v1/123456/cases/{casePublicId}/documents:register`) — receive SAS URLs
6. **Upload to Blob Storage** (PUT to each `sasUploadUrl`) with file bytes — direct to Azure, not through NLDS
7. **Signal upload complete** (`POST /api/v1/123456/cases/{casePublicId}/documents/{documentPublicId}:complete-upload`) — triggers validation
8. **Poll document status** (`GET /api/v1/123456/cases/{casePublicId}/documents/{documentPublicId}`) until `validationStatus: "passed"`
9. **Monitor ingestion** (`GET /api/v1/123456/cases/{casePublicId}`) — watch `documentSummary` counts until all are `indexed`
10. **Submit query** (`POST /api/v1/123456/query`) with `casePublicId` and a `checklist` of doc types with optional filters — receive `jobId`
11. **Poll query status** (`GET /api/v1/123456/query/{jobId}/status`) until `status: "complete"` or `"partial"`
12. **Fetch results** (`GET /api/v1/123456/query/{jobId}/results`) — receive documents grouped by checklist item with `matchedOn` filter values
13. **Request document access** (`GET /api/v1/123456/documents/{docId}/file-access`) for each result — get short-lived URL
14. **Open document** (browser or download) using the access URL
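
Steps 8, 9, and 11 share one pattern: call an endpoint repeatedly until it reports a terminal state or a deadline passes. A minimal sketch of that pattern follows. The `poll_until` helper is hypothetical (it is not part of any NLDS SDK), and the `sleep` parameter is injectable only so the loop can be exercised without real waiting.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def poll_until(
    fetch: Callable[[], T],
    is_done: Callable[[T], bool],
    interval: float = 5.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch() every `interval` seconds until is_done(result) is true.

    Raises TimeoutError once the accumulated sleep time reaches `timeout`
    without a terminal result.
    """
    waited = 0.0
    while True:
        result = fetch()
        if is_done(result):
            return result
        if waited >= timeout:
            raise TimeoutError(f"no terminal state after {timeout} seconds")
        sleep(interval)
        waited += interval
```

For step 11, `fetch` would wrap the `GET /query/{jobId}/status` call and `is_done` would check whether the returned status is `complete` or `partial`.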

---

## 12. Polling Code Example

Here is a Python example of the complete client flow, including the polling loops, from case creation through query result retrieval and document access:

```python
import requests
import time

# === Configuration ===
COMPANY_ID = 123456
AUTHX_TOKEN_ENDPOINT = "https://authx.digilytics.com/oauth2/token"
NLDS_API_BASE = "https://api-nlds.digilytics.solutions/api/v1"
CLIENT_ID = "your-client-id"
CLIENT_SECRET = "your-client-secret"

# === Helper: Get or refresh token ===
def get_token():
    """Obtain a fresh OAuth token."""
    response = requests.post(
        AUTHX_TOKEN_ENDPOINT,
        data={
            "grant_type": "client_credentials",
            "client_id": CLIENT_ID,
            "client_secret": CLIENT_SECRET,
            "scope": "nldocsearch.api",
        },
    )
    response.raise_for_status()
    token = response.json()["access_token"]
    return token

# === Helper: Common headers ===
def headers_for_token(token):
    return {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }

# === 1. Create case ===
token = get_token()
case_response = requests.post(
    f"{NLDS_API_BASE}/{COMPANY_ID}/cases",
    headers=headers_for_token(token),
    json={
        "productCode": "MORTGAGE",
        "externalCaseId": "LOS-998877",
    },
)
case_response.raise_for_status()
case_id = case_response.json()["data"]["casePublicId"]
print(f"Created case: {case_id}")

# === 2. Register documents (batch) ===
docs_to_register = [
    {
        "originalFilename": "Closing_Disclosure_signed.pdf",
        "externalDocumentId": "DOC-CD-2024-001",
        "contentLength": 1024576,
        "declaredContentType": "application/pdf",
    },
    {
        "originalFilename": "Promissory_Note.pdf",
        "externalDocumentId": "DOC-PN-2024-002",
        "contentLength": 512000,
        "declaredContentType": "application/pdf",
    },
]

register_response = requests.post(
    f"{NLDS_API_BASE}/{COMPANY_ID}/cases/{case_id}/documents:register",
    headers=headers_for_token(token),
    json={"documents": docs_to_register},
)
register_response.raise_for_status()
registered_docs = register_response.json()["data"]
print(f"Registered {len(registered_docs)} documents")

# === 3. Upload each document to Blob Storage ===

for i, doc in enumerate(registered_docs):
    file_path = f"./{docs_to_register[i]['originalFilename']}"
    with open(file_path, "rb") as f:
        file_bytes = f.read()
    
    upload_response = requests.put(
        doc["sasUploadUrl"],
        data=file_bytes,
        headers={
            "Content-Type": docs_to_register[i]["declaredContentType"],
            # Azure's Put Blob operation requires the blob type header
            "x-ms-blob-type": "BlockBlob",
        },
    )
    upload_response.raise_for_status()
    print(f"Uploaded {doc['externalDocumentId']}")
    
    # === 4. Signal upload complete ===
    complete_response = requests.post(
        f"{NLDS_API_BASE}/{COMPANY_ID}/cases/{case_id}/documents/{doc['documentPublicId']}:complete-upload",
        headers=headers_for_token(token),
        json={
            "confirmedContentLength": len(file_bytes),
            "confirmedContentType": docs_to_register[i]["declaredContentType"],
        },
    )
    complete_response.raise_for_status()
    print(f"Signaled upload complete for {doc['externalDocumentId']}")

# === 5. Poll for validation completion ===
all_validated = False
max_wait = 300  # 5 minutes
elapsed = 0
while not all_validated and elapsed < max_wait:
    validated_count = 0
    for doc in registered_docs:
        doc_status = requests.get(
            f"{NLDS_API_BASE}/{COMPANY_ID}/cases/{case_id}/documents/{doc['documentPublicId']}",
            headers=headers_for_token(token),
        )
        doc_status.raise_for_status()
        data = doc_status.json()["data"]
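        validation_status = data.get("validationStatus")
        if validation_status == "passed":
            validated_count += 1
        elif validation_status == "failed":
            # Stop immediately: polling a failed document would only run into the timeout
            print(data.get("validationDiagnostics"))
            raise RuntimeError(f"Validation failed for {doc['externalDocumentId']}")

    if validated_count == len(registered_docs):
        all_validated = True
        print("All documents validated successfully")
    else:
        print(f"Validation progress: {validated_count}/{len(registered_docs)}")
        time.sleep(5)
        elapsed += 5

# Do not continue with documents that never finished validation
if not all_validated:
    raise TimeoutError(f"Validation did not finish within {max_wait} seconds")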

# === 6. Poll for ingestion completion (optional) ===
all_indexed = False
max_wait = 300  # 5 minutes
elapsed = 0
while not all_indexed and elapsed < max_wait:
    case_status = requests.get(
        f"{NLDS_API_BASE}/{COMPANY_ID}/cases/{case_id}",
        headers=headers_for_token(token),
    )
    case_status.raise_for_status()
    summary = case_status.json()["data"]["documentSummary"]
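    if summary["indexed"] == len(registered_docs):
        all_indexed = True
        print("All documents indexed successfully")
    else:
        print(f"Ingestion progress: {summary['indexed']}/{len(registered_docs)}")
        time.sleep(10)
        elapsed += 10

# This step is optional, so a timeout only produces a warning
if not all_indexed:
    print(f"Warning: ingestion did not finish within {max_wait} seconds; querying anyway")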

# === 7. Submit checklist query ===
query_response = requests.post(
    f"{NLDS_API_BASE}/{COMPANY_ID}/query",
    headers=headers_for_token(token),
    json={
        "casePublicId": case_id,
        "checklist": [
            {"docType": "closing_disclosure", "filters": {"isSigned": True, "borrowerName": "Mike Johnson"}},
            {"docType": "promissory_note", "filters": {"isSigned": True}},
        ],
    },
)
query_response.raise_for_status()
job_id = query_response.json()["data"]["jobId"]
print(f"Submitted checklist query: {job_id}")

# === 8. Poll for query completion ===
query_complete = False
max_wait = 900  # 15 minutes
elapsed = 0
while not query_complete and elapsed < max_wait:
    status_response = requests.get(
        f"{NLDS_API_BASE}/{COMPANY_ID}/query/{job_id}/status",
        headers=headers_for_token(token),
    )
    status_response.raise_for_status()
    data = status_response.json()["data"]
    status = data.get("status")
    
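    if status in ["complete", "partial", "failed"]:
        query_complete = True
        print(f"Query finished with status: {status}")
        if data.get("isPartial"):
            print("Warning: results may be incomplete (14-minute timeout)")
    else:
        print(f"Query status: {status}")
        time.sleep(5)
        elapsed += 5

# Do not fetch results for a job that failed or never reached a terminal status
if not query_complete:
    raise TimeoutError(f"Query {job_id} did not finish within {max_wait} seconds")
if status == "failed":
    raise RuntimeError(f"Query {job_id} failed")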

# === 9. Fetch and display results ===
results_response = requests.get(
    f"{NLDS_API_BASE}/{COMPANY_ID}/query/{job_id}/results",
    headers=headers_for_token(token),
)
results_response.raise_for_status()
checklist_results = results_response.json()["data"]["results"]

print(f"\n=== Query Results ({len(checklist_results)} checklist items) ===")
first_doc = None
for item in checklist_results:
    print(f"DocType: {item['docType']} — found={item['found']} ({item['resultCount']} result(s))")
    for doc in item["results"]:
        print(f"  Pages: {', '.join(doc['pageRanges'])}")
        print(f"  Matched on: {doc['matchedOn']}")
        print(f"  Needs review: {doc['needsReview']}")
        if first_doc is None:
            first_doc = doc
    print()

# === 10. Request document access URL ===
if first_doc:  # Just the first matched document
    access_response = requests.get(
        f"{NLDS_API_BASE}/{COMPANY_ID}/documents/{first_doc['docId']}/file-access",
        headers=headers_for_token(token),
    )
    access_response.raise_for_status()
    access = access_response.json()["data"]  # Parse the response body once
    print(f"Document access URL (expires at {access['expiresAt']}): {access['accessUrl']}")
```
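
The example above requests one token and reuses it for every call. The token is valid for 30 minutes, while the three polling loops can wait up to 25 minutes in total, so a long run may outlive its token. One way to handle this is a small cache that fetches a new token shortly before the old one expires. The sketch below is an illustration under stated assumptions: `TokenCache`, the 60-second refresh margin, and the fixed 1800-second lifetime are hypothetical, and a real client might prefer an expiry value from the token response if AuthX returns one.

```python
import time
from typing import Callable, Optional


class TokenCache:
    """Caches a bearer token and refreshes it shortly before it expires."""

    def __init__(
        self,
        fetch_token: Callable[[], str],
        lifetime_seconds: float = 1800.0,  # assumed: 30-minute token lifetime
        margin_seconds: float = 60.0,      # assumed: refresh one minute early
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch_token = fetch_token
        self._lifetime = lifetime_seconds
        self._margin = margin_seconds
        self._clock = clock
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        """Return the cached token, fetching a new one if missing or near expiry."""
        now = self._clock()
        if self._token is None or now >= self._expires_at - self._margin:
            self._token = self._fetch_token()
            self._expires_at = now + self._lifetime
        return self._token
```

In the example above, creating `cache = TokenCache(get_token)` once and passing `cache.get()` to `headers_for_token` on each request would replace the single `token = get_token()` call.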

---

## Next Steps

- Review the [OpenAPI specification](../openapi.yaml) for detailed endpoint definitions
- Check the [API reference documentation](../) for complete field specifications
- Consult the [AuthX developer guide](https://authx.digilytics.com/docs) for JWKS validation and advanced OAuth topics
- Contact support at api-support@digilytics.com for integration questions
