Uploads & formats

TechDocChat ingests text from your files, chunks it, embeds it, and stores vectors for semantic search. This page describes what you can upload and what happens under the hood.

Supported formats

The uploader generally accepts common text-bearing formats, including:

Plain text (.txt)
Markdown (.md)
CSV (.csv)
JSON (.json)
HTML (.html)
PDF (.pdf) — text extraction quality varies; text-heavy PDFs work best

If extraction yields no text (for example, a scanned PDF without OCR), indexing may fail and the document record will show an error status. In that case, try exporting text from your source tool or uploading Markdown instead.
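As a rough illustration of the two failure modes above (an unlisted format, or a file that yields no text), a pre-upload check might look like the sketch below. The function name `precheck`, the returned status strings, and the idea of checking by file extension are assumptions made for this example, not TechDocChat's actual behavior. The extension set only mirrors the formats listed on this page, and the uploader may accept others.

```python
from pathlib import Path

# Extensions taken from the list on this page; treating it as exhaustive
# is an assumption made for the sake of the example.
SUPPORTED_EXTENSIONS = {".txt", ".md", ".csv", ".json", ".html", ".pdf"}


def precheck(filename: str, extracted_text: str) -> str:
    """Return a hypothetical status string for a file before indexing."""
    suffix = Path(filename).suffix.lower()
    if suffix not in SUPPORTED_EXTENSIONS:
        return "unsupported format"
    # A scanned PDF without OCR typically extracts to empty or whitespace-only text.
    if not extracted_text.strip():
        return "no extractable text"
    return "ok"
```

Calling `precheck("scan.pdf", "")` under these assumptions reports that no text was extracted, which is the case where exporting text or uploading Markdown is the suggested workaround.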

Size & practical limits

Each subscription includes up to 5 GB of stored document files (total across all uploads in the organization). Very large documents are split into many chunks for search; extremely large uploads can still hit worker timeouts depending on environment configuration.
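To make the storage limit concrete, here is a minimal sketch of a quota check against the 5 GB total. It assumes decimal gigabytes (1 GB = 1,000,000,000 bytes), which this page does not specify. The names `QUOTA_BYTES`, `remaining_bytes`, and `can_upload` are hypothetical and not part of any TechDocChat API.

```python
# Assumption: "5 GB" means 5 * 10^9 bytes; the service may count differently.
QUOTA_BYTES = 5 * 1000**3


def remaining_bytes(current_usage_bytes: int, quota_bytes: int = QUOTA_BYTES) -> int:
    """Bytes still available to the organization, never below zero."""
    if current_usage_bytes < 0:
        raise ValueError("usage cannot be negative")
    return max(quota_bytes - current_usage_bytes, 0)


def can_upload(current_usage_bytes: int, new_file_bytes: int,
               quota_bytes: int = QUOTA_BYTES) -> bool:
    """True if the new file fits in the organization-wide total."""
    if new_file_bytes < 0:
        raise ValueError("file size cannot be negative")
    return new_file_bytes <= remaining_bytes(current_usage_bytes, quota_bytes)
```

Because the limit applies to the whole organization, `current_usage_bytes` here would be the sum of all stored files across every user's uploads, not one user's share.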

Indexing pipeline

Upload — file bytes are written to object storage with an org-scoped key.
Extract — text is pulled from the file using format-specific logic.
Chunk — text is split into segments suitable for embedding.
Embed & store — vectors are written to a vector index keyed by document and chunk.
Status — the document record shows indexed chunk counts or an error string.
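The chunk, embed, store, and status steps above can be sketched in a few lines. This is an illustration under stated assumptions, not the service's implementation: the fixed-size overlapping character windows, the `toy_embed` stand-in for a real embedding model, the in-memory dict used as a vector index, and the `DocumentRecord` fields are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


def chunk_text(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping character windows (illustrative strategy)."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


def toy_embed(chunk: str, dims: int = 8) -> list[float]:
    """Stand-in for a real embedding model: an L2-normalized byte histogram."""
    vec = [0.0] * dims
    for b in chunk.encode("utf-8"):
        vec[b % dims] += 1.0
    norm = sum(v * v for v in vec) ** 0.5 or 1.0
    return [v / norm for v in vec]


@dataclass
class DocumentRecord:
    doc_id: str
    status: str = "pending"
    indexed_chunks: int = 0
    error: Optional[str] = None


def index_document(doc_id: str, text: str, vector_index: dict) -> DocumentRecord:
    """Chunk, embed, and store vectors keyed by (document, chunk index)."""
    record = DocumentRecord(doc_id)
    if not text.strip():
        record.status = "error"
        record.error = "no extractable text"
        return record
    for i, chunk in enumerate(chunk_text(text)):
        vector_index[(doc_id, i)] = toy_embed(chunk)
        record.indexed_chunks += 1
    record.status = "indexed"
    return record
```

Keying each vector by document and chunk index, as the pipeline description states, is what lets a later delete find every vector that belongs to one document.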

Deleting documents

Deleting removes the file from storage and attempts to remove associated vectors so answers do not reference stale content. Large documents with many chunks may take a moment to fully purge from the index.
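A minimal sketch of that delete flow follows, assuming vectors are keyed by a (document ID, chunk index) pair and using plain dicts in place of real object storage and a real vector index. The function name `delete_document` and both data structures are hypothetical.

```python
def delete_document(doc_id: str, object_store: dict, vector_index: dict) -> int:
    """Remove the stored file, then purge every vector keyed by this document.

    Returns the number of vectors removed.
    """
    # Deleting a file that is already gone is treated as a no-op.
    object_store.pop(doc_id, None)
    # Collect keys first so the dict is not mutated while iterating over it.
    stale_keys = [key for key in vector_index if key[0] == doc_id]
    for key in stale_keys:
        del vector_index[key]
    return len(stale_keys)
```

The purge cost grows with the number of chunks, which is consistent with large documents taking longer to disappear from search results.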
