Uploads & formats
TechDocChat ingests text from your files, chunks it, embeds it, and stores vectors for semantic search. This page describes what you can upload and what happens under the hood.
Supported formats
The uploader generally accepts common text-bearing formats, including:
- Plain text (.txt)
- Markdown (.md)
- CSV (.csv)
- JSON (.json)
- HTML (.html)
- PDF (.pdf): text extraction quality varies; text-heavy PDFs work best
If extraction yields no text (for example, a scanned PDF without OCR), indexing may fail, and the document's status will show the error. In that case, try exporting the text from your source tool or uploading a Markdown version instead.
Size & practical limits
Each subscription includes up to 5 GB of stored document files, counted across all uploads in the organization. Very large documents are split into many chunks for search. Even within the storage limit, an extremely large single upload can still hit a worker timeout, depending on how the environment is configured.
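Checking whether a new upload fits under the organization-wide limit is simple arithmetic. The sketch below assumes 1 GB = 10^9 bytes (the product could instead count 2^30 bytes) and that the sizes of the organization's existing files are known. The function names are hypothetical.

```python
from typing import List

# Assumption: "5 GB" is read as 5 * 10**9 bytes; the product may use 2**30 per GB.
QUOTA_BYTES = 5 * 10**9


def remaining_bytes(existing_sizes: List[int], quota_bytes: int = QUOTA_BYTES) -> int:
    """Bytes still available to the organization (never negative)."""
    return max(0, quota_bytes - sum(existing_sizes))


def fits_in_quota(existing_sizes: List[int], new_file_size: int,
                  quota_bytes: int = QUOTA_BYTES) -> bool:
    """True if adding the new file keeps the org's total at or under the quota."""
    return sum(existing_sizes) + new_file_size <= quota_bytes
```

For example, with 4 GB already stored, a 1 GB file still fits exactly, while a 2 GB file would be rejected.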
Indexing pipeline
- Upload — file bytes are written to object storage with an org-scoped key.
- Extract — text is pulled from the file using format-specific logic.
- Chunk — text is split into segments suitable for embedding.
- Embed & store — vectors are written to a vector index keyed by document and chunk.
- Status — the document record shows indexed chunk counts or an error string.
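The five steps above can be sketched end to end as a minimal, in-memory illustration. This is not TechDocChat's implementation: the `orgs/{org_id}/docs/...` key layout, the 500-character chunks with 50-character overlap, and the hash-based `embed` function (a stand-in for a real embedding model) are all hypothetical. Only plain-text formats are decoded here; HTML and PDF would need format-specific extractors that the sketch leaves out, so they come back as failures.

```python
import hashlib
import math
import os
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# In-memory stand-ins for object storage and the vector index (hypothetical).
object_store: Dict[str, bytes] = {}
vector_index: Dict[Tuple[str, int], List[float]] = {}

# Only plain-text formats are decoded in this sketch.
PLAIN_TEXT_EXTENSIONS = {".txt", ".md", ".csv", ".json"}


@dataclass
class DocumentRecord:
    doc_id: str
    status: str = "pending"
    chunk_count: int = 0
    error: Optional[str] = None


def extract_text(filename: str, data: bytes) -> str:
    ext = os.path.splitext(filename)[1].lower()
    if ext not in PLAIN_TEXT_EXTENSIONS:
        raise ValueError(f"no extractor for {ext!r} in this sketch")
    return data.decode("utf-8")  # UnicodeDecodeError is a ValueError subclass


def chunk_text(text: str, size: int = 500, overlap: int = 50) -> List[str]:
    """Split text into fixed-size character windows that overlap slightly."""
    chunks: List[str] = []
    start = 0
    while start < len(text):
        end = min(start + size, len(text))
        chunks.append(text[start:end])
        if end == len(text):
            break
        start = end - overlap  # step back so context spans chunk boundaries
    return chunks


def embed(chunk: str, dim: int = 64) -> List[float]:
    """Toy hashed bag-of-words vector; a real system would call an embedding model."""
    vec = [0.0] * dim
    for token in chunk.lower().split():
        vec[int(hashlib.sha256(token.encode()).hexdigest(), 16) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def ingest(org_id: str, doc_id: str, filename: str, data: bytes) -> DocumentRecord:
    record = DocumentRecord(doc_id=doc_id)
    # 1. Upload: write the raw bytes under an org-scoped key.
    object_store[f"orgs/{org_id}/docs/{doc_id}/{filename}"] = data
    try:
        # 2. Extract
        text = extract_text(filename, data)
        if not text.strip():
            raise ValueError("extraction produced no text")
        # 3. Chunk
        chunks = chunk_text(text)
        # 4. Embed & store, keyed by (document, chunk index)
        for i, chunk in enumerate(chunks):
            vector_index[(doc_id, i)] = embed(chunk)
        # 5. Status: record the indexed chunk count...
        record.status, record.chunk_count = "indexed", len(chunks)
    except ValueError as exc:
        # ...or the error string.
        record.status, record.error = "failed", str(exc)
    return record
```

A Markdown file comes back with status "indexed" and its chunk count, while an empty file or a PDF comes back "failed" with an error string. The overlap between chunks is a common design choice so that a sentence cut at a boundary still appears intact in at least one chunk.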
Tags
You can attach comma-separated tags at upload time. Tags help humans filter and organize; they may also be surfaced in the UI alongside semantic search. They are not a substitute for good titles and folder structure.
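A minimal sketch of handling the comma-separated tag string and filtering by tag. How TechDocChat actually normalizes tags is not documented here, so trimming whitespace, dropping empty entries, preserving case, and removing exact duplicates are assumptions, as are the function names.

```python
from typing import Dict, List


def parse_tags(raw: str) -> List[str]:
    """Split a comma-separated tag string (assumed normalization: trim, drop
    empties, keep case, remove exact duplicates in first-seen order)."""
    tags: List[str] = []
    for part in raw.split(","):
        tag = part.strip()
        if tag and tag not in tags:
            tags.append(tag)
    return tags


def filter_by_tag(docs: Dict[str, List[str]], tag: str) -> List[str]:
    """Return the sorted IDs of documents carrying the given tag."""
    return sorted(doc_id for doc_id, tags in docs.items() if tag in tags)
```

For example, `parse_tags(" api, onboarding,,api , v2 ")` yields `["api", "onboarding", "v2"]`.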
Deleting documents
Deleting removes the file from storage and attempts to remove associated vectors so answers do not reference stale content. Large documents with many chunks may take a moment to fully purge from the index.
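A minimal sketch of the delete flow, using plain dictionaries as hypothetical stand-ins for object storage and the vector index. The `orgs/{org_id}/docs/{doc_id}/{filename}` key layout, the `(doc_id, chunk_index)` vector keys, and the batch size are assumptions; purging in batches is one plausible reason a document with many chunks takes longer to disappear from the index.

```python
from typing import Dict, List, Tuple


def delete_document(
    org_id: str,
    doc_id: str,
    filename: str,
    object_store: Dict[str, bytes],
    vector_index: Dict[Tuple[str, int], List[float]],
    batch_size: int = 100,
) -> int:
    """Remove the stored file, then purge the document's vectors in batches.

    Returns the number of vectors removed.
    """
    # Remove the file bytes; pop with a default makes a repeated delete harmless.
    object_store.pop(f"orgs/{org_id}/docs/{doc_id}/{filename}", None)

    # Collect every vector key belonging to this document before mutating.
    chunk_keys = [key for key in vector_index if key[0] == doc_id]

    removed = 0
    # Purge in batches; a real vector store would take one delete call per batch.
    for start in range(0, len(chunk_keys), batch_size):
        for key in chunk_keys[start:start + batch_size]:
            if vector_index.pop(key, None) is not None:
                removed += 1
    return removed
```

Vectors for other documents are left untouched, so answers drawn from the remaining content are unaffected.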