Document Types
Alloovium automatically classifies documents to optimise analysis. Learn what types are supported and how the processing pipeline works.
Document Types
Alloovium automatically classifies uploaded documents into categories to help with organisation and to optimise analysis. Common types include contracts, specifications, drawings, reports, and correspondence. You can also assign a custom type to any document.
| Type | Description | Best used for |
|---|---|---|
| Contract | Legal agreements and subcontracts | Payment terms, obligations, liquidated damages |
| Specification | Technical and performance specs | Scope verification, compliance checking |
| Drawing | Engineering and architectural drawings | Clash detection, revision tracking |
| Report | Site reports, inspection reports | Issue tracking, defect management |
| Correspondence | Emails, RFIs, instructions | Notice tracking, variation history |
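As a rough illustration of how automatic classification with a custom override might behave, here is a minimal sketch in Python. It is not Alloovium's actual classifier: the keyword lists, the `Document` class, the `classify` function, and the `"Unclassified"` fallback are all hypothetical names invented for this example.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical keyword lists, one per built-in type from the table above.
KEYWORDS = {
    "Contract": ["agreement", "subcontract", "liquidated damages"],
    "Specification": ["specification", "performance requirement"],
    "Drawing": ["drawing no", "revision", "scale 1:"],
    "Report": ["inspection", "site report", "defect"],
    "Correspondence": ["dear", "rfi", "regards"],
}


@dataclass
class Document:
    name: str
    text: str
    custom_type: Optional[str] = None  # user-assigned type, if any


def classify(doc: Document) -> str:
    """Return the custom type if set, else the type with the most keyword hits."""
    if doc.custom_type:
        return doc.custom_type
    text = doc.text.lower()
    scores = {
        doc_type: sum(text.count(keyword) for keyword in keywords)
        for doc_type, keywords in KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    # Assumed fallback label when no keyword matches at all.
    return best if scores[best] > 0 else "Unclassified"
```

A production classifier would likely rely on a trained model rather than keyword counts; the point here is only that a user-assigned custom type takes precedence over the automatic guess.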
Processing Pipeline
Understanding how Alloovium processes your documents helps you get the most out of the platform. Each document goes through the following stages:
- Text extraction — Alloovium reads the raw text from each page. For scanned PDFs and images, GPU-accelerated OCR is applied automatically.
- Layout analysis — The document structure is analysed: headings, tables, figures, and paragraph blocks are identified and tagged.
- Chunking — The document is split into semantically meaningful segments (typically 200–500 tokens each) that are small enough for precise retrieval.
- Embedding — Each chunk is converted into a high-dimensional vector embedding by a model trained on technical and legal language.
- Indexing — Embeddings are stored in a vector index for fast similarity search, enabling sub-second retrieval during AI queries.
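The chunking, embedding, and indexing stages above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Alloovium's implementation: chunks are cut at a fixed word count rather than at semantic boundaries, whitespace-separated words stand in for model tokens, `embed` is a hashed bag-of-words stand-in for a trained embedding model, and the "index" is a plain list searched by brute force. All function names are hypothetical.

```python
import hashlib
import math
import re

DIM = 64  # toy embedding size; trained models use far more dimensions


def chunk(text: str, max_tokens: int = 300) -> list[str]:
    """Split text into chunks of at most max_tokens whitespace-separated words."""
    words = text.split()
    return [" ".join(words[i:i + max_tokens]) for i in range(0, len(words), max_tokens)]


def embed(text: str) -> list[float]:
    """Toy stand-in for a trained model: hashed bag-of-words, L2-normalised."""
    vec = [0.0] * DIM
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        digest = hashlib.md5(word.encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "big") % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # avoid dividing by zero
    return [v / norm for v in vec]


def build_index(chunks: list[str]) -> list[tuple[str, list[float]]]:
    """Store each chunk next to its embedding (a list stands in for a vector index)."""
    return [(c, embed(c)) for c in chunks]


def search(index: list[tuple[str, list[float]]], query: str, top_k: int = 1) -> list[str]:
    """Return the top_k chunks ranked by cosine similarity to the query."""
    q = embed(query)
    scored = [(sum(a * b for a, b in zip(q, vec)), c) for c, vec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:top_k]]
```

Because the vectors are normalised, the dot product in `search` equals cosine similarity. A real vector index would replace the linear scan with an approximate nearest-neighbour structure, which is what makes sub-second retrieval feasible on large document sets.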
GPU extraction
Alloovium uses GPU-accelerated document extraction for complex PDFs and engineering drawings. This means higher accuracy on scanned documents, tables with merged cells, and multi-column layouts.