Stop parsing with prompts
Use models for judgement and reasoning, not routine document extraction.
Why burn an LLM call on parsing? vena8 lets agents inspect, extract, filter and retrieve the parts of complex files they actually need.
Most agentic workflows still waste time getting files into a usable shape. vena8 moves extraction out of the prompt and into a purpose-built API that exposes the internal structure of files before the agent decides what to read next.
Different file types map to one stable data model: files, text, tables, assets, items, metadata and errors.
Surface embedded files, images, tables, emails, folders, attachments and metadata before the agent acts.
When something cannot be processed, the workflow gets a structured reason instead of bad context.
Send a file. Get back a map of the content: clean text, document structure, tables, assets, embedded items, metadata and searchable chunks.
The important part: this is not limited to one output format. JSON is the control plane; selected content can later be returned as text, extracted files, table rows, chunks, or Parquet-ready data.
{
  "file": {
    "name": "claims-mailbox.pst",
    "type": "container",
    "bytes": 894120392
  },
  "content": {
    "text": "available",
    "tables": [/* sheets, CSVs, extracted tables */],
    "items": [/* emails, attachments, nested files */]
  },
  "query": {
    "filterable": ["sentAt", "from", "folder", "hasAttachments"],
    "expand": "on_demand"
  },
  "exports": ["json", "text", "assets", "parquet"]
}
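As a rough illustration of how an agent might use that map before reading any content, here is a minimal Python sketch. It assumes a response shaped like the sample above, with the placeholder comments replaced by empty lists so that it parses as plain JSON. The `plan_next_step` helper and the keys of the plan it returns are hypothetical, not part of vena8.

```python
import json

# Hypothetical response, shaped like the sample above
# (placeholder comments replaced with empty lists so it is valid JSON).
RESPONSE = json.loads("""
{
  "file": {"name": "claims-mailbox.pst", "type": "container", "bytes": 894120392},
  "content": {"text": "available", "tables": [], "items": []},
  "query": {
    "filterable": ["sentAt", "from", "folder", "hasAttachments"],
    "expand": "on_demand"
  },
  "exports": ["json", "text", "assets", "parquet"]
}
""")


def plan_next_step(response, wanted_filters, wanted_export):
    """Decide what to ask for next from the map alone, without reading content."""
    filterable = set(response["query"]["filterable"])
    return {
        # Containers hold nested items, so filtering comes before expansion.
        "is_container": response["file"]["type"] == "container",
        # Keep only the filters the response says it supports.
        "filters": [f for f in wanted_filters if f in filterable],
        "unsupported_filters": [f for f in wanted_filters if f not in filterable],
        # Fall back to JSON when the requested export is not offered.
        "export": wanted_export if wanted_export in response["exports"] else "json",
    }


plan = plan_next_step(RESPONSE, ["from", "sentAt", "subject"], "parquet")
print(plan)
```

In this sketch the agent learns from the map that `subject` is not listed as filterable for this file, so it can adjust its query before spending a model call on content it cannot narrow down.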
Large containers should not explode into one giant response. vena8 can index nested content first, then let agents filter by type, date, folder, sender, attachment name, table column or other structured fields.
Find messages by sent date, sender, subject, folder, attachment type or conversation before extracting bodies and attachments.
List archive contents, skip irrelevant binaries, and recursively process selected documents only when needed.
Expose sheets, columns and detected types so workflows can query rows or export clean table data.
Ask for metadata first, filtered items second, and full text or assets only for the content that is relevant.
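The metadata-first order described above can be sketched locally in Python. This is an assumption-heavy illustration, not vena8's API: the index entries, the `select_items` and `expand` helpers, and the `fake_fetch` callback are all hypothetical, and the field names are borrowed from the sample response.

```python
from datetime import datetime, timezone

# Hypothetical item index: metadata only, no bodies or attachments yet.
INDEX = [
    {"id": "m1", "from": "adjuster@example.com", "folder": "Inbox/Claims",
     "sentAt": "2024-03-02T09:15:00+00:00", "hasAttachments": True},
    {"id": "m2", "from": "newsletter@example.com", "folder": "Inbox",
     "sentAt": "2024-03-05T12:00:00+00:00", "hasAttachments": False},
    {"id": "m3", "from": "adjuster@example.com", "folder": "Inbox/Claims",
     "sentAt": "2023-11-20T08:30:00+00:00", "hasAttachments": True},
]


def select_items(index, sender, sent_after, needs_attachments=True):
    """Filter on metadata only and return the ids worth expanding."""
    selected = []
    for item in index:
        sent_at = datetime.fromisoformat(item["sentAt"])
        if (item["from"] == sender
                and sent_at >= sent_after
                and item["hasAttachments"] == needs_attachments):
            selected.append(item["id"])
    return selected


def expand(ids, fetch_body):
    """Fetch full content only for the selected ids."""
    return {item_id: fetch_body(item_id) for item_id in ids}


fetched = []  # records which items were actually expanded


def fake_fetch(item_id):
    # Stand-in for a real retrieval call (hypothetical).
    fetched.append(item_id)
    return f"<body of {item_id}>"


cutoff = datetime(2024, 1, 1, tzinfo=timezone.utc)
ids = select_items(INDEX, "adjuster@example.com", cutoff)
bodies = expand(ids, fake_fetch)
print(ids, bodies)
```

Only one of the three indexed messages is expanded here. The other two never enter the agent's context, which is the point of filtering before extraction.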
The API is designed for machine consumption first: stable fields, repeatable extraction, selective expansion, composable exports and first-class errors.
The same concepts apply across formats (files, items, tables, assets and errors), so workflows do not drift as inputs change.
Same file in, same result out. The parsing layer does not need creative interpretation.
Push selected data into RAG, search, analytics, Parquet conversion or downstream tools.
Structured failure states keep agents from confidently acting on broken or partial context.
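One way a workflow might act on structured failure states is to separate usable results from errors and refuse to proceed when a required item failed. The sketch below assumes a hypothetical result shape, with `item`, `text` and an `error` object carrying `code` and `message`. The source does not specify vena8's actual error format, so treat every field and code name here as illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtractionError:
    item: str
    code: str      # hypothetical codes, e.g. "encrypted", "unsupported_format"
    message: str


def split_results(results):
    """Separate successfully extracted items from structured failures."""
    usable, errors = [], []
    for result in results:
        if "error" in result:
            err = result["error"]
            errors.append(ExtractionError(result["item"], err["code"], err["message"]))
        else:
            usable.append(result)
    return usable, errors


def build_context(results, required):
    """Return text for the agent, or raise if a required item failed."""
    usable, errors = split_results(results)
    failed_required = sorted(e.item for e in errors if e.item in required)
    if failed_required:
        raise RuntimeError(f"required items failed extraction: {failed_required}")
    return [r["text"] for r in usable], errors


# Hypothetical extraction results: one success, one structured failure.
RESULTS = [
    {"item": "policy.pdf", "text": "Policy terms ..."},
    {"item": "scan.tiff",
     "error": {"code": "unsupported_format", "message": "no text layer"}},
]

context, errors = build_context(RESULTS, required={"policy.pdf"})
print(context, errors)
```

If `scan.tiff` were marked as required, `build_context` would raise rather than hand the agent a partial context, so the failure surfaces in the workflow and not in the model's answer.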
vena8 turns unpredictable file inputs into structured, queryable data that agents can trust, route, filter and reason over.