Parse a captured PDF into structured wiki chunks with LLM-generated paraphrase and implication.
crossmem compile <cite_key>
| Argument | Description |
|---|---|
| `<cite_key>` | The cite key printed by crossmem capture. Example: vaswani2017attention |
Finds the raw PDF and .meta.json for the given cite key in ~/crossmem/raw/
Parses the PDF using Marker (preferred) or pdftotext (fallback)
Splits content into paragraph-level chunks with bounding-box provenance
Computes SHA-256 hash for each chunk’s verbatim text
Sends each chunk to Ollama for paraphrase and implication generation
Generates five citation formats (APA, MLA, Chicago, IEEE, BibTeX)
Emits the final wiki note to ~/crossmem/wiki/<timestamp>_<cite_key>.md
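The chunking and hashing steps above can be sketched as follows. This is a minimal illustration, not crossmem's actual code: the blank-line splitter stands in for the real parser output, and the function names and the `index`/`verbatim`/`sha256` keys are hypothetical. Only the use of SHA-256 over each chunk's verbatim text comes from the step list.

```python
import hashlib


def split_paragraphs(text: str) -> list[str]:
    """Split on blank lines (a stand-in for the real paragraph chunker)."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]


def hash_chunk(verbatim: str) -> str:
    """Return the hex SHA-256 digest of the chunk's UTF-8 encoded text."""
    return hashlib.sha256(verbatim.encode("utf-8")).hexdigest()


def build_chunks(text: str) -> list[dict]:
    """Pair each paragraph with its hash so later edits can be detected."""
    return [
        {"index": i, "verbatim": p, "sha256": hash_chunk(p)}
        for i, p in enumerate(split_paragraphs(text), start=1)
    ]
```

Hashing the verbatim text before it is sent to the LLM means any later change to the stored text can be detected by recomputing the digest.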
| Code | Meaning |
|---|---|
| 0 | Success |
| 1 | Error (cite key not found, Ollama unreachable, parse failure) |
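A script that drives the command can branch on these exit codes. The wrapper below is a sketch under assumptions: `compile_note` and its injectable `runner` parameter are hypothetical names, and the only facts taken from the table are that 0 means success and 1 means an error.

```python
import subprocess


def compile_note(cite_key: str, runner=subprocess.run) -> bool:
    """Run `crossmem compile <cite_key>` and return True on exit code 0."""
    result = runner(["crossmem", "compile", cite_key])
    # Exit code 1 covers several causes (missing cite key, Ollama
    # unreachable, parse failure), so the code alone cannot tell them apart.
    return result.returncode == 0
```

Because every failure shares exit code 1, a caller that needs the specific cause would have to inspect the command's log output.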
| Variable | Default | Description |
|---|---|---|
| CROSSMEM_OLLAMA_MODEL | llama3.2:3b | Ollama model used for paraphrase/implication generation |
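The default-with-override behaviour of this variable can be illustrated with a short lookup. This is a sketch of the usual pattern, not crossmem's own code; `resolve_model` is a hypothetical helper, while the variable name and default value come from the table.

```python
import os

DEFAULT_MODEL = "llama3.2:3b"


def resolve_model(env=os.environ) -> str:
    """Return the configured Ollama model, or the documented default."""
    return env.get("CROSSMEM_OLLAMA_MODEL", DEFAULT_MODEL)
```

In a shell, the variable would typically be set for a single run by prefixing the command, for example `CROSSMEM_OLLAMA_MODEL=<model> crossmem compile <cite_key>`.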
$ crossmem compile vaswani2017attention
[compile] loading raw PDF for vaswani2017attention
[compile] parsing with Marker (MPS)...
[compile] 47 chunks extracted
[compile] compiling chunk 1/47...
...
[compile] wiki saved to ~/crossmem/wiki/1776227300_vaswani2017attention.md
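The output path in the example follows the `<timestamp>_<cite_key>.md` pattern. The sketch below rebuilds such a path, assuming the timestamp is Unix epoch seconds (the value in the example is consistent with that, but the source does not say so). The function name and parameters are hypothetical.

```python
import time
from pathlib import Path
from typing import Optional


def wiki_note_path(
    cite_key: str,
    timestamp: Optional[int] = None,
    wiki_root: Optional[Path] = None,
) -> Path:
    """Build <wiki_root>/<timestamp>_<cite_key>.md."""
    if timestamp is None:
        # Assumption: the prefix is the current time in epoch seconds.
        timestamp = int(time.time())
    if wiki_root is None:
        wiki_root = Path.home() / "crossmem" / "wiki"
    return wiki_root / f"{timestamp}_{cite_key}.md"
```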
| Tier | Parser | When used | Bounding boxes |
|---|---|---|---|
| 0 | pdftotext -layout | Fallback when Marker unavailable | No |
| 1 | Marker (MPS) | Default for arXiv papers | Yes (polygon per block) |
The parser tier is recorded in the wiki frontmatter as the parser field.
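The tier choice can be pictured as a simple preference order. The sketch below is illustrative only: the `marker_installed` check assumes Marker is importable under the module name `marker`, and the dictionary keys and the `"marker"`/`"pdftotext"` strings are hypothetical, since the source does not show the exact value written to the parser field.

```python
import importlib.util


def marker_installed() -> bool:
    """Assumption: Marker is importable as the top-level module `marker`."""
    return importlib.util.find_spec("marker") is not None


def select_parser(marker_available: bool) -> dict:
    """Prefer Marker (tier 1); fall back to pdftotext (tier 0)."""
    if marker_available:
        return {"tier": 1, "parser": "marker", "bounding_boxes": True}
    # pdftotext -layout yields text only, so no bounding boxes are recorded.
    return {"tier": 0, "parser": "pdftotext", "bounding_boxes": False}
```

Calling `select_parser(marker_installed())` would give the record to write into the frontmatter under these assumptions.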
The LLM (Ollama) generates only the paraphrase and implication fields. It never touches:
Original verbatim text (from PDF extractor)
Metadata fields (from reconciler)
Citation strings (deterministic generator)
Provenance data (from parser)
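One way to enforce this boundary is an allowlist applied when merging model output into a chunk. The sketch below is an assumption about how such a guard could look, not crossmem's implementation. `merge_llm_output` and the chunk keys other than paraphrase and implication are hypothetical.

```python
LLM_FIELDS = frozenset({"paraphrase", "implication"})


def merge_llm_output(chunk: dict, llm_output: dict) -> dict:
    """Copy only allowlisted fields from the LLM response into a new chunk."""
    merged = dict(chunk)  # leave the caller's chunk unmodified
    for key in LLM_FIELDS:
        if key in llm_output:
            merged[key] = llm_output[key]
    # Any other key in llm_output (verbatim text, metadata, citations,
    # provenance) is ignored, so the model cannot overwrite it.
    return merged
```

With this shape, a response that tries to supply a `verbatim` or citation value is silently dropped rather than trusted.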