crossmem compile

Parse a captured PDF into structured wiki chunks, each with an LLM-generated paraphrase and implication.

Usage

crossmem compile <cite_key>

Arguments

Argument	Description
`<cite_key>`	The cite key printed by `crossmem capture`. Example: `vaswani2017attention`

What it does

Finds the raw PDF and .meta.json for the given cite key in ~/crossmem/raw/
Parses the PDF using Marker (preferred) or pdftotext (fallback)
Splits content into paragraph-level chunks with bounding-box provenance
Computes a SHA-256 hash of each chunk’s verbatim text
Sends each chunk to Ollama for paraphrase and implication generation
Generates five citation formats (APA, MLA, Chicago, IEEE, BibTeX)
Emits the final wiki note to ~/crossmem/wiki/<timestamp>_<cite_key>.md
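
The hashing step can be illustrated with a short Python sketch. It is not taken from crossmem itself: the function names `chunk_hash` and `verify_chunk` are hypothetical, and encoding the text as UTF-8 before hashing is an assumption.

```python
import hashlib


def chunk_hash(verbatim_text: str) -> str:
    """Return the hex SHA-256 digest of a chunk's verbatim text.

    Assumption: the text is encoded as UTF-8 before hashing.
    """
    return hashlib.sha256(verbatim_text.encode("utf-8")).hexdigest()


def verify_chunk(verbatim_text: str, expected_hash: str) -> bool:
    """Check that a chunk's text still matches a previously stored digest."""
    return chunk_hash(verbatim_text) == expected_hash


# Any change to the text, even a trailing space, produces a different digest.
original = "Attention is all you need."
stored = chunk_hash(original)
print(verify_chunk(original, stored))
print(verify_chunk(original + " ", stored))
```

A stored hash of this kind lets a reader confirm later that the verbatim text in a wiki note has not been altered since it was extracted.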

Exit codes

Code	Meaning
0	Success
1	Error (cite key not found, Ollama unreachable, parse failure)
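
A script that wraps the command can branch on these codes. The sketch below is one way to do that in Python. The helper names `exit_code_of` and `compile_ok` are hypothetical, and reporting a missing executable as code 1 is a choice made for this example, not documented crossmem behavior.

```python
import subprocess
import sys
from typing import Sequence


def exit_code_of(command: Sequence[str]) -> int:
    """Run a command and return its exit code.

    Assumption for this sketch: a missing executable is reported as 1.
    """
    try:
        return subprocess.run(command).returncode
    except FileNotFoundError:
        print(f"{command[0]}: not found on PATH", file=sys.stderr)
        return 1


def compile_ok(cite_key: str) -> bool:
    """Return True when `crossmem compile <cite_key>` exits with code 0."""
    return exit_code_of(["crossmem", "compile", cite_key]) == 0
```

Because every failure maps to code 1, a caller that needs to distinguish a missing cite key from an unreachable Ollama server has to inspect the command's output as well.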

Environment variables

Variable	Default	Description
`CROSSMEM_OLLAMA_MODEL`	`llama3.2:3b`	Ollama model used for paraphrase/implication generation
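
The lookup with a default can be sketched in Python as follows. The function name `resolve_model` is hypothetical, and the sketch only mirrors the documented default. It does not show how crossmem reads the variable internally.

```python
import os
from typing import Mapping, Optional

DEFAULT_MODEL = "llama3.2:3b"  # default value from the table above


def resolve_model(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the configured Ollama model, or the default when unset."""
    env = os.environ if env is None else env
    return env.get("CROSSMEM_OLLAMA_MODEL", DEFAULT_MODEL)


# With no override the default applies; the override value is only an example.
print(resolve_model({}))
print(resolve_model({"CROSSMEM_OLLAMA_MODEL": "mistral:7b"}))
```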

Example

$ crossmem compile vaswani2017attention
[compile] loading raw PDF for vaswani2017attention
[compile] parsing with Marker (MPS)...
[compile] 47 chunks extracted
[compile] compiling chunk 1/47...
...
[compile] wiki saved to ~/crossmem/wiki/1776227300_vaswani2017attention.md

PDF parsing tiers

Tier	Parser	When used	Bounding boxes
0	`pdftotext -layout`	Fallback when Marker unavailable	No
1	Marker (MPS)	Default for arXiv papers	Yes (polygon per block)

The parser tier is recorded in the wiki frontmatter as the `parser` field.
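
The tier table amounts to a simple preference rule, sketched below in Python. The `ParserTier` type, the `pick_parser` function, and the lowercase parser names are hypothetical. How crossmem detects whether Marker is available is not specified here, so the sketch takes that as an input.

```python
from typing import NamedTuple


class ParserTier(NamedTuple):
    tier: int
    name: str                 # hypothetical label, not the real frontmatter value
    has_bounding_boxes: bool


PDFTOTEXT = ParserTier(0, "pdftotext", False)
MARKER = ParserTier(1, "marker", True)


def pick_parser(marker_available: bool) -> ParserTier:
    """Prefer Marker; fall back to pdftotext when Marker is unavailable."""
    return MARKER if marker_available else PDFTOTEXT


chosen = pick_parser(marker_available=False)
print(chosen.tier, chosen.name, chosen.has_bounding_boxes)
```

Chunks produced by the fallback tier carry no bounding boxes, so downstream code should not assume that provenance polygons are always present.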

LLM contract

The LLM (Ollama) may generate only the paraphrase and implication fields. It never touches:

Original verbatim text (from PDF extractor)
Metadata fields (from reconciler)
Citation strings (deterministic generator)
Provenance data (from parser)
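
One way to enforce a contract like this is to merge LLM output through an allowlist, as in the Python sketch below. It illustrates the idea and is not crossmem's implementation: `merge_llm_output` and the chunk keys other than `paraphrase` and `implication` are hypothetical.

```python
from typing import Any, Dict

# The only fields the LLM is allowed to write.
LLM_FIELDS = frozenset({"paraphrase", "implication"})


def merge_llm_output(chunk: Dict[str, Any], llm_output: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `chunk` with only the LLM-owned fields updated.

    Any other key in `llm_output` is ignored, so verbatim text, metadata,
    citations, and provenance cannot be overwritten by the model.
    """
    merged = dict(chunk)
    for key in LLM_FIELDS:
        if key in llm_output:
            merged[key] = llm_output[key]
    return merged


chunk = {"verbatim": "Attention is all you need.", "citation": "Vaswani et al., 2017"}
reply = {"paraphrase": "p", "implication": "i", "verbatim": "tampered"}
print(merge_llm_output(chunk, reply)["verbatim"])
```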
