crossmem compile

Parse a captured PDF into structured wiki chunks, each with an LLM-generated paraphrase and implication.

Usage

crossmem compile <cite_key>

Arguments

| Argument | Description |
|---|---|
| <cite_key> | The cite key printed by crossmem capture. Example: vaswani2017attention |

What it does

  1. Finds the raw PDF and .meta.json for the given cite key in ~/crossmem/raw/
  2. Parses the PDF using Marker (preferred) or pdftotext (fallback)
  3. Splits content into paragraph-level chunks with bounding-box provenance
  4. Computes a SHA-256 hash of each chunk’s verbatim text
  5. Sends each chunk to Ollama for paraphrase and implication generation
  6. Generates five citation formats (APA, MLA, Chicago, IEEE, BibTeX)
  7. Emits the final wiki note to ~/crossmem/wiki/<timestamp>_<cite_key>.md
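
Step 4 can be sketched as follows. This is a minimal illustration, not the crossmem implementation: the function names `chunk_sha256` and `verify_chunk` are hypothetical, and UTF-8 encoding of the verbatim text is an assumption, since the encoding crossmem uses before hashing is not stated here.

```python
import hashlib


def chunk_sha256(verbatim_text: str) -> str:
    """Return the hex SHA-256 digest of a chunk's verbatim text.

    Assumption: the text is hashed as UTF-8 bytes.
    """
    return hashlib.sha256(verbatim_text.encode("utf-8")).hexdigest()


def verify_chunk(verbatim_text: str, recorded_hash: str) -> bool:
    """Check that a chunk's text still matches a previously recorded hash."""
    return chunk_sha256(verbatim_text) == recorded_hash
```

A hash stored next to each chunk makes it possible to detect later whether the verbatim text was altered, for example by calling `verify_chunk(text, stored_hash)` when reading a wiki note back.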

Exit codes

| Code | Meaning |
|---|---|
| 0 | Success |
| 1 | Error (cite key not found, Ollama unreachable, parse failure) |
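
A script that drives the command can branch on these exit codes. The sketch below assumes `crossmem` is on the `PATH`; the helper name `compile_succeeded` and its injectable `run` parameter are hypothetical and exist only to keep the example testable.

```python
import subprocess


def compile_succeeded(cite_key: str, run=subprocess.run) -> bool:
    """Run `crossmem compile <cite_key>` and report whether it exited with 0.

    Any non-zero code is treated as a failure. Exit code 1 covers several
    causes, so the exit code alone does not say which one occurred.
    """
    result = run(["crossmem", "compile", cite_key])
    return result.returncode == 0
```

Because a missing cite key, an unreachable Ollama, and a parse failure all map to exit code 1, a caller that needs to tell them apart would have to inspect the command's output as well.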

Environment variables

| Variable | Default | Description |
|---|---|---|
| CROSSMEM_OLLAMA_MODEL | llama3.2:3b | Ollama model used for paraphrase/implication generation |
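
The lookup-with-default behaviour can be illustrated as below. This is a sketch rather than crossmem's own code: `resolve_ollama_model` is a hypothetical name, and treating an empty value the same as an unset variable is an assumption.

```python
import os

DEFAULT_MODEL = "llama3.2:3b"


def resolve_ollama_model(env=os.environ) -> str:
    """Return the model named by CROSSMEM_OLLAMA_MODEL, or the default.

    Assumption: an empty value falls back to the default as well.
    """
    return env.get("CROSSMEM_OLLAMA_MODEL") or DEFAULT_MODEL
```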

Example

$ crossmem compile vaswani2017attention
[compile] loading raw PDF for vaswani2017attention
[compile] parsing with Marker (MPS)...
[compile] 47 chunks extracted
[compile] compiling chunk 1/47...
...
[compile] wiki saved to ~/crossmem/wiki/1776227300_vaswani2017attention.md

PDF parsing tiers

| Tier | Parser | When used | Bounding boxes |
|---|---|---|---|
| 0 | pdftotext -layout | Fallback when Marker unavailable | No |
| 1 | Marker (MPS) | Default for arXiv papers | Yes (polygon per block) |

The parser tier is recorded in the wiki frontmatter as the parser field.
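
The tier selection described above amounts to a simple fallback. In the sketch below, the `ParserTier` type, the `select_tier` function, and the `parser` strings `"marker"` and `"pdftotext"` are all hypothetical; the actual values written to the frontmatter are not specified here, and how crossmem detects that Marker is available is left to the caller.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParserTier:
    tier: int
    parser: str               # hypothetical value for the frontmatter `parser` field
    has_bounding_boxes: bool


PDFTOTEXT = ParserTier(tier=0, parser="pdftotext", has_bounding_boxes=False)
MARKER = ParserTier(tier=1, parser="marker", has_bounding_boxes=True)


def select_tier(marker_available: bool) -> ParserTier:
    """Prefer Marker (tier 1); fall back to pdftotext (tier 0) otherwise."""
    return MARKER if marker_available else PDFTOTEXT
```

Recording the selected tier matters to readers of the note because only tier 1 chunks carry bounding-box provenance.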

LLM contract

The LLM (Ollama) may generate only the paraphrase and implication fields. It never touches:

  • Original verbatim text (from PDF extractor)
  • Metadata fields (from reconciler)
  • Citation strings (deterministic generator)
  • Provenance data (from parser)
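
One way to enforce a contract like this is to whitelist the fields the LLM may write and reject everything else before merging. The sketch below is an illustration under assumptions: chunks are modelled as plain dictionaries, and the names `LLM_WRITABLE_FIELDS` and `merge_llm_output` are hypothetical rather than taken from crossmem.

```python
LLM_WRITABLE_FIELDS = frozenset({"paraphrase", "implication"})


def merge_llm_output(chunk: dict, llm_output: dict) -> dict:
    """Return a copy of `chunk` with only the LLM-writable fields updated.

    Raises ValueError if the LLM output tries to set any other field, such
    as the verbatim text, metadata, citations, or provenance.
    """
    unexpected = set(llm_output) - LLM_WRITABLE_FIELDS
    if unexpected:
        raise ValueError(f"LLM may not write fields: {sorted(unexpected)}")
    merged = dict(chunk)          # leave the caller's chunk untouched
    merged.update(llm_output)
    return merged
```

Rejecting unexpected fields outright, instead of silently dropping them, makes a misbehaving model response visible early.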