Writing a Paper with crossmem

An end-to-end playbook for AI agents (Claude Code, Cursor, etc.) and their human authors. Once crossmem is installed and its MCP server registered (section 1), the goal is to cite prior work correctly and quote faithfully.

1. One-time setup

Install crossmem and its dependencies:

# Install crossmem
cargo install --path .
# Or, from the repo directly:
# cargo install --git https://github.com/crossmem/crossmem-rs

# Local LLM for paraphrase/implication generation
ollama pull llama3.2:3b

# PDF parser (preferred — produces bounding boxes)
pip install marker-pdf
# Fallback: brew install poppler   (provides pdftotext)

Register the MCP server so your agent can call crossmem_cite and crossmem_recall:

Claude Code:

claude mcp add crossmem -- crossmem mcp serve

Claude Desktop — add to ~/Library/Application Support/Claude/claude_desktop_config.json:

{
  "mcpServers": {
    "crossmem": {
      "command": "crossmem",
      "args": ["mcp", "serve"]
    }
  }
}

2. Capturing a paper

crossmem capture https://arxiv.org/abs/1706.03762

Output:

[capture] arxiv_id: 1706.03762
[capture] title: Attention Is All You Need
[capture] cite_key: vaswani2017attention
[capture] saved to ~/crossmem/raw/1776227254_vaswani2017attention.pdf

This does three things:

  1. Downloads the PDF to ~/crossmem/raw/<timestamp>_<cite_key>.pdf
  2. Fetches metadata from arXiv, CrossRef, and OpenAlex — reconciles across all three
  3. Generates a deterministic cite key via the pattern DSL

Then compile it into a wiki entry:

crossmem compile vaswani2017attention

This parses the PDF (Marker by default), splits it into chunks, runs each through Ollama for paraphrase and implication, and emits the wiki note at ~/crossmem/wiki/<timestamp>_vaswani2017attention.md.

Note: YouTube ingestion is design-only — see YouTube Ingestion Pipeline.

Capturing non-arXiv papers

Most journal papers (e.g. JCP, Nature, PRL) are not on arXiv. crossmem capture supports them through DOI lookup and local PDF import.

If you have a DOI — CrossRef metadata is fetched automatically:

# DOI URL
crossmem capture https://doi.org/10.1063/5.0012345

# Bare DOI
crossmem capture 10.1063/5.0012345

If the paper is open-access, the PDF downloads via Unpaywall. Otherwise you’ll get instructions to download it manually.

If you already have the PDF — the most common path for paywalled journals:

# With DOI (recommended — gets full CrossRef metadata)
crossmem capture ~/Downloads/smith2023.pdf --doi 10.1063/5.0012345

# Without DOI — extracts what it can from PDF metadata
crossmem capture ~/Downloads/smith2023.pdf --cite-key smith2023transport

Direct PDF URL — for preprint servers, institutional repos:

crossmem capture https://chemrxiv.org/paper.pdf --doi 10.1234/chemrxiv.5678

All paths produce the same raw/ + .meta.json output. Then compile as usual:

crossmem compile smith2023transport
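
After a batch of captures, the .meta.json sidecars can be listed to see what is in the library. A minimal sketch: the top-level cite_key field appears elsewhere in this guide, but the title field and the overall sidecar schema are assumptions; adjust the jq filter to match what your sidecars actually contain.

```shell
# List captured papers: one "cite_key<TAB>title" line per .meta.json sidecar.
# Assumption: sidecars expose cite_key and title at the top level.
for meta in ~/crossmem/raw/*.meta.json; do
  [ -e "$meta" ] || continue   # glob matched nothing
  jq -r '"\(.cite_key)\t\(.title // "(no title)")"' "$meta"
done
```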

For a JCP submission with 24 references, a typical workflow is:

# Capture each reference; most will be local PDFs with DOIs
for pdf in ~/papers/jcp-refs/*.pdf; do
  # Pull the first DOI-like string from page 1; skip the file if none is found,
  # since passing an empty --doi is worse than passing no DOI at all
  doi=$(pdftotext -f 1 -l 1 "$pdf" - | grep -oPm1 '10\.\d{4,9}/[^[:space:]]+')
  if [ -n "$doi" ]; then
    crossmem capture "$pdf" --doi "$doi"
  else
    echo "no DOI found in $pdf; capture it manually with --cite-key" >&2
  fi
done

# Then compile each one
for meta in ~/crossmem/raw/*.meta.json; do
  key=$(jq -r .cite_key "$meta")
  crossmem compile "$key"
done

3. The compiled wiki entry — what the agent sees

Frontmatter

---
cite_key: vaswani2017attention
title: "Attention Is All You Need"
authors:
  - "Ashish Vaswani"
  - "Noam Shazeer"
year: 2017
arxiv_id: "1706.03762"
doi: "10.48550/arXiv.1706.03762"
captured_at: "1776227254"
raw: "~/crossmem/raw/1776227254_vaswani2017attention.pdf"
pdf_sha256: "9a8f3b..."
parser: "marker"
chunks: 47
meta:
  sources: ["arxiv", "crossref", "openalex"]
  reconciled: true
  warnings: []
---

After the frontmatter, five citation formats are pre-generated: APA, MLA, Chicago, IEEE, and BibTeX.
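
Scripts can pull the frontmatter out of a wiki note without a YAML parser. A sketch; the frontmatter() helper is our own illustration, not a crossmem command:

```shell
# Extract the YAML frontmatter from a wiki note (sketch; frontmatter() is our
# own helper). Prints the lines between the first pair of --- markers.
frontmatter() { awk 'f && /^---$/ {exit} f {print} /^---$/ {f=1}' "$1"; }

# Usage:
# frontmatter ~/crossmem/wiki/1776227254_vaswani2017attention.md | grep cite_key
```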

Chunks

Each chunk carries verbatim text, LLM-generated derivatives, and full provenance:

<!-- chunk id=p4s32c1 -->
> The dominant sequence transduction models are based on complex recurrent or
> convolutional neural networks that include an encoder and a decoder.

**Paraphrase:** Prior sequence models relied on RNNs or CNNs in an encoder-decoder setup.

**Implication:** This dependency on recurrence was the bottleneck the Transformer aimed to eliminate.

provenance:
  page: 4
  section: "3.2 Scaled Dot-Product Attention"
  bbox: [72.0, 340.5, 523.8, 412.1]
  text_sha256: "5f3e1c..."
  byte_range: [18342, 19104]

Hard rule for agents: The > blockquote is the verbatim original extracted from the PDF. When citing, the agent MUST copy from this blockquote. NEVER fabricate or rephrase quotes. The Paraphrase and Implication fields exist for the agent’s reasoning and search — they do not belong in the paper as attributed quotes.
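
The text_sha256 field makes this rule mechanically checkable. A sketch, assuming crossmem hashes the chunk's verbatim text as raw UTF-8 bytes; the exact whitespace normalization is an implementation detail not documented here, so treat a mismatch on an otherwise identical quote as a normalization difference before assuming drift:

```shell
# Re-hash a quoted passage and compare with the chunk's text_sha256 (sketch).
quote='The dominant sequence transduction models are based on complex recurrent or
convolutional neural networks that include an encoder and a decoder.'
hash=$(printf '%s' "$quote" | sha256sum | cut -d' ' -f1)
echo "$hash"   # compare against text_sha256 in the chunk's provenance block
```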

4. Agent prompts that actually work

Finding relevant chunks

“Search my library for how transformer attention was originally motivated. Return cite_keys and page numbers.”

Agent calls:

crossmem_recall("transformer attention motivation", limit=5)

Returns a ranked list of {cite_key, title, section, excerpt}. The agent picks the most relevant hits and reports them.
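
For concreteness, a recall result might look like the following. The field names come from the description above; the exact wire format is an assumption:

```json
[
  {
    "cite_key": "vaswani2017attention",
    "title": "Attention Is All You Need",
    "section": "1 Introduction",
    "excerpt": "The dominant sequence transduction models are based on complex recurrent..."
  }
]
```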

Quoting with provenance

“Write a paragraph introducing self-attention. Quote vaswani2017attention page 2 verbatim, then paraphrase in my voice. Include BibTeX.”

Agent workflow:

  1. Calls crossmem_recall("self-attention vaswani2017attention") to find the right chunk
  2. Reads the wiki file to locate the page-2 chunk
  3. Copies the > blockquote verbatim into the draft as a block quote
  4. Writes a surrounding paraphrase in the author’s voice (informed by the Paraphrase field, not copying it)
  5. Calls crossmem_cite("vaswani2017attention", "bibtex") for the BibTeX entry
  6. Embeds the text_sha256 and page reference as a LaTeX comment so crossmem verify can trace provenance:
% crossmem: vaswani2017attention p4s32c1 sha256=5f3e1c...
\begin{quote}
The dominant sequence transduction models are based on complex recurrent or
convolutional neural networks that include an encoder and a decoder.
\end{quote}
\cite{vaswani2017attention}

Citing multiple papers

“Compare how Vaswani 2017 and Devlin 2019 frame the importance of pre-training.”

Agent calls crossmem_recall("pre-training importance"), gets hits from both papers, reads the relevant chunks, and writes a comparison paragraph quoting both — each quote traced to its chunk ID.

Running a drift check

After the human edits the draft (or the agent revises it), verify that no quotes have been accidentally mutated:

crossmem verify

Output when clean:

[verify] checked 94 chunks across 3 wiki entries
[verify] 0 drifts detected

Output when a quote was altered:

[verify] DRIFT in vaswani2017attention chunk p4s32c1
  expected: 5f3e1c...
  actual:   a1b2c3...
[verify] 1 drift detected

Exit code 1 means drift — the agent or human must restore the original quote from the wiki.
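
Because verify reports drift through its exit code, it slots naturally into a pre-commit hook or CI step. A sketch; drift_gate is our own helper, not part of crossmem:

```shell
# Gate a pipeline step on quote integrity (sketch; drift_gate is our own
# helper, not a crossmem command). Pass it the command to gate on.
drift_gate() {
  if "$@"; then
    echo "quotes clean"
  else
    echo "quote drift detected; restore originals from the wiki" >&2
    return 1
  fi
}

# Usage, e.g. in a pre-commit hook or CI step:
# drift_gate crossmem verify
```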

Building the bib file

Collect all \cite{...} keys from a LaTeX draft and emit a single .bib:

grep -oP '\\cite\{[^}]+\}' draft.tex \
  | sed 's/\\cite{//;s/}//' \
  | tr ',' '\n' \
  | sort -u \
  | while read -r key; do
      printf '{"jsonrpc":"2.0","id":1,"method":"tools/call","params":{"name":"crossmem_cite","arguments":{"cite_key":"%s","format":"bibtex"}}}\n' "$key" \
        | crossmem mcp serve
    done > references.bib

Or, have the agent do it: “Collect every cite key from my draft and produce a references.bib file using crossmem_cite.”

5. What crossmem protects against

  • Hallucinated citation metadata: multi-source reconciliation across arXiv, CrossRef, and OpenAlex; at least two sources must agree, and disagreements surface as warnings in frontmatter.
  • Hallucinated quotes: the agent contract forbids composing quoted text; the agent only copies the > blockquote, and crossmem verify catches any post-hoc mutation via SHA-256 re-hashing.
  • Wrong page numbers: every chunk carries page, section, and bbox, so the reader can trace back to the exact PDF region.
  • Lost context: byte_range preserves the exact location in the raw PDF, and chunks retain their section heading for navigation.
  • Cite key collisions: deterministic pattern DSL with an a–z suffix tiebreaker (then _<count> if all 26 are taken).

6. Limits

Be honest about what crossmem cannot do today:

  • Scanned / image-only PDFs: Marker’s OCR quality varies. Chunks from poorly scanned pages may have garbled text.
  • Math-heavy pages: The pipeline does not run Nougat or other math-aware extractors. Equations may appear as lossy Unicode approximations or be missing entirely.
  • Non-arXiv sources: Journal papers captured via DOI or local PDF have single-source metadata (CrossRef only), so there is no cross-verification. Books and conference proceedings with non-standard DOIs may produce incomplete frontmatter.
  • Single-author workflow: There is no shared library, sync, or multi-user conflict resolution. Each machine has its own ~/crossmem/ directory.
  • Ollama dependency: Compile requires a running Ollama instance. If Ollama is down or the model is missing, compile will fail.
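
Before a long compile batch, it is worth confirming the Ollama dependency up front. A sketch using Ollama's HTTP API on its default port 11434 (/api/tags lists pulled models); the have() helper is our own illustration:

```shell
# Preflight before `crossmem compile` (sketch). have() is our own helper,
# not a crossmem command.
have() { command -v "$1" >/dev/null 2>&1; }

have ollama || echo "missing: ollama CLI" >&2
have pdftotext || echo "missing: pdftotext (poppler fallback parser)" >&2

# Ollama's HTTP API answers on :11434 by default; /api/tags lists pulled models.
curl -fsS http://localhost:11434/api/tags >/dev/null 2>&1 \
  || echo "Ollama not responding on localhost:11434" >&2
```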

7. Minimal paper-writing session

A scripted walkthrough — capture two papers, write an intro paragraph, verify.

# Capture two papers
crossmem capture https://arxiv.org/abs/1706.03762
crossmem compile vaswani2017attention

crossmem capture https://arxiv.org/abs/1810.04805
crossmem compile devlin2019bert

Now prompt the agent:

“Write an introductory paragraph for my Related Work section. It should cite both vaswani2017attention and devlin2019bert, quoting one key sentence from each verbatim. Output LaTeX with \cite commands and the BibTeX entries.”

The agent:

  1. Calls crossmem_recall("attention mechanism transformer", limit=5) and crossmem_recall("pre-training bidirectional", limit=5)
  2. Reads the wiki entries for both papers, selects one chunk each
  3. Produces:
The Transformer architecture replaced recurrence with self-attention:
\begin{quote}
``The dominant sequence transduction models are based on complex recurrent or
convolutional neural networks that include an encoder and a decoder.''
\end{quote}
\cite{vaswani2017attention}. Building on this,
BERT demonstrated that bidirectional pre-training could be applied to a wide
range of NLP tasks:
\begin{quote}
``We introduce a new language representation model called BERT, which stands
for Bidirectional Encoder Representations from Transformers.''
\end{quote}
\cite{devlin2019bert}.

% crossmem: vaswani2017attention p1s0c1 sha256=...
% crossmem: devlin2019bert p1s0c1 sha256=...
  4. Calls crossmem_cite("vaswani2017attention", "bibtex") and crossmem_cite("devlin2019bert", "bibtex") to emit references.bib

Finally, verify nothing drifted:

crossmem verify
# [verify] checked 94 chunks across 2 wiki entries
# [verify] 0 drifts detected

The quotes in your LaTeX match the raw PDFs. Ship it.