crossmem

crossmem is a local-first citation and knowledge pipeline. It captures sources (arXiv papers as PDFs today, with YouTube and other sources planned), compiles them into structured wiki notes with verbatim quotes and provenance metadata, and serves those notes to AI agents via MCP (Model Context Protocol).

What it does

  1. Capture — downloads a paper, extracts metadata from arXiv + CrossRef + OpenAlex, and generates a deterministic cite key
  2. Compile — parses the PDF (via Marker or pdftotext), splits it into paragraph-level chunks with bounding-box provenance, and runs a local LLM (via Ollama) to add a paraphrase and an implication to each chunk
  3. Verify — re-hashes every chunk’s text and compares the result with its stored SHA-256 hash to detect silent drift
  4. Cite & Recall — MCP tools that let Claude (or any MCP client) look up citations and search your wiki
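The Verify step amounts to recomputing a digest over each chunk's verbatim text and comparing it with the digest recorded at compile time. The following is a minimal Python sketch under that assumption; the `Chunk` record, its field names, and the `sha256_hex` and `find_drift` helpers are hypothetical illustrations rather than crossmem's actual API, and the sample text and coordinates are invented.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """Hypothetical chunk record; fields mirror the provenance items this page lists."""
    text: str                                # verbatim text, never rewritten by the LLM
    page: int
    section: str
    bbox: tuple[float, float, float, float]  # (x0, y0, x1, y1) on the page
    byte_range: tuple[int, int]              # (start, end) offsets back into the source
    sha256: str                              # hex digest recorded at compile time


def sha256_hex(text: str) -> str:
    """SHA-256 of the chunk text encoded as UTF-8, as a lowercase hex string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def find_drift(chunks: list[Chunk]) -> list[Chunk]:
    """Return the chunks whose current text no longer matches the stored hash."""
    return [c for c in chunks if sha256_hex(c.text) != c.sha256]


if __name__ == "__main__":
    quote = "We propose a new method."  # invented sample text
    stored = sha256_hex(quote)
    box = (72.0, 600.0, 540.0, 640.0)    # invented coordinates
    intact = Chunk(quote, 1, "Abstract", box, (0, 24), stored)
    edited = Chunk("We propose a novel method.", 1, "Abstract", box, (0, 24), stored)
    for chunk in find_drift([intact, edited]):
        print(f"drift on page {chunk.page}, section {chunk.section!r}")
```

Because the hash is taken over the verbatim text only, any change to that text (even one word) is flagged, while edits to paraphrase or implication fields would not affect verification.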

Design principles

  • Verbatim quotes are ground truth. The LLM only touches paraphrase/implication fields, never the original text.
  • Provenance is first-class. Every chunk carries page, section, bounding box, SHA-256 hash, and byte range back to the source PDF.
  • Metadata is cross-verified. Title, authors, and year must agree across at least two canonical sources (arXiv, CrossRef, OpenAlex). Disagreements surface as warnings, not silent picks.
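  • Everything runs locally. No cloud APIs for parsing or inference: Ollama runs the LLM and Marker parses PDFs, all on your Mac (only the capture step contacts arXiv, CrossRef, and OpenAlex for metadata).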
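The cross-verification principle (a field is accepted only when at least two sources agree, and disagreements become warnings instead of silent picks) can be sketched as a simple vote. This is a minimal Python illustration, not crossmem's implementation: `cross_verify`, the record layout, and especially the `normalize` step are hypothetical assumptions, and the sample records are invented.

```python
from collections import Counter

FIELDS = ("title", "authors", "year")


def normalize(field: str, value):
    """Hypothetical normalization so trivially different spellings compare equal."""
    if field == "title":
        return " ".join(str(value).lower().split())  # ignore case and extra whitespace
    if field == "authors":
        return tuple(" ".join(a.lower().split()) for a in value)  # keep author order
    return int(value)  # year


def cross_verify(records: dict[str, dict]) -> tuple[dict, list[str]]:
    """Accept a field only if at least two sources agree; report every disagreement.

    `records` maps a source name (e.g. "arxiv", "crossref", "openalex") to the
    metadata dict fetched from that source.
    """
    agreed: dict = {}
    warnings: list[str] = []
    for field in FIELDS:
        values = {
            src: normalize(field, rec[field])
            for src, rec in records.items()
            if rec.get(field) is not None
        }
        if not values:
            warnings.append(f"{field}: no source provided a value")
            continue
        best, votes = Counter(values.values()).most_common(1)[0]
        if votes < 2:
            warnings.append(f"{field}: no two sources agree ({', '.join(sorted(values))})")
            continue
        agreed[field] = best
        dissenters = sorted(src for src, v in values.items() if v != best)
        if dissenters:
            warnings.append(f"{field}: kept majority value; dissenting sources: {', '.join(dissenters)}")
    return agreed, warnings


if __name__ == "__main__":
    records = {  # invented sample metadata
        "arxiv": {"title": "An Example Paper", "authors": ["A. Author", "B. Author"], "year": 2021},
        "crossref": {"title": "An example paper", "authors": ["A. Author", "B. Author"], "year": 2021},
        "openalex": {"title": "An Example  Paper", "authors": ["A. Author", "B. Author"], "year": 2022},
    }
    agreed, warnings = cross_verify(records)
    print(agreed)
    for w in warnings:
        print("warning:", w)
```

Keeping the majority value while still naming the dissenting source follows the rule that disagreements surface as warnings; a field with no two-way agreement is left out entirely rather than guessed.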