crossmem capture

Download a paper and extract metadata.

Usage

crossmem capture <input> [--doi <doi>] [--cite-key <key>]

Input types

| Input | Example | Detection |
|---|---|---|
| Local PDF file | /path/to/paper.pdf | Path exists on disk |
| arXiv URL or bare ID | https://arxiv.org/abs/1706.03762, 1706.03762 | arXiv URL pattern or bare numeric ID |
| DOI URL or bare DOI | https://doi.org/10.1038/nature12373, 10.1038/nature12373 | DOI URL prefix or 10.NNNN/... pattern |
| Direct PDF URL | https://example.com/paper.pdf | HTTPS URL ending in .pdf |

Inputs are matched in the order above; the first match wins.
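
The ordered matching described above can be sketched as follows. This is a minimal illustration, not crossmem's actual code: the regular expressions, the `classify_input` function, and the returned type names are all assumptions chosen to mirror the table.

```python
import re
from pathlib import Path

# Hypothetical patterns; crossmem's real detection rules may differ.
ARXIV_URL = re.compile(
    r"^https?://arxiv\.org/(?:abs|pdf)/(\d{4}\.\d{4,5})(?:v\d+)?(?:\.pdf)?$"
)
ARXIV_BARE = re.compile(r"^\d{4}\.\d{4,5}(?:v\d+)?$")
DOI_URL = re.compile(r"^https?://(?:dx\.)?doi\.org/(10\.\d{4,9}/\S+)$")
DOI_BARE = re.compile(r"^10\.\d{4,9}/\S+$")


def _exists_on_disk(value: str) -> bool:
    """Default file check: expand ~ and test whether the path exists."""
    return Path(value).expanduser().exists()


def classify_input(value: str, path_exists=_exists_on_disk) -> str:
    """Return the first matching input type, checked in the documented order."""
    if path_exists(value):
        return "local_pdf"
    if ARXIV_URL.match(value) or ARXIV_BARE.match(value):
        return "arxiv"
    if DOI_URL.match(value) or DOI_BARE.match(value):
        return "doi"
    lowered = value.lower()
    if lowered.startswith("https://") and lowered.endswith(".pdf"):
        return "pdf_url"
    raise ValueError(f"unrecognized input: {value!r}")
```

Because the checks run top to bottom, an arXiv PDF link such as https://arxiv.org/pdf/1706.03762.pdf would be treated as arXiv input in this sketch rather than as a direct PDF URL.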

Flags

| Flag | Description |
|---|---|
| --doi <doi> | Attach a DOI to the capture. For local files and PDF URLs, fetches CrossRef metadata for this DOI. |
| --cite-key <key> | Override the auto-generated cite key. |

What it does

arXiv input (existing behavior)

  1. Extracts the arXiv ID from the URL (a bare ID is used as given)
  2. Fetches metadata from arXiv API
  3. Cross-checks metadata against CrossRef and OpenAlex (reconciliation)
  4. Downloads the PDF from arXiv
  5. Generates a cite key using the configured pattern DSL
  6. Saves PDF + .meta.json sidecar
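
As an illustration of step 2, the sketch below builds a query URL for the public arXiv API and pulls title, authors, and year out of the Atom response. The function names and the returned dictionary shape are hypothetical; how crossmem itself performs the request and reconciliation is not described on this page.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# The arXiv API returns an Atom feed, so lookups need the Atom namespace.
ATOM = {"atom": "http://www.w3.org/2005/Atom"}


def arxiv_query_url(arxiv_id: str) -> str:
    """Build an arXiv API query URL for a single paper ID."""
    return "http://export.arxiv.org/api/query?" + urlencode({"id_list": arxiv_id})


def parse_arxiv_entry(atom_xml: str) -> dict:
    """Extract title, authors, and year from the first entry of an Atom feed."""
    root = ET.fromstring(atom_xml)
    entry = root.find("atom:entry", ATOM)
    if entry is None:
        raise ValueError("no entry in arXiv response")
    # Titles may be wrapped across lines; collapse runs of whitespace.
    raw_title = entry.findtext("atom:title", default="", namespaces=ATOM)
    title = " ".join(raw_title.split())
    authors = [
        author.findtext("atom:name", default="", namespaces=ATOM)
        for author in entry.findall("atom:author", ATOM)
    ]
    published = entry.findtext("atom:published", default="", namespaces=ATOM)
    year = int(published[:4]) if published else None
    return {"title": title, "authors": authors, "year": year}
```

Keeping the parser separate from the HTTP request makes it easy to test against a saved response without network access.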

DOI input

  1. Fetches metadata from CrossRef API
  2. Tries Unpaywall API for an open-access PDF URL (requires CROSSMEM_UNPAYWALL_EMAIL env var)
  3. If no open-access PDF is found, prints instructions for downloading the paper manually and capturing it as a local file
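
The Unpaywall lookup in step 2 can be sketched like this. The endpoint and the `best_oa_location.url_for_pdf` field come from Unpaywall's public v2 API; the function names, the injectable `fetch_json` callable, and the choice to return `None` when the email variable is unset are assumptions made for the example.

```python
import os
from urllib.parse import quote


def unpaywall_url(doi: str, email: str) -> str:
    """Build the Unpaywall v2 lookup URL for a DOI."""
    return f"https://api.unpaywall.org/v2/{quote(doi, safe='/')}?email={quote(email)}"


def pick_pdf_url(record: dict):
    """Return the open-access PDF URL from an Unpaywall record, or None."""
    location = record.get("best_oa_location") or {}
    return location.get("url_for_pdf")


def resolve_pdf(doi: str, fetch_json, env=os.environ):
    """Look up an open-access PDF URL for a DOI.

    fetch_json is any callable that takes a URL and returns parsed JSON.
    Returning None when the email is missing is an assumption of this sketch.
    """
    email = env.get("CROSSMEM_UNPAYWALL_EMAIL")
    if not email:
        return None
    return pick_pdf_url(fetch_json(unpaywall_url(doi, email)))
```

A `None` result corresponds to the case in step 3, where the paper has to be downloaded by hand and captured as a local file.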

Local PDF file

  1. Copies (not moves) the PDF to ~/crossmem/raw/<timestamp>_<cite_key>.pdf
  2. If --doi is given: fetches CrossRef metadata
  3. If --doi is not given: tries to extract embedded PDF metadata via pdfinfo (Title, Author, CreationDate)
  4. If no metadata is found and --cite-key is not given: exits with an error and prints instructions
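
The fallback order in steps 2 to 4 can be sketched as below. The parser handles the `Key: value` lines that the `pdfinfo` tool prints; the function names and the returned labels (`"crossref"`, `"pdfinfo"`, `"cite_key_only"`) are hypothetical and only illustrate the decision order.

```python
import subprocess


def run_pdfinfo(pdf_path: str) -> str:
    """Run pdfinfo on a file and return its text output (requires poppler-utils)."""
    result = subprocess.run(
        ["pdfinfo", pdf_path], capture_output=True, text=True, check=True
    )
    return result.stdout


def parse_pdfinfo(output: str) -> dict:
    """Keep only the non-empty Title, Author, and CreationDate fields."""
    wanted = {"Title", "Author", "CreationDate"}
    fields = {}
    for line in output.splitlines():
        # Split on the first colon only; date values contain colons themselves.
        key, sep, value = line.partition(":")
        if sep and key.strip() in wanted and value.strip():
            fields[key.strip()] = value.strip()
    return fields


def choose_metadata_source(doi, cite_key, pdfinfo_fields: dict) -> str:
    """Mirror the documented order: --doi, then pdfinfo, then --cite-key alone."""
    if doi:
        return "crossref"
    if pdfinfo_fields:
        return "pdfinfo"
    if cite_key:
        return "cite_key_only"
    raise ValueError("no metadata found: pass --doi or --cite-key")
```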

Direct PDF URL

  1. Downloads the PDF
  2. Then follows the same metadata path as local file (CrossRef via --doi, or pdfinfo fallback)

Exit codes

| Code | Meaning |
|---|---|
| 0 | Success |
| 1 | Error (invalid input, download failure, metadata fetch failure) |
| 2 | Missing arguments |
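
A script that wraps the command can branch on these codes. The wrapper below is a minimal sketch: `run_capture` and `describe_exit` are hypothetical helper names, and the `runner` parameter exists only so the example can be exercised without crossmem installed.

```python
import subprocess

# Meanings taken from the exit code table.
EXIT_MEANINGS = {
    0: "success",
    1: "error (invalid input, download failure, metadata fetch failure)",
    2: "missing arguments",
}


def describe_exit(code: int) -> str:
    """Translate a crossmem capture exit code into a short message."""
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def run_capture(args, runner=subprocess.run):
    """Run `crossmem capture` with the given arguments; return (code, message)."""
    result = runner(["crossmem", "capture", *args])
    return result.returncode, describe_exit(result.returncode)
```

For example, `run_capture(["10.1063/5.0012345"])` returns the exit code together with its description, so a batch script can retry or skip a paper accordingly.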

Environment variables

| Variable | Description |
|---|---|
| CROSSMEM_UNPAYWALL_EMAIL | Email address for Unpaywall API (required for DOI→PDF lookup) |

See config.toml for cite key configuration.

Examples

arXiv paper

$ crossmem capture https://arxiv.org/abs/1706.03762
[capture] arxiv_id: 1706.03762
[capture] title: Attention Is All You Need
cite_key:   vaswani2017attention

Journal paper via DOI

$ crossmem capture 10.1063/5.0012345
[capture] DOI: 10.1063/5.0012345
cite_key:   smith2023molecular

Local PDF with DOI metadata

$ crossmem capture ~/Downloads/paper.pdf --doi 10.1063/5.0012345
[capture] Local file: /Users/me/Downloads/paper.pdf
[capture] Fetching CrossRef metadata for DOI 10.1063/5.0012345
cite_key:   smith2023molecular

Local PDF with manual cite key

$ crossmem capture ~/Downloads/paper.pdf --cite-key jones2024transport
[capture] Local file: /Users/me/Downloads/paper.pdf
cite_key:   jones2024transport

Direct PDF URL

$ crossmem capture https://example.com/papers/preprint.pdf --doi 10.1234/example
cite_key:   doe2024example

Storage layout

~/crossmem/raw/
  <timestamp>_<cite_key>.pdf        # Raw PDF
  <timestamp>_<cite_key>.meta.json  # Metadata sidecar

The .meta.json file contains the reconciled metadata used by compile:

{
  "cite_key": "smith2023molecular",
  "title": "Molecular dynamics simulation of transport",
  "authors": ["John Smith", "Jane Doe"],
  "year": 2023,
  "arxiv_id": "",
  "doi": "10.1063/5.0012345",
  "container_title": "The Journal of Chemical Physics",
  "sources": ["crossref"],
  "reconciled": true,
  "warnings": []
}
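
Downstream scripts can read the sidecar with ordinary JSON tooling. The sketch below locates the sidecar next to a PDF and checks that the fields shown in the example are present with the expected types. The helper names are hypothetical, and the field list is taken only from the example above, so it may not be exhaustive.

```python
import json
from pathlib import Path

# Field names and types as they appear in the example sidecar.
EXPECTED_FIELDS = {
    "cite_key": str,
    "title": str,
    "authors": list,
    "year": int,
    "arxiv_id": str,
    "doi": str,
    "container_title": str,
    "sources": list,
    "reconciled": bool,
    "warnings": list,
}


def sidecar_for(pdf_path: Path) -> Path:
    """Map <timestamp>_<cite_key>.pdf to its .meta.json sidecar."""
    return pdf_path.with_suffix(".meta.json")


def check_sidecar(data: dict) -> list:
    """Return a list of problems; an empty list means the shape looks right."""
    problems = []
    for name, expected_type in EXPECTED_FIELDS.items():
        if name not in data:
            problems.append(f"missing field: {name}")
        elif not isinstance(data[name], expected_type):
            problems.append(f"wrong type for {name}")
    return problems


def load_sidecar(pdf_path: Path) -> dict:
    """Read and validate the sidecar that belongs to a captured PDF."""
    data = json.loads(sidecar_for(pdf_path).read_text(encoding="utf-8"))
    problems = check_sidecar(data)
    if problems:
        raise ValueError("; ".join(problems))
    return data
```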