crossmem capture
Download a paper and extract metadata.
Usage
crossmem capture <input> [--doi <doi>] [--cite-key <key>]
Input types
| Input | Example | Detection |
|---|---|---|
| Local PDF file | /path/to/paper.pdf | Path exists on disk |
| arXiv URL or bare ID | https://arxiv.org/abs/1706.03762, 1706.03762 | arXiv URL pattern or bare numeric ID |
| DOI URL or bare DOI | https://doi.org/10.1038/nature12373, 10.1038/nature12373 | DOI URL prefix or 10.NNNN/... pattern |
| Direct PDF URL | https://example.com/paper.pdf | HTTPS URL ending in .pdf |
Inputs are matched in the order above; the first match wins.
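The first-match-wins ordering can be sketched as a chain of checks in table order. This is a minimal illustration, not the tool's actual implementation: the regular expressions, the `classify_input` name, and the returned labels are all assumptions made for the example.

```python
import os
import re

# Hypothetical patterns; the tool's real detection rules may differ.
ARXIV_URL = re.compile(
    r"^https?://arxiv\.org/(?:abs|pdf)/(\d{4}\.\d{4,5})(?:v\d+)?(?:\.pdf)?$"
)
ARXIV_BARE = re.compile(r"^(\d{4}\.\d{4,5})(?:v\d+)?$")
DOI_URL_PREFIX = "https://doi.org/"
DOI_BARE = re.compile(r"^10\.\d{4,9}/\S+$")


def classify_input(value: str) -> str:
    """Return the first matching input type, checked in table order."""
    # 1. Local PDF file: the path exists on disk.
    if os.path.exists(os.path.expanduser(value)):
        return "local_pdf"
    # 2. arXiv URL or bare numeric ID.
    if ARXIV_URL.match(value) or ARXIV_BARE.match(value):
        return "arxiv"
    # 3. DOI URL prefix or bare 10.NNNN/... pattern.
    if value.startswith(DOI_URL_PREFIX) or DOI_BARE.match(value):
        return "doi"
    # 4. Any other HTTPS URL ending in .pdf.
    if value.startswith("https://") and value.lower().endswith(".pdf"):
        return "pdf_url"
    raise ValueError(f"unrecognized input: {value!r}")
```

Under these assumed patterns, an arXiv PDF link such as `https://arxiv.org/pdf/1706.03762.pdf` is classified as `arxiv` rather than `pdf_url`, because the arXiv check runs first.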
Flags
| Flag | Description |
|---|---|
| --doi <doi> | Attach a DOI to the capture. For local files and PDF URLs, fetches CrossRef metadata for this DOI. |
| --cite-key <key> | Override the auto-generated cite key. |
What it does
arXiv input (existing behavior)
- Extracts the arXiv ID from the URL, or uses the bare ID directly
- Fetches metadata from arXiv API
- Cross-checks metadata against CrossRef and OpenAlex (reconciliation)
- Downloads the PDF from arXiv
- Generates a cite key using the configured pattern DSL
- Saves PDF + .meta.json sidecar
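The pattern DSL itself is not described here, so the following is only a guess at a pattern that would reproduce the cite keys shown in this document (family name, year, first significant title word). The `make_cite_key` function and the stopword list are hypothetical.

```python
import re

# Assumed stopword list, purely illustrative.
STOPWORDS = {"a", "an", "the", "on", "of", "in", "for", "and", "to"}


def make_cite_key(first_author: str, year: int, title: str) -> str:
    """Build <family><year><first significant title word>, lowercased."""
    family = first_author.split()[-1]
    words = re.findall(r"[A-Za-z]+", title)
    first_word = next((w for w in words if w.lower() not in STOPWORDS), "")
    key = f"{family}{year}{first_word}".lower()
    # Drop anything that is not a lowercase letter or digit.
    return re.sub(r"[^a-z0-9]", "", key)


print(make_cite_key("Ashish Vaswani", 2017, "Attention Is All You Need"))
# vaswani2017attention
print(make_cite_key("John Smith", 2023, "Molecular dynamics simulation of transport"))
# smith2023molecular
```

A configured pattern may differ from this default-looking shape; `--cite-key` bypasses generation entirely.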
DOI input
- Fetches metadata from CrossRef API
- Tries Unpaywall API for an open-access PDF URL (requires CROSSMEM_UNPAYWALL_EMAIL env var)
- If no open-access PDF found, prints instructions to download manually and use local file capture
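The Unpaywall step can be sketched as building a lookup URL and reading the PDF link out of the JSON record. The endpoint shape and the `best_oa_location.url_for_pdf` field follow the public Unpaywall v2 API; the function names are hypothetical, and how crossmem itself performs the request is not documented here.

```python
import os
from typing import Optional
from urllib.parse import quote


def unpaywall_url(doi: str) -> str:
    """Build the Unpaywall v2 lookup URL for a DOI."""
    email = os.environ.get("CROSSMEM_UNPAYWALL_EMAIL")
    if not email:
        raise RuntimeError("CROSSMEM_UNPAYWALL_EMAIL is not set")
    return (
        f"https://api.unpaywall.org/v2/{quote(doi, safe='/')}"
        f"?email={quote(email)}"
    )


def pick_pdf_url(record: dict) -> Optional[str]:
    """Return the open-access PDF URL from an Unpaywall record, if any."""
    # best_oa_location is null when no open-access copy is known.
    best = record.get("best_oa_location") or {}
    return best.get("url_for_pdf")
```

When `pick_pdf_url` returns `None`, the caller falls through to the manual-download instructions described above.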
Local PDF file
- Copies (not moves) the PDF to ~/crossmem/raw/<timestamp>_<cite_key>.pdf
- If --doi given: fetches CrossRef metadata
- If no --doi: tries extracting embedded PDF metadata via pdfinfo (Title, Author, CreationDate)
- If no metadata found and no --cite-key: errors with instructions
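The pdfinfo fallback amounts to parsing `Key: value` lines from the tool's text output. A minimal sketch follows, assuming the Poppler `pdfinfo` binary is on `PATH`; both function names are hypothetical.

```python
import subprocess


def parse_pdfinfo(output: str) -> dict:
    """Extract Title, Author, and CreationDate from pdfinfo's text output."""
    wanted = {"Title", "Author", "CreationDate"}
    fields = {}
    for line in output.splitlines():
        # Split on the first colon only; dates contain further colons.
        key, sep, value = line.partition(":")
        if sep and key.strip() in wanted and value.strip():
            fields[key.strip()] = value.strip()
    return fields


def embedded_metadata(pdf_path: str) -> dict:
    """Run pdfinfo on a PDF and parse its output."""
    result = subprocess.run(
        ["pdfinfo", pdf_path], capture_output=True, text=True, check=True
    )
    return parse_pdfinfo(result.stdout)
```

An empty result from `parse_pdfinfo` corresponds to the "no metadata found" case, where a `--cite-key` is needed to proceed.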
Direct PDF URL
- Downloads the PDF
- Then follows the same metadata path as local file (CrossRef via --doi, or pdfinfo fallback)
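For the `--doi` path shared by local files and PDF URLs, a CrossRef record has to be mapped onto the sidecar fields. The sketch below uses the public CrossRef REST API layout (`/works/<doi>`, with the record under `message`); the mapping onto sidecar field names and both function names are assumptions, not crossmem's actual code.

```python
from urllib.parse import quote


def crossref_url(doi: str) -> str:
    """Build the CrossRef works URL for a DOI."""
    return f"https://api.crossref.org/works/{quote(doi, safe='/')}"


def crossref_to_sidecar(message: dict) -> dict:
    """Map the 'message' object of a CrossRef response to sidecar fields."""
    authors = [
        " ".join(p for p in (a.get("given"), a.get("family")) if p)
        for a in message.get("author", [])
    ]
    # 'issued' holds nested date parts, e.g. {"date-parts": [[2023, 5, 1]]}.
    date_parts = message.get("issued", {}).get("date-parts") or [[None]]
    year = date_parts[0][0] if date_parts[0] else None
    return {
        # CrossRef returns title and container-title as lists.
        "title": (message.get("title") or [""])[0],
        "authors": authors,
        "year": year,
        "doi": message.get("DOI", ""),
        "container_title": (message.get("container-title") or [""])[0],
    }
```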
Exit codes
| Code | Meaning |
|---|---|
| 0 | Success |
| 1 | Error (invalid input, download failure, metadata fetch failure) |
| 2 | Missing arguments |
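A script that wraps the command can branch on these codes. A minimal sketch, assuming `crossmem` is on `PATH`; the `run_capture` and `describe_exit` helpers are hypothetical.

```python
import subprocess

# Meanings taken from the exit code table above.
EXIT_MEANINGS = {
    0: "success",
    1: "error (invalid input, download failure, or metadata fetch failure)",
    2: "missing arguments",
}


def describe_exit(code: int) -> str:
    """Translate a crossmem capture exit code into a readable message."""
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def run_capture(*args: str) -> int:
    """Run `crossmem capture` with the given arguments; return its exit code."""
    return subprocess.run(["crossmem", "capture", *args]).returncode
```

For example, `describe_exit(run_capture("10.1038/nature12373"))` yields a message suitable for a log line.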
Environment variables
| Variable | Description |
|---|---|
| CROSSMEM_UNPAYWALL_EMAIL | Email address for Unpaywall API (required for DOI→PDF lookup) |
See config.toml for cite key configuration.
Examples
arXiv paper
$ crossmem capture https://arxiv.org/abs/1706.03762
[capture] arxiv_id: 1706.03762
[capture] title: Attention Is All You Need
cite_key: vaswani2017attention
Journal paper via DOI
$ crossmem capture 10.1063/5.0012345
[capture] DOI: 10.1063/5.0012345
cite_key: smith2023molecular
Local PDF with DOI metadata
$ crossmem capture ~/Downloads/paper.pdf --doi 10.1063/5.0012345
[capture] Local file: /Users/me/Downloads/paper.pdf
[capture] Fetching CrossRef metadata for DOI 10.1063/5.0012345
cite_key: smith2023molecular
Local PDF with manual cite key
$ crossmem capture ~/Downloads/paper.pdf --cite-key jones2024transport
[capture] Local file: /Users/me/Downloads/paper.pdf
cite_key: jones2024transport
Direct PDF URL
$ crossmem capture https://example.com/papers/preprint.pdf --doi 10.1234/example
cite_key: doe2024example
Storage layout
~/crossmem/raw/
  <timestamp>_<cite_key>.pdf        # Raw PDF
  <timestamp>_<cite_key>.meta.json  # Metadata sidecar
The .meta.json file contains the reconciled metadata used by compile:
{
  "cite_key": "smith2023molecular",
  "title": "Molecular dynamics simulation of transport",
  "authors": ["John Smith", "Jane Doe"],
  "year": 2023,
  "arxiv_id": "",
  "doi": "10.1063/5.0012345",
  "container_title": "The Journal of Chemical Physics",
  "sources": ["crossref"],
  "reconciled": true,
  "warnings": []
}
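Downstream scripts can read the sidecars directly. The sketch below pairs each `.meta.json` file with its PDF by shared filename stem, following the layout above; the `load_sidecars` helper is hypothetical and does not interpret the timestamp prefix, whose format is not specified here.

```python
import json
from pathlib import Path


def load_sidecars(raw_dir: Path) -> list:
    """Return (pdf_path, metadata) pairs for every sidecar in raw_dir."""
    suffix = ".meta.json"
    pairs = []
    for meta_path in sorted(raw_dir.glob(f"*{suffix}")):
        with meta_path.open(encoding="utf-8") as f:
            meta = json.load(f)
        # "<stem>.meta.json" sits next to "<stem>.pdf".
        pdf_path = meta_path.with_name(meta_path.name[: -len(suffix)] + ".pdf")
        pairs.append((pdf_path, meta))
    return pairs
```

Calling `load_sidecars(Path.home() / "crossmem" / "raw")` would list every captured paper along with fields such as `cite_key` and `doi`.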