Data Model

Core types

ReconciledMetadata

The metadata reconciler merges data from multiple sources into a single canonical record.

pub struct ReconciledMetadata {
    pub title: String,
    pub authors: Vec<String>,
    pub year: u16,
    pub arxiv_id: String,
    pub doi: Option<String>,
    pub doi_preprint: Option<String>,
    pub doi_published: Option<String>,
    pub sources: Vec<String>,       // e.g. ["arxiv", "crossref", "openalex"]
    pub warnings: Vec<String>,
    pub reconciled: bool,
}
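As an illustration of how several sources might be merged into one value, the sketch below picks the value most sources agree on and records a warning for each dissenting source. It is a minimal sketch under stated assumptions, not the reconciler's actual logic: `Candidate`, `normalize`, and `pick_by_agreement` are hypothetical names, and comparing trimmed, lowercased text is an assumed normalization.

```rust
/// One value reported by one source (hypothetical helper type).
pub struct Candidate {
    pub source: String,
    pub value: String,
}

/// Assumed normalization: ignore surrounding whitespace and letter case.
fn normalize(s: &str) -> String {
    s.trim().to_lowercase()
}

/// Returns the value with the most votes plus one warning per dissenting source.
/// Ties go to the candidate listed first. Returns `None` for an empty input.
pub fn pick_by_agreement(field: &str, candidates: &[Candidate]) -> (Option<String>, Vec<String>) {
    let mut best: Option<(usize, usize)> = None; // (candidate index, votes)
    for (i, c) in candidates.iter().enumerate() {
        let votes = candidates
            .iter()
            .filter(|o| normalize(&o.value) == normalize(&c.value))
            .count();
        // Strict `>` keeps the earliest candidate when votes are tied.
        let better = match best {
            None => true,
            Some((_, v)) => votes > v,
        };
        if better {
            best = Some((i, votes));
        }
    }

    let idx = match best {
        Some((idx, _)) => idx,
        None => return (None, Vec::new()),
    };
    let winner = candidates[idx].value.trim().to_string();

    // Every source whose value differs from the winner becomes a warning.
    let warnings = candidates
        .iter()
        .filter(|c| normalize(&c.value) != normalize(&winner))
        .map(|c| format!("{}: {} reports \"{}\"", field, c.source, c.value.trim()))
        .collect();

    (Some(winner), warnings)
}
```

In a record like `ReconciledMetadata`, the returned warnings could be appended to `warnings` and the contributing source names to `sources`; how `reconciled` is set is not specified here and is left out of the sketch.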

ChunkV2

The paragraph-level chunk with full provenance.

pub struct ChunkV2 {
    pub chunk_type: String,          // "page", "heading", "paragraph", etc.
    pub chunk_id: String,            // e.g. "p1s1c1"
    pub page: usize,
    pub text: String,                // Verbatim extracted text
    pub provenance: Provenance,
    pub paraphrase: Option<String>,  // LLM-generated
    pub implication: Option<String>, // LLM-generated
}

Provenance

Tracks exactly where a chunk came from in the source PDF.

pub struct Provenance {
    pub page: usize,
    pub section: Option<String>,
    pub bbox: Option<[f64; 4]>,     // [x_min, y_min, x_max, y_max]
    pub text_sha256: String,
    pub byte_range: Option<[usize; 2]>,
}
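A provenance record can be sanity-checked without re-running the extractor. The sketch below is illustrative only and rests on assumptions the struct does not state: `byte_range` is read as half-open `[start, end)` byte offsets into the extracted page text, and `check_provenance` is a hypothetical helper. The `text_sha256` comparison is left out because hashing needs a crate outside the standard library.

```rust
pub struct Provenance {
    pub page: usize,
    pub section: Option<String>,
    pub bbox: Option<[f64; 4]>, // [x_min, y_min, x_max, y_max]
    pub text_sha256: String,
    pub byte_range: Option<[usize; 2]>,
}

/// Returns a list of problems; an empty list means these checks passed.
/// Assumption: `byte_range` indexes into `page_text` as [start, end).
pub fn check_provenance(p: &Provenance, page_text: &str, chunk_text: &str) -> Vec<String> {
    let mut problems = Vec::new();

    if let Some([x_min, y_min, x_max, y_max]) = p.bbox {
        // Written as a negation so that NaN coordinates are also rejected.
        if !(x_min <= x_max && y_min <= y_max) {
            problems.push("bbox: min exceeds max".to_string());
        }
    }

    if let Some([start, end]) = p.byte_range {
        // `str::get` returns None if the range is out of bounds, reversed,
        // or cuts through a multi-byte UTF-8 character.
        match page_text.get(start..end) {
            Some(slice) if slice == chunk_text => {}
            Some(_) => problems.push("byte_range: slice differs from chunk text".to_string()),
            None => problems.push("byte_range: not a valid slice of the page text".to_string()),
        }
    }

    problems
}
```

Absent fields (`bbox` or `byte_range` set to `None`) are skipped, not treated as failures, since both are optional in the struct.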

WikiEntry (MCP)

The in-memory representation used by the MCP server.

use std::path::PathBuf;

struct WikiEntry {
    cite_key: Option<String>,
    title: String,
    authors: Vec<String>,
    year: Option<u16>,
    source: Option<String>,
    date: Option<String>,
    file_path: PathBuf,
    body: String,
}

Storage layout

~/crossmem/
├── raw/                                    # Capture output
│   ├── <timestamp>_<cite_key>.pdf          # Raw PDF
│   └── <timestamp>_<cite_key>.meta.json    # Reconciled metadata
└── wiki/                                   # Compile output
    └── <timestamp>_<cite_key>.md           # Wiki note
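The naming scheme in the layout above can be expressed as a small path builder. This is a sketch, not the project's code: `CapturePaths` and `capture_paths` are hypothetical names, the timestamp is treated as an opaque string because its format is not specified here, and the sketch assumes the wiki note reuses the same timestamp as the raw capture.

```rust
use std::path::{Path, PathBuf};

/// The three files that belong to one captured paper (hypothetical helper type).
pub struct CapturePaths {
    pub pdf: PathBuf,
    pub meta: PathBuf,
    pub wiki: PathBuf,
}

/// Builds the file paths for one capture under `root` (e.g. ~/crossmem, already expanded).
pub fn capture_paths(root: &Path, timestamp: &str, cite_key: &str) -> CapturePaths {
    // Shared file stem: <timestamp>_<cite_key>
    let stem = format!("{}_{}", timestamp, cite_key);
    CapturePaths {
        pdf: root.join("raw").join(format!("{}.pdf", stem)),
        meta: root.join("raw").join(format!("{}.meta.json", stem)),
        wiki: root.join("wiki").join(format!("{}.md", stem)),
    }
}
```

Note that `Path::join` does not expand `~`; the caller is expected to pass an already resolved home directory.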

Trust boundaries

| Data | Source | Verifiable? |
|------|--------|-------------|
| Title, authors, year, DOI | Metadata reconciler (arXiv + CrossRef + OpenAlex) | Cross-source agreement |
| Cite key, citation strings | Deterministic generator | Pure function, unit-tested |
| Verbatim quote text | PDF extractor (Marker / pdftotext) | SHA-256 hash |
| Bounding box, byte range | PDF extractor | Re-extraction reproducibility |
| Paraphrase, implication | LLM (Ollama) | Not verifiable; advisory only |