Under the Hood

How citation suggestions actually work.

TarCite does not search the open web, Google Scholar, or a remote paper database. It runs a 7-stage retrieval pipeline over the PDFs and indexes stored on your machine, then uses your selected AI profile to judge which local-library papers support the paragraph. With the Local AI profile, the whole workflow stays on-device.

Stages 2–5 and feedback storage run fully offline, always. Stage 1 and Stage 6 use the selected LLM. Choose the Local AI profile and those LLM steps also run on your machine, with no internet or cloud request.

1
LLM · your AI profile

HyDE — Query Expansion

Before local retrieval begins, TarCite asks the selected LLM to write a hypothetical ideal cited passage — a 2–4 sentence excerpt that a relevant source might actually contain. If you use Local AI, that prompt is handled by the model on your computer; TarCite is not searching Google, Crossref, or any web database for candidate papers.

This matters because there is often a vocabulary gap between how you phrase a claim in a draft and how the relevant paper phrases it. Your paragraph might say "brain activity during learning" while the relevant paper says "synaptic plasticity in hippocampal networks." HyDE bridges that gap by generating text in paper-like language, which improves retrieval recall.

If HyDE fails for any reason, the pipeline falls back to using your original paragraph directly — no suggestion is lost.

Why this works

  • Embeds the hypothetical passage as the vector search query inside your local vector index — not your raw paragraph
  • The resulting vector is closer in embedding space to real paper excerpts
  • Improves semantic recall by 10–20% compared to embedding the query paragraph directly
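
The fallback behaviour described above can be sketched in a few lines. This is a minimal illustration, not TarCite's actual code: the prompt wording, the function name build_search_text, and the generate callable (standing in for whichever LLM profile is selected) are all assumptions.

```python
from typing import Callable

# Hypothetical prompt wording; the real prompt is not documented here.
HYDE_PROMPT = (
    "Write a 2-4 sentence excerpt from an academic paper that would be an "
    "ideal source to cite for the following paragraph:\n\n{paragraph}"
)


def build_search_text(paragraph: str, generate: Callable[[str], str]) -> str:
    """Return the text to embed: the hypothetical passage, or the paragraph on failure."""
    try:
        passage = generate(HYDE_PROMPT.format(paragraph=paragraph)).strip()
    except Exception:
        # Any LLM failure (timeout, model not loaded, ...) triggers the fallback.
        return paragraph
    # An empty response is treated like a failure.
    return passage or paragraph
```

Whatever string comes back is what gets embedded for vector search, so retrieval still has a query even when the LLM step fails.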

2
Fully local

Parallel Triple Retrieval

Three search strategies run simultaneously against ChromaDB and SQLite indexes stored on your machine. No web search, remote paper catalog, or cloud sync is involved. Each catches a different kind of local-library match — together they are far more robust than any single approach.

Vector Search

Local ChromaDB cosine similarity between the HyDE embedding and document chunk vectors already stored on your machine. Catches conceptual similarity — finds papers in your own library even when the words are different.

BM25 Keyword

Local SQLite FTS5 full-text search with Porter stemming. Catches exact terminology inside your imported PDFs — critical for specialised terms like gene names, chemical compounds, or acronyms that must match exactly.
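
For readers curious what an FTS5 keyword search with Porter stemming looks like, here is a small sketch using Python's built-in sqlite3 module. The table name chunks, its columns, and the keyword_search helper are illustrative assumptions, and the example presumes your SQLite build includes FTS5 (most do).

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # illustration only; a real index lives on disk
# The porter tokenizer lets "networks" and "network" share a stem.
conn.execute(
    "CREATE VIRTUAL TABLE chunks USING fts5(paper_id UNINDEXED, body, tokenize='porter')"
)
conn.executemany(
    "INSERT INTO chunks (paper_id, body) VALUES (?, ?)",
    [
        ("p1", "Synaptic plasticity in hippocampal networks supports learning."),
        ("p2", "Crop yields respond to soil nitrogen."),
    ],
)


def keyword_search(conn, query, limit=20):
    """Return (paper_id, bm25_score) rows, best match first."""
    # FTS5's bm25() gives better matches numerically lower values,
    # so ascending order puts the best chunk first.
    return conn.execute(
        "SELECT paper_id, bm25(chunks) FROM chunks "
        "WHERE chunks MATCH ? ORDER BY bm25(chunks) LIMIT ?",
        (query, limit),
    ).fetchall()
```

Searching for "network" matches the first chunk even though the text says "networks". A production version would also need to escape FTS5 query syntax in user text, which this sketch skips.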

Title Search

Direct keyword matching against paper titles in your local library. Papers whose titles match query terms receive a strong boost — title-level relevance is a high-confidence topical signal.

All document text from your imported PDFs is pre-chunked and stored locally in overlapping ~900-character segments with sentence-aware boundaries, so no evidence passage is cut mid-sentence.
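
One way to produce overlapping, sentence-aware chunks is to pack whole sentences until the target size is reached, then carry the last sentence forward. The sketch below assumes a naive regex sentence splitter and a one-sentence overlap; the function name and both choices are illustrative, not the app's documented behaviour.

```python
import re


def chunk_text(text: str, target: int = 900, overlap_sentences: int = 1) -> list[str]:
    """Pack whole sentences into chunks of roughly `target` characters."""
    # Naive splitter: break after ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current = [], []
    for sentence in sentences:
        candidate = " ".join(current + [sentence])
        if current and len(candidate) > target:
            chunks.append(" ".join(current))
            # Carry the last sentence(s) forward so neighbouring chunks overlap.
            current = current[-overlap_sentences:] if overlap_sentences else []
        current.append(sentence)
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Because chunks only ever break between sentences, a single very long sentence can push a chunk past the target size; that is the trade-off for never cutting evidence mid-sentence.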

3
Fully local

Weighted RRF Fusion

The three ranked lists are merged into a single ranking using Reciprocal Rank Fusion (RRF) — a proven technique for combining multiple ranked retrieval results.

Each chunk's fused score is the sum of weight / (60 + rank) across all lists. Vector and BM25 results use weight 1.0; title matches use weight 2.0, giving them a strong boost because a paper whose title directly matches is very likely relevant.

RRF is rank-based rather than score-based, making it robust to the fact that cosine distances and BM25 scores are on completely different scales.
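
The scoring rule above fits in a few lines. This sketch assumes ranks start at 1 and that each retriever returns chunk IDs best-first; the function name and the sample IDs are made up for illustration.

```python
from collections import defaultdict


def weighted_rrf(ranked_lists, k=60):
    """ranked_lists: iterable of (weight, [chunk_id, ...]) with the best match first."""
    scores = defaultdict(float)
    for weight, ids in ranked_lists:
        for rank, chunk_id in enumerate(ids, start=1):  # assumption: 1-based ranks
            scores[chunk_id] += weight / (k + rank)
    # Highest fused score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


fused = weighted_rrf([
    (1.0, ["c1", "c2", "c3"]),  # vector search
    (1.0, ["c2", "c4"]),        # BM25 keyword search
    (2.0, ["c3"]),              # title search, double weight
])
```

Here c3 is only third in the vector list, but its title match contributes 2/61, which lifts it above c2 even though c2 appears in two lists.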

After fusion

  • Results are grouped by paper — all chunks from the same document are consolidated
  • The best 6 evidence chunks per paper are kept
  • Full metadata (title, authors, year, DOI, journal) is retrieved from the local SQLite database for each paper
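
The grouping step can be pictured as follows. This is a sketch under assumptions: the fused list is already sorted best-first, chunk_to_paper is a hypothetical lookup from chunk ID to paper ID, and papers are ordered by their best chunk (the page does not say how papers are ordered at this point).

```python
from collections import defaultdict


def group_by_paper(fused_chunks, chunk_to_paper, max_chunks=6):
    """fused_chunks: [(chunk_id, score), ...] sorted best first."""
    papers = defaultdict(list)
    for chunk_id, score in fused_chunks:
        evidence = papers[chunk_to_paper[chunk_id]]
        # Input is sorted, so the first max_chunks seen are the best ones.
        if len(evidence) < max_chunks:
            evidence.append((chunk_id, score))
    # Dict order follows first appearance, i.e. papers ordered by their best chunk.
    return dict(papers)
```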

4
Fully local

Cross-Encoder Reranking

The fused candidates are re-scored on your machine by a cross-encoder model — a small neural network (~66 MB) that scores each (paragraph, evidence) pair jointly using full cross-attention.

Joint scoring is generally more accurate than cosine similarity between independently computed embeddings. Bi-encoder models embed query and document separately; cross-encoders process both texts together, so the model can compare them token by token. The result is a much more precise relevance score.

Up to 1024 characters of each paper's best evidence are used as input — enough to capture the substance of the argument, not just a fragment.

Model

  • BAAI/bge-reranker-base (default) or cross-encoder/ms-marco-MiniLM-L-6-v2
  • Runs on CPU, Apple Silicon MPS, or NVIDIA CUDA — auto-detected, no configuration needed
  • The model is downloaded once, cached locally, and then reused — no network access and no document text leaves your machine during reranking
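
A reranking step along these lines might look like the sketch below. The rerank function, the 'evidence' key, and the pluggable score_pairs callable are assumptions made for illustration; the comment shows how a scorer could be built with the sentence-transformers package, if that is the library in use.

```python
def rerank(paragraph, candidates, score_pairs, max_chars=1024):
    """candidates: list of dicts with an 'evidence' string. Returns them best first."""
    # Each pair is (your paragraph, truncated evidence), scored jointly by the model.
    pairs = [(paragraph, c["evidence"][:max_chars]) for c in candidates]
    scores = score_pairs(pairs)
    ranked = sorted(zip(candidates, scores), key=lambda cs: cs[1], reverse=True)
    return [dict(c, rerank_score=float(s)) for c, s in ranked]


# With sentence-transformers installed, a real scorer could be created like this:
#   from sentence_transformers import CrossEncoder
#   model = CrossEncoder("BAAI/bge-reranker-base")
#   reranked = rerank(paragraph, candidates, model.predict)
```

Keeping the scorer as a parameter makes the ordering logic easy to check with a stub before loading a real model.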

5
Fully local

MMR Diversity Selection

After reranking, Maximal Marginal Relevance (MMR) ensures the final candidate set is diverse. Without this step, you might receive five citations that all make the same point from slightly different angles.

Each candidate is scored as: λ × relevance − (1−λ) × max_similarity_to_already_selected

With λ = 0.7 (70% relevance, 30% diversity), MMR selects sources that are both highly relevant and meaningfully different from each other — covering different aspects of the claim in your paragraph rather than repeating the same finding.
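
The greedy selection loop behind that formula can be sketched as below. It assumes relevance scores and pairwise similarities are on comparable 0 to 1 scales (raw cross-encoder outputs may need normalising first); the function name and the toy numbers are illustrative.

```python
def mmr_select(relevance, similarity, k, lam=0.7):
    """relevance: {id: score}; similarity(a, b) -> float. Returns up to k ids."""
    selected, remaining = [], set(relevance)
    while remaining and len(selected) < k:
        def mmr_score(cid):
            # Penalise closeness to anything already chosen (0 for the first pick).
            redundancy = max((similarity(cid, s) for s in selected), default=0.0)
            return lam * relevance[cid] - (1 - lam) * redundancy
        best = max(sorted(remaining), key=mmr_score)  # sorted() keeps ties deterministic
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy example: "a" and "b" are near-duplicates, "c" covers a different point.
sims = {
    frozenset(("a", "b")): 0.95,
    frozenset(("a", "c")): 0.10,
    frozenset(("b", "c")): 0.10,
}
picked = mmr_select(
    {"a": 0.9, "b": 0.85, "c": 0.6},
    lambda x, y: sims[frozenset((x, y))],
    k=2,
)
```

In the toy example "b" is more relevant than "c", but after "a" is chosen its redundancy penalty (0.3 × 0.95) drops it below "c", so the two picks are "a" and "c".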

6
LLM · your AI profile

LLM Evaluation & Validation

The diverse, reranked candidates are passed to your selected LLM for final evaluation. With the Local AI profile, this handoff stays on your machine through Ollama; with a cloud or custom profile, only the paragraph and selected evidence snippets are sent to that provider, not your full PDFs or library. The LLM receives each candidate's title, authors, year, DOI, journal, and up to 6 evidence passages — and is instructed to:

What the LLM does

  • Select the most relevant sources and rank them
  • Extract verbatim evidence quotes — direct text from the paper, not paraphrased
  • Rate confidence: High, Medium, or Low — based on how directly the source supports the claim
  • Rate evidence coverage: strong, partial, or single point
  • Return structured JSON for the app to render

Anti-hallucination safeguards: Citations are pre-formatted before LLM validation — the LLM cannot invent a different reference format. Every returned paper ID is validated against the actual retrieved set; any fabricated citation is silently removed. If the LLM returns malformed JSON, the system retries with a stricter prompt, then attempts automatic repair.
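
The ID check is simple to picture. In this sketch the JSON field names ("suggestions", "paper_id") and the function name are hypothetical, since the real schema is not shown here; returning None stands in for "ask the LLM again with a stricter prompt".

```python
import json


def validate_llm_output(raw, retrieved_ids):
    """Parse the LLM's JSON and keep only suggestions whose paper_id was retrieved."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None  # caller can retry with a stricter prompt
    suggestions = data.get("suggestions", []) if isinstance(data, dict) else []
    return [
        s for s in suggestions
        if isinstance(s, dict)
        and isinstance(s.get("paper_id"), str)
        and s["paper_id"] in retrieved_ids  # drop anything the pipeline never retrieved
    ]
```

Because the allowed IDs come from the retrieval stages rather than from the model, a paper the LLM invents has no way to pass this filter.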

7
Fully local

User Feedback

Each suggestion shows thumbs-up and thumbs-down buttons. Feedback is saved locally in SQLite on your machine, tied to the specific run and paper, and is not uploaded. Clicking the active thumb again undoes it.
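
The toggle behaviour could be stored along these lines. The table layout, column names, and toggle_vote helper are assumptions for illustration; only the facts that feedback lives in local SQLite and is keyed by run and paper come from the description above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # illustration only; a real app would use a file on disk
conn.execute(
    "CREATE TABLE feedback ("
    " run_id TEXT, paper_id TEXT, vote INTEGER CHECK (vote IN (1, -1)),"
    " PRIMARY KEY (run_id, paper_id))"
)


def toggle_vote(conn, run_id, paper_id, vote):
    """Record vote (+1 or -1); clicking the same thumb again clears it."""
    row = conn.execute(
        "SELECT vote FROM feedback WHERE run_id = ? AND paper_id = ?",
        (run_id, paper_id),
    ).fetchone()
    if row and row[0] == vote:
        # Same thumb clicked twice: undo the vote.
        conn.execute(
            "DELETE FROM feedback WHERE run_id = ? AND paper_id = ?",
            (run_id, paper_id),
        )
        stored = None
    else:
        # New vote, or a switch from up to down (or the reverse).
        conn.execute(
            "INSERT OR REPLACE INTO feedback (run_id, paper_id, vote) VALUES (?, ?, ?)",
            (run_id, paper_id, vote),
        )
        stored = vote
    conn.commit()
    return stored
```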

This feedback is available for future ranking personalisation: as the dataset grows, it can be used to rank sources you consistently find useful higher in future runs.

Ready to try it?

Download TarCite and run your first citation suggestion in minutes — no account, no cloud library upload, no setup beyond pointing it at your PDF folder.