TarCite does not search the open web, Google Scholar, or a remote paper database. It runs a 7-stage retrieval pipeline over the PDFs and indexes stored on your machine, then uses your selected AI profile to judge which local-library papers support the paragraph. With the Local AI profile, the whole workflow stays on-device.
Before local retrieval begins, TarCite asks the selected LLM to write a hypothetical ideal cited passage — a 2–4 sentence excerpt that a relevant source might actually contain. If you use Local AI, that prompt is handled by the model on your computer; TarCite is not searching Google, Crossref, or any web database for candidate papers.
This matters because there is a vocabulary gap between how you draft a paragraph and how published papers phrase the same idea. Your paragraph might say "brain activity during learning" while the relevant paper says "synaptic plasticity in hippocampal networks." This technique, known as HyDE (Hypothetical Document Embeddings), bridges that gap by generating text in paper-like language, so the query sits closer to relevant passages and retrieval recall improves.
If HyDE fails for any reason, the pipeline falls back to using your original paragraph directly — no suggestion is lost.
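The fallback behaviour can be sketched roughly as follows. This is a minimal illustration, not TarCite's actual code: the function name build_retrieval_query, the prompt wording, and the generate callable (standing in for whichever LLM profile is selected) are all assumptions.

```python
from typing import Callable


def build_retrieval_query(paragraph: str, generate: Callable[[str], str]) -> str:
    """Return a HyDE passage for retrieval, or the original paragraph on failure.

    `generate` is a hypothetical callable that sends a prompt to the selected
    LLM and returns its text response.
    """
    prompt = (
        "Write a 2-4 sentence excerpt that a relevant academic source "
        "might contain in support of this paragraph:\n\n" + paragraph
    )
    try:
        passage = generate(prompt).strip()
    except Exception:
        # Any failure (timeout, model not loaded, etc.) falls back to the paragraph.
        return paragraph
    # An empty response is treated as a failure as well.
    return passage or paragraph
```

Because the function always returns some query text, the later retrieval stages can run whether or not the LLM call succeeded.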
Three search strategies run simultaneously against ChromaDB and SQLite indexes stored on your machine. No web search, remote paper catalog, or cloud sync is involved. Each catches a different kind of local-library match — together they are far more robust than any single approach.
Local ChromaDB cosine similarity between the HyDE embedding and document chunk vectors already stored on your machine. Catches conceptual similarity — finds papers in your own library even when the words are different.
Local SQLite FTS5 full-text search with Porter stemming. Catches exact terminology inside your imported PDFs — critical for specialised terms like gene names, chemical compounds, or acronyms that must match exactly.
Direct keyword matching against paper titles in your local library. Papers whose titles match query terms receive a strong boost — title-level relevance is a high-confidence topical signal.
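As an illustration of the full-text strategy, here is a small sketch using Python's built-in sqlite3 module. It assumes the SQLite build bundled with Python includes FTS5; the table name chunks and the columns paper_id and body are hypothetical, not TarCite's real schema.

```python
import sqlite3


def build_index(rows):
    """Create an in-memory FTS5 index with Porter stemming and load (paper_id, body) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE VIRTUAL TABLE chunks USING fts5("
        "paper_id UNINDEXED, body, tokenize = 'porter')"
    )
    conn.executemany("INSERT INTO chunks (paper_id, body) VALUES (?, ?)", rows)
    return conn


def keyword_search(conn, query, limit=10):
    """Return (paper_id, bm25_score) pairs, best match first.

    FTS5's bm25() returns lower values for better matches, so ascending
    order puts the most relevant chunk at the top.
    """
    sql = (
        "SELECT paper_id, bm25(chunks) FROM chunks "
        "WHERE chunks MATCH ? ORDER BY bm25(chunks) LIMIT ?"
    )
    return conn.execute(sql, (query, limit)).fetchall()


conn = build_index([
    ("p1", "Participants learned the task quickly."),
    ("p2", "Learning rates differ across hippocampal networks."),
    ("p3", "Chemical compounds were synthesised in the lab."),
])
hits = keyword_search(conn, "learning")
```

With Porter stemming, the query "learning" matches both "learned" and "Learning", so p1 and p2 are returned while p3 is not.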
All document text from your imported PDFs is pre-chunked and stored locally in overlapping ~900 character segments with sentence-aware boundaries, so no evidence passage is cut mid-sentence.
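One way to produce overlapping, sentence-aware chunks is sketched below. It is a simplified assumption about how such chunking could work: the regular-expression sentence splitter, the one-sentence overlap, and the name chunk_text are illustrative choices, not TarCite's documented behaviour.

```python
import re


def chunk_text(text: str, target_chars: int = 900, overlap_sentences: int = 1) -> list[str]:
    """Split text into chunks of roughly target_chars that end on sentence boundaries."""
    # Naive split: a sentence ends at '.', '!' or '?' followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks: list[str] = []
    current: list[str] = []
    for sentence in sentences:
        candidate_len = len(" ".join(current + [sentence]))
        if current and candidate_len > target_chars:
            chunks.append(" ".join(current))
            # Carry the last sentence(s) forward so neighbouring chunks overlap.
            current = current[-overlap_sentences:] if overlap_sentences else []
        current.append(sentence)
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Since chunks are only ever closed between sentences, no chunk ends mid-sentence. A single sentence longer than target_chars simply yields an oversized chunk in this sketch.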
The three ranked lists are merged into a single ranking using Reciprocal Rank Fusion (RRF), a widely used technique for combining multiple ranked retrieval results.
Each chunk's fused score is the sum of weight / (60 + rank) over every list in which the chunk appears, where rank is its position in that list. Vector and BM25 results use weight 1.0; title matches use weight 2.0, giving them a strong boost because a paper whose title directly matches is very likely relevant.
RRF is rank-based rather than score-based, making it robust to the fact that cosine distances and BM25 scores are on completely different scales.
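The fusion formula above can be sketched in a few lines. The constant 60 and the weights come from the description; the list names, the function name, and the choice to start ranks at 1 are assumptions made for illustration.

```python
from collections import defaultdict

RRF_K = 60
# Weights as described above; the dictionary keys are hypothetical labels.
WEIGHTS = {"vector": 1.0, "bm25": 1.0, "title": 2.0}


def reciprocal_rank_fusion(ranked_lists: dict[str, list[str]]) -> list[tuple[str, float]]:
    """Fuse ranked lists of chunk IDs into one list sorted by fused score."""
    scores: defaultdict[str, float] = defaultdict(float)
    for name, chunk_ids in ranked_lists.items():
        # Assumption: the top result in each list has rank 1.
        for rank, chunk_id in enumerate(chunk_ids, start=1):
            scores[chunk_id] += WEIGHTS[name] / (RRF_K + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


fused = reciprocal_rank_fusion({
    "vector": ["a", "b", "c"],
    "bm25": ["b", "d"],
    "title": ["c"],
})
```

In this example, chunk "c" ranks only third in the vector list but comes out on top after fusion, because its title match contributes 2.0 / 61 on its own.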
The fused candidates are re-scored on your machine by a cross-encoder model — a small neural network (~66 MB) that scores each (paragraph, evidence) pair jointly using full cross-attention.
This is typically more accurate than cosine similarity over separately computed embeddings. Bi-encoder models embed query and document independently; cross-encoders process both texts together, allowing the model to compare them token by token. The result is a much more precise relevance score.
Up to 1024 characters of each paper's best evidence are used as input — enough to capture the substance of the argument, not just a fragment.
BAAI/bge-reranker-base (default) or cross-encoder/ms-marco-MiniLM-L-6-v2
After reranking, Maximal Marginal Relevance (MMR) ensures the final candidate set is diverse. Without this step, you might receive five citations that all make the same point from slightly different angles.
Each candidate is scored as: λ × relevance − (1−λ) × max_similarity_to_already_selected
With λ = 0.7 (70% relevance, 30% diversity), MMR selects sources that are both highly relevant and meaningfully different from each other — covering different aspects of the claim in your paragraph rather than repeating the same finding.
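A greedy selection loop using this scoring rule might look like the sketch below. The function name mmr_select, the relevance dictionary, and the similarity callable are assumptions for illustration; how TarCite measures similarity between candidates is not stated here.

```python
from typing import Callable


def mmr_select(
    relevance: dict[str, float],
    similarity: Callable[[str, str], float],
    k: int,
    lam: float = 0.7,
) -> list[str]:
    """Greedily pick up to k candidates, balancing relevance against redundancy."""
    selected: list[str] = []
    remaining = list(relevance)
    while remaining and len(selected) < k:

        def mmr_score(candidate: str) -> float:
            # Penalise similarity to the closest already-selected candidate.
            max_sim = max((similarity(candidate, s) for s in selected), default=0.0)
            return lam * relevance[candidate] - (1 - lam) * max_sim

        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical data: "a" and "b" are near-duplicates, "c" is less relevant but distinct.
relevance = {"a": 0.90, "b": 0.88, "c": 0.60}
pair_sims = {frozenset("ab"): 0.95, frozenset("ac"): 0.10, frozenset("bc"): 0.10}
picked = mmr_select(relevance, lambda x, y: pair_sims[frozenset((x, y))], k=2)
```

Here "a" is chosen first on relevance alone. For the second slot, "b" scores 0.7 × 0.88 − 0.3 × 0.95 = 0.331 while "c" scores 0.7 × 0.60 − 0.3 × 0.10 = 0.39, so the distinct paper "c" is preferred over the near-duplicate.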
The diverse, reranked candidates are passed to your selected LLM for final evaluation. With the Local AI profile, this handoff stays on your machine through Ollama; with a cloud or custom profile, only the paragraph and selected evidence snippets are sent to that provider, not your full PDFs or library. The LLM receives each candidate's title, authors, year, DOI, journal, and up to 6 evidence passages — and is instructed to:
Anti-hallucination safeguards: Citations are pre-formatted before LLM validation — the LLM cannot invent a different reference format. Every returned paper ID is validated against the actual retrieved set; any fabricated citation is silently removed. If the LLM returns malformed JSON, the system retries with a stricter prompt, then attempts automatic repair.
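The ID check described above can be illustrated with a short sketch. The JSON shape (a "citations" array of objects with a "paper_id" field) and the function name are assumptions; the retry and repair steps are left to the caller, which would catch json.JSONDecodeError.

```python
import json


def filter_llm_citations(raw_response: str, retrieved_ids: set[str]) -> list[dict]:
    """Keep only citations whose paper_id was actually retrieved.

    Raises json.JSONDecodeError if the response is not valid JSON, so the
    caller can decide whether to retry or attempt a repair.
    """
    data = json.loads(raw_response)
    # Hypothetical response shape: {"citations": [{"paper_id": "..."}, ...]}
    return [
        item
        for item in data.get("citations", [])
        if item.get("paper_id") in retrieved_ids
    ]


raw = '{"citations": [{"paper_id": "p1"}, {"paper_id": "made-up-id"}]}'
kept = filter_llm_citations(raw, {"p1", "p2"})
```

The fabricated entry is dropped because its ID never appeared in the retrieved set, leaving only the citation for p1.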
Each suggestion shows a thumbs up / thumbs down button. Feedback is saved locally in SQLite on your machine, tied to the specific run and paper, and is not uploaded. Clicking the active thumb again undoes it.
This feedback is available for future ranking personalisation: as the dataset grows, it can be used to rank sources you consistently find useful higher in later runs.
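The toggle behaviour of the thumbs buttons could be stored along these lines. This is a sketch under assumed names: the feedback table, its columns, and the use of +1 and -1 for thumbs up and thumbs down are hypothetical, not TarCite's actual schema.

```python
import sqlite3


def init_feedback(conn: sqlite3.Connection) -> None:
    """Create a feedback table keyed by run and paper (hypothetical schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS feedback ("
        "run_id TEXT NOT NULL, paper_id TEXT NOT NULL, "
        "vote INTEGER NOT NULL CHECK (vote IN (-1, 1)), "
        "PRIMARY KEY (run_id, paper_id))"
    )


def toggle_vote(conn: sqlite3.Connection, run_id: str, paper_id: str, vote: int):
    """Record a vote; clicking the already-active vote removes it. Returns the stored vote or None."""
    row = conn.execute(
        "SELECT vote FROM feedback WHERE run_id = ? AND paper_id = ?",
        (run_id, paper_id),
    ).fetchone()
    if row is not None and row[0] == vote:
        # Same thumb clicked again: undo the feedback.
        conn.execute(
            "DELETE FROM feedback WHERE run_id = ? AND paper_id = ?",
            (run_id, paper_id),
        )
        result = None
    else:
        # New vote, or a switch from one thumb to the other.
        conn.execute(
            "INSERT OR REPLACE INTO feedback (run_id, paper_id, vote) VALUES (?, ?, ?)",
            (run_id, paper_id, vote),
        )
        result = vote
    conn.commit()
    return result
```

Keying rows on the run and paper together means each suggestion holds at most one vote at a time, which matches the single active thumb in the interface.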
Download TarCite and run your first citation suggestion in minutes — no account, no cloud library upload, no setup beyond pointing it at your PDF folder.