PaperQA, FutureHouse's scientific RAG system, boasts superhuman performance in scientific tasks such as question answering, summarization, and contradiction detection. As it's open source, I decided to give it a try.
To save yourself hours of frustration (I didn’t find their documentation particularly helpful), just follow the three simple steps below and you’ll have it up and running in under five minutes. This AI agent can answer questions from your collection of PDFs or text files, with citations. It has a virtually unlimited context window, and because every answer is grounded in your documents, it is far less prone to hallucination. How cool is that?
Minimum requirements: you’ll need Python 3.11+ and a package manager (pip).
Step 1: Install paper-qa
Run this in your terminal.
pip install paper-qa
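If you want to keep your setup tidy, consider installing it inside a virtual environment first (e.g. one created with python -m venv).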
Step 2: OpenAI API key
By default, PaperQA will use OpenAI’s LLMs. Log in to your OpenAI Dashboard and copy your API secret key. Yes, this means you’ll have to pay to index your documents and for every question you ask about them. More on that below.
After obtaining your API key, you'll need to set it up in your environment. Simply run this command:
export OPENAI_API_KEY="your_api_key_here"
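Note that export only sets the variable for the current shell session; to make it permanent, add the same line to your shell profile (e.g. ~/.bashrc or ~/.zshrc).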
Step 3: Ask your question
Now move to a folder where you have your documents and ask a question.
cd my_papers
pqa ask 'How are we going to solve aging?'
That’s it!
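Expect the first question to take a while: pqa first builds an index of every document in the folder before answering. Later questions should be much faster, since they reuse that index.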
Now, what if you want to use different LLMs, multiple indexes, or integrate PaperQA with your own app? See “Advanced” mode below. But first, let’s look at a use case.
Use case: Mouse Lifespan Interventions
I have a curated collection of 51 published studies on mouse lifespan interventions, and a few questions I’d like to ask about them. Here’s the tricky part: the answers cannot be found in any single document. Producing a correct answer requires pulling the most relevant facts from the full text of all the papers and synthesizing them into a coherent response with citations. This grounding helps prevent hallucinations and keeps the answers accurate.
The numbers
Indexing 51 papers took 4m23s and cost $0.25 in total. Each answer took around 25s to generate and cost $0.06. Not sure how these will scale with bigger collections of papers, though.
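For what it’s worth, if costs scale roughly linearly with collection size (an assumption I haven’t verified), a back-of-envelope projection is easy to sketch:
# Back-of-envelope cost projection, assuming roughly linear scaling (unverified).
cost_per_paper = 0.25 / 51      # ~$0.005 per paper, from the run above
cost_per_question = 0.06        # per answer, from the run above
larger_collection = 500         # hypothetical collection size
print(f"Indexing {larger_collection} papers: ~${larger_collection * cost_per_paper:.2f}")  # ~$2.45
print(f"Asking 100 questions: ~${100 * cost_per_question:.2f}")  # $6.00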
The answers
Here are a couple of examples.
Q1: What's the most impactful lifespan intervention in mice?
Answer:
The most impactful lifespan intervention in mice, as identified in recent studies, is a 40% caloric restriction (CR). This intervention has been shown to significantly extend the lifespan of genetically diverse female mice. The study found that a 40% reduction in caloric intake resulted in the strongest lifespan extension effect among various dietary interventions, although it also led to a loss of lean mass and alterations in the immune system, potentially increasing susceptibility to infections (francesco2024dietaryrestrictionimpacts pages 1-1; luciano2024longitudinalfragilityphenotyping pages 2-3).
Another notable intervention involves early, transient exposure to adult female olfactory cues. This method significantly extended the lifespan of female mice when they were exposed to these scents from 3 days to 60 days of age. The study highlighted that such olfactory cues could have long-lasting effects on health maintenance and lifespan extension, specifically in female mice, while male mice did not exhibit similar benefits from olfactory exposure (garratt2022lifespanextensionin pages 1-2; garratt2022lifespanextensionin pages 2-2).
These findings underscore the potential of both dietary and environmental interventions in modulating lifespan, with caloric restriction and olfactory cues emerging as significant factors in lifespan extension in mice.
References
1. (francesco2024dietaryrestrictionimpacts pages 1-1): Andrea Di Francesco, Andrew G. Deighan, Lev Litichevskiy, Zhenghao Chen, Alison Luciano, Laura Robinson, Gaven Garland, Hannah Donato, Matthew Vincent, Will Schott, Kevin M. Wright, Anil Raj, G. V. Prateek, Martin Mullis, Warren G. Hill, Mark L. Zeidel, Luanne L. Peters, Fiona Harding, David Botstein, Ron Korstanje, Christoph A. Thaiss, Adam Freund, and Gary A. Churchill. Dietary restriction impacts health and lifespan of genetically diverse mice. Nature, 634:684-692, Oct 2024. URL: https://doi.org/10.1038/s41586-024-08026-3, doi:10.1038/s41586-024-08026-3.
2. (garratt2022lifespanextensionin pages 1-2): Michael Garratt. Lifespan extension in female mice by early, transient exposure to adult female olfactory cues. eLife, 2022. URL: https://doi.org/10.7554/elife.84060, doi:10.7554/elife.84060.
3. (garratt2022lifespanextensionin pages 2-2): Michael Garratt. Lifespan extension in female mice by early, transient exposure to adult female olfactory cues. eLife, 2022. URL: https://doi.org/10.7554/elife.84060, doi:10.7554/elife.84060.
4. (luciano2024longitudinalfragilityphenotyping pages 2-3): Alison Luciano, Laura Robinson, Gaven Garland, Bonnie Lyons, Ron Korstanje, Andrea Di Francesco, and Gary A. Churchill. Longitudinal fragility phenotyping predicts lifespan and age-associated morbidity in c57bl/6 and diversity outbred mice. bioRxiv, Feb 2024. URL: https://doi.org/10.1101/2024.02.06.579096, doi:10.1101/2024.02.06.579096.
PaperQA's response, based on the 51 papers I provided, is quite impressive. The AI agent compared the impact of multiple studies across various papers. Moreover, it addressed the ambiguity in my question about "impact" by highlighting important distinctions such as lifespan versus healthspan and differences between male and female mice. This nuanced approach demonstrates the system's ability to synthesize complex information from multiple sources.
Q2: What was the experiment with the longest duration?
Answer:
The experiment with the longest duration involved testing the effects of different caloric restriction (CR) protocols on the lifespan of male C57BL/6J mice. This study required at least four years to complete and compared five CR protocols with an ad libitum (AL) control group. The CR protocols varied in feeding times and fasting intervals, with some mice receiving food during the day or night, and others having food distributed over a 12-hour or 24-hour window. The study utilized a purified diet to ensure consistency over the duration and employed an automated feeding system for continuous monitoring of feeding and activity throughout the mice's lifespan (acostarodriguez2022circadianalignmentof pages 2-3).
In contrast, other experiments, such as those involving the administration of various substances like NDGA, metformin, and rapamycin to UM-HET3 mice, started at different ages and did not explicitly mention a duration as long as four years. For instance, NDGA was administered starting at four months of age, but the overall duration of these experiments was not specified to be as long as the CR study (strong2016longerlifespanin pages 10-11). Therefore, the caloric restriction study appears to be the longest in terms of duration.
References
1. (acostarodriguez2022circadianalignmentof pages 2-3): Victoria Acosta-Rodríguez. Circadian alignment of early onset caloric restriction promotes longevity in male c57bl/6j mice. Science (New York, N.Y.), 376:1192-1202, 2022. URL: https://doi.org/10.1126/science.abk0297, doi:10.1126/science.abk0297.
2. (strong2016longerlifespanin pages 10-11): Randy Strong. Longer lifespan in male mice treated with a weakly estrogenic agonist, an antioxidant, an α‐glucosidase inhibitor or a nrf2‐inducer. Aging Cell, 15:872-884, 2016. URL: https://doi.org/10.1111/acel.12496, doi:10.1111/acel.12496.
“Advanced” mode
With just a few lines of Python, you can swap OpenAI for other LLMs, create and reuse multiple indexes, and ask questions programmatically.
Setting up the API keys, adding Google’s Gemini to the mix:
import os
import pickle

from paperqa import Settings
from paperqa.agents.main import agent_query
from paperqa.agents.models import QueryRequest
from paperqa.agents.search import get_directory_index
## set ENV variables
os.environ["OPENAI_API_KEY"] = "your-api-key"
os.environ["GEMINI_API_KEY"] = "your-api-key"
os.environ["GOOGLE_API_KEY"] = "your-api-key"
## Folder with the papers
paper_folder = "PDF"
Initialising PaperQA with Gemini 1.5 Flash 8B (you can use any model supported by LiteLLM):
## PaperQA Settings
## manifest_file: optional CSV with paper metadata (columns: file_location, doi, title). It helps ensure papers are correctly identified; see the sample below.
## embedding: "text-embedding-3-small" by default. You can change it to "text-embedding-3-large" or any other embedding model.
## paper_directory: folder containing the papers. Sub-folders are searched too, unless you disable that option.
## You can use any model supported by LiteLLM. See the full list here: https://docs.litellm.ai/docs/providers
##
# Use Gemini 1.5 Flash 8B
settings = Settings(
    llm="gemini/gemini-1.5-flash-8b",
    summary_llm="gemini/gemini-1.5-flash-8b",
    manifest_file="manifest.csv",
    paper_directory=paper_folder,
)
# Use Gemini 1.5 Pro instead:
# settings = Settings(llm="gemini/gemini-1.5-pro", summary_llm="gemini/gemini-1.5-pro", manifest_file="manifest.csv", paper_directory=paper_folder)
# Use OpenAI (the default):
# settings = Settings(manifest_file="manifest.csv", paper_directory=paper_folder)
index_name = "my_index" ## The index will be saved in this file
Reusing & saving indexes:
# Load previously pickled embeddings (a Docs object saved in my_docs.pkl), if any,
# so that we don't need to recreate the embeddings again.
docs = None
if os.path.exists("my_docs.pkl"):
    with open("my_docs.pkl", "rb") as f:
        docs = pickle.load(f)
## Build the index of papers. It's saved on disk under index_name, so later runs
## can reuse it instead of re-indexing everything.
built_index = await get_directory_index(index_name=index_name, settings=settings)
# Save the embeddings back to my_docs.pkl for next time. Note that get_directory_index
# doesn't return a Docs object, so this only applies if you build one yourself
# (e.g. via Docs.add); the None guard keeps the first run from crashing.
if docs is not None:
    with open("my_docs.pkl", "wb") as f:
        pickle.dump(docs, f)
Asking questions programmatically and extracting metadata from the answers (token count, cost, context, etc.):
# Ask your question
query = "What's the effect of rapamycin on healthspan?"
answer_response = await agent_query(
    query=QueryRequest(query=query, settings=settings)
)
# Print answer
print(f"\n=== Formatted answer: {answer_response.answer.formatted_answer}")
# Extracting metadata from the answer
for c in answer_response.answer.contexts:
    print(f"=== Context: {c}")
print(f"\n=== Used contexts: {answer_response.answer.used_contexts}")
print(f"\n=== References: {answer_response.answer.references}")
print(f"\n=== Token counts: {answer_response.answer.token_counts}")
print(f"\n=== Cost: {answer_response.answer.cost}")
I hope you enjoy it.
P.S.
I'm creating a catalog of mouse lifespan interventions, detailing experiment design, intervention protocol, survival data, and more: a single place where anyone can find, visualise, and compare this data, and maybe even spot new research opportunities.
If you find this interesting and want to help in any capacity, please drop me a message.