AI Research Paper Digest: #1

Hallucination can be a feature | Best Practices for RAG | CAG is all you need

“Success in creating AI would be the biggest event in human history. Unfortunately, it might also be the last.”
Stephen Hawking

I have been taking time out to read through research papers related to LLMs. I have found them quite insightful, and I would like to share three of the papers I recently read.

They are

🎯 Enhancing Retrieval-Augmented Generation: Best Practices

🎯 Hallucinations Enhance LLMs in Drug Discovery

🎯 Cache-Augmented Generation for Knowledge Tasks

Below are three insights from each of the three papers above

🎯 Enhancing Retrieval-Augmented Generation: Best Practices

This research paper investigates best practices for Retrieval-Augmented Generation (RAG) systems, which enhance large language models (LLMs) by incorporating external knowledge sources. The authors examine the impact of various factors, including LLM size, prompt design, and knowledge base size, on RAG performance.

✔️ My favourite Insights

  • Contrastive In-Context Learning (ICL) significantly enhances RAG performance: The study found that incorporating contrastive examples (both correct and incorrect answers) into the in-context learning process substantially improves the accuracy and relevance of the generated responses (see the sketch after this list)

  • Focus Mode RAG improves response quality by prioritizing relevant sentences: Instead of passing entire retrieved documents to the model, this approach extracts and uses only the most essential sentences. The research highlights that focusing on the most relevant sentences within retrieved documents leads to enhanced performance (a minimal sketch of both ideas follows this list)

  • The size of the knowledge base is not as critical as the quality and relevance of the documents: The study's findings indicate that increasing the size of the knowledge base or retrieving more documents does not necessarily improve the output quality of the RAG system
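
To make the first two insights concrete, below is a minimal sketch of what contrastive ICL prompting and focus-mode sentence selection could look like. The TF-IDF scoring and the example prompt wording are illustrative stand-ins of mine, not the paper's exact setup.

```python
# Minimal sketch (not the paper's exact method) of two ideas above:
# (1) "focus mode" sentence selection and (2) a contrastive ICL prompt.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def focus_mode_context(query: str, documents: list[str], top_k: int = 5) -> str:
    """Keep only the sentences most similar to the query (TF-IDF as a stand-in scorer)."""
    sentences = [s.strip() for doc in documents for s in doc.split(".") if s.strip()]
    vectors = TfidfVectorizer().fit_transform([query] + sentences)
    scores = cosine_similarity(vectors[0], vectors[1:]).ravel()
    top = sorted(zip(scores, sentences), key=lambda pair: pair[0], reverse=True)[:top_k]
    return " ".join(sentence for _, sentence in top)


def contrastive_icl_prompt(query: str, context: str) -> str:
    """Prepend one correct and one incorrect worked example before the real question."""
    return (
        "Answer the question using only the context.\n\n"
        "Example (correct): Q: Who wrote Hamlet? A: William Shakespeare.\n"
        "Example (incorrect, do not answer like this): Q: Who wrote Hamlet? A: Charles Dickens.\n\n"
        f"Context: {context}\n"
        f"Q: {query}\nA:"
    )
```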

Paper Link: Here

🎯 Hallucinations Enhance LLMs in Drug Discovery

This research paper explores the benefits of Large Language Model (LLM) "hallucinations"—the generation of factually incorrect information—in drug discovery. The authors hypothesize and demonstrate that incorporating these hallucinations into prompts significantly improves LLMs' performance on various drug classification tasks.

✔️ My favourite Insights

  • LLM Hallucinations Can Be Beneficial: It's counterintuitive, but the study demonstrates that the very thing that is usually seen as a flaw in LLMs—their tendency to generate incorrect or nonsensical information (hallucinations)—can actually improve their performance on specific tasks, such as drug discovery (a minimal sketch of this prompting setup follows the list)

  • The source of the hallucination matters, with GPT-4o providing the most consistent improvements. While many LLMs showed performance gains with hallucinated text, hallucinations generated by OpenAI models, particularly GPT-4o, led to the greatest average improvements across different models

  • Non-Pretrained Languages Can Enhance Performance: Unexpectedly, the language in which the hallucinations are generated affects the LLM's performance, with Chinese yielding the most significant improvements, even though it was not a language the model was pre-trained on
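
Below is a hedged sketch of the two-step prompting idea behind these results: first ask a model to describe a molecule (tolerating hallucinations in that description), then prepend the description to the classification prompt. The model name, prompt wording, and OpenAI client usage are my own illustrative choices, not the paper's exact code.

```python
# Hedged sketch of hallucination-augmented prompting for a drug classification task.
# Prompts and model names are illustrative placeholders.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def hallucinated_description(smiles: str) -> str:
    # Step 1: generate a natural-language description of the molecule.
    # Factual errors are tolerated here: the extra text is the point, not its accuracy.
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": f"Describe this molecule: {smiles}"}],
    )
    return resp.choices[0].message.content


def classify_with_hallucination(smiles: str, question: str) -> str:
    # Step 2: prepend the (possibly hallucinated) description to the task prompt.
    description = hallucinated_description(smiles)
    prompt = (
        f"Molecule (SMILES): {smiles}\n"
        f"Reference description (may contain errors): {description}\n"
        f"{question} Answer Yes or No."
    )
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content
```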

🎯 Cache-Augmented Generation for Knowledge Tasks

This research paper introduces cache-augmented generation (CAG) as a more efficient alternative to retrieval-augmented generation (RAG) for knowledge-intensive tasks. CAG preloads all relevant knowledge into a large language model (LLM), eliminating the need for real-time retrieval and its associated latency and errors.

✔️ My favourite Insights

  • CAG as a Low-Latency Alternative to RAG

    • Preloading and Caching: CAG eliminates real-time retrieval by loading all relevant resources into the LLM's extended context and precomputing the model’s key-value (KV) cache over them. This reduces latency, simplifies the system, and minimizes document selection errors compared to retrieval-based approaches (a minimal sketch follows this list).

    • Streamlined Usage: With all documents already in the context, queries can be answered immediately without additional retrieval steps. However, it requires that these documents are known in advance and can reasonably fit into the LLM’s context window.

  • Limitations Tied to Context Window and Knowledge Base Size

    • Context Window Constraints: Even long-context LLMs have upper limits on how much text they can handle in a single inference. If the total size of the needed documents exceeds the context window, CAG becomes infeasible.

    • Suitability for “Manageable” Knowledge: CAG works best when you have a relatively small or stable corpus. In contrast, a large, continuously expanding knowledge base would be impractical to preload and maintain in the model’s context.

  • Precomputation Trade-offs and Potential Redundancy

    • One-Time Cost: Precomputing the model’s key-value cache can be expensive, especially with large datasets or frequently updated knowledge bases (recomputing becomes necessary whenever documents change).

    • Possible Redundancy: CAG processes all loaded documents for every query, which might lead to slower query handling if much of the context is irrelevant. RAG, by comparison, only retrieves and processes the most pertinent documents per query.
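
As a minimal sketch of the preloading idea, assuming a Hugging Face causal LM: the corpus is pushed through the model once to precompute the KV cache, and each query then reuses that cache instead of triggering retrieval. The model name and document strings are placeholders, and this is not the paper's released implementation.

```python
# Minimal sketch of cache-augmented generation: run the knowledge corpus through
# the model once, keep the KV cache, and reuse it for every query.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-3.1-8B-Instruct"  # placeholder; any long-context causal LM
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
model.eval()

# 1) Preload: the whole corpus must fit in the context window (the key limitation noted above).
documents = "\n\n".join(["<document 1 text>", "<document 2 text>"])  # placeholders
prefix = f"Answer using only the documents below.\n\n{documents}\n\nQuestion: "
prefix_ids = tokenizer(prefix, return_tensors="pt").input_ids

with torch.no_grad():
    prefix_cache = model(prefix_ids, use_cache=True).past_key_values  # precomputed KV cache


def answer(question: str, max_new_tokens: int = 64) -> str:
    # 2) Query time: only the question tokens are processed; no retrieval step is needed.
    # (For repeated queries the cache should be reset/truncated back to the prefix;
    # that bookkeeping is omitted here for brevity.)
    cache = prefix_cache
    ids = tokenizer(question + "\nAnswer:", return_tensors="pt").input_ids
    generated = []
    with torch.no_grad():
        for _ in range(max_new_tokens):
            out = model(input_ids=ids, past_key_values=cache, use_cache=True)
            cache = out.past_key_values
            next_id = out.logits[:, -1].argmax(dim=-1, keepdim=True)  # greedy decoding
            if next_id.item() == tokenizer.eos_token_id:
                break
            generated.append(next_id.item())
            ids = next_id
    return tokenizer.decode(generated, skip_special_tokens=True)
```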

Paper Link: Here

💡Bonus

If you are like me and prefer listening over reading as a way to digest content, I have created a Spotify playlist containing podcasts for all the research papers above, powered by NotebookLM.

Here is the Spotify playlist

Subscribe to the newsletter here and get notified when new curated research paper podcasts are uploaded
