Using LlamaIndex with LangChain for Enhanced Document Retrieval and QA: A Comprehensive Guide

The integration of LlamaIndex with LangChain provides a powerful framework for building advanced document retrieval and question-answering (QA) systems. LlamaIndex excels at indexing and querying large document sets, while LangChain enhances conversational AI with tools for memory, chains, and LLM integrations.

Introduction to LlamaIndex and LangChain

LlamaIndex is a data framework designed for connecting LLMs with external data sources, offering efficient indexing, retrieval, and query engines for documents. LangChain complements this with conversational memory, chains, and integrations, enabling context-aware applications. Together, they create robust QA systems powered by LLMs like OpenAI’s gpt-3.5-turbo.

This tutorial assumes basic Python knowledge, with references to LangChain’s getting started guide, LlamaIndex’s documentation, and OpenAI’s API documentation.

Prerequisites for Building the QA System

Ensure you have a working Python 3 environment, an OpenAI API key, and the required packages installed:

pip install langchain langchain-openai openai llama-index pypdf

Step 1: Setting Up the Development Environment

Configure your environment by importing libraries and setting the OpenAI API key.

import os
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from langchain_openai import ChatOpenAI
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferWindowMemory
from langchain.prompts import PromptTemplate

# Set your OpenAI API key
os.environ["OPENAI_API_KEY"] = "your-openai-api-key"

Replace "your-openai-api-key" with your actual key. Environment variables enhance security, as explained in LangChain’s security and API keys guide. The imported modules from LlamaIndex and LangChain are core to the system, detailed in LangChain’s core components overview and LlamaIndex’s getting started.

Step 2: Loading and Indexing Documents with LlamaIndex

Load documents and create a vector index using LlamaIndex’s SimpleDirectoryReader and VectorStoreIndex.

# Load documents from a directory
documents = SimpleDirectoryReader(
    input_dir="path/to/documents",
    recursive=True,
    required_exts=[".pdf", ".txt"]
).load_data()

# Create vector index; chunking is configured via a SentenceSplitter transformation
from llama_index.core.node_parser import SentenceSplitter

index = VectorStoreIndex.from_documents(
    documents,
    embed_model="local:BAAI/bge-small-en-v1.5",
    transformations=[SentenceSplitter(chunk_size=1000, chunk_overlap=200)]
)

Key Parameters for SimpleDirectoryReader

  • input_dir: Directory containing documents (e.g., "path/to/documents").
  • recursive: If True, scans subdirectories for files.
  • required_exts: File extensions to include (e.g., [".pdf", ".txt"]).

Key Parameters for VectorStoreIndex.from_documents

  • documents: Loaded document objects.
  • embed_model: Embedding model (e.g., "local:BAAI/bge-small-en-v1.5" for local embeddings, which may require the llama-index-embeddings-huggingface package, or OpenAI’s text-embedding-ada-002 via API). Local models reduce API costs.
  • transformations: Node parsers applied before indexing. The SentenceSplitter’s chunk_size caps characters per chunk (e.g., 1000), balancing retrieval granularity against context, while chunk_overlap (e.g., 200) keeps overlapping characters between adjacent chunks to preserve context across boundaries.

For advanced document loading, see LangChain’s document loaders or LlamaIndex’s data connectors.
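Embedding a large corpus on every run is slow, so it is often worth persisting the index to disk after the first build. A minimal sketch, assuming a writable ./storage directory and the same local embedding model as above:

from llama_index.core import StorageContext, load_index_from_storage

# Write the index (nodes, vectors, metadata) to disk after the first build
index.storage_context.persist(persist_dir="./storage")

# On later runs, reload it instead of re-embedding the documents.
# Pass the same embed_model so queries are embedded consistently.
storage_context = StorageContext.from_defaults(persist_dir="./storage")
index = load_index_from_storage(
    storage_context,
    embed_model="local:BAAI/bge-small-en-v1.5"
)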

Step 3: Initializing the Language Model

Initialize the OpenAI LLM using ChatOpenAI for conversational responses.

llm = ChatOpenAI(
    model_name="gpt-3.5-turbo",
    temperature=0.5,
    max_tokens=512,
    top_p=0.9,
    frequency_penalty=0.2,
    presence_penalty=0.1
)

Key Parameters for ChatOpenAI

  • model_name: OpenAI model (e.g., gpt-3.5-turbo, gpt-4). gpt-3.5-turbo is efficient; gpt-4 excels in reasoning. See OpenAI’s model documentation.
  • temperature (0.0–2.0): Controls randomness. At 0.5, responses are focused yet creative. Lower (e.g., 0.2) for precision; higher (e.g., 1.0) for diversity.
  • max_tokens: Maximum response length (e.g., 512). Adjust for detail; higher values increase costs. See LangChain’s token limit handling.
  • top_p (0.0–1.0): Nucleus sampling. At 0.9, focuses on high-probability tokens.
  • frequency_penalty (–2.0–2.0): Discourages repetition. At 0.2, promotes variety.
  • presence_penalty (–2.0–2.0): Encourages new topics. At 0.1, promotes novelty.

For alternatives, see LangChain’s integrations.
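Before wiring the model into chains, a one-line sanity check confirms the API key and model name are valid (the prompt itself is arbitrary):

# Should print a short reply; authentication or model-name errors surface here
print(llm.invoke("Reply with the single word: ready").content)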

Step 4: Creating a LlamaIndex Query Engine

Set up a query engine to retrieve and synthesize information from the indexed documents.

query_engine = index.as_query_engine(
    llm=llm,
    similarity_top_k=3,
    response_mode="compact",
    verbose=True
)

Key Parameters for as_query_engine

  • llm: The initialized LLM for response generation.
  • similarity_top_k: Number of top documents to retrieve (e.g., 3). Balances relevance and context.
  • response_mode: Synthesis mode (e.g., "compact", "tree_summarize", "refine"). "compact" generates concise responses; "refine" iterates for detail.
  • verbose: If True, logs query details for debugging.

For advanced query engines, see LlamaIndex’s query engine guide.
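The query engine can also be exercised on its own before any LangChain integration, which makes it easy to see which chunks were retrieved (the question below is a placeholder):

# Run a standalone query against the index
response = query_engine.query("What are the key findings in the documents?")
print(response)  # synthesized answer

# Inspect the retrieved chunks and their similarity scores
for source in response.source_nodes:
    print(source.score, source.node.metadata)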

Step 5: Integrating LlamaIndex with LangChain’s ConversationChain

Combine LlamaIndex’s query engine with LangChain’s ConversationChain for context-aware conversations.

memory = ConversationBufferWindowMemory(
    memory_key="history",
    return_messages=True,
    k=5
)

custom_prompt = PromptTemplate(
    input_variables=["history", "input"],
    template="You are a research assistant. Use the provided context to answer questions accurately, incorporating conversation history:\n\nHistory: {history}\n\nUser: {input}\n\nAssistant: ",
    validate_template=True
)

# Wrap LlamaIndex query engine as a LangChain tool
from langchain.agents import Tool

query_tool = Tool(
    name="DocumentQuery",
    func=lambda q: str(query_engine.query(q)),
    description="Queries a document index for relevant information."
)

# Create a ConversationChain
conversation = ConversationChain(
    llm=llm,
    memory=memory,
    prompt=custom_prompt,
    verbose=True,
    output_key="response"
)

Key Parameters for ConversationBufferWindowMemory

  • memory_key: History variable name (default: "history").
  • return_messages: If True, returns message objects; if False, a string. True suits chat models.
  • k: Number of most recent interactions kept in the buffer (e.g., 5); older turns are dropped.

Key Parameters for PromptTemplate

  • input_variables: Template variables (e.g., ["history", "input"]).
  • template: Defines tone and structure.
  • validate_template: If True, validates variables.

Key Parameters for Tool

  • name: Tool identifier (e.g., "DocumentQuery").
  • func: Function to execute (e.g., query engine wrapper).
  • description: Tool purpose for agent reasoning.

Key Parameters for ConversationChain

  • llm: The LLM.
  • memory: The memory component.
  • prompt: Custom prompt template.
  • verbose: If True, logs prompts.
  • output_key: Output key (default: "response").

See LangChain’s chains.
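Note that the ConversationChain above sees only the LLM and memory; query_tool is consumed by the agent in Step 7.3. If you want retrieved document context injected into every turn without an agent, one simple option is a thin wrapper around both components (a sketch, not the only way to combine them):

def ask(question: str) -> str:
    # Retrieve supporting context from the LlamaIndex query engine first,
    # then pass it to the conversation chain along with the question so the
    # LLM answers from that context while memory tracks the dialogue.
    context = str(query_engine.query(question))
    return conversation.predict(input=f"Context: {context}\n\nQuestion: {question}")

print(ask("What are the key findings in the documents?"))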

Step 6: Querying the QA System

Test the system by querying the document index and maintaining conversational context.

# Example queries
response = conversation.predict(input="What are the key findings in the documents?")
print(response)

response = conversation.predict(input="Can you elaborate on the main theme?")
print(response)

Example Output:

The documents highlight key findings in sustainable technology, including energy efficiency and scalable solutions, as detailed in the indexed research papers.
The main theme of sustainability focuses on balancing technological innovation with environmental impact, emphasizing renewable energy adoption and policy frameworks.

The system retrieves relevant document chunks via LlamaIndex and maintains context with LangChain’s memory. For patterns, see LangChain’s conversational flows.
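To confirm that earlier turns really are being carried forward, you can inspect the memory buffer directly (purely a debugging aid):

# Returns a dict keyed by memory_key ("history") containing the stored turns
print(memory.load_memory_variables({}))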

Step 7: Customizing the QA System

Enhance the system with advanced prompts, additional data sources, or tool integration.

7.1 Advanced Prompt Engineering

Refine the prompt for academic precision.

custom_prompt = PromptTemplate(
    input_variables=["history", "input"],
    template="You are an academic expert. Provide detailed, evidence-based answers using the document context and conversation history:\n\nHistory: {history}\n\nUser: {input}\n\nAssistant: ",
    validate_template=True
)

conversation = ConversationChain(
    llm=llm,
    memory=memory,
    prompt=custom_prompt,
    verbose=True
)

See LangChain’s prompt templates guide.

7.2 Adding Diverse Data Sources

Load web content with LlamaIndex’s SimpleWebPageReader (available via the llama-index-readers-web package).

from llama_index.readers.web import SimpleWebPageReader

web_documents = SimpleWebPageReader(html_to_text=True).load_data(
    urls=["https://example.com/research"]
)
documents.extend(web_documents)
index = VectorStoreIndex.from_documents(documents, embed_model="local:BAAI/bge-small-en-v1.5")

SimpleWebPageReader Parameters:

  • html_to_text: If True, converts HTML to plain text.
  • urls: List of URLs to load.

See LangChain’s web loaders.
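Rebuilding the whole index every time a source is added can be avoided: LlamaIndex indexes also support incremental insertion, roughly as follows:

# Add the new documents to the existing index without re-indexing everything
for doc in web_documents:
    index.insert(doc)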

7.3 Tool Integration with Agents

Add tools like SerpAPI.

from langchain.agents import initialize_agent
from langchain.tools import Tool
from langchain_community.utilities import SerpAPIWrapper

# SerpAPI needs the google-search-results package and a SERPAPI_API_KEY environment variable
search = SerpAPIWrapper()
tools = [
    Tool(
        name="Search",
        func=search.run,
        description="Fetch current research trends."
    ),
    query_tool
]

agent = initialize_agent(
    tools=tools,
    llm=llm,
    agent="zero-shot-react-description",
    verbose=True,
    max_iterations=3,
    early_stopping_method="force"
)

response = agent.run("Supplement document findings with recent trends.")
print(response)

initialize_agent Parameters:

  • tools: List of tools.
  • llm: The LLM.
  • agent: Agent type.
  • verbose: If True, logs decisions.
  • max_iterations: Maximum number of reasoning/tool-use steps before the agent stops (e.g., 3).
  • early_stopping_method: Behavior when that limit is hit; "force" returns a default stopped message, while "generate" makes one final LLM call to produce an answer.

See LangChain’s agents guide.

Step 8: Deploying the QA System

Deploy as a Streamlit app for an interactive interface.

import streamlit as st

st.title("Document QA with LlamaIndex and LangChain")
st.write("Ask questions about your documents!")

if "messages" not in st.session_state:
    st.session_state.messages = []

for message in st.session_state.messages:
    with st.chat_message(message["role"]):
        st.markdown(message["content"])

if prompt := st.chat_input("Ask a question:"):
    st.session_state.messages.append({"role": "user", "content": prompt})
    with st.chat_message("user"):
        st.markdown(prompt)
    with st.chat_message("assistant"):
        with st.spinner("Querying..."):
            response = conversation.predict(input=prompt)
            st.markdown(response)
            st.session_state.messages.append({"role": "assistant", "content": response})

Save as app.py, install Streamlit (pip install streamlit), and run:

streamlit run app.py

Visit http://localhost:8501. Deploy to Streamlit Community Cloud by pushing to GitHub and configuring secrets. See LangChain’s Streamlit tutorial or Streamlit’s deployment guide.
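One practical caveat: Streamlit re-runs the entire script on every interaction, so the index and chain should be constructed once inside a cached factory rather than at module level. A minimal sketch, where build_conversation() is a hypothetical helper wrapping the setup from Steps 1 through 5:

import streamlit as st

@st.cache_resource  # built once per server process and reused across reruns
def get_conversation():
    # build_conversation() is assumed to perform Steps 1-5: load documents,
    # build the index, and create the LLM, memory, and ConversationChain.
    return build_conversation()

conversation = get_conversation()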

Step 9: Evaluating and Testing the QA System

Evaluate responses using LangChain’s evaluation metrics.

from langchain.evaluation import load_evaluator

evaluator = load_evaluator(
    "qa",
    llm=llm
)
result = evaluator.evaluate_strings(
    prediction="The documents discuss sustainable technology.",
    input="What are the main findings?",
    reference="The documents focus on sustainable technology and energy efficiency."
)
print(result)

load_evaluator Parameters:

  • evaluator_type: Metric type (e.g., "qa", which grades a prediction against a reference answer).
  • llm: The LLM used to perform the grading.

Test with diverse queries. Debug with LangSmith per LangChain’s LangSmith intro.
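For a quick smoke test across several query types, a small loop over representative questions (placeholders below; substitute questions relevant to your corpus) is often enough:

test_questions = [
    "What are the key findings in the documents?",
    "Which methods or data sources do the documents describe?",
    "Summarize the main theme in two sentences.",
]

# Run each question through the conversational chain and print the answers
for question in test_questions:
    print(f"Q: {question}")
    print(f"A: {conversation.predict(input=question)}\n")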

Advanced Features and Next Steps

Enhance the system with:

  • Additional data sources via LlamaIndex data connectors (see Step 7.2).
  • Alternative embedding models or LLMs (e.g., gpt-4 for stronger reasoning).
  • Agent-based tool use for up-to-date information (see Step 7.3).
  • Systematic evaluation and tracing with LangSmith (see Step 9).

See LangChain’s startup examples or GitHub repos for further inspiration.

Conclusion

Integrating LlamaIndex with LangChain creates a powerful QA system for document retrieval and conversational AI. This guide covered setup, indexing, LLM integration, deployment, evaluation, and parameters, empowering you to build advanced applications. Leverage LangChain’s chains, memory, and integrations with LlamaIndex’s indexing capabilities.

Explore agents, tools, or evaluation metrics. Debug with LangSmith. Happy coding!