Understanding RAG systems with a practical application using locally executed LLMs

Introduction

One of the latest trending methods in modern NLP is Retrieval Augmented Generation (RAG), which is a recent approach to improve the information retrieval and generation capabilities of large language models. In a RAG system, the models are used to extract relevant information from large datasets and generate coherent responses and insights from them. This is a invaluable technique for several common use cases such as question answering or conversational agents. However, cutting-edge LLMs take vast amount of resources for training and inference and are not usually free to access.

Motivation

One of the most typical applications of RAG is question answering over corporate documents, such as company guidelines, product reviews, or documentation. While these applications are common, I prefer to explore more unconventional data sources that may lead to more engaging projects.

In this article, we will walk through the process of running an LLM locally and using it to build our RAG application. This application will consist of a simple chatbot that can retrieve context from a knowledge base containing documents related to the award-winning videogame Elden Ring.

Background: Elden Ring

Elden Ring is renowned for its intricate and immersive story, with narrative elements such as environmental storytelling, lore and mythos found in item, skill and equipment descriptions; and NPC interactions. Much of this story is left open to the player’s interpretation of cryptic dialogue and obscure descriptions, as well as drawing connections between disparate elements to create a cohesive narrative of the world.

With all that preamble out of the way, let’s create a chatbot that serves as an expert in the lore of Elden Ring and can assist the user in exploring the different connections in the game’s rich storytelling.

A Primer on RAG Systems

Note

If you want to skip the theory and preprocessing, go straight here.

RAG systems combine the strengths of infomation retrieval and natural language to effectively understand and generate human-like responses. Not unexpectedly, the two key components of such systems are retrieval and generation.

Retrieval

In the retrieval step, the system searches its knowledge base for hits matching a user’s query. The knowledge base should be stored in a vector database with embeddings for efficient search with techniques such as semantic similarity measures. The goal of this step is to extract the most relevant pieces of information given the original prompt, and those that could help in generating a coherent response. Once the context information is collected, the generation step can use it for further processing.

Generation

In the generation step, the system feeds the obtained context to an LLM, such as GPT or BERT, that can generate coherent output based on the provided information. Coupling the natural language understanding of the model with the context results in responses that are gramatically correct, contextually relevant and coherent.

Application Design

Before writing the first line of code, let us take some time to design our application and define its scope.

Development Environment Setup

Warning

LLM inference needs a lot of computing power, so make sure your machine can handle it before continuing. For reference, I ran this on a Windows laptop with an 11th Gen i7 chip and 32GB RAM. If you have better or similar specs (or GPU) in your computer, you are probably good to go.

You will need the following stuff in order to run the code in this guide:

LangChain: a Python library that provides a framework for building LLM-powered applications
a large language model file: TheBloke’s HuggingFace page already has many models that are already quantized, lifting some of the process from your machine. Use the 7B or 13B parameter models depending on your system. More on this further below.

Then, let’s create a virtual environment with the necessary requirements:

python -m venv .venv
source .venv/bin/activate
pip install langchain docarray pandas

Prepare the Knowledge Base

In order to feed your LLM of choice with the required context, you will need a knowledge base that can be parsed and incorporated into a vector database. For this guide, I will use the data available from the excellent Elden Ring Explorer project, which in turn comes from the Carian Archive.

import re
import json
import requests
import pandas as pd

data_url = "https://eldenringexplorer.github.io/EldenRingTextExplorer/elden_ring_text.json"
response = requests.get(data_url)

data = response.json()

# Peek at the data format
for key1 in data.keys():
    for key2 in data[key1]:
        print(
            json.dumps(data[key1][key2], indent=2)
        )
        break
    break

{
  "name_en": "Petition for Help",
  "name_jp": "\u6551\u63f4\u306e\u8acb\u9858\u66f8",
  "info_en": "Summons Stalker to face invading Broken Finger",
  "info_jp": "\u6f70\u308c\u6307\u306b\u4fb5\u5165\u3055\u308c\u305f\u6642\u3001\u6f70\u308c\u72e9\u308a\u3092\u6551\u63f4\u53ec\u559a\u3059\u308b",
  "caption_en": "Online multiplayer item. Receipt of a plea for\r\nhelp to the maidens of the Finger Reader.\r\n\r\nSummons a Broken Finger Stalker from another\r\nworld to face an invading Broken Finger.\r\n\r\nMaidens of the Finger Reader speak in hushed\r\ntones about the loathsome, traitorous Broken\r\nFingers and the dangers of their base invasions.",
  "caption_jp": "\u30aa\u30f3\u30e9\u30a4\u30f3\u30d7\u30ec\u30a4\u5c02\u7528\u30a2\u30a4\u30c6\u30e0\r\n\u6307\u8aad\u307f\u306e\u5deb\u5973\u305f\u3061\u306b\u8acb\u9858\u3057\u305f\u8a3c\r\n\r\n\u6f70\u308c\u6307\u306b\u4fb5\u5165\u3055\u308c\u305f\u6642\r\n\u4ed6\u4e16\u754c\u304b\u3089\u6f70\u308c\u72e9\u308a\u3092\u6551\u63f4\u53ec\u559a\u3059\u308b\r\n\r\n\u6307\u8aad\u307f\u306e\u5deb\u5973\u306f\u3001\u58f0\u3092\u6f5c\u3081\u8a9e\u308b\u3060\u308d\u3046\r\n\u6f70\u308c\u6307\u306e\u3001\u553e\u68c4\u3059\u3079\u304d\u88cf\u5207\u308a\u3068\r\n\u5351\u52a3\u306a\u4fb5\u5165\u306e\u5371\u3046\u3055\u3092\r\n"
}

Quick and dirty data engineering

The keys in the JSON object are categories, and every category contains nested dictionaries that describe a document. This dataset contains many fields that are not relevant to the task, so let us clean the data and prepare it so that we can give the chatbot a reliable knowledge base.

categories = [c for c in data.keys()]

category_data = []
for c in categories:
    df = pd.DataFrame(data[c]).T
    df["category"] = c
    category_data.append(df)

df = pd.concat(category_data).reset_index()
df.head()

	index	name_en	name_jp	info_en	info_jp	caption_en	caption_jp	category	effect_en	effect_jp	dialog_en	dialog_jp	type	form	id
0	100	Petition for Help	救援の請願書	Summons Stalker to face invading Broken Finger	潰れ指に侵入された時、潰れ狩りを救援召喚する	Online multiplayer item. Receipt of a plea for...	オンラインプレイ専用アイテム\r\n指読みの巫女たちに請願した証\r\n\r\n潰れ指に侵入...	accessories	NaN	NaN	NaN	NaN	NaN	NaN	NaN
1	101	Broken Finger Stalker Contract	潰れ狩りの誓約書	Be summoned to worlds invaded by Broken Fingers	潰れ指に侵入された世界に救援召喚される	Online multiplayer item. Record of contract wi...	オンラインプレイ専用アイテム\r\n指読みの巫女たちと誓約した証\r\n\r\n他プレイヤー...	accessories	NaN	NaN	NaN	NaN	NaN	NaN	NaN
2	1000	Crimson Amber Medallion	緋琥珀のメダリオン	Raises maximum HP	ＨＰの最大値を上昇させる	A medallion with crimson amber inlaid.\r\nBoos...	緋色の琥珀が嵌めこまれたメダリオン\r\nＨＰの最大値を上昇させる\r\n\r\n琥珀とは、...	accessories	NaN	NaN	NaN	NaN	NaN	NaN	NaN
3	1001	Crimson Amber Medallion +1	緋琥珀のメダリオン＋１	Greatly raises maximum HP	ＨＰの最大値を大きく上昇させる	A medallion with crimson amber inlaid.\r\nGrea...	緋色の琥珀が嵌めこまれたメダリオン\r\nＨＰの最大値を大きく上昇させる\r\n\r\n琥珀...	accessories	NaN	NaN	NaN	NaN	NaN	NaN	NaN
4	1002	Crimson Amber Medallion +2	緋琥珀のメダリオン＋２	Vastly raises maximum HP	ＨＰの最大値を、とても大きく上昇させる	A medallion with crimson amber inlaid.\r\nVast...	緋色の琥珀が嵌めこまれたメダリオン\r\nＨＰの最大値を、とても大きく上昇させる\r\n\r...	accessories	NaN	NaN	NaN	NaN	NaN	NaN	NaN

We will focus only on the English texts and keep some other metadata. Moreover, only some of the categories contain documents that might help us give the chatbot the necessary knowledge base:

keep_columns = [
    "id",
    "index",
    "name_en",
    "info_en",
    "caption_en",
    "dialog_en",
    "type",
    "form",
    "category",
]

df = df[keep_columns]

Note

In an earlier version of this application, I chose to remove some categories such as those containing system messages and mechanic descriptions unrelated to the lore. However, upon testing the system, I found that some documents in those categories did contain relevant snippets of information.

Take these nuances into account when developing your prototypes!

The data cleaning process will be as follows:

Remove rows with null values in the name_en column.
Rename this column as the document title.
Concatenate info_en and caption_en.
Coalesce caption_en and dialog_en.

After this, the resulting data will be nicely transformed into a simpler format containing only the document title, its content and category.

df = df.dropna(subset="name_en")
df["title"] = df["name_en"]

df["info_caption"] = df["info_en"].fillna("") + ". " + df["caption_en"].fillna("")
df["description"] = df["info_caption"].combine_first(df["dialog_en"])

Some of the text strings in the description column contain extraneous characters and unwanted tokens. Let’s clean those too:

Some of the descriptions contain tagged IDs, e.g. [9000010]
Some descriptions contain the token (dummyText)
There are some duplicated descriptions in the case of upgraded items (e.g “Black Knife Tiche +4” and “Black Knife Tiche +5” have the same description.)

def process_text(df: pd.DataFrame, col_name: str):
    df.loc[:, col_name] = [re.sub(r'\[\d+\]', '', x) for x in df[col_name]]
    df.loc[:, col_name] = [re.sub(r'\(dummyText\)', '', x) for x in df[col_name]]
    df = df.drop_duplicates(subset=[col_name])

    return df

df = process_text(df, "description")

Finally, select the relevant columns and save the resulting dataframe. Let’s also run a few sanity checks to spot leftover nulls or duplicates:

output = df[[
    "title",
    "description",
    "category",
]]

output.isnull().sum()

title          0
description    0
category       0
dtype: int64

output["description"].duplicated().sum()

Nice! The documents are now clean and easy to parse. To make it easier to use the data in other tasks later on, I’ll save it again as a JSON/dictionary. This way, it can be smoothly integrated into different processes, especially when the data processing and chatbot logics are kept separate.

data = output.to_dict(orient="records")

Build the RAG system

Now onto the cool(er) stuff!

Let’s summarize how a RAG system works in three (very) simplified steps:

The user prompts the system with a question.
The question is matched against the knowledge base, which is already transformed through embeddings in the vectorstore.
The returned context is fed to the LLM which can now return an informed response based on the given context.

The first thing we will need to work on is how to store the documents so that the system can find them in the efficiently, and semantically. Using embeddings and vectorstores with LangChain is almost trivial since they are mostly plug-and-play.

Retrieval Setup

Let’s import the libraries needed for this step, mostly related to LangChain:

from langchain.docstore.document import Document
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import DocArrayInMemorySearch

There is a JSONLoader() class available in LangChain, but I found it easier to just create the individual documents since they are not many and we purposefully prepared it in a tidy format. For each record in the data, a document is created containing the description field, while the title and category are kept as metadata:

docs = []
for record in data:
    new_document = Document(
        page_content=record["description"],
        metadata={"title": record["title"], "category": record["category"]},
    )
    docs.append(new_document)

docs[42]

Document(page_content='Boosts dexterity, raises attack power with successive attacks. Part of the golden prosthesis used by Millicent.\r\nThe hand is locked into a fist that once raised a sword aloft.\r\n\r\nBoosts dexterity and raises attack power with successive attacks.\r\n\r\nThe despair of sweet betrayal transformed Millicent from a mere bud into a magnificent flower. And one day, she will be reborn—as a beautiful scarlet valkyrie.', metadata={'title': "Millicent's Prosthesis", 'category': 'accessories'})

Next, we need to use pretrained embeddings to transform the documents when loading them into the vectorstore. I’ll just use the HuggingFace embbedings but there are many other options available in LangChain.

embeddings = HuggingFaceEmbeddings()

db = DocArrayInMemorySearch.from_documents(docs, embeddings)
retriever = db.as_retriever(search_type="mmr", search_kwargs={"k": 5});

Once the documents are vectorized, they are loaded into the vectorstore which we can use as a retriever for our chatbot chain. Some insight into the parameters chosen here:

search_type="mmr": stands for Maximum Marginal Relevance, a search measure that aims to reduce redundancy and increase diversity in search results
k=5: the number of hits to return, using 5 in this case to avoid retrieving possibly unrelated context or surpassing the input token limit

Behind the scenes, the retriever will perform similarity search and return the results to the chat agent:

db.similarity_search("Black Knife", search_type="mmr", k=5)

[Document(page_content='Gauntlets used by the Black Blade Assassins. Gauntlets used by the Black Knife Assassins.\r\nCrafted with scale armor that makes no sound.\r\n\r\nThe assassins that carried out the deeds of the Night of the Black Knives were all women, and rumored to be Numen who had close ties with Marika herself.', metadata={'title': 'Black Knife Gauntlets', 'category': 'protector'}),
 Document(page_content='. Dagger once belonging to one of the assassins who murdered Godwyn the Golden on the Night of the Black Knives.\r\n\r\nA ritual performed on the oddly misshapen blade imbued it with the power of the stolen Rune of Death.', metadata={'title': 'Black Knife', 'category': 'weapon'}),
 Document(page_content='Simple map showing location of black knifeprint\r\nExamine using <?keyicon@31?>. A simple map given by Fia.\r\n\r\nA clue to the whereabouts of a black knifeprint.', metadata={'title': 'Knifeprint Clue', 'category': 'goods'}),
 Document(page_content='. Dagger with a bloodstained blade.\r\nAfflicts targets with blood loss.\r\n\r\nAs blood darkened the dagger through repeated slashing and stabbing, its blade only grew sharper and harder.', metadata={'title': 'Bloodstained Dagger', 'category': 'weapon'}),
 Document(page_content='Armor used by the Black Blade Assassins. Armor used by the Black Knife Assassins.\r\nCrafted with scale armor that makes no sound.\r\n\r\nThe assassins that carried out the deeds of the Night of the Black Knives were all women, and rumored to be Numen who had close ties with Marika herself.', metadata={'title': 'Black Knife Armor (Altered)', 'category': 'protector'})]

LLM Setup

Language models are usually tuned so that they excel at specific tasks, such as text summarization or translation. In this case, we would need a question answering model for optimal results, but you can also use general models such as GPT.

Since we are aiming to use a local LLM, there are some extra steps we need to take:

Run the LLM locally

1. Download the GGUF model file

GGUF files are already quantized which helps the model speed up inference. For this project, I used WizardLM-13B but you can use smaller models such a the 7B version which will run faster but perform worse.

As for the quantization level, Q4_K_M is a good option from my limited experience, because the task doesn’t require a very high degree precision and correctness, so lower quant levels may be acceptable. For other tasks that might require more precision such as coding, higher quants (or none at all) should be used.

2. Compile the model

The LlamaCpp class allows for the model to be loaded and easily interfaced with other LangChain components:

from langchain_community.llms import LlamaCpp

llm = LlamaCpp(
    model_path="../../../sandbox/erdbot/models/WizardLM/wizardlm-7b-v1.0-uncensored.Q4_K_M.gguf",
    temperature=0.3,
    max_tokens=4096,
    n_ctx=2048,
);

Alternative: Use the Hugging Face Inference API

If you don’t have enough computing resources, you can substitute the LLM component with the ChatHuggingFace class and use the free (and rate-limited) Inference API provided by HF.

You can also use any other API keys if you have access to other AI providers such as OpenAI or Azure OpenAI. Just use their the corresponding LangChain interfaces and swap the component in the chain. This guide will only cover the local and Inference API cases.

# Not executed: alternative to the local LLM
# Note that in this case, the chain expects a ChatModel object and not a text LLM
from langchain_community.llms import HuggingFaceEndpoint
from langchain.chat_models import ChatHuggingFace


llm = HuggingFaceEndpoint(
    repo_id="deepset/roberta-base-squad2",
    huggingfacehub_api_token=HUGGINGFACEHUB_API_TOKEN,
    model_kwargs={"max_length": -1}, # unlimited response length to avoid truncated answers
)

chat_model = ChatHuggingFace(llm=llm)

You can choose from the many models available on the HF Hub, instead of being limited to only quantized models such as in the local case. For this example, I just used roberta-base for QA.

Design the Language Chain

There are three main components for a chain: the prompt template, the language model, and (optionally) an output parser.

LangChain’s API can be somewhat obtuse at times, especially when building chains with multiple arguments, so I’ll try my best to explain what’s going on.

Prompt Design and Engineering

Some research (i.e. Reynolds & McDonell, 2021) on LLMs shows that a carefully crafted and directed prompt can yield better results. In fact, a whole new subfield of prompt engineering is emerging, contributing to the state of the art with methods such as few shot learning, chain-of-thought and self-reflexion.

Furthermore, prompt design seems to be a very iterative process. As you will discover when developing your own applications, it often takes a good few tries to create a prompt that consistenly generates the desired outputs. After some time tweaking the template, I came up with the following prompt for our loremaster chatbot:

from langchain.prompts import ChatPromptTemplate

template = """
You are an expert historian studying the lore of an ancient civilization. 
To answer the user's question, use the following context:
{context}

Only include contextual information that not relevant to the user's question in your answer.
If you can't infer an answer based on the provided context, explicitly say so. 
Do not invent or hallucinate your responses but try to find likely relationships and connections among the documents.
Be concise but thorough, and use no more than 5 sentences in your response.

Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)

Some things to note:

The prompt describes the role the chat agent should take. Other applications might need different agents roles such as “a helpful assistant” or “a teacher grading a math submission”. These should be tweaked based on your use case.
I tried to find a balance between making likely connections between semantically unrelated documents and avoid hallucinations.
Notice the f-string formatted variables {context} and {question}, which are a placeholder for the prompt inputs.

Chains with LCEL

The next step is to build the actual language chain that will mesh together all the components of the system. In this example, I will be using the LangChain Expression Language (LCEL) that allows the usage of the pipe (|) operator to chain operators and enhance readability.

from langchain_core.runnables import RunnableParallel
from operator import itemgetter

context = itemgetter("question") | retriever
answer = prompt | llm

chain = {
    "context": context,
    "question": itemgetter("question"),
} | RunnableParallel({"answer": answer, "context": context})

context = itemgetter("question") | retriever: the context is the result of sending the question to the vector database (the retriever) and getting document matches back
"question": itemgetter("question"): this simply grabs the question (the user input) from the prompt
answer = prompt | llm: the answer is the result of passing the prompt through the LLM
RunnableParallel({"answer": answer, "context": itemgetter("context")}): the answer is the result of passing the prompt through the LLM, and the context is the one obtained in the first step

Here, the input to the prompt is expected to be a dictionary with the keys “context” and “question”. The user input is just the question, so we need to get the context using our retriever and passthrough the user input under the “question” key.

I’m not excessively proficient on LangChain so I’m sure there are better ways to write this chain to enhance readability. If you know of any potential improvements, feel free to let me know!

response = chain.invoke({"question": "Who were the Black Knives?"})
response


llama_print_timings:        load time =    1175.93 ms
llama_print_timings:      sample time =      12.09 ms /    58 runs   (    0.21 ms per token,  4797.35 tokens per second)
llama_print_timings: prompt eval time =   77701.33 ms /   687 tokens (  113.10 ms per token,     8.84 tokens per second)
llama_print_timings:        eval time =   11026.84 ms /    57 runs   (  193.45 ms per token,     5.17 tokens per second)
llama_print_timings:       total time =   88983.37 ms /   744 tokens

{'answer': '\nAnswer: The Black Knives were a group of assassins who were rumored to be Numen who had close ties with Marika herself during the Night of the Black Knives. They were all women and were responsible for carrying out deeds during that eventful night.',
 'context': [Document(page_content='Gauntlets used by the Black Blade Assassins. Gauntlets used by the Black Knife Assassins.\r\nCrafted with scale armor that makes no sound.\r\n\r\nThe assassins that carried out the deeds of the Night of the Black Knives were all women, and rumored to be Numen who had close ties with Marika herself.', metadata={'title': 'Black Knife Gauntlets', 'category': 'protector'}),
  Document(page_content=". Unique curved sword, notched like shark's teeth.\r\nWeapon carried by corpse pillagers who prowl the sites of old battles.\r\n\r\nThe blade is tacky with blood and covered in hefty nicks, making it totally uneven. Life can be sinister indeed.", metadata={'title': "Scavenger's Curved Sword", 'category': 'weapon'}),
  Document(page_content='Throw fanned-out knives at enemies to inflict damage. A set of five throwing knives bundled together.\r\nA concealed weapon cherished by the raptor assassins.\r\n\r\nThe thin knives fan out when thrown, dealing damage to the target.\r\n\r\nEach knife deals paltry damage, but the wide range makes it suitable for constraining enemies.', metadata={'title': 'Fan Daggers', 'category': 'goods'}),
  Document(page_content="Mark of the Night of the Black Knives ritual. On the Night of the Black Knives, someone stole a fragment of Death from Maliketh, the Black Blade, and imbued its power into the assassins' daggers.\r\n\r\nThis mark is evidence of the ritual, and hides the truth of the conspiracy.", metadata={'title': 'Black Knifeprint', 'category': 'goods'}),
  Document(page_content='. Curved greatswords of black steel wielded by General Radahn.\r\nA pair of weapons decorated with a lion mane motif.\r\n\r\nRadahn earned considerable renown as the Starscourge in his youth, and it is said that it was during this time he engraved the gravity crest upon these blades.', metadata={'title': 'Starscourge Greatsword', 'category': 'weapon'})]}

Let’s take a look at our output here:

Timings: these are not very interesting right now, but they can be useful when attempting to optimize your application. For example, this interaction had a total response time of over 20 seconds, which is probably not great for a production application.
Output: in this chain, I chose to output both the answer and the context so we can analyse the information that influenced the LLM’s response.

This is the answer an end user would get:

print(
    response["answer"].replace(". ", ".\n")
)

Answer: The Black Knives were a group of assassins who carried out the Night of the Black Knives, a secretive event that occurred in the past.
They were rumored to be Numen who had close ties with Marika herself.
The assassins were all women and were known to be skilled in combat and stealth.
They were also known to be equipped with unique weapons such as the Black Knife Gauntlets and Scavenger's Curved Sword.
The Night of the Black Knives was a conspiracy that involved stealing a fragment of Death from Maliketh, the Black Blade, and imbuing its power into the assassins' daggers.
The Mark of the Night of the Black Knives ritual was also performed on this night, and it is believed that this ritual hides the truth of the conspiracy.

The results are somewhat sensible, but take a closer look at the following statement:

They were also known to be equipped with unique weapons such as the Black Knife Gauntlets and Scavenger’s Curved Sword.

This mentions the Scavenger’s Curved Sword. However, if you read the document from the context that originated this fragment, you will see that it has no relation to the Black Knives at all. In these cases, consider the following methods:

lower the k value used in similarity search to reduce the number of relevant results
change the search_type argument depending on your needs, since in the example there are many other documents containing the exact substring “Black Knives” that do not appear in the context due to using maximum marginal relevance, which penalizes redundancy

This shows that domain knowledge may be useful when evaluating and debugging these systems. So, while we’re on the topic:

Optimization and Advanced Features

This is the very basics of a RAG architecture. There’s not much to do with it while it’s confined to a notebook, so here are a few ways you can take it a step further:

Prompt optimization: aside from fine-tuning your prompt manually, you can try letting the LLM write it for you! Automatically generated prompts¹ have proven to perform slighly better than some hand-tuned prompts.
Evaluation: use the RAG triad of measures to assess the performance of your application. One simple and straight-forward way to do it is to ask the LLM itself (or another one prepared for an evaluation task) to grade the generated response based on the provided context.
Deployment: so far, Streamlit is the easiest way I’ve found to interact with the system in a chat-like enviroment. Check out this short guide.
Scalability: in a laptop or any other mid-tier machine, the response times of the LLM will probably be quite high. Consider using smaller models, or cloud instances for hosting your application in any of the commercial hyperscalers.

¹ Battle, R., & Gollapudi, T. (2024). The Unreasonable Effectiveness of Eccentric Automatic Prompts

New and exciting stuff is coming to light every few days in this field, so keep an eye open and you’re sure to find new improvements for your application.

Conclusion

That’s it! Your very own RAG application now lives and runs on your computer, as long as it doesn’t spontaneously combust during the inference process.

Feel free to test it with your own data as well, since LangChain offers different loader classes to read from webpages, PDF documents and the like. The advent of somewhat easily accesible LLMs opens up a myriad possibilities for new projects and ideas. It is a constantly evolving landscape and you can get creative with stuff like chatbots, research assistants, or content generators. The skills from this guide are fairly basic but building on them can take your NLP game to the next level.

Now go forth and deploy something cool with RAG tech!