[HAI5016] Week 15: Asking dr. Kingo
This week we will finally query dr. Kingo using Chainlit, LlamaIndex and the Azure OpenAI API. We will evaluate the answers and discuss how we can improve the system.
Disclaimer: This blog provides instructions and resources for the workshop part of my lectures. It is not a replacement for attending class; it may not include some critical steps and the foundational background of the techniques and methodologies used. The information may become outdated over time as I do not update the instructions after class.
1. Unpause Supabase
As we found out during the previous class, Supabase currently pauses free-tier projects that have been inactive for more than 7 days. No problem: you can simply unpause (reactivate) your project from the dashboard within 90 days.
2. Update the database
I have added some extra relevant URLs to the Excel sheet that is hosted on my website. Go to your GitHub Codespaces overview, open your dr-Kingo Codespace and then take the following steps:
2.1 Update the URL list
Run the url_loader.py script to update the skku_urls table with the latest URLs.
You can check details on the progress of the script in the logs/url_loader.log file.
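If you prefer to check the log from Python instead of opening the file, a minimal sketch like the one below prints the last few lines. The helper name tail is my own and not part of the repository; only the log path comes from the step above.

```python
from collections import deque
from pathlib import Path


def tail(path, n=20):
    """Return the last n lines of a text file as a list of strings."""
    with Path(path).open(encoding="utf-8") as handle:
        # A deque with maxlen keeps only the most recent n lines in memory
        return [line.rstrip("\n") for line in deque(handle, maxlen=n)]


if __name__ == "__main__":
    log_file = Path("logs/url_loader.log")  # path taken from the step above
    if log_file.exists():
        for line in tail(log_file, n=10):
            print(line)
    else:
        print(f"{log_file} not found; run url_loader.py first")
```

The same helper works for the other log files mentioned in this post, such as logs/downloader.log.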
2.2 Run the downloader
Run the downloader.py script to download the new data from the URLs and create a new hash for the pages whose content has changed.
You can check details on the progress of the script in the logs/downloader.log file.
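To illustrate the idea of detecting changed pages with a hash, here is a minimal sketch. It assumes SHA-256 over the raw page text; the actual downloader.py may hash differently, and the function names content_hash and has_changed are hypothetical.

```python
import hashlib
from typing import Optional


def content_hash(html: str) -> str:
    """Return a SHA-256 hex digest of the page content."""
    return hashlib.sha256(html.encode("utf-8")).hexdigest()


def has_changed(html: str, stored_hash: Optional[str]) -> bool:
    """True when the page is new or its content differs from the stored hash."""
    return stored_hash is None or content_hash(html) != stored_hash


if __name__ == "__main__":
    old = content_hash("<p>Lunch: bibimbap</p>")
    print(has_changed("<p>Lunch: bibimbap</p>", old))  # False
    print(has_changed("<p>Lunch: ramen</p>", old))     # True
```

Comparing hashes instead of full pages keeps the check cheap: only the short digest has to be stored and compared for each URL.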
2.3 Update the repository
I have added and updated some files in the original dr-Kingo repository that you need to pull into your codespace. To do this, open a new terminal in your codespace and run the following commands.
Fetch the updates from the original repository:
git fetch upstream
This will show you the updates that are available to merge.
Force-merge the updates into your local branch. The -X theirs option resolves any conflicting changes in favor of the upstream version:
git merge -X theirs upstream/main -m "Class updates Dec 12"
2.4 Remove the calendar records
In the skku_md table, look for the records that start with https://www.skku.edu/eng/edu/bachelor/ca_de_schedule.do and delete them.
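If you would rather do this from code than from the Table Editor, the sketch below shows one possible approach with the supabase-py client. The column name url and the environment variable names are assumptions; check your own skku_md table and .env file before running anything like this.

```python
CALENDAR_PREFIX = "https://www.skku.edu/eng/edu/bachelor/ca_de_schedule.do"


def delete_calendar_records(client, table="skku_md", column="url"):
    """Delete every row whose URL column starts with the calendar prefix.

    `client` is expected to be a supabase-py client; `column` is a guess.
    """
    pattern = CALENDAR_PREFIX + "%"  # SQL LIKE: % matches any trailing text
    return client.table(table).delete().like(column, pattern).execute()


# Hypothetical usage (requires `pip install supabase` and your own credentials):
# import os
# from supabase import create_client
# client = create_client(os.environ["SUPABASE_URL"], os.environ["SUPABASE_KEY"])
# delete_calendar_records(client)
```

Note that in a LIKE pattern the underscore also acts as a single-character wildcard, so the match is slightly broader than an exact prefix. For this URL that should be harmless, but it is worth checking the matching rows in the dashboard first.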
2.5 Create the markdowner.py script
Let’s transform the markdown_parser.ipynb from a Jupyter Notebook into a Python script, so that we can conveniently run it from the terminal (or using the play button in VS Code) in the future.
- Open the command palette with Ctrl+Shift+P and select Export Python Script. A new tab will open with the Python script.
- In the script, change the name of the log file to logs/markdowner.log:
  ## Set up logger
  logger.remove()
  logger.add("logs/markdowner.log", rotation="10 MB")
- Save the file as markdowner.py
Now run the markdowner.py script to parse the markdown files from the HTML files that we have scraped from the SKKU website.
3. Recreate the vector store
Because we made some changes to our prompt and some of the markdown files have changed, we need to recreate the vector store. There are several ways to do this, but to keep it simple today, we will delete the vector store and run the indexing.ipynb notebook again.
- Open the Supabase dashboard and click on your project
- In the left navigation bar click on Table Editor
- In the schema selector, select the vecs schema
- Delete the md_kingo table
Now you can run the indexing.ipynb notebook again to recreate the vector store.
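As one of the other ways to remove the vector store, the sketch below drops the collection through the vecs Python client instead of the dashboard. This is an alternative I am assuming would work for a collection named md_kingo in the vecs schema; the environment variable name DB_CONNECTION is hypothetical.

```python
def drop_vector_collection(vx, name="md_kingo"):
    """Drop a vecs collection (stored as a table in the `vecs` schema)."""
    vx.delete_collection(name)


# Hypothetical usage (requires `pip install vecs`):
# import os
# import vecs
# vx = vecs.create_client(os.environ["DB_CONNECTION"])  # Postgres connection string
# drop_vector_collection(vx)
```

Either way, the collection is gone afterwards and has to be rebuilt by running the indexing step again.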
4. Open dr. Kingo in a Chainlit app
Now that we have updated the database and recreated the vector store, we can query dr. Kingo with LlamaIndex and the Azure OpenAI API. We will evaluate the answers and discuss how we can improve the system. To quickly create a graphical user interface for this, we will use the Chainlit library.
Open the requirements.txt file and add the following line:
chainlit
Run the following command in the terminal to install the Chainlit library:
pip install -r requirements.txt
Finally, run the chainlit.py script to start the Chainlit app. The -w flag enables watch mode, so the app reloads when you edit the file:
chainlit run chainlit.py -w
This should automatically open a new tab in your browser with the dr. Kingo Chainlit app:
You can now ask dr. Kingo questions and evaluate the answers.
If your browser does not open automatically, you can open the app manually by clicking the link that is shown in the terminal after running the command:
Your app is available at http://localhost:8000
5. Questions for dr. Kingo
When we take a look at SKKU’s kingobot, it seems that today’s food menu is the most popular question asked by students. So let’s see if our RAG-based chatbot can answer this question:
What is on the menu for lunch today?
Great! Seems like we won’t starve today. But what if it’s your first day on campus, and WiFi might be more important to you than food? Well, let’s see if dr. Kingo can help you with that:
How to connect to Wifi?
Questions in a different language
The information that we scraped from the SKKU website is in English. However, it would be nice if dr. Kingo could understand questions in different languages. Let’s see if we can ask dr. Kingo a question in Dutch or Chinese.
Let’s say you are a visiting Dutch student or professor, and you are curious whether you can use your Eduroam account at SKKU. Let’s see if dr. Kingo understands your question in Dutch:
Kan ik hiero Eduroam gebruiken, dr. Kingo? (In English: "Can I use Eduroam here, dr. Kingo?")
Or say you are a Chinese graduate student who is about to graduate and wants to quickly find out the deadline for submitting your thesis. Let’s see if dr. Kingo can help you with that:
我需要什么时候给学校提交我的博士毕业论文? (In English: "When do I need to submit my doctoral thesis to the school?")
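To compare answers across runs, it can help to send the same set of questions every time. The sketch below collects the example questions from this section and runs them through any callable; collect_answers and the echo stand-in are hypothetical names, and you would pass in your own query function (for example the query method of a LlamaIndex query engine) instead.

```python
QUESTIONS = [
    "What is on the menu for lunch today?",
    "How to connect to Wifi?",
    "Kan ik hiero Eduroam gebruiken, dr. Kingo?",
    "我需要什么时候给学校提交我的博士毕业论文?",
]


def collect_answers(ask, questions=QUESTIONS):
    """Run each question through `ask` and return (question, answer) pairs."""
    results = []
    for question in questions:
        try:
            answer = str(ask(question))
        except Exception as exc:  # keep going so one failure does not hide the rest
            answer = f"ERROR: {exc}"
        results.append((question, answer))
    return results


if __name__ == "__main__":
    # Stand-in for the real query function; replace with your own
    def echo(question):
        return f"(no engine connected) {question}"

    for question, answer in collect_answers(echo):
        print(f"Q: {question}\nA: {answer}\n")
```

Keeping the question list fixed makes it easier to see whether a change to the prompt or the vector store actually improved the answers.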