[HAI5016] Week 15: Asking dr. Kingo

This week we will finally query dr. Kingo using Chainlit, LlamaIndex and the Azure OpenAI API. We will evaluate the answers and discuss how we can improve the system.

Disclaimer: This blog provides instructions and resources for the workshop part of my lectures. It is not a replacement for attending class; it may not include some critical steps and the foundational background of the techniques and methodologies used. The information may become outdated over time as I do not update the instructions after class.

1. Unpause Supabase

As we found out during the previous class, Supabase currently pauses free-tier projects that have been inactive for more than 7 days. No problem: you can simply unpause (reactivate) your project from the dashboard within 90 days.


2. Update the database

I have added some extra relevant URLs to the Excel sheet that is hosted on my website. Go to your GitHub Codespaces overview, open your dr-Kingo Codespace and then take the following steps:

2.1 Update the URL list

Run the url_loader.py script to update the skku_urls table with the latest URLs.

You can check details on the progress of the script in the logs/url_loader.log file.
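If you prefer to check the log from Python rather than opening the file, a small helper like the one below prints the last few lines. This is only a sketch: the function name tail is my own, and it assumes the log is a plain UTF-8 text file at the path mentioned above.

```python
from collections import deque
from pathlib import Path


def tail(path, n=10):
    """Return the last n lines of a text file (fewer if the file is shorter)."""
    with Path(path).open(encoding="utf-8") as handle:
        # A deque with maxlen keeps only the most recent n lines while reading.
        return [line.rstrip("\n") for line in deque(handle, maxlen=n)]


if __name__ == "__main__":
    log_file = Path("logs/url_loader.log")  # path taken from the step above
    if log_file.exists():
        print("\n".join(tail(log_file, 20)))
    else:
        print(f"{log_file} not found; run url_loader.py first.")
```

The same helper works for the other log files in the logs folder.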

2.2 Run the downloader

Run the downloader.py script to download the new data from the URLs and create a new hash for the pages whose content has changed.

You can check details on the progress of the script in the logs/downloader.log file.
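To illustrate the idea of detecting changed pages with a hash, here is a minimal sketch. It is not the code from downloader.py: the function names and the choice of SHA-256 are assumptions for illustration, and the real script may hash the content differently.

```python
import hashlib
from typing import Optional


def content_hash(html: str) -> str:
    """Return a SHA-256 hex digest of the page content."""
    return hashlib.sha256(html.encode("utf-8")).hexdigest()


def has_changed(new_html: str, stored_hash: Optional[str]) -> bool:
    """A page counts as changed if it is new or its hash differs from the stored one."""
    return stored_hash is None or content_hash(new_html) != stored_hash


if __name__ == "__main__":
    old = content_hash("<p>Lunch: bibimbap</p>")
    print(has_changed("<p>Lunch: bibimbap</p>", old))  # False: same content
    print(has_changed("<p>Lunch: ramyeon</p>", old))   # True: content differs
    print(has_changed("<p>New page</p>", None))        # True: no stored hash yet
```

Comparing short hashes instead of full pages keeps the check cheap: only pages whose hash differs need to be stored and processed again.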

2.3 Update the repository

I have added and updated some files in the original dr-Kingo repository that you need to pull into your codespace. To do this, open a new terminal in your codespace and run the following commands.

  • Fetch the updates from the original repository:

    git fetch upstream
    

    This downloads the updates and shows which branches have changes available to merge. Your own files are not changed yet.

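  • Force-merge the updates into your local branch. The -X theirs option resolves any conflicts in favor of the upstream version, so your local edits to the same lines are overwritten:

    git merge -X theirs upstream/main -m "Class updates Dec 12"
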

2.4 Remove the calendar records

In the skku_md table, look for the records whose URL starts with https://www.skku.edu/eng/edu/bachelor/ca_de_schedule.do and delete them.

Remove calendar records
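If you want to double-check which rows match before deleting them in the dashboard, the selection rule can be sketched in a few lines of Python. The column name url and the example rows are assumptions for illustration; check the actual column name in your skku_md table.

```python
CALENDAR_PREFIX = "https://www.skku.edu/eng/edu/bachelor/ca_de_schedule.do"


def split_calendar_records(records, prefix=CALENDAR_PREFIX, url_key="url"):
    """Split records into (keep, remove) based on whether the URL starts with prefix."""
    keep, remove = [], []
    for record in records:
        url = str(record.get(url_key, ""))
        (remove if url.startswith(prefix) else keep).append(record)
    return keep, remove


if __name__ == "__main__":
    # Hypothetical rows, only to demonstrate the rule.
    rows = [
        {"id": 1, "url": CALENDAR_PREFIX + "?mode=view"},
        {"id": 2, "url": "https://www.skku.edu/eng/"},
    ]
    keep, remove = split_calendar_records(rows)
    print("to delete:", [row["id"] for row in remove])
    print("to keep:", [row["id"] for row in keep])
```

In SQL terms, the same selection would be a LIKE filter on that column with the prefix followed by %.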

2.5 Create markdowner.py script

Let’s convert markdown_parser.ipynb from a Jupyter Notebook into a Python script, so that we can conveniently run it from the terminal (or using the play button in VS Code) in the future.

Jupyter: Export to Python Script

  • Open the command palette with Ctrl+Shift+P and select Jupyter: Export to Python Script. A new tab will open with the Python script
  • In the script, change the name for the logfile to logs/markdowner.log:

    ## Set up logger
    logger.remove()
    logger.add("logs/markdowner.log", rotation="10 MB")
    
  • Save the file as markdowner.py

Now run the markdowner.py script to convert the HTML files that we scraped from the SKKU website into markdown files.


3. Recreate the vector store

Because we made some changes to our prompt and some of the markdown files have changed, we need to recreate the vector store. There are several ways to do this, but to keep it simple today, we will delete the vector store and run the indexing.ipynb notebook again.

  • Open the Supabase dashboard and click on your project
  • In the left navigation bar click on Table Editor
  • In the schema selector, select the vecs schema
  • Delete the md_kingo table

Delete the md_kingo table

Now you can run the indexing.ipynb notebook again to recreate the vector store.


4. Open dr. Kingo in a Chainlit app

Now that we have updated the database and recreated the vector store, we can query dr. Kingo with LlamaIndex and the Azure OpenAI API. To quickly create a graphical user interface for this, we will use the Chainlit library.

  • Open the requirements.txt file and add the following line:

    chainlit
    
  • Run the following command in the terminal to install the Chainlit library:

    pip install -r requirements.txt
    
  • Finally, run the chainlit.py script to start the Chainlit app. The -w flag watches the file and reloads the app when you save changes:

    chainlit run chainlit.py -w
    

This should automatically open a new tab in your browser with the dr. Kingo Chainlit app:

Hello dr. Kingo

You can now ask dr. Kingo questions and evaluate the answers.

If your browser does not automatically open, you can manually open the app by clicking on the link that is shown in the terminal after running the command: Your app is available at http://localhost:8000.
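For those curious what happens when you send a message, the core of such an app can be sketched as below. This is not the chainlit.py from the repository: answer_question and EchoEngine are hypothetical names, and the sketch only assumes that the query engine offers an async aquery method, as LlamaIndex query engines do. The Chainlit wiring is shown as comments so that the sketch also runs without Chainlit installed and without API calls.

```python
import asyncio


async def answer_question(query_engine, question: str) -> str:
    """Ask the query engine and return the answer as plain text."""
    question = question.strip()
    if not question:
        return "Please type a question for dr. Kingo."
    # LlamaIndex query engines expose an async `aquery` method;
    # str(response) yields the generated answer text.
    response = await query_engine.aquery(question)
    return str(response)


# Wiring inside a Chainlit script (kept as comments: it needs chainlit
# installed and a real `query_engine` built elsewhere in the script):
#
#   import chainlit as cl
#
#   @cl.on_message
#   async def on_message(message: cl.Message):
#       reply = await answer_question(query_engine, message.content)
#       await cl.Message(content=reply).send()


class EchoEngine:
    """Stand-in for a real query engine, for a dry run without API calls."""

    async def aquery(self, question):
        return f"(stub) You asked: {question}"


if __name__ == "__main__":
    print(asyncio.run(answer_question(EchoEngine(), "How to connect to WiFi?")))
```

Keeping the question handling in a plain function makes it easy to try out with a stand-in engine before connecting the real index and the chat interface.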


5. Questions for dr. Kingo

When we take a look at SKKU’s kingobot, it seems that today’s food menu is the most popular question asked by students. So let’s see if our RAG-based chatbot can answer this question:

What is on the menu for lunch today?

Great! Seems like we won’t starve today. But what if it’s your first day on campus, and WiFi might be more important to you than food? Well, let’s see if dr. Kingo can help you with that:

How to connect to Wifi?

Questions in a different language

The information that we scraped from the SKKU website is in English. However, it would be nice if dr. Kingo could understand questions in different languages. Let’s see if we can ask dr. Kingo a question in Dutch or Chinese.

Let’s say you are a visiting Dutch student or professor, and you are curious whether you can use your Eduroam account at SKKU. Let’s see if dr. Kingo understands your question in Dutch:

Kan ik hiero Eduroam gebruiken, dr. Kingo?
(In English: "Can I use Eduroam here, dr. Kingo?")

Or you are a Chinese graduate student who is about to graduate and wants to quickly find out the deadline for submitting your thesis. Let’s see if dr. Kingo can help you with that:

我需要什么时候给学校提交我的博士毕业论文?
(In English: "When do I need to submit my doctoral dissertation to the school?")


This post is licensed under CC BY 4.0 by the author.