[HAI5016] Week 11: Hi Claude
Remember I mentioned in the previous class that GitHub Copilot would support models from Anthropic, Google, and OpenAI? Well, it seems that the time has come.
Disclaimer: This blog provides instructions and resources for the workshop part of my lectures. It is not a replacement for attending class; it may not include some critical steps and the foundational background of the techniques and methodologies used. The information may become outdated over time as I do not update the instructions after class.
1.1 Enable more models in your GitHub Copilot
Enable more models in your GitHub Copilot now:
- Go to https://github.com/settings/copilot
- Scroll down to the bottom of the settings page, find the options for more models (e.g. Anthropic Claude 3.5 Sonnet in Copilot) and enable the models.
From today, we can compare how well the available models assist us in VS Code on our data (science) projects.
1.2 Join the waitlist for/check access to GitHub Models
GitHub Models has also entered public preview! GitHub Models provides every GitHub developer with access to top models via a playground, API, and more.
- Check if you have access to https://github.com/marketplace/models
- If not, join the waitlist via https://github.com/marketplace/models/waitlist
Back to Science
Let’s see if we have more luck this time in finishing the bike prediction project by using a slightly different approach than last time, and by teaming up with Claude 3.5 Sonnet on GitHub Copilot instead of the standard GPT-4o model.
To test whether our new code is better than the ‘old’ code, we’ll create a new branch in our repository. Think of a branch as an extension or a new version of your current codebase. This allows us to experiment without altering or deleting the original code.
2.1 Create new branch
- Create a new branch called `with-claude`
- Check if you are in the new branch
- Delete the previous notebook
- Create a new notebook `demand-prediction.ipynb`
- Compose a commit message, commit, and push
- Switch branches to have a look at your ‘old’ notebook in the main branch
- Switch back to the `with-claude` branch
2.2 Call Claude
To see if we can enable Claude, and to make sure that our current workspace is included in its context, let’s ask a question about our workspace.
- Open Copilot Chat (`Ctrl+Alt+I` or `Cmd+Alt+I`)
- In the chat input box, find the model selector and select the `Claude 3.5 Sonnet` model
- Ask the question `@workspace what is this project about?`
- Walk through all the elements of the answer and check whether it is right. You can ask additional questions like:
  - `@workspace What data sources are we using?`
  - `What statistical methods do you recommend to answer the problem statement?`
Examine the answers and the options in the chat module.
2.3 New notebook with Claude
As we learned during class, answers from an LLM are not always consistent. In my case, however, the answer suggested approaching the problem with the steps below.
Copy and paste this Markdown into the top cell of your notebook to give the LLM some context:
```markdown
# Bike Sharing Demand Prediction

## Index

1. Know Your Data
2. Understanding Your Variables
3. EDA
4. Data Cleaning
5. Feature Engineering
6. Model Building
7. Model Implementation
8. Conclusion
```
- Generate a new cell by clicking the ‘generate’ button, or make a new cell like you’re used to and open the inline Copilot chat (`Ctrl+I` or `Cmd+I`). Ask the following: `Load the dataset and show me inconsistencies by checking the shape, information, duplicate values, missing values, etc`
- Check the suggested code and results
- Ask for more details on the data:
  - Show the names of all the columns
  - Describe the data (e.g. counts, mean, std, min, max, etc)
  - Match the variable names with the attribute information in our readme.md
  - Count the unique values for each variable
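For reference, the checks requested in the prompts above might look roughly like the sketch below. It uses a tiny made-up DataFrame instead of the real dataset so that it runs on its own; the column names and values are assumptions, and `summarize` is just an illustrative helper. The code Copilot generates for you will likely look different.

```python
import pandas as pd

# Tiny stand-in for the real dataset (made-up values). In the notebook you
# would load your own CSV file with pd.read_csv(...) instead.
df = pd.DataFrame({
    "Date": ["01/12/2017", "01/12/2017", "02/12/2017", "02/12/2017"],
    "Rented Bike Count": [254, 204, None, 204],
    "Hour": [0, 1, 0, 1],
    "Seasons": ["Winter", "Winter", "Winter", "Winter"],
})
df = pd.concat([df, df.iloc[[0]]], ignore_index=True)  # add one duplicate row


def summarize(frame: pd.DataFrame) -> dict:
    """Collect the basic consistency checks in one dictionary."""
    return {
        "shape": frame.shape,
        "columns": list(frame.columns),
        "duplicate_rows": int(frame.duplicated().sum()),
        "missing_per_column": frame.isna().sum().to_dict(),
        "unique_per_column": frame.nunique().to_dict(),
    }


summary = summarize(df)
for name, value in summary.items():
    print(f"{name}: {value}")

df.info()             # dtypes and non-null counts per column
print(df.describe())  # count, mean, std, min, max for numeric columns
```

Comparing your own output against a checklist like this makes it easier to spot when the model skipped one of the checks you asked for.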
2.4 Feature Engineering
- Let’s rename the features for better readability and usability by following consistent naming conventions, and to avoid issues with special characters, spaces, or inconsistent capitalization
- Break down the date column into year, month, and day for further analysis
- For analysis purposes, create a column “session” that groups “hours” into different categories
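A minimal sketch of what these three steps could produce is shown below. The sample rows, the raw column names, the day-first date format, and the session boundaries are all assumptions made for illustration; pick bins and labels that make sense for your own analysis.

```python
import re

import pandas as pd

# Made-up sample rows; the raw column names mimic messy headers (assumption).
df = pd.DataFrame({
    "Date": ["01/12/2017", "15/06/2018"],
    "Rented Bike Count": [254, 1200],
    "Hour": [3, 18],
    "Temperature(°C)": [-5.2, 24.1],
})


def to_snake_case(name: str) -> str:
    """Lowercase a column name and replace non-alphanumeric runs with '_'."""
    return re.sub(r"[^0-9a-zA-Z]+", "_", name).strip("_").lower()


# 1. Consistent, code-friendly column names.
df = df.rename(columns=to_snake_case)

# 2. Split the date into parts (a day-first format is assumed here).
df["date"] = pd.to_datetime(df["date"], format="%d/%m/%Y")
df["year"] = df["date"].dt.year
df["month"] = df["date"].dt.month
df["day"] = df["date"].dt.day
df["weekday"] = df["date"].dt.day_name()

# 3. Group hours into sessions; bin edges and labels are arbitrary choices.
df["session"] = pd.cut(
    df["hour"],
    bins=[0, 6, 12, 18, 24],
    labels=["night", "morning", "afternoon", "evening"],
    right=False,  # intervals are [0, 6), [6, 12), [12, 18), [18, 24)
)

print(df[["date", "hour", "session", "weekday"]])
```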
Explore the rented_bike_count column
- Get the minimum and maximum of the “rented_bike_count” column
- Visualize the distribution of “rented_bike_count”
- Visualize Rented Bike Count vs Session
- Visualize Rented Bike Count vs Weekday
You can do this analysis on all the features that you’d like to explore.
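As a rough idea of what sits behind those visualizations, the sketch below computes the minimum, maximum, and group averages on a few made-up rows. The values are invented, and the plotting calls are left as comments because they need matplotlib and a notebook to display anything.

```python
import pandas as pd

# Made-up rows that already contain the engineered columns (assumption).
df = pd.DataFrame({
    "rented_bike_count": [50, 300, 800, 1200, 90, 650],
    "session": ["night", "morning", "afternoon", "evening", "night", "evening"],
    "weekday": ["Monday", "Monday", "Tuesday", "Tuesday", "Saturday", "Saturday"],
})

lowest = df["rented_bike_count"].min()
highest = df["rented_bike_count"].max()
print("min:", lowest, "max:", highest)

# Average demand per group: the numbers a bar chart would display.
mean_by_session = (
    df.groupby("session")["rented_bike_count"].mean().sort_values(ascending=False)
)
mean_by_weekday = df.groupby("weekday")["rented_bike_count"].mean()
print(mean_by_session)
print(mean_by_weekday)

# In a notebook with matplotlib installed, these can be plotted directly:
# df["rented_bike_count"].plot(kind="hist")
# mean_by_session.plot(kind="bar")
```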
2.5 Some Analysis
- Transform all categorical values into numerical values for model analysis
- Help me to find which features can predict Rented Bike Count by Regression Analysis of Numerical features
- If it comes with OLS Regression Results, ask Copilot to summarize and explain the results
- Make regression plots of the correlation values above (implement try/except for each feature)
- Make a correlation coefficient heatmap of numerical features
- Try regularization models (Ridge, Lasso, and Elastic Net)
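To make the first steps of this list a bit more concrete, here is a small sketch of turning categorical columns into numbers and ranking features by their correlation with the target. The data is made up, and one-hot encoding via `pd.get_dummies` is only one of several possible encodings; Copilot may well suggest label encoding or something else.

```python
import pandas as pd

# Made-up sample; "seasons" and "holiday" stand in for categorical columns.
df = pd.DataFrame({
    "rented_bike_count": [120, 150, 900, 1100, 400, 450],
    "temperature_c": [-3.0, -1.0, 24.0, 27.0, 12.0, 14.0],
    "hour": [8, 18, 8, 18, 8, 18],
    "seasons": ["Winter", "Winter", "Summer", "Summer", "Autumn", "Autumn"],
    "holiday": ["No Holiday", "Holiday", "No Holiday",
                "No Holiday", "No Holiday", "No Holiday"],
})

# One-hot encode the categorical columns so that every feature is numeric.
encoded = pd.get_dummies(df, columns=["seasons", "holiday"], dtype=int)

# Pearson correlation of each feature with the target, strongest first.
correlations = (
    encoded.corr()["rented_bike_count"]
    .drop("rented_bike_count")
    .sort_values(key=abs, ascending=False)
)
print(correlations)
```

The full matrix returned by `encoded.corr()` is also what a correlation heatmap would visualize.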
Create the script to split the dataset and use Random Forest to train and predict the rented_bike_count. Then, describe the model.
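If you want something to compare Copilot's answer against, a split-train-predict script with scikit-learn could look roughly like this. The data here is synthetic so that the sketch runs on its own; the feature names (including `humidity`) and the formula that generates the target are invented, and the hyperparameters are untuned defaults.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic data (assumption): in the notebook, X would be your encoded
# feature columns and y the rented_bike_count column.
rng = np.random.default_rng(42)
n_rows = 500
X = pd.DataFrame({
    "hour": rng.integers(0, 24, n_rows),
    "temperature_c": rng.uniform(-10, 35, n_rows),
    "humidity": rng.uniform(20, 95, n_rows),  # unrelated to y by construction
})
y = (20 * X["temperature_c"] + 15 * X["hour"]
     + rng.normal(0, 30, n_rows)).clip(lower=0)

# Hold out 20% of the rows to evaluate the model on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

print("R^2:", round(r2_score(y_test, y_pred), 3))
print("MAE:", round(mean_absolute_error(y_test, y_pred), 1))

# Feature importances are one simple way to describe the fitted model.
importances = pd.Series(
    model.feature_importances_, index=X.columns
).sort_values(ascending=False)
print(importances)
```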
Don’t forget to commit and push your changes to the `with-claude` branch.
Requirements.txt
To make our code run in other environments, we have to make an inventory of the Python packages that are needed to run all our code. Luckily, GitHub Copilot can do this for us:
- Make sure your most recent files are still open in the workspace
- Open (a new) Copilot chat
- Ask
@workspace draft a requirements.txt
- If the suggested code is to your liking, select `Insert to new file` under the `more actions` menu
- Save this new file as `requirements.txt`
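To sanity-check the file Copilot drafts, you could list the third-party imports in your notebook yourself. The sketch below does this with the standard library only; the function names are made up for this example, and it assumes the usual `.ipynb` layout where the notebook is JSON with a `cells` list.

```python
import ast
import json
import sys


def imported_modules(source: str) -> set:
    """Return top-level module names imported by a piece of Python source."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            modules.add(node.module.split(".")[0])
    return modules


def notebook_imports(notebook_json: str) -> set:
    """Collect third-party imports from the code cells of a notebook."""
    notebook = json.loads(notebook_json)
    found = set()
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell["source"]
        code = "".join(source) if isinstance(source, list) else source
        # Drop IPython magics and shell escapes, which are not valid Python.
        code = "\n".join(
            line for line in code.splitlines()
            if not line.lstrip().startswith(("%", "!"))
        )
        try:
            found |= imported_modules(code)
        except SyntaxError:
            continue  # skip cells that still fail to parse
    # Standard-library modules do not belong in requirements.txt.
    # sys.stdlib_module_names needs Python 3.10+; older versions skip the filter.
    stdlib = getattr(sys, "stdlib_module_names", frozenset())
    return {name for name in found if name not in stdlib}


# Small in-memory notebook so the sketch runs without any files.
demo = json.dumps({
    "cells": [
        {"cell_type": "markdown", "source": ["# Bike Sharing Demand Prediction"]},
        {"cell_type": "code", "source": ["import pandas as pd\n", "import os\n"]},
        {"cell_type": "code", "source": [
            "%matplotlib inline\n",
            "from sklearn.ensemble import RandomForestRegressor\n",
        ]},
    ]
})
print(sorted(notebook_imports(demo)))
```

Keep in mind that import names do not always match the package names you install (for example, `sklearn` is installed as `scikit-learn`), so treat the result as a checklist rather than a finished requirements file.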
GitHub Codespaces
Ever been frustrated because your laptop couldn’t keep up with the speed of the workshop in class? Or wished you could keep coding from a different device and have your code run on the go?
Enter GitHub Codespaces: the development environment that’s hosted in the cloud. GitHub Codespaces can be customized to your project by adding configuration files to your repository (known as Configuration-as-Code), which creates a repeatable codespace configuration for all users of the project. For more info, see GitHub.
Let’s open a Codespace for our latest project:
- In your local seoul-bike-prediction repository (main branch) in VS Code:
  - Create a new directory `.devcontainer`
  - Create a new file `devcontainer.json` in that directory
  - Paste and save the following JSON code
```json
{
  "name": "Python 3",
  // Or use a Dockerfile or Docker Compose file. More info: https://containers.dev/guide/dockerfile
  "image": "mcr.microsoft.com/devcontainers/python:0-3.11-bullseye",
  "features": {
    "ghcr.io/devcontainers-contrib/features/coverage-py:2": {}
  },
  "hostRequirements": {
    "cpus": 2
  },

  // Use 'postCreateCommand' to run commands after the container is created.
  "postCreateCommand": "pip3 install --user -r requirements.txt",

  // Configure tool-specific properties.
  "customizations": {
    "vscode": {
      "extensions": [
        "github.copilot",
        "vsls-contrib.codetour",
        "ms-python.python"
      ]
    }
  }

  // Uncomment to connect as root instead. More info: https://aka.ms/dev-containers-non-root.
  // "remoteUser": "root"
}
```
- Go to GitHub, open your repositories page and find the seoul-bike-prediction repository
- From the `<> Code` button, launch a new codespace