[HAI5016] Week 11: Hi Claude

Remember that I mentioned in the previous class that GitHub Copilot would support models from Anthropic, Google, and OpenAI? Well, it seems that the time has come.

Disclaimer: This blog provides instructions and resources for the workshop part of my lectures. It is not a replacement for attending class; it may not include some critical steps and the foundational background of the techniques and methodologies used. The information may become outdated over time as I do not update the instructions after class.

1.1 Enable more models in your GitHub Copilot

Enable more models in your GitHub Copilot now:

  1. Go to https://github.com/settings/copilot
  2. Scroll down to the bottom of the settings page, find the options for more models (e.g. Anthropic Claude 3.5 Sonnet in Copilot) and enable the models.

From today, we can compare how well the available models assist us in VS Code on our data (science) projects.

1.2 Join the waitlist for/check access to GitHub Models

GitHub Models has also entered public preview! GitHub Models provides every GitHub developer with access to top models via a playground, API, and more.

  1. Check if you have access to https://github.com/marketplace/models

Back to Science

Let’s see if we have more luck this time in finishing the bike prediction project by using a slightly different approach than last time, and by teaming up with Claude 3.5 Sonnet on GitHub Copilot instead of the standard 4o model.

To test whether our new code is better than the ‘old’ code, we’ll create a new branch in our repository. Think of a branch as an extension or a new version of your current codebase. This allows us to experiment without altering or deleting the original code.

2.1 Create new branch

  1. Create a new branch called with-claude
  2. Check if you are in the new branch
  3. Delete the previous notebook
  4. Create a new notebook demand-prediction.ipynb
  5. Compose a commit message, commit, and push
  6. Switch branch to have a look at your ‘old’ notebook in the main branch
  7. Switch back to the with-claude branch

2.2 Call Claude

To see if we can enable Claude, and to make sure that our current workspace is included in its context, let’s ask a question about our workspace.

  1. Open Copilot Chat (Ctrl+Alt+I or Cmd+Alt+I)
  2. In the chat input box, find the model selector and select the Claude 3.5 Sonnet model
  3. Ask the question @workspace what is this project about?
  4. Walk through all the elements of the answer and check whether they are right. You can ask additional questions like
    • @workspace What data sources are we using?
    • What statistical methods do you recommend to answer the problem statement?
  5. Examine the answers and the options in the chat module

2.3 New notebook with Claude

As we learned during class, answers from an LLM are not always consistent. In my case, however, I got an answer that approaches the problem with the steps below.

  1. Copy and paste this markdown code in the top cell of your Notebook to give the LLM some context:

     # Bike Sharing Demand Prediction
    
     ## Index
    
     1. Know Your Data
     2. Understanding Your Variables
     3. EDA
     4. Data Cleaning
     5. Feature Engineering
     6. Model Building
     7. Model Implementation
     8. Conclusion
    
  2. Generate a new cell by clicking the ‘generate’ button, or make a new cell like you’re used to and open the inline Copilot chat (Ctrl+I or Cmd+I). Ask the following: Load the dataset and show me inconsistencies by checking the shape, information, duplicate values, missing values, etc.
  3. Check the suggested code and results
  4. Ask for more details on the data:
    • Show the names of all the columns
    • Describe the data (e.g. counts, mean, std, min, max, etc.)
    • Match the variable names with the attribute information in our readme.md
    • Count the unique values for each variable
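If you want something to compare Copilot's suggestion against, here is a minimal sketch of these checks with pandas. The inline CSV and its column names are made-up stand-ins (an assumption, not the workshop file); in your notebook you would point pd.read_csv at your own data file instead.

```python
import io

import pandas as pd

# Hypothetical stand-in for the workshop CSV: a few rows with made-up values.
# Row 2 and row 3 are identical on purpose, and row 4 has a missing count.
SAMPLE_CSV = """Date,Rented Bike Count,Hour,Seasons
01/12/2017,254,0,Winter
01/12/2017,204,1,Winter
01/12/2017,204,1,Winter
02/12/2017,,2,Winter
"""

df = pd.read_csv(io.StringIO(SAMPLE_CSV))


def quality_report(frame: pd.DataFrame) -> dict:
    """Collect the basic consistency checks in one dictionary."""
    return {
        "shape": frame.shape,
        "duplicate_rows": int(frame.duplicated().sum()),
        "missing_per_column": frame.isna().sum().to_dict(),
        "unique_per_column": frame.nunique().to_dict(),
    }


report = quality_report(df)
print(report)

# The follow-up questions from step 4
print(df.columns.tolist())  # names of all the columns
df.info()                   # dtypes and non-null counts
print(df.describe())        # count, mean, std, min, max, ...
```

Collecting the checks in one function makes it easy to run the same report again after cleaning the data.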

2.4 Feature Engineering

  • Let’s rename the features for better readability and usability by following consistent naming conventions, and to avoid issues with special characters, spaces, or inconsistent capitalization
  • Break down the date column into year, month, and day for further analysis
  • For analysis purposes, create a “session” column that groups “hours” into different categories
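The three steps above can be sketched as follows. This is only one possible outcome of the prompts: the sample rows, the original column names, the dd/mm/yyyy date format, the helper names, and especially the session boundaries are all assumptions made for illustration.

```python
import re

import pandas as pd


def to_snake_case(name: str) -> str:
    """Lowercase a column name, drop units in brackets, join words with underscores."""
    name = re.sub(r"\(.*?\)", "", name)         # remove "(°C)", "(%)", ...
    name = re.sub(r"[^0-9a-zA-Z]+", "_", name)  # spaces and symbols -> underscore
    return name.strip("_").lower()


def hour_to_session(hour: int) -> str:
    """Map an hour (0-23) to a part of the day. The boundaries are an assumption."""
    if 5 <= hour < 12:
        return "morning"
    if 12 <= hour < 17:
        return "afternoon"
    if 17 <= hour < 22:
        return "evening"
    return "night"


# Made-up rows with hypothetical "raw" column names
df = pd.DataFrame({
    "Date": ["01/12/2017", "01/12/2017", "02/12/2017"],
    "Rented Bike Count": [254, 460, 930],
    "Hour": [0, 8, 18],
    "Temperature(°C)": [-5.2, -7.6, 1.0],
})

# 1. Consistent snake_case names without spaces or special characters
df = df.rename(columns=to_snake_case)

# 2. Break the date down into year, month, and day (plus the weekday name)
df["date"] = pd.to_datetime(df["date"], format="%d/%m/%Y")
df["year"] = df["date"].dt.year
df["month"] = df["date"].dt.month
df["day"] = df["date"].dt.day
df["weekday"] = df["date"].dt.day_name()

# 3. Group the hours into sessions
df["session"] = df["hour"].apply(hour_to_session)
print(df)
```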

Explore the rented_bike_count column

  • Get the minimum and maximum of the “rented_bike_count” column
  • Visualize the distribution of ‘rented_bike_count’
  • Visualize Rented Bike Count vs Session
  • Visualize Rented Bike Count Vs Weekday

You can do this analysis on all the features that you’d like to explore.
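As a rough sketch, these are the numbers behind those plots. The rows below are made up, and the weekday column is assumed to exist already; the snippet only computes the summaries, so it runs without a plotting library.

```python
import pandas as pd

# Made-up rows; the column names follow the snake_case style used in this section.
df = pd.DataFrame({
    "rented_bike_count": [120, 480, 950, 300, 60, 700],
    "session": ["night", "morning", "evening", "afternoon", "night", "evening"],
    "weekday": ["Monday", "Monday", "Monday", "Saturday", "Saturday", "Saturday"],
})

# Minimum and maximum of the target column
lowest = df["rented_bike_count"].min()
highest = df["rented_bike_count"].max()
print(f"min={lowest}, max={highest}")

# Average demand per session and per weekday: the values a bar chart would show
by_session = df.groupby("session")["rented_bike_count"].mean()
by_weekday = df.groupby("weekday")["rented_bike_count"].mean()
print(by_session)
print(by_weekday)
```

In a notebook with matplotlib installed, by_session.plot(kind="bar") would turn the first summary into a chart.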

2.5 Some Analysis

  • Transform all categorical values into numerical values for model analysis
  • Help me to find which features can predict Rented Bike Count by Regression Analysis of Numerical features
  • If it comes with OLS Regression Results, ask Copilot to summarize and explain the results
  • Make regression plots of the correlation values above (implement try catch for each feature)
  • Make a correlation coefficient heatmap of numerical features
  • Regularization Models (Ridge, Lasso, and Elastic Net)
  • Create a script that splits the dataset and uses Random Forest to train and predict the rented_bike_count. Then, describe the model
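To get a feeling for what the first few prompts should produce, here is a minimal sketch on synthetic data. Everything in it is an assumption: the data is randomly generated, and plain least squares with NumPy stands in for the models in the list. The regularized models and the Random Forest usually come from scikit-learn, which this sketch leaves out on purpose.

```python
import numpy as np
import pandas as pd

# Synthetic data: made-up features and a target that depends on temperature and hour
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "hour": rng.integers(0, 24, n),
    "temperature": rng.normal(12, 10, n),
    "seasons": rng.choice(["Winter", "Spring", "Summer", "Autumn"], n),
    "holiday": rng.choice(["Holiday", "No Holiday"], n),
})
df["rented_bike_count"] = (
    300 + 20 * df["temperature"] + 10 * df["hour"] + rng.normal(0, 25, n)
)

# 1. Transform categorical values into numerical ones (one-hot encoding)
encoded = pd.get_dummies(df, columns=["seasons", "holiday"], drop_first=True, dtype=float)

# 2. Which features correlate with the target?
correlations = (
    encoded.corr()["rented_bike_count"]
    .drop("rented_bike_count")
    .sort_values(ascending=False)
)
print(correlations)

# 3. Split the data (rows are already in random order) and fit ordinary least squares
features = encoded.drop(columns="rented_bike_count")
X = np.column_stack([np.ones(n), features.to_numpy(dtype=float)])  # intercept column first
y = encoded["rented_bike_count"].to_numpy()
split = int(0.8 * n)
coef, *_ = np.linalg.lstsq(X[:split], y[:split], rcond=None)

# 4. Evaluate on the held-out 20% with R squared
pred = X[split:] @ coef
r2 = 1 - np.sum((y[split:] - pred) ** 2) / np.sum((y[split:] - y[split:].mean()) ** 2)
print(dict(zip(["intercept", *features.columns], coef.round(2))))
print(f"R^2 on held-out rows: {r2:.3f}")
```

Evaluating on rows the model has not seen is the same idea you will want for the Random Forest step.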

    Don’t forget to commit and push your changes to the with-claude branch


Requirements.txt

To make our code run in other environments, we have to make an inventory of the Python packages that are needed to run all our code. Luckily, GitHub Copilot can do this for us:

  1. Make sure your most recent files are still open in the workspace
  2. Open (a new) Copilot chat
  3. Ask @workspace draft a requirements.txt
  4. If the suggested code is to your liking, select the Insert to new file option under the more actions menu
  5. Save this new file as requirements.txt

GitHub Codespaces

Ever been frustrated because your laptop couldn’t keep up with the speed of the workshop in class? Or wished you could keep coding from a different device and have your code run on the go?

Enter GitHub Codespaces: the development environment that’s hosted in the cloud. GitHub Codespaces can be customized to your project by adding configuration files to your repository (known as Configuration-as-Code), which creates a repeatable codespace configuration for all users of the project. For more info, see GitHub.

Let’s open a Codespace for our latest project:

  1. In your local seoul-bike-prediction repository (main branch) in VS Code:
    • Create a new directory .devcontainer
    • Create a new file devcontainer.json in that directory
    • Paste and save the following JSON code
     {
         "name": "Python 3",
         // Or use a Dockerfile or Docker Compose file. More info: https://containers.dev/guide/dockerfile
         "image": "mcr.microsoft.com/devcontainers/python:0-3.11-bullseye",
         "features": {
             "ghcr.io/devcontainers-contrib/features/coverage-py:2": {}
         },
         "hostRequirements": {
             "cpus": 2
         },

         // Use 'postCreateCommand' to run commands after the container is created.
         "postCreateCommand": "pip3 install --user -r requirements.txt",

         // Configure tool-specific properties.
         "customizations": {
             "vscode": {
                 "extensions": [
                     "github.copilot",
                     "vsls-contrib.codetour",
                     "ms-python.python"
                 ]
             }
         }

         // Uncomment to connect as root instead. More info: https://aka.ms/dev-containers-non-root.
         // "remoteUser": "root"
     }
    
  2. Go to GitHub, open your repositories page and find the seoul-bike-prediction repository
  3. From the <> Code button launch a new codespace
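A typo in devcontainer.json only shows up when the codespace is being built, so it can save time to check the file locally first. Below is a minimal sketch, assuming that all comments sit on their own lines (as in the file above). The helper load_jsonc is a hypothetical name, and the sample text is a shortened copy of the configuration.

```python
import json


def load_jsonc(text: str) -> dict:
    """Parse JSON that contains full-line // comments (a simplification of JSONC)."""
    kept = [line for line in text.splitlines() if not line.lstrip().startswith("//")]
    return json.loads("\n".join(kept))


# Shortened copy of the configuration, for illustration only
sample = """
{
    "name": "Python 3",
    // Use 'postCreateCommand' to run commands after the container is created.
    "postCreateCommand": "pip3 install --user -r requirements.txt",
    "customizations": {
        "vscode": {
            "extensions": ["github.copilot", "ms-python.python"]
        }
    }
}
"""

config = load_jsonc(sample)
print(config["postCreateCommand"])
print(config["customizations"]["vscode"]["extensions"])
```

In your repository you could pass the contents of .devcontainer/devcontainer.json to the same function. A json.JSONDecodeError then points at the line with the problem. Note that the postCreateCommand expects a requirements.txt in the root of the repository.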

This post is licensed under CC BY 4.0 by the author.