[HAI5016] Week 9: Setting up a Data Science Flow
Update VS Code and its extensions to the latest version
- Open the command palette (Ctrl + Shift + P), type Check for updates, and press Enter
- A notification will appear if updates are available. Click the notification to update VS Code
- After updating, restart VS Code to apply the changes
Create a GitHub Repo and clone it to your local PC
- Create a new GitHub repo for this project on the GitHub website. After logging in, you can find the green New button on the GitHub homepage, or simply click the link below:
Because we are going to use Python for this project, we will use the Python .gitignore template. Use the following settings:
- Owner: Your GitHub username
- Repository name: seoul-bike-prediction
- Description: Predicting the demand for Seoul Bike Sharing
- Select Private (for safety reasons)
- Check Add a README file
- Add a .gitignore file: Python
- Add a license: None
- Click the green Create repository button
Clone the repository to your local PC using VS Code
Let’s clone the repository to your local PC using VS Code. Just to be sure, let’s copy the URL of the repository. You can find the URL on the GitHub repository page.
- Click the green <> Code button and copy the URL (under the Local tab)
- In VS Code, open the command palette (Ctrl + Shift + P), type Git: Clone, and press Enter
- The option to clone a repository from GitHub will appear; press Enter. If the option does not appear, make sure you have the Git extension installed in VS Code and are logged in to your GitHub account
- After waiting a few seconds, a list with your repositories will appear. Select the repository you just created and press Enter
- Choose a folder on your local PC where you want to clone the repository and press Enter. The Git repository will now be cloned to your local PC
- A dialog will appear asking if you want to open the repository. Click Open Repository to open the repository in VS Code
About the added files
Two files are created in the repository: a README.md file and a .gitignore file. The README.md file contains the description of the repository, and the .gitignore file lists the files and folders that Git should not track, so they are never uploaded to the repository. For demonstration purposes, we will remove the line with .env from the .gitignore file. This is not recommended in practice, as the .env file usually contains sensitive information such as passwords and API keys. We will add the .env file to the .gitignore file again later.
- Open the .gitignore file in VS Code and remove the line with .env
- Save the file
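To double-check whether .env is currently listed in the .gitignore file, a small script can scan the file contents. The following is only a minimal sketch: the function name `lists_entry` and the sample text are made up for illustration, and it matches literal lines only, not the wildcard patterns that Git itself supports.

```python
def lists_entry(gitignore_text, entry):
    """Return True if `entry` appears as a literal, non-comment line."""
    for raw_line in gitignore_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comments, as Git does.
        if not line or line.startswith("#"):
            continue
        if line == entry:
            return True
    return False


# Made-up excerpts of a .gitignore file, before and after removing the line.
with_env = "# Environments\n.env\n.venv\nenv/\n"
without_env = "# Environments\n.venv\nenv/\n"

print(lists_entry(with_env, ".env"))
print(lists_entry(without_env, ".env"))
```

In a real project you would read the text with `Path(".gitignore").read_text()` instead of using a hard-coded sample.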
Let’s fill in the README.md file
We will use the README.md file to describe the purpose and other details of the project. The README.md file is the first thing people see when they visit your repository on GitHub. This goes not only for human visitors but also for search engines and AI agents like Copilot. It is therefore important to write a clear and concise description of the project in the README.md file. You can use Markdown to format the text in the README.md file.
VS Code includes some useful snippets that can speed up writing Markdown. This includes snippets for code blocks, images, and more. Press Ctrl+Space (Trigger Suggest) while editing to see a list of suggested Markdown snippets. You can also use the dedicated snippet picker by selecting Insert Snippet in the Command Palette. See Markdown and Visual Studio Code for more information.
First, let’s add the details of the dataset into our README.md file. This information can be found on the Kaggle page of the dataset.
- Go to https://www.kaggle.com/datasets/joebeachcapital/seoul-bike-sharing and read the description of the dataset
- Open the README.md file in VS Code
- Change the title and description of the repository that were set by GitHub:
# Seoul Bike Sharing Demand Prediction
Predict demand for shared bikes in Seoul based on various environmental factors
- Add the following information to the README.md file:
## About Dataset
**Data Description**:
The dataset contains weather information (Temperature, Humidity, Windspeed, Visibility, Dewpoint, Solar radiation, Snowfall, Rainfall), the number of bikes rented per hour and date information.
**Attribute Information**:
- Date - year-month-day
- Rented Bike count - Count of bikes rented at each hour
- Hour - Hour of the day
- Temperature - Temperature in Celsius
- Humidity - %
- Windspeed - m/s
- Visibility - 10m
- Dew point temperature - Celsius
- Solar radiation - MJ/m2
- Rainfall - mm
- Snowfall - cm
- Seasons - Winter, Spring, Summer, Autumn
- Holiday - Holiday/No holiday
- Functional Day - NoFunc(Non Functional Hours), Fun(Functional hours)
- Open the preview in VS Code (Ctrl + Shift + V) to see the rendered Markdown and check your changes in the README.md file
- Save the file
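To get a feel for how the attributes listed in the README map onto code, here is a minimal sketch that reads a few rows and averages the rental count per season. The sample rows are made up, and the column headers are simplified assumptions; the real SeoulBikeData.csv may name its columns differently, so check the header row before reusing this.

```python
import csv
import io
from collections import defaultdict

# Made-up sample rows with simplified, assumed column headers.
SAMPLE_CSV = """Date,Rented Bike Count,Hour,Temperature,Seasons,Holiday
2017-12-01,254,0,-5.2,Winter,No Holiday
2017-12-01,204,1,-5.5,Winter,No Holiday
2018-06-01,900,18,24.1,Summer,No Holiday
"""


def mean_rentals_by_season(csv_text):
    """Return the average hourly rental count for each season."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        season = row["Seasons"]
        totals[season] += int(row["Rented Bike Count"])
        counts[season] += 1
    return {season: totals[season] / counts[season] for season in totals}


averages = mean_rentals_by_season(SAMPLE_CSV)
print(averages)
```

The standard library is used here only to keep the sketch dependency-free; for the actual analysis a data frame library such as pandas is the more common choice.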
Commit the changes to the repository
Now that we have made changes to the README.md file, we will commit the changes to the repository. This will save the changes to the repository and create a new version of the repository.
- Open the source control tab in VS Code
- You will see the README.md file in the list of changes
- Enter a commit message in the text box at the top of the source control tab. For example, Add dataset information to README.md
- Click the checkmark icon to commit the changes
- Click the three dots icon and select Push to push the changes to the GitHub repository
- Go to the GitHub repository page and refresh the page to see the changes in the README.md file
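The same stage, commit, and push sequence can also be run from the command line with Git. The sketch below builds the three commands in Python so they can later be scripted; the helper names `commit_and_push_commands` and `run_all` are made up for illustration, and `run_all` assumes Git is installed and the current folder is the cloned repository.

```python
import subprocess


def commit_and_push_commands(paths, message):
    """Build the Git commands for staging, committing, and pushing."""
    return [
        ["git", "add", *paths],
        ["git", "commit", "-m", message],
        ["git", "push"],
    ]


def run_all(commands, cwd="."):
    """Run each command in order and stop at the first failure."""
    for command in commands:
        subprocess.run(command, cwd=cwd, check=True)


commands = commit_and_push_commands(
    ["README.md"], "Add dataset information to README.md"
)
# Print the commands instead of running them, so nothing is pushed by accident.
for command in commands:
    print(" ".join(command))
```

Calling `run_all(commands)` inside the repository folder would execute the commands for real.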
Add a picture header to the README.md file
We will add a picture header to the README.md file to make it more visually appealing. We will use the image at the top of this page as the header image. You can find the image URL by right-clicking the image and selecting Copy image address (or Copy Image Link in Edge).
- Open the README.md file in VS Code
- Add the following line between the title and the description:
![Bicycle header](https://camphouse.me/assets/img/HAI5016-week-9-header.jpg)
- Open the preview in VS Code (Ctrl + Shift + V) to see the changes in the README.md file
- Save the file
- Before committing the changes, let Copilot write the commit message for you. Click the two stars in the message box to let Copilot generate a summary and description for the commit message
- Check the message and if you are satisfied, click the commit and push button to commit the changes to the repository and push them to GitHub
Create an environment variables file
We will create a file to store the environment variables that we will use in the project. This will make it easier to manage the variables and keep them separate from the code. We will use a .env file to store the environment variables. For security reasons, we will add the .env file to the .gitignore file so that it is not uploaded to the repository.
- Create a new file in the root of the repository called .env
- Add the following environment variables to the .env file:
# .env file
DOWNLOAD_PATH = 'data/'
LOGGING_PATH = 'logs/'
- Save the file
- Check the source control tab in VS Code to see the changes (.env file should be in the list of changes)
- Right-click the .env file in the source control tab and select Ignore File to add the .env file to the .gitignore file
- The .gitignore file opens in VS Code and should now contain the line .env
- Save the file
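To show how the values in the .env file can end up in Python code, here is a minimal sketch of a parser for the simple KEY = 'value' format used above. The function name `parse_env` is made up for illustration, and the parser ignores many details of real .env files (for example multi-line values and variable expansion). In practice, the third-party python-dotenv package and its `load_dotenv()` function are a common choice for this.

```python
def parse_env(text):
    """Parse simple KEY = 'value' lines into a dictionary."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and single or double quotes.
        values[key.strip()] = value.strip().strip("'\"")
    return values


# Same contents as the .env file created in this section.
ENV_TEXT = """# .env file
DOWNLOAD_PATH = 'data/'
LOGGING_PATH = 'logs/'
"""

settings = parse_env(ENV_TEXT)
print(settings["DOWNLOAD_PATH"])
print(settings["LOGGING_PATH"])
```

In a real script you would read the text from the file, for example with `Path(".env").read_text()`, rather than from a hard-coded string.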
Ask Copilot how to start the project
Give Copilot the following prompt:
Create a data science project to predict Seoul bicycle rental demand based on weather, holidays, and rainfall data. This project should follow best practices for data science workflows in VS Code, GitHub, and Copilot. Key requirements include:
- Setting up a clear project folder structure for data, notebooks, scripts, and environment configurations
- Code to load and process the latest SeoulBikeData.csv from Kaggle.
- Creating a Jupyter Notebook for data exploration and analysis, with appropriate visualizations and summaries.
- Using an .env file for securely storing any API keys or sensitive data, and loading variables as needed.
- Utilizing version control on GitHub to document project progress and changes.
- Logging every step into a log file for debugging and later automation purposes.
Follow standard comments and Copilot suggestions to enhance readability, modularity, and performance.
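The folder-structure and logging requirements in the prompt above could look something like the sketch below. Copilot's actual answer will differ. The folder names, the logger name `seoul_bike`, the log file name `pipeline.log`, and the helper `setup_project` are all assumptions made for illustration.

```python
import logging
import tempfile
from pathlib import Path

# Hypothetical folder names; adjust to whatever structure you settle on.
PROJECT_FOLDERS = ("data", "logs", "notebooks", "scripts")


def setup_project(root, log_name="pipeline.log"):
    """Create the project folders under `root` and return a file logger."""
    root = Path(root)
    for name in PROJECT_FOLDERS:
        (root / name).mkdir(parents=True, exist_ok=True)

    logger = logging.getLogger("seoul_bike")
    logger.setLevel(logging.INFO)
    # Close and drop old handlers so repeated calls do not duplicate log lines.
    for old_handler in list(logger.handlers):
        old_handler.close()
        logger.removeHandler(old_handler)

    handler = logging.FileHandler(root / "logs" / log_name, encoding="utf-8")
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    )
    logger.addHandler(handler)
    return logger


# Demo in a temporary directory so nothing is written into a real project.
with tempfile.TemporaryDirectory() as tmp:
    log = setup_project(tmp)
    log.info("Project folders created")
    for active_handler in log.handlers:
        active_handler.close()  # release the file before the folder is removed
    print((Path(tmp) / "logs" / "pipeline.log").read_text(encoding="utf-8"))
```

In the actual project you would pass the repository root to `setup_project` and could take the log folder from the LOGGING_PATH variable in the .env file instead of hard-coding it.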
Sources
- Markdown: Basic writing and formatting syntax
- Markdown and Visual Studio Code
- https://www.kaggle.com/datasets/joebeachcapital/seoul-bike-sharing
- https://github.com/mahin-arvind/Seoul-Bike-Sharing-Demand-Prediction-Capstone-Project/tree/main
- https://medium.com/@dancerworld60/project-title-seoul-bike-sharing-demand-prediction-e1be18f23cbe