Vancouver DataJam Workshop Part 2: Let’s Git Started with GitHub
This workshop is part of the Vancouver DataJam 2021 event. It introduces participants to the basics of Git, a version control tool, and shows how to collaborate with others on GitHub.
There are three parts:
- Part 1 details installation and set up instructions
- Part 2 introduces the concept of version control, Git, and GitHub
- Part 3 contains hands-on exercises where participants will create a repository in GitHub, make their first commit, and become familiar with other common Git commands
Before we start, here are some useful resources
Install Git and create a GitHub account using the instructions in Part 1 of this workshop.
If you’re looking for a quick refresher on Git commands, check out this Git Cheat Sheet. For solutions to specific scenarios or problems, check out Oh Shit, Git!?! or the profanity-free version Dangit, Git!?!.
What is version control?
Version control is a way to keep track of changes. This allows users to revert to previous versions or “undo” a change. Keeping track of changes also makes collaboration more efficient by providing a platform to consolidate changes made by various users, identify who made each change, and resolve conflicts, such as changes to the same line in a file.
Google Docs may be a familiar example of version control. It tracks the history of changes made to a document and provides a timeline showing which user made each edit.
What is Git?
Git is one of the most commonly used version control tools, often used by software developers and data professionals to manage code. It is useful for tracking different iterations of an individual project, and it also enables users to collaborate on shared projects.
Files are stored both locally on the user’s computer and remotely in a location like GitHub. Local files can only be accessed by the user, while users with adequate permissions can access remote files from any computer. Imagine storing a Word document on your desktop. This is a local file, and users on another computer cannot access it. An example of a remote location would be Google Drive. Files on Google Drive can be accessed from other computers.
Git terminology
You will come across the following terminology when using Git:
- Branch: A version of the repository that diverges from the main project. A branch starts as a copy of the main branch, so experimental changes can be made on it without affecting the main project.
- Clone: A copy of the repository. To create a local repository, we can clone a remote repository to a local destination.
- Commit: A snapshot of changes that are saved. An accompanying message is typically written to provide context for documentation purposes.
- Conflicts: When changes are made to the same line in the same file by multiple users. These need to be resolved before merging.
- Main branch: Where the working/tested version of the project is stored. Generally, changes are made on other branches which are then merged to the main branch once they are reviewed.
- Merge: Combine two branches to consolidate changes, usually done through a pull request.
- Pull request (PR): When code has been changed on a branch and the user wants to add those changes to another branch, the changes usually need to be reviewed by another user. Users submit pull requests so that others can review their changes before they are merged.
- Repository (repo): A directory where the files and folders of a project are stored.
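To make a few of these terms concrete, here is a minimal sketch of how repository, commit, branch, and clone might look on the command line. It runs entirely in a temporary folder on your own computer, so no GitHub account is involved. The folder names (`project`, `project-clone`), the branch name `experiment`, and the user name and email are placeholders made up for this illustration.

```shell
set -e

# Work in a throwaway folder so nothing else on your computer is touched.
work=$(mktemp -d)
cd "$work"

# Repository: a project directory that Git keeps track of.
git init -q project
cd project
git config user.name "Workshop User"         # placeholder identity
git config user.email "user@example.com"     # placeholder identity

# Commit: a saved snapshot, with a message that gives context.
echo "# DataJam demo" > README.md
git add README.md
git commit -q -m "Add README"
git branch -M main                            # make sure the branch is called "main"

# Branch: a separate line of work that leaves main untouched.
git checkout -q -b experiment
echo "Trying something new" >> README.md
git commit -q -am "Try an experimental change"
git checkout -q main                          # back on main, the README is unchanged

# Clone: a full copy of the repository in another location.
cd "$work"
git clone -q project project-clone
git -C project-clone log --oneline            # shows the single "Add README" commit
```

In practice the clone step usually points at a remote address on GitHub rather than a local folder, but the idea is the same: the copy arrives with the project's history included.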
What is GitHub?
GitHub is a Git repository hosting service that allows users to save their project (code, figures, data, documentation, etc.) in the cloud and share it with others, much like Google Drive. It makes working together on projects much simpler than traditional file sharing services because changes can be tracked, code can be reviewed, and users can work on different copies of the same file before merging the changes back together.
Fun fact - the GitHub logo, named Octocat, is a combination of an octopus and a cat. It represents “how complex code combines can create peculiar things” according to WhiteSource Software.
How are Git and GitHub used?
By a single user
Common reasons why individuals may want to use Git for their own projects include:
- Tracking changes to understand what was done and the rationale behind decisions
- Backing up files
- Undoing a change
- Keeping documentation organized
- Sharing and showcasing work
By multiple users
In addition to the reasons above, using Git to collaborate with others takes full advantage of the features available such as:
- Ability for more than one person to edit the same file at the same time
- Managing permissions for different users (i.e. read, write, admin)
- Merging changes and conflict resolution
- Reviewing code and writing comments to facilitate discussion
- Branching to work on separate copies of the repository
Example of collaboration
The diagram below shows an example of two people working on the same project:
- User1 and User2 have each created a new branch where they are doing their respective work.
- After 2 commits, User1 is finished making their changes and creates a pull request (PR) to merge their work with the main branch.
- User2 reviews the changes and approves the PR, merging User1’s branch (orange) with the main branch.
Note that User2’s branch (green) does not contain the changes that User1 made because their branch was created before the merge. This ensures that they do not overwrite each other’s work while making changes.
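The scenario above can be approximated on a single computer. This is only a sketch under a few assumptions: the branch names `user1-work` and `user2-work` and the file names are made up for illustration, and a plain `git merge` stands in for the pull request, since reviewing and approving a PR happens on GitHub rather than on the command line.

```shell
set -e

# Throwaway repository with one starting commit on main.
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.name "User1"                 # placeholder identity
git config user.email "user1@example.com"    # placeholder identity
echo "shared starting point" > notes.txt
git add notes.txt
git commit -q -m "Initial commit"
git branch -M main

# User1 and User2 both branch off main at the same point.
git branch user1-work
git branch user2-work

# User1 makes two commits on their own branch.
git checkout -q user1-work
echo "analysis step 1" > analysis.txt
git add analysis.txt
git commit -q -m "Start analysis"
echo "analysis step 2" >> analysis.txt
git commit -q -am "Finish analysis"

# Merge User1's branch into main (on GitHub, merging the approved PR does this).
git checkout -q main
git merge -q --no-ff -m "Merge user1-work into main" user1-work

# main now contains User1's file; User2's branch does not.
git ls-tree --name-only main          # analysis.txt and notes.txt
git ls-tree --name-only user2-work    # notes.txt only
```

When User2 is ready to pick up User1's changes, they could switch to their branch and run `git merge main` to bring it up to date.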
Now you’re ready for Part 3 of the workshop!