Introduction to git
git is an open source tool that allows anyone working with text documents on the computer to keep track of multiple versions of those documents.
If you're working by yourself and you're only writing 5-10 lines of code at a time, you don't need to keep track of multiple versions of your document.
But if you are working on a larger program over some period of time, and especially if you're collaborating with others (students, teachers, developers), you're going to need a system for managing your work. You need version control.
0. Version Control
If you're writing serious programs, you're going to run into the problem of version control.
It's the same problem that everybody has whenever they're working on a larger body of work: you've created a rough "working version" of this thing (a novel, a painting, a physics solution) and you need to make some changes to improve it. You want to make the changes, but you don't want to lose the good parts of what you've created previously, so you cross out some of the work that you want to change, and add new writing/painting/solutions to make it better. But after making those changes, you re-evaluate and decide that in fact you want to keep just part of the new stuff while going back to what you'd done before...
It gets messy.
Writers have it easiest, perhaps, in that they can work on a word processor and save multiple versions of their document as they go... but even that can get messy. Take a look here at what the contents of the folder myNovel might look like.
Which one is the most recent one? Which one is the best one? How can I easily compare the writing in one of these documents with the writing in another?
Version control provides you with tools and strategies to manage these issues.
1. Ways to Use git
The local tool git and the website service GitHub both provide you with the ability to manage multiple versions of your work.
In this tutorial we're going to cover some common git workflows that are worth knowing. As we develop each one, you'll get a chance to slowly add to your knowledge of the tool.
The different workflows we'll examine:
- Using git on your local machine to track ongoing changes in development of a program and multiple simultaneous versions (branches) of a program's development.
- Downloading from a public repository located at GitHub so that you have a copy of a project on your local machine.
- "Cloning" from a public repository that you are a contributor for, so that you can work on the project locally and then push your contributions back up to the project.
- Cloning from a public repository that you are not a contributor for. You'll edit your local copy of the project just the same, and then issue a pull request to the repository admin, requesting that your modifications be considered for inclusion in the main project.
- Creating a public repository of a project of your own that is published and maintained on GitHub. You'll be the administrator of this project, and you'll be responsible for maintaining it, receiving pull requests, etc.
We've got a lot to do! Let's get started...
2. Installing git
Before you can use git you need to make sure it's installed on your computer.
It may already be installed. Open up a terminal window and type:

    $ git --version
    git version 2.42.0

If git is not installed, you'll need to follow the instructions here.
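The check above can also be scripted. This is a minimal sketch, not part of the original instructions; the variable name git_status is made up for the example.

```shell
# command -v succeeds only if the shell can find a program with that name.
if command -v git >/dev/null 2>&1; then
    git_status="installed: $(git --version)"
else
    git_status="not installed"
fi
echo "git is $git_status"
```

If the message says git is not installed, continue with the installation steps for your operating system.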
2.1. Installing the software
2.1.1. Installing git on Windows
Download from https://git-for-windows.github.io/, install using the .exe file, and select the default values during the installation process. The installation will take a few minutes.
Once installed, you can launch git by running the Git Bash application.
2.1.2. Installing git on macOS
Download from https://git-scm.com/download/mac.
2.1.3. Installing git on Linux (Debian, Ubuntu)

    $ sudo apt-get install git
2.2. Customizing your installation
From your home directory, you'll want to set some global configurations that git can use to manage your work.
In the Terminal:

    $ git config --global user.name "Richard White"
    $ git config --global user.email "rwhite@crashwhite.com"
    $ git config --global core.editor nano
    $ git config --global color.ui true
    $ git config --global init.defaultBranch main
    $ git config --global pull.rebase true

(You may want to choose a different editor, like emacs or vi.)
Now, every time we start tracking a project with git, or updating a project, or contributing to someone else's project, git will know who we are.
3. Using git for personal version control
Once you've got git installed on your local machine, it's time to learn a little bit about how version tracking works.
What does git do?
git is designed to track changes in text files (source files), the files we use to write programs.
Software projects typically have multiple files organized in folders, or directories. Rather than using a software project to learn the basics of git, we're going to suppose the following.
Project - StuffForClass
You and a friend have decided to combine forces in taking notes for a class. You'll have a folder called StuffForClass that contains various materials used for the class, including a text file called NotesFromClass.txt which has a copy of each day's notes.
We'll use git to manage these notes as they develop over time.
3.1. Creating a project and a first commit
- Create a project folder
In whatever location on your computer you want to store this project, create a folder for your project (if it hasn't already been created).

    % cd ~/Desktop
    % mkdir StuffForClass

- Move into the project folder.
% cd StuffForClass
- Initialize the project in git
Use the git init command to initialize this project as a git project.

    % git init
    Initialized empty Git repository in /Users/rwhite/Desktop/StuffForClass/.git/

At this point git has created a hidden folder in your project directory that it will use in managing your project. You can see this folder...

    % ls -a

... but don't mess around with any of the files in there. Those files will be managed by git itself.
- Create a file in the project
Using a text editor (VSCode, nano, etc.) create a preliminary version of the NotesFromClass.txt file. For our example, do it like this:

    Notes From Class
    ----------------
    These are the notes from our class. Study them carefully!

Save this file in the project directory.
- Create a .gitignore file
While we're interested in tracking most of the files in our project folder, not all of them are important for our project.
If you're on a Mac, you'll have hidden .DS_Store files that are part of the filesystem, not our project. In our StuffForClass project let's say we don't want to track Microsoft .docx files, or PDFs that were added from class. If you use BlueJ for a Java project, there's a package.bluej file in there. There may also be .class files (which contain bytecode), and .ctxt files (used by BlueJ)... These files all have their purposes, but they're not part of the source code that makes up our project, and we don't want git to be tracking them.
Using a text editor, create in the project folder a file called .gitignore, and list all the files that git should ignore:

    .DS_Store
    *.pdf
    *.docx
    package.bluej
    *.class
    *.ctxt

Save the file in your project.
- Add our project files to the "staging area"
Although we've "initialized" our project in git, and have a couple of files saved in the folder, we haven't yet instructed git to save this version of the project. Check the status of the project:

    % git status
    On branch main

    No commits yet

    Untracked files:
      (use "git add <file>..." to include in what will be committed)
        .gitignore
        NotesFromClass.txt

    nothing added to commit but untracked files present (use "git add" to track)

The "untracked files" heading and the red text indicate that we have some files that we need to add and then commit to our project.
First let's add them. The dot "." in this command tells git to add everything in the current directory.

    % git add .

Now check the status again.

    % git status
    On branch main

    No commits yet

    Changes to be committed:
      (use "git rm --cached <file>..." to unstage)
        new file:   .gitignore
        new file:   NotesFromClass.txt

These files, listed in green, are now in the "staging area" where they will be part of the "commit".
- Make a commit to the project
When we issue the commit command, we need to include a message in double-quotes that will accompany that commit, and reveal what was significant about this version of the project. It's customary to simply label this first version with "First commit".

    % git commit -m "First commit"
    [main (root-commit) 71270f1] First commit
     2 files changed, 11 insertions(+)
     create mode 100644 .gitignore
     create mode 100644 NotesFromClass.txt

- Check the status of the project again

    % git status
    On branch main
    nothing to commit, working tree clean

This indicates that we haven't made any changes to our project since the last commit. Git knows everything that's happened up to this point!
- View the log of the project
We can examine the development of the project over time using the log command. This will be an important tool for us as we continue to work on the project.

    % git log
    commit 71270f1ec67f2a84e42350c5e76bce131b633a60 (HEAD -> main)
    Author: Richard White
    Date:   Tue Jan 2 10:59:46 2024 -0800

        First commit

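The whole sequence in this section can be replayed in a throwaway directory. The following is a sketch under a few assumptions that are not in the original: it works in a temporary folder rather than on the Desktop, uses a placeholder name and email stored only in that scratch repository, shortens the file contents, and passes -b main to git init (which needs Git 2.28 or newer) so the branch name does not depend on your global settings. The file syllabus.pdf is a made-up example of something .gitignore should hide.

```shell
# Work in a throwaway directory so nothing in your real folders is touched.
repo=$(mktemp -d)
cd "$repo"

# -b main names the first branch explicitly (Git 2.28 or newer).
git init -q -b main

# Placeholder identity, saved only in this scratch repository.
git config user.name "Example User"
git config user.email "user@example.com"

# A short first version of the notes, a .gitignore, and a file it should hide.
printf 'Notes From Class\n----------------\n' > NotesFromClass.txt
printf '.DS_Store\n*.pdf\n*.docx\n' > .gitignore
touch syllabus.pdf

# check-ignore exits with status 0 when a path matches a .gitignore rule.
git check-ignore -q syllabus.pdf && echo "syllabus.pdf is ignored"

# Stage everything that is not ignored, then record the snapshot.
git add .
git commit -q -m "First commit"

# One line per commit: abbreviated hash and message.
git log --oneline
```

Only .gitignore and NotesFromClass.txt end up in the commit; the PDF stays on disk but is never tracked.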
3.2. Working on the project, and a second commit
Continuing with our StuffForClass simulation, let's say that on the first day of class we take a few notes.
- Update the NotesFromClass.txt file
Use a text editor to modify your NotesFromClass.txt file so that it looks like this:

    Notes From Class
    ----------------
    These are the notes from our class. Study them carefully!

    2023-12-21
    ----------
    All I can is that this first day of the class had a *lot* of instructions, and nothing really to do afterwards except go home and read the syllabus. "Make sure to pay attention to the due dates," the instructor said. Got it. Adding a separate file with Due Dates.

- Create a new DueDates.txt file
Accordingly, use a text editor to create a file DueDates.txt that looks like this:

    Due Dates
    ---------
    * January 13 - Final exam

- Check the status of our project

    % git status
    On branch main
    Changes not staged for commit:
      (use "git add <file>..." to update what will be committed)
      (use "git restore <file>..." to discard changes in working directory)
        modified:   NotesFromClass.txt

    Untracked files:
      (use "git add <file>..." to include in what will be committed)
        DueDates.txt

    no changes added to commit (use "git add" and/or "git commit -a")

We've got a modified file, an untracked file... git is letting us know that it hasn't yet made a record of these changes.
- Add and Commit the new version of our project

    % git add .
    % git commit -m "Updated notes, added DueDates file"
    [main 1d2c5a0] Updated notes, added DueDates file
     2 files changed, 9 insertions(+)
     create mode 100644 DueDates.txt

- Check the status, check the log

    % git status
    On branch main
    nothing to commit, working tree clean
    % git log
    commit 1d2c5a05a013bfd34c4381699b24fc007e4eeba2 (HEAD -> main)
    Author: Richard White
    Date:   Tue Jan 2 11:28:49 2024 -0800

        Updated notes, added DueDates file

    commit 71270f1ec67f2a84e42350c5e76bce131b633a60
    Author: Richard White
    Date:   Tue Jan 2 10:59:46 2024 -0800

        First commit

The status is clean, which is good. More importantly, look at the log. We now have two "commits" of our project, two versions, two different "snapshots" of the state of the project. The HEAD pointer indicates that we're looking at commit 1d2c5a..., which is on the main branch of our project.
More on that soon.
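To see what actually differs between the two snapshots, git diff can compare one commit with another. This is a sketch in a scratch repository, with a placeholder identity and shortened file contents that are assumptions for the example, not the tutorial's real files.

```shell
# Scratch repository with a placeholder identity (not your real settings).
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.name "Example User"
git config user.email "user@example.com"

# First snapshot.
printf 'Notes From Class\n' > NotesFromClass.txt
git add .
git commit -q -m "First commit"

# Second snapshot: one file modified, one file added.
printf '2023-12-21\nRead the syllabus.\n' >> NotesFromClass.txt
printf 'Due Dates\n* January 13 - Final exam\n' > DueDates.txt
git add .
git commit -q -m "Updated notes, added DueDates file"

# HEAD~1 means "the commit just before HEAD".
# --stat summarizes which files changed and by how many lines.
git diff --stat HEAD~1 HEAD

# Both snapshots, newest first.
git log --oneline
```

Dropping --stat prints the full line-by-line differences instead of the summary.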
3.3. Examine a previous version with checkout
One of the many powers of version control is the ability to go back in time and look at a previous version of your project. You might want to recover some code that was removed in an update, or maybe you want to go back and restart things from a previous version.
Let's see how you can do that.
- In the Terminal, look at the project's log

    % git log
    commit 1d2c5a05a013bfd34c4381699b24fc007e4eeba2 (HEAD -> main)
    Author: Richard White
    Date:   Tue Jan 2 11:28:49 2024 -0800

        Updated notes, added DueDates file

    commit 71270f1ec67f2a84e42350c5e76bce131b633a60
    Author: Richard White
    Date:   Tue Jan 2 10:59:46 2024 -0800

        First commit

Note the two commit hashes identifying the different versions of the project. The HEAD pointer indicates that we're looking at commit 1d2c5a..., which is on the main branch of our project.
For reference, do a quick ls on our project directory to identify the files in there.

    % ls
    DueDates.txt NotesFromClass.txt

- Use the checkout command to switch to a different commit

    % git checkout 71270f1
    Note: switching to '71270f1'.

    You are in 'detached HEAD' state. You can look around, make experimental
    changes and commit them, and you can discard any commits you make in this
    state without impacting any branches by switching back to a branch.

    If you want to create a new branch to retain commits you create, you may
    do so (now or later) by using -c with the switch command. Example:

      git switch -c <new-branch-name>

    Or undo this operation with:

      git switch -

    Turn off this advice by setting config variable advice.detachedHead to false

    HEAD is now at 71270f1 First commit

This command "checks out" a different version of the project at some other point in time. Do an ls command and you'll see that, in this previous version, there was no DueDates.txt file, and sure enough, that file is no longer in our project!
Use a text editor to examine the contents of the NotesFromClass.txt file. It no longer has the notes from the first day of class!
As the notes from git above indicate, you can play around with this previous version of the project as needed. We haven't lost our work. It still exists in the "future version" of our project. Use the switch command to return to the most up-to-date version of our project.

    % git switch -

Then look at git status or git log to confirm that you're back to the current state of the project.
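The round trip described above can be tried safely in a scratch repository. The sketch below also shows git show, which prints one file as it was in an older commit without moving HEAD at all; that command is an addition to what this section covers. The repository, identity, and file contents are placeholders.

```shell
# Scratch repository with a placeholder identity.
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.name "Example User"
git config user.email "user@example.com"

echo "Notes From Class" > NotesFromClass.txt
git add .
git commit -q -m "First commit"

echo "* January 13 - Final exam" > DueDates.txt
git add .
git commit -q -m "Added DueDates file"

# The root commit is the one with no parents.
first=$(git rev-list --max-parents=0 HEAD)

# Read one file as it was in that commit; the working tree is untouched.
git show "$first:NotesFromClass.txt"

# Check out the old snapshot, look around, then come back.
git checkout -q "$first"
ls                      # DueDates.txt is not listed in this older snapshot
git switch -q -         # "-" means the branch we were on before
ls                      # DueDates.txt is back
```

Using the hash from git rev-list avoids copying a commit hash by hand, but pasting the first few characters from git log works the same way.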
3.4. Managing a larger project: Branches
When developing a larger software project, it's often the case that there will be at least two different versions of your code:
- The production version, a possibly public version of the project that works, more or less.
- The development version, typically a private version of the project that is actively being worked on.
A typical development cycle involves working on the development version to fix bugs or add upgrades, and then releasing them as a new production version when everything has been satisfactorily tested.
There may be other versions of a project as well, but we'll stick with these two for now.
Our current StuffForClass project only has a single version, which is what the main label designates. This main branch is the production version of our project. It works, more or less!
In our simulated project, let's assume that we want to consider keeping track of the class notes in a different way: rather than a single file with all of the notes for the course in it, we're going to have a file for each day with just the notes for that day included. We want to try out this new strategy in our project without messing up the current "production" version.
This is what a branch is for.
Once you know you're going to be trying something out and you want to make sure you don't mess up the good work you've done, create a new branch with an appropriate name (NotesByDay, maybe?), and then switch over to that branch.
- Make a new branch for the "development" version of the project
% git branch NotesByDay
- List the different branches in this project

    % git branch
      NotesByDay
    * main

The asterisk * indicates that we are currently on the main branch.
You can also run git log to see which branch the HEAD is pointing to. Wherever HEAD is, that's where our work is being done.

    % git log
    commit 1d2c5a05a013bfd34c4381699b24fc007e4eeba2 (HEAD -> main, NotesByDay)
    . . .

- Switch to the new development branch and confirm that you're there

    % git switch NotesByDay
    Switched to branch 'NotesByDay'
    % git branch
    * NotesByDay
      main

The asterisk by NotesByDay indicates that this is our working branch: any work we do will be applied to this branch only.
This new branch of our project starts out as a complete working copy of the main branch we were working on before: somebody could delete the main branch and we'd still have access to everything in the project, all the way back to the very first commit. So all the things that we could do with the main branch, we can do with this one, too.
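Creating a branch and switching to it can also be done in one step with git switch -c, the same flag git mentions in its detached HEAD advice. The sketch below, run in a scratch repository with a placeholder identity, also checks that a brand-new branch points at exactly the same commit as the branch it was created from.

```shell
# Scratch repository with a placeholder identity.
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.name "Example User"
git config user.email "user@example.com"

echo "Notes From Class" > NotesFromClass.txt
git add .
git commit -q -m "First commit"

# Create NotesByDay and switch to it in a single command.
git switch -q -c NotesByDay

# Print only the name of the branch we are on.
git branch --show-current

# Until one of them gets a new commit, both names resolve to the same hash.
git rev-parse main NotesByDay
```

The two hashes printed at the end are identical, which is why the new branch already contains the full history of main.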
Let's consider the following scenario. We're going to:
- Make some modifications to the project in our NotesByDay development branch, and
- Make some modifications to the project in our main branch, entering some more notes for a new day of class.
3.4.1. Modifying the development branch
- Make sure you are on the
NotesByDay
branch

    % git switch NotesByDay
    Already on 'NotesByDay'

- Make a new directory that will hold each day's notes, and create a file for the one day of notes that we've had so far.
% mkdir DailyNotes
- Use a text editor to create a file for the first note in that folder
Call the file 2023-12-21.txt and enter the following information (from the original notes file):

    All I can is that this first day of the class had a *lot* of instructions, and nothing really to do afterwards except go home and read the syllabus. "Make sure to pay attention to the due dates," the instructor said. Got it. Adding a separate file with Due Dates.

- Modify the original NotesFromClass.txt file
Modify to remove the note for that day so that the file now says:

    Notes From Class
    ----------------
    These are the notes from our class. Study them carefully!

    See files in the DailyNotes directory for notes.

- Check status, then add and commit these changes
We've made some major changes to the structure of our project (but only in this branch)! You can check the status of the project to see that we have modified and untracked files. Then, from the project's main directory, add and commit the changes we've made.

    % git add .
    % git commit -m "Created new directory for daily notes and moved first day's notes to that directory."
    [NotesByDay 7bbdcab] Created new directory for daily notes and moved first day's notes to that directory.
     2 files changed, 5 insertions(+), 4 deletions(-)
     create mode 100644 DailyNotes/2023-12-21.txt

Check the log and you'll see some interesting things. Our HEAD is still pointing to our current branch, NotesByDay, and we're "ahead" of the work that's been done in the main branch.

    % git log
    commit 7bbdcab33d8da7fed065016047d20a800e02709c (HEAD -> NotesByDay)
    Author: Richard White
    Date:   Tue Jan 2 14:43:26 2024 -0800

        Created new directory for daily notes and moved first day's notes to that directory.

    commit 1d2c5a05a013bfd34c4381699b24fc007e4eeba2 (main)
    Author: Richard White
    Date:   Tue Jan 2 11:28:49 2024 -0800

        Updated notes, added DueDates file

    commit 71270f1ec67f2a84e42350c5e76bce131b633a60
    Author: Richard White
    Date:   Tue Jan 2 10:59:46 2024 -0800

        First commit

3.4.2. Modifying the main branch
Ordinarily the main "production" branch would not be edited while it was "live." Code should be developed and tested in a non-production branch and then merged.
For this exercise, however, let's depart from best practices and modify the NotesFromClass.txt file in the main branch.
- Switch to the main branch

    % git switch main
    Switched to branch 'main'

- Modify the NotesFromClass.txt file
Use a text editor to modify the NotesFromClass.txt file so that it reads as follows:

    Notes From Class
    ----------------
    These are the notes from our class. Study them carefully!

    2023-12-21
    ----------
    All I can is that this first day of the class had a *lot* of instructions, and nothing really to do afterwards except go home and read the syllabus. "Make sure to pay attention to the due dates," the instructor said. Got it. Adding a separate file with Due Dates.

    2023-12-22
    ----------
    No class today. Instructor sick.

- Add and commit this change to the main branch

    % git add .
    % git commit -m "Updated notes for 2023-12-22"
    [main a24643e] Updated notes for 2023-12-22
     1 file changed, 5 insertions(+)

We now have two branches that represent our project. The larger a project is, the more branches it may have as different features are being implemented, but at some point, the features in those branches will typically be incorporated back into the main branch.
We'll see how to do that next.
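Before merging, it can help to see how far the two branches have drifted apart. The sketch below recreates the situation in a scratch repository (placeholder identity, shortened file contents) and uses two read-only commands: git log --graph --all to draw every branch, and git rev-list --left-right --count to count the commits that exist on only one side.

```shell
# Scratch repository with a placeholder identity.
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.name "Example User"
git config user.email "user@example.com"

printf 'Notes From Class\n' > NotesFromClass.txt
git add .
git commit -q -m "First commit"

# One commit that exists only on NotesByDay.
git switch -q -c NotesByDay
mkdir DailyNotes
echo "First day notes." > DailyNotes/2023-12-21.txt
git add .
git commit -q -m "Moved first day's notes to DailyNotes"

# One commit that exists only on main.
git switch -q main
echo "2023-12-22: No class today." >> NotesFromClass.txt
git add .
git commit -q -m "Updated notes for 2023-12-22"

# Draw all branches as a graph, one line per commit.
git log --oneline --graph --all

# First number: commits only on main. Second: commits only on NotesByDay.
git rev-list --left-right --count main...NotesByDay
```

Here each branch has one commit the other lacks, which is exactly the situation a merge has to reconcile.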
3.5. Managing a larger project: Merging
We have two branches in our project, the main branch and the development NotesByDay branch. Let's bring our work from the development branch into main, a process called merging.
Merging can be hard
The process of merging two different files can sometimes be a little messy. There is no easy way to reconcile two files that have differences in them except by going through and looking at the code. You can make the job a little easier for yourself by making more frequent commits, and integrating your work into the main branch as you go.
- Make sure both branches are "clean," with no changes to commit

    % git switch main
    % git status
    On branch main
    nothing to commit, working tree clean
    % git switch NotesByDay
    Switched to branch 'NotesByDay'
    % git status
    On branch NotesByDay
    nothing to commit, working tree clean

- Switch to the branch you want to merge into
In our case, we're going to merge the experimental, development branch NotesByDay into main, so switch to main.

    % git switch main

- Use the git merge <branchname> command

    % git merge NotesByDay
    Auto-merging NotesFromClass.txt
    CONFLICT (content): Merge conflict in NotesFromClass.txt
    Automatic merge failed; fix conflicts and then commit the result.

In some cases, if you've been careful about only working on a single branch at a time, the auto-merge will work just fine.
In this case, because we had two different branches each with their own commit changing a file, git needs to know how to proceed.
We have clear instructions from the conflict message: Merge conflict in NotesFromClass.txt... fix conflicts and then commit the result. Let's use a text editor to open up NotesFromClass.txt and see what the issue is.
- Use a text editor to examine and fix the file with the conflict
Opening up the file, we see this:

    Notes From Class
    ----------------
    These are the notes from our class. Study them carefully!

    <<<<<<< HEAD
    2023-12-21
    ----------
    All I can is that this first day of the class had a *lot* of instructions, and nothing really to do afterwards except go home and read the syllabus. "Make sure to pay attention to the due dates," the instructor said. Got it. Adding a separate file with Due Dates.

    2023-12-22
    ----------
    No class today. Instructor sick.
    =======
    See files in the DailyNotes directory for notes.
    >>>>>>> NotesByDay

We know where to focus our efforts: git indicates where the issue is by adding a <<<<<<< HEAD at the start of the difficulties, and a >>>>>>> NotesByDay at the end of the difficulties. Everything above the ======= in that section is in the working branch (that we're trying to merge into), and everything below that is in the branch that we're trying to merge from.
Here, we know that we want to bring in the text below the ======= and get rid of the text above it, so we edit the text file to reflect that, and delete the three marker lines that git put in there. The file should look like this now:

    Notes From Class
    ----------------
    These are the notes from our class. Study them carefully!

    See files in the DailyNotes directory for notes.

- Then add and commit as instructed to complete the merge

    % git add .
    % git commit
    [main 5a6a50c] Merge branch 'NotesByDay'

- Delete the branch at some point?
If you've just got a few leftover branches in your project you don't need to worry about getting rid of them, especially not right away. You may find that you need to go back and perform some additional work or bug fixes on a branch, for example.
Once you know a branch is no longer needed, however, you can easily delete it from the project using the -d flag.

    % git branch -d NotesByDay
    Deleted branch NotesByDay (was 7bbdcab).

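The merge steps above can be rehearsed end to end in a scratch repository. This sketch makes several assumptions that are not in the tutorial: a temporary directory, a placeholder identity, and much shorter file contents chosen so that both branches change the same line and a conflict is guaranteed. Instead of editing the file by hand, it resolves the conflict with git checkout --theirs, which takes the whole file from the branch being merged in; that is a shortcut that only fits cases like this one where you want one side entirely.

```shell
# Scratch repository with a placeholder identity.
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.name "Example User"
git config user.email "user@example.com"

printf 'Notes From Class\nDay one notes.\n' > NotesFromClass.txt
git add .
git commit -q -m "First commit"

# The development branch rewrites the second line.
git switch -q -c NotesByDay
printf 'Notes From Class\nSee files in the DailyNotes directory for notes.\n' > NotesFromClass.txt
git commit -q -am "Point to DailyNotes"

# main rewrites the same line differently, so the merge cannot be automatic.
git switch -q main
printf 'Notes From Class\nDay one and day two notes.\n' > NotesFromClass.txt
git commit -q -am "Updated notes"

# git merge exits with a non-zero status when it stops on a conflict.
if ! git merge NotesByDay >/dev/null 2>&1; then
    echo "Conflicted files:"
    git diff --name-only --diff-filter=U   # U = unmerged
fi

# Keep the NotesByDay version of the whole file, then finish the merge.
git checkout --theirs -- NotesFromClass.txt
git add NotesFromClass.txt
git commit -q --no-edit                    # reuse the default merge message

git log --oneline --graph
```

For a conflict where you want pieces of both sides, editing the file between the markers as described above is still the way to go.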
3.6. Problems; Troubleshooting
Everybody has difficulties with git, and it's not uncommon to need to undo something. Here are some things you can consider if you run into trouble.
- Reverting a file
If you've made some minor changes to a file but haven't committed them yet, and just want to revert the file back to its state in the last commit, there's a quick way to do that.

    % git checkout -- <filename>

- Reverting a project
If you need to undo the changes made by an earlier commit, you can use the revert command, which makes a new commit that reverses the changes introduced by the commit you name.

    % git revert <commit hash>

Note that all records of the project are retained, and we can go back and re-examine that work as needed. This is the safest way to keep track of what you've done.
- Reset a project
The "nuclear option" is to just destroy all the work done in the recent commits to get back to an earlier version of the project. This is a destructive option: the recent commits are removed from the branch's history, and any uncommitted changes are lost.

    % git reset --hard <commit hash>

This is not recommended for shared projects (it can destroy work done by other people) but it can occasionally come in handy for personal projects.
- Search online for help
The git tool is powerful, and searching online for assistance is a common strategy for getting help.
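The difference between revert and reset is easiest to see by counting commits. The sketch below uses a scratch repository with a placeholder identity and a made-up file, notes.txt, that gains one line per commit.

```shell
# Scratch repository with a placeholder identity.
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.name "Example User"
git config user.email "user@example.com"

# Three commits, each adding one line to notes.txt.
for n in 1 2 3; do
    echo "line $n" >> notes.txt
    git add notes.txt
    git commit -q -m "Add line $n"
done

# revert: history grows by one commit that undoes "Add line 3".
git revert --no-edit HEAD >/dev/null
git rev-list --count HEAD     # prints 4
cat notes.txt                 # line 1 and line 2 only

# reset --hard: the branch moves back two commits and forgets them.
git reset -q --hard HEAD~2
git rev-list --count HEAD     # prints 2
cat notes.txt                 # still line 1 and line 2
```

Both approaches leave the file in the same state here, but only revert keeps a record that the third line ever existed.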