Computer Science

Introduction to git

git is an open source tool that allows anyone working with text documents on the computer to keep track of multiple versions of those documents.

If you're working by yourself and you're only writing 5-10 lines of code at a time, you don't need to keep track of multiple versions of your document.

But if you are working on a larger program over some period of time, and especially if you're collaborating with others (students, teachers, developers), you're going to need a system for managing your work. You need version control.

Table of Contents

  0. Version Control
  1. Ways to use git
  2. Installing git
  3. Using git for personal version control

0. Version Control

If you're writing serious programs, you're going to run into the problem of version control.

It's the same problem that everybody has whenever they're working on a larger body of work: you've created a rough "working version" of this thing (a novel, a painting, a physics solution) and you need to make some changes to improve it. You want to make the changes, but you don't want to lose the good parts of what you've created previously, so you cross out some of the work that you want to change, and add new writing/painting/solutions to make it better. But after making those changes, you re-evaluate and decide that in fact you want to keep just part of the new stuff while going back to what you'd done before...

It gets messy.

Writers have it easiest, perhaps, in that they can work on a word processor and save multiple versions of their document as they go... but even that can get messy. Take a look here at what the contents of the folder myNovel might look like.

Which one is the most recent one? Which one is the best one? How can I easily compare the writing in one of these documents with the writing in another?

Version control provides you with tools and strategies to manage these issues.

1. Ways to Use git

The local tool git and the website service GitHub both provide you with the ability to manage multiple versions of your work.

In this tutorial we're going to cover some common git workflows. As we develop each one, you'll get a chance to slowly add to your knowledge of the tool.

The different workflows we'll examine:

  1. Using git on your local machine to track ongoing changes in development of a program and multiple simultaneous versions (branches) of a program's development.
  2. Downloading from a public repository located at GitHub so that you have a copy of a project on your local machine.
  3. "Cloning" from a public repository that you are a contributor for, so that you can work on the project locally and then push your contributions back up to the project.
  4. Cloning from a public repository that you are not a contributor for. You'll edit your local copy of the project just the same, and then issue a pull request to the repository admin, requesting that your modifications be considered for inclusion in the main project.
  5. Creating a public repository of a project of your own that is published and maintained on GitHub. You'll be the administrator of this project, and you'll be responsible for maintaining it, receiving pull requests, etc.

We've got a lot to do! Let's get started...

2. Installing git

Before you can use git you need to make sure it's installed on your computer.

It may already be installed on your computer. Open up a terminal window and type:

$ git --version
git version 2.42.0
$

If git is not installed, you'll need to follow the instructions here.
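The same check can be written as a small script. This is only a sketch: the variable name git_status and the messages are illustrative choices, not part of git.

```shell
# Report whether git is available on the PATH, and which version if so.
if command -v git >/dev/null 2>&1; then
    git_status="installed"
    git --version
else
    git_status="missing"
    echo "git was not found on the PATH; it needs to be installed first."
fi
echo "git is $git_status"
```
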

2.1. Installing the software

2.1.1. Installing git on Windows

Download from https://git-for-windows.github.io/, install using the .exe file, and select the default values during the installation process. The installation will take a few minutes.

Once installed, you can launch git by running the Git Bash application.

2.1.2. Installing git on macOS

Download from https://git-scm.com/download/mac.

2.1.3. Installing git on Linux (Debian/Ubuntu)

$ sudo apt-get install git
$

2.2. Customizing your installation

From your home directory, you'll want to set some global configurations that git can use to manage your work.

In the Terminal:

$ git config --global user.name "Richard White"
$ git config --global user.email "rwhite@crashwhite.com"
$ git config --global core.editor nano
$ git config --global color.ui true
$ git config --global init.defaultBranch main
$ git config --global pull.rebase true

(You may want to choose a different editor, like emacs or vi.)

Now, every time we start tracking a project with git, or updating a project, or contributing to someone else's project, git will know who we are.
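To confirm what git has stored, the settings can be read back. The sketch below is written so that it can be run safely as an experiment: it points HOME at a throwaway directory and uses a placeholder name and email (both assumptions for illustration). To inspect your real settings, skip the first two commands and the two lines that set values.

```shell
# Sketch only: use a throwaway HOME so the real ~/.gitconfig is left untouched.
export HOME="$(mktemp -d)"
unset XDG_CONFIG_HOME

git config --global user.name "Example Student"        # placeholder identity
git config --global user.email "student@example.com"   # placeholder address

# Read individual values back...
configured_name="$(git config --global --get user.name)"
configured_email="$(git config --global --get user.email)"
echo "$configured_name <$configured_email>"

# ...or list everything stored in the global configuration file.
git config --global --list
```
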

3. Using git for personal version control

Once you've got git installed on your local machine, it's time to learn a little bit about how version tracking works.

What does git do?

git is designed to track changes in text files (source files), the files we use to write programs.

Software projects typically have multiple files organized in folders, or directories. Rather than using a software project to learn the basics of git, we're going to suppose the following.

Project - StuffForClass

You and a friend have decided to combine forces in taking notes for a class. You'll have a folder called StuffForClass that contains various materials used for the class, including a text file called NotesFromClass.txt which has a copy of each day's notes.

We'll use git to manage these notes as they develop over time.

An overview of the development process

3.1. Creating a project and a first commit

  1. Create a project folder
    In whatever location on your computer you want to store this project, create a folder for your project (if it hasn't already been created).
    % cd ~/Desktop
    % mkdir StuffForClass
  2. Move into the project folder.
    % cd StuffForClass
  3. Initialize the project in git
    Use the git init command to initialize this project as a git project.
    % git init
    Initialized empty Git repository in /Users/rwhite/Desktop/StuffForClass/.git/
    At this point git has created a hidden folder in your project directory that it will use in managing your project. You can see this folder...
    % ls -a
    ... but don't mess around with any of the files in there. Those files will be managed by git itself.
  4. Create a file in the project
    Using a text editor (VSCode, nano, etc.) create a preliminary version of the NotesFromClass.txt file. For our example, do it like this:
    Notes From Class
    ----------------
    
    These are the notes from our class. Study them carefully!
    Save this file in the project directory.
  5. Create a .gitignore file
    While we're interested in tracking most of the files in our project folder, not all of them are important for our project.

    If you're on a Mac, you'll have hidden .DS_Store files that are part of the filesystem, not our project. In our StuffForClass project let's say we don't want to track Microsoft .docx files, or PDFs that were added from class. If you use BlueJ for a Java project, there's a package.bluej file in there. There may also be .class files (which contain bytecode), and .ctxt files (used by BlueJ)... These files all have their purposes, but they're not part of the source code that makes up our project, and we don't want git to be tracking them.

    Using a text editor, create in the project folder a file called .gitignore, and list all the files that git should ignore:
    .DS_Store
    *.pdf
    *.docx
    package.bluej
    *.class
    *.ctxt
    Save the file in your project.
  6. Add our project files to the "staging area"
    Although we've "initialized" our project in git, and have a couple of files saved in the folder, we haven't yet instructed git to save this version of the project. Check the status of the project:
    % git status
    On branch main
    
    No commits yet
    
    Untracked files:
      (use "git add <file>..." to include in what will be committed)
        .gitignore
        NotesFromClass.txt
    
    nothing added to commit but untracked files present (use "git add" to track)
    The "untracked files" heading and the red text indicate that we have some files that we need to add and then commit to our project.
    First let's add them. The dot "." in this command indicates to add the files in the current directory.
    % git add .
    Now check the status again.
    % git status
    On branch main
                        
    No commits yet
    
    Changes to be committed:
      (use "git rm --cached <file>..." to unstage)
        new file:   .gitignore
        new file:   NotesFromClass.txt
    These files, listed in green, are now in the "staging area" where they will be part of the "commit".
  7. Make a commit to the project
    When we issue the commit command, we need to include a message in double-quotes that will accompany that commit, and reveal what was significant about this version of the project. It's customary to simply label this first version with "First commit".
    % git commit -m "First commit"
    [main (root-commit) 71270f1] First commit
     2 files changed, 11 insertions(+)
     create mode 100644 .gitignore
     create mode 100644 NotesFromClass.txt
  8. Check the status of the project again
    % git status
    On branch main
    nothing to commit, working tree clean
    This indicates that we haven't made any changes to our project since the last commit. Git knows everything that's happened up to this point!
    [Figure: Commands (in red) to produce our first commit on the main branch]
  9. View the log of the project
    We can examine the development of the project over time using the log command. This will be an important tool for us as we continue to work on the project.
    % git log
    commit 71270f1ec67f2a84e42350c5e76bce131b633a60 (HEAD -> main)
    Author: Richard White <rwhite@crashwhite.com>
    Date:   Tue Jan 2 10:59:46 2024 -0800
    
        First commit
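The steps above can be condensed into one script for experimenting. This is a sketch under a few assumptions: it works in a temporary directory instead of ~/Desktop, it stores a placeholder identity in the repository itself, it uses a shortened .gitignore, and syllabus.pdf is a hypothetical file added only to show the ignore rules at work. The -b option to git init assumes git 2.28 or newer.

```shell
# Work in a throwaway directory (hypothetical location) rather than ~/Desktop.
repo="$(mktemp -d)/StuffForClass"
mkdir -p "$repo" && cd "$repo"
git init -q -b main                          # -b needs git 2.28 or newer
git config user.name "Example Student"       # placeholder identity, this repo only
git config user.email "student@example.com"

printf 'Notes From Class\n----------------\n\nThese are the notes from our class. Study them carefully!\n' > NotesFromClass.txt
printf '.DS_Store\n*.pdf\n*.docx\n' > .gitignore
touch syllabus.pdf                           # hypothetical file that should be ignored

git check-ignore -v syllabus.pdf             # shows which .gitignore rule matches
git add .
git commit -q -m "First commit"

tracked_files="$(git ls-files)"              # files recorded in the commit
commit_count="$(git rev-list --count HEAD)"  # number of commits on this branch
echo "$tracked_files"
echo "commits: $commit_count"
```

The git check-ignore command is a convenient way to debug a .gitignore file: it reports the rule that causes a given path to be ignored, and prints nothing if the path is not ignored.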

3.2. Working on the project, and a second commit

Continuing with our StuffForClass simulation, let's say that the first day of class we take a few notes.

  1. Update the NotesFromClass.txt file
    Use a text editor to modify your NotesFromClass.txt file so that it looks like this:
    Notes From Class
    ----------------
    
    These are the notes from our class. Study them carefully!
    
    2023-12-21
    ----------
    
    All I can say is that this first day of the class had a *lot* of instructions, and nothing really to do afterwards except go home and read the syllabus. "Make sure to pay attention to the due dates," the instructor said. Got it. Adding a separate file with Due Dates.
  2. Create a new DueDates.txt file
    Accordingly, use a text editor to create a file DueDates.txt that looks like this:
    Due Dates
    ---------
    
    * January 13 - Final exam
  3. Check the status of our project
    % git status
    On branch main
    Changes not staged for commit:
      (use "git add <file>..." to update what will be committed)
      (use "git restore <file>..." to discard changes in working directory)
        modified:   NotesFromClass.txt
    
    Untracked files:
      (use "git add <file>..." to include in what will be committed)
        DueDates.txt
    
    no changes added to commit (use "git add" and/or "git commit -a")
    We've got a modified file, an untracked file... git is letting us know that it hasn't yet made a record of these changes.
  4. Add and Commit the new version of our project
    % git add .
    % git commit -m "Updated notes, added DueDates file"
    [main 1d2c5a0] Updated notes, added DueDates file
     2 files changed, 9 insertions(+)
     create mode 100644 DueDates.txt
  5. Check the status, check the log
    % git status
    On branch main
    nothing to commit, working tree clean
    % git log
    commit 1d2c5a05a013bfd34c4381699b24fc007e4eeba2 (HEAD -> main)
    Author: Richard White <rwhite@crashwhite.com>
    Date:   Tue Jan 2 11:28:49 2024 -0800
    
        Updated notes, added DueDates file
    
    commit 71270f1ec67f2a84e42350c5e76bce131b633a60
    Author: Richard White <rwhite@crashwhite.com>
    Date:   Tue Jan 2 10:59:46 2024 -0800
    
        First commit
    The status is clean—good. More importantly, look at the log. We now have two "commits" of our project, two versions, two different "snapshots" of the state of the project. The HEAD pointer indicates that we're looking at commit 1d2c5a..., which is on the main branch of our project.

    More on that soon.
[Figure: Two commits on our main branch]
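Once a project has more than one commit, a compact log and a diff between commits are often easier to read than the full log. The sketch below rebuilds a two-commit history in a temporary directory with shortened file contents and a placeholder identity (all assumptions for illustration), then inspects it.

```shell
# Build a throwaway repository with two commits.
repo="$(mktemp -d)" && cd "$repo"
git init -q -b main                          # -b needs git 2.28 or newer
git config user.name "Example Student"       # placeholder identity
git config user.email "student@example.com"

echo "Notes From Class" > NotesFromClass.txt
git add . && git commit -q -m "First commit"

echo "2023-12-21: read the syllabus" >> NotesFromClass.txt
echo "* January 13 - Final exam" > DueDates.txt
git add . && git commit -q -m "Updated notes, added DueDates file"

git log --oneline                  # one line per commit, newest first
git diff --stat HEAD~1 HEAD        # summary of what changed between the two commits

changed_files="$(git diff --name-only HEAD~1 HEAD)"
echo "$changed_files"
```

Here HEAD~1 means "the commit just before HEAD," which saves copying a commit hash by hand.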

3.3. Examine a previous version with checkout

One of the many powers of version control is the ability to go back in time and look at a previous version of your project. You might want to recover some code that was removed in an update, or maybe you want to go back and restart things from a previous version.

Let's see how you can do that.

  1. In the Terminal, look at the project's log
    % git log
    commit 1d2c5a05a013bfd34c4381699b24fc007e4eeba2 (HEAD -> main)
    Author: Richard White <rwhite@crashwhite.com>
    Date:   Tue Jan 2 11:28:49 2024 -0800
    
        Updated notes, added DueDates file
    
    commit 71270f1ec67f2a84e42350c5e76bce131b633a60
    Author: Richard White <rwhite@crashwhite.com>
    Date:   Tue Jan 2 10:59:46 2024 -0800
    
        First commit
    Note the two commit hashes identifying the different versions of the project. The HEAD pointer indicates that we're looking at commit 1d2c5a..., which is on the main branch of our project.
    For reference, do a quick ls on our project directory to identify the files in there.
    % ls
    DueDates.txt		NotesFromClass.txt
  2. Use the checkout command to switch to a different commit
    % git checkout 71270f1
    Note: switching to '71270f1'.
    
    You are in 'detached HEAD' state. You can look around, make experimental
    changes and commit them, and you can discard any commits you make in this
    state without impacting any branches by switching back to a branch.
    
    If you want to create a new branch to retain commits you create, you may
    do so (now or later) by using -c with the switch command. Example:
    
      git switch -c <new-branch-name>
    
    Or undo this operation with:
    
      git switch -
    
    Turn off this advice by setting config variable advice.detachedHead to false
    
    HEAD is now at 71270f1 First commit
    This command "checks out" a different version of the project at some other point in time. Do an ls command and you'll see that, in this previous version, there was no DueDates.txt file, and sure enough, that file is no longer in our project!
    Use a text editor to examine the contents of the NotesFromClass.txt file. It no longer has the notes from the first day of class!
    As the notes from git above indicate, you can play around with this previous version of the project as needed. We haven't lost our work. It still exists in the "future version" of our project. Use the switch command to return to the most up-to-date version of our project.
    % git switch -
    Then look at git status or git log to confirm that you're back to the current state of the project.
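If all you want is to read an old version of a file, there is also a way to do it without leaving the current branch: git show accepts a commit followed by a colon and a path. The sketch below demonstrates this in a temporary two-commit repository; the shortened file contents and the placeholder identity are assumptions for illustration.

```shell
# Build a throwaway repository with two commits.
repo="$(mktemp -d)" && cd "$repo"
git init -q -b main                          # -b needs git 2.28 or newer
git config user.name "Example Student"       # placeholder identity
git config user.email "student@example.com"

echo "Notes From Class" > NotesFromClass.txt
git add . && git commit -q -m "First commit"
echo "2023-12-21: read the syllabus" >> NotesFromClass.txt
echo "* January 13 - Final exam" > DueDates.txt
git add . && git commit -q -m "Updated notes, added DueDates file"

# Print the notes file as it was one commit ago, without switching commits.
old_notes="$(git show HEAD~1:NotesFromClass.txt)"
echo "$old_notes"

# List the files that existed in that older commit.
old_files="$(git ls-tree --name-only HEAD~1)"
echo "$old_files"

# We never left the main branch.
current_branch="$(git branch --show-current)"
echo "still on: $current_branch"
```
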

3.4. Managing a larger project: Branches

When developing a larger software project, it's often the case that there will be at least two different versions of your code:

  1. The production version, a possibly public version of the project that works, more or less.
  2. The development version, typically a private version of the project that is actively being worked on.

A typical development cycle involves working on the development version to fix bugs or add upgrades, and then releasing them as a new production version when everything has been satisfactorily tested.

There may be other versions of a project as well, but we'll stick with these two for now.

Our current StuffForClass project only has a single version, which is what the main label designates. This main branch is the production version of our project. It works, more or less!

In our simulated project, let's assume that we want to consider keeping track of the class notes in a different way: rather than a single file with all of the notes for the course in it, we're going to have a file for each day with just the notes for that day included. We want to try out this new strategy in our project without messing up the current "production" version.

This is what a branch is for.

Once you know you're going to be trying something out and you want to make sure you don't mess up the good work you've done, create a new branch with an appropriate name (NotesByDay, maybe?), and then switch over to that branch.

  1. Make a new branch for the "development" version of the project
    % git branch NotesByDay
  2. List the different branches in this project
    % git branch
      NotesByDay
    * main
    The asterisk * indicates that we are currently on the main branch.
    You can also run git log to see which branch the HEAD is pointing to. Wherever HEAD is, that's where our work is being done.
    % git log
    commit 1d2c5a05a013bfd34c4381699b24fc007e4eeba2 (HEAD -> main, NotesByDay)
     .
     .
     .
  3. Switch to the new development branch and confirm that you're there
    % git switch NotesByDay
    Switched to branch 'NotesByDay'
    % git branch
    * NotesByDay
      main
    The asterisk by NotesByDay indicates that this is our working branch—any work we do will be applied to this branch only.
[Figure: A separate branch allows for development without disturbing the main (production) branch/code]

This new branch of our project is a complete working copy of the main branch we were working on before—somebody could delete the entire main branch and we'd still have access to everything in the project, all the way back to the very first commit. So all the things that we could do with the main branch, we can do with this one, too.
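Creating a branch and switching to it can also be done in a single step with git switch -c. The sketch below tries this in a temporary repository (placeholder identity and shortened file contents are assumptions) and confirms that a freshly created branch points at exactly the same commit as main.

```shell
# Build a throwaway repository with one commit.
repo="$(mktemp -d)" && cd "$repo"
git init -q -b main                          # -b needs git 2.28 or newer
git config user.name "Example Student"       # placeholder identity
git config user.email "student@example.com"
echo "Notes From Class" > NotesFromClass.txt
git add . && git commit -q -m "First commit"

git switch -q -c NotesByDay        # create the branch and move to it in one step
git branch                         # the asterisk marks the current branch

current_branch="$(git branch --show-current)"
main_tip="$(git rev-parse main)"
branch_tip="$(git rev-parse NotesByDay)"
echo "on $current_branch; main=$main_tip NotesByDay=$branch_tip"
```

Until a new commit is made on one of them, the two branch names are simply two labels for the same commit.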

Let's consider the following scenario. We're going to:

  1. Make some modifications to the project in our NotesByDay development branch, and
  2. Make some modifications to the project in our main branch, entering some more notes for a new day of class.

3.4.1. Modifying the development branch

  1. Make sure you are on the NotesByDay branch
    % git switch NotesByDay
    Already on 'NotesByDay'
  2. Make a new directory that will hold each day's notes, and create a file for the one day of notes that we've had so far.
    % mkdir DailyNotes
  3. Use text editor to create a file for the first note in that folder
    Call the file 2023-12-21.txt and enter the following information (from the original notes file):
    All I can say is that this first day of the class had a *lot* of instructions, and nothing really to do afterwards except go home and read the syllabus. "Make sure to pay attention to the due dates," the instructor said. Got it. Adding a separate file with Due Dates.
  4. Modify the original NotesFromClass.txt file
    Modify to remove the note for that day so that the file now says:
    Notes From Class
    ----------------
    
    These are the notes from our class. Study them carefully!
    
    See files in the DailyNotes directory for notes.
  5. Check status, then add and commit these changes
    We've made some major changes to the structure of our project (but only in this branch)! You can check the status of the project to see that we have modified and untracked files. Or, from the project's main directory, add and commit the changes we've made.
    % git add .
    % git commit -m "Created new directory for daily notes and moved first day's notes to that directory."
    [NotesByDay 7bbdcab] Created new directory for daily notes and moved first day's notes to that directory.
     2 files changed, 5 insertions(+), 4 deletions(-)
     create mode 100644 DailyNotes/2023-12-21.txt
    Check the log and you'll see some interesting things. Our HEAD is still pointing to our current branch, NotesByDay, and we're "ahead" of the work that's been done in the main branch.
    % git log
    commit 7bbdcab33d8da7fed065016047d20a800e02709c (HEAD -> NotesByDay)
    Author: Richard White <rwhite@crashwhite.com>
    Date:   Tue Jan 2 14:43:26 2024 -0800
    
        Created new directory for daily notes and moved first day's notes to that directory.
    
    commit 1d2c5a05a013bfd34c4381699b24fc007e4eeba2 (main)
    Author: Richard White <rwhite@crashwhite.com>
    Date:   Tue Jan 2 11:28:49 2024 -0800
    
        Updated notes, added DueDates file
    
    commit 71270f1ec67f2a84e42350c5e76bce131b633a60
    Author: Richard White <rwhite@crashwhite.com>
    Date:   Tue Jan 2 10:59:46 2024 -0800
    
        First commit
[Figure: A new commit on the development branch]
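To see exactly how far one branch is "ahead" of another, git accepts a range written as main..NotesByDay, meaning "commits reachable from NotesByDay but not from main." The sketch below shows this in a temporary repository; the file contents and the placeholder identity are assumptions for illustration.

```shell
# Build a throwaway repository: one commit on main, one more on NotesByDay.
repo="$(mktemp -d)" && cd "$repo"
git init -q -b main                          # -b needs git 2.28 or newer
git config user.name "Example Student"       # placeholder identity
git config user.email "student@example.com"
echo "Notes From Class" > NotesFromClass.txt
git add . && git commit -q -m "First commit"

git switch -q -c NotesByDay
mkdir DailyNotes
echo "First day: read the syllabus" > DailyNotes/2023-12-21.txt
git add . && git commit -q -m "Created new directory for daily notes"

git log --oneline main..NotesByDay           # commits on NotesByDay that main lacks
git diff --stat main NotesByDay              # file-level summary of the differences

ahead="$(git rev-list --count main..NotesByDay)"
behind="$(git rev-list --count NotesByDay..main)"
echo "NotesByDay is $ahead ahead of and $behind behind main"
```
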

3.4.2. Modifying the main branch

Ordinarily the main "production" branch would not be edited while it was "live." Code should be developed and tested in a non-production branch and then merged.

For this exercise, however, let's depart from best practices and modify the NotesFromClass.txt file in the main branch.

  1. Switch to the main branch
    % git switch main
    Switched to branch 'main'
  2. Modify the NotesFromClass.txt file
    Use a text editor to modify the NotesFromClass.txt file so that it reads as follows:
    Notes From Class
    ----------------
    
    These are the notes from our class. Study them carefully!
    
    2023-12-21
    ----------
    
    All I can say is that this first day of the class had a *lot* of instructions, and nothing really to do afterwards except go home and read the syllabus. "Make sure to pay attention to the due dates," the instructor said. Got it. Adding a separate file with Due Dates.
    
    2023-12-22
    ----------
    
    No class today. Instructor sick.
  3. Add and commit this change to the main branch
    % git add .
    % git commit -m "Updated notes for 2023-12-22"
    [main a24643e] Updated notes for 2023-12-22
     1 file changed, 5 insertions(+)
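With a commit on each branch, the history is no longer a straight line. A graph view of the log makes the split visible, and git merge-base reports the commit where the branches diverged. The sketch below recreates a similar situation in a temporary repository; the shortened file contents and the placeholder identity are assumptions for illustration.

```shell
# Build a throwaway repository whose two branches have diverged.
repo="$(mktemp -d)" && cd "$repo"
git init -q -b main                          # -b needs git 2.28 or newer
git config user.name "Example Student"       # placeholder identity
git config user.email "student@example.com"
echo "Notes From Class" > NotesFromClass.txt
git add . && git commit -q -m "First commit"
base_commit="$(git rev-parse HEAD)"          # remember where the branches split

git switch -q -c NotesByDay
mkdir DailyNotes
echo "First day: read the syllabus" > DailyNotes/2023-12-21.txt
git add . && git commit -q -m "Created new directory for daily notes"

git switch -q main
echo "2023-12-22: No class today." >> NotesFromClass.txt
git add . && git commit -q -m "Updated notes for 2023-12-22"

git log --oneline --graph --all              # draws both branches and their common ancestor

merge_base="$(git merge-base main NotesByDay)"
echo "branches diverged at: $merge_base"
```
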

We now have two branches that represent our project. The larger a project is, the more branches it may have as different features are being implemented, but at some point, the features in those branches will typically be incorporated back into the main branch.

We'll see how to do that next.

[Figure: Merging the NotesByDay branch into the main branch]

3.5. Managing a larger project: Merging

We have two branches in our project, the main branch and the development NotesByDay branch. Let's bring our work from the development branch into the main, a process called merging.

Merging can be hard

The process of merging two different files can sometimes be a little messy. There is no easy way to reconcile two files that have differences in them except by going through and looking at the code. You can make the job a little easier for yourself by making more frequent commits, and integrating your work into the main branch as you go.

  1. Make sure both branches are "clean," with no changes to commit
    % git switch main
    % git status
    On branch main
    nothing to commit, working tree clean
    % git switch NotesByDay
    Switched to branch 'NotesByDay'
    % git status
    On branch NotesByDay
    nothing to commit, working tree clean
  2. Switch to the branch you want to merge into
    In our case, we're going to merge the experimental, development branch NotesByDay into main, so switch to main.
    % git switch main
    
  3. Use the git merge <branchname> command
    % git merge NotesByDay
    Auto-merging NotesFromClass.txt
    CONFLICT (content): Merge conflict in NotesFromClass.txt
    Automatic merge failed; fix conflicts and then commit the result.
    When only one of the two branches has changed since they diverged, or when the branches changed different parts of the project, the auto-merge typically works just fine.
    In this case, both branches have a commit that changed the same part of NotesFromClass.txt, so git can't decide which version to keep and needs us to tell it how to proceed.
    We have clear instructions from the conflict message: Merge conflict in NotesFromClass.txt... fix conflicts and then commit the result. Let's use a text editor to open up NotesFromClass.txt and see what the issue is.
  4. Use a text editor to examine and fix the file with the conflict
    Opening up the file, we see this:
    Notes From Class
    ----------------
    
    These are the notes from our class. Study them carefully!
    
    <<<<<<< HEAD
    2023-12-21
    ----------
    
    All I can say is that this first day of the class had a *lot* of instructions, and nothing really to do afterwards except go home and read the syllabus. "Make sure to pay attention to the due dates," the instructor said. Got it. Adding a separate file with Due Dates.
    
    2023-12-22
    ----------
    
    No class today. Instructor sick.
    =======
    See files in the DailyNotes directory for notes.
    >>>>>>> NotesByDay
    We know where to focus our efforts: git indicates where the issue is by adding a
    <<<<<<< HEAD
    at the start of the difficulties, and a
    >>>>>>> NotesByDay
    at the end of the difficulties. Everything above the
    =======
    in that section is in the working branch (that we're trying to merge into), and everything below that is in the branch that we're trying to merge from.

    Here, we know that we want to bring in the text below the ======= and get rid of the text above it, so we edit the text file to reflect that, and delete the three lines that git put in there. The file should look like this now:
    Notes From Class
    ----------------
    
    These are the notes from our class. Study them carefully!
    
    See files in the DailyNotes directory for notes.
  5. Then add, and commit as instructed to complete the merge
    % git add .
    % git commit
    [main 5a6a50c] Merge branch 'NotesByDay'
  6. Delete the branch at some point?
    If you've just got a few leftover branches in your project you don't need to worry about getting rid of them, especially not right away. You may find that you need to go back and perform some additional work or bug fixes on a branch, for example.
    Once you know a branch is no longer needed however, you can easily delete it from the project using the -d flag.
    % git branch -d NotesByDay
    Deleted branch NotesByDay (was 7bbdcab).
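The whole conflict-and-resolve cycle can be rehearsed safely in a throwaway repository. The sketch below is one possible way to do it, not the only one: it deliberately creates a conflict, lists the conflicted files, and then resolves the conflict by keeping the incoming branch's version of the file in full with git checkout --theirs, rather than editing the markers by hand. The file contents and the placeholder identity are assumptions for illustration.

```shell
# Build a throwaway repository in which both branches edit the same line.
repo="$(mktemp -d)" && cd "$repo"
git init -q -b main                          # -b needs git 2.28 or newer
git config user.name "Example Student"       # placeholder identity
git config user.email "student@example.com"
printf 'Notes From Class\n\nDay one notes.\n' > NotesFromClass.txt
git add . && git commit -q -m "First commit"

git switch -q -c NotesByDay
printf 'Notes From Class\n\nSee files in the DailyNotes directory for notes.\n' > NotesFromClass.txt
git commit -q -am "Point to DailyNotes"

git switch -q main
printf 'Notes From Class\n\nDay one notes, plus day two notes.\n' > NotesFromClass.txt
git commit -q -am "Add day two notes"

# The merge stops because both branches changed the same line.
git merge NotesByDay || echo "merge stopped with conflicts"

# List the files that still contain unresolved conflicts.
conflicted="$(git diff --name-only --diff-filter=U)"
echo "conflicted: $conflicted"

# Resolve by keeping the NotesByDay ("theirs") version of the whole file.
git checkout --theirs -- NotesFromClass.txt
git add NotesFromClass.txt
git commit -q --no-edit                      # reuse the prepared merge message
git log --oneline --graph
```

If a merge goes wrong and you'd rather start over, git merge --abort (run before committing) returns the project to its state before the merge began.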

3.6. Problems; Troubleshooting

Everybody has difficulties with git, and it's not uncommon to need to undo something. Here are some things you can consider if you run into trouble.

  1. Reverting a file
    If you've made some minor changes to a file but haven't committed them yet, and just want to revert the file back to its state in the last commit, there's a quick way to do that.
    % git checkout -- <filename>
    Recent versions of git also offer git restore <filename> for the same job; that's the command suggested in the git status output shown earlier.
  2. Reverting a commit
    If you need to undo the changes introduced by an entire commit, you can use the revert command, which makes a new commit that reverses the changes made in the commit you name.
    % git revert <commit hash>
    Note that all records of the project are retained, and we can go back and re-examine that work as needed. This is the safest way to undo committed work.
  3. Reset a project
    The "nuclear option" is to just throw away all the work done in the recent commits to get back to an earlier version of the project. This is a destructive option: the discarded commits are removed from the branch's history, and any uncommitted changes are lost.
    % git reset --hard <commit hash>
    This is not recommended for shared projects, because it rewrites history that other people may be relying on, but it can occasionally come in handy for personal projects.
  4. Search online for help
    The git tool is powerful, and searching online for assistance is a common strategy for getting help.
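The two non-destructive options above can be tried out in a throwaway repository before you need them for real. In this sketch the file contents, commit messages, and placeholder identity are assumptions for illustration; git restore assumes git 2.23 or newer.

```shell
# Build a throwaway repository with one good commit.
repo="$(mktemp -d)" && cd "$repo"
git init -q -b main                          # -b needs git 2.28 or newer
git config user.name "Example Student"       # placeholder identity
git config user.email "student@example.com"
echo "Notes From Class" > NotesFromClass.txt
git add . && git commit -q -m "First commit"

# 1. Discard an uncommitted edit to a file.
echo "scratch edit" >> NotesFromClass.txt
git restore NotesFromClass.txt               # same effect as: git checkout -- NotesFromClass.txt

# 2. Undo a committed change by adding a new commit that reverses it.
echo "a mistake" >> NotesFromClass.txt
git commit -q -am "Add a mistake"
git revert --no-edit HEAD                    # history keeps both the mistake and its reversal

git log --oneline
final_content="$(cat NotesFromClass.txt)"
commit_count="$(git rev-list --count HEAD)"
echo "file now contains: $final_content ($commit_count commits)"
```

After the revert, the file is back to its original contents, but the log still shows every step, which is what makes revert the safer choice on shared projects.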