How to Git

Authors
  • Linell Bonnette

If you've got any experience with Git, you know that it can really live up to its name. Sometimes things just don't seem to work like you think they should. Sometimes stuff just plain doesn't work. Sometimes you don't even know what words to Google to figure out what the problem is.

This is my attempt at a brief explanation of how Git really works and how you can use it.

How Does Git Work?

You open up your terminal. You head over to your brand spanking new directory and fire off a git init command. Boom. Done.

Wait. What just happened?

Everything Git does happens inside of a repository. A repository, often just called a repo, is a directory that contains a set of commit objects and references to those objects. This directory, .git, is stored in the same directory as the project that you're tracking. The git init command creates this .git directory and initializes it based on your system defaults. All of Git's tracking for the project takes place right there in that .git directory.

There is no central server repository. It's all stored locally on your machine. "Wait. What the heck is GitHub, then?" you're probably asking yourself. I'll get to that soon. Promise.

So everything is stored locally and we're only storing commit objects and references to those objects. What are those objects, though? A commit object is made up of three things:

  1. The files that you're tracking. These reflect the state of the project at a given point in time.
  2. References to parent commit objects.
  3. A SHA-1 name that identifies the object.

Each file is stored as a blob. The blob is just what it sounds like -- a blob of data. Pretty much just a file. Just like in a UNIX file system, you must have a way to organize your files. In Git, this is accomplished through the use of a tree, which acts a lot like a normal directory does. The tree object contains one or more tree entries, each of which contains a reference (via the SHA-1 name) to either a blob or a subtree. If you're having trouble understanding this, just think of a series of folders that go down to a file at the end. Boom. That's it.
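If you'd like to poke at these objects yourself, here's a minimal sketch. It assumes a throwaway repository in a temp directory, and the file name, user name, and email are made up for the example. git cat-file -p prints the contents of any object.

```shell
# Throwaway repository with one commit holding one file in a subdirectory.
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"        # hypothetical identity for the demo
git config user.email "user@example.com"
mkdir src
echo 'hello' > src/greeting.txt
git add src/greeting.txt
git commit -q -m "Add greeting"

# The commit object: names a top-level tree, plus author info and the message.
# (A first commit has no "parent" line; later commits do.)
git cat-file -p HEAD

# The top-level tree: one entry, the subtree "src".
git cat-file -p 'HEAD^{tree}'

# The subtree: one entry, the blob for greeting.txt.
git cat-file -p HEAD:src

# The blob: just the file contents.
git cat-file -p HEAD:src/greeting.txt
```

Each line of tree output lists a type (tree or blob), the SHA-1 name, and the file or folder name, which is the "series of folders that go down to a file" idea in action.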

Now that we know what the commit objects are, let's look at the references to those objects. We call these 'heads.' Each head has a name. The default head that comes with every repo is called master (newer versions of Git let you configure a different default name, such as main). git init creates this head for you. Whenever you're using Git, there is always a head that is currently being used as the 'main head.' It's aliased to HEAD, which is always written in all caps. The all caps difference is important, because lowercase 'head' can refer to any of the named heads in the repo while 'HEAD' refers only to the currently active head. Whenever you're looking over Git documentation, they'll use that notation and it can be confusing if you don't note the difference. Each head simply refers to a specific commit object through that object's SHA-1 name. This seems overly simple, but in fact Git is, at its core, a simple key-value data store. You can insert any data into it, and it'll give you a key that you can use to retrieve the content again at any time. This happens for every file you're tracking.
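Here's a rough sketch of the key-value idea and of what a head looks like on disk. It uses a throwaway repository, and the content, file name, and identity are all invented for the example. git hash-object and git cat-file are two of the low-level commands that do the storing and retrieving.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"        # hypothetical identity for the demo
git config user.email "user@example.com"

# Store arbitrary content; Git prints the key it was filed under.
key=$(echo 'any data at all' | git hash-object -w --stdin)
echo "$key"

# Retrieve the content again using that key.
git cat-file -p "$key"

# Make a commit so that a head has something to point to.
echo 'hello' > file.txt
git add file.txt
git commit -q -m "First commit"

cat .git/HEAD        # HEAD names the active head, e.g. "ref: refs/heads/master"
git rev-parse HEAD   # the name of the commit object that head refers to
```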

All of that is cool. Really cool. Buuuuuut how is it useful?

There are commands that you can use to manually create commit objects and heads. All sorts of nifty little commands to do various things. But you don't really use those. You're going to use the commands built on top of those commands. The commands you'll be using are a lot more friendly, now that you know what's really happening.

Friendly Commands?

We'll go over a few of the more basic commands here, explaining what they do in relation to the information above.

Hopefully you remember that git init creates the .git directory and initializes the repository. It's still empty at this point, though. There is one head, named master, and HEAD is set to it. It doesn't currently point to anything, though. It's just empty.
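You can see that emptiness for yourself. This is a small sketch in a temp directory. The branch name printed depends on your Git version and configuration, so treat "master" in the comments as an assumption.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q

ls -A                           # only .git exists so far
cat .git/HEAD                   # e.g. "ref: refs/heads/master"
git symbolic-ref --short HEAD   # just the head's name

# No commit exists yet, so HEAD can't be resolved to a commit object.
git rev-parse --verify -q HEAD || echo "no commits yet"
```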

You need to create a commit object for master to point to. You're probably thinking, "Oh, I know! git commit!" Nope, sorry. You're close, though. git commit does create a commit object. Just throwing the command around willy-nilly isn't going to do any good, though. You see, git commit has absolutely no idea what you want it to do. You have to let it know what it's supposed to be adding to the commit object. So, to tell it what to add, you use git add <stuff_to_add>. If I want to just add everything in the directory, I can do git add *. Everything is now added to a sort of 'pending' queue to be committed (Git calls this the staging area, or the index). Now when you run git commit, everything that you've added will be put into your commit object, which will then be referenced by the head that you're currently on.

Note: I usually use git add ., because it has the benefit of adding everything except what is in my .gitignore file, which I'll touch on later.
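Put together, the add-then-commit dance looks something like this. It's only a sketch: the file names and the ignore pattern are made up, and it runs in a throwaway repository so nothing real gets touched.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"        # hypothetical identity for the demo
git config user.email "user@example.com"

echo 'some notes' > notes.txt
echo 'noisy output' > debug.log
echo '*.log' > .gitignore          # tell Git to ignore log files

git add .                          # stage everything that isn't ignored
git status --short                 # staged files show an "A" in the first column
git commit -q -m "Initial commit"  # -m supplies the commit message inline

git ls-files                       # tracked files: .gitignore and notes.txt
```

Notice that debug.log never makes it into the commit, because it matches a pattern in .gitignore.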

You're usually not going to want to add everything at once, though. You'll usually want to just add the one or two files that you've modified. To do this, name them explicitly with git add <file>. If you'd rather commit every modified file in one go, you can use git commit -a, which will add everything that you've modified, but not files that you've created (anything that isn't already being tracked by Git). I'd like to point out that you can create a commit without adding all of your modified files. The files as they sit in your current directory won't be changed, but any changes you didn't add will not be present in the commit object.
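Here's a small sketch of the difference between adding one file by name and letting git commit -a grab every modified file. The file names are invented and it all happens in a throwaway repository.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"        # hypothetical identity for the demo
git config user.email "user@example.com"

echo 'one' > a.txt
echo 'one' > b.txt
git add a.txt b.txt
git commit -q -m "Add a and b"

# Modify both tracked files and create a brand new one.
echo 'two' >> a.txt
echo 'two' >> b.txt
echo 'new' > c.txt

# Add only a.txt. The change to b.txt stays in the working directory.
git add a.txt
git commit -q -m "Update a"
git status --short      # b.txt is still modified, c.txt is untracked

# commit -a picks up every modified tracked file (b.txt), but not c.txt.
git commit -q -a -m "Update b"
git status --short      # only "?? c.txt" is left
```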

Whenever you create a new commit, you'll be asked to provide a message to go along with the commit. It's best practice to put in any relevant information. Stuff like why you changed what you changed. Maybe reference any bugs it fixed or enhancements it made. These messages can be super important, so do yourself a favor and create decent commit messages.

Now that you know how to commit code, you honestly know the most important part. You know enough that you can use Git. You won't get a ton out of it, though.

To get more out of it, you'll obviously need some POWER!

One powerful thing you're probably already thinking about is how big of a pain it's going to be knowing what is and isn't going to be added to a commit. git status is your best friend. When you run it, it will list out all of the files that have been changed. This includes everything from removing a comment from source code to adding a file to deleting a file. You can use this to know what files to add or remove from the commit.
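To get a feel for what git status reports, here's a sketch that modifies one file, deletes another, and creates a third. The file names are made up and the repository is a throwaway one.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"        # hypothetical identity for the demo
git config user.email "user@example.com"

echo 'v1' > tracked.txt
echo 'old' > obsolete.txt
git add .
git commit -q -m "Initial commit"

echo 'v2' >> tracked.txt    # modify a tracked file
rm obsolete.txt             # delete a tracked file
echo 'draft' > new.txt      # create an untracked file

git status           # the long, human-readable report
git status --short   # compact form: M = modified, D = deleted, ?? = untracked
```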

The next obvious thing is how the heck can you see previous commits? If, for example, you can't remember why you made a change but you know you described that change in your commit message, you'd want to go back and look that commit over. You can do this via git log, which lists out all of your previous commits.
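A quick sketch of digging through history, again in a throwaway repository with invented commit messages. The --oneline and --grep options are handy once the log gets long.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"        # hypothetical identity for the demo
git config user.email "user@example.com"

echo 'timeout = 30' > settings.txt
git add settings.txt
git commit -q -m "Add settings file"

echo 'timeout = 60' > settings.txt
git commit -q -a -m "Raise the timeout because slow networks kept failing"

git log                       # full view: SHA-1 name, author, date, message
git log --oneline             # one line per commit: short name and subject
git log --oneline --grep="timeout"   # only commits whose message mentions "timeout"
```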

You're staring at the log and (hopefully) notice the SHA-1 name of each commit. Nifty. Let's say that since you can't remember why you made that change earlier, you want to hop back to where you were but retain the ability to come back to where you are now. You can use git checkout <name>. Boom. You're there. What if you just want to completely roll back the repository, though? You made a bunch of changes that you suddenly decide you hate and want to just get rid of. You can use git reset --hard <name> and they'll all magically disappear. Be careful with that one, because it also throws away any uncommitted changes to tracked files. There's a bunch of magic that can go on with rolling back and reverting to commits. It can be a bit brain wrinkling. I recommend checking out this page for a pretty good explanation of which commands to use when.
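Here's a sketch of both moves: visiting an old commit and coming back, then rolling the branch back for good. Everything is hypothetical (file name, messages, identity) and lives in a temp directory, which is exactly where you want to practice reset --hard.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"        # hypothetical identity for the demo
git config user.email "user@example.com"

echo 'version 1' > notes.txt
git add notes.txt
git commit -q -m "Version 1"
echo 'version 2' > notes.txt
git commit -q -a -m "Version 2"

branch=$(git symbolic-ref --short HEAD)   # remember which head we're on
first=$(git rev-parse HEAD~1)             # SHA-1 name of the first commit

# Visit the old commit, then come back.
git checkout -q "$first"
cat notes.txt                 # version 1
git checkout -q "$branch"
cat notes.txt                 # version 2

# Roll the branch back. The "Version 2" commit drops out of the log.
git reset -q --hard "$first"
cat notes.txt                 # version 1
git log --oneline
```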

You can do a lot of things with Git now. You can commit things. You can go back and see what you've done. If you want, you can even revert your code to what it used to look like several commits ago. You can do some pretty powerful things. But you've barely touched the surface of all the awesome stuff you can do with Git.

Branching

So, remember how I mentioned that you can have more than one head in a repository? You may not have noticed, but I sort of glossed over that and then went on to talk exclusively about the master head. Here's where things get cool. This is also where a lot of people seem to lose all understanding of Git.

You can create a branch using the command git branch <branch_name>. This will simply create a new head for you. Note that you're still on the master head, which we're going to start calling a branch right here and now. To switch branches, you use git checkout <branch_name>. Boom. All of a sudden, you're on a new branch.

Now, it may not seem like anything big just happened. It did though, I promise. You see, the new head was just pointed at the exact same commit as the current branch (the one you're on when you issue the git branch command), so it starts out with the same state and history. You then use git checkout to change HEAD from your current branch to the new one. If this still doesn't seem big, listen to this. This new branch is pretty much a new repository. You can commit to it. You can do rollbacks, reversions, and anything else you've been able to do before. You can completely change the whole dang thing. And it won't have any effect on the previous branch!

If you're having trouble seeing why this is useful, think of the case of fixing a bug. You're on master and you find that you've got a weird bug in your code. You really need to fix the bug...but you don't want to mess up your code doing it. Awesome. Create a branch. You can even do it the easy way and just say git checkout -b <branch_name>. That creates the branch and automatically changes your HEAD to it. Now you can do whatever you want to the code without having to worry about your master branch becoming a nightmare of spaghetti code and print statements.
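A small sketch of that isolation, using a hypothetical bugfix branch in a throwaway repository. The starting branch name is read from Git rather than assumed, since it may be master or something else on your machine.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"        # hypothetical identity for the demo
git config user.email "user@example.com"

echo 'working code' > app.txt
git add app.txt
git commit -q -m "Initial commit"
start=$(git symbolic-ref --short HEAD)   # the branch we started on

git branch bugfix          # new head, pointing at the same commit
git checkout -q bugfix     # move HEAD over to it
# (one-step equivalent: git checkout -b bugfix)

echo 'debugging mess' >> app.txt
git commit -q -a -m "Try a fix"

git log --oneline "$start"   # still just the one commit
git log --oneline bugfix     # two commits
```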

At some point, such as when you fix the bug, you're going to want to merge the two branches by putting the changes from one into the other. It's fairly simple. Just check out the branch you want to merge the changes into and then plop git merge <branch_name> into your terminal. This will merge the changes from <branch_name> into the branch that you're currently on. In our bug fix example, let's say we called our new branch bugfix. We fixed the bug and want to merge those changes into master. Well, we would use:

   git checkout master
   git merge bugfix
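For a version you can run from start to finish, here's a sketch of the whole bug fix story in a throwaway repository. The file and its contents are made up, and git branch -M master is only there to make sure the branch is named master like in the text.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"        # hypothetical identity for the demo
git config user.email "user@example.com"

echo 'total = 1 + 1' > calc.txt
git add calc.txt
git commit -q -m "Initial commit"
git branch -M master         # make sure the branch is called master

# Fix the bug on its own branch.
git checkout -q -b bugfix
echo 'total = 1 + 2' > calc.txt
git commit -q -a -m "Fix the sum"

# Bring the fix back into master.
git checkout -q master
git merge -q bugfix
cat calc.txt                 # total = 1 + 2
```

Since master didn't change while the fix was being made, Git just moves the master head forward to the bugfix commit.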

You're probably beginning to see where this is powerful by now. If not, you can read more about it here, where they go very in depth on the topic.

If you've actually tried to test a few things out between here and the top of the document, you're probably in a state of near-but-not-quite-confusion. That's fine. You see, at this point, you know enough to get by with even fairly large personal projects. If you don't plan to host your code anywhere (think GitHub or Bitbucket), then you're golden. Just stop reading and go outside or something, if that's what floats your boat.

Don't think we're done just yet, though. Because we aren't.

Forking, Cloning, and Other Such Nonsense

If you plan to host your code out on the infamous cloud somewhere, you're going to need to know a few commands to make that happen. We'll start by pushing your code up to a repository. We're going to pretend it's GitHub. We're also going to pretend that you're a very smart person and can figure out how to set up an account and such (and that you know enough Git terms to Google any little errors in the set up process).

So you've got your GitHub account all set up and you've clicked the 'New Repository' button. You've given it a name and description. For the sake of this tutorial, you haven't initialized it with a README, a license, or a .gitignore file. So you're now on a screen telling you a bunch of confusing things.

You can look at it for now like GitHub has just created a directory (named whatever you called the repo) that will hold your code in the future. For now, it's completely empty. It doesn't even have the .git directory we learned about earlier. You can remedy this in a couple of different ways, depending on what your goal is.

If you've got an existing repository of code, it's super easy. You just need to tell Git on your local machine where to push it when you want to send it to GitHub. To do that, you use the git remote add origin <url> command. This url is simply https://github.com/<your_username>/<repo_name>.git. Now you can use git push -u origin master to push everything on the master branch up to your repository. You'll need to push up each branch in a similar fashion, replacing master with the branch name. You can go to the repository on GitHub now and you'll see all of your code sitting there. Awesome.

If you're starting a new project or haven't set up Git in the directory, then you'll need to take a few extra steps before you can push code. First, you'll need to run git init to initialize Git. Then you'll need to actually create a commit, so that the push command will have something to push -- if there are no commits it fails. Finally, you'll proceed just like you would above. git remote add origin <url> followed by git push -u origin master. Boom.
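Here's a sketch of those steps that runs without a GitHub account. As a stand-in for the empty GitHub repository, it uses a bare repository in a temp directory (that substitution is my assumption, not how GitHub works internally). With the real thing, you'd swap the path for your https://github.com/<your_username>/<repo_name>.git URL.

```shell
# Stand-in for the empty remote repository.
remote=$(mktemp -d)
git init -q --bare "$remote"

# The local project.
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"        # hypothetical identity for the demo
git config user.email "user@example.com"

echo 'My project' > README.txt
git add README.txt
git commit -q -m "Initial commit"   # push needs at least one commit
git branch -M master                # make sure the branch is called master

git remote add origin "$remote"     # tell Git where "origin" lives
git push -q -u origin master        # -u remembers origin/master for later pushes

git remote -v                       # shows where origin points
```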

Now that your code is on GitHub, you're open to a whole world of opportunities. One of the biggest uses is sharing code. Let's say that the repository you just pushed is a group project. Now you and all of the other members can ensure that all of the code is up to date and such. You can also, through the commit messages and file diffs, keep track of what exactly has changed between each commit. Now that you're using it for your group project, you'll want to make sure that your local machine has all of the most updated changes. To do this, you can run git pull origin <branch_name>.

Note: You'll also see stuff about git fetch used in the same context as git pull. pull is just a fetch followed by a merge.
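To see how fetch, merge, and pull relate, here's a sketch with two local copies of one project. The "remote" is a bare repository in a temp directory standing in for GitHub, and the directory and file names are invented. One copy pushes a new commit, and the other picks it up in two separate steps.

```shell
remote=$(mktemp -d) && git init -q --bare "$remote"

# My copy: make the first commit and push it.
mine=$(mktemp -d) && cd "$mine"
git init -q
git config user.name "Example User"        # hypothetical identity for the demo
git config user.email "user@example.com"
echo 'line 1' > shared.txt
git add shared.txt
git commit -q -m "First commit"
git branch -M master
git remote add origin "$remote"
git push -q -u origin master

# A teammate's copy of the same project.
theirs=$(mktemp -d)
git clone -q -b master "$remote" "$theirs"

# I push a second commit.
echo 'line 2' >> shared.txt
git commit -q -a -m "Second commit"
git push -q origin master

# The teammate fetches: origin/master updates, their own master does not.
cd "$theirs"
git fetch -q origin
before=$(git rev-list --count master)   # still 1
git log --oneline origin/master         # already shows both commits

# Merging brings their master up to date.
git merge -q origin/master
git log --oneline master
```

Running git pull origin master in the teammate's copy would do the fetch and the merge in one go.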

Okay, so that's how you put your code onto GitHub and get it down. In the case that you find a repository on GitHub that you would like on your local machine, you'll need to use the git clone <url> command. This will download everything from the URL's Git repo down to your local machine. This includes all of the commit history and everything. It's as simple as that!
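A sketch of cloning, using a local repository as the source so it runs offline (with GitHub you'd pass the repository's URL instead of a path). The names are made up. The point is that the copy comes with the full history and a remote called origin already set up.

```shell
# A source repository with a little history.
source_repo=$(mktemp -d) && cd "$source_repo"
git init -q
git config user.name "Example User"        # hypothetical identity for the demo
git config user.email "user@example.com"
echo 'first' > file.txt
git add file.txt
git commit -q -m "First commit"
echo 'second' >> file.txt
git commit -q -a -m "Second commit"

# Clone it somewhere else.
copy=$(mktemp -d)
git clone -q "$source_repo" "$copy"
cd "$copy"

git log --oneline    # both commits came along
git remote -v        # origin points back at the source
```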

The final thing we're going to cover is forking a repository. Forking a repository is a simple, but super powerful, thing. Remember earlier how I said that a branch was "pretty much" a new repository copied from the old one? Well a fork is literally a new repository copied from the old one. The fork is now your very own copy of the original repository in the exact state it is whenever you forked it. Forking is especially useful for when you want a copy of code that isn't entirely yours but you want full control over. For example, if you wanted to use the Popcorn Time app but wanted to make your own personal changes to it, you could fork it. You can now do whatever you want to it, including get updates from the original repository and submit pull requests for your changes to be added to the original repo. Or you can do absolutely nothing to the code, which is useful when you need for the code to not change.

I've only ever created a fork in the context of GitHub and Bitbucket, both of which make the process super easy. You just click a button that says "Fork" and suddenly you have your own copy of the repository. You'll have to clone it and such, just like normal, but that's it. Easy as cake.

Wrap Up

If you've made it all the way down here, congratulations! In theory, you know a lot of stuff about Git at this point. Certainly nowhere near everything, but enough to get by pretty successfully! In fact, if you can master the stuff I've talked about here, you'll be set up to do just about everything you'll need to do. You'll at least know enough about anything you encounter to know what to Google.

If you see any errors in here, please let me know! Likewise, if there is anything you would like to see explained in more detail (or just plain better) let me know and I'll give it a go! I knocked all of this out over the course of a couple of hours, so it's definitely lacking a bit of polish here and there.

References

Information presented in this document is a mix of information from my experience with Git and from the following sources. These are all great reads and will probably explain a lot of things better than I have here.

The largest and best resource for myself has been the Git book by O'Reilly. I'd strongly suggest giving it a read, especially if you peruse the links and still have any nagging questions.