Using git without feeling stupid (part 2)


In the first installment, I showed how basic usage of git does not need any concept that is unique to a particular version control system. In this installment, I'll introduce more usage of git that requires learning a concept or two.

First, I mentioned in the first installment that, even in the simplest single-developer usage of git, the diff subcommand can be useful to examine differences between revisions. To do this, however, you need to know how git names revisions. They are named with a long string of 40 hexadecimal digits; there is no autoincrementing number as you have in arch or subversion. The hexadecimal string is a unique number that describes the content of a particular revision and all the history that led to that particular revision. You can get the name from git log:

commit 10390f9021e7bee3845e7f69a3f378cd3d319b0f
Author: Paolo Bonzini 
Date:   Tue Jan 29 11:31:23 2008 +0100

    minor changes

git allows you to use any unique prefix of the string to identify a particular revision; usually one uses six to eight hex digits. So, git diff 10390f will give you the difference between the tree right after that commit and the current working tree.

There are several other ways to name commits. I'll mention three. HEAD is the commit to which you are making modifications in the working tree. 10390f^ is the commit before 10390f, so that you can say git diff 10390f{^,} to examine the differences introduced by one particular commit. Finally, master is the top of the trunk (it is actually a branch name).
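To try these names out without touching a real project, here is a minimal sketch that builds a throwaway repository. The file name, commit messages and identity are made up for the example, and the hash you get will of course differ from 10390f.

```shell
# Scratch repository so nothing real is touched (all names here are made up).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.org"

echo "first version" > notes.txt
git add notes.txt
git commit -q -m "first commit"

echo "second version" > notes.txt
git commit -q -a -m "minor changes"

# Full name of the latest commit, and a shorter unique prefix of it.
full=$(git rev-parse HEAD)
short=$(git rev-parse --short=8 HEAD)
echo "full:  $full"
echo "short: $short"

# The short prefix resolves to the same commit as HEAD.
git rev-parse "$short"

# Differences introduced by that one commit: its parent versus the commit.
# This spells out what the shell expands 10390f{^,} into.
git diff "$short^" "$short"
```

The last command prints the change from "first version" to "second version", which is exactly what the second commit introduced.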

With this in hand, I can introduce a new command before going on to the biggest topic of the day. The command is git checkout and, like cvs update -r, it allows you to go back in time and put a stored revision in the working tree. For example, git checkout 10390f^ will extract the revision just before 10390f, and git checkout master will go back to the top of the trunk.
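A small sketch of that round trip, again in a throwaway repository with made-up file names. It assumes nothing about the branch name: the article calls it master, but your setup may use a different default, so the sketch asks git for it.

```shell
# Scratch repository (hypothetical names throughout).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.org"

echo "old contents" > notes.txt
git add notes.txt
git commit -q -m "first commit"
echo "new contents" > notes.txt
git commit -q -a -m "second commit"

# Remember the branch name; it is master in the article, but newer
# setups may call it something else.
branch=$(git symbolic-ref --short HEAD)

# Go back in time: the working tree now holds the previous revision.
git checkout -q HEAD^
seen_old=$(cat notes.txt)
echo "after checkout HEAD^: $seen_old"

# Return to the top of the branch.
git checkout -q "$branch"
echo "after checkout $branch: $(cat notes.txt)"
```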


The topic of the day is a particular concept in git, the index. However, I'll try to introduce it concretely, based on actual use cases. One, as I mentioned, is conflicts. The other is a clarification of two things I told you in the first installment, asking you to believe me for a while:

  • git add won't fail if the file is already under version control
  • git commit will only add and remove files that you marked with git add or git rm

These were both imprecise, and both imprecisions have to do with the index. Frankly, the name is really badly chosen; it would be better to call it a staging area, for example.

What git add does is to move the current version of the named file to a special staging area, holding files that are ready to be committed. And what git commit (without other arguments) will do is to take the index and make a new revision out of what the index contains. git commit -a is just a convenience which adds all modified files to the index, and then commits the result.

How does this affect you? The first thing to remember is this one: only run git add on new files just before committing. Otherwise, you'll commit the wrong contents of the file. Case in point:

$ git init
   ... create hello.c ...
$ cat hello.c
int main ()
{
  printf ("hello world\n");
}
$ git add hello.c
$ gcc -o hello hello.c
hello.c: In function ‘main’:
hello.c:3: warning: incompatible implicit declaration of built-in function ‘printf’

Grrr. Warnings.

  ... fix warning ...
$ git commit

Surprise: the committed version will still have the warning! That's because git add did something more than preparing to add hello.c to version control. It also snapshotted its contents and placed them into the index.
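If you want to see the snapshot effect for yourself, the following sketch replays the session in a scratch repository. It skips the compiler and simply edits the file after git add; the commit message and identity are placeholders.

```shell
# Scratch repository (hypothetical identity and commit message).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.org"

cat > hello.c <<'EOF'
int main ()
{
  printf ("hello world\n");
}
EOF
git add hello.c            # the contents are snapshotted into the index here

# Fix the warning, but only after the snapshot was taken.
cat > hello.c <<'EOF'
#include <stdio.h>

int main ()
{
  printf ("hello world\n");
}
EOF

git commit -q -m "add hello.c"

# The commit holds the snapshot from git add, so the include is missing...
git show HEAD:hello.c
# ...and the fix is still an uncommitted change in the working tree.
git diff
```

Running git add hello.c again right before git commit (or using git commit -a) would have committed the fixed version.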

As naughty as this might seem, there's no way to avoid it other than to start using the index more. For example, when you have to commit only some of the changes in the working tree, or when you are adding new files to the tree, you could skip the command-line tools. Instead, fire up git citool, which is a graphical user interface to examine changes, stage them, and finally commit them. It is powerful, easy to use, and will provide a gentle introduction to the concept of a staging area. It will also catch mistakes such as the one above.

Now, what does the index have to do with conflicts? The answer is simple. If a merge has conflicts, git takes care of adding unconflicted files to the index, and leaves conflicted files out of the index. Read it again. Slowly. Once more. Then, go ahead.

Also, git diff, without any arguments, does not show changes that are staged in the index. In fact it diffs the working tree against the index, not against the repository. Read it again. Slowly. Once more. Then, go ahead.

Yes, now you can scream. What? Why do I have to go through all these mental contortions? But, think more about it. In most cases, you don't care about unconflicted files. Let's say you merge a huge patch and you have a couple of stupid conflicts in the makefile. Why should git diff spew the huge patch at you? You just want to see the problematic changes, i.e. the makefile. I hope the two above statements are now connected and (almost) make sense to you.

So, you just have to learn a couple of tricks of the trade. Here is how you go about solving conflicts after git pull --rebase has failed:

  • git diff will show you changes in the conflicted files;
  • since git add will stop showing a file in git diff (until you change it again), you can use it to mark a file as resolved;
  • at the end, your git diff should be empty, since you should have resolved the conflicts;
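  • now, use git commit (without -a) to commit the result of the merge. Why no -a? Because the result of the merge is already in the index; and without any parameters, git commit transforms the index into a commit. (If git stopped in the middle of a rebase, as it does when git pull --rebase fails, finish with git rebase --continue instead; it also works from what is in the index.)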
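The steps above can be tried out in a scratch repository. This is only a sketch under a few assumptions: it provokes the conflict with a plain git merge between two local branches rather than with git pull --rebase, the branch name other and the Makefile contents are invented, and the "resolution" just picks one side.

```shell
# Scratch repository with a deliberately conflicting change (made-up names).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.org"

echo "CFLAGS = -O1" > Makefile
git add Makefile
git commit -q -m "initial makefile"
trunk=$(git symbolic-ref --short HEAD)

# Two lines of development change the same line differently.
git checkout -q -b other
echo "CFLAGS = -O2" > Makefile
git commit -q -a -m "use -O2"
git checkout -q "$trunk"
echo "CFLAGS = -O3" > Makefile
git commit -q -a -m "use -O3"

# The merge stops with a conflict, so it exits with a non-zero status.
git merge other || echo "merge stopped with conflicts"

# Only the conflicted file shows up here.
git diff

# Resolve by hand (here: just pick a value), then mark it as resolved.
echo "CFLAGS = -O2" > Makefile
git add Makefile

# Nothing left to resolve, so git diff is now silent.
git diff

# Commit the index as the result of the merge (no -a needed).
git commit -q -m "merge other"
git log --oneline
```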

You may need to read the last one again, but otherwise it shouldn't be too hard. Still, I can hear you complaining: sometimes you do want to look at the overall result of a merge.

And indeed, there are two ways to fix this. The first is to use git citool, which allows you to review changes. The second is to invoke git diff with arguments in order to tweak its operation mode; in particular:

  • git diff shows you the differences from the index to the working tree
  • git diff HEAD shows you the differences from the last commit (HEAD) to the working tree
  • git diff --cached shows you the differences from the last commit (HEAD) to the index
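To make the three modes concrete, here is a sketch that puts a different version of one file in each place: "one" in the last commit, "two" in the index, and "three" in the working tree. The file name and contents are invented for the example.

```shell
# Scratch repository (hypothetical names).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.org"

echo "one" > file.txt
git add file.txt
git commit -q -m "version one"

echo "two" > file.txt
git add file.txt          # the index now holds "two"
echo "three" > file.txt   # the working tree holds "three"

echo "--- index to working tree ---"
git diff                  # two -> three
echo "--- last commit to working tree ---"
git diff HEAD             # one -> three
echo "--- last commit to index ---"
git diff --cached         # one -> two
```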

With this, you should be able to tackle conflicts pretty well.


Before concluding this installment, I'll point out a little hidden gem. git status has output that is very different from svn status, but the latter can be very useful. Luckily, you can obtain subversion-like output with the invocation git diff --name-status -r; since this is quite a mouthful, you can add this to your ~/.gitconfig file:

[alias]
       changes=diff --name-status -r

You have now created a git changes command that knows about all of git diff's options. In particular, the three invocations git changes, git changes HEAD, and git changes --cached will have the same meaning as the git diff commands above.
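Here is a sketch of the alias at work. To avoid editing your real ~/.gitconfig, it stores the alias in the scratch repository's own configuration with git config; it also leaves out -r, which a reader comment below reports is not needed. The file names are invented.

```shell
# Scratch repository (hypothetical file names).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.org"

echo "tracked" > kept.txt
echo "doomed" > gone.txt
git add kept.txt gone.txt
git commit -q -m "two files"

# Same idea as the ~/.gitconfig entry, but local to this repository.
git config alias.changes "diff --name-status"

echo "edited" > kept.txt   # a modification that is not staged
git rm -q gone.txt         # a deletion that is staged

echo "--- git changes ---"
git changes                # index to working tree
echo "--- git changes --cached ---"
git changes --cached       # last commit to index
echo "--- git changes HEAD ---"
git changes HEAD           # last commit to working tree
```

Each line of output is a status letter (M for modified, D for deleted, A for added) followed by the file name, much like svn status.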

Since you are at it, add this to ~/.gitconfig too:

[diff]
       renames = true

and also tell git about your identity like this:

[user]
       name = Paolo Bonzini
       email = bonzini@gnu.org

The name and e-mail address will be used to identify you in commits.
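If you would rather not edit the file by hand, git config can write the same entries for you. The sketch below writes to a temporary file with --file so that it is safe to run as is; replacing --file "$cfg" with --global would target ~/.gitconfig instead. The name and address are placeholders.

```shell
# Write the settings to a scratch file instead of the real ~/.gitconfig.
cfg=$(mktemp)
git config --file "$cfg" diff.renames true
git config --file "$cfg" user.name "Example User"
git config --file "$cfg" user.email "user@example.org"

# Show the resulting file, then read one value back.
cat "$cfg"
git config --file "$cfg" --get user.name
```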

The next installment should cover branches and merges. However, I have said enough for GNU Smalltalk users, so I don't really know when I'll write it. :-)

the first 2 were great. i'm a new git user and have battled understanding much of it. i've been tasked with learning git well enough to teach our team and implement its usage in our development environment so we can have a central repository of all our source code for various projects and feel confident about what we're doing with it... i still don't have the confidence with git but am getting there...

the installment for branches and merges is critical! LOL.. i'd really look forward to seeing it although this post is old enough that it's probably not going to happen :(

Great article - demystifies it enough for me to be able to use it sensibly without having to read a long manual - thanks!!!

Unless we're using different versions git-diff has no -r option. Git seems to disregard erroneous options, which is why there wouldn't be an error, but your alias can be just changed to:
changes=diff --name-status

I have it here and it works fine :)

Nice post. Thanks.

You don't need to "git add" your changes, you can use "git commit -a" (there was a lengthy discussion on why it is not the default).

Yes -- see part 1. But my point is that usually when merging it's better to use "git add" (so that you have to mark all your conflicts as fixed, or "git commit" complains) or "git citool".

Paolo

"you can say git diff 10390f{^,} to examine the differences introduced by one particular commit"

You can also say

 git diff 10390f^!

(Yeah, it's all the way at the end of the git-rev-parse manpage, but it's very useful)

Segher

The most simple, clear, yet still complete git howto i have ever read.
thank you.

please write out the 3rd part ASAP!

Andrea.

Just wanted to say "+1 to the waiting-for-the-3rd-part list"

So I succeeded in my intent. If I don't write it "soon enough", go on with http://git.or.cz/course/svn.html -- very concise but after reading the two parts you should be able to follow it easily.
