
December 31st, 2008

03:40 pm - Fun with msysgit 1.6.1 preview
Even though I almost never use Windows, I tried the msysgit 1.6.1 preview today to see how much it had improved since the last time I tried it, back when it was still called WinGit.

I was pleasantly impressed ;-)  I am so pleased that I decided to write about my experience, with some "git tutorial" sprinkled in.  It was especially fun because I usually do not use these GUI tools, and in this "tutorial" I try to do with the GUI many of the things that I usually do from the command line.

Downloading from http://code.google.com/p/msysgit/ inside Firefox gave me a file that was only about 5MB, without any indication of an error, even though the progress bar claimed during the download that the final size should be around 8.3MB.  This must be where some of the mistaken "corrupt upload" reports I occasionally see on the msysgit mailing list come from (no, I am not subscribed to it, but I sometimes peek at it through gmane). After I re-downloaded the file with wget on my primary machine and copied it over, the installer launched a user-friendly (aka bozo-proof) "Setup Wizard":

The page to review GPLv2 is still there, and I recall a controversy about it a few months ago.  I think the msysgit team did a good job finding a reasonable wording that makes it clear that the notice is merely informational, letting the end-user know about his/her rights, and making sure that the user can proceed without having to agree or disagree to anything.

After clicking through a few more pages of the Wizard to completion, I had a git icon on my menu bar.

The first order of business is to tell git who you are.  This is not necessary on a well managed UNIX host whose hostname is properly set to return where your e-mails should go, and whose user database records your human readable name in its GECOS field, but it certainly is necessary on a Windows host and probably on many amateur home installations of Linux boxes as well.  You run (of course, not with my name and address but replacing the values in the example with your own):

    $ git config --global user.name "Junio C Hamano"
    $ git config --global user.email "gitster@pobox.com"

to set this information globally (aka "in $HOME/.gitconfig"), so that it applies to any repository you will have on this machine.  While you are at it, you might also want to set "color.ui" to "auto", and "core.autocrlf" to "false".  Then for my git experiments in this environment, I created a "GitFarms" directory and moved into it:

By the way, I just mentioned "$HOME/.gitconfig", but this was my first encounter with the msysgit environment, and I did not know exactly where my $HOME actually was.  The notebook I installed msysgit on was running Vista, and I created "GitFarms" in my "$HOME directory". Where is it???

The directory was found in "C:\Users\junio\".  No surprises.
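The settings suggested earlier can be stored and read back from the shell. The following is only a minimal sketch, assuming a POSIX-like shell such as the bash that msysgit provides. The scratch repository and the variable names are made up for illustration, and "--global" is left out on purpose so that the sketch does not touch your real $HOME/.gitconfig. Note that git spells the color setting "color.ui".

```shell
# Where does the global configuration live?  It is $HOME/.gitconfig.
echo "global settings are read from $HOME/.gitconfig"

# Scratch repository (hypothetical), so that real settings stay untouched.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# Repository-local versions of the two settings;
# add --global to store them in $HOME/.gitconfig instead.
git config color.ui auto
git config core.autocrlf false

# Read the values back to confirm what was stored.
color_ui=$(git config --get color.ui)
autocrlf=$(git config --get core.autocrlf)
echo "color.ui=$color_ui core.autocrlf=$autocrlf"
```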

I use gpodder podcatcher on the Linux side of the notebook and have a few personal quick-and-dirty patches that are never intended to be sent upstream.  I decided to test drive msysgit using this project as the guinea pig.

First, you need to locate where the official upstream is.  This project was housed in Subversion a long time ago (and I used git-svn to convert it for my own use when these q&d patches were made), but these days they are on git.  To clone:

    $ git clone git://repo.or.cz/gpodder.git

Then, I tried git-gui.

    $ git gui &

It shows nothing interesting, because there is nothing interesting in a repository immediately after a fresh clone.

Going to "Repository" and then "Visualize master's History" will give you gitk.

The development history of this project is very linear.  When you draw the history vertically, it is customary to put the latest commit near the top, with older ones near the bottom (in other words, time flows from bottom to top), and "gitk" is no exception.  It shows the title of each commit and how they relate to each other by the parent-child relationship on the top pane, and the bottom two panes are used to show the commit log message, diff, and the list of paths that are affected by the chosen commit.

Typing "Junio" in the Find text area and hitting the Enter key finds a small patch that I sent to this project quite a while ago:

One of my personal patches not in the official history sets the album tag to the title of the podcast.  I do not know why, but the upstream software only sets the title, artist and genre tags.  After editing an appropriate source file:

    $ git diff

shows what I just did (in color, because I set "color.ui" to "auto" earlier).

"git gui" (perhaps you need to "rescan") shows the same information per file.  Clicking on the filename shows the change in "Modified, not staged" state.  I prepare a commit log message in the text area at the right bottom corner to justify why I need to change the code in my way, and I can do this while reviewing the changes in the right top area:

Clicking the file icon next to the filename in "Unstaged Changes" area marks that the updated contents should be in my commit by moving it to "Staged Changes (Will Commit)" area.  If I did this by mistake, I can click the checkbox next to the filename to unstage it:

Clicking "Commit" will record the state after my change as a new commit.
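The same stage, unstage and commit steps can be done from the command line. The following is a rough sketch under stated assumptions, not what "git gui" runs internally: the scratch repository, the file name "tags.txt" and the identity exported at the top are all hypothetical, and "git rev-list --count" needs a git newer than the 1.6.1 discussed here.

```shell
# Hypothetical identity so the sketch can commit without any global setup.
export GIT_AUTHOR_NAME="A U Thor" GIT_AUTHOR_EMAIL="author@example.com"
export GIT_COMMITTER_NAME="A U Thor" GIT_COMMITTER_EMAIL="author@example.com"

repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# A first commit, so that there is a HEAD to compare against.
echo "title" > tags.txt
git add tags.txt
git commit -q -m "Initial import"

# Edit the file; the change starts out unstaged.
echo "album" >> tags.txt

# Stage it, and list the paths that would go into the next commit.
git add tags.txt
staged=$(git diff --cached --name-only)

# Changed my mind: unstage it again.  The edit stays in the working tree.
git reset -q HEAD -- tags.txt
unstaged=$(git diff --cached --name-only)

# Stage once more and record the commit.
git add tags.txt
git commit -q -m "Set the title to album tag as well"
count=$(git rev-list --count HEAD)
echo "staged: $staged, commits so far: $count"
```

Recent versions of git also offer "git restore --staged" for the unstage step.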

If you go to "File"/"Update" in "gitk" at this point, it will show that the new commit is pointed to by the "master" branch, while the tip of the upstream "master" branch is one commit behind it.

I notice that I did not end the sentence in the commit log message body with a period.  I can "Amend Last Commit" in "git gui", edit the commit log message, and commit again.  After doing so, "File"/"Update" in "gitk" will show that I have a forked history, two similar commits near the tip, with the corrected commit pointed to by my "master", and another dangling one leading nowhere:

If I say "File"/"Reload" in "gitk" at this point, the dangling commit disappears.

In this example, the difference between the lost commit and the corrected one is so trivial that I would never feel the need to resurrect it, but I can still get it back from the reflog mechanism if I really wanted to:

    $ git log -g
    commit 82e15f005ca16fae6236de2f843cf671be694b0f
    Reflog: HEAD@{0} (Junio C Hamano <gitster@pobox.com>)
    Reflog message: commit (amend): Set the title to album tag as well
    Author: Junio C Hamano <gitster@pobox.com>
    Date:   Wed Dec 31 13:29:48 2008 -0800

        Set the title to album tag as well

        Otherwise my Sansa shows it as "Unknown".

    commit 062752a51577246b8925f78dc7d49aa984b8aaa9
    Reflog: HEAD@{1} (Junio C Hamano <gitster@pobox.com>)
    Reflog message: commit (amend): Set the title to album tag as well
    Author: Junio C Hamano <gitster@pobox.com>
    Date:   Wed Dec 31 13:28:55 2008 -0800

        Set the title to album tag as well

        Otherwise my Sansa shows them as "Unknown"


The "-g" option to "git log" tells it to show how the tip of the branch progressed.  The original commit and the amended one are not parent-child (they are siblings), but "log -g" output shows them in the order they happened, i.e. the original was there, and then the amended one came. Oh, as usual, time flows from bottom to top when showing commits vertically, so the amended one is shown first and then the original one follows.
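To make the sibling relationship and the reflog lookup concrete, here is a small sketch. It assumes a scratch repository with a made-up file "tags.txt" and a hypothetical identity; only the commit messages are borrowed from the example above.

```shell
# Hypothetical identity so the sketch can commit without any global setup.
export GIT_AUTHOR_NAME="A U Thor" GIT_AUTHOR_EMAIL="author@example.com"
export GIT_COMMITTER_NAME="A U Thor" GIT_COMMITTER_EMAIL="author@example.com"

repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

echo "title" > tags.txt
git add tags.txt
git commit -q -m "Initial import"

# The commit with the sloppy message.
echo "album" >> tags.txt
git commit -q -a -m 'Otherwise my Sansa shows them as "Unknown"'
original=$(git rev-parse HEAD)

# Amend it; this creates a new commit and moves the branch to it.
git commit -q --amend -m 'Otherwise my Sansa shows it as "Unknown".'
amended=$(git rev-parse HEAD)

# The reflog still remembers where HEAD pointed before the amend.
previous=$(git rev-parse 'HEAD@{1}')

# Original and amended commits share a parent: they are siblings.
parent_of_original=$(git rev-parse "$original^")
parent_of_amended=$(git rev-parse "$amended^")

# Newest reflog entry first, like the "log -g" output above.
git log -g --oneline -2
```

The lost commit could be brought back with something like "git branch rescued HEAD@{1}", where "rescued" is just an example name.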

I added another change in a similar fashion; this time I used "git am" to apply the change I had already made on the Linux side of my notebook, and it worked just fine (as expected):

    $ git am 0002-hack-tagupdate-force-using-v2.3-tag-with-utf16-le.patch
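A patch file like this is typically produced with "git format-patch" in the repository that has the commit. The round trip can be sketched as follows, assuming two scratch repositories named "linux" and "windows" that stand in for the two sides of the notebook; the names, the file and the identity are all hypothetical.

```shell
# Hypothetical identity so the sketch can commit without any global setup.
export GIT_AUTHOR_NAME="A U Thor" GIT_AUTHOR_EMAIL="author@example.com"
export GIT_COMMITTER_NAME="A U Thor" GIT_COMMITTER_EMAIL="author@example.com"

work=$(mktemp -d)

# Two repositories that start from the same history.
git init -q "$work/linux"
cd "$work/linux"
echo "title" > tags.txt
git add tags.txt
git commit -q -m "Initial import"
git clone -q "$work/linux" "$work/windows"

# Make the change on the "linux" side only.
echo "album" >> tags.txt
git commit -q -a -m "Set the title to album tag as well"

# Write the newest commit out as a patch file; the command prints its path.
patch=$(git format-patch -1 -o "$work/patches")
echo "wrote $patch"

# Apply it on the "windows" side, keeping author and log message.
cd "$work/windows"
git am -q "$patch"
subject=$(git log -1 --format=%s)
echo "applied: $subject"
```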

While I was tweaking the software to suit my personal needs, the upstream may have been making improvements, and I would want to make sure I stay up to date.

First, from "git gui", you can say "Remote"/"Fetch from..."/"origin". This will download the upstream changes and store them in the remote tracking branches:

After this operation, "File"/"Update" in "gitk" will show that the histories have forked.  I have two commits on my "master" branch and the upstream has 6 commits on their own.
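The fork can also be counted from the command line after a fetch. This is a sketch only: a local scratch repository plays the role of the upstream, the commit contents are made up, and "git rev-list --count" needs a git newer than the 1.6.1 discussed here.

```shell
# Hypothetical identity so the sketch can commit without any global setup.
export GIT_AUTHOR_NAME="A U Thor" GIT_AUTHOR_EMAIL="author@example.com"
export GIT_COMMITTER_NAME="A U Thor" GIT_COMMITTER_EMAIL="author@example.com"

work=$(mktemp -d)

# A stand-in "upstream" whose branch is called master.
git init -q "$work/upstream"
cd "$work/upstream"
git symbolic-ref HEAD refs/heads/master
echo "base" > base.txt
git add base.txt
git commit -q -m "Base"

# My clone is taken at this point.
git clone -q "$work/upstream" "$work/mine"

# Upstream moves on with six commits.
for i in 1 2 3 4 5 6; do
    echo "$i" > "upstream-$i.txt"
    git add "upstream-$i.txt"
    git commit -q -m "Upstream change $i"
done

# Meanwhile I make two commits of my own.
cd "$work/mine"
for i in 1 2; do
    echo "$i" > "personal-$i.txt"
    git add "personal-$i.txt"
    git commit -q -m "Personal change $i"
done

# Fetch updates origin/master but leaves my master alone.
git fetch -q origin
mine=$(git rev-list --count origin/master..master)
theirs=$(git rev-list --count master..origin/master)
echo "mine: $mine, upstream: $theirs"
```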

Now, I can choose from two ways to integrate the changes.  The most natural way I use for my primary project is to merge, so I try it first.

In "git gui", "Merge"/"Local Merge..." lets me choose the branches that I can merge.  I pick "origin/master" and it results in a clean merge.

When "gitk" is updated, I see two forked histories merged together at "master".

As you can see, the tip of the "master" branch is a merge commit that binds these two forked histories together.

Another way to stay up to date is to "rebase".

Instead of merging and recording the fact that the development history once forked to two tracks in the past, it lets me pretend that I made my two commits starting from the tip of the upstream.  To demonstrate it, I need to first undo the merge by getting back to the state before making the merge.  I can use "gitk" for that:

Right clicking on the commit that was at the tip of my "master" branch before the merge shows a menu.

By choosing "Reset master branch to here", and telling it to do a "Hard" reset, I can discard everything that was caused by the merge.

Again, "File"/"Update" shows what the commit ancestry graph looks like after this operation.  I am back at where I was, and the merge commit is dangling (which you can remove from the view by "File"/"Reload" but you can still get to it by using the reflog mechanism):
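The merge and its undoing have direct command line counterparts. The sketch below assumes a scratch upstream and clone with made-up commits and a hypothetical identity; it is meant to show the shape of the operations, not what "git gui" and "gitk" run internally.

```shell
# Hypothetical identity so the sketch can commit without any global setup.
export GIT_AUTHOR_NAME="A U Thor" GIT_AUTHOR_EMAIL="author@example.com"
export GIT_COMMITTER_NAME="A U Thor" GIT_COMMITTER_EMAIL="author@example.com"

work=$(mktemp -d)

# Stand-in upstream and my clone of it.
git init -q "$work/upstream"
cd "$work/upstream"
git symbolic-ref HEAD refs/heads/master
echo "base" > base.txt
git add base.txt
git commit -q -m "Base"
git clone -q "$work/upstream" "$work/mine"

# The two histories fork.
echo "theirs" > upstream.txt
git add upstream.txt
git commit -q -m "Upstream change"
cd "$work/mine"
echo "mine" > personal.txt
git add personal.txt
git commit -q -m "Personal change"
git fetch -q origin

# Remember where master was, then merge the upstream branch.
before=$(git rev-parse master)
git merge -q -m "Merge upstream changes" origin/master
merge=$(git rev-parse HEAD)
second_parent=$(git rev-parse "HEAD^2")   # the upstream side of the merge

# Undo the merge: move master back and discard the merged files.
git reset -q --hard "$before"
after=$(git rev-parse HEAD)

# The merge commit is dangling now, but the reflog still knows it.
dangling=$(git rev-parse 'HEAD@{1}')
echo "back at $after, merge still reachable as $dangling"
```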

I couldn't figure out how to rebase inside "git gui", and I wanted to see how the command line replicates the familiar feel of git in the Windows environment that is foreign to me, so I decided to try "rebase" from the command line.  First, to demonstrate that you can get the same information from the command line, I run:

    $ git show-branch origin master

You can see that the upstream has six commits while I have two commits on top of where we forked.

Then the rebase itself:

    $ git rebase origin

This replays my two changes on top of the updated upstream.  Viewed in "gitk" after "File"/"Update", you will notice that near the tip of the "master" branch are two of my commits, but two extra commits that have the same description are left behind (again, you can remove them from the view by "File"/"Reload").
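What the rebase leaves behind can be checked from the shell as well. This sketch again assumes a scratch upstream and clone with made-up commits and a hypothetical identity. It spells out "origin/master" rather than the shorter "origin", and "git rev-list --count" needs a git newer than the 1.6.1 discussed here.

```shell
# Hypothetical identity so the sketch can commit without any global setup.
export GIT_AUTHOR_NAME="A U Thor" GIT_AUTHOR_EMAIL="author@example.com"
export GIT_COMMITTER_NAME="A U Thor" GIT_COMMITTER_EMAIL="author@example.com"

work=$(mktemp -d)

# Stand-in upstream and my clone of it.
git init -q "$work/upstream"
cd "$work/upstream"
git symbolic-ref HEAD refs/heads/master
echo "base" > base.txt
git add base.txt
git commit -q -m "Base"
git clone -q "$work/upstream" "$work/mine"

# Upstream moves on; I make two commits of my own.
echo "theirs" > upstream.txt
git add upstream.txt
git commit -q -m "Upstream change"
cd "$work/mine"
for i in 1 2; do
    echo "$i" > "personal-$i.txt"
    git add "personal-$i.txt"
    git commit -q -m "Personal change $i"
done
git fetch -q origin

# Replay my commits on top of the updated upstream.
old_tip=$(git rev-parse master)
git rebase -q origin/master

# My commits now sit directly on top of the upstream tip, with no merge.
fork_point=$(git merge-base master origin/master)
mine=$(git rev-list --count origin/master..master)
merges=$(git rev-list --count --merges origin/master..master)

# The commits left behind are still reachable through the branch reflog.
left_behind=$(git rev-parse 'master@{1}')
echo "replayed $mine commits; old tip was $left_behind"
```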

People may be wondering why there are two ways to stay up to date, and how to decide which one to use.  It largely depends on how you interact with the upstream.  Very roughly put, the less important your changes are to the upstream and the more important they are to you personally, the more you tend to rebase.

In my case, I do not really "work on" gpodder.  My patches are purely for my personal use, because they are not generic enough to be sent upstream.

Perhaps there are people who (or MP3 players that) do not want the "album" tag to be set for a Podcast track, in which case my addition needs to be made conditional.  Otherwise my change may end up breaking the software for other people and their MP3 players.  That means I need to learn more about how the gpodder codebase works in order to add a new configuration item and a new UI item to flip that configuration bit, which is more effort than I can afford to spend.  I do not even have time to survey whether adding an "album" tag unconditionally set to the same string as "title" has any downside for other people.

The other change I have is to force the id3v2 version down to 2.3 with UTF16-LE encoding, to please my MP3 players that do not understand id3v2 version 2.4 tags, and I know that this change needs to be made conditional, if it ever needs to hit the upstream.

In short, I keep these two changes for my personal use only because I have neither the time nor the inclination to polish them to send upstream.  This is a prime example of my patch being not important at all for the upstream (and being important to me).  Thus, I'd be better off rebasing than merging.

Even though:

    $ git fetch
    $ git log --no-merges origin..

will show the commits I made and am keeping to myself whether I merge with origin or rebase onto origin, I find the history generally easier to manage if I keep rebasing instead of merging, if only because my own changes will always float near the tip of the history in "gitk" and "git log" output, instead of getting buried very deep in the history.
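That the listing is the same either way can be checked with a small experiment. In this sketch the branch names "merged" and "rebased", the scratch repositories and the identity are all made up; the output is sorted only to make the two listings easy to compare.

```shell
# Hypothetical identity so the sketch can commit without any global setup.
export GIT_AUTHOR_NAME="A U Thor" GIT_AUTHOR_EMAIL="author@example.com"
export GIT_COMMITTER_NAME="A U Thor" GIT_COMMITTER_EMAIL="author@example.com"

work=$(mktemp -d)

# Stand-in upstream and my clone of it.
git init -q "$work/upstream"
cd "$work/upstream"
git symbolic-ref HEAD refs/heads/master
echo "base" > base.txt
git add base.txt
git commit -q -m "Base"
git clone -q "$work/upstream" "$work/mine"

# Upstream moves on; I make two commits of my own.
echo "theirs" > upstream.txt
git add upstream.txt
git commit -q -m "Upstream change"
cd "$work/mine"
for i in 1 2; do
    echo "$i" > "personal-$i.txt"
    git add "personal-$i.txt"
    git commit -q -m "Personal change $i"
done
git fetch -q origin

# Try both ways of staying up to date on two throwaway branches.
git branch merged master
git branch rebased master
git checkout -q merged
git merge -q -m "Merge upstream changes" origin/master
git checkout -q rebased
git rebase -q origin/master

# Either way, the commits I am keeping to myself are the same two.
from_merge=$(git log --no-merges --format=%s origin/master..merged | sort)
from_rebase=$(git log --no-merges --format=%s origin/master..rebased | sort)
echo "$from_rebase"
```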

Another advantage of constantly rebasing your personal patches is that it forces on you the discipline of adjusting your changes to the changes in the upstream as early as possible.  If you do not rebase and choose to use merge in your workflow, your personal changes will be buried deep in the history.  When one of your many later merges with the upstream makes you resolve conflicts with such old changes, two things happen:
  • You do not remember what your own change was about, and have a hard time resolving the conflict;
  • You may be able to resolve the conflict, but what you can extract from "git log --no-merges origin.." will not be something you can eventually send upstream.  You will need to rebase before submitting.

I think the msysgit team has done a great job in packaging the system, and I had a lot of fun playing with it on Windows, which is a platform I am unfamiliar with.  I hope the readers also enjoyed this article, and with luck they may have also learned something new.





Date:January 15th, 2009 05:50 am (UTC)

simply excellent

superb article... thank u very much!!!