February 10th, 2009

09:41 pm - Fun with "git grep"

While I was reviewing the code in the git tree, I stumbled across this line:

       for (i = 0; i < 256; i++) {
               struct object_entry **next = c;;
               while (next < last) {

The double semicolon is obviously a typo. But fixing only this one did not feel right, and I wondered, "How many of these mistakes are there?"

If you were in a work tree controlled by CVS, SVN, Hg, or some other version control system, or if you were not using a revision control system at all, here is what you would normally do to answer that question:

    find . -type d \( -name CVS -o -name .svn -o -name .hg -o -name RCS \) -prune -o \
        -type f -print0 | xargs -0 grep -n -e ';;'

But this still runs grep on all the object files, editor backups, and other uninteresting files, so your find command would have to grow a lot longer, like:

    find . -type d \( -name CVS -o -name .svn -o -name .hg -o -name RCS \) -prune -o \
      -type f \! -name '*.o' \! -name '*~' -print0 | xargs -0 grep -n -e ';;'

If you are lucky and are using git, you can say this instead:

    git grep -e ';;'

This runs grep only on the files in the work tree that you have under version control, so it will not report hits in object files or editor backup files. If you want to do the same for C source files only (we also have shell scripts, where ';;' legitimately ends each arm of a case statement and is not an error), you can further limit the search space by saying:
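The difference is easy to see in a scratch repository. The following is a minimal sketch, assuming a POSIX shell with git and mktemp installed; the file names hello.c, hello.o and hello.c~ are made up for the demo. Only hello.c is added to the index, so it is the only file git grep reports, while the find/grep walk also reports the untracked object file and the editor backup.

```shell
#!/bin/sh
# Minimal sketch (assumes git and mktemp are installed).
# All file names below are hypothetical, created just for this demo.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .

# A tracked C source file with a stray double semicolon.
printf 'int x = 1;;\nint y = 2;\n' > hello.c
git add hello.c

# Untracked files that also contain ';;': build output and an editor backup.
printf 'int x = 1;;\n' > hello.o
cp hello.c 'hello.c~'

# A plain find/grep walk (skipping .git) reports all three files.
find . -name .git -prune -o -type f -exec grep -n -e ';;' {} +

# git grep looks only at tracked files, so it reports hello.c alone:
#   hello.c:1:int x = 1;;
git grep -n -e ';;'
```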

    git grep -e ';;' -- '*.c'

Because "git grep" takes options, pattern strings (when there is only one, it does not have to be preceded by -e, just like with regular grep), and commits (see below), you use a double dash to mark where the optional list of path patterns at the end of the command line begins.
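A minimal sketch of the path limit, under the same assumptions (git and mktemp available; hello.c and run.sh are hypothetical files in a throwaway repository). The shell script legitimately needs ';;' to end a case arm, so the '*.c' path pattern after the double dash is what keeps it out of the results; the last command also drops -e because there is only one pattern.

```shell
#!/bin/sh
# Minimal sketch (assumes git and mktemp are installed; file names are made up).
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .

# In C a stray ';;' is a typo ...
printf 'int x = 1;;\n' > hello.c
# ... but in a shell case statement ';;' ends each arm and is required.
printf 'case "$1" in\n*) echo hi ;;\nesac\n' > run.sh
git add hello.c run.sh

# Every tracked file is searched, so both files show up:
#   hello.c:1:int x = 1;;
#   run.sh:2:*) echo hi ;;
git grep -n -e ';;'

# Everything after "--" is a path pattern; with a single pattern
# the -e may be omitted, just as with regular grep:
#   hello.c:1:int x = 1;;
git grep -n ';;' -- '*.c'
```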

You can also run the same search against the version you have staged in the index for the next commit:

    git grep --cached -e ';;' -- '*.c'

As with other git commands, --cached tells a command that normally works on the work tree to work on the contents of the index instead. You can also search arbitrary commits; for example, to run the search in both the current branch (HEAD) and the "next" branch:

    git grep -e ';;' next HEAD -- '*.c'
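The work tree, the index and each commit can hold different contents, and the following sketch (assuming git and mktemp are installed) makes that visible. The branch named next, the files a.c and b.c, and the ci helper are all invented for the demo: the typo in a.c is fixed only on next, and b.c is staged with a typo that is then fixed in the work tree only. Note how matches found in a commit are prefixed with the name you gave for it.

```shell
#!/bin/sh
# Minimal sketch (assumes git and mktemp are installed).
# Branch and file names are hypothetical, created just for this demo.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
# Helper so the demo does not depend on a configured user identity.
ci() { git -c user.name=Demo -c user.email=demo@example.com commit -q "$@"; }

# The current branch commits a.c with a stray ';;'.
printf 'int x = 1;;\n' > a.c
git add a.c
ci -m 'add a.c'

# A branch called "next" where the typo is fixed; then switch back.
git checkout -q -b next
printf 'int x = 1;\n' > a.c
ci -a -m 'fix a.c'
git checkout -q -

# Stage b.c with a typo, then fix it in the work tree only.
printf 'int y = 2;;\n' > b.c
git add b.c
printf 'int y = 2;\n' > b.c

# Work tree: b.c is already fixed there, so only a.c matches.
#   a.c:int x = 1;;
git grep -e ';;' -- '*.c'

# Index: the staged b.c still has the typo.
#   a.c:int x = 1;;
#   b.c:int y = 2;;
git grep --cached -e ';;' -- '*.c'

# Commits: "next" has no hits; hits in HEAD carry a "HEAD:" prefix.
#   HEAD:a.c:int x = 1;;
git grep -e ';;' next HEAD -- '*.c'
```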

All of these, including the "find | xargs grep" pipeline, share one practical problem, which comes from a shortcoming in the set of options regular "grep" offers. You would see many lines like this in the output:

    for (;;) {

which is perfectly legitimate and not interesting for our purpose. We often see ugly constructs like:

    grep -e ';;' *.c | grep -v 'for .*;;'

and it quickly becomes unmanageable once you need to start excluding many false matches.

Again, you are lucky that you are using git. With "git grep", you can do this:

    git grep -e ';;' --and --not -e 'for *(.*;;' -- '*.c'

When you give more than one pattern to regular grep with the -e option, you can only say "lines matching this or that". "git grep", however, also lets you say "lines matching this and that" by specifying --and. The example above goes one step further: by prefixing a term with --not, you can even say "lines matching this and not that".

A limited form of this grep facility is also available in the git log family of commands, to look for commits by strings that appear in their log messages, but that will be a separate topic.




Date: March 5th, 2009 08:36 pm (UTC)

Re: --cached vs. --staged

There actually is not much to decide.

git has always called the act of telling it to remember the current state of the contents "to update the cache" (hence --cached), and the mechanism used for that has been called "the cache" throughout the system, both in code and in the UI. Even the .git directory itself was called .dircache in very early versions.

It is only fairly recently that some people started using the different terms "to stage" and "staging area", and patches were accepted to allow this colloquial synonym, to reduce confusion for new people who learned git from these folks.

It is just like natural language. Once enough people use an expression, it becomes part of the accepted language, even if it is initially considered grammatically incorrect.
gitster's journal
