February 10th, 2009

09:41 pm - Fun with "git grep"

While I was reviewing the code in the git tree, I stumbled across this line:

       for (i = 0; i < 256; i++) {
               struct object_entry **next = c;;
               while (next < last) {

The double-semicolon is obviously wrong. But fixing only this one did not feel right. I wondered, "How many of these mistakes are there?"

If you were in a work tree controlled by CVS, SVN, Hg, or some other version control system, or if you were not using a revision control system at all, here is what you would normally do to answer that question:

    find . -type d \( -name CVS -o -name .svn -o -name RCS \) -prune -o \
        -type f -print0 | xargs -0 grep -n -e ';;'

But this will still run grep on all the object files and other uninteresting files, so your find command would become a lot longer, like:

    find . -type d \( -name CVS -o -name .svn -o -name RCS \) -prune -o \
      -type f \! -name '*.o' \! -name '*~' -print0 | xargs -0 grep -n -e ';;'

If you are lucky and are using git, you can say this instead:

    git grep -e ';;'

This runs grep only on the files in the work tree that are under version control, so it will not report hits in object files or editor backup files. If you want to do the same for only C source files (because we also have shell scripts, and double-semicolon is not an error there), you can further limit the search space by saying:

    git grep -e ';;' -- '*.c'

Because "git grep" can take options, pattern strings (when there is only one, it does not have to be preceded by -e, just like regular grep) and commits (see below), you use double-dash to mark the beginning of the optional list of path patterns that come at the end of the command line.
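If you want to see this for yourself, here is a minimal sketch that builds a throwaway repository and runs the searches above. The file names (main.c, build.sh, main.c~) and the temporary directory are made up for illustration; nothing in it comes from the git tree itself.

```shell
#!/bin/sh
# Sketch only: a scratch repository with hypothetical files.
set -e
demo=$(mktemp -d)
cd "$demo"
git init -q

printf 'int x = 0;;\n' > main.c                # C file with the typo
printf 'case $1 in a) : ;; esac\n' > build.sh  # shell script, ";;" is legitimate here
git add main.c build.sh                        # both files are now tracked

printf 'int y = 0;;\n' > 'main.c~'             # editor backup, left untracked

# Tracked files only: the untracked backup is never searched.
all_hits=$(git grep -n -e ';;')

# Same search, limited to C sources by the path pattern after "--".
c_hits=$(git grep -n -e ';;' -- '*.c')

# With a single pattern, -e can be dropped.
short_form=$(git grep -n ';;' -- '*.c')

printf '%s\n' "$all_hits"
printf '%s\n' "$c_hits"
```

The first search should list main.c and build.sh but not the backup file, and the last two searches should give the same result as each other.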

You can run the same search on the version you have staged in the index for the next commit:

    git grep --cached -e ';;' -- '*.c'

As with other git commands, --cached tells the command that usually works on the work tree to work only on the data in the index instead. Of course, you can search in an arbitrary commit, e.g. run the search in the current and in the "next" branch:

    git grep -e ';;' next HEAD -- '*.c'
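The three places to search (work tree, index, commit) can be told apart with a small experiment. This is only a sketch under assumptions: main.c is a hypothetical file, the identity passed to "git commit" is a placeholder, and only HEAD is searched because the scratch repository has no "next" branch.

```shell
#!/bin/sh
# Sketch only: one hypothetical file in three different states.
set -e
demo=$(mktemp -d)
cd "$demo"
git init -q

# State 1: committed with one typo.
printf 'int a = 0;;\n' > main.c
git add main.c
git -c user.name=demo -c user.email=demo@example.com \
    commit -q -m 'first version'

# State 2: staged in the index with two typos.
printf 'int a = 0;;\nint b = 1;;\n' > main.c
git add main.c

# State 3: work tree with three typos, not staged.
printf 'int a = 0;;\nint b = 1;;\nint c = 2;;\n' > main.c

worktree_hits=$(git grep -n -e ';;' -- '*.c')          # searches the work tree
index_hits=$(git grep --cached -n -e ';;' -- '*.c')    # searches the index
head_hits=$(git grep -n -e ';;' HEAD -- '*.c')         # searches the commit

printf '%s\n' "$worktree_hits" "$index_hits" "$head_hits"
```

Hits found in a commit are prefixed with the name of the commit you gave, which is how you tell them apart when you search several commits at once.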

In any of these, including the "find | xargs grep" pipe, there is one practical problem that comes from a shortcoming in the set of options regular "grep" offers. You would see many lines that look like this in the output:

    for (;;) {

which is perfectly legitimate and not interesting for our purpose. We often see an ugly construct like:

    grep -e ';;' *.c | grep -v 'for .*;;'

and it quickly becomes unmanageable once you need to start excluding many false matches.

Again, you are lucky that you are using git. With "git grep", you can do this:

    git grep -e ';;' --and --not -e 'for *(.*;;' -- '*.c'

When you give more than one pattern to grep with the -e option, you can only say "lines matching this or that". But "git grep" allows you to say "lines matching this and that" by specifying --and. The above example goes one step further: by prefixing a term with --not, you can even say "lines matching this and not that".

A limited form of this grep facility is also available to look for commits by strings that appear in the commit log message via the git-log family of commands, but that will be a separate topic.






Date:February 11th, 2009 05:41 pm (UTC)

You can always use 'ack'

Well, at least for recursive searching and for ignoring VCS directories and other uninteresting files you can use ack (http://petdance.com/ack/) instead of find + grep.
Date:February 12th, 2009 08:08 am (UTC)

Re: You can always use 'ack'

$ ack
dash: ack: not found

Next question?

More seriously, I do not use "find | xargs grep" myself, so a replacement for it that improves it only a bit is not attractive to me at all.

When "ack" learns to search inside the git index or git commits, and to handle boolean expressions such as the --and --not one I showed in the main text, you may interest me in it, but not until then.

Date:March 6th, 2009 10:08 am (UTC)

Re: You can always use 'ack'

Ack's charms are very seductive; it gradually replaces grep in your mind if you start using it. If your search needs *always* (not just today but well into the foreseeable future) fit within ack's notion of what files it is willing to search, you will be very happy. If your needs are not so structured, you'll often have to go back to the real grep.

ack's notions of what files it is willing to search are, at this point in time, not documented very well, though the author has promised to update the docs when he finds time. It does have an option to tell you what files it is searching, but it would have been nicer if the option showed what files it is *ignoring*.

gitster's journal
