gitster (Junio C Hamano) (gitster) wrote,

Fun with "git grep"

While I was reviewing the code in the git tree, I stumbled across this line:

       for (i = 0; i < 256; i++) {
               struct object_entry **next = c;;
               while (next < last) {

The double-semicolon is obviously wrong. But fixing only this one did not feel right. I wondered, "How many of these mistakes are there?"

If you were in a work tree controlled by CVS, SVN, Hg, or some other version control system, or if you were not using a revision control system at all, here is what you would normally do to answer that question:

    find . -type d \( -name CVS -o -name .svn -o -name RCS \) -prune -o \
        -type f -print0 | xargs -0 grep -n -e ';;'

But this will still run grep on all the object files and other uninteresting files, so your find command would become a lot longer, like:

    find . -type d \( -name CVS -o -name .svn -o -name RCS \) -prune -o \
      -type f \! -name '*.o' \! -name '*~' -print0 | xargs -0 grep -n -e ';;'

If you are lucky and are using git, you can say this instead:

    git grep -e ';;'

This runs grep only on the files in the work tree that you have under version control, so it will not report hits in object files or editor backup files. If you want to do the same for only C source files (because we also have shell scripts, and a double-semicolon is not an error there), you can further limit the search space by saying:

    git grep -e ';;' -- '*.c'

Because "git grep" takes options, patterns (a lone pattern does not have to be preceded by -e, just as with regular grep) and commits (see below), you use a double-dash to mark the beginning of the optional list of path patterns that comes at the end of the command line.
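To see the difference for yourself, here is a small sketch you can run in a scratch directory. The file names (tracked.c, script.sh, untracked.c) are made up for illustration and are not from the git tree; the point is only that an untracked file is never searched and that the path pattern after the double-dash narrows the search to C sources.

```shell
#!/bin/sh
# Scratch repository; every file name below is hypothetical.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .

printf 'int a = 0;;\n' > tracked.c
printf 'case $x in a) ;; esac\n' > script.sh
printf 'int b = 0;;\n' > untracked.c
git add tracked.c script.sh        # untracked.c is deliberately left out

# Searches every tracked file: hits in script.sh and tracked.c only.
all_hits=$(git grep -n -e ';;')

# Same search, limited to C sources by the path pattern after "--".
c_hits=$(git grep -n -e ';;' -- '*.c')

printf 'all tracked files:\n%s\n' "$all_hits"
printf 'C sources only:\n%s\n' "$c_hits"

cd / && rm -rf "$repo"
```

The second search should report only the line in tracked.c; the legitimate ";;" in the shell script drops out, and untracked.c never shows up in either search.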

You can do the same search in a version you have scheduled for the next commit in the index:

    git grep --cached -e ';;' -- '*.c'

As with other git commands, --cached tells a command that usually works on the work tree to work only on the data in the index instead. Of course, you can also search in arbitrary commits, e.g. in both the "next" branch and the current HEAD:

    git grep -e ';;' next HEAD -- '*.c'
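The three places a search can look (work tree, index, a commit) are easy to confuse, so here is a rough sketch that makes them disagree on purpose. The repository, the file demo.c and the committer identity are all invented for the demonstration: the typo is committed, then fixed in the work tree without staging the fix.

```shell
#!/bin/sh
# Scratch repository; demo.c and the identity below are hypothetical.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .

# Commit a file that contains the typo.
printf 'int a = 0;;\n' > demo.c
git add demo.c
git -c user.name=demo -c user.email=demo@example.com \
    commit -q -m 'version with the typo'

# Fix the typo in the work tree, but do not "git add" the fix.
printf 'int a = 0;\n' > demo.c

# git grep exits non-zero when nothing matches, hence the "|| true".
worktree_hits=$(git grep -e ';;' -- '*.c' || true)        # work tree: fixed, no hit
index_hits=$(git grep --cached -e ';;' -- '*.c' || true)  # index: still has the typo
head_hits=$(git grep -e ';;' HEAD -- '*.c' || true)       # commit: still has the typo

printf 'work tree: [%s]\n' "$worktree_hits"
printf 'index:     [%s]\n' "$index_hits"
printf 'HEAD:      [%s]\n' "$head_hits"

cd / && rm -rf "$repo"
```

Note that hits found in a commit are prefixed with the name you gave for it ("HEAD:" here), which is how you tell the results apart when you name more than one commit on the command line.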

In all of these, including the "find | xargs grep" pipe, there is one practical problem that comes from a shortcoming in the set of options regular "grep" offers. You would see many lines that look like this in the output:

    for (;;) {

which is perfectly legitimate and not interesting for our purpose. We often see ugly constructs like:

    grep -e ';;' *.c | grep -v 'for .*;;'

and it quickly becomes unmanageable once you need to start excluding many false matches.

Again, you are lucky that you are using git. With "git grep", you can do this:

    git grep -e ';;' --and --not -e 'for *(.*;;' -- '*.c'

When you give more than one pattern to grep with the -e option, you can only say "lines matching this or that". But "git grep" allows you to say "lines matching this and that" by specifying --and. The above example goes one step further. By prefixing a term with --not, you can even say "lines matching this and not that".

A limited form of this grep facility is also available for finding commits by strings that appear in the commit log message, via the git-log family of commands, but that will be a separate topic.



Tags: git