You are viewing gitster

February 10th, 2009


09:41 pm - Fun with "git grep"

While I was reviewing the code in the git tree, I stumbled across this line:

       for (i = 0; i < 256; i++) {
               struct object_entry **next = c;;
               while (next < last) {

The double-semicolon is obviously wrong. But fixing only this one did not feel right. I wondered, "How many of these mistakes are there?"

If you were in a work tree controlled by CVS, SVN, Hg, or some other version control system, or if you were not using a revision control system at all, here is what you would normally do to answer that question:

    find . -type d \( -name CVS -o -name .svn -o -name RCS \) -prune -o \
        -type f -print0 | xargs -0 grep -n -e ';;'

But this will still run grep on all the object files and other uninteresting files, so your find command would become a lot longer, like:

    find . -type d \( -name CVS -o -name .svn -o -name RCS \) -prune -o \
      -type f \! -name '*.o' \! -name '*~' -print0 | xargs -0 grep -n -e ';;'

If you are lucky and are using git, you can say this instead:

    git grep -e ';;'

This runs grep only on the files in the work tree you have under version control, so it will not report hits in any object files nor editor backup files. If you want to do the same for only C source files (because we also have shell scripts, and double-semicolon is not an error there), you can further limit the search space by saying:

    git grep -e ';;' -- '*.c'

Because "git grep" can take options, pattern strings (when there is only one, it does not have to be preceded by -e, just like regular grep) and commits (see below), you use a double-dash to mark the beginning of the optional list of path patterns that comes at the end of the command line.
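To see the effect of tracking and of the path pattern side by side, here is a minimal sketch you can run in a scratch directory. The file names (demo.c, build.sh, demo.o) are made up for illustration and are not from the git tree.

```shell
#!/bin/sh
# Sketch only: a throwaway repository with hypothetical file names.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .

printf 'int x = 0;;\n' > demo.c               # tracked C file with the typo
printf 'case $1 in a) ;; esac\n' > build.sh   # tracked script; ';;' is legal here
printf 'junk;;\n' > demo.o                    # untracked build product
git add demo.c build.sh

# Every tracked file is searched; demo.o is skipped because it is untracked.
all_hits=$(git grep -l -e ';;')

# The path pattern after "--" narrows the search to C sources.
c_hits=$(git grep -l -e ';;' -- '*.c')

# With a single pattern, -e can be omitted.
c_hits_short=$(git grep -l ';;' -- '*.c')

printf 'all tracked files:\n%s\n' "$all_hits"
printf 'C sources only:\n%s\n' "$c_hits"
```

The -l option is used here only to keep the output down to file names; drop it to see the matching lines themselves.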

You can do the same search in a version you have scheduled for the next commit in the index:

    git grep --cached -e ';;' -- '*.c'

As with other git commands, --cached tells the command that usually works on the work tree to work only on the data in the index instead. Of course, you can search in an arbitrary commit, e.g. run the search in the current and in the "next" branch:

    git grep -e ';;' next HEAD -- '*.c'
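The three places to search (work tree, index, a commit) can hold three different versions of the same file. The following is a minimal sketch in a scratch repository that makes them disagree on purpose; the file name and the committer identity are placeholders for the demonstration.

```shell
#!/bin/sh
# Sketch only: one hypothetical file, demo.c, in three different states.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .

# Commit a version that contains the typo on line 1.
printf 'int a = 1;;\n' > demo.c
git add demo.c
git -c user.name=Demo -c user.email=demo@example.com -c commit.gpgsign=false \
    commit -q -m 'version with typo'

# Stage a fixed version in the index.
printf 'int a = 1;\n' > demo.c
git add demo.c

# Introduce a new typo on line 2, in the work tree only.
printf 'int a = 1;\nint b = 2;;\n' > demo.c

# "|| true" keeps "set -e" from aborting when grep finds nothing.
worktree_hits=$(git grep -n -e ';;' -- '*.c' || true)
index_hits=$(git grep -n --cached -e ';;' -- '*.c' || true)
head_hits=$(git grep -n -e ';;' HEAD -- '*.c' || true)

printf 'work tree: %s\n' "$worktree_hits"
printf 'index:     %s\n' "$index_hits"
printf 'HEAD:      %s\n' "$head_hits"
```

Hits found in a commit are prefixed with the name of the commit you gave, which is how you tell them apart when you search more than one commit at once.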

In any of these, including the "find | xargs grep" pipe, there is one practical problem that comes from a shortcoming in the set of options regular "grep" offers. You would see many lines that look like this in the output:

    for (;;) {

which is perfectly legitimate and not interesting for our purpose. We often see an ugly construct like:

    grep -e ';;' *.c | grep -v 'for .*;;'

and it quickly becomes unmanageable once you need to start excluding many false matches.

Again, you are lucky that you are using git. With "git grep", you can do this:

    git grep -e ';;' --and --not -e 'for *(.*;;' -- '*.c'

When you give more than one pattern to grep with the -e option, you can only say "lines matching this or that". But "git grep" allows you to say "lines matching this and that" by specifying --and. The above example goes one step further. By prefixing a term with --not, you can even say "lines matching this and not that".

A limited form of this grep facility is also available to look for commits by strings that appear in the commit log message via the git-log family of commands, but that will be a separate topic.
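As a small taste, and assuming that the --grep and --all-match options of "git log" are the kind of thing meant here, a sketch might look like this. The commit messages and the committer identity are invented for the demonstration.

```shell
#!/bin/sh
# Sketch only: three empty commits with hypothetical log messages.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .

# Hypothetical helper: record an empty commit with the given message.
commit() {
    git -c user.name=Demo -c user.email=demo@example.com -c commit.gpgsign=false \
        commit -q --allow-empty -m "$1"
}
commit 'fix typo in grep.c'
commit 'fix build'
commit 'grep: add option'

# More than one --grep means "this or that": all three commits match.
any_match=$(git log --format=%s --grep='fix' --grep='grep')

# --all-match turns it into "this and that": only one commit has both words.
all_match=$(git log --format=%s --all-match --grep='fix' --grep='grep')

printf 'either word:\n%s\n' "$any_match"
printf 'both words:\n%s\n' "$all_match"
```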


Comments:


From:(Anonymous)
Date:February 11th, 2009 05:41 pm (UTC)

You can always use 'ack'

Well, at least for recursive searching and for ignoring VCS directories and other uninteresting files you can use ack (http://petdance.com/ack/) instead of find + grep.
From:gitster
Date:February 12th, 2009 08:08 am (UTC)

Re: You can always use 'ack'

$ ack
dash: ack: not found

Next question?

More seriously, I do not use "find | xargs grep" myself, so a replacement for it that improves it only a bit is not attractive to me at all.

When "ack" learns to search inside the git index or git commits, and to handle boolean expressions such as the one using --and --not that I showed in the main text, you may interest me in it, but not until then.

From:(Anonymous)
Date:March 6th, 2009 10:08 am (UTC)

Re: You can always use 'ack'

Ack's charms are very seductive; it gradually replaces grep in your mind if you start using it. If your search needs *always* (not just today but well into the foreseeable future) fit within ack's notion of what files it is willing to search, you will be very happy. If your needs are not so structured, you'll often have to go back to the real grep.

ack's notions of what files it is willing to search are, at this point in time, not documented very well, though the author has promised to update the docs when he finds time. It does have an option to tell you what files it is searching, but it would have been nicer if the option showed what files it is *ignoring*.

Sitaram
From:annodomini
Date:February 23rd, 2009 05:41 pm (UTC)
Thank you for this! I just discovered git grep a day or two ago, and have found it much easier to use (not to mention faster) than 'find | xargs grep'. These tips are helpful to see more uses of git grep; I didn't realize it had boolean operations.
From:(Anonymous)
Date:March 5th, 2009 06:50 pm (UTC)

--cached vs. --staged

Nice post! Thank you and the Git developers for providing "git grep". It's another great reason for choosing Git over Subversion and CVS. However, I'd like to comment not on the wisdom of "git grep", but on the choice and common usage of option name "--cached" when referring to changes in the index. Why was this name chosen over the perhaps more appropriate name "--staged"? Every time I see "--cached" I almost invariably translate this to "--staged" in my head. The term "cached" strikes me as too general purpose to suggest the "next changes to commit", but staged seems more specific in that sense. Easy Git, for example, uses "--staged" in favour of "--cached". What is your opinion on this decision?
From:gitster
Date:March 5th, 2009 08:36 pm (UTC)

Re: --cached vs. --staged

There actually is not much to decide.

git has always called the act of telling it to remember the current state of the contents "to update the cache" (hence --cached), and the mechanism used for that has been called "the cache" throughout the system, both in code and in the UI. Even the .git directory itself was called .dircache in very early versions.

It is only fairly recently that some people started using different terminology "to stage" and "staging area", and patches were accepted to allow the use of this colloquial synonym to reduce confusion for new people who learned git from these folks.

It is just like natural language. Once enough people use an expression, it becomes part of the accepted language, even if it is initially considered grammatically incorrect.