Needle in a haystack – Finding a text string within a bunch of files using ‘grep’


In general the ‘grep’ command is an incredibly useful command-line tool, but this is a particularly amazing bit of grep magic! Let’s say you’re working on a website for a client who is using a template system, or framework, that you’re not familiar with, and you need to change something like a label on a form. This should be a five-minute job, at most. So you start opening files, searching for the currently used string of text, take a coffee break, search some more… and 15-20 minutes later you’re still baffled. Template systems and frameworks (especially frameworks!) can be a real pain if you’re not familiar with their methodology, or the MVC concept in general. There is an easier way to find the needle in this haystack! Just SSH into the server, navigate to the directory that contains the template folder (or the document root of the site, if in doubt) and run this grep command:

grep -H -r -i -n "some string" public_html/templates/

grep then returns something like this:

public_html/templates/obscure_file.php:13:<label>some strings are tough to find</label>

This may look a little odd, but first grep tells you what file it found the string in (public_html/templates/obscure_file.php) and then the line number (13), followed by the complete line of text the string was found in.
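
If you ever want to hand those results to a script, the same three parts can be pulled apart on the colons. This is just a minimal sketch, assuming a POSIX shell and reusing the sample line from above; the variable names (match, file, line, text) are mine and purely for illustration:

```shell
# Sample match line copied from the grep output above.
match='public_html/templates/obscure_file.php:13:<label>some strings are tough to find</label>'

# Split on the first two colons: file name, line number, then the rest of the line.
# Setting IFS on the same line as 'read' changes it for that one command only.
IFS=: read -r file line text <<EOF
$match
EOF

echo "file: $file"
echo "line: $line"
echo "text: $text"
```

Any further colons in the matched text stay in the last variable, so only a colon in the file name itself would throw this off.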

Also worth mentioning: if the string is found in multiple files, grep will print a line like that for each match. When this happens the output can get kind of crazy if there are a lot of long lines involved. In cases like this I like to pipe grep’s output to the ‘cut’ command like so:

grep -H -r -i -n "some string" public_html/templates/ | cut -c 1-80

Here 80 is a number smaller than the character width of your terminal, which keeps the long lines from wrapping onto a new line and turning the output into a complete cluster f@$k.
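
Rather than hard-coding 80, you could let the shell work out the limit. A rough sketch, assuming your shell sets the COLUMNS variable (interactive shells usually do, scripts often don’t, hence the fallback); the made-up long line just stands in for real grep output:

```shell
# COLUMNS is usually set by interactive shells; fall back to 80 when it is not.
cols=${COLUMNS:-80}
limit=$((cols - 1))   # stay one character short of the terminal width

# A stand-in for one very long line of grep output (well over 200 characters).
long_line=$(printf 'obscure.php:1:%0200d' 0)

# Keep only the first $limit characters of each line.
trimmed=$(printf '%s\n' "$long_line" | cut -c "1-$limit")
echo "kept ${#trimmed} of ${#long_line} characters"
```

In real use you would pipe the grep command from above into the same cut -c "1-$limit" instead of the stand-in line.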

*An explanation of command options used in these examples:

'grep'
   -H, --with-filename    print the file name for each match
   -r, --recursive        search within sub-directories
   -i, --ignore-case      ignore upper/lower case distinctions
   -n, --line-number      print line number with output
'cut'
   -c, --characters=LIST  select only these characters
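
If the single-letter options are hard to remember, the long names from the list above can be spelled out instead. A small sketch, assuming GNU grep (other grep implementations may not accept every long option); the temporary directory and form.php are throwaway names invented for the demo:

```shell
# Build a tiny throwaway template tree to search.
demo=$(mktemp -d)
mkdir -p "$demo/templates"
printf '%s\n' '<p>intro</p>' '<label>Some String here</label>' > "$demo/templates/form.php"

# Short options, as used in the examples above.
short=$(grep -H -r -i -n "some string" "$demo/templates")

# The same search written with the long option names.
long=$(grep --with-filename --recursive --ignore-case --line-number "some string" "$demo/templates")

[ "$short" = "$long" ] && echo "same output"
rm -rf "$demo"
```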

 

One last bit of trickery worth mentioning: if you wish to search for multiple strings in an ‘OR’ scenario, you would do something like this:

cat /var/log/syslog | grep -i 'fail\|err'

The line above searches the system log (syslog) file for either the string ‘fail’ or ‘err’ (as in ‘error’). Occasionally programs will just abbreviate the word ‘error’ as ‘err’, so this way we catch both possibilities. The key here is the ‘\|’ part: a backslash followed by a pipe character. Don’t forget the single quotes around the search terms; without them the shell consumes the backslash before grep ever sees it, and this won’t work! And although I am simply searching through a single file here to keep my example simple (and show a different usage pattern), one could easily apply this to the above scenarios as well!
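
One caveat: ‘\|’ in a plain grep pattern is a GNU extension, so it may not behave the same on every system. Two spellings that should be more portable are extended regular expressions (-E) and repeated -e options. A quick sketch with a made-up log file, so nothing here depends on your real syslog:

```shell
# A throwaway log with two lines that should match and two that should not.
log=$(mktemp)
printf '%s\n' 'boot ok' 'disk FAILURE on sda' 'net: err 42' 'all good' > "$log"

# Extended regex: a bare pipe means OR, no backslash needed.
extended=$(grep -i -E 'fail|err' "$log")

# One -e per search term: a line matches if any of the patterns match.
multi=$(grep -i -e 'fail' -e 'err' "$log")

echo "$extended"
rm -f "$log"
```

Either form drops into the recursive search from earlier in the same way, for example by adding -E to the -H -r -i -n options.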

And remember, even though my example scenarios primarily deal with modifying template systems and frameworks, that’s purely a hypothetical scenario! Obviously, this technique could easily apply to any number of problems a developer, system administrator or even an end-user might encounter on a day-to-day basis (logs, configs, emails, etc.). ‘grep’ is a powerful tool, and this little usage pattern is worth remembering, or at least worth bookmarking so you can get back here in a flash when the need does arise, and it will! If you use Linux, or any *nix, grep should definitely be in your tool-box!

By the way, there are even grep tools available for MS Windows and OSX, so this might even be helpful to non-Linux Losers ;)

* I try to keep the command option explanations as close as possible to standard ‘--help’ output, although slight changes have been made for clarity’s sake.