Monday, July 14, 2008

http://eriwen.com/tools/grep-is-a-beautiful-tool/

grep is a beautiful tool
July 13th, 2008 | Category: Productivity, Tools

grep (Global Regular Expression Print) is a staple of every command-line user’s toolbox. Like find, it gets much of its power from being combined with other tools, and it can increase your productivity significantly.

The following short tutorial will help you realize the power of this simple and very useful command. If you are on Windows and haven’t already done so, download and install Cygwin. If you are also new to regular expressions (regex), keep a good regular expressions reference at hand to get you started.

Tutorial

Suppose we want to search for duplicate functions in all of our JavaScript files. Let’s start with the basics and work up to it. The same technique works for a TON of similar checks, like:

* Finding duplicate HTML IDs
* Counting how many times a CSS class is used
* Finding duplicate Java classes
* many, many more…

# Search JS files in this directory for "function"
grep function *.js

The above command prints the lines containing "function" in all JavaScript files in the current directory (NOT subdirectories). When *.js matches more than one file, grep prefixes each line with its file name; -H forces that prefix even for a single file, and -n adds line numbers, which makes the output much easier to act on:

# Print filenames, line #s, and lines that start with "(white space)function name" or "(white space)name = function"
grep -EHn "^\s*(function \w+|\w+ = function)" *.js

Depending on how you format your JavaScript files, a pattern like this skips comments, anonymous functions, and words like "functionality", giving you better results.
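To see what this pattern keeps and what it skips, here is a minimal sketch run against a small sample file. The file name sample.js and its contents are made up for illustration, and \s and \w are GNU grep extensions, so behavior may differ with other grep implementations.

```shell
# Hypothetical sample file, written to a scratch directory for illustration
dir=$(mktemp -d)
cd "$dir" || exit 1
cat > sample.js <<'EOF'
// function notes: extra functionality lives here
function init() {}
  render = function () {};
setTimeout(function () {}, 10);
EOF

# Only named declarations and "name = function" assignments at line start match
matches=$(grep -EHn "^\s*(function \w+|\w+ = function)" sample.js)
echo "$matches"
```

Only lines 2 and 3 of the sample are printed. Note that because the pattern is anchored to the start of the line, an assignment written as "var render = function" would not match; the name has to come right after the leading whitespace.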

# Print a sorted list of "function name" matches
grep -Eho "^\s*function \w+" *.js | sort

-o prints only the part of the line that matches the regular expression, -E enables extended regex, and -h suppresses printing of the file name. I then pipe to sort, which sorts the output so that it is an ordered list of "function name" entries. If you don’t have a lot of files/functions to go through, you can just scan the list and note the duplicate function names you see. Let’s go a step further for those that DO have a big list:

# Print only duplicate function names
grep -hEo "^\s*function \w+" *.js | sort | uniq -d

There we go! That will list only the duplicated functions. I know that we can expand this with awk or other tools to get the file names and line numbers of the duplicates, but I don’t want to get into explaining the details of awk ;). I actually had it in this article and then removed it, so leave a comment or contact me if you want the code for that.
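For readers who want file names and line numbers without awk, one possible approach is to feed each duplicated name back into grep. This is only a sketch, not the code that was removed from the article; the files a.js and b.js are hypothetical, and \s, \w, and \b assume GNU grep.

```shell
# Hypothetical sample files, created in a scratch directory for illustration
dir=$(mktemp -d)
cd "$dir" || exit 1
printf 'function init() {}\nfunction render() {}\n' > a.js
printf '// helper functions\nfunction init() {}\n' > b.js

# For each duplicated name, list every definition with file name and line number.
# read splits "function init" into the keyword (discarded) and the name.
dupes=$(grep -hEo "^\s*function \w+" *.js | sort | uniq -d |
  while read -r _ name; do
    grep -EHn "^\s*function $name\b" *.js
  done)
echo "$dupes"
```

With these sample files, the output points at the two definitions of init. One caveat: -o keeps any leading whitespace in the match, so definitions that are indented differently may not be reported as duplicates by uniq -d.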

Other Examples

# Count the lines containing "function" in each JS file
grep -c function *.js

# Print lines that DO NOT have "function"
grep -v function *.js

# List processes that match "pidgin" (non-Windows)
ps -ef | grep pidgin
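
As a quick check of what -c and -v actually report, here is a small sketch using two made-up files (one.js and two.js are assumptions for illustration). It shows that -c counts matching lines per file, not individual matches, and that -v prints every line that does not match.

```shell
# Hypothetical sample files, created in a scratch directory for illustration
dir=$(mktemp -d)
cd "$dir" || exit 1
printf 'function a() {}\nvar x = 1;\nfunction b() {}\n' > one.js
printf 'var y = 2;\n' > two.js

# -c prints one "file:count" line per file, including files with zero matches
counts=$(grep -c function *.js)
echo "$counts"
# one.js:2
# two.js:0

# -v inverts the match: only lines WITHOUT "function" are printed
others=$(grep -v function *.js)
echo "$others"
# one.js:var x = 1;
# two.js:var y = 2;
```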

Conclusion

grep is one of the most used command-line tools, and the output of other commands is often piped to it for filtering. Understanding it is essential to increasing productivity on the command line. There is so much more to grep than what I’ve shown here, and it would be cool to see your best uses in the comments!
