Sunday, 12 February 2012

Shell Scripting: Counting Occurrences of a Character

I recently found myself needing to count the occurrences of a character in a line from within a shell script. I was working with a delimited file and needed to know how many columns a given line had, which is one more than the number of delimiters in it (the count would be consistent within a file, but could differ between files, depending on the version of the code that produced it).

Surprisingly, it took a bit more digging than I thought would be needed for such a simple task. Counting lines is easy, but characters within a line? It's not a difficult thing to do in Perl or awk, but launching their respective interpreters seemed a bit heavyweight.

tr to the rescue. This under-used command line utility translates from one character set to another (a set can just be a single character). It can also delete characters from its input with the -d switch and, using the -c (complement) switch, can work on a set that's the inverse of the one specified. Combine the two and tr deletes everything except the characters in the set, so you end up with this to pick out occurrences of the letter 'e':


danny@khisanth ~ [2] % echo one two three four five | tr -cd 'e'
eeee%


That % is my shell indicating that the output didn't end with a newline: tr deleted that along with everything else, leaving just the 'e's. Once you have those, wc -c can do the rest:


danny@khisanth ~ [3] % echo one two three four five | tr -cd e | wc -c
4
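
Applied to the original problem of counting columns, the same pipeline might be wrapped up as below. This is only a sketch: count_columns is a name made up for illustration, and it assumes the delimiter is a single character that never appears inside a field and that tr doesn't treat specially (so not a backslash or a hyphen).

```shell
# Hypothetical helper: count the fields in a delimited line.
# $1 is the line, $2 is the single-character delimiter.
count_columns() {
    # Delete everything except the delimiter, then count what is left.
    delims=$(printf '%s' "$1" | tr -cd "$2" | wc -c)
    # A line with N delimiters has N + 1 fields.
    echo $(( $delims + 1 ))
}

count_columns 'a,b,c,d' ','
```

printf '%s' is used rather than echo so that nothing in the line is interpreted as an option or an escape sequence.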


Annoyingly, wc on some Unixes pads its output with leading spaces, so if you just want the number itself, you might need to play with your shell's string manipulation functions to get something neater. For example, in zsh:


danny@khisanth ~ [4] % echo ${$(echo one two three four five | tr -cd e | wc -c)// /}
4
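
The nested expansion above is specific to zsh. In other Bourne-style shells, one alternative (a sketch, not the only way) is to pass the count through arithmetic expansion, which evaluates the number and hands it back without any padding. The variable name count is arbitrary.

```shell
# Arithmetic expansion yields the bare number, whatever padding wc added.
count=$(( $(echo one two three four five | tr -cd e | wc -c) ))
echo "$count"
```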
