Wednesday, 7 October 2009

Regular Expressions in Erlang

My favourite general-purpose language is still Perl. It's not that it's the best at everything, but that it's good enough for just about anything (and better than most).

Probably as a side-effect of what many would consider a misguided affection, one of the first things I look at in a new language is its regular expression support. Mostly, this is a no-brainer: Perl 5 syntax is the de facto standard, and a compatible engine (PCRE) is available as a library to link into your favourite C-binding language.

So I was initially disappointed when looking for advice on regular expressions in Erlang. The first hit on Google (at the time of this writing) is here, and it points to the regex module. Unfortunately, that module is really, really limited compared to the mind-bending flexibility of the Perl 5 engine.

The limitation mattered. I'm going to re-write my CTCS clone in Erlang, and it makes quite a lot of use of regular expressions to extract information from the various messages ctorrent sends throughout its lifetime. In this case, it was a bit of a show-stopper: there's absolutely no point in doing the re-write if creating the new version is going to be more difficult than improving the old one.

But then, salvation! The re module, an (almost) direct mapping onto PCRE!

Suddenly, extracting groups from a string was easy:


regex_example() ->
    Name = "Danny Woods",
    %% all_but_first drops the whole-match entry; list returns each group as a string
    {match, [Forename, Surname]} =
        re:run(Name, "(\\w+)\\s+(\\w+)", [{capture, all_but_first, list}]),
    {Forename, Surname}.


The escaped backslashes are ugly, but the direct capturing of groups as a list is pretty sweet.
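
One thing the snippet above glosses over is what happens when the input doesn't match: the pattern match on { match, ... } simply crashes with a badmatch. For parsing messages that may not always have the expected shape, something like the following might be a gentler starting point. It's only a sketch: the module name regex_sketch and the function split_name/1 are made up for illustration, and the pattern is compiled on every call to keep the example short (in real code you would probably compile it once and pass it around).

```erlang
-module(regex_sketch).
-export([split_name/1]).

%% Hypothetical helper: returns {ok, {Forename, Surname}} when the input
%% looks like two words separated by whitespace, or nomatch otherwise.
split_name(Name) ->
    %% re:compile/1 turns the pattern into a compiled form that re:run/3 accepts.
    {ok, Pattern} = re:compile("(\\w+)\\s+(\\w+)"),
    case re:run(Name, Pattern, [{capture, all_but_first, list}]) of
        {match, [Forename, Surname]} ->
            {ok, {Forename, Surname}};
        nomatch ->
            nomatch
    end.
```

Calling regex_sketch:split_name("Danny Woods") should give back the two names wrapped in an ok tuple, while a single word such as "Danny" falls through to nomatch instead of crashing the caller.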
