Regex Best Practices

Posted on : 11-12-2009 | By : Tony Stubblebine

My book, Regular Expression Pocket Reference, has sold well over 30k copies and I’m constantly surprised how often I talk to someone who claims to have a copy of the book on their desk. The thing about that book, though, is that I’m not nearly smart enough from a nuts/bolts or math angle to be qualified to write it. I muddled through, and with the help of amazing tech reviewers and a lot more work than it should have taken, the end result is a pretty good book.

However, by virtue of not starting out as a regex expert, I have a lot more empathy for the everyday coder who just wants to get these suckers to work. So, once the book was published, I started working on tips for everyday use.

Here’s one of my favorites, a presentation on Regular Expression Best Practices. I think I gave this at a Perl Mongers meeting a few years ago. Excuse the Perl code; all of the ideas are universal.

The basic premise of the presentation is that regular expressions are inherently difficult to write, maintain, and get right, but that we could do much better if we applied a few simple (best) practices.

Here are the inherent reasons:
A.) They have a crummy, terse syntax.
B.) We (normal programmers) don’t use them enough to become proficient.
C.) They are applied to some dirty, hard-to-verify (that’s why we’re writing the regex) data.

Given that, we (normal programmers) then choose to ignore the normal practices of programming, practices that we use reliably with expressive, clear languages that we are experts in. The presentation identifies those normal practices and then calls them regex best practices: use whitespace, code structure, and code verification/testing. Plus, the presentation has one of my favorite security gotchas, a favorite quote, and some common regex mistakes.
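The slides themselves use Perl and aren't reproduced here. As a rough sketch of the whitespace and structure practices, here is a Python version using `re.VERBOSE`, Python's counterpart to Perl's `/x` flag. The date format, the `DATE_RE` name, and the `parse_date` helper are all made up for illustration.

```python
import re

# Hypothetical example: match an ISO-style date such as 2009-12-11.
# re.VERBOSE ignores whitespace in the pattern and allows # comments,
# so the regex can be laid out and documented like ordinary code.
DATE_RE = re.compile(
    r"""
    \A                                          # start of string
    (?P<year>  [0-9]{4} )                       # four-digit year
    -
    (?P<month> 0[1-9] | 1[0-2] )                # month 01-12
    -
    (?P<day>   0[1-9] | [12][0-9] | 3[01] )     # day 01-31
    \Z                                          # end of string, no trailing newline
    """,
    re.VERBOSE,
)


def parse_date(text):
    """Return (year, month, day) as ints, or None if text does not match."""
    m = DATE_RE.match(text)
    if m is None:
        return None
    return int(m.group("year")), int(m.group("month")), int(m.group("day"))
```

Named groups and one piece of the pattern per line are the regex equivalent of descriptive variable names and indentation.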

Slashdot Review for Regular Expression Pocket Reference

Posted on : 26-03-2008 | By : Tony Stubblebine

Michael J. Ross gave the second edition of Regular Expression Pocket Reference a score of 9/10 in his Slashdot review. He was particularly impressed by the lack of errors.

As of this writing, there are no unconfirmed errata (those submitted by readers but not yet checked by the author to see whether they are valid), and no confirmed ones, either. In fact, in my review of the first edition, published in 2004, it was noted that there were no unconfirmed errata, despite the book being out for some time prior to that review. The most likely explanation is that the author — in addition to any technical reviewers — did a thorough job of checking all of the regular expressions in the book, along with the sample code that make use of them. These efforts have paid off with the apparent absence of any errors in this new edition — something unseen in any other technical book with which I am familiar.

I’m sure that the book isn’t actually error-free, but the fact that it can pass as such is a tribute to the tech reviewers, Jeffrey Friedl, Philip Hazel, Steve Friedl, Ola Bini, Ian Darwin, Zak Greant, Ron Hitchens, A.M. Kuchling, Tim Allwine, Schuyler Erle, David Lents, Evan Henshaw-Plath, Rich Bowen, Eric Eisenhart, and Brad Merrill, and to my editors, Andy Oram, Nat Torkington, and Linda Mui. That’s a lot of people for such a small book, but the draft I turned in warranted them. Thank you.

My goals for the second edition were to increase coverage for things that I used (it turns out that one of the best reasons to write a book is so you can look things up later) and to add content for system administrators (who, based on feedback, seemed like the biggest users of the book). I’m a Ruby developer now, so this edition has a Ruby chapter, plus I added an Apache chapter and a cookbook of common regular expressions for the system administrators.

People often ask me why I covered so many implementations. The answer is that, as a web developer, I used regular expressions in so many places: Ruby/Perl, JavaScript, shell, Vim, and Apache. I bet system administrators are the same way.

Make sure to buy a few copies from Amazon.

Regular Expression Pocket Reference, 2nd Edition

Posted on : 23-07-2007 | By : Tony Stubblebine

The second edition of my Regular Expression Pocket Reference is now available on Amazon.

I added chapters for Ruby, Apache (including Rewrite Rules), and a cookbook for common recipes.

Nice Review of Regular Expression Pocket Reference

Posted on : 24-06-2006 | By : Tony Stubblebine

Brian Turner wrote a nice review of my book, Regular Expression Pocket Reference, for Free Software Magazine. As part of the review process he describes the relevance to free software. Here’s how my book stands up:

By mastering regular expressions in any implementation, you have prepared yourself to use free software to the best advantage. Three of the P’s referred to in the LAMP acronym are represented: Perl, PHP, and Python. With this reference at your side, you can demonstrate the usefulness of free software by finding answers and solving problems quickly using regular expressions. Regular expressions are not unique to free software, but you will find them fully implemented in many free software tools. While not specifically promotional of proprietary software, .NET and C# are covered.

Guess I should have left out the .NET part.

Graphing Regular Expressions

Posted on : 26-04-2006 | By : Tony Stubblebine

Cool article on graphing regular expressions over at Unix Review.

Habits for Successful Regular Expressions

Posted on : 17-06-2005 | By : Tony Stubblebine

O’Reilly just announced Damian Conway’s Perl Best Practices book. I tech reviewed the regex chapter – it’s full of great advice.

If you find that’s too much advice, let me recommend my own Five Habits for Successful Regular Expressions.

If that’s still too much advice, let me leave you with just two hints: use extended whitespace and test. Nobody can read regular expressions, so the least you can do is put in some line breaks and comments. Also, you’re going to be tweaking your regex. Everybody does this; regex is the ultimate in code-and-fix development. What you need is a list of test cases that you can run after each tweak. I recommended a quick and dirty Perl test harness in the article above.
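This is not the harness from the article, which was in Perl. It is a minimal Python sketch of the same idea: keep a list of inputs with the expected result and rerun it after every tweak. The ZIP code pattern, the `CASES` list, and the `run_cases` helper are assumptions chosen only for illustration.

```python
import re

# Hypothetical pattern under test: a US ZIP code, optionally ZIP+4.
ZIP_RE = re.compile(
    r"""
    \A
    [0-9]{5}              # five-digit ZIP
    (?: - [0-9]{4} )?     # optional +4 extension
    \Z
    """,
    re.VERBOSE,
)

# (input, should_match) pairs. Add a case every time a tweak fixes a bug,
# so the same bug cannot quietly come back.
CASES = [
    ("94110", True),
    ("94110-1234", True),
    ("9411", False),
    ("94110-12", False),
    ("94110\n", False),
]


def run_cases(pattern, cases):
    """Return (input, expected, actual) for every case that failed."""
    failures = []
    for text, expected in cases:
        actual = pattern.search(text) is not None
        if actual != expected:
            failures.append((text, expected, actual))
    return failures


if __name__ == "__main__":
    for text, expected, actual in run_cases(ZIP_RE, CASES):
        print(f"FAIL {text!r}: expected {expected}, got {actual}")
```

The cases that should not match matter as much as the ones that should, since a too-loose regex passes every positive test.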

Perl Best Practices

Posted on : 09-03-2005 | By : Tony Stubblebine

Just saw PDFs for Damian Conway’s “Perl Best Practices.” I tech reviewed the regex chapter and was thrilled to see the extended whitespace option listed as best practice #1.

It’s really amazing to me that people who care about indentation and commenting when writing very readable languages like Python or Java can stand to see a completely unreadable language like regular expressions written without any whitespace or any comments. Shouldn’t this offend the sensibilities of just about any programmer?