Regular Expression Matching
[Note: Values have been normalized to fall in the range of 0-10 for aesthetic reasons. Original value ranges are included on the X-axis. More detailed data and graphs are on the detail page.] [Results last updated: Sun Sep 9 13:20:25 2001 CDT]
Please note: this test is due for an overhaul, because a variety of the submitted solutions aren't really using regular expressions. I'll probably split it into two tests: one that does some kind of parsing/pattern matching, and another that calls for NFA regular expressions with capture buffers.
For this test, each program should be implemented in the same way.
The purpose of this test is to extract strings that look like phone numbers from a file and print them in a standard format. For the sake of this test, we aren't interested in I/O performance, so we read the file into an array before starting, then extract the phone numbers from the array N times, and on the last iteration, we print the extracted numbers in the standard format. See the detail page for different values of N.
Each program can assume that no line will exceed 128 characters (including newline).
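As a rough illustration of the procedure above, here is a minimal Python sketch. It is not one of the test programs. The pattern and the output format `(DDD) DDD-DDDD` are assumptions made for the example, not the test's exact specification, and the names `PHONE` and `extract_phones` are hypothetical.

```python
import re

# ASSUMED pattern, for illustration only: an area code written as (DDD) or
# DDD, one space, a 3-digit exchange, a space or hyphen, then 4 digits.
# The number must not be directly preceded or followed by another digit.
PHONE = re.compile(
    r"(?<!\d)"                   # no digit immediately before the number
    r"(?:\((\d{3})\)|(\d{3}))"   # area code: group 1 if parenthesized, else group 2
    r" (\d{3})"                  # one space, then the exchange (group 3)
    r"[ -](\d{4})"               # space or hyphen, then the last 4 digits (group 4)
    r"(?!\d)"                    # no digit immediately after the number
)


def extract_phones(lines, n):
    """Scan the in-memory lines n times; return the numbers from the last pass.

    At most one number is taken per line (an assumption of this sketch).
    """
    found = []
    for _ in range(n):
        found = []  # keep only the results of the current pass
        for line in lines:
            m = PHONE.search(line)
            if m:
                area = m.group(1) or m.group(2)
                # ASSUMED standard output format: (DDD) DDD-DDDD
                found.append(f"({area}) {m.group(3)}-{m.group(4)}")
    return found


# The input is read into a list up front, so file I/O stays out of the loop.
sample = ["call 123 456 7890 now", "(555) 123-4567", "no number here"]
for number in extract_phones(sample, 3):
    print(number)
```

Only the matching is repeated N times. The input is held in memory and the output is produced once, from the final pass, so the timing mostly reflects the regular expression engine rather than I/O.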
The telephone number pattern we are trying to match can be described this way:
For the C program, I wasn't going to implement my own regular expressions from scratch, so I use the Perl Compatible Regular Expressions (PCRE) library.
The C++ program uses Bill Lear's PCRE library for C++.
Markus Mottl helped me use his PCRE library for Ocaml.
The Java program uses a third-party, mostly Perl5-compatible regexp library called ORO. Apparently this package, once available from oroinc.com (defunct), is now maintained by the Apache Jakarta project.
Bigloo's regular grammar facility is very powerful. I wish all languages offered this feature. I think it shows that while it's nice to be able to do complex pattern matching, it is really more important how easily you can do something with the matched data.
I have been a little sloppy in this test specification, and some languages don't implement exactly the same pattern matching. I'll try to fix this soon by adding more test strings to the input to take care of some of the sloppy cases.
Erlang's regular expression support is minimal: it doesn't support captures (backreferences), and strings are represented as linked lists, so it doesn't do very well on this test.
Send me comments or suggestions.