Wednesday, June 15, 2005


The Perl Adventure

After reading just chapter 1 and part of chapter 2 of Programming Perl (also known as The Camel Book), I re-learned enough Perl to do what I needed to do.

What follows is a description of a simple problem I solved using Perl. It may be useful for somebody wondering what Perl is or how to use it.

Here's the problem I needed to solve: I have dozens of megabytes of log files, which are plain old ASCII text files with contents like this:

20050612 06:14:43 Product Loaded, id '686', pack '11298', dispenser '17'
20050612 07:14:44 Daily sales report printed
20050612 07:14:54 Weekly sales report printed
20050612 11:23:07 Product Sold, reg '3', id '686', pack '11298', dispenser '17', status 'Success'
20050612 12:00:00 System reset
20050612 12:45:05 Product Sold, reg '3', id '602', pack '14146', dispenser '20', status 'Success'
20050612 13:16:10 Product Sold, reg '1', id '604', pack '690', dispenser '1', status 'Success'
20050612 13:25:43 Dispenser communication error, dispenser '12'
20050612 21:16:06 Product Sold, reg '3', id '686', pack '11298', dispenser '17', status 'Success'
20050612 22:14:03 Technician login

(By the way, I apologize that my blog layout doesn't show data and code examples very well. I'll have to work on that someday.)

What I needed to do was get a list of all the "Product Sold" lines from all the log files and transform them into a tabular format that would be easy to load into a spreadsheet or database for further analysis. For example, given the log above, I'd want to generate this:

2005/06/12 11:23:07   3   686   11298   17   Success
2005/06/12 12:45:05   3   602   14146   20   Success
2005/06/12 13:16:10   1   604   690     1    Success
2005/06/12 21:16:06   3   686   11298   17   Success

We could have just asked some unpaid interns to go through all the log files and manually copy the data, but then they wouldn't have time to make our coffee or retrieve our paper airplanes from the rafters. An automated solution is preferable. This is not a particularly difficult problem for a programmer to solve, but Perl makes it a lot easier than most programming languages do. Here is the complete Perl program that does it:

# Read each line of the files named on the command line (or of standard input)
while (<>) {
  if (/(\d{4})(\d{2})(\d{2}) (\S+) Product Sold, reg '(\d+)', id '(\d+)', pack '(\d+)', dispenser '(\d+)', status '([^']+)'/) {
    print "$1/$2/$3 $4\t$5\t$6\t$7\t$8\t$9\n";
  }
}

OK, so that's probably gibberish to most people, but that gives an idea of what you can accomplish with a few lines of Perl. Here are the details of what the lines mean:
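
One way to read the pattern piece by piece is to spread it over several lines with Perl's /x modifier, which ignores whitespace inside the pattern and allows comments. What follows is only a sketch of that idea, not the program I actually ran: the format_sale helper is a made-up name, the literal spaces have to be written as [ ] because /x would otherwise skip them, and the two sample lines are copied from the log excerpt above.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the original program: returns the
# tab-separated output for a "Product Sold" line, or undef for any other line.
sub format_sale {
    my ($line) = @_;
    return unless $line =~ m{
        (\d{4}) (\d{2}) (\d{2})      # groups 1-3: year, month, day
        [ ] (\S+)                    # group 4: the time, any run of non-spaces
        [ ] Product [ ] Sold,        # literal text that selects the lines we want
        [ ] reg       [ ] '(\d+)',   # group 5: register number
        [ ] id        [ ] '(\d+)',   # group 6: product id
        [ ] pack      [ ] '(\d+)',   # group 7: pack number
        [ ] dispenser [ ] '(\d+)',   # group 8: dispenser number
        [ ] status    [ ] '([^']+)'  # group 9: status, up to the closing quote
    }x;
    # Rebuild the date with slashes and join the fields with tabs
    return "$1/$2/$3 $4\t$5\t$6\t$7\t$8\t$9";
}

# Two sample lines from the log excerpt: only the second one is a sale
my @sample = (
    "20050612 07:14:44 Daily sales report printed",
    "20050612 11:23:07 Product Sold, reg '3', id '686', pack '11298', dispenser '17', status 'Success'",
);

for my $line (@sample) {
    my $out = format_sale($line);
    print "$out\n" if defined $out;
}
```

The three-line program does the same matching, but reads its input through <>, which takes lines from every file named on the command line. Saved as, say, sold.pl (another made-up name), it could be run as perl sold.pl *.log > sold.txt to process all the log files in one go.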

Many Perl programs are a lot more complicated than this one. I don't use Perl for complicated programs, but I think it's great for little tasks like this. Write the pattern, write the output format, then fill in a couple of keywords and punctuation marks, and you're done. There are no classes to define, no variables to declare, and no external libraries to import. If you try to port this application to another programming language, you'll probably need to write a lot more code to accomplish the same thing, unless you are using a very specialized text-processing language.

My many megabytes of data have been massaged into the form I need for more-detailed analysis. That's pretty cool, but now I have to figure out exactly what to do with all that data.
