Wednesday, February 07, 2007

 

Logs, Timestamps, and Python

I spent a day helping people analyze log files from a set of systems. Something Really Bad had happened, and the world was going to come to an end if we didn't figure out exactly what had gone wrong, and how to fix it.

The world didn't come to an end. You can all thank us.

To analyze the log files, we used good-old grep. For those of you who aren't IT people: grep is the name of a program that can search text files for particular words, phrases, or patterns. For example, if you have a few dozen log files, containing millions of lines of data, and you just want to see all the lines that contain the word "error", you can type the command "grep error *.log". (Don't bother trying this on your Windows machine. grep is only standard on UNIX-ish operating systems.)

Unfortunately, the format of the log files in question is such that timestamps don't appear on lines containing log data. For example, it looks something like this:

02-03 15:23:16
Send CVB to device 432
Receive BVC from device 432
Send CVB to device 567
02-03 15:23:17
Receive BVC from device 567
Send UJK to device 12
Send UJK to device 23
02-03 15:23:18
02-03 15:23:19
02-03 15:23:20
Receive KJU from device 23
02-03 15:23:21
Receive KJU from device 12

So, if you are interested in seeing all the activity for device 12, you can try "grep 'device 12' *.log", but the problem is that you will not see any of the timestamp information; you'll just get this:

Send UJK to device 12
Receive KJU from device 12

You don't see what times these events happened. In this case, timing was a major issue: the client was claiming that things took too long, but this simple grep wasn't helping us figure out how long things were taking. "Grrr," the developers said, and it was decided that the next version of the software would put the timestamp on each line of the log files.

But the next version wouldn't help with the existing log files we had to analyze, so I decided I'd write a quick little script to re-format the logs. This is the kind of thing you can do in less than a minute if Perl, Python, Awk, or Ruby is something you use every day. Unfortunately, I know these languages just well enough to know that it should be easy, but I use them rarely enough that it usually takes me 15-20 minutes to re-familiarize myself with the syntax and libraries enough to write that one-minute script.

After twenty minutes of Googling and testing, I had my script written in Python. I'm not displaying it here to show off my amazing Python skills; I just want to be able to find it when I need to do something similar in the future:

#!/usr/bin/python

import sys
import re

# Timestamp regular expression (MM-DD HH:MM:SS)
tsre = re.compile('\d\d-\d\d \d\d:\d\d:\d\d')

def process_file(filename):
  currentts = ''
  f = file(filename)
  for line in f:
    match = tsre.match(line)
    if match:
      currentts = match.group()
    else:
      print currentts + '|' + line,
  f.close()

for filename in sys.argv[1:]:
  process_file(filename)

So, after running this script on the logs, we get output like this:

02-03 15:23:17|Send UJK to device 12
02-03 15:23:20|Receive KJU from device 12

Then we can devote the rest of the day to figuring out why it took three seconds to get a response from device 12.

(See The Perl Adventure for a similar exercise.)


Comments: Post a Comment

<< Home

This page is powered by Blogger. Isn't yours?