2009-11-08

Apache access log quick & dirty busy report (from awk to Perl).

This is my second awk snippet that I've clumsily rewritten in Perl in my attempt to improve my Perl chops. I'll refactor for more elegant Perl later.

This script spits out a report of the number of requests in Apache access logs (default common LogFormat), broken down by day and hour.

Here is what a line from Apache httpd access logs looks like.

127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326
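
Both scripts below split that line on whitespace and work only with the fourth field, "[10/Oct/2000:13:55:36". As a minimal sketch of the fixed character offsets involved (the date_and_hour helper name is made up for illustration and is not part of either script):

```perl
use strict;
use warnings;

# Hypothetical helper: pull the date and hour out of one common-format log line.
sub date_and_hour {
    my ($line) = @_;
    my @field = split ' ', $line;        # same whitespace split that awk and perl -a do
    my $stamp = $field[3];               # "[10/Oct/2000:13:55:36"
    my $date  = substr $stamp, 1, 11;    # skip the "[" and take "10/Oct/2000"
    my $hour  = substr $stamp, 13, 2;    # the two digits after the first ":"
    return ($date, $hour);
}

my ($date, $hour) = date_and_hour(
    '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'
);
print "$date $hour\n";    # prints "10/Oct/2000 13"
```

Note that awk's substr counts from 1 while Perl's counts from 0, which is why the offsets differ between the two scripts.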


Here is the awk script.

#!/bin/gawk -f

BEGIN {
        months["Jan"] = 1
        months["Feb"] = 2
        months["Mar"] = 3
        months["Apr"] = 4
        months["May"] = 5
        months["Jun"] = 6
        months["Jul"] = 7
        months["Aug"] = 8
        months["Sep"] = 9
        months["Oct"] = 10
        months["Nov"] = 11
        months["Dec"] = 12
}

{
        key = substr($4, 2, 14)

        if (key in totals) {
                totals[key]++
        }
        else {
                totals[key] = 1
        }
}

END {

        printf("| *date* | *hour* | *total* | *req/min* |\n")

        sort = "sort -k1,2 -t'|'"
        for (indx in totals) {

                hour = substr(indx, 13, 2)
                day = substr(indx, 1, 2)
                month_word = substr(indx, 4, 3)
                month = months[month_word]
                year = substr(indx, 8, 4)
                rate = totals[indx] / 60

                printf("| %d-%02d-%02d | %02d | %d | %.0f |\n", year, month, day, hour, totals[indx], rate) | sort
        }
        close(sort)
}

Even my awk code has extraneous syntax. I did not need to test the associative array key before incrementing its value. Oh, well. My dirty laundry is here for all to see.
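
The same shortcut applies on the Perl side. A minimal sketch, using made-up keys and an illustrative tally helper: incrementing a hash entry that does not exist yet treats the missing value as 0, so no existence test is needed, and it raises no warning under "use warnings".

```perl
use strict;
use warnings;

# Hypothetical helper: tally keys without testing for their existence first.
sub tally {
    my @keys = @_;
    my %totals;
    $totals{$_}++ for @keys;    # a missing entry starts from 0
    return \%totals;
}

my $totals = tally('06/Nov/2009:09', '06/Nov/2009:10', '06/Nov/2009:09');
print "$_ => $totals->{$_}\n" for sort keys %$totals;
# prints:
# 06/Nov/2009:09 => 2
# 06/Nov/2009:10 => 1
```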

Here is an example of what the script output looks like.

| *date* | *hour* | *total* | *req/min* |
| 2009-11-06 | 09 | 188 | 3 |
| 2009-11-06 | 10 | 9 | 0 |
| 2009-11-06 | 11 | 29 | 0 |

You'll notice that it comes out formatted in TWiki (ahem, Foswiki) syntax for easy pasting. The format is also close enough to CSV that importing it into spreadsheets and databases is not a challenge.
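
As a rough sketch of how small the gap to CSV is, the following turns one table row into a CSV line. The row_to_csv name is hypothetical, and the code assumes cells never contain commas or pipes, which holds for the report above.

```perl
use strict;
use warnings;

# Hypothetical helper: turn one "| a | b | c |" table row into a CSV line.
sub row_to_csv {
    my ($row) = @_;
    $row =~ s/^\s*\|\s*//;                # drop the leading pipe
    $row =~ s/\s*\|\s*$//;                # drop the trailing pipe
    my @cell = split /\s*\|\s*/, $row;    # the remaining pipes separate the cells
    s/^\*(.*)\*$/$1/ for @cell;           # strip the *bold* markers on header cells
    return join ',', @cell;
}

print row_to_csv($_), "\n" for (
    '| *date* | *hour* | *total* | *req/min* |',
    '| 2009-11-06 | 09 | 188 | 3 |',
);
# prints:
# date,hour,total,req/min
# 2009-11-06,09,188,3
```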

Here is my Perl equivalent.

#!/usr/bin/perl -anw

use warnings;
use strict;

my %month = (
             "Jan" => 1,
             "Feb" => 2,
             "Mar" => 3,
             "Apr" => 4,
             "May" => 5,
             "Jun" => 6,
             "Jul" => 7,
             "Aug" => 8,
             "Sep" => 9,
             "Oct" => 10,
             "Nov" => 11,
             "Dec" => 12,
);

our %total;
# Each outer hash key is a date stamp.
# Each outer hash value is a hash reference.
# Each inner hash has a key for reported hours of that day.
# The value at each inner key is the tally of the corresponding access requests.
$total{substr($F[3], 1, 11)}->{substr($F[3], 13, 2)}++;

END {

    print "| *date* | *hour* | *total* | *req/min* |\n";

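    # Sort the dd/Mon/yyyy keys chronologically. A plain string sort would
    # compare the day of the month first and misorder logs that span months.
    my %iso;
    for my $d (keys %total) {
        $iso{$d} = sprintf "%04d-%02d-%02d", substr($d, 7, 4), $month{substr($d, 3, 3)}, substr($d, 0, 2);
    }

    for my $t (sort { $iso{$a} cmp $iso{$b} } keys %total) {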
        my $day = substr $t, 0, 2;
        my $month_word = substr $t, 3, 3;
        my $month = $month{$month_word};
        my $year = substr $t, 7, 4;

        for my $hour (sort(keys %{$total{$t}})) {
            my $rate = $total{$t}->{$hour} / 60;
            printf "| %d-%02d-%02d | %02d | %d | %.0f |\n", ($year, $month, $day, $hour, $total{$t}->{$hour}, $rate);
        }
    }
}

Even my rough draft is a wee bit more elegant than the awk syntax. I did not rely on an external program to perform the sorting, thanks to Perl's sort.

Also, the total data structure is a bit different. Instead of using the full string "06/Nov/2009:09" as an array index like I did in awk, I broke the data down into a hash of hashes. In the Perl version, "06/Nov/2009" was the key to the outer hash, and "09" was the key to the inner hash. This made sorting during output a lot easier.
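
To make the shape of that structure concrete, here is a minimal standalone sketch that builds the same kind of hash of hashes from a few made-up date and hour pairs and walks it in sorted order. The nest helper is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: build the nested tally from [date, hour] pairs.
sub nest {
    my @pairs = @_;
    my %total;
    $total{ $_->[0] }{ $_->[1] }++ for @pairs;    # inner hashes spring into existence
    return \%total;
}

my $total = nest(
    [ '06/Nov/2009', '10' ],
    [ '06/Nov/2009', '09' ],
    [ '06/Nov/2009', '09' ],
    [ '07/Nov/2009', '23' ],
);

for my $date (sort keys %$total) {
    for my $hour (sort keys %{ $total->{$date} }) {
        print "$date $hour $total->{$date}{$hour}\n";
    }
}
# prints:
# 06/Nov/2009 09 2
# 06/Nov/2009 10 1
# 07/Nov/2009 23 1
```

Each level is sorted on its own, so the hours never get tangled up with the dates. Keep in mind that a plain string sort of dd/Mon/yyyy keys is chronological only within a single month.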

At first, I tried using an array instead of a hash to hold the hours of the day, but this turned out to be problematic. I think the problem had something to do with "09" being treated as an illegal octal digit in the array index. Treating the "09" as a string in the hash key was just easier and more flexible.
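
For what it's worth, Perl's octal rule applies to numeric literals written in source code, such as a bare 09, and not to strings read from a log. A string like "09" converts to the number 9 when used as an array index. A minimal sketch of the array variant under that assumption (the tally_hours name is illustrative, and this is not the code that originally failed):

```perl
use strict;
use warnings;

# Hypothetical helper: count requests per hour in a 24-slot array.
sub tally_hours {
    my @hours = @_;
    my @count = (0) x 24;
    $count[$_]++ for @hours;    # the string "09" is used as index 9
    return @count;
}

my @count = tally_hours('09', '09', '10', '23');
printf "%02d %d\n", $_, $count[$_] for grep { $count[$_] } 0 .. $#count;
# prints:
# 09 2
# 10 1
# 23 1
```

The hash keeps one advantage over the array: it holds only the hours that actually appear in the log, so there are no empty slots to skip when printing.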

2 comments:

  1. You might want to look at Regexp-Log-Common [1] for some further hints :)

    [1] http://search.cpan.org/dist/Regexp-Log-Common/lib/Regexp/Log/Common.pm

  2. Quick tip for shortening up the %month declaration:

    my %month;
    @month{
    qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec)
    } = 1 .. 12;
