2009-10-27

My set complement Perl script.

Currently, my awk-vs-Perl knowledge slider sits about here:

awk <--O-------> Perl

I am trying to move it closer to this:

awk <-----O---> Perl

To that end, I'll start by translating one of my handy-dandy mini awk scripts to Perl. Have you ever needed to find the elements in one list that are not in another list?

For example, suppose you have a list of all files in a directory. You also have a list of all files in a package, and some of the files in the package list also appear in the directory list. Now you want to find out which files in the directory are not part of the package.

Here is the awk code:

#!/usr/bin/awk -f

# ** Purpose **
#
# Exclude items in lists A, B, ... from list Z.
#
# ** Usage **
#
# ./exclude_list.awk -v x="listA listB" listZ
#
# x is a space-separated list of files, each containing items to exclude.
# The last argument (the file awk processes line by line) is the list to
# exclude those items from.

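BEGIN {
    # x holds the space-separated names of the exclusion list files.
    split(x, ex);
    for (f in ex) {
        # Record every line of each exclusion file as a key.
        while ((getline line < ex[f]) > 0) {
            exclude[line] = 1;
        }
        close(ex[f]);
    }
}

# Print each line of listZ that was not seen in any exclusion list.
{
    if (!($0 in exclude)) {
        print;
    }
}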

Here is what I just wrote up in Perl.

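#!/usr/bin/perl
use strict;
use warnings;

# Unlike the awk version, the list to filter comes first; the remaining
# arguments are the lists of items to exclude.
my $set1 = shift;
my %set2 = ();

# <> reads every remaining file named on the command line.
while (<>) {
    $set2{$_}++;
}

# Print each line of the first list that was not seen in the others.
open my $fh, '<', $set1 or die "Can't open $set1: $!";
while (<$fh>) {
    print $_ if !exists $set2{$_};
}
close $fh;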
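
To try the directory/package example from earlier without creating any files, here is a minimal sketch of the same hash-lookup idea on in-memory lists. The not_in helper and the file names are made up for illustration; they are not part of either script above.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the scripts above): return the items of
# @$all that are absent from @$exclude, preserving the order of @$all.
sub not_in {
    my ($all, $exclude) = @_;

    # Build a lookup hash so each membership test is a single hash probe.
    my %excluded;
    $excluded{$_} = 1 for @$exclude;

    my @kept;
    for my $item (@$all) {
        push @kept, $item unless exists $excluded{$item};
    }
    return @kept;
}

# Made-up sample data standing in for the two file lists.
my @directory = qw(README Makefile main.c notes.txt main.o);
my @package   = qw(README Makefile main.c);

print "$_\n" for not_in(\@directory, \@package);
# prints:
# notes.txt
# main.o
```

One difference from the script: these items carry no trailing newlines. The script compares whole lines, newline included, so a final line without a newline would not match the same text elsewhere.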

To make the source code length comparison fair, I'll have to add POD to the Perl source.

Later on, I'll seek to expand the functionality a little bit, and also see if any Perl built-ins can do this job quicker than I've done here.

3 comments:

  1. You could do it a lot shorter in Perl, and probably even shorter than my version:

    #!/usr/bin/perl -w
    use File::Slurp 'slurp';
    my $set1 = shift;
    my %set2 = map { $_ => 1 } <>;
    print (grep { ! exists $set2{$_} } slurp($set1));

  2. or:


    my $file = shift;   # remove the first file from @ARGV so <> does not read it too
    open F, '<', $file or die "$! while opening $file";
    my %set = map { $_ => 1 } <>;
    for ( <F> ) { print unless $set{$_} };

    hth
    marc

  3. Thanks a lot folks. Great stuff.

    I was fairly certain that I'd be using 'map' to shorten the code and/or speed things up, but I didn't know the exact syntax. I was going to read the details over on perldoc later. However, I would never have guessed that the _list_ argument of 'map' could be '<>'.
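
    For what it's worth, the reason '<>' works there is that 'map' evaluates its list argument in list context, and a filehandle read in list context returns all remaining lines at once. A minimal sketch, using a made-up in-memory string in place of real input files:

```perl
use strict;
use warnings;

# Made-up data read through an in-memory filehandle, so no files are needed.
my $data = "alpha\nbeta\ngamma\n";
open my $fh, '<', \$data or die "Can't open in-memory handle: $!";

# map supplies list context, so <$fh> returns every line in one go.
my %set = map { $_ => 1 } <$fh>;
close $fh;

# The keys keep their trailing newlines, just as in the scripts above.
print "lines read: ", scalar(keys %set), "\n";          # lines read: 3
print "has beta\n" if exists $set{"beta\n"};            # has beta
print "no chomped beta\n" unless exists $set{"beta"};   # no chomped beta
```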
