2011-03-04

Perl throw-away: extracting info from a giant list of file names.

I had a giant list of files that followed a pattern similar to this:
me@mybox:~/sandbox
$ ls -rlt
total 0
-rw-r--r-- 1 root root 0 Mar  4 18:10 bob.1234.x
-rw-r--r-- 1 root root 0 Mar  4 18:10 bob.9870.x
-rw-r--r-- 1 root root 0 Mar  4 18:10 bob.3245.x

Except there were thousands of these files, not just three.

I needed a sorted list of the numbers in the file names. I did this:
me@mybox:~/sandbox
$ perl -e '@f = glob("bob*"); @s = map {substr($_,4,4)} @f; for $n (sort(@s)) {print "$n\n";}'
1234
3245
9870
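Here are the same steps written out longhand as a small script, in case the one-liner is hard to read. This is a sketch, not what I actually ran. The sub name numbers_from_names and the hard-coded sample names are made up for illustration. substr($_, 4, 4) skips the four characters of "bob." and keeps the next four. A bare sort compares strings, so it only gives numeric order here because every number is exactly four digits wide.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: pull the four characters that follow "bob." out of
# each name and return them in sorted order.
sub numbers_from_names {
    my @names = @_;
    # Offset 4 skips "bob."; length 4 keeps exactly four characters.
    my @numbers = map { substr($_, 4, 4) } @names;
    # Plain sort is a string sort. It matches numeric order only
    # because every number has the same width.
    return sort @numbers;
}

# In real use this would be glob("bob*"); a fixed list keeps the sketch
# independent of the current directory.
my @names = ('bob.1234.x', 'bob.9870.x', 'bob.3245.x');
print "$_\n" for numbers_from_names(@names);
```

Zero-padded names like bob.0012.x sort correctly too, and the leading zeros survive because sort hands back the original strings.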

Yes, I could have done this in the shell.
me@mybox:~/sandbox
$ ls -1 bob* | cut -c5-8 | sort
1234
3245
9870

Using join instead of a for loop now looks pretty, thanks to the anonymous comment below about perl -le. Thanks, buddy.
jbm@Foucault:~/tmp
$ ls bob*
bob.0012.x bob.7123.x bob.9865.x
jbm@Foucault:~/tmp
$ perl -le '@f = glob("bob*"); @s = map {substr($_,4,4)} @f; print join("\n",sort(@s));'
0012
7123
9865

UPDATE

poisonbit and Yanick have more flexible solutions in the comments below, because they don't rely on my brain-dead method of counting the exact columns where the digits appear in the file name.

Of course, poisonbit's solution requires cool bash features (the ${file//[^0-9]} substitution), and Yanick's requires Perl 5.10 or newer (for -E and say). Using their better answers in crustier Perl or Korn shell is left as an exercise for the reader.
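For the Perl half of that exercise, here is one possible sketch. It avoids -E and say, so it should also run on older Perls such as 5.8. The sub names digits_only and first_digit_run are hypothetical. digits_only follows poisonbit's idea of deleting every non-digit, done here with tr/0-9//cd. first_digit_run follows Yanick's idea of capturing the first run of digits. Both sort with { $a <=> $b }, so the order stays numeric even when the numbers have different widths.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# poisonbit's idea: delete every character that is not a digit.
# tr///c complements the search list and /d deletes what matches.
sub digits_only {
    return sort { $a <=> $b }
           map { (my $d = $_) =~ tr/0-9//cd; $d } @_;
}

# Yanick's idea: keep the first run of digits in each name.
# In list context the match returns the capture, or an empty list
# when a name has no digits at all.
sub first_digit_run {
    return sort { $a <=> $b } map { /(\d+)/ } @_;
}

# In real use this would be glob("bob*"); the widths differ on purpose.
my @names = ('bob.1234.x', 'bob.99.x', 'bob.3245.x');
print "$_\n" for first_digit_run(@names);
```

As a one-liner, the pre-5.10 version of Yanick's command would be something like perl -le 'print for sort { $a <=> $b } map { /(\d+)/ } <bob*>'. The two ideas only disagree when a name has digits in more than one place. For a made-up bob2.1234.x, deleting non-digits gives 21234, while the first-run capture gives 2.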

4 comments:

  1. $ touch bob.{1234,9870,3245}.x
    $ for file in bob*; do printf '%s\n' "${file//[^0-9]}"; done
    1234
    3245
    9870

  2. Another variant for Perl 5.10.0 and up:

    $ perl -E'say for sort { $a <=> $b } map { /(\d+)/ } <bob*>'

  3. Ooops, the comment ate the last part of the code, which was <bob*>

  4. "The problem with use join instead of a for loop is that last missing newline."

    Use "perl -le" to get newlines handled properly.

    I often do this to add a missing newline to the end of my copied text region:

    xclip -o | perl -ple 1 > out.txt

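To see why "perl -ple 1" adds a missing final newline, here is a rough longhand of what the switches do, as a sketch. -p reads and prints every line. -l chomps each line on the way in and sets $\ to "\n", so print puts a newline back on every line, including a last line that never had one. The sub name add_missing_newline is hypothetical, and it works on a string rather than on standard input.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical longhand of `perl -ple 1` applied to a string.
sub add_missing_newline {
    my ($text) = @_;
    my $out = '';
    # Read the string line by line through an in-memory filehandle.
    open my $in, '<', \$text or die "cannot open in-memory string: $!";
    while (my $line = <$in>) {
        chomp $line;           # what -l does to each input line
        $out .= $line . "\n";  # what $\ = "\n" adds on output
    }
    close $in;
    return $out;
}

print add_missing_newline("first line\nlast line, no newline");
```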