[RndTbl] Command line challenge: trim garbage from start and end of a file.
trevor at tecnopolis.ca
Sat Dec 25 14:50:10 CST 2010
On 2010-11-10 Sean Walberg wrote:
> Adam and I were having an offline discussion, and some testing shows
> that AWK outperforms SED by a slight margin:
I know it's an old thread... but I had to have a go at your awk/sed
solutions. My solution is a perl regex:
perl -e '$/=undef;open I,$ARGV[0];$_=<I>;/(?:^|\n)(output start\n.*\noutput end\n)/s and print $1' infile
It's not a filter (requires a filename) but could probably easily be
made into one.
I recall reading in perl books that perl regex was faster than sed/awk,
and the above takes advantage of the slurp-whole-file mode that
undefining $/ allows.
On my computer the awk/sed/perl times compare like so:
time sed -n '/output start/,/output end/p' < infile > /dev/null
0.264+0.002c 0:00.26s 100.0% 0+0<774k | 1+39cs 0+259pg 0sw 0sg
time awk '/output start/,/output end/' < infile > /dev/null
0.183+0.003c 0:00.18s 100.0% 0+0<774k | 1+28cs 0+298pg 0sw 0sg
time perl -e '$/=undef;open I,$ARGV[0];$_=<I>;/(?:^|\n)(output start\n.*\noutput end\n)/s and print $1' infile > /dev/null
0.032+0.017c 0:00.05s 80.0% 0+0<8168k | 1+19cs 0+4196pg 0sw 0sg
Wow! But yikes, look at the mem usage. Good thing RAM is plentiful
these days. In 1980 sed would be the better bet for sure.
> [sean at bob tmp]$ W=/usr/share/dict/words
> [sean at bob tmp]$ (tail -1000 $W; echo output start; cat $W; echo
> output end; head -1000 $W) > infile
> [sean at bob tmp]$ wc -l infile
> 481831 infile