[RndTbl] html pretty code

Trevor Cordes trevor at tecnopolis.ca
Thu Nov 18 09:12:38 CST 2010


Adam thought someone might find this useful, so here it is.  It's a Perl
program I wrote to "pretty" HTML for easy readability/debugging by
applying indenting.  The neat thing is, the indenting is entirely contained
in 1 regular expression substitution; no loops!  Well, except for the weird
while loop that strips newlines inside tags.
This ain't your father's regex!
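
For anyone who hasn't used the /e modifier before, here is a minimal
sketch of the core idea, not the script itself: the replacement side of
s///ge is Perl code, so it can bump a depth counter as a side effect while
it builds each output line.  The name indent_tags and the simplified
"every tag indents" rule are assumptions made for illustration only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: put each tag on its own line, indented by depth.
sub indent_tags {
    my ($html) = @_;
    my $depth = 0;
    # /g visits every tag; /e evaluates the replacement as Perl code,
    # so the counter can change from one match to the next.
    $html =~ s{<(/?)([^>]*)>}{
        $depth-- if $1 && $depth > 0;    # closing tag: step back out first
        my $line = (' ' x $depth) . "<$1$2>\n";
        $depth++ unless $1;              # opening tag: children go deeper
        $line;
    }ge;
    return $html;
}

print indent_tags('<ul><li></li></ul>');
```

The real script below does the same thing with nested ?: operators
instead of statements, and only changes depth for tags in its list.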

Yes, there are a zillion HTML pretty programs out there, but none did what I
wanted, in a few ways:

1. Just fire & forget, no 200 options to worry about.

2. Works on random html snippets not just whole pages, so you can view 
source from the web and just paste a few lines from the middle of any web 
page and it will pretty it up.

3. Challenge to do something like this only using regex!

There may be a few tags it doesn't catch yet (just add them to the main 
tag list), but that won't hurt the output very much.
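
If the tag list gets long, one way to keep it editable is to build the
alternation from an array.  This is only a sketch under my own
assumptions: the names @indent_tags, $tag_re and indent_tag_name are
made up for the example, and "section" and "nav" are illustrative
additions that are not in the script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Tag names taken from the script's list.
my @indent_tags = qw(div span head b a i u ul ol li tr td th form table p
                     style script body html title);
push @indent_tags, qw(section nav);   # illustrative additions, not in the original

# Join the names into one alternation; the \b after the group keeps a bare
# "b" from being accepted as the name in "<br>" or "<body>".
my $alt    = join '|', map { quotemeta($_) } @indent_tags;
my $tag_re = qr{<(/)?($alt)\b([^>]*)>};

# Hypothetical helper: return the tag name if it changes the indent level.
sub indent_tag_name {
    my ($tag) = @_;
    return $tag =~ $tag_re ? $2 : undef;
}

for my $tag ('<nav class="x">', '</section>', '<br>', '<body>') {
    my $name = indent_tag_name($tag);
    print $tag . ': ' . (defined $name ? "indents ($name)" : 'left at current level') . "\n";
}
```

A tag that is not in the list is still printed, just at the current
indent level, which is why a missing tag doesn't hurt the output much.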

Run it like this:
html-pretty < html-file | less


html-pretty
=======================
#!/usr/bin/perl -w
#
# Copyright 2010 Trevor E Cordes, Tecnopolis Enterprises
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program.  If not, see <http://www.gnu.org/licenses/>.

$l=-1; # indentation level

$s=join('',<>);

$s=~s#>\s+#>#g; # rm ws after tags
$s=~s#\s+<#<#g; # rm ws before tags
while ($s=~s#<([^>]*)\n([^>]*)>#<$1 $2>#g) { 1; };	# replace nl in tags with a space so attributes don't run together

$s=~s#>#>\n#g;  # put newlines after all tags
$s=~s!(?:<(/)?(?:(div|span|head|b|a|i|u|ul|ol|li|tr|td|th|form|table|p|style|script|body|html|title)\b)?([^>]*>\n)|([^<]+))!
        defined($4)	# test defined-ness so text that is just "0" still counts as text
        ?	# non-tag text
         (
          (' 'x($l+1)).$4."\n"
         )
        :
         (
          (
           $2
           ?    # a recognized triggers-indent tag
            (
             $1
             ?  # tag starts with /, decrease indent
              (
               $l--,($l<-1 and $l=-1) , ((' 'x($l+1)).'<'.(defined($1)?$1:'').$2.$3)
              )
             :  # tag is opening tag, increase indent
              (
               $l++ , ((' 'x$l).'<'.(defined($1)?$1:'').$2.$3)
              )
            )
           :    # a non-triggers-indent tag
            ( (' 'x($l+1)).'<'.(defined($1)?$1:'').$3 )
          )
         )
        !gemx; # indent

print $s;
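
About that while loop: a single s///g pass can only fix one newline per
tag, because the match consumes the whole tag and /g never rescans text
it has already matched.  A tag spread over three lines therefore needs two
passes.  The sketch below shows this on its own; join_tag_lines is a
hypothetical name, and it puts a space where each newline was so that
attributes stay separated.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: join tags that are split across lines.
sub join_tag_lines {
    my ($s) = @_;
    my $passes = 0;
    # s///g returns the number of substitutions made, so the loop stops
    # after the first pass that finds no newline inside any tag.
    $passes++ while $s =~ s#<([^>]*)\n([^>]*)>#<$1 $2>#g;
    return ($s, $passes);
}

my ($out, $passes) = join_tag_lines("<a\nhref='x'\nclass='y'>text\nmore</a>");
print "$out\n";
print "passes that changed something: $passes\n";
```

Newlines in the text between tags are left alone, since the pattern
only matches between a < and the next >.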


