[RndTbl] html pretty code
Trevor Cordes
trevor at tecnopolis.ca
Thu Nov 18 09:12:38 CST 2010
Adam thought someone might find this useful, so here it is. It's a Perl
program I wrote to "pretty" HTML, indenting it so it's easy to read and
debug. The neat thing is that all the indenting is done by a single
regular expression substitution; no loops! Well, except for the weird
newline-stripping while loop (and a few one-line whitespace clean-ups).
This ain't your father's regex!
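A single substitution is enough because of the /e flag. The replacement is Perl code, so it can bump a counter as a side effect, and the regex engine sees the new value on the next match. Below is a toy sketch of just that idea, not the script itself. `indent_lines` is a made-up name, and the sketch assumes each line already holds exactly one bare tag with no attributes or text.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Toy sketch (hypothetical helper, not part of html-pretty): re-indent text
# where every line is a single bare tag such as <ul> or </ul>.  The /e
# replacement code changes $depth, and the new value is still there when the
# regex engine moves on to the next match, so one s///gme pass is enough.
sub indent_lines {
    my ($text) = @_;
    my $depth = 0;    # nesting depth shared by all matches
    $text =~ s{^<(/?)(\w+)>$}{
        $1 ? do { $depth-- if $depth > 0; ('  ' x $depth) . "</$2>" }  # closing: step out first
           : ('  ' x $depth++) . "<$2>"                                  # opening: step in after
    }gme;
    return $text;
}

print indent_lines("<ul>\n<li>\n</li>\n</ul>\n");
```

The depth never goes below zero, which mirrors the script's clamp on its own level counter.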
Yes, there are a zillion HTML pretty-printers out there, but none did
what I wanted, for a few reasons:
1. Just fire & forget; no 200 options to worry about.
2. Works on random HTML snippets, not just whole pages: view source on any
web page, paste in a few lines from the middle, and it will pretty them up.
3. The challenge of doing something like this using only regexes!
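Snippets work because nothing in the script expects `<html>` or `<body>`. The level is also clamped at -1, so stray closing tags at the top of a pasted fragment just print at the left margin and don't throw off the indenting of what follows. Here is a sketch for trying this on a string from within Perl. It assumes Perl 5.10+ (for `//`). `pretty_html` is a hypothetical wrapper around the same steps, with the tag list cut down to `div|table|tr|td`. It also joins a tag split across lines with a space, and it tests `$4` with `defined` so that a text node such as `0` isn't mistaken for a tag.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sub (not part of the posted script) that applies html-pretty's
# steps to a string.  Tag list trimmed to div|table|tr|td for brevity.
# Needs Perl 5.10+ for the // operator.
sub pretty_html {
    my ($s) = @_;
    my $l = -1;                       # indentation level
    $s =~ s/>\s+/>/g;                 # rm ws after tags
    $s =~ s/\s+</</g;                 # rm ws before tags
    1 while $s =~ s/<([^>]*?)\s*\n\s*([^>]*)>/<$1 $2>/g;    # nl in tags -> space
    $s =~ s/>/>\n/g;                  # newline after every tag
    $s =~ s{(?:<(/)?(?:(div|table|tr|td)\b)?([^>]*>\n)|([^<]+))}{
          defined $4  ? (' ' x ($l + 1)) . "$4\n"                  # text
        : !defined $2 ? (' ' x ($l + 1)) . '<' . ($1 // '') . $3  # other tag
        : defined $1  ? do { $l-- if $l > -1; (' ' x ($l + 1)) . "</$2$3" }
        :               do { $l++; (' ' x $l) . "<$2$3" }
    }gei;
    return $s;
}

# A fragment cut from the middle of a table: it starts with closing tags.
print pretty_html("</td><td>1</td></tr>\n<tr><td>2</td>");
```

The leading `</td>` and `</tr>` land at the left margin, and the second row is indented normally underneath its `<tr>`.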
There may be a few tags it doesn't indent for yet (just add them to the
tag list in the main regex), but a missing tag won't hurt the output much:
it still gets its own line, it just doesn't change the indent level.
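Adding tags is less fiddly if the list lives in an array and gets joined into the alternation. Here is a minimal sketch. `@indent_tags`, `$tag_alt` and `tag_triggers_indent` are hypothetical names, `section` and `nav` are example additions, and the match is case-insensitive.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical: keep the indent-triggering tags in a list instead of editing
# the long alternation by hand.  "section" and "nav" are example additions.
my @indent_tags = qw(
    div span head b a i u ul ol li tr td th form table p
    style script body html title
    section nav
);

# Build "div|span|head|...", escaping each name in case one ever contains
# a regex metacharacter.
my $tag_alt = join '|', map { quotemeta($_) } @indent_tags;

# The \b after the group stops "b" from matching the start of "br" or "body"
# (the regex engine backtracks and tries "body" instead).
my $indent_tag_re = qr/<\/?(?:$tag_alt)\b/i;

sub tag_triggers_indent {
    my ($tag) = @_;
    return $tag =~ /^$indent_tag_re/ ? 1 : 0;
}

print "$_ => ", tag_triggers_indent($_), "\n"
    for '<nav class="top">', '</SECTION>', '<br>', '<body>';
```

In the script, `$tag_alt` could then be interpolated in place of the hard-coded list, as `(?:($tag_alt)\b)?`.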
Run it like:
html-pretty < html-file | less
html-pretty
=======================
#!/usr/bin/perl -w
#
# Copyright 2010 Trevor E Cordes, Tecnopolis Enterprises
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.
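use strict;

my $l = -1;             # indentation level (-1 = not inside any indenting tag)
my $s = join('', <>);   # slurp all input: files named as arguments, or STDIN
$s =~ s#>\s+#>#g;       # rm ws after tags
$s =~ s#\s+<#<#g;       # rm ws before tags
# rm nl in tags, leaving one space so "<a\nhref=...>" doesn't become "<ahref=...>";
# each pass removes one newline per tag, so repeat until none are left
1 while $s =~ s#<([^>]*?)\s*\n\s*([^>]*)>#<$1 $2>#g;
$s =~ s#>#>\n#g;        # put newlines after all tags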
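# Indent in one substitution.  Each match is either a tag or a run of text:
#   $1 = "/" if the tag is a closing tag
#   $2 = the tag name, if it is one of the tags that trigger indenting
#   $3 = the rest of the tag, including the newline added above
#   $4 = non-tag text
# The /e code adjusts $l as a side effect, so the level carries over from
# one match to the next; /i catches upper-case tags such as <TD> too.
$s=~s!(?:<(/)?(?:(div|span|head|b|a|i|u|ul|ol|li|tr|td|th|form|table|p|style|script|body|html|title)\b)?([^>]*>\n)|([^<]+))!
  defined($4)
  ? # non-tag text (defined() so that text such as "0" is still treated as text)
    (' 'x($l+1)).$4."\n"
  : defined($2)
    ? defined($1)
      ? # recognized closing tag: decrease indent, but never below -1
        ( $l--, ($l < -1 and $l = -1), (' 'x($l+1)).'</'.$2.$3 )
      : # recognized opening tag: increase indent
        ( $l++, (' 'x$l).'<'.$2.$3 )
    : # any other tag: print at the current level, indent unchanged
      (' 'x($l+1)).'<'.(defined($1)?$1:'').$3
!gei; # indent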
print $s;
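The main substitution is dense, so it can help to watch how it classifies its input. After the preparatory passes, every tag is followed by a newline, and each match falls into exactly one of four cases. The sketch below is a hypothetical debugging aid: the `classify` sub and the cut-down tag list `div|p|b` are illustrative and not part of html-pretty. It runs the same alternation with a `while (m//g)` loop and labels each piece instead of rewriting it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical debugging aid, not part of html-pretty: walk the prepared text
# with the same alternation as the main substitution (tag list cut down to
# div|p|b) and report which case each piece falls into.
#   $1 = "/" on a closing tag, $2 = an indent-triggering tag name,
#   $3 = rest of the tag plus its newline, $4 = text between tags.
sub classify {
    my ($s) = @_;
    my @out;
    while ($s =~ m{(?:<(/)?(?:(div|p|b)\b)?([^>]*>\n)|([^<]+))}gi) {
        if    (defined $4)  { push @out, "text: $4" }
        elsif (!defined $2) { push @out, 'other tag, indent unchanged' }
        elsif (defined $1)  { push @out, "close $2, indent -1" }
        else                { push @out, "open $2, indent +1" }
    }
    return @out;
}

# <div><p>Hello <br> world</p></div> after the whitespace and newline passes:
print "$_\n" for classify("<div>\n<p>\nHello<br>\nworld</p>\n</div>\n");
```

Note that `<br>` is an "other tag": the `b` alternative matches, but `\b` then fails between `b` and `r`. For this input, html-pretty itself prints `<div>` at the left margin, `<p>` one space in, and `Hello`, `<br>` and `world` two spaces in, before unwinding back out.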
More information about the Roundtable mailing list