[RndTbl] Wrong time of night for doing regex?

Trevor Cordes trevor at tecnopolis.ca
Sun Jan 5 04:10:01 CST 2020


On 2020-01-04 Hartmut W Sager wrote:
> 
> It turns out, at least in this regex implementation, that a pair of
> enclosing parentheses can only serve one of two purposes, not both,
> at the same time.  Those two purposes are:
> 
> 1.  Mark a group that can then be referred to by a variable like "\3"
> in the replacement string.
> 2.  Enclose a group with alternation (regex terminology) containing
> several alternatives separated by the "or" operator "|".

That's just plain evil.  Nasty!

The de facto standard is (obviously) PCRE, and your program (you said
vi?) is clearly not PCRE.  I'd be shocked if vi didn't offer some way
to swap in a different regex engine, or at least to out-source the
regex work to a filter.  Not sure, though; I don't use vi.

In PCRE each (...) serves both purposes at once, unless you write
(?:...), in which case you get only purpose #2 (and save CPU cycles).
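A minimal sketch of the difference, in Python.  Python's re module is
not PCRE, but for groups, alternation and (?:...) it uses the same
Perl-style syntax, so it is assumed to be a fair stand-in here.  The
sample strings are made up for illustration.

```python
import re

# A plain (...) group both delimits the alternation AND captures
# whichever alternative matched, so the replacement can use it as \1.
food = re.sub(r"(cat|dog)s", r"\1-food", "cats and dogs")
print(food)  # cat-food and dog-food

# (?:...) keeps the alternation but captures nothing, so the only
# numbered group left is the (\d+) after it.
groups_nc = re.match(r"(?:cat|dog)-(\d+)", "dog-42").groups()
print(groups_nc)  # ('42',)

# The same pattern with a capturing group numbers both groups.
groups_c = re.match(r"(cat|dog)-(\d+)", "dog-42").groups()
print(groups_c)  # ('dog', '42')
```

Note that the (?:...) version shifts the numbering: what would have
been \2 becomes \1.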

The others are correct: using \s in the right-hand side is not PCRE.
In PCRE, \s in the pattern means "(almost) any whitespace character",
but in the substitution it will just be a plain "s".
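A small Python sketch of the pattern-side vs replacement-side split,
assuming Python 3.7 or later.  One difference from the behaviour
described above: Python's re treats an unknown letter escape such as \s
in the replacement as an error rather than passing it through as "s".
The sample text is hypothetical.

```python
import re

text = "total:\t42   items"

# In the pattern, \s matches any whitespace character (space, tab,
# newline, ...).  In the replacement there is no "whitespace" to
# produce, so write the literal character you actually want.
normalized = re.sub(r":\s+(\d+)\s+", r": \1 ", text)
print(normalized)  # total: 42 items

# Python (3.7+) rejects \s in the replacement outright with re.error.
try:
    re.sub(r"\s+", r"\s", text)
    replacement_error = None
except re.error as exc:
    replacement_error = str(exc)
print(replacement_error)
```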

PCRE = One Ring^H^H^H^HRegex to rule them all.  Many programs that do
regex use the PCRE library now, or at least give you the option, and if
you always use -P with (GNU) grep you'll basically never have to touch
another substandard regex engine again! :-)  All the perl-haters might
find it amusing that they use "perl" on a daily basis because of PCRE :-)
(Well, sort of.)

> I am a bit suspicious of the ([0-9][0-9]|\s[0-9]) group re operator
> precedence of the "or"

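In PCRE and the other backtracking, Perl-style engines, the rule is
"first, most": the first alternative (left to right) that lets the
whole match succeed wins, and quantifiers grab as much as they can.
(POSIX engines are the exception: they must return the leftmost-longest
match, whatever order the alternatives are in.)  So the order you put
your alternates in may matter.  In the above case, order probably
doesn't matter because the things surrounding that bit must be
space/comma.  Order matters where the surrounding parts of the pattern
can match the same characters, and when eating escaped characters, such
as escaped double quotes in CSVs: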
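/"(\\"|[^"])+"/ works, but
/"([^"]|\\")+"/ doesn't: [^"] swallows the backslash on its own, so the
escaped quote is taken as the closing one and the match ends early.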
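Both cases can be checked with a quick Python sketch (Python's re is a
backtracking engine with the same first-alternative-wins behaviour, so
it is assumed to be a fair stand-in for PCRE here).  The field value and
the "Jan ... ," date context are hypothetical, the latter standing in
for the space/comma surroundings of the original pattern.

```python
import re

# A quoted field containing a backslash-escaped double quote: "a\"b"
field = r'"a\"b"'

# Escape alternative first: \" is consumed as a unit, so the scan
# continues past it to the real closing quote.
escape_first = re.search(r'"(\\"|[^"])+"', field).group(0)

# [^"] first: it takes the backslash by itself, the escaped quote is
# then accepted as the closing quote, and the match ends early.
class_first = re.search(r'"([^"]|\\")+"', field).group(0)

print(escape_first)  # "a\"b"
print(class_first)   # "a\"

# When the literals around the group pin it down, order is harmless:
# both orderings capture the same day number.
pairs = []
for text in ["Jan 15, 2020", "Jan  5, 2020"]:
    a = re.search(r"Jan ([0-9][0-9]|\s[0-9]),", text).group(1)
    b = re.search(r"Jan (\s[0-9]|[0-9][0-9]),", text).group(1)
    pairs.append((a, b))
print(pairs)  # [('15', '15'), (' 5', ' 5')]
```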

As always, the O'Reilly regex book is an amazing way to fully
understand exactly what is going on and will really open a lot of eyes!!

