[RndTbl] Wrong time of night for doing regex?

Hartmut W Sager hwsager at marityme.net
Sat Jan 4 12:24:54 CST 2020


Hi Mark,

Actually, in Vedit "\s" is a single space in a replacement string too, just
as in a search string.  Almost all the escape codes work fine in the
replacement string too, though far fewer are needed there than in the
search string.
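
For anyone comparing against other engines, here is a minimal Python sketch
of the simplified substitution from this thread.  It is an illustration only
and says nothing about Vedit's own handling: Python's re module (3.7 and
later) rejects "\s" in a replacement string as a bad escape, so the sketch
uses literal spaces there.  The names MONTHS, PATTERN and pad_day are made up
for the example.

```python
import re

# Month alternation copied from the simplified search string in this thread.
MONTHS = "Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec"

# Month, whitespace, exactly ONE digit, then any run of whitespace and commas.
PATTERN = re.compile(rf"({MONTHS})\s+([0-9])[\s,]+")


def pad_day(line: str) -> str:
    """Zero-pad a single-digit day and drop the comma after it."""
    # Literal spaces in the replacement; \1 and \2 are the captured groups.
    return PATTERN.sub(r"\1 0\2 ", line)


if __name__ == "__main__":
    print(pad_day("From AncientBBS1 Thu  Jan  2, 1986  20:50:00"))
```

With the sample line from the thread this prints
"From AncientBBS1 Thu  Jan 02 1986  20:50:00".  A line whose day already has
two digits does not match the pattern and passes through unchanged.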

Thanks for your other thoughts too.  I did figure out the problem, and in
my main reply (to myself), you'll see a detailed explanation.

Hartmut W Sager - Tel +1-204-339-8331


On Sat, 4 Jan 2020 at 10:58, Mark Campbell <nitrodist at gmail.com> wrote:

> I don't think you can use \s in the replacement regex as it has no special
> meaning there. In my local testing with perl, it seems to treat it as a
> literal escape for the letter s. What tool are you using to run the regex?
>
> Substitute in a space, seems to work as expected:
>
> 2020-01-04 10:45:30 ~ TOR-M001 %: ccat test | perl -pe 's/(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s+([0-9])[\s\,]+/\1 0\2 /'
> From AncientBBS1 Thu  Jan 07 1986  20:50:00
> 2020-01-04 10:45:35 ~ TOR-M001 %: ccat test
> From AncientBBS1 Thu  Jan  7, 1986  20:50:00
>
> If each line has a fixed-length prefix, it might be easier (and more
> readable) to match by position, with something like
> s/^(.{23}) (\d),/\1 0\2/, if I'm understanding what you want to do
> (zero-pad single-digit days and remove the comma).
>
>
> On Sat, Jan 4, 2020 at 10:27 AM Hartmut W Sager <hwsager at marityme.net>
> wrote:
>
>> This might be the wrong time of night for doing regex (i.e., my mistake),
>> or my trusty Vedit text editor has a bug in its regex implementation.
>>
>> Original search string: ^(From AncientBBS[1-2])\s+(Sun|Mon|Tue|Wed|Thu|Fri|Sat)[\s\,]+(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s+([0-9][0-9]|\s[0-9])[\s\,]+(19[0-9][0-9])[\s\,]+([0-9][0-9]\:[0-9][0-9]\:[0-9][0-9])\s*$
>> Replacement string: <Nah, skip it>
>>
>> The above search string gives a syntax error.  I am a bit suspicious of
>> the ([0-9][0-9]|\s[0-9]) group, specifically the operator precedence of
>> the "or", so I simplified it step by step to narrow the problem down.  I
>> finally got down to:
>>
>> Search string:
>> (Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s+([0-9])[\s\,]+
>> Replacement string: \1\s0\2\s
>>
>> The new search works fine (as did some of the previous stepwise
>> simplified ones), but the replacements are baffling me.
>> The line
>> From AncientBBS1 Thu  Jan  2, 1986  20:50:00
>> gets changed to
>> From AncientBBS1 Thu   02 1986  20:50:00
>>
>> I.e., the variable \1 seems to get lost.  In my previous stepwise
>> simplified cases, multiple variables got lost when the search worked at all.
>>
>> Why am I doing this?  I need to massage some old BBS messages into the
>> retarded mbox format, whose date format (on the "From " line) of "Tue Nov
>> 05 19:02:00 1985" is particularly illogical.  Be that as it may, the two
>> sources of these messages I am processing had further sloppiness in their
>> dates, done by some ancient BBS bozos.  I did successfully fix a lot of
>> that already with regex.
>>
>> Hartmut W Sager - Tel +1-204-339-8331
>>
>> _______________________________________________
>> Roundtable mailing list
>> Roundtable at muug.ca
>> https://muug.ca/mailman/listinfo/roundtable
>>