[RndTbl] sshd: Corrupted MAC on input.

Gilbert E. Detillieux gedetil at cs.umanitoba.ca
Tue Aug 4 09:56:59 CDT 2020


On 2020-07-31 11:42 p.m., Trevor Cordes wrote:
> On 2020-07-31 Hartmut W Sager wrote:
>> Oops, *sorry Gilbert*, I looked at this thread again, and it was
>> *your* position on checksums that I'm supporting.  Maybe I have some
>> bad/failing Chinese capacitors in my head.  :)
> 
> Your creator should have sprung for the 5c better caps! ;-)
> 
> Looks like you guys are right.  TCP only has a 16-bit checksum, and
> it's a simple sum, then 1's complement, over (most of) the whole packet.

That's what I thought, but it's been a while since I looked at TCP 
headers in detail.
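
For reference, here is a minimal Python sketch of that style of checksum (the RFC 1071 "Internet checksum"). The function name and the sample bytes are only for illustration, and it sums just the bytes it is given; the real TCP checksum also covers a pseudo-header. It shows one kind of corruption the sum cannot see: two 16-bit words trading places.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement of the ones' complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # big-endian 16-bit word
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold the carries back in
    return ~total & 0xFFFF


original = bytes.fromhex("0001f203f4f5f6f7")
# Hypothetical corruption: the first two 16-bit words swapped in transit.
swapped = bytes.fromhex("f2030001f4f5f6f7")

print(hex(internet_checksum(original)))  # 0x220d
print(internet_checksum(original) == internet_checksum(swapped))  # True
```

Because addition is commutative, any reordering of 16-bit words (or any pair of errors that cancel in the sum) goes through undetected.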

> Some post says microsoft says (paraphrased): "Basically transmit 100MB+
> over a typical Internet connection and you are very likely to see a
> silent failure."
> 
> I don't know about that!  But, yes, even if you get 1 error through
> every 1GB TCP, that's pretty awful to contemplate.

As I said earlier, the Ethernet frame layer catches most bit errors for 
you in the typical network setup.  I think there's a 32-bit CRC there...

https://en.wikipedia.org/wiki/Frame_check_sequence
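
As a rough illustration of why the frame-level CRC does the heavy lifting, the sketch below uses Python's zlib.crc32, which as far as I know uses the same CRC-32 polynomial as the Ethernet FCS (the on-wire bit ordering is not modelled here). The helper name and sample bytes are made up for the example.

```python
import zlib


def crc32_of(data: bytes) -> int:
    """CRC-32 of the data as an unsigned 32-bit value."""
    return zlib.crc32(data) & 0xFFFFFFFF


original = bytes.fromhex("0001f203f4f5f6f7")
# Hypothetical corruptions: two 16-bit words swapped, and a single flipped bit.
swapped = bytes.fromhex("f2030001f4f5f6f7")
flipped = bytes([original[0] ^ 0x01]) + original[1:]

print(crc32_of(original) != crc32_of(swapped))  # True
print(crc32_of(original) != crc32_of(flipped))  # True
```

The swapped-word case, which a simple 16-bit sum lets through, changes the CRC, since the damage is confined to a 32-bit span and a 32-bit CRC catches every error burst of that length or shorter.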

We were often seeing undetected TCP bit errors in PPP over serial 
(modem) connections (ages ago), where there was no Ethernet frame layer 
to do the heavy lifting.

But if the NIC isn't doing its work correctly (either at the Ethernet 
frame level or TCP checksum offloads), this could result in bad data 
getting up the food chain.

> I guess rsync detects/corrects for this automatically, but
> unfortunately Gilbert is seeing errors in the ssh wrapper layer, in a
> place where ssh is sensitive to errors and wants to barf instead of
> retry.  It almost would be better if ssh would just pass up junk to
> rsync and let it deal with it.

I haven't found a way to disable the MAC support in ssh, only a way to 
select which MAC algorithms are offered at both the client and server end.

> Gilbert, to confirm, your bug hits after you have transferred lots of
> data, right?  It's not giving this error right at the beginning upon
> connection, right?  Do you have any stats on approx how much data goes
> across each time before the error hits?  Is it consistent or all over
> the map?

Yes, these are on large file transfers.  Yes, they are occasional, and 
random.  But I had to restart a large (41-ish GB?) file transfer 3 times 
last week due to repeated errors.  A typical nightly backup results in 
160 GB or so to transfer, and lately, the rsync fails (somewhere along 
the way) more often than not.
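
To put those sizes in perspective, here is a back-of-the-envelope sketch in Python. Everything in it apart from the 160 GB figure is an assumption made up for illustration: the 1460-byte segment size, the fraction of segments that arrive damaged at the TCP layer, and the idea that a damaged segment slips past a 16-bit checksum about one time in 65536 (which only holds for more or less random damage).

```python
def expected_undetected(transfer_bytes: float,
                        segment_bytes: int = 1460,
                        corrupt_fraction: float = 1e-6) -> float:
    """Expected number of damaged segments that pass a 16-bit checksum.

    segment_bytes and corrupt_fraction are hypothetical values, not
    measurements from this network.
    """
    segments = transfer_bytes / segment_bytes
    corrupted = segments * corrupt_fraction
    return corrupted / 2 ** 16  # assumes roughly random corruption


# One nightly backup of about 160 GB, under the assumptions above.
print(expected_undetected(160e9))
```

With these made-up inputs the result is well below one per night, so failing more often than not would mean either a much higher corruption rate reaching TCP, or damage happening after the checksums have already been verified.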

Gilbert

-- 
Gilbert E. Detillieux        E-mail:  <gedetil at cs.umanitoba.ca>
Dept. of Computer Science    Web:     http://www.cs.umanitoba.ca/~gedetil/
University of Manitoba       Phone:   (204)474-8161
Winnipeg MB CANADA  R3T 2N2  Fax:     (204)474-7609

