[RndTbl] poor drive/array performance

Thu Apr 14 04:27:52 CDT 2016

On 2016-04-11 Trevor Cordes wrote:
> I'm getting extremely bizarre low FS performance.
> 
> 262144000 bytes (262 MB) copied, 26.1413 s, 10.0 MB/s
> 
> My only thoughts now are firmware problems, bios setting problems, or 
> cable problem.  I need to go onsite to check all three.

Problem was...

Drum roll please.....

cable!

But a lesson to learn.  First, after looking long and hard at the
cable that came with the Intel server board (the one I had used), I
concluded it was probably only a 3G/s cable.  They gave me two weird
cables that had 2 SATA cables taped together on each.  Since this is a
1 year old board, and no other cables were included, I had figured they
gave me 6G/s cables.  Nah, let's confuse the system builders and give
them no 6's with this board!

Second mistake: I had plugged this 3G/s cable into the two 6G/s ports.
I think the rust drive is too old to be 6G/s, but the SSD is surely 6.

Here's where it gets interesting: I guess SATA doesn't autodetect cable
capability, like, say, IDE with 40 vs 80 conductor.  I'm not terribly
surprised, but still, one would have hoped it would auto-negotiate
*the cable capacity*.  I know this because SMART confirmed the drive was
running in 6G/s mode even on the suspect cable.  Lesson learned.
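
For anyone wanting to script the same SMART check, here is a minimal
sketch that pulls the maximum and currently negotiated link speed out of
the "SATA Version is:" line that `smartctl -i` prints.  The sample line
and the function name are my own illustration, not output from the
machine discussed here.  Note that a 6.0 Gb/s "current" value only
tells you what the two ends negotiated; as this thread shows, it says
nothing about whether the cable can actually carry it.

```python
import re

# Matches e.g. "SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)"
# The "(current: ...)" part is optional because some drives omit it.
SATA_LINE = re.compile(
    r"SATA Version is:\s*(?P<ver>.+?),\s*(?P<max>[\d.]+) Gb/s"
    r"(?:\s*\(current:\s*(?P<cur>[\d.]+) Gb/s\))?"
)


def parse_sata_speeds(smartctl_info):
    """Return (max_gbps, current_gbps) from `smartctl -i` text.

    current_gbps is None if the drive does not report it; the whole
    result is None if no "SATA Version is:" line is found.
    """
    m = SATA_LINE.search(smartctl_info)
    if m is None:
        return None
    cur = m.group("cur")
    return float(m.group("max")), (float(cur) if cur else None)


# Illustrative sample line (hypothetical, not from the original system).
sample = "SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)"
```

In practice you would feed it the captured text of `smartctl -i` for the
device in question instead of `sample`.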

Even weirder was that this even worked at all to produce a relatively
stable, non-data-corrupting setup that would give consistent 7-14MB/s
speed.  It's like it was shooting electrons down at way too high a
speed and the odd one would get through ok.  I'm sure it must be
checksumming/ECC on the SATA bus that was saving the day.  Must be
robust!  I actually find this quite amusing.
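
One way to test the checksumming guess: SATA frames carry a CRC, and
many drives count link-level CRC failures in SMART attribute 199
(usually named UDMA_CRC_Error_Count).  Below is a rough sketch that
reads the raw value from `smartctl -A` style output.  The sample row is
made up for illustration, and not every drive (some SSDs in particular)
reports this attribute, so treat a missing value as "unknown" rather
than "zero".

```python
def crc_error_count(smartctl_attrs):
    """Return the raw value of SMART attribute 199, or None if absent.

    Expects the attribute table printed by `smartctl -A`, where each row
    has ten whitespace-separated columns and the raw value is the last.
    """
    for line in smartctl_attrs.splitlines():
        fields = line.split()
        # Match on the attribute ID, since the name varies by vendor.
        if len(fields) >= 10 and fields[0] == "199":
            try:
                return int(fields[9])
            except ValueError:
                return None  # raw value in a format we don't understand
    return None


# Hypothetical sample row, for illustration only.
sample_attrs = (
    "ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      "
    "UPDATED  WHEN_FAILED RAW_VALUE\n"
    "199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   "
    "Always       -       1234\n"
)
```

A count that climbs while you run a write test would point at the cable
or connectors rather than at the drive itself.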

Stranger still was the asymmetric wonkiness: my read tests were showing
~400MB/s reads while writes were still ~10MB/s.  Huh?  Still puzzled on
that one.  Maybe the drive speaks to the controller with slightly more
voltage due to different manufacturing tolerances?  Who knows.  Or
maybe it's some weird effect of the placement of the pair of wires for
R vs W, like outside edge of the cable vs inside?

Finally, I think I've guessed why the write test to /boot, which was
non-degraded RAID-1 (1 SSD + 1 rust), was faster than the test to
just / (1 SSD): dd with fdatasync on top of the RAID-1 layer must have
been waiting for the RAID layer to say things were synced, and the RAID
layer *must* be satisfied things are synced when only *1* drive
completes.  That would make sense, though in my mind I would have
thought it would demand 2 be synced.  I suppose RAID and its superblock
updating or whatever is being really smart about this.  Just a guess.

At the end of the day I'm now getting 400MB/s read on big files, and
500MB/s write using the previously discussed tests.  Woohoo!
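
For reference, the MB/s figure dd prints is just bytes divided by
seconds, in units of 10^6 bytes: the quoted 262144000 bytes in
26.1413 s works out to about 10.0 MB/s.  Here is a small sketch of a
dd-like timed write that syncs before stopping the clock, roughly what
dd does with conv=fdatasync.  The exact dd command isn't shown in this
message, so the block size, function names, and file path below are
assumptions.

```python
import os
import time


def mb_per_s(nbytes, seconds):
    """Throughput in MB/s using dd's convention (1 MB = 10**6 bytes)."""
    return nbytes / seconds / 1e6


def timed_sync_write(path, total_bytes, block_size=1024 * 1024):
    """Write total_bytes of zeros to path, sync, return (bytes, seconds).

    The sync happens inside the timed region, so the result reflects
    data reaching the device rather than just the page cache.
    """
    block = b"\0" * block_size
    # fdatasync is not available everywhere; fall back to fsync.
    sync = getattr(os, "fdatasync", os.fsync)
    written = 0
    start = time.perf_counter()
    with open(path, "wb") as f:
        while written < total_bytes:
            chunk = block[: min(block_size, total_bytes - written)]
            f.write(chunk)
            written += len(chunk)
        f.flush()
        sync(f.fileno())
    elapsed = time.perf_counter() - start
    return written, elapsed


# Example (hypothetical path):
#   n, secs = timed_sync_write("/tmp/testfile", 250 * 1024 * 1024)
#   print(f"{mb_per_s(n, secs):.1f} MB/s")
```

Writing zeros is fine for a cable or link test like this one, but drives
or filesystems that compress data may give flattering numbers.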