[RndTbl] AIT3 Problems

swallbri at mail.synack-hosting.com
Fri Dec 13 17:27:58 CST 2002


Hi, I am having a hell of a time with a new AIT3 drive we have. I have a
folder with 21GB of files, each around 12MB, that I want to write to tape.
I am running Debian 3.0 with the 2.4.18 kernel. The machine has dual Athlon
1500s on a Tyan TigerMP motherboard with 1.5GB of RAM (dmesg is below).
The tape drive is a Sony SDX-700C connected with a very high quality
Granite Digital U320 certified SCSI cable and terminator. It is the only
device on the controller. The machine also has a 3ware IDE RAID
controller with eight 80GB Maxtor drives in a RAID 5 array.

It starts writing out fine, then dies after a random number of files.
The exact command I am using is:

pinky:/# tar -b 1024 -cvf /dev/nst0 /Rogue/Renders/CorePost/CO-202/ > tape0001_CO-202.log
tar: Removing leading `/' from member names
tar: /dev/nst0: Wrote only 65536 of 524288 bytes
tar: Error is not recoverable: exiting now
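
As a sanity check on the numbers in that error: tar's -b blocking factor
counts 512-byte units, so -b 1024 asks for 524288-byte records, and the
65536 bytes that actually went out is one eighth of a record. A minimal
Python sketch of the arithmetic (the variable names are only illustrative):

```python
# tar's blocking factor (-b) is measured in 512-byte units.
TAR_UNIT = 512

blocking_factor = 1024                      # from "tar -b 1024"
record_size = blocking_factor * TAR_UNIT    # bytes per tar record
written = 65536                             # from "Wrote only 65536 of 524288 bytes"

print(record_size)              # 524288
print(record_size // written)   # 8
print(written // TAR_UNIT)      # 128
```

So the short write stopped exactly 64KB into a 512KB record. That only
describes where the write stopped, not why.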

The tapes are 100GB tapes and I am only trying to write 21GB, so it's not
that.

pinky:/# du -hs /Rogue/Renders/CorePost/CO-202/
21G     /ntfs/Rogue/Renders/CorePost/CO-202

In syslog I get....

Dec 13 12:05:24 pinky kernel: (scsi0:A:6:0): Unexpected busfree in Data-out phase
Dec 13 12:05:24 pinky kernel: SEQADDR == 0x8a
Dec 13 12:05:24 pinky kernel: st0: Error 70000 (sugg. bt 0x0, driver bt 0x0, host bt 0x7).
Dec 13 12:05:24 pinky kernel: st0: Error 8 (sugg. bt 0x0, driver bt 0x0, host bt 0x0).
Dec 13 12:05:24 pinky kernel: st0: Error 8 (sugg. bt 0x0, driver bt 0x0, host bt 0x0).
Dec 13 12:05:24 pinky kernel: st0: Error on write filemark.
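
The "Error" numbers in the st lines are hex SCSI result words. Assuming the
usual Linux packing (status in bits 0-7, message byte in bits 8-15, host
byte in bits 16-23, driver byte in bits 24-31), they can be split as in the
rough sketch below. The function name is made up for illustration.

```python
def decode_scsi_result(result):
    """Split a Linux SCSI result word into its four bytes.

    Assumes the conventional layout: status | msg << 8 | host << 16 | driver << 24.
    """
    return {
        "status": result & 0xFF,
        "msg": (result >> 8) & 0xFF,
        "host": (result >> 16) & 0xFF,
        "driver": (result >> 24) & 0xFF,
    }

# Values printed by st in the syslog excerpt above (they are hexadecimal).
first = decode_scsi_result(0x70000)
later = decode_scsi_result(0x8)

print(first)   # host byte is 0x07, everything else 0
print(later)   # status byte is 0x08, everything else 0
```

Under that assumption the first error carries host byte 0x07, which matches
the "host bt 0x7" the kernel printed and which I believe is the generic
DID_ERROR code. The two that follow carry status 0x08, which is BUSY in the
SCSI status table.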

At one point I got a much nastier message, but it's pretty huge, so it's
at the bottom of the email.

I did some googling and didn't find much (well, I did find lots of
posts with the "Error on write filemark" message, but they all had a
Medium Sense error first). I have tried all kinds of different block sizes
for tar (1024 is the highest I have tried), and I have tried different SCSI
cards, cables and terminators. Pretty much everything I can think of.

After I talked with Tier 1 support at Sony a week ago, they thought the
drive was dead, so they sent out another one, which does the exact same
thing. I spent an hour on the phone with Tier 1 again today (he didn't
even know what tar was). I think I have finally been bumped up to Tier 2,
but I am waiting to hear back from them.

Does anyone have any ideas? I really need to get this thing working (I
have 3.5TB to archive, ideally before I leave next weekend).

thanks
shawn




shawn@pinky:~$ dmesg
Linux version 2.4.18-200211301 (root@pinky) (gcc version 2.95.4 20011002 (Debian prerelease)) #1 SMP Sun Dec 1 13:37:47 CST 2002
BIOS-provided physical RAM map:
BIOS-e820: 0000000000000000 - 000000000009f000 (usable)
BIOS-e820: 000000000009f000 - 00000000000a0000 (reserved)
BIOS-e820: 00000000000e4800 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 000000003fff0000 (usable)
BIOS-e820: 000000003fff0000 - 000000003ffffc00 (ACPI data)
BIOS-e820: 000000003ffffc00 - 0000000040000000 (ACPI NVS)
BIOS-e820: 00000000fec00000 - 00000000fec10000 (reserved)
BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
BIOS-e820: 00000000fff80000 - 0000000100000000 (reserved)
127MB HIGHMEM available.
found SMP MP-table at 000f7510
hm, page 000f7000 reserved twice.
hm, page 000f8000 reserved twice.
hm, page 0009f000 reserved twice.
hm, page 000a0000 reserved twice.
On node 0 totalpages: 262128
zone(0): 4096 pages.
zone(1): 225280 pages.
zone(2): 32752 pages.
Intel MultiProcessor Specification v1.4
   Virtual Wire compatibility mode.
OEM ID: TYAN     Product ID: GUINNESS     APIC at: 0xFEE00000
Processor #1 Pentium(tm) Pro APIC version 16
Processor #0 Pentium(tm) Pro APIC version 16
I/O APIC #2 Version 17 at 0xFEC00000.
Processors: 2
Kernel command line: BOOT_IMAGE=Linux ro root=302
Initializing CPU#0
Detected 1533.419 MHz processor.
Console: colour VGA+ 80x25
Calibrating delay loop... 3060.53 BogoMIPS
Memory: 1028468k/1048512k available (2068k kernel code, 19656k reserved, 504k data, 232k init, 131008k highmem)
Dentry-cache hash table entries: 131072 (order: 8, 1048576 bytes)
Inode-cache hash table entries: 65536 (order: 7, 524288 bytes)
Mount-cache hash table entries: 16384 (order: 5, 131072 bytes)
Buffer-cache hash table entries: 65536 (order: 6, 262144 bytes)
Page-cache hash table entries: 262144 (order: 8, 1048576 bytes)
CPU: Before vendor init, caps: 0383fbff c1cbfbff 00000000, vendor = 2
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 256K (64 bytes/line)
CPU: After vendor init, caps: 0383fbff c1cbfbff 00000000 00000000
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
CPU:     After generic, caps: 0383fbff c1cbfbff 00000000 00000000
CPU:             Common caps: 0383fbff c1cbfbff 00000000 00000000
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Checking 'hlt' instruction... OK.
POSIX conformance testing by UNIFIX
CPU: Before vendor init, caps: 0383fbff c1cbfbff 00000000, vendor = 2
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 256K (64 bytes/line)
CPU: After vendor init, caps: 0383fbff c1cbfbff 00000000 00000000
Intel machine check reporting enabled on CPU#0.
CPU:     After generic, caps: 0383fbff c1cbfbff 00000000 00000000
CPU:             Common caps: 0383fbff c1cbfbff 00000000 00000000
CPU0: AMD Athlon(tm) MP Processor 1800+ stepping 02
per-CPU timeslice cutoff: 731.39 usecs.
enabled ExtINT on CPU#0
ESR value before enabling vector: 00000000
ESR value after enabling vector: 00000000
Booting processor 1/0 eip 2000
Initializing CPU#1
masked ExtINT on CPU#1
ESR value before enabling vector: 00000000
ESR value after enabling vector: 00000000
Calibrating delay loop... 3060.53 BogoMIPS
CPU: Before vendor init, caps: 0383fbff c1cbfbff 00000000, vendor = 2
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 256K (64 bytes/line)
CPU: After vendor init, caps: 0383fbff c1cbfbff 00000000 00000000
Intel machine check reporting enabled on CPU#1.
CPU:     After generic, caps: 0383fbff c1cbfbff 00000000 00000000
CPU:             Common caps: 0383fbff c1cbfbff 00000000 00000000
CPU1: AMD Athlon(tm) Processor stepping 02
Total of 2 processors activated (6121.06 BogoMIPS).
ENABLING IO-APIC IRQs
Setting 2 in the phys_id_present_map
...changing IO-APIC physical APIC ID to 2 ... ok.
init IO_APIC IRQs
IO-APIC (apicid-pin) 2-0, 2-16, 2-17, 2-18, 2-19, 2-20, 2-21, 2-22, 2-23 not connected.
..TIMER: vector=0x31 pin1=2 pin2=0
number of MP IRQ sources: 16.
number of IO-APIC #2 registers: 24.
testing the IO APIC.......................

IO APIC #2......
.... register #00: 02000000
.......    : physical APIC id: 02
.... register #01: 00170011
.......     : max redirection entries: 0017
.......     : PRQ implemented: 0
.......     : IO APIC version: 0011
.... register #02: 00000000
.......     : arbitration: 00
.... IRQ redirection table:
NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect:
00 000 00  1    0    0   0   0    0    0    00
01 003 03  0    0    0   0   0    1    1    39
02 003 03  0    0    0   0   0    1    1    31
03 003 03  0    0    0   0   0    1    1    41
04 003 03  0    0    0   0   0    1    1    49
05 003 03  1    1    0   1   0    1    1    51
06 003 03  0    0    0   0   0    1    1    59
07 003 03  0    0    0   0   0    1    1    61
08 003 03  0    0    0   0   0    1    1    69
09 003 03  0    0    0   0   0    1    1    71
0a 003 03  1    1    0   1   0    1    1    79
0b 003 03  1    1    0   1   0    1    1    81
0c 003 03  0    0    0   0   0    1    1    89
0d 003 03  0    0    0   0   0    1    1    91
0e 003 03  0    0    0   0   0    1    1    99
0f 003 03  0    0    0   0   0    1    1    A1
10 000 00  1    0    0   0   0    0    0    00
11 000 00  1    0    0   0   0    0    0    00
12 000 00  1    0    0   0   0    0    0    00
13 000 00  1    0    0   0   0    0    0    00
14 000 00  1    0    0   0   0    0    0    00
15 000 00  1    0    0   0   0    0    0    00
16 000 00  1    0    0   0   0    0    0    00
17 000 00  1    0    0   0   0    0    0    00
IRQ to pin mappings:
IRQ0 -> 0:2
IRQ1 -> 0:1
IRQ3 -> 0:3
IRQ4 -> 0:4
IRQ5 -> 0:5
IRQ6 -> 0:6
IRQ7 -> 0:7
IRQ8 -> 0:8
IRQ9 -> 0:9
IRQ10 -> 0:10
IRQ11 -> 0:11
IRQ12 -> 0:12
IRQ13 -> 0:13
IRQ14 -> 0:14
IRQ15 -> 0:15
.................................... done.
Using local APIC timer interrupts.
calibrating APIC timer ...
..... CPU clock speed is 1533.3676 MHz.
..... host bus clock speed is 266.6726 MHz.
cpu: 0, clocks: 2666726, slice: 888908
CPU0<T0:2666720,T1:1777808,D:4,S:888908,C:2666726>
cpu: 1, clocks: 2666726, slice: 888908
CPU1<T0:2666720,T1:888896,D:8,S:888908,C:2666726>
checking TSC synchronization across CPUs: passed.
Waiting on wait_init_idle (map = 0x2)
All processors have done init_idle
PCI: PCI BIOS revision 2.10 entry at 0xfd7e0, last bus=3
PCI: Using configuration type 1
PCI: Probing PCI hardware
Unknown bridge resource 0: assuming transparent
Unknown bridge resource 0: assuming transparent
Unknown bridge resource 2: assuming transparent
BIOS failed to enable PCI standards compliance, fixing this error.
I/O APIC: AMD Errata #22 may be present. In the event of instability try
       : booting with the "noapic" option.
Linux NET4.0 for Linux 2.4
Based upon Swansea University Computer Society NET3.039
Initializing RT netlink socket
Starting kswapd
allocated 32 pages and 32 bhs reserved for the highmem bounces
Journalled Block Device driver loaded
NTFS driver v1.1.22 [Flags: R/O]
SGI XFS with ACLs, quota, no debug enabled
pty: 256 Unix98 ptys configured
Serial driver version 5.05c (2001-07-08) with MANY_PORTS SHARE_IRQ SERIAL_PCI enabled
ttyS00 at 0x03f8 (irq = 4) is a 16550A
ttyS01 at 0x02f8 (irq = 3) is a 16550A
block: 128 slots per queue, batch=32
Uniform Multi-Platform E-IDE driver Revision: 6.31
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
AMD7411: IDE controller on PCI bus 00 dev 39
AMD7411: chipset revision 1
AMD7411: not 100% native mode: will probe irqs later
   ide0: BM-DMA at 0xf000-0xf007, BIOS settings: hda:DMA, hdb:pio
   ide1: BM-DMA at 0xf008-0xf00f, BIOS settings: hdc:pio, hdd:pio
hda: MAXTOR 6L020J1, ATA DISK drive
hdc: HL-DT-ST CD-ROM GCR-8520B, ATAPI CD/DVD-ROM drive
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
ide1 at 0x170-0x177,0x376 on irq 15
hda: 40132503 sectors (20548 MB) w/1818KiB Cache, CHS=2498/255/63
hdc: ATAPI 52X CD-ROM drive, 128kB Cache, DMA
Uniform CD-ROM driver Revision: 3.12
Partition check:
hda: hda1 hda2 hda3 hda4 < hda5 hda6 hda7 >
FDC 0 is a post-1991 82077
eepro100.c:v1.09j-t 9/29/99 Donald Becker http://www.scyld.com/network/eepro100.html
eepro100.c: $Revision: 1.36 $ 2000/11/17 Modified by Andrey V. Savochkin <saw at saw.sw.com.sg> and others
eth0: Intel Corp. 82557 [Ethernet Pro 100], 00:03:47:00:49:AF, IRQ 10.
 Receiver lock-up bug exists -- enabling work-around.
 Board assembly 711269-005, Physical connectors present: RJ45
 Primary interface chip i82555 PHY #1.
 General self-test: passed.
 Serial sub-system self-test: passed.
 Internal registers self-test: passed.
 ROM checksum self-test: passed (0x24c9f043).
 Receiver lock-up workaround activated.
eth1: Intel Corp. 82557 [Ethernet Pro 100] (#2), 00:03:47:00:49:B0, IRQ 11.
 Receiver lock-up bug exists -- enabling work-around.
 Board assembly 711269-005, Physical connectors present: RJ45
 Primary interface chip i82555 PHY #1.
 General self-test: passed.
 Serial sub-system self-test: passed.
 Internal registers self-test: passed.
 ROM checksum self-test: passed (0x24c9f043).
 Receiver lock-up workaround activated.
Linux agpgart interface v0.99 (c) Jeff Hartmann
agpgart: Maximum main memory to use for agp memory: 816M
agpgart: Detected AMD AMD 760MP chipset
agpgart: AGP aperture is 64M @ 0xf8000000
SCSI subsystem driver Revision: 1.00
scsi0 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 6.2.4
       <Adaptec 29160 Ultra160 SCSI adapter>
       aic7892: Ultra160 Wide Channel A, SCSI Id=7, 32/253 SCBs

 Vendor: SONY      Model: SDX-700C          Rev: 0101
 Type:   Sequential-Access                  ANSI SCSI revision: 02
(scsi0:A:2): 80.000MB/s transfers (40.000MHz, offset 127, 16bit)
3ware Storage Controller device driver for Linux v1.02.00.016.
scsi1 : Found a 3ware Storage Controller at 0x1450, IRQ: 11, P-chip: 1.3
scsi1 : 3ware Storage Controller
 Vendor: 3ware     Model: 3w-xxxx           Rev: 1.0
 Type:   Direct-Access                      ANSI SCSI revision: 00
st: Version 20020205, bufsize 32768, wrt 30720, max init. bufs 4, s/g segs 16
Attached scsi tape st0 at scsi0, channel 0, id 2, lun 0
Attached scsi disk sda at scsi1, channel 0, id 4, lun 0
SCSI device sda: 1120591360 512-byte hdwr sectors (-525768 MB)
sda: sda1
usb.c: registered new driver usbdevfs
usb.c: registered new driver hub
uhci.c: USB Universal Host Controller Interface driver v1.1
usb-ohci.c: USB OHCI at membase 0xc00dc000, IRQ 11
usb-ohci.c: usb-00:07.4, Advanced Micro Devices [AMD] AMD-765 [Viper] USB
usb.c: new USB bus registered, assigned bus number 1
hub.c: USB hub found
hub.c: 4 ports detected
usb-ohci.c: USB OHCI at membase 0xf882f000, IRQ 5
usb-ohci.c: usb-03:08.0, NEC Corporation USB
usb.c: new USB bus registered, assigned bus number 2
hub.c: USB hub found
hub.c: 3 ports detected
usb-ohci.c: USB OHCI at membase 0xf8831000, IRQ 10
usb-ohci.c: usb-03:08.1, NEC Corporation USB (#2)
usb.c: new USB bus registered, assigned bus number 3
hub.c: USB hub found
hub.c: 2 ports detected
Initializing USB Mass Storage driver...
usb.c: registered new driver usb-storage
USB Mass Storage support registered.
NET4: Linux TCP/IP 1.0 for NET4.0
IP Protocols: ICMP, UDP, TCP, IGMP
IP: routing cache hash table of 8192 buckets, 64Kbytes
TCP: Hash tables configured (established 262144 bind 65536)
NET4: Unix domain sockets 1.0/SMP for Linux NET4.0.
VFS: Mounted root (ext2 filesystem) readonly.
Freeing unused kernel memory: 232k freed
Adding Swap: 498004k swap-space (priority -1)
_____________________________________________________

Nasty SCSI Error:
Dec 10 15:00:55 pinky kernel: scsi0:0:6:0: Attempting to queue an ABORT message
Dec 10 15:00:55 pinky kernel: scsi0: Dumping Card State in Command phase, at SEQADDR 0x168
Dec 10 15:00:55 pinky kernel: ACCUM = 0x80, SINDEX = 0xa0, DINDEX = 0xe4, ARG_2 = 0x0
Dec 10 15:00:55 pinky kernel: HCNT = 0x0
Dec 10 15:00:55 pinky kernel: SCSISEQ = 0x12, SBLKCTL = 0xa
Dec 10 15:00:55 pinky kernel:  DFCNTRL = 0x4, DFSTATUS = 0x89
Dec 10 15:00:55 pinky kernel: LASTPHASE = 0x80, SCSISIGI = 0x84, SXFRCTL0 = 0x88
Dec 10 15:00:55 pinky kernel: SSTAT0 = 0x7, SSTAT1 = 0x0
Dec 10 15:00:55 pinky kernel: SCSIPHASE = 0x0
Dec 10 15:00:55 pinky kernel: STACK == 0x175, 0x160, 0x0, 0x34
Dec 10 15:00:55 pinky kernel: SCB count = 4
Dec 10 15:00:55 pinky kernel: Kernel NEXTQSCB = 3
Dec 10 15:00:55 pinky kernel: Card NEXTQSCB = 3
Dec 10 15:00:55 pinky kernel: QINFIFO entries:
Dec 10 15:00:55 pinky kernel: Waiting Queue entries:
Dec 10 15:00:55 pinky kernel: Disconnected Queue entries:
Dec 10 15:00:55 pinky kernel: QOUTFIFO entries:
Dec 10 15:00:55 pinky kernel: Sequencer Free SCB List: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
Dec 10 15:00:55 pinky kernel: Pending list: 2
Dec 10 15:00:55 pinky kernel: Kernel Free SCB list: 1 0
Dec 10 15:00:55 pinky kernel: Untagged Q(6): 2
Dec 10 15:00:55 pinky kernel: DevQ(0:6:0): 0 waiting
Dec 10 15:00:55 pinky kernel: scsi0:0:6:0: Device is active, asserting ATN
Dec 10 15:00:55 pinky kernel: Recovery code sleeping
Dec 10 15:01:00 pinky kernel: Recovery code awake
Dec 10 15:01:00 pinky kernel: Timer Expired
Dec 10 15:01:00 pinky kernel: aic7xxx_abort returns 0x2003

I got this before I had the new SCSI cable and terminator (I was using the
built-in termination on the drive).






