I’m looking to add some large enterprise HDDs to my array and was wondering whether a long SMART test would be sufficient before putting the drives into service.
I use Windows with SnapRAID, have HDD Sentinel, and could also run read and/or write tests.
I’m curious what testing methods others use on a drive after it is shipped to them and before putting it into service.
I don’t test. I just slap it in.
Here is my over-the-top method.
++++++++++++++++++++++++++++++++++++++++++++++++++++
My Testing methodology
This is something I developed to stress both new and used drives so that if there are any issues they will appear.
Testing can take anywhere from 4-7 days depending on hardware. I have a dedicated testing server set up. I use a server with ECC RAM installed, but if your RAM has been tested with MemTest86+ then you are probably fine.
- SMART Test, check stats
smartctl -i /dev/sdxx
smartctl -A /dev/sdxx
smartctl -t long /dev/sdxx
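Note that smartctl -t long only starts the test and returns. Once it has finished, the result should show up in the self-test log (smartctl -l selftest /dev/sdxx). For the "check stats" part, here is a minimal sketch of one way to flag the attributes people usually watch. check_smart_attrs is a made-up helper name, and it assumes the ATA-style attribute table that smartctl -A prints (SAS drives report differently, so this would not apply as-is).

```shell
# Hypothetical helper: reads `smartctl -A` output on stdin and warns when
# attribute 5 (reallocated), 197 (pending) or 198 (offline uncorrectable)
# has a non-zero raw value. Assumes the raw value is the 10th column.
check_smart_attrs() {
    awk '
        ($1 == 5 || $1 == 197 || $1 == 198) && ($10 + 0) > 0 {
            print "WARN: " $2 " raw value " $10
            bad = 1
        }
        END { exit (bad ? 1 : 0) }   # exit status 1 if anything was flagged
    '
}
```

Usage would be something like: smartctl -A /dev/sdxx | check_smart_attrs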
- BadBlocks -This is a complete write and read test, will destroy all data on the drive
badblocks -b 4096 -c 65535 -wsv /dev/sdxx > sdxx.log
- Real-world surface testing, format to ZFS -Yes, you want compression on. I have found checksum errors that testing with compression off would have missed. (I noticed this completely by accident. I had a drive that produced checksum errors when it was in a pool, so I pulled it and ran my test without compression on. It passed just fine. When I put it back into the pool, the errors appeared again. The pool had compression on, so I pulled the drive and reran my test with compression on, and got checksum errors. I have asked around. No one knows why this happens, but it does. This may have been a bug in early versions of ZoL that is no longer present.)
zpool create -f -o ashift=12 -O logbias=throughput -O compress=lz4 -O dedup=off -O atime=off -O xattr=sa TESTR001 /dev/sdxx
zpool export TESTR001
sudo zpool import -d /dev/disk/by-id TESTR001
sudo chmod -R ugo+rw /TESTR001
- Fill test using F3, followed by a ZFS scrub to check for any read, write, or checksum errors.
sudo f3write /TESTR001 && sudo f3read /TESTR001 && sudo zpool scrub TESTR001
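One thing to keep in mind is that zpool scrub kicks off the scrub in the background, so the error counters need to be read from zpool status after it completes. A rough sketch of checking those counters is below. zpool_errors is a hypothetical helper, and it assumes the usual NAME / STATE / READ / WRITE / CKSUM table layout in the zpool status output.

```shell
# Hypothetical helper: reads `zpool status` output on stdin and reports any
# pool or device row whose READ, WRITE or CKSUM counter is not "0".
zpool_errors() {
    awk '
        $1 == "NAME" && $2 == "STATE" { in_cfg = 1; next }   # table header
        in_cfg && NF == 0 { in_cfg = 0 }                     # blank line ends the table
        in_cfg && NF >= 5 && ($3 != "0" || $4 != "0" || $5 != "0") {
            print "ERRORS on " $1 ": read=" $3 " write=" $4 " cksum=" $5
            bad = 1
        }
        END { exit (bad ? 1 : 0) }   # exit status 1 if any counter was non-zero
    '
}
```

Usage would be something like: sudo zpool status TESTR001 | zpool_errors. On OpenZFS 2.0 or later, sudo zpool wait -t scrub TESTR001 should block until the scrub is done. On older versions you would have to poll zpool status instead.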
If everything passes, the drive goes into my good pile. If something fails, I contact the seller to get a partial refund for the drive or a return label to send it back. I record the WWN and serial number of each drive, along with a copy of any test notes:
8TB wwn-0x5000cca03bac1768 -Failed, 26 read errors, non-recoverable, drive is unsafe to use.
8TB wwn-0x5000cca03bd38ca8 -Failed, checksum errors, possibly recoverable, drive use is not recommended.
++++++++++++++++++++++++++++++++++++++++++++++++++++
Long SMART test, then dd if=/dev/urandom of=/dev/[new disk], then another long SMART test. That’s pretty much it.
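For anyone who wants the dd step spelled out, a minimal sketch follows. fill_random is a made-up wrapper name and the flags are my assumptions (GNU dd on Linux), not necessarily what the poster runs. It overwrites everything on the target.

```shell
# Hypothetical wrapper around dd. TARGET is the whole-disk device
# (e.g. /dev/sdX); all data on it is destroyed.
# COUNT is an optional number of 1 MiB blocks, handy for a dry run on a file.
fill_random() {
    target=$1
    count=${2:-}
    if [ -n "$count" ]; then
        dd if=/dev/urandom of="$target" bs=1M count="$count" iflag=fullblock conv=fsync
    else
        # Without COUNT, dd runs until the device is full and then exits
        # non-zero with "No space left on device", which is expected here.
        dd if=/dev/urandom of="$target" bs=1M iflag=fullblock conv=fsync
    fi
}
```

Trying it on a scratch file first (fill_random /tmp/scratch.img 16) is a cheap way to confirm the flags work on your system before pointing it at a real disk.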
Run the health check in HD Tune and then also look at the SMART stats. Also check drive health using the WD Dashboard software. Then I put it to work and observe it.