Anand Lal Shimpi of AnandTech released an article on Intel SSD longevity a few days ago. A few of my friends were talking about it, so of course my boss asked me for numbers on our SSD longevity. First off, Anand is somewhat over-complicating things by trying to estimate drive lifetime from how much data he's written so far. It's an admirable attempt, but it involves too much hedging and guessing. Managers and finance departments intensely dislike guessing. So what's a guy to use instead?
Easy: use the Media_Wearout_Indicator. Media wearout, you ask? Sounds great! It is great. This value starts at 100 and is the percentage of the drive's estimated write lifetime that remains. If it's at 100, the drive says you have 100% of your writes left. If it's at 80, you have 80% of life left, and so on. Intel uses this value to determine warranty: if it hits 0, your warranty is up.
How do you grab this counter? Pretty easy: it shows up in the output of smartctl -A. It's attribute ID 233 (not attribute ID 226). How do you grab this value if you're not running JBOD, but are instead using RAID behind a Dell PERC 6/i, H700, etc.? Things get a little more complicated, but it's definitely possible. First off, dump the ancient version of smartmontools that is standard on CentOS and Red Hat based systems. I'm using version 5.42, so that version or anything higher should work. Red Hat specifically removes support for Dell PERCs in their builds, so you'll have to compile your own. Luckily this is pretty easy. I've uploaded my own RPM spec file so you can build your own updated package. Once you've got it going, run the command like this:
smartctl -A --device=sat+megaraid,0 /dev/sda
This assumes that your RAID device is sda, and the SSD is on SAS ID 0. Increment the value after megaraid for each disk you have in your SSD RAID group. For example, to list the percentages for drives 0-5 in /dev/sda:
for i in {0..5}; do smartctl -A --device=sat+megaraid,$i /dev/sda | grep 'Media_Wearout_Indicator' | awk '{print $4}'; done
For me that outputs:
100
100
100
100
100
100
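If you'd rather not depend on the attribute name, the same loop can be sketched as a pair of small shell functions that match on attribute ID 233 and label each value with its disk ID. The function names wearout_value and wearout_report are my own invention for illustration, and the sketch assumes the usual smartctl -A table layout where the normalized VALUE is the fourth column (the same column the awk above prints):

```shell
# Read `smartctl -A` output on stdin and print the normalized VALUE
# column (4th field) of SMART attribute ID 233.
wearout_value() {
    awk '$1 == 233 { print $4 }'
}

# Print "<disk id> <wearout value>" for each megaraid disk ID listed
# after the RAID block device. Hypothetical helper, not part of smartmontools.
# Usage: wearout_report /dev/sda 0 1 2 3 4 5
wearout_report() {
    dev=$1
    shift
    for id in "$@"; do
        value=$(smartctl -A --device="sat+megaraid,$id" "$dev" | wearout_value)
        # Fall back to "unknown" if the attribute was not reported.
        printf '%s %s\n' "$id" "${value:-unknown}"
    done
}
```

Running wearout_report /dev/sda 0 1 2 3 4 5 should give one line per disk, which is a bit easier to read once one drive starts wearing faster than the others.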
Easy peasy. You can graph/store those numbers if you want (OpenTSDB would be good for this). But that might be overkill. Putting it in an NRPE check script would be better. Actually, smartd.conf would be an excellent candidate for this...
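For the NRPE route, here's a rough sketch of what the check logic could look like. It isn't a finished plugin: check_wearout is a name I made up, and the warning/critical floors are whatever you pass in. It follows the standard Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) and alerts on the lowest value across the RAID group:

```shell
# Classify wearout values against warning/critical floors.
# Prints a one-line status and returns the Nagios plugin exit code.
# Usage: check_wearout WARN CRIT VALUE...
check_wearout() {
    warn=$1
    crit=$2
    shift 2
    if [ "$#" -eq 0 ]; then
        echo "UNKNOWN: no wearout values read"
        return 3
    fi
    lowest=100   # the indicator starts at 100 and only counts down
    for v in "$@"; do
        case $v in
            ''|*[!0-9]*)
                echo "UNKNOWN: unexpected wearout value '$v'"
                return 3
                ;;
        esac
        # smartctl zero-pads values (e.g. 098); strip the padding.
        while [ "${#v}" -gt 1 ] && [ "${v#0}" != "$v" ]; do
            v=${v#0}
        done
        if [ "$v" -lt "$lowest" ]; then
            lowest=$v
        fi
    done
    if [ "$lowest" -le "$crit" ]; then
        echo "CRITICAL: lowest Media_Wearout_Indicator is $lowest"
        return 2
    elif [ "$lowest" -le "$warn" ]; then
        echo "WARNING: lowest Media_Wearout_Indicator is $lowest"
        return 1
    fi
    echo "OK: lowest Media_Wearout_Indicator is $lowest"
    return 0
}
```

You'd feed it the output of the loop above, something like check_wearout 20 10 $(for i in {0..5}; do smartctl -A --device=sat+megaraid,$i /dev/sda | awk '$1 == 233 {print $4}'; done), where 20 and 10 are placeholder thresholds. smartctl generally needs root, so the NRPE user will probably need a sudo rule for it.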
One person asked why I like this counter better than the one that Anand used. Well, one is corporate reality and the other is nerdom wankery. Every IT/ops team depreciates gear over 3 or 5 year periods. If you can say that your gear lasts at least 3 or 5 years, you're golden. Who cares when it blows up? It'll be replaced by the time it's no longer useful. All you care about is that it lasts at least as long as it takes to depreciate and replace.
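If the boss wants that turned into a single number, a back-of-the-envelope projection is possible. This sketch assumes wear is linear over time, which is my assumption and not something the drive promises, and projected_lifetime_days is a made-up helper name. It takes the current indicator value and the number of days the drive has been in service, and extrapolates the total lifetime in days:

```shell
# Linear extrapolation of total drive lifetime, in days.
# Usage: projected_lifetime_days VALUE DAYS_IN_SERVICE
# ASSUMPTION: the indicator drops at a constant rate over time.
projected_lifetime_days() {
    value=$1
    days=$2
    # smartctl zero-pads values (e.g. 098); strip the padding so the
    # shell does not try to read the number as octal.
    while [ "${#value}" -gt 1 ] && [ "${value#0}" != "$value" ]; do
        value=${value#0}
    done
    used=$((100 - value))
    if [ "$used" -le 0 ]; then
        # No measurable wear yet, so there is nothing to extrapolate from.
        echo "unknown"
        return 0
    fi
    echo $((days * 100 / used))
}
```

For example, a drive that reads 80 after 365 days has used 20% of its writes, so projected_lifetime_days 80 365 prints 1825, or about five years. Compare that against your 3 or 5 year depreciation window and you have your answer.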
| smartmontools.spec |
