Monitoring Intel SSD health behind a Dell PERC RAID card

3/2/2012

In my last post, I ventured into the topic of monitoring individual SSD health using Intel's SMART stats, specifically the Media_Wearout_Indicator. I contrasted this with someone else's approach of monitoring the total number of bytes written. In that post, I also threw out the idea of monitoring these counters with smartd. Well, smartd wouldn't do what I wanted it to do (watch this counter and throw a fit if it dropped below a value). Sooooo, I did what any UNIX admin would do and replaced it with a shell script. We use OpenNMS and NRPE to trigger check commands like this, so here's the script I wrote. It should work in Nagios, too. You'll probably have to customize the script to your liking, but it's straightforward and has some easy-to-tweak variables at the beginning. If you can't figure these variables out, it's time to find a new line of work.

Full inline script after the jump (if you want to see what you can download).

ssdhealth.sh

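#!/bin/bash
# Nagios/NRPE-style check: flag any Intel SSD behind the PERC whose
# Media_Wearout_Indicator has dropped below THRESHOLD.

DISKS=( sda sdb sdc )   # block devices to query through the controller
NUMDRIVES=8             # physical drive IDs to probe: 0 .. NUMDRIVES-1
THRESHOLD=20            # alert when the normalized value falls below this
SMARTCTL="/usr/sbin/smartctl"

SSDPRESENT=0

for disk in "${DISKS[@]}"; do
  # $((NUMDRIVES - 1)) does the arithmetic; ${NUMDRIVES-1} is a default-value expansion.
  for i in $(seq 0 $((NUMDRIVES - 1))); do
    # Query each drive once; column 4 of the attribute table is the normalized VALUE.
    wear=$(/usr/bin/sudo -n "$SMARTCTL" -A --device=sat+megaraid,"$i" "/dev/$disk" \
      | awk '$2 == "Media_Wearout_Indicator" { print $4; exit }')
    if [ -n "$wear" ]; then
      SSDPRESENT=1
      if [ "$wear" -lt "$THRESHOLD" ]; then
        echo "CRITICAL: SSD $i IN $disk FAILING!"
        exit 2   # Nagios plugin convention: 2 = CRITICAL (1 would be WARNING)
      fi
    fi
  done
done

if [ "$SSDPRESENT" -eq 1 ]; then
  echo "OK: All SSD pass"
  exit 0
else
  echo "CRITICAL: No SSD present"
  exit 2
fi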
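
If you want to sanity-check the parsing without a RAID card handy, here's a minimal sketch of the core logic pulled out into two functions. The function names (wearout_value, wearout_status) and the sample attribute line are made up for illustration; the line only mimics the column layout of smartctl -A output and is not from a real drive. The threshold of 20 matches the one used in the script.

```shell
#!/bin/bash
# Sketch: extract the normalized Media_Wearout_Indicator value from
# `smartctl -A` style text and map it to a Nagios-style status.

# Reads attribute-table text on stdin, prints column 4 (VALUE) as a plain number.
# The "+ 0" strips leading zeros (097 -> 97) so bash arithmetic won't read it as octal.
wearout_value() {
  awk '$2 == "Media_Wearout_Indicator" { print $4 + 0; exit }'
}

# Usage: wearout_status VALUE THRESHOLD
# Prints a status line; returns 0 (OK), 2 (CRITICAL) or 3 (UNKNOWN).
wearout_status() {
  local value=$1 threshold=$2
  if [[ -z $value ]]; then
    echo "UNKNOWN: attribute not found"
    return 3
  elif (( value < threshold )); then
    echo "CRITICAL: wearout at $value (threshold $threshold)"
    return 2
  fi
  echo "OK: wearout at $value"
  return 0
}

# Hypothetical sample line, for illustration only.
sample='233 Media_Wearout_Indicator 0x0032   097   097   000    Old_age   Always       -       0'

value=$(printf '%s\n' "$sample" | wearout_value)
wearout_status "$value" 20
```

On a live box you would pipe the real smartctl output into wearout_value instead of the sample line. Splitting the parse from the decision also makes it easy to add a WARNING tier later if you want one.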
