In my last post, I ventured into monitoring individual SSD health using Intel's SMART stats, specifically the Media_Wearout_Indicator. I contrasted this with someone else's approach of monitoring the total number of bytes written. In that post, I also threw out the idea of monitoring these counters with smartd. Well, smartd wouldn't do what I wanted it to do (watch this counter and throw a fit if it dropped below a value). Sooooo, I did what any UNIX admin would do and replaced it with a shell script.

We use OpenNMS and NRPE to trigger commandlets like this, so here's the script I wrote. It should work in Nagios, too. You'll probably have to customize the script to your liking, but it's straightforward and has some easy-to-tweak variables at the beginning. If you can't figure these variables out, it's time to find a new line of work. Full inline script after the jump (if you want to see what you can download).
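#!/bin/bash
# Nagios/NRPE check: complain when an Intel SSD's Media_Wearout_Indicator runs low.

DISKS=( sda sdb sdc )          # block devices to probe
NUMDRIVES=8                    # MegaRAID drive IDs to try per device: 0..NUMDRIVES-1
SMARTCTL="/usr/sbin/smartctl"
SSDPRESENT=0

for disk in "${DISKS[@]}"; do
    # $((NUMDRIVES - 1)) is arithmetic; ${NUMDRIVES-1} would just expand to 8.
    for i in $(seq 0 $((NUMDRIVES - 1))); do
        # Field 4 is the normalized VALUE column; empty if the drive has no such attribute.
        wear=$(/usr/bin/sudo -n "$SMARTCTL" -A --device=sat+megaraid,"$i" "/dev/$disk" | awk '/Media_Wearout_Indicator/ {print $4}')
        if [ -n "$wear" ]; then
            SSDPRESENT=1
            # Stick with [ ... -lt ... ]: it reads values like 098 as decimal.
            if [ "$wear" -lt 20 ]; then
                echo "CRITICAL: SSD $i IN $disk FAILING!"
                exit 2    # Nagios plugin convention: 2 = CRITICAL (1 is WARNING)
            fi
        fi
    done
done

if [ $SSDPRESENT -eq 1 ]; then
    echo "OK: All SSD pass"
    exit 0
else
    echo "CRITICAL: No SSD present"
    exit 2
fi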
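If you want to poke at the parsing and threshold logic without a MegaRAID box handy, here's a rough sketch that feeds canned text through the same awk field extraction. The helper names (wear_value, classify_wear) are mine and purely hypothetical, the sample table is made up in the column layout smartctl -A prints, and returning UNKNOWN (3) when the attribute is missing is my assumption rather than what the script above does (it treats "no SSD" as CRITICAL).

```shell
#!/bin/bash
# Hypothetical helper: read `smartctl -A` style output on stdin and print the
# normalized VALUE (4th column) of Media_Wearout_Indicator, if present.
wear_value() {
    awk '/Media_Wearout_Indicator/ {print $4}'
}

# Hypothetical helper: turn a wearout value into a Nagios-style status line
# and return code. The default threshold of 20 matches the script above.
classify_wear() {
    local value=$1 threshold=${2:-20}
    if [ -z "$value" ]; then
        # Assumption: missing attribute is reported as UNKNOWN here.
        echo "UNKNOWN: no Media_Wearout_Indicator found"
        return 3
    elif [ "$value" -lt "$threshold" ]; then
        echo "CRITICAL: wearout value $value is below $threshold"
        return 2
    else
        echo "OK: wearout value $value"
        return 0
    fi
}

# Made-up sample in the smartctl -A column layout; not real drive data.
SAMPLE='ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       1234
233 Media_Wearout_Indicator 0x0032   098   098   000    Old_age   Always       -       0'

classify_wear "$(printf '%s\n' "$SAMPLE" | wear_value)"
```

Splitting the parsing from the decision like this makes it easy to try other thresholds (classify_wear 015 30, say) before touching the real check.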
Author: A NOLA native just trying to get by. I live in San Francisco and work as a digital plumber for the joint that runs this thing (Square/Weebly). Thoughts are mine, not my company's.