As mentioned earlier, PBS has no built-in warning mechanism to let you know if your drives are having issues, although you can view SMART information manually from the GUI. We can implement our own, but the approach depends on whether you use proper SSDs or magnetic drives (Option 1) or a micro-SD card that does not support SMART reading in the same way (Option 2).
- First, let's set up an email notification service that both options will use (using gmail.com as an example):
    # Install mutt
    sudo apt install mutt -y

    # Edit the config
    sudo nano /etc/Muttrc

    # Paste the following text
    set smtp_url = "smtps://[email protected]:465/"
    set smtp_pass = <app_password>
    set ssl_force_tls = yes
    set realname = "Sender Name"
    set from = "[email protected]"
    set use_from = yes

    # Save and exit and run a test from your terminal:
    echo "hello" | mutt -s "Test" [email protected]
Option 1: Storage Monitoring for SSDs and Magnetic drives
- You can use smartmontools to fetch information about your drives:
    # Install the tool
    sudo apt install smartmontools

    # Run a manual check (replace sda with your drive)
    sudo smartctl -a /dev/sda
- Ideally, such checks would run on a regular schedule and send an email (or use some other method) to notify you before things get serious. You can get a list of your drives by running sudo fdisk -l. Create a simple Bash script:
    sudo nano /usr/local/bin/check_disk_health.sh

    #!/bin/bash

    # --- Configuration ---
    EMAIL_TO="[email protected]"
    HOSTNAME=$(hostname)

    # --- Check for errors and capture output ---
    # This command looks for lines containing "mmc0" AND either "error" OR "timeout",
    # skipping the first 9 lines of the dmesg log.
    ERROR_LOG=$(dmesg | tail -n +10 | grep -i "mmc0" | grep -i -E "error|timeout")

    # --- Send email if any errors were found ---
    if [ -n "$ERROR_LOG" ]; then
        SUBJECT="Critical SD Card Alert on $HOSTNAME"
        # The email body will contain the header and the actual error lines found.
        EMAIL_BODY="Warning: The following SD card errors or timeouts were detected on $HOSTNAME:"
        echo -e "$EMAIL_BODY\n\n$ERROR_LOG" | mutt -s "$SUBJECT" "$EMAIL_TO"
    fi
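The script above scans the kernel log for mmc0 messages, which targets SD cards rather than SMART-capable drives. For an SSD or magnetic drive, a check built on smartctl -H may be a closer fit. The following is only a minimal sketch under stated assumptions: the function names smart_report_is_healthy and check_device are hypothetical, admin@example.com is a placeholder address, and the device path is passed as the first argument. It relies on the health lines that smartctl -H prints ("test result: PASSED" for ATA and NVMe drives, "SMART Health Status: OK" for SCSI drives).

```shell
#!/bin/bash
# Hypothetical SMART-based variant of the health check (a sketch, not the original script).

EMAIL_TO="admin@example.com"   # placeholder address (assumption)
DEVICE="${1:-}"                # e.g. /dev/sda, passed as the first argument

# Return 0 when the smartctl -H report read from stdin shows a healthy drive.
smart_report_is_healthy() {
    grep -q -E "test result: PASSED|SMART Health Status: OK"
}

# Query one device and send an email if its health line is missing or not passing.
check_device() {
    local device="$1" report host
    host=$(hostname)
    report=$(smartctl -H "$device" 2>&1)
    if ! printf '%s\n' "$report" | smart_report_is_healthy; then
        printf 'Warning: the SMART health check did not pass for %s on %s:\n\n%s\n' \
            "$device" "$host" "$report" \
            | mutt -s "Critical disk alert on $host" "$EMAIL_TO"
    fi
}

# Only run the check when a device was given on the command line.
if [ -n "$DEVICE" ]; then
    check_device "$DEVICE"
fi
```

Saved as /usr/local/bin/check_disk_health.sh, it could be run with sudo bash /usr/local/bin/check_disk_health.sh /dev/sda. Any scheduled run would need the same device argument appended.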
Option 2: Monitoring micro-SD cards and other storage that does not support SMART
- Fetch a list of your drives and their partitions by running:
    sudo fdisk -l
- Create a script at /usr/local/bin/check_disk_health.sh that checks the last X lines of the dmesg log for mentions of the micro-SD card (mmc0 in this example) together with the words ‘error’ or ‘timeout’:
    #!/bin/bash

    # --- Configuration ---
    EMAIL_TO="[email protected]"
    HOSTNAME=$(hostname)

    # --- Check for errors and capture output ---
    # This command looks for lines containing "mmc0" AND either "error" OR "timeout",
    # checking the last 20 rows.
    ERROR_LOG=$(dmesg | tail -n -20 | grep -i "mmc0" | grep -i -E "error|timeout")

    # --- Send email if any errors were found ---
    if [ -n "$ERROR_LOG" ]; then
        SUBJECT="Critical SD Card Alert on $HOSTNAME"
        # The email body will contain the header and the actual error lines found.
        EMAIL_BODY="Warning: The following SD card errors or timeouts were detected on $HOSTNAME:"
        echo -e "$EMAIL_BODY\n\n$ERROR_LOG" | mutt -s "$SUBJECT" "$EMAIL_TO"
    fi
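To try the filter without waiting for a real fault, the grep pipeline can be wrapped in a function that reads log text from standard input. This is a sketch rather than part of the original script: filter_sd_errors is a hypothetical name, and the sample lines are only illustrative of the kind of messages the kernel MMC driver can emit.

```shell
#!/bin/bash
# Hypothetical helper: keep only lines that mention the given device
# AND contain "error" or "timeout" (case-insensitive), as in the script above.
filter_sd_errors() {
    local device="${1:-mmc0}"
    grep -i -- "$device" | grep -i -E "error|timeout"
}

# Illustrative sample lines (not captured from a real system).
sample_log='[   12.345678] mmc0: new high speed SDHC card at address aaaa
[  345.678901] mmc0: Timeout waiting for hardware interrupt.
[  346.000000] mmc1: error -110 whilst initialising SD card
[  347.123456] mmc0: error -110 whilst initialising SD card'

# Prints the two mmc0 lines that contain "Timeout" or "error".
printf '%s\n' "$sample_log" | filter_sd_errors mmc0
```

In the real script the same function could be fed from the kernel log instead of the sample text, for example ERROR_LOG=$(dmesg | tail -n 20 | filter_sd_errors mmc0).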
Shared for both options
- Make the file executable and set up a cron job:
    # Make it executable
    sudo chmod +x /usr/local/bin/check_disk_health.sh

    # Do a test run as root (like the cron job) to see that it works as expected:
    sudo bash /usr/local/bin/check_disk_health.sh

    # Set a cron job at 3am each day, for example, by opening root's crontab:
    sudo crontab -e

    # ...and adding this line:
    0 3 * * * /bin/bash /usr/local/bin/check_disk_health.sh
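Before relying on the schedule, it can help to confirm that the file exists, is executable, and parses cleanly. Below is a minimal sketch, assuming a hypothetical helper named script_ready. Note that bash -n only checks syntax and does not execute the script, so it will not send any email.

```shell
#!/bin/bash
# Hypothetical pre-flight check for a script that cron will run.
# Usage: script_ready /usr/local/bin/check_disk_health.sh
script_ready() {
    local path="$1"
    [ -f "$path" ] || { echo "missing: $path"; return 1; }
    [ -x "$path" ] || { echo "not executable: $path"; return 1; }
    # Parse the file without running it.
    bash -n "$path" || { echo "syntax error in: $path"; return 1; }
    echo "ready: $path"
}
```

The function returns a non-zero status on the first failed check, so it can also be chained, for example script_ready /usr/local/bin/check_disk_health.sh && sudo crontab -e.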