This topic covers troubleshooting information for the following system statuses:
smartctl_err
ssd_life_left_10
ssd_life_left_20
This check reads health-check information directly from the disk using smartctl.
Log Message
Smartctl Errors Detected
Command Executed
/usr/sbin/smartctl -A -f hex,id /dev/[device]
where [device] is either “sda” or “mmcblk0p”.
Example
/usr/sbin/smartctl -A -f hex,id /dev/sda
Issue Identification
Summary Message: Command smartctl did not execute cleanly: <value>
Summary Message: Command smartctl aborted: read SMART data error
These errors might indicate either that the command itself failed to execute or that the disk is self-reporting errors. Either case may or may not point to a serious problem, so analyze the full output of the command together with other filesystem and disk indicators before drawing conclusions.
In the first case, <value> is the smartctl exit status, which is defined by a bitmask. Each of its 8 bits indicates a possible error, potential error, or fault:
Bit 0: Command line did not parse.
Bit 1: Device open failed, device did not return an IDENTIFY DEVICE structure, or device is in a low-power mode.
Bit 2: Some SMART or other ATA command to the disk failed, or there was a checksum error in a SMART data structure.
Bit 3: SMART status check returned "DISK FAILING".
Bit 4: Found prefail Attributes <= threshold.
Bit 5: SMART status check returned "DISK OK" but smartctl found that some (usage or prefail) Attributes have been <= threshold at some time in the past.
Bit 6: The device error log contains records of errors.
Bit 7: The device self-test log contains records of errors. Failed self-tests outdated by a newer successful extended self-test are ignored.
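Because several bits can be set at once, <value> is easiest to read by testing each bit in turn. The following is a minimal Python sketch, assuming <value> is the integer exit status reported in the summary message (in a shell, the same number is available as $? right after smartctl returns). The names SMARTCTL_EXIT_BITS and decode_smartctl_status are hypothetical helpers, not part of smartctl.

```python
# Bit meanings as listed above (smartctl exit-status bitmask).
SMARTCTL_EXIT_BITS = {
    0: "Command line did not parse",
    1: "Device open failed, no IDENTIFY DEVICE structure, or low-power mode",
    2: "SMART/ATA command failed or SMART data checksum error",
    3: 'SMART status check returned "DISK FAILING"',
    4: "Found prefail Attributes <= threshold",
    5: "Some Attributes have been <= threshold at some time in the past",
    6: "The device error log contains records of errors",
    7: "The device self-test log contains records of errors",
}


def decode_smartctl_status(value: int) -> list[tuple[int, str]]:
    """Return (bit, meaning) for every bit set in a smartctl exit status."""
    # Process exit statuses fit in 8 bits, matching the 8 documented bits.
    if not 0 <= value <= 0xFF:
        raise ValueError(f"exit status must fit in 8 bits, got {value}")
    return [(bit, text) for bit, text in SMARTCTL_EXIT_BITS.items()
            if value & (1 << bit)]


if __name__ == "__main__":
    # 64 = 0b01000000, so only bit 6 (device error log) is set.
    for bit, meaning in decode_smartctl_status(64):
        print(f"Bit {bit}: {meaning}")
```

For example, a value of 4 means only bit 2 is set (a SMART or ATA command failed), while 192 sets bits 6 and 7, meaning both the error log and the self-test log contain records.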
Verify Details
Review smartctl output for specific error codes or failing health attributes.
Cross-check with other diagnostic tools to confirm whether they report the same issue.
Check the SMART status of the SSDs for more detailed information about potential hardware failure.
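Reviewing the attribute table for failing attributes can be partly automated. The following is a minimal Python sketch, assuming the ATA attribute table layout that smartctl -A prints (columns ID#, ATTRIBUTE_NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE), with IDs in hex when -f hex,id is used. The SAMPLE text is illustrative and not real output from this system; the helper names are hypothetical, and the sector attribute IDs (0x05, 0xC5, 0xC6) are common conventions that vendors may not follow.

```python
from dataclasses import dataclass


@dataclass
class SmartAttribute:
    attr_id: int
    name: str
    value: int
    thresh: int
    raw: str


# Attribute IDs commonly used for sector health (assumed; vendors may differ).
SECTOR_ATTRS = {0x05: "reallocated", 0xC5: "pending", 0xC6: "offline uncorrectable"}


def parse_id(token: str) -> int:
    """Accept hex IDs ("0x05", from -f hex,id) and decimal IDs ("5")."""
    return int(token, 16) if token.lower().startswith("0x") else int(token)


def parse_attributes(output: str) -> list[SmartAttribute]:
    """Parse the rows that follow the "ID#" header of the attribute table."""
    attrs: list[SmartAttribute] = []
    in_table = False
    for line in output.splitlines():
        if line.startswith("ID#"):
            in_table = True
            continue
        if not in_table or not line.strip():
            continue
        # RAW_VALUE may contain spaces, so split into at most 10 fields.
        fields = line.split(maxsplit=9)
        if len(fields) < 10:
            continue
        try:
            attrs.append(SmartAttribute(
                attr_id=parse_id(fields[0]),
                name=fields[1],
                value=int(fields[3]),
                thresh=int(fields[5]),
                raw=fields[9],
            ))
        except ValueError:
            continue  # row does not match the assumed layout
    return attrs


def find_problems(attrs: list[SmartAttribute]) -> list[str]:
    """List attributes at or below threshold and nonzero sector counts."""
    problems = []
    for a in attrs:
        # A threshold of 0 is treated as "no threshold" (assumption).
        if a.thresh > 0 and a.value <= a.thresh:
            problems.append(f"{a.name}: VALUE {a.value} <= THRESH {a.thresh}")
        if a.attr_id in SECTOR_ATTRS:
            raw_count = a.raw.split()[0]
            if raw_count.isdigit() and int(raw_count) > 0:
                problems.append(
                    f"{a.name}: {raw_count} {SECTOR_ATTRS[a.attr_id]} sector(s)")
    return problems


# Illustrative output only; real values depend on the drive.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
0x05 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       8
0xc5 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0
0xc2 Temperature_Celsius     0x0022   064   050   000    Old_age   Always       -       36 (Min/Max 20/50)
"""

if __name__ == "__main__":
    for problem in find_problems(parse_attributes(SAMPLE)):
        print(problem)
```

On the sample, only the reallocated sector count is reported; the temperature row is parsed but ignored because its threshold is 0.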
Suggested Actions or Workaround:
Run the smartctl diagnostic again to get updated data.
Check whether the SSD requires further SMART error analysis, such as reviewing reallocated or pending sector counts.
If problem indications persist, consider backing up important data stored in the filesystem. In extreme cases, the disk may need to be replaced.
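One way to judge whether problem indications persist is to compare sector counts between the original run and the re-run. The following is a minimal Python sketch, assuming the raw counts have already been extracted into dictionaries that map attribute ID to count; compare_runs and the example counts are hypothetical.

```python
def compare_runs(before: dict[int, int],
                 after: dict[int, int]) -> dict[str, list[int]]:
    """Compare raw sector counts (attribute ID -> count) from two smartctl runs."""
    # Nonzero in both runs: the indication persists.
    persisting = sorted(i for i, n in after.items()
                        if n > 0 and before.get(i, 0) > 0)
    # Higher now than before: the condition is getting worse.
    growing = sorted(i for i, n in after.items() if n > before.get(i, 0))
    return {"persisting": persisting, "growing": growing}


if __name__ == "__main__":
    first = {0x05: 8, 0xC5: 2}    # hypothetical counts from the first run
    second = {0x05: 12, 0xC5: 0}  # hypothetical counts from the re-run
    print(compare_runs(first, second))
```

Here the reallocated count (ID 5) both persists and grows, while the pending count cleared, which would point toward backing up data rather than simply re-running the check.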
Important
SMART errors can be early indicators of disk failure, but not all SMART errors lead to immediate failure.
Proactive monitoring and backups are essential if SMART errors are detected.
Log Message
SSD Life Remaining Below 20%
SSD Life Remaining Below 10%
Issue Identification
The system has detected that the predicted remaining life of the SSD is below 20% or 10%. This warning is based on SMART attributes and indicates that the drive may be approaching failure.
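The mapping from remaining life to the two statuses can be sketched as follows. This is a minimal Python sketch under assumptions: the life-left percentage is taken as an input (on many SSDs it is the normalized VALUE of a vendor-specific wear attribute, which varies by model), "below" is treated as strictly less than, and the more severe status wins when both apply. life_left_status is a hypothetical helper, not the system's actual check.

```python
from typing import Optional


def life_left_status(percent_left: float) -> Optional[str]:
    """Map a life-left percentage to the status name it would trigger."""
    if not 0 <= percent_left <= 100:
        raise ValueError(f"percentage out of range: {percent_left}")
    # Check the more severe threshold first so only one status is returned.
    if percent_left < 10:
        return "ssd_life_left_10"
    if percent_left < 20:
        return "ssd_life_left_20"
    return None  # no life-left warning
```

For example, 15% left maps to ssd_life_left_20, and 5% maps to ssd_life_left_10.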
Verify Details
Check smartctl output for detailed life remaining statistics.
Identify the cause of the low life left (for example, excessive writes or high temperatures).
Review system logs for additional signs of impending failure or degradation.
Suggested Actions or Workaround:
Back up all critical data immediately.
If possible, replace the unit. If the unit is under warranty, it may be eligible for an RMA.
Review the SSD usage pattern for any issues that might have accelerated wear and consider reducing workload on the disk.
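When reviewing the usage pattern, a rough wear rate helps show whether reducing the workload would make a meaningful difference. The following is a minimal Python sketch, assuming two life-left readings taken some days apart and a linear wear rate between them; real wear is rarely linear, so treat the result as an order-of-magnitude estimate. days_until is a hypothetical helper.

```python
from typing import Optional


def days_until(earlier_pct: float, now_pct: float, days_between: float,
               target_pct: float) -> Optional[float]:
    """Linearly extrapolate the days until life left reaches target_pct.

    Returns None if no wear was measured between the two readings.
    """
    if days_between <= 0:
        raise ValueError("days_between must be positive")
    consumed = earlier_pct - now_pct    # percentage points used up
    if consumed <= 0:
        return None
    rate = consumed / days_between      # percentage points per day
    # Already at or below the target: report zero days remaining.
    return max(0.0, (now_pct - target_pct) / rate)
```

For example, a drive that showed 30% left 60 days ago and 18% now is wearing 0.2 percentage points per day, so the sketch estimates roughly 40 days until it reaches 10%.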
Important
This warning is not necessarily an indication of immediate failure but signals that the SSD may be nearing the end of its useful life.