smartctl_err, ssd_life_left_10, and ssd_life_left_20


This topic covers troubleshooting information for the following system statuses:

  • smartctl_err

  • ssd_life_left_10

  • ssd_life_left_20

This check reads healthcheck information directly from the disk using Self-Monitoring, Analysis and Reporting Technology (SMART).

Log Message

Smartctl Errors Detected

Command Executed

/usr/sbin/smartctl -A -f hex,id /dev/[device], where device is either “sda” or “mmcblk0p”.

Example

/usr/sbin/smartctl -A -f hex,id /dev/sda
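As a minimal sketch, the same command could be run from a script so that both its output and its exit status are captured for analysis. The helper names `build_smartctl_command` and `run_smartctl` are hypothetical and not part of the product; the sketch also assumes that smartctl is installed at the path shown above and that the caller has permission to read the device.

```python
import subprocess

SMARTCTL = "/usr/sbin/smartctl"  # path taken from the command shown above


def build_smartctl_command(device):
    """Return the argument list for the check on /dev/<device> (hypothetical helper)."""
    return [SMARTCTL, "-A", "-f", "hex,id", f"/dev/{device}"]


def run_smartctl(device):
    """Run the check and return (exit_status, output_text) (hypothetical helper)."""
    # check=True is deliberately not used: smartctl signals problems through a
    # non-zero exit status, and the output is still needed in that case.
    result = subprocess.run(
        build_smartctl_command(device),
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stdout
```

Calling `run_smartctl("sda")` would typically require root privileges on the target system.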

Issue Identification

  • Summary Message:  Command smartctl did not execute cleanly: <value>

  • Summary Message:  Command smartctl aborted: read SMART data error

These errors might indicate either that the command itself failed to execute or that the disk is self-reporting errors. Either case may or may not be a serious issue, so analyze the full output of the command together with other filesystem and disk indicators.

In the first case, <value> is a bitmask in which each of the 8 bits indicates a specific error, potential error, or fault:

Bit 0: Command line did not parse.

Bit 1: Device open failed, device did not return an IDENTIFY DEVICE structure, or device is in a low-power mode.

Bit 2: Some SMART or other ATA command to the disk failed, or there was a checksum error in a SMART data structure.

Bit 3: SMART status check returned "DISK FAILING".

Bit 4: Found prefail Attributes <= threshold.

Bit 5: SMART status check returned "DISK OK" but smartctl found that some (usage or prefail) Attributes have been <= threshold at some time in the past.

Bit 6: The device error log contains records of errors.

Bit 7: The device self-test log contains records of errors. Failed self-tests outdated by a newer successful extended self-test are ignored.
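The bit list above can be decoded mechanically. The following is a minimal sketch, assuming that <value> in the summary message is the numeric smartctl exit status; the function name `decode_smartctl_status` is hypothetical, and the descriptions are abbreviated from the list above.

```python
# Bit meanings abbreviated from the list above.
SMARTCTL_BITS = {
    0: "Command line did not parse",
    1: "Device open failed, no IDENTIFY DEVICE structure, or low-power mode",
    2: "A SMART or other ATA command failed, or SMART data checksum error",
    3: 'SMART status check returned "DISK FAILING"',
    4: "Found prefail Attributes <= threshold",
    5: "Some Attributes have been <= threshold at some time in the past",
    6: "The device error log contains records of errors",
    7: "The device self-test log contains records of errors",
}


def decode_smartctl_status(value):
    """Return a list of (bit, description) pairs for every bit set in value."""
    return [
        (bit, text)
        for bit, text in SMARTCTL_BITS.items()
        if value & (1 << bit)
    ]


# Example: 68 = 64 + 4, so bits 2 and 6 are set.
for bit, text in decode_smartctl_status(68):
    print(f"Bit {bit}: {text}")
```

A value of 0 yields an empty list, which corresponds to a clean run with no reported problems.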

Verify Details

  1. Review smartctl output for specific error codes or failing health attributes.

  2. Cross-check with other diagnostic tools to see if the issue is consistent across systems.

  3. Check SMART status of the SSDs for more detailed information about potential hardware failure.
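The review in step 1 can be partly automated. The sketch below parses the attribute table printed by `smartctl -A -f hex,id` and lists attributes whose normalized value is at or below their threshold. The sample text is invented for illustration and is not output from a real device; the function names are hypothetical, and the parser assumes the usual ten-column ATA attribute layout, which other device types may not follow.

```python
# Invented sample in the usual attribute-table layout (not from a real disk).
SAMPLE = """\
ID#    ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
0x05   Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
0xc5   Current_Pending_Sector  0x0012   005   005   010    Old_age   Always   FAILING_NOW 12
"""


def parse_attributes(output):
    """Return {attribute_id: details} for each attribute row in the output."""
    attrs = {}
    for line in output.splitlines():
        fields = line.split()
        # Attribute rows start with a hex ID (for example 0x05) and have
        # at least ten columns; headers and other lines are skipped.
        if len(fields) < 10 or not fields[0].startswith("0x"):
            continue
        try:
            attr_id = int(fields[0], 16)
            value, worst, thresh = (int(f) for f in fields[3:6])
        except ValueError:
            continue  # non-numeric columns: not a row this sketch understands
        attrs[attr_id] = {
            "name": fields[1],
            "value": value,
            "worst": worst,
            "thresh": thresh,
            "type": fields[6],
            "raw": " ".join(fields[9:]),
        }
    return attrs


def at_or_below_threshold(attrs):
    """Return names of attributes whose value is <= a non-zero threshold."""
    # A threshold of 0 is treated here as "no threshold defined".
    return [
        a["name"]
        for a in attrs.values()
        if a["thresh"] > 0 and a["value"] <= a["thresh"]
    ]


print(at_or_below_threshold(parse_attributes(SAMPLE)))
```

Raw values, such as the reallocated or pending sector counts, remain available in the `raw` field for further analysis.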

Suggested Actions or Workaround:

  • Run the smartctl diagnostic again to get updated data.

  • Check if the SSD requires further SMART error analysis, such as analyzing reallocated sectors or pending sectors.

  • If problem indications persist, consider backing up important data stored in the filesystem. In extreme cases, the disk may need to be replaced.

Important

SMART errors can be early indicators of disk failure, but not all SMART errors lead to immediate failure.

Proactive monitoring and backups are essential if SMART errors are detected.

Log Message

SSD Life Remaining Below 20%

SSD Life Remaining Below 10%

Issue Identification

The system has detected that the predicted remaining lifespan of the SSD is below 20% (ssd_life_left_20) or below 10% (ssd_life_left_10). This warning is based on SMART attributes and indicates that the disk may fail in the near future.
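The two thresholds can be expressed as a simple classification. This is only a sketch: the function name `life_left_status` is hypothetical, the mapping of status names to thresholds is inferred from the status names and log messages above, and this topic does not state which SMART attribute supplies the remaining-life percentage, so the percentage is taken as an input.

```python
def life_left_status(percent_left):
    """Map a remaining-life percentage to a status name, or None if healthy."""
    if percent_left < 10:
        return "ssd_life_left_10"
    if percent_left < 20:
        return "ssd_life_left_20"
    return None


for pct in (35, 15, 8):
    print(pct, life_left_status(pct))
```

The more severe status is checked first so that a disk below 10% is not reported only as below 20%.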

Verify Details

  • Check smartctl output for detailed life remaining statistics.

  • Identify the cause of the low life left (e.g., excessive writes, temperature issues, etc.).

  • Review system logs for additional signs of impending failure or degradation.

Suggested Actions or Workaround:

  • Back up all critical data immediately.

  • If possible, replace the unit. If under warranty, an RMA may apply in some cases.

  • Review the SSD usage pattern for any issues that might have accelerated wear and consider reducing workload on the disk.

Important

This warning is not necessarily an indication of immediate failure but signals that the SSD may be nearing the end of its useful life.