Note
This document is for a development version of Ceph.
Health checks
Overview
There is a set of health states that a Ceph cluster can raise. These are known as health checks. Each health check has a unique identifier.
The identifier is a terse, human-readable string; that is, it reads much like a typical variable name. It is intended to enable tools (for example, monitoring and UIs) to make sense of health checks and present them in a way that reflects their meaning.
This page lists the health checks that are raised by the monitor and manager daemons. In addition to these, you might see health checks that originate from CephFS MDS daemons (see CephFS health messages), as well as health checks that are defined by ceph-mgr modules.
Definitions
Monitor
DAEMON_OLD_VERSION
One or more Ceph daemons are running an old Ceph release. A health check is raised if multiple versions are detected. This condition must exist for a period longer than mon_warn_older_version_delay (set to one week by default) before the health check is raised, which allows most upgrades to proceed without triggering a warning that is expected and transient. If an upgrade is paused for an extended period of time, health mute can be used by running ceph health mute DAEMON_OLD_VERSION --sticky. Be sure, however, to run ceph health unmute DAEMON_OLD_VERSION after the upgrade has finished, so that any future, unexpected instances are not masked.
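One way to confirm which releases are still in play is to list the versions reported by each daemon type, for example:
ceph versions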
MON_DOWN
One or more Ceph Monitor daemons are currently down. The cluster requires a majority (more than one half) of the provisioned monitors to be available. When one or more monitors are down, clients may find it harder to establish their initial connection to the cluster, because they may need to try additional IP addresses before they reach an operating monitor.
Down monitor daemons should be restored or restarted as soon as possible in order to reduce the risk that an additional monitor failure will cause a service outage.
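To see which monitors are in quorum (and, by elimination, which are currently down), one option is to check the overall status and the quorum report, for example:
ceph status
ceph quorum_status --format json-pretty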
MON_CLOCK_SKEW
The clocks on hosts running Ceph Monitor daemons are not well synchronized. This health check is raised if the cluster detects a clock skew greater than mon_clock_drift_allowed.
This is best resolved by synchronizing the clocks with a tool like the legacy ntpd or the newer chrony. Ideally, NTP daemons should be configured to sync against multiple internal and external sources for resilience; the protocol will adaptively determine the best available source. It is also beneficial to have the NTP daemons on the Ceph Monitor hosts sync against each other, because it is even more important that the monitors be synchronized with each other than with anything else.
If it is impractical to keep the clocks closely synchronized, the mon_clock_drift_allowed threshold can be increased. However, this value must stay significantly below the mon_lease interval in order for the monitor cluster to function properly. With a high-quality NTP or PTP configuration, sub-millisecond synchronization is not difficult, so there are very few occasions when it is appropriate to change this value.
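To see the skew that the monitors currently measure against the quorum leader, one option is to query the time-sync report, for example:
ceph time-sync-status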
MON_MSGR2_NOT_ENABLED
The ms_bind_msgr2 option is enabled, but one or more monitors are not configured in the cluster's monmap to bind to a v2 port. This means that features specific to the msgr2 protocol (for example, encryption) are unavailable on some or all connections.
In most cases this can be corrected by running the following command:
ceph mon enable-msgr2
After this command is run, any monitor configured to listen on the old default port (6789) will continue to listen for v1 connections on 6789 and will also begin to listen for v2 connections on the new default port 3300.
If a monitor is configured to listen for v1 connections on a non-standard port (that is, a port other than 6789), the monmap will need to be modified manually.
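To check which addresses and ports each monitor is currently registered with in the monmap, you can dump it, for example:
ceph mon dump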
MON_DISK_LOW
One or more monitors are low on storage space. This health check is raised if the percentage of available space on the file system used by the monitor database (normally /var/lib/ceph/mon) drops below the percentage value mon_data_avail_warn (default: 30%).
This alert may indicate that some other process or user on the system is filling up the file system used by the monitor. It may also indicate that the monitor database is too large (see MON_DISK_BIG below). Another common scenario is that Ceph logging subsystem levels have been raised for troubleshooting purposes without a subsequent return to default levels. Ongoing verbose logging can easily fill up the file system that contains /var/log. If you trim logs that are currently open, remember to restart or instruct your syslog or other daemon to re-open the log file.
If space cannot be freed, the monitor's data directory might need to be moved to another storage device or file system. This relocation must be carried out while the monitor daemon is not running.
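As a quick check of where the space is going, ordinary file system tools can be used on the monitor host; the paths below assume the default data and log locations:
df -h /var/lib/ceph/mon
du -sh /var/log/ceph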
MON_DISK_CRIT
One or more monitors are critically low on storage space. This health check is raised if the percentage of available space on the file system used by the monitor database (normally /var/lib/ceph/mon) drops below the percentage value mon_data_avail_crit (default: 5%). See MON_DISK_LOW, above.
MON_DISK_BIG
The database size of one or more monitors is very large. This health check is raised if the size of the monitor database is larger than mon_data_size_warn (default: 15 GiB).
A large database is unusual, but does not necessarily indicate a problem. Monitor databases may grow in size when there are placement groups that have not reached an active+clean state for a long time, or when extensive cluster recovery, expansion, or topology changes have recently occurred.
This alert may also indicate that the monitor's database is not properly compacting, an issue that has been observed with some older versions of RocksDB. Forcing a compaction with ceph daemon mon.<id> compact may suffice to shrink the database's storage usage.
This alert may also indicate that the monitor has a bug that prevents it from pruning the cluster metadata that it stores. If the problem persists, please report a bug.
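To see how large a given monitor's database currently is on disk, ordinary tools can be used on the monitor host; the path below assumes the default data location and a cluster named ceph:
du -sh /var/lib/ceph/mon/ceph-<id>/store.db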
To adjust the warning threshold, run the following command:
ceph config set global mon_data_size_warn <size>
MON_NETSPLIT
A network partition has occurred among the Ceph Monitors. This health check is raised when one or more Ceph Monitors detect that at least two Ceph Monitors have lost connectivity or reachability to one another, based on their respective connection scores, which are updated frequently. This warning appears only when the cluster has been configured with at least three Ceph Monitors and they use the connectivity election strategy.
Network partitions are reported in two ways:
As a location-level netsplit (for example, "netsplit detected between dc1 and dc2") when all of the monitors in one location cannot communicate with all of the monitors in another location.
As individual monitor netsplits (for example, "netsplit detected between mon.a and mon.d") when specific monitors are disconnected across locations.
The system prioritizes reporting at the highest possible topology level (for example, datacenter, rack) in order to better help operators identify infrastructure-level network issues.
AUTH_INSECURE_GLOBAL_ID_RECLAIM
One or more clients or daemons that are connected to the cluster are not securely reclaiming their global_id (a unique number that identifies each entity in the cluster) when reconnecting to a monitor. Such clients are permitted to connect anyway because the auth_allow_insecure_global_id_reclaim option is set to true (which may be necessary until all Ceph clients have been upgraded) and because the auth_expose_insecure_global_id_reclaim option is set to true (which allows monitors to detect clients with "insecure reclaim" sooner by forcing those clients to reconnect immediately after their initial authentication).
To identify which clients are using unpatched Ceph client code, run the following command:
ceph health detail
If you collect a dump of the clients that are connected to an individual monitor and examine the global_id_status field in the output of the dump, you can see the global_id reclaim behavior of those clients. Here reclaim_insecure means that a client is unpatched and is contributing to this health check:
ceph tell mon.\* sessions
We strongly recommend that all clients in the system be upgraded to a newer version of Ceph that correctly reclaims global_id values. After all clients have been updated, run the following command to stop allowing insecure reconnections:
ceph config set mon auth_allow_insecure_global_id_reclaim false
If it is impractical to upgrade all clients immediately, you can temporarily silence this alert by running the following command:
ceph health mute AUTH_INSECURE_GLOBAL_ID_RECLAIM 1w # 1 week
Although we do NOT recommend doing so, you can also disable this alert indefinitely by running the following command:
ceph config set mon mon_warn_on_insecure_global_id_reclaim false
AUTH_INSECURE_GLOBAL_ID_RECLAIM_ALLOWED
Ceph is currently configured to allow clients that reconnect to monitors using an insecure process to reclaim their previous global_id. Such reclaiming is allowed because, by default, auth_allow_insecure_global_id_reclaim is set to true. It might be necessary to leave this setting enabled while existing Ceph clients are upgraded to newer versions of Ceph that correctly and securely reclaim their global_id.
If the AUTH_INSECURE_GLOBAL_ID_RECLAIM health check has not also been raised, and if the auth_expose_insecure_global_id_reclaim setting has not been disabled (it is enabled by default), then there are currently no clients connected that need to be upgraded. In that case, it is safe to disable insecure global_id reclaim by running the following command:
ceph config set mon auth_allow_insecure_global_id_reclaim false
On the other hand, if there are still clients that need to be upgraded, this alert can be temporarily silenced by running the following command:
ceph health mute AUTH_INSECURE_GLOBAL_ID_RECLAIM_ALLOWED 1w # 1 week
Although we do NOT recommend doing so, you can also disable this alert indefinitely by running the following command:
ceph config set mon mon_warn_on_insecure_global_id_reclaim_allowed false
Manager
MGR_DOWN
All Ceph Manager daemons are currently down. The cluster should normally have at least one running Manager (ceph-mgr) daemon. If no Manager daemon is running, the cluster's ability to monitor itself will be compromised, and parts of the management API will become unavailable (for example, the dashboard will not work, and most CLI commands that report metrics or runtime state will block). However, the cluster will still be able to perform client I/O operations and to recover from failures.
The down Manager daemon(s) should be restarted as soon as possible to ensure that the cluster can be monitored (for example, so that ceph -s information is available and up to date, and so that metrics can be scraped by Prometheus).
MGR_MODULE_DEPENDENCY
An enabled Manager module is failing its dependency check. This health check typically comes with an explanatory message from the module about the problem.
For example, a module might report that a required package is not installed: in this case, you should install the required package and restart your Manager daemons.
This health check is applied only to enabled modules. If a module is not enabled, you can see whether it is reporting dependency issues in the output of ceph mgr module ls.
MGR_MODULE_ERROR
A Manager module has experienced an unexpected error. Typically, this means that an unhandled exception was raised from the module's serve function. The human-readable description of the error may be obscure if the exception did not provide a useful description of itself.
This health check may indicate a bug: please open a Ceph bug report if you think you have encountered a bug.
However, if you believe the error to be transient, you may restart your Manager daemon(s), or use ceph mgr fail on the active daemon in order to force failover to another daemon.
OSDs
OSD_DOWN
One or more OSDs are marked down. The ceph-osd daemon(s) or their host(s) may have crashed or been stopped, or peer OSDs might be unable to reach the OSD over the public or private network. Common causes include a stopped or crashed daemon, a down host, or a network failure.
Verify that the host is healthy, the daemon is started, and the network is functioning. If the daemon has crashed, the daemon log file (/var/log/ceph/ceph-osd.*) may contain troubleshooting information.
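To list only the OSDs that are currently down, together with their place in the CRUSH hierarchy, one option is:
ceph osd tree down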
OSD_<crush type>_DOWN
(for example, OSD_HOST_DOWN, OSD_ROOT_DOWN)
All of the OSDs within a particular CRUSH subtree are marked down (for example, all OSDs on a host).
OSD_ORPHAN
An OSD is referenced in the CRUSH map hierarchy, but does not exist.
To remove the OSD from the CRUSH map hierarchy, run the following command:
ceph osd crush rm osd.<id>
OSD_OUT_OF_ORDER_FULL
The utilization thresholds for nearfull, backfillfull, full, and/or failsafe_full are not ascending. In particular, the following pattern is expected: nearfull < backfillfull, backfillfull < full, and full < failsafe_full. This can result in unexpected cluster behavior.
To adjust these utilization thresholds, run the following commands:
ceph osd set-nearfull-ratio <ratio>
ceph osd set-backfillfull-ratio <ratio>
ceph osd set-full-ratio <ratio>
OSD_FULL
One or more OSDs have exceeded the full threshold and are preventing the cluster from servicing writes.
To check utilization by pool, run the following command:
ceph df
To see the currently defined full ratio, run the following command:
ceph osd dump | grep full_ratio
A short-term workaround to restore write availability is to raise the full threshold by a small amount. To do so, run the following command:
ceph osd set-full-ratio <ratio>
Additional OSDs should be deployed within appropriate CRUSH failure domains in order to increase capacity, and/or existing data should be deleted in order to free up space in the cluster. One subtle situation is that the rados bench tool may have been used to test the performance of one or more pools, and the resulting RADOS objects were never cleaned up. This can be checked by invoking rados ls against each pool and looking for objects with names that begin with bench or another job name. These objects may then be manually, but very carefully, deleted in order to reclaim capacity.
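Per-OSD utilization (rather than per-pool utilization) often identifies the specific device that crossed the threshold; it can be checked with:
ceph osd df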
OSD_BACKFILLFULL
One or more OSDs have exceeded the backfillfull threshold or would exceed it if the currently-mapped backfills were to finish, which will prevent data from rebalancing to this OSD. This alert is an early warning that rebalancing might be unable to complete and that the cluster is approaching full.
To check utilization by pool, run the following command:
ceph df
OSD_NEARFULL
One or more OSDs have exceeded the nearfull threshold. This alert is an early warning that the cluster is approaching full.
To check utilization by pool, run the following command:
ceph df
OSDMAP_FLAGS
One or more cluster flags of interest have been set. These flags include:
full - the cluster is flagged as full and cannot serve writes
pauserd, pausewr - there are paused reads or writes
noup - OSDs are not allowed to start
nodown - OSD failure reports are being ignored, which means that the monitors will not mark OSDs down
noin - OSDs that were previously marked out are not being marked in when they start
noout - down OSDs are not automatically being marked out after the configured interval
nobackfill, norecover, norebalance - recovery or data rebalancing is suspended
noscrub, nodeep_scrub - scrubbing is disabled
notieragent - cache-tiering activity is suspended
With the exception of full, these flags can be set or cleared by running the following commands:
ceph osd set <flag>
ceph osd unset <flag>
OSD_FLAGS
One or more OSDs or CRUSH {nodes, device classes} have a flag of interest set. These flags include:
noup: these OSDs are not allowed to start
nodown: failure reports for these OSDs will be ignored
noin: if these OSDs were previously marked out automatically after a failure, they will not be marked in when they start
noout: if these OSDs are down, they will not automatically be marked out after the configured interval
To set and clear these flags in batch, run the following commands:
ceph osd set-group <flags> <who>
ceph osd unset-group <flags> <who>
For example:
ceph osd set-group noup,noout osd.0 osd.1
ceph osd unset-group noup,noout osd.0 osd.1
ceph osd set-group noup,noout host-foo
ceph osd unset-group noup,noout host-foo
ceph osd set-group noup,noout class-hdd
ceph osd unset-group noup,noout class-hdd
OLD_CRUSH_TUNABLES
The CRUSH map is using very old settings and should be updated. The oldest set of tunables that can be used (that is, the oldest client version that can connect to the cluster) without raising this health check is determined by the mon_crush_min_required_version config option. For more information, see Tunables.
OLD_CRUSH_STRAW_CALC_VERSION
The CRUSH map is using an older, non-optimal method of calculating intermediate weight values for straw buckets.
The CRUSH map should be updated to use the newer method (that is, straw_calc_version=1). For more information, see Tunables.
CACHE_POOL_NO_HIT_SET
One or more cache pools are not configured with a hit set to track utilization. This issue prevents the tiering agent from identifying cold objects to flush and evict from the cache.
To configure hit sets on the cache pool, run the following commands:
ceph osd pool set <poolname> hit_set_type <type>
ceph osd pool set <poolname> hit_set_period <period-in-seconds>
ceph osd pool set <poolname> hit_set_count <number-of-hitsets>
ceph osd pool set <poolname> hit_set_fpp <target-false-positive-rate>
OSD_NO_SORTBITWISE
No pre-Luminous v12.y.z OSDs are running, but the sortbitwise
flag has not
been set.
The sortbitwise
flag must be set in order for OSDs running Luminous v12.y.z
or newer to start. To safely set the flag, run the following command:
ceph osd set sortbitwise
OSD_FILESTORE
Warn if OSDs are running the old Filestore back end. The Filestore OSD back end is deprecated; the BlueStore back end has been the default object store since the Ceph Luminous release.
The ‘mclock_scheduler’ is not supported for Filestore OSDs. For this reason, the default ‘osd_op_queue’ is set to ‘wpq’ for Filestore OSDs and is enforced even if the user attempts to change it. Filestore OSDs can be listed by running the following command:
ceph report | jq -c '."osd_metadata" | .[] | select(.osd_objectstore | contains("filestore")) | {id, osd_objectstore}'
In order to upgrade to Reef or a later release, you must first migrate any Filestore OSDs to BlueStore.
If you are upgrading a pre-Reef release to Reef or later, but it is not feasible to migrate Filestore OSDs to BlueStore immediately, you can temporarily silence this alert by running the following command:
ceph health mute OSD_FILESTORE
Since migration of Filestore OSDs to BlueStore can take a considerable amount of time to complete, we recommend that you begin the process well in advance of any update to Reef or to later releases.
OSD_UNREACHABLE
The registered v1/v2 public address(es) of one or more OSDs are outside the defined public_network subnet, which prevents these unreachable OSDs from communicating properly with Ceph clients.
Even though these unreachable OSDs are in the up state, RADOS clients will hang until the TCP timeout is reached before erroring out, due to this inconsistency.
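To compare the addresses that the OSDs have registered against the configured public_network, one option is to inspect the OSD map and the relevant configuration value, for example:
ceph osd dump | grep "^osd"
ceph config get mon public_network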
POOL_FULL
One or more pools have reached quota and no longer allow writes.
To see pool quotas and utilization, run the following command:
ceph df detail
If you opt to raise the pool quota, run the following commands:
ceph osd pool set-quota <poolname> max_objects <num-objects>
ceph osd pool set-quota <poolname> max_bytes <num-bytes>
If not, delete some existing data to reduce utilization.
BLUEFS_SPILLOVER
One or more OSDs that use the BlueStore back end have been allocated db partitions (that is, storage space for metadata, normally on a faster device), but because that space has been filled, metadata has “spilled over” onto the slow device. This is not necessarily an error condition or even unexpected behavior, but may result in degraded performance. If the administrator had expected that all metadata would fit on the faster device, this alert indicates that not enough space was provided.
To disable this alert on all OSDs, run the following command:
ceph config set osd bluestore_warn_on_bluefs_spillover false
Alternatively, to disable the alert on a specific OSD, run the following command:
ceph config set osd.123 bluestore_warn_on_bluefs_spillover false
To secure more metadata space, you can destroy and reprovision the OSD in question. This process involves data migration and recovery.
It might also be possible to expand the LVM logical volume that backs the db storage. If the underlying LV has been expanded, you must stop the OSD daemon and inform BlueFS of the device-size change by running the following command:
ceph-bluestore-tool bluefs-bdev-expand --path /var/lib/ceph/osd/ceph-$ID
BLUEFS_AVAILABLE_SPACE
To see how much space is free for BlueFS, run the following command:
ceph daemon osd.123 bluestore bluefs available
This will output up to three values: BDEV_DB free
, BDEV_SLOW free
, and
available_from_bluestore
. BDEV_DB
and BDEV_SLOW
report the amount
of space that has been acquired by BlueFS and is now considered free. The value
available_from_bluestore
indicates the ability of BlueStore to relinquish
more space to BlueFS. It is normal for this value to differ from the amount of
BlueStore free space, because the BlueFS allocation unit is typically larger
than the BlueStore allocation unit. This means that only part of the BlueStore
free space will be available for BlueFS.
BLUEFS_LOW_SPACE
If BlueFS is running low on available free space and there is not much free space available from BlueStore (in other words, available_from_bluestore has a low value), consider reducing the BlueFS allocation unit size. To simulate available space when the allocation unit is different, run the following command:
ceph daemon osd.123 bluestore bluefs available <alloc-unit-size>
BLUESTORE_FRAGMENTATION
BLUESTORE_FRAGMENTATION
indicates that the free space that underlies
BlueStore has become fragmented. This is normal and unavoidable, but excessive
fragmentation causes slowdown. To inspect BlueStore fragmentation, run the
following command:
ceph daemon osd.123 bluestore allocator score block
The fragmentation score is given in a [0-1] range:
[0.0 .. 0.4] tiny fragmentation
[0.4 .. 0.7] small, acceptable fragmentation
[0.7 .. 0.9] considerable, but safe fragmentation
[0.9 .. 1.0] severe fragmentation, might impact BlueFS’s ability to get space from BlueStore
To see a detailed report of free fragments, run the following command:
ceph daemon osd.123 bluestore allocator dump block
For OSD processes that are not currently running, fragmentation can be inspected with ceph-bluestore-tool. To see the fragmentation score, run the following command:
ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-123 --allocator block free-score
To dump detailed free chunks, run the following command:
ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-123 --allocator block free-dump
BLUESTORE_LEGACY_STATFS
One or more OSDs have BlueStore volumes that were created prior to the Nautilus release. (In Nautilus, BlueStore tracks its internal usage statistics on a granular, per-pool basis.)
If all OSDs are older than Nautilus, this means that the per-pool metrics are
simply unavailable. But if there is a mixture of pre-Nautilus and post-Nautilus
OSDs, the cluster usage statistics reported by ceph df
will be inaccurate.
The old OSDs can be updated to use the new usage-tracking scheme by stopping
each OSD, running a repair operation, and then restarting the OSD. For example,
to update osd.123
, run the following commands:
systemctl stop ceph-osd@123
ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-123
systemctl start ceph-osd@123
To disable this alert, run the following command:
ceph config set global bluestore_warn_on_legacy_statfs false
BLUESTORE_NO_PER_POOL_OMAP
One or more OSDs have volumes that were created prior to the Octopus release. (In Octopus and later releases, BlueStore tracks omap space utilization by pool.)
If there are any BlueStore OSDs that do not have the new tracking enabled, the cluster will report an approximate value for per-pool omap usage based on the most recent deep scrub.
The OSDs can be updated to track by pool by stopping each OSD, running a repair
operation, and then restarting the OSD. For example, to update osd.123
, run
the following commands:
systemctl stop ceph-osd@123
ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-123
systemctl start ceph-osd@123
To disable this alert, run the following command:
ceph config set global bluestore_warn_on_no_per_pool_omap false
BLUESTORE_NO_PER_PG_OMAP
One or more OSDs have volumes that were created prior to Pacific. (In Pacific and later releases, BlueStore tracks omap space utilization by Placement Group (PG).)
Per-PG omap allows faster PG removal when PGs migrate.
The older OSDs can be updated to track by PG by stopping each OSD, running a
repair operation, and then restarting the OSD. For example, to update
osd.123
, run the following commands:
systemctl stop ceph-osd@123
ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-123
systemctl start ceph-osd@123
To disable this alert, run the following command:
ceph config set global bluestore_warn_on_no_per_pg_omap false
BLUESTORE_DISK_SIZE_MISMATCH
One or more BlueStore OSDs have an internal inconsistency between the size of the physical device and the metadata that tracks its size. This inconsistency can lead to the OSD(s) crashing in the future.
The OSDs that have this inconsistency should be destroyed and reprovisioned. Be
very careful to execute this procedure on only one OSD at a time, so as to
minimize the risk of losing any data. To execute this procedure, where $N
is the OSD that has the inconsistency, run the following commands:
ceph osd out osd.$N
while ! ceph osd safe-to-destroy osd.$N ; do sleep 1m ; done
ceph osd destroy osd.$N
ceph-volume lvm zap /path/to/device
ceph-volume lvm create --osd-id $N --data /path/to/device
Note
Wait for this recovery procedure to complete on one OSD before running it on the next.
BLUESTORE_NO_COMPRESSION
One or more OSDs are unable to load a BlueStore compression plugin. This issue
might be caused by a broken installation, in which the ceph-osd
binary does
not match the compression plugins. Or it might be caused by a recent upgrade in
which the ceph-osd
daemon was not restarted.
To resolve this issue, verify that all of the packages on the host that is running the affected OSD(s) are correctly installed and that the OSD daemon(s) have been restarted. If the problem persists, check the OSD log for information about the source of the problem.
BLUESTORE_SPURIOUS_READ_ERRORS
One or more BlueStore OSDs have detected read errors on the main device. BlueStore has recovered from these errors by retrying disk reads. This alert might indicate issues with underlying hardware, issues with the I/O subsystem, or something similar. Such issues can cause permanent data corruption. Some observations on the root cause of spurious read errors can be found here: https://tracker.ceph.com/issues/22464
This alert does not require an immediate response, but the affected host might need additional attention: for example, upgrading the host to the latest OS/kernel versions and implementing hardware-resource-utilization monitoring.
To disable this alert on all OSDs, run the following command:
ceph config set osd bluestore_warn_on_spurious_read_errors false
Or, to disable this alert on a specific OSD, run the following command:
ceph config set osd.123 bluestore_warn_on_spurious_read_errors false
BLOCK_DEVICE_STALLED_READ_ALERT
There are BlueStore log messages that reveal storage drive issues that can cause performance degradation and potentially data unavailability or loss. These may indicate a storage drive that is failing and should be evaluated and possibly removed and replaced.
read stalled read 0x29f40370000~100000 (buffered) since 63410177.290546s, timeout is 5.000000s
However, this is difficult to spot because there is no discernible warning (a
health warning or info in ceph health detail
for example). More observations
can be found here: https://tracker.ceph.com/issues/62500
Also because there can be false positive stalled read
instances, a mechanism
has been added to increase accuracy. If in the last bdev_stalled_read_warn_lifetime
seconds the number of stalled read
events is found to be greater than or equal to
bdev_stalled_read_warn_threshold
for a given BlueStore block device, this
warning will be reported in ceph health detail
. The warning state will be
removed when the condition clears.
The defaults for bdev_stalled_read_warn_lifetime
and bdev_stalled_read_warn_threshold
may be overridden globally or for
specific OSDs.
To change this, run the following command:
ceph config set global bdev_stalled_read_warn_lifetime 10
ceph config set global bdev_stalled_read_warn_threshold 5
This may be done for specific OSDs or a given mask. For example, to apply only to SSD OSDs:
ceph config set osd.123 bdev_stalled_read_warn_lifetime 10
ceph config set osd.123 bdev_stalled_read_warn_threshold 5
ceph config set class:ssd bdev_stalled_read_warn_lifetime 10
ceph config set class:ssd bdev_stalled_read_warn_threshold 5
WAL_DEVICE_STALLED_READ_ALERT
The warning state WAL_DEVICE_STALLED_READ_ALERT
is raised to indicate
stalled read
instances on a given BlueStore OSD’s WAL_DEVICE
. This
warning can be configured via the bdev_stalled_read_warn_lifetime
and bdev_stalled_read_warn_threshold
options with commands similar
to those described in the BLOCK_DEVICE_STALLED_READ_ALERT
warning section.
DB_DEVICE_STALLED_READ_ALERT
The warning state DB_DEVICE_STALLED_READ_ALERT
is raised to indicate
stalled read
instances on a given BlueStore OSD’s DB_DEVICE
. This
warning can be configured via the bdev_stalled_read_warn_lifetime
and bdev_stalled_read_warn_threshold
options with commands similar
to those described in the BLOCK_DEVICE_STALLED_READ_ALERT
warning section.
BLUESTORE_SLOW_OP_ALERT
There are BlueStore log messages that reveal storage drive issues that can lead to performance degradation and data unavailability or loss. These indicate that the storage drive may be failing and should be investigated and potentially replaced.
log_latency_fn slow operation observed for _txc_committed_kv, latency = 12.028621219s, txc = 0x55a107c30f00
log_latency_fn slow operation observed for upper_bound, latency = 6.25955s
log_latency slow operation observed for submit_transaction..
As there can be false positive slow ops
instances, a mechanism has
been added for more reliability. If in the last bluestore_slow_ops_warn_lifetime
seconds the number of slow ops
indications are found greater than or equal to
bluestore_slow_ops_warn_threshold
for a given BlueStore OSD, this
warning will be reported in ceph health detail
. The warning state is
cleared when the condition clears.
The defaults for bluestore_slow_ops_warn_lifetime
and bluestore_slow_ops_warn_threshold may be overridden globally or for
specific OSDs.
To change this, run the following command:
ceph config set global bluestore_slow_ops_warn_lifetime 10
ceph config set global bluestore_slow_ops_warn_threshold 5
This may be done for specific OSDs or a given mask. For example:
ceph config set osd.123 bluestore_slow_ops_warn_lifetime 10
ceph config set osd.123 bluestore_slow_ops_warn_threshold 5
ceph config set class:ssd bluestore_slow_ops_warn_lifetime 10
ceph config set class:ssd bluestore_slow_ops_warn_threshold 5
Device health
DEVICE_HEALTH
One or more OSD devices are expected to fail soon, where the warning threshold
is determined by the mgr/devicehealth/warn_threshold
config option.
Because this alert applies only to OSDs that are currently marked in
, the
appropriate response to this expected failure is (1) to mark the OSD out
so
that data is migrated off of the OSD, and then (2) to remove the hardware from
the system. Note that this marking out
is normally done automatically if
mgr/devicehealth/self_heal
is enabled (as determined by
mgr/devicehealth/mark_out_threshold
). If an OSD device is compromised but
the OSD(s) on that device are still up
, recovery can be degraded. In such
cases it may be advantageous to forcibly stop the OSD daemon(s) in question so
that recovery can proceed from surviving healthy OSDs. This must be
done with extreme care and attention to failure domains so that data availability
is not compromised.
To check device health, run the following command:
ceph device info <device-id>
Device life expectancy is set either by a prediction model that the Ceph Manager runs or by an external tool that runs a command of the following form:
ceph device set-life-expectancy <device-id> <from> <to>
You can change the stored life expectancy manually, but such a change usually doesn’t accomplish anything. The reason for this is that whichever tool originally set the stored life expectancy will probably undo your change by setting it again, and a change to the stored value does not affect the actual health of the hardware device.
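To list all tracked devices, along with the daemons that use them and any recorded life expectancy, you can run:
ceph device ls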
DEVICE_HEALTH_IN_USE
One or more devices (that is, OSDs) are expected to fail soon and have been
marked out
of the cluster (as controlled by
mgr/devicehealth/mark_out_threshold
), but they are still participating in
one or more Placement Groups. This might be because the OSD(s) were marked
out
only recently and data is still migrating, or because data cannot be
migrated off of the OSD(s) for some reason (for example, the cluster is nearly
full, or the CRUSH hierarchy is structured so that there isn’t another suitable
OSD to migrate the data to).
This message can be silenced by disabling self-heal behavior (that is, setting
mgr/devicehealth/self_heal
to false
), by adjusting
mgr/devicehealth/mark_out_threshold
, or by addressing whichever condition
is preventing data from being migrated off of the ailing OSD(s).
DEVICE_HEALTH_TOOMANY
Too many devices (that is, OSDs) are expected to fail soon, and because
mgr/devicehealth/self_heal
behavior is enabled, marking out
all of the
ailing OSDs would exceed the cluster’s mon_osd_min_in_ratio
ratio. This
ratio prevents a cascade of too many OSDs from being automatically marked
out
.
You should promptly add new OSDs to the cluster to prevent data loss, or incrementally replace the failing OSDs.
Alternatively, you can silence this health check by adjusting options including
mon_osd_min_in_ratio
or mgr/devicehealth/mark_out_threshold
. Be
warned, however, that this will increase the likelihood of unrecoverable data
loss.
Data health (pools & placement groups)
PG_AVAILABILITY
Data availability is reduced. In other words, the cluster is unable to service potential read or write requests for at least some data in the cluster. More precisely, one or more Placement Groups (PGs) are in a state that does not allow I/O requests to be serviced. Any of the following PG states are problematic if they do not clear quickly: peering, stale, incomplete, and the lack of active.
For detailed information about which PGs are affected, run the following command:
ceph health detail
In most cases, the root cause of this issue is that one or more OSDs are
currently down
: see OSD_DOWN
above.
To see the state of a specific problematic PG, run the following command:
ceph tell <pgid> query
PG_DEGRADED
Data redundancy is reduced for some data: in other words, the cluster does not have the desired number of replicas for all data (in the case of replicated pools) or erasure code fragments (in the case of erasure-coded pools). More precisely, one or more Placement Groups (PGs):
have the degraded or undersized flag set, which means that there are not enough instances of that PG in the cluster; or
have not had the clean state set for a long time.
For detailed information about which PGs are affected, run the following command:
ceph health detail
In most cases, the root cause of this issue is that one or more OSDs are
currently “down”: see OSD_DOWN
above.
To see the state of a specific problematic PG, run the following command:
ceph tell <pgid> query
PG_RECOVERY_FULL
Data redundancy might be reduced or even put at risk for some data due to a
lack of free space in the cluster. More precisely, one or more Placement Groups
have the recovery_toofull flag set, which means that the cluster is unable to
migrate or recover data because one or more OSDs are above the full
threshold.
For steps to resolve this condition, see OSD_FULL above.
PG_BACKFILL_FULL
Data redundancy might be reduced or even put at risk for some data due to a
lack of free space in the cluster. More precisely, one or more Placement Groups
have the backfill_toofull flag set, which means that the cluster is unable to
migrate or recover data because one or more OSDs are above the backfillfull
threshold.
For steps to resolve this condition, see OSD_BACKFILLFULL above.
PG_DAMAGED
Data scrubbing has discovered problems with data consistency in the cluster.
More precisely, one or more Placement Groups either (1) have the inconsistent or snaptrim_error
flag set, which indicates that an earlier data scrub
operation found a problem, or (2) have the repair flag set, which means that
a repair for such an inconsistency is currently in progress.
For more information, see Troubleshooting PGs.
OSD_SCRUB_ERRORS
Recent OSD scrubs have discovered inconsistencies. This alert is generally paired with PG_DAMAGED (see above).
For more information, see Troubleshooting PGs.
OSD_TOO_MANY_REPAIRS
The count of read repairs has exceeded the config value threshold
mon_osd_warn_num_repaired
(default: 10
). Because scrub handles errors
only for data at rest, and because any read error that occurs when another
replica is available is repaired immediately so that the client can get
the object data, there might exist failing disks that are not registering any
scrub errors. This repair count is maintained as a way of identifying any such
failing disks.
In order to allow clearing of the warning, a new command
ceph tell osd.# clear_shards_repaired [count]
has been added.
By default it will set the repair count to 0. A count value can be passed
to the command. Thus, the administrator has the option to re-enable the warning
by passing the value of mon_osd_warn_num_repaired
(or above) to the command.
An alternative to using clear_shards_repaired is to mute the
OSD_TOO_MANY_REPAIRS alert with ceph health mute.
LARGE_OMAP_OBJECTS
One or more pools contain large omap objects, as determined by
osd_deep_scrub_large_omap_object_key_threshold
(the threshold for the
number of keys to determine what is considered a large omap object) or
osd_deep_scrub_large_omap_object_value_sum_threshold
(the threshold for the
summed size in bytes of all key values to determine what is considered a large
omap object) or both. To find more information on object name, key count, and
size in bytes, search the cluster log for ‘Large omap object found’. This issue
can be caused by RGW-bucket index objects that do not have automatic resharding
enabled. For more information on resharding, see RGW Dynamic Bucket Index
Resharding.
To adjust the thresholds mentioned above, run the following commands:
ceph config set osd osd_deep_scrub_large_omap_object_key_threshold <keys>
ceph config set osd osd_deep_scrub_large_omap_object_value_sum_threshold <bytes>
CACHE_POOL_NEAR_FULL
A cache-tier pool is nearly full, as determined by the target_max_bytes
and target_max_objects
properties of the cache pool. When the pool reaches the
target threshold, write requests to the pool might block while data is flushed
and evicted from the cache. This state normally leads to very high latencies
and poor performance.
To adjust the cache pool’s target size, run the following commands:
ceph osd pool set <cache-pool-name> target_max_bytes <bytes>
ceph osd pool set <cache-pool-name> target_max_objects <objects>
There might be other reasons that normal cache flush and evict activity are throttled: for example, reduced availability of the base tier, reduced performance of the base tier, or overall cluster load.
TOO_FEW_PGS
The number of Placement Groups (PGs) that are in use in the cluster is below
the configurable threshold of mon_pg_warn_min_per_osd
PGs per OSD. This can
lead to suboptimal distribution and suboptimal balance of data across the OSDs
in the cluster, and a reduction of overall performance.
If data pools have not yet been created, this condition is expected.
To address this issue, you can increase the PG count for existing pools or create new pools. For more information, see Choosing the Number of Placement Groups.
POOL_PG_NUM_NOT_POWER_OF_TWO
One or more pools have a pg_num
value that is not a power of two. Although
this is not strictly incorrect, it does lead to a less balanced distribution of
data because some Placement Groups will have roughly twice as much data as
others have.
This is easily corrected by setting the pg_num
value for the affected
pool(s) to a nearby power of two. To do so, run the following command:
ceph osd pool set <pool-name> pg_num <value>
To disable this health check, run the following command:
ceph config set global mon_warn_on_pool_pg_num_not_power_of_two false
POOL_TOO_FEW_PGS
One or more pools should probably have more Placement Groups (PGs), given the
amount of data that is currently stored in the pool. This issue can lead to
suboptimal distribution and suboptimal balance of data across the OSDs in the
cluster, and a reduction of overall performance. This alert is raised only if
the pg_autoscale_mode
property on the pool is set to warn
.
To disable the alert, entirely disable auto-scaling of PGs for the pool by running the following command:
ceph osd pool set <pool-name> pg_autoscale_mode off
To allow the cluster to automatically adjust the number of PGs for the pool, run the following command:
ceph osd pool set <pool-name> pg_autoscale_mode on
Alternatively, to manually set the number of PGs for the pool to the recommended amount, run the following command:
ceph osd pool set <pool-name> pg_num <new-pg-num>
For more information, see Choosing the Number of Placement Groups and Autoscaling placement groups.
TOO_MANY_PGS
The number of Placement Groups (PGs) in use in the cluster is above the
configurable threshold of mon_max_pg_per_osd
PGs per OSD. If this threshold
is exceeded, the cluster will not allow new pools to be created, pool pg_num
to be increased, or pool replication to be increased (any of which, if allowed,
would lead to more PGs in the cluster). A large number of PGs can lead to
higher memory utilization for OSD daemons, slower peering after cluster state
changes (for example, OSD restarts, additions, or removals), and higher load on
the Manager and Monitor daemons.
The simplest way to mitigate the problem is to increase the number of OSDs in
the cluster by adding more hardware. Note that, because the OSD count that is
used for the purposes of this health check is the number of in
OSDs,
marking out
OSDs in
(if there are any out
OSDs available) can also
help. To do so, run the following command:
ceph osd in <osd id(s)>
For more information, see Choosing the Number of Placement Groups.
POOL_TOO_MANY_PGS
One or more pools should probably have fewer Placement Groups (PGs), given the
amount of data that is currently stored in the pool. This issue can lead to
higher memory utilization for OSD daemons, slower peering after cluster state
changes (for example, OSD restarts, additions, or removals), and higher load on
the Manager and Monitor daemons. This alert is raised only if the
pg_autoscale_mode
property on the pool is set to warn
.
To disable the alert, entirely disable auto-scaling of PGs for the pool by running the following command:
ceph osd pool set <pool-name> pg_autoscale_mode off
To allow the cluster to automatically adjust the number of PGs for the pool, run the following command:
ceph osd pool set <pool-name> pg_autoscale_mode on
Alternatively, to manually set the number of PGs for the pool to the recommended amount, run the following command:
ceph osd pool set <pool-name> pg_num <new-pg-num>
For more information, see Choosing the Number of Placement Groups and Autoscaling placement groups.
POOL_TARGET_SIZE_BYTES_OVERCOMMITTED
One or more pools have a target_size_bytes
property that is set in
order to estimate the expected size of the pool, but the value or values of
this property are greater than the total available storage (either by
themselves or in combination with other pools).
This alert is usually an indication that the target_size_bytes
value for
the pool is too large and should be reduced or set to zero. To reduce the
target_size_bytes
value or set it to zero, run the following command:
ceph osd pool set <pool-name> target_size_bytes 0
The above command sets the value of target_size_bytes
to zero. To set the
value of target_size_bytes
to a non-zero value, replace the 0
with that
non-zero value.
For more information, see Specifying expected pool size.
POOL_HAS_TARGET_SIZE_BYTES_AND_RATIO
One or more pools have both target_size_bytes
and target_size_ratio
set
in order to estimate the expected size of the pool. Only one of these
properties should be non-zero. If both are set to a non-zero value, then
target_size_ratio
takes precedence and target_size_bytes
is ignored.
To reset target_size_bytes
to zero, run the following command:
ceph osd pool set <pool-name> target_size_bytes 0
For more information, see Specifying expected pool size.
TOO_FEW_OSDS
The number of OSDs in the cluster is below the configurable threshold of
osd_pool_default_size
. This means that some or all data may not be able to
satisfy the data protection policy specified in CRUSH rules and pool settings.
SMALLER_PGP_NUM
One or more pools have a pgp_num
value less than pg_num
. This alert is
normally an indication that the Placement Group (PG) count was increased
without any increase in the placement behavior.
This disparity is sometimes brought about deliberately, in order to separate
out the split step when the PG count is adjusted from the data migration that
is needed when pgp_num
is changed.
This issue is normally resolved by setting pgp_num
to match pg_num
, so
as to trigger the data migration, by running the following command:
ceph osd pool set <pool> pgp_num <pg-num-value>
MANY_OBJECTS_PER_PG
One or more pools have an average number of objects per Placement Group (PG)
that is significantly higher than the overall cluster average. The specific
threshold is determined by the mon_pg_warn_max_object_skew
configuration
value.
This alert is usually an indication that the pool(s) that contain most of the data in the cluster have too few PGs, or that other pools that contain less data have too many PGs. See TOO_MANY_PGS above.
To silence the health check, raise the threshold by adjusting the
mon_pg_warn_max_object_skew
config option on the managers.
The health check is silenced for a specific pool only if
pg_autoscale_mode
is set to on.
POOL_APP_NOT_ENABLED
A pool exists but the pool has not been tagged for use by a particular application.
To resolve this issue, tag the pool for use by an application. For example, if the pool is used by RBD, run the following command:
rbd pool init <poolname>
Alternatively, if the pool is being used by a custom application (here ‘foo’), you can label the pool by running the following low-level command:
ceph osd pool application enable <poolname> foo
For more information, see Associating a Pool with an Application.
POOL_FULL
One or more pools have reached (or are very close to reaching) their quota. The
threshold to raise this health check is determined by the
mon_pool_quota_crit_threshold
configuration option.
Pool quotas can be adjusted up or down (or removed) by running the following commands:
ceph osd pool set-quota <pool> max_bytes <bytes>
ceph osd pool set-quota <pool> max_objects <objects>
To disable a quota, set the quota value to 0.
POOL_NEAR_FULL
One or more pools are approaching a configured fullness threshold.
One of the several thresholds that can raise this health check is determined by
the mon_pool_quota_warn_threshold
configuration option.
Pool quotas can be adjusted up or down (or removed) by running the following commands:
ceph osd pool set-quota <pool> max_bytes <bytes>
ceph osd pool set-quota <pool> max_objects <objects>
To disable a quota, set the quota value to 0.
Other thresholds that can raise the two health checks above are
mon_osd_nearfull_ratio
and mon_osd_full_ratio
. For details and
resolution, see Storage Capacity and No Free Drive Space.
OBJECT_MISPLACED
One or more objects in the cluster are not stored on the node that CRUSH prefers that they be stored on. This alert is an indication that data migration due to a recent cluster change has not yet completed.
Misplaced data is not a dangerous condition in and of itself; data consistency is never at risk, and old copies of objects will not be removed until the desired number of new copies (in the desired locations) has been created.
OBJECT_UNFOUND
One or more objects in the cluster cannot be found. More precisely, the OSDs know that a new or updated copy of an object should exist, but no such copy has been found on OSDs that are currently online.
Read or write requests to unfound objects will block.
Ideally, a “down” OSD that has a more recent copy of the unfound object can be brought back online. To identify candidate OSDs, check the peering state of the PG(s) responsible for the unfound object. To see the peering state, run the following command:
ceph tell <pgid> query
On the other hand, if the latest copy of the object is not available, the cluster can be told to roll back to a previous version of the object. For more information, see Unfound Objects.
SLOW_OPS
One or more OSD requests or monitor requests are taking a long time to process. This alert might be an indication of extreme load, a slow storage device, or a software bug.
To query the request queue for the daemon that is causing the slowdown, run the following command from the daemon’s host:
ceph daemon osd.<id> ops
To see a summary of the slowest recent requests, run the following command:
ceph daemon osd.<id> dump_historic_ops
To see the location of a specific OSD, run the following command:
ceph osd find osd.<id>
PG_NOT_SCRUBBED
One or more Placement Groups (PGs) have not been scrubbed recently. PGs are
normally scrubbed within an interval determined by
osd_scrub_max_interval
globally. This interval can be overridden on
per-pool basis by changing the value of the variable
scrub_max_interval
. This health check is raised if a certain
percentage (determined by mon_warn_pg_not_scrubbed_ratio
) of the interval
has elapsed after the time the scrub was scheduled and no scrub has been
performed.
PGs are scrubbed only if they are flagged as clean
(which means that
they are to be cleaned, and not that they have been examined and found to be
clean). Misplaced or degraded PGs will not be flagged as clean
clean (see PG_AVAILABILITY and PG_DEGRADED above).
To manually initiate a scrub of a clean PG, run the following command:
ceph pg scrub <pgid>
PG_NOT_DEEP_SCRUBBED
One or more Placement Groups (PGs) have not been deep scrubbed recently. PGs
are normally scrubbed every osd_deep_scrub_interval
seconds at most.
This health check is raised if a certain percentage (determined by
mon_warn_pg_not_deep_scrubbed_ratio
) of the interval has elapsed
after the time the scrub was scheduled and no scrub has been performed.
PGs will receive a deep scrub only if they are flagged as clean
(which
means that they are to be cleaned, and not that they have been examined and
found to be clean). Misplaced or degraded PGs might not be flagged as clean
(see PG_AVAILABILITY and PG_DEGRADED above).
This document offers two methods of setting the value of
osd_deep_scrub_interval
. The first method listed here changes the value of osd_deep_scrub_interval
globally. The second method listed
here changes the value of osd_deep_scrub_interval for OSDs and for
for OSDs and for
the Manager daemon.
First Method
To manually initiate a deep scrub of a clean PG, run the following command:
ceph pg deep-scrub <pgid>
Under certain conditions, the warning PGs not deep-scrubbed in time
appears. This might be because the cluster contains many large PGs, which take
longer to deep-scrub. To remedy this situation, you must change the value of
osd_deep_scrub_interval
globally.
Confirm that ceph health detail returns a pgs not deep-scrubbed in time warning:
# ceph health detail
HEALTH_WARN 1161 pgs not deep-scrubbed in time
[WRN] PG_NOT_DEEP_SCRUBBED: 1161 pgs not deep-scrubbed in time
    pg 86.fff not deep-scrubbed since 2024-08-21T02:35:25.733187+0000
Change osd_deep_scrub_interval globally:
ceph config set global osd_deep_scrub_interval 1209600
The above procedure was developed by Eugen Block in September of 2024.
See Eugen Block’s blog post for much more detail.
Second Method
To manually initiate a deep scrub of a clean PG, run the following command:
ceph pg deep-scrub <pgid>
Under certain conditions, the warning PGs not deep-scrubbed in time
appears. This might be because the cluster contains many large PGs, which take
longer to deep-scrub. To remedy this situation, you must change the value of
osd_deep_scrub_interval
for OSDs and for the Manager daemon.
Confirm that ceph health detail returns a pgs not deep-scrubbed in time warning:
# ceph health detail
HEALTH_WARN 1161 pgs not deep-scrubbed in time
[WRN] PG_NOT_DEEP_SCRUBBED: 1161 pgs not deep-scrubbed in time
    pg 86.fff not deep-scrubbed since 2024-08-21T02:35:25.733187+0000
Change the osd_deep_scrub_interval for OSDs:
ceph config set osd osd_deep_scrub_interval 1209600
Change the osd_deep_scrub_interval for Managers:
ceph config set mgr osd_deep_scrub_interval 1209600
The above procedure was developed by Eugen Block in September of 2024.
See Eugen Block’s blog post for much more detail.
PG_SLOW_SNAP_TRIMMING
The snapshot trim queue for one or more PGs has exceeded the configured warning threshold. This alert indicates either that an extremely large number of snapshots was recently deleted, or that OSDs are unable to trim snapshots quickly enough to keep up with the rate of new snapshot deletions.
The warning threshold is determined by the mon_osd_snap_trim_queue_warn_on
option (default: 32768).
This alert might be raised if OSDs are under excessive load and unable to keep up with their background work, or if the OSDs’ internal metadata database is heavily fragmented and unable to perform. The alert might also indicate some other performance issue with the OSDs.
The exact size of the snapshot trim queue is reported by the snaptrimq_len
field of ceph pg ls -f json-detail
.
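As a sketch (the JSON field names are assumed from the output of ceph pg ls and may vary between releases), the PGs with the largest snapshot trim queues can be listed with something like:
ceph pg ls -f json | jq -r '.pg_stats[] | select(.snaptrimq_len > 0) | "\(.pgid) \(.snaptrimq_len)"' | sort -k2 -rn | head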
Stretch Mode
INCORRECT_NUM_BUCKETS_STRETCH_MODE
Stretch mode currently supports only 2 dividing buckets with OSDs; this warning indicates that the number of dividing buckets is not equal to 2 after stretch mode has been enabled. You can expect unpredictable failures and MON assertions until the condition is fixed.
We encourage you to fix this by removing the additional dividing buckets or by bumping the number of dividing buckets to 2.
UNEVEN_WEIGHTS_STRETCH_MODE
The 2 dividing buckets must have equal weights when stretch mode is enabled. This warning suggests that the 2 dividing buckets have uneven weights after stretch mode is enabled. This is not immediately fatal, however, you can expect Ceph to be confused when trying to process transitions between dividing buckets.
We encourage you to fix this by making the weights even on both dividing buckets. This can be done by making sure the combined weight of the OSDs on each dividing bucket are the same.
NONEXISTENT_MON_CRUSH_LOC_STRETCH_MODE
The CRUSH location specified for the monitor must belong to one of the dividing
buckets when stretch mode is enabled. With the tiebreaker
monitor being the
only exception.
This warning suggests that one or more monitors have a CRUSH location that does not belong to any of the dividing buckets in stretch mode.
We encourage you to fix this by making sure the CRUSH location of the monitor belongs to one of the dividing buckets.
NVMeoF Gateway
NVMEOF_SINGLE_GATEWAY
One of the gateway groups has only one gateway. This is not ideal because it makes high availability (HA) impossible with a single gateway in a group. This can lead to problems with failover and failback operations for the NVMeoF gateway.
It’s recommended to have multiple NVMeoF gateways in a group.
NVMEOF_GATEWAY_DOWN
Some of the gateways are in the GW_UNAVAILABLE state. If a NVMeoF daemon has
crashed, the daemon log file (found at /var/log/ceph/
) may contain
troubleshooting information.
NVMEOF_GATEWAY_DELETING
Some of the gateways are in the GW_DELETING state. They will stay in this state until all the namespaces under the gateway’s load balancing group are moved to another load balancing group ID. This is done automatically by the load balancing process. If this alert persists for a long time, there might be an issue with that process.
Miscellaneous
RECENT_CRASH
One or more Ceph daemons have crashed recently, and the crash(es) have not yet been acknowledged and archived by the administrator. This alert might indicate a software bug, a hardware problem (for example, a failing disk), or some other problem.
To list recent crashes, run the following command:
ceph crash ls-new
To examine information about a specific crash, run the following command:
ceph crash info <crash-id>
To silence this alert, you can archive the crash (perhaps after the crash has been examined by an administrator) by running the following command:
ceph crash archive <crash-id>
Similarly, to archive all recent crashes, run the following command:
ceph crash archive-all
Archived crashes will still be visible by running the command ceph crash
ls
, but not by running the command ceph crash ls-new
.
The time period that is considered recent is determined by the option
mgr/crash/warn_recent_interval
(default: two weeks).
To entirely disable this alert, run the following command:
ceph config set mgr/crash/warn_recent_interval 0
RECENT_MGR_MODULE_CRASH
One or more ceph-mgr
modules have crashed recently, and the crash(es) have
not yet been acknowledged and archived by the administrator. This alert
usually indicates a software bug in one of the software modules that are
running inside the ceph-mgr
daemon. The module that experienced the problem
might be disabled as a result, but other modules are unaffected and continue to
function as expected.
As with the RECENT_CRASH health check, a specific crash can be inspected by running the following command:
ceph crash info <crash-id>
To silence this alert, you can archive the crash (perhaps after the crash has been examined by an administrator) by running the following command:
ceph crash archive <crash-id>
Similarly, to archive all recent crashes, run the following command:
ceph crash archive-all
Archived crashes will still be visible by running the command ceph crash ls
but not by running the command ceph crash ls-new
.
The time period that is considered recent is determined by the option
mgr/crash/warn_recent_interval
(default: two weeks).
To entirely disable this alert, run the following command:
ceph config set mgr/crash/warn_recent_interval 0
TELEMETRY_CHANGED
Telemetry has been enabled, but because the contents of the telemetry report have changed in the meantime, telemetry reports will not be sent.
Ceph developers occasionally revise the telemetry feature to include new and useful information, or to remove information found to be useless or sensitive. If any new information is included in the report, Ceph requires the administrator to re-enable telemetry. This requirement ensures that the administrator has an opportunity to (re)review the information that will be shared.
To review the contents of the telemetry report, run the following command:
ceph telemetry show
Note that the telemetry report consists of several channels that may be independently enabled or disabled. For more information, see the Telemetry Module.
To re-enable telemetry (and silence the alert), run the following command:
ceph telemetry on
To disable telemetry (and silence the alert), run the following command:
ceph telemetry off
AUTH_BAD_CAPS
One or more auth users have capabilities that cannot be parsed by the monitors. As a general rule, this alert indicates that there are one or more daemon types that the user is not authorized to use to perform any action.
This alert is most likely to be raised after an upgrade if (1) the capabilities were set with an older version of Ceph that did not properly validate the syntax of those capabilities, or if (2) the syntax of the capabilities has changed.
To remove the user(s) in question, run the following command:
ceph auth rm <entity-name>
(This resolves the health check, but it prevents clients from being able to authenticate as the removed user.)
Alternatively, to update the capabilities for the user(s), run the following command:
ceph auth caps <entity-name> <daemon-type> <caps> [<daemon-type> <caps> ...]
For more information about auth capabilities, see User Management.
OSD_NO_DOWN_OUT_INTERVAL
The mon_osd_down_out_interval
option is set to zero, which means that the system does not automatically perform any repair or healing operations when an OSD fails. Instead, an administrator or an external orchestrator must manually mark down OSDs as out (by running ceph osd out <osd-id>) in order to trigger recovery.
This option is normally set to five or ten minutes, which should be enough time for a host to reboot or restart.
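If the interval was zeroed only temporarily (for example, during maintenance), it can be restored to a conventional value such as ten minutes; the 600 seconds below is only an illustrative choice:
ceph config set global mon_osd_down_out_interval 600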
To silence this alert, set mon_warn_on_osd_down_out_interval_zero to false by running the following command:
ceph config set global mon_warn_on_osd_down_out_interval_zero false
DASHBOARD_DEBUG
The dashboard debug mode has been enabled. This means that if there is an error while processing a REST API request, the HTTP error response will contain a Python traceback. This mode should be disabled in production environments because such a traceback might contain and expose sensitive information.
To disable the debug mode, run the following command:
ceph dashboard debug disable