Notice

This document is for a development version of Ceph.

Prometheus Module

The Manager prometheus module implements a Prometheus exporter that exposes Ceph performance counters from the collection point in the Manager. The Manager receives MMgrReport messages from all MgrClient processes (including mons and OSDs) with performance counter schema data and counter data, and maintains a circular buffer of the latest samples. The module listens on an HTTP endpoint and retrieves the latest sample of every counter when scraped. The HTTP path and query parameters are ignored. All extant counters for all reporting entities are returned in the Prometheus exposition format. (See the Prometheus documentation.)

Enabling Prometheus output

Enable the prometheus module by running the following command:

ceph mgr module enable prometheus

Configuration

Note

The prometheus Manager module must be restarted to apply configuration changes.

server_addr

The IPv4 or IPv6 address on which the module listens for HTTP requests.

type:

str

default:

::

server_port

The port on which the module listens for HTTP requests.

type:

int

default:

9283

scrape_interval

type:

float

default:

15.0

cache

type:

bool

default:

true

stale_cache_strategy

type:

str

default:

log

rbd_stats_pools

type:

str

default:

<empty string>

rbd_stats_pools_refresh_interval

type:

int

default:

300

standby_behaviour

type:

str

default:

default

standby_error_status_code

type:

int

default:

500

allowed range:

[400, 599]

exclude_perf_counters

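Gathering performance counters from a single Prometheus exporter can degrade ceph-mgr performance, especially in large clusters. Ceph-exporter daemons are therefore now used by default to gather performance counters. This option should be disabled only when no ceph-exporters are deployed.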

type:

bool

default:

true

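By default the module accepts HTTP requests on port 9283 on all IPv4 and IPv6 addresses on the host. The port and listen address are configurable with ceph config set, using the keys mgr/prometheus/server_addr and mgr/prometheus/server_port. This port is registered with Prometheus's registry.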

ceph config set mgr mgr/prometheus/server_addr 0.0.0.0
ceph config set mgr mgr/prometheus/server_port 9283

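Warning

The mgr/prometheus/scrape_interval of this module should match the Prometheus scrape interval for the module to work properly.

The scrape interval in the module is used for caching purposes and to determine when the cache is stale.

Scrape intervals below 10 seconds are not recommended. The recommended scrape interval is 15 seconds, although in some cases increasing it may be useful.

To set a different scrape interval in the Prometheus module, set scrape_interval to the desired value: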

ceph config set mgr mgr/prometheus/scrape_interval 20

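On large clusters (>1000 OSDs), the time needed to fetch the metrics may become significant. Without the cache, the Prometheus manager module could overload the manager, especially when several Prometheus instances scrape it, and lead to unresponsive or crashing Ceph Manager instances. For this reason the cache is enabled by default. This means that the cache can become stale. The cache is considered stale when the time needed to fetch the metrics from Ceph exceeds the configured mgr/prometheus/scrape_interval.

When that happens, a warning is logged and the module either responds with a 503 HTTP status code (service unavailable) or returns the content of the cache, even though it might be stale.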

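This behavior can be configured. By default, the module returns a 503 HTTP status code (service unavailable). Other options can be set with ceph config set commands.

To make the module respond with possibly stale data, set the strategy to return: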

ceph config set mgr mgr/prometheus/stale_cache_strategy return

To make the module respond with "service unavailable", set the strategy to fail:

ceph config set mgr mgr/prometheus/stale_cache_strategy fail
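
The decision described above can be sketched in a few lines of Python. This is only an illustration of the documented behavior, not the module's actual code: the names respond and ScrapeResponse are hypothetical, and only the two strategies explained in the text (return and fail) are modelled. The log value listed in the option table is not described in the text, so the sketch rejects it rather than guess.

```python
from dataclasses import dataclass


@dataclass
class ScrapeResponse:
    """Hypothetical container for the HTTP status and body sent to Prometheus."""
    status: int
    body: str


def respond(cached_body: str, fetch_seconds: float,
            scrape_interval: float = 15.0,
            stale_cache_strategy: str = "fail") -> ScrapeResponse:
    """Pick the response for one scrape, following the rules in the text."""
    if stale_cache_strategy not in ("return", "fail"):
        raise ValueError("only 'return' and 'fail' are modelled in this sketch")
    # The cache is stale when fetching the metrics took longer than the
    # configured scrape interval.
    stale = fetch_seconds > scrape_interval
    if stale and stale_cache_strategy == "fail":
        return ScrapeResponse(503, "")  # service unavailable
    # Fresh cache, or stale cache with the "return" strategy.
    return ScrapeResponse(200, cached_body)
```

For example, a fetch that took 20 seconds with a 15 second scrape interval yields status 503 under fail and the cached body under return.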

If you are confident that you do not need the cache, you can disable it:

ceph config set mgr mgr/prometheus/cache false

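If you are using the prometheus module behind a reverse proxy or load balancer, you can simplify discovery of the active instance by switching to error mode: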

ceph config set mgr mgr/prometheus/standby_behaviour error

If set, the prometheus module responds with an HTTP error when / is requested from a standby instance. The default error code is 500, but you can configure the HTTP response code with:

ceph config set mgr mgr/prometheus/standby_error_status_code 503

Valid error codes are in the range 400-599.

To switch back to the default behaviour, set the config key to default:

ceph config set mgr mgr/prometheus/standby_behaviour default

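Ceph Health Checks

The Manager prometheus module tracks and maintains a history of Ceph health checks, exposing them to the Prometheus server as discrete metrics. This allows Alertmanager rules to be configured for specific health check events.

The metrics take the following form: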

# HELP ceph_health_detail healthcheck status by type (0=inactive, 1=active)
# TYPE ceph_health_detail gauge
ceph_health_detail{name="OSDMAP_FLAGS",severity="HEALTH_WARN"} 0.0
ceph_health_detail{name="OSD_DOWN",severity="HEALTH_WARN"} 1.0
ceph_health_detail{name="PG_DEGRADED",severity="HEALTH_WARN"} 1.0
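
As an illustration of how these samples can be consumed, the following Python sketch extracts the active health checks (value 1) from exposition text of the form shown above. The function name active_health_checks and the regular expressions are assumptions made for this example; a real consumer would normally query Prometheus or use a full exposition-format parser.

```python
import re

SAMPLE = """\
# HELP ceph_health_detail healthcheck status by type (0=inactive, 1=active)
# TYPE ceph_health_detail gauge
ceph_health_detail{name="OSDMAP_FLAGS",severity="HEALTH_WARN"} 0.0
ceph_health_detail{name="OSD_DOWN",severity="HEALTH_WARN"} 1.0
ceph_health_detail{name="PG_DEGRADED",severity="HEALTH_WARN"} 1.0
"""

# Matches one ceph_health_detail sample line: labels in braces, then the value.
_LINE = re.compile(r'^ceph_health_detail\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$')
# Matches a single key="value" label pair.
_LABEL = re.compile(r'(\w+)="([^"]*)"')


def active_health_checks(text):
    """Return (name, severity) pairs for health checks whose value is 1."""
    active = []
    for line in text.splitlines():
        match = _LINE.match(line.strip())
        if match is None:
            continue  # comment lines and other metrics are skipped
        labels = dict(_LABEL.findall(match.group("labels")))
        if float(match.group("value")) == 1.0:
            active.append((labels["name"], labels["severity"]))
    return active
```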

The health check history can be retrieved and cleared by running the following commands:

ceph healthcheck history ls [--format {plain|json|json-pretty}]
ceph healthcheck history clear

The ceph healthcheck history ls command provides an overview of the health checks that the cluster has encountered since the last clear command was issued:

[ceph: root@c8-node1 /]# ceph healthcheck history ls
Healthcheck Name          First Seen (UTC)      Last seen (UTC)       Count  Active
OSDMAP_FLAGS              2021/09/16 03:17:47   2021/09/16 22:07:40       2    No
OSD_DOWN                  2021/09/17 00:11:59   2021/09/17 00:11:59       1   Yes
PG_DEGRADED               2021/09/17 00:11:59   2021/09/17 00:11:59       1   Yes
3 health check(s) listed

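RBD IO statistics

The prometheus module can optionally collect RBD per-image IO statistics by enabling dynamic OSD performance counters. Statistics are gathered for all images in the pools specified in the mgr/prometheus/rbd_stats_pools configuration parameter. The parameter is a comma-separated or space-separated list of pool[/namespace] entries. If the RBD namespace is not specified, statistics are collected for all namespaces in the pool.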

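To enable collection of statistics for RBD pools named pool1, pool2 and poolN, run the following command:

ceph config set mgr mgr/prometheus/rbd_stats_pools "pool1,pool2,poolN"

A wildcard can be used to indicate all pools or namespaces: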

ceph config set mgr mgr/prometheus/rbd_stats_pools "*"
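
To make the value format concrete, here is a minimal Python sketch that splits an rbd_stats_pools string into pool and namespace parts. The function parse_rbd_stats_pools is hypothetical and only mirrors the format described above (comma-separated or space-separated pool[/namespace] entries); it is not the module's own parser. A namespace of None stands for "all namespaces in the pool".

```python
import re
from typing import List, Optional, Tuple


def parse_rbd_stats_pools(value: str) -> List[Tuple[str, Optional[str]]]:
    """Split a comma/space separated list of pool[/namespace] entries."""
    entries = []
    for item in re.split(r"[\s,]+", value.strip()):
        if not item:
            continue  # an empty setting yields no entries
        pool, sep, namespace = item.partition("/")
        # No "/" means no namespace was given: all namespaces are covered.
        entries.append((pool, namespace if sep else None))
    return entries
```

With this sketch, "pool1,pool2 poolN/ns1" becomes three entries, the last of which is restricted to the namespace ns1 (a made-up name).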

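The module maintains a list of all available images by scanning the specified pools and namespaces. The refresh period is configurable through the mgr/prometheus/rbd_stats_pools_refresh_interval parameter, which defaults to 300 seconds (5 minutes). The module forces an earlier refresh if it detects statistics from a previously unknown RBD image.

To set the sync interval to 10 minutes, run the following command: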

ceph config set mgr mgr/prometheus/rbd_stats_pools_refresh_interval 600

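Ceph daemon performance counters metrics

With the introduction of the ceph-exporter daemon, the prometheus module no longer exports Ceph daemon performance counters as Prometheus metrics by default. Exporting these metrics can be re-enabled by setting the module option exclude_perf_counters to false: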

ceph config set mgr mgr/prometheus/exclude_perf_counters false

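Statistic names and labels

The Prometheus statistic names are the Ceph native names with the illegal characters ., - and :: translated to _, and with ceph_ prepended.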
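
The naming rule above can be sketched as a small Python function. This is a simplified illustration that implements only the rule as stated here; the function name to_prometheus_name is hypothetical, and the module's real conversion may handle additional characters.

```python
import re


def to_prometheus_name(ceph_name: str) -> str:
    """Translate '.', '-' and '::' to '_' and prepend 'ceph_'."""
    # "::" is listed first so that it becomes one underscore, not two.
    return "ceph_" + re.sub(r"::|[.-]", "_", ceph_name)
```

Under this rule a native name such as osd.op_r would be exported as ceph_osd_op_r.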

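All daemon statistics have a ceph_daemon label whose value identifies the type and ID of the daemon they come from, for example osd.123. A given metric may be reported by several types of daemon, so when querying OSD RocksDB statistics, for example, you would likely constrain the query with a pattern of the form ceph_daemon=~'osd.*' so that Monitor RocksDB metrics are excluded.

Cluster statistics (that is, those global to the Ceph cluster) have labels appropriate to the entity they report on. For example, metrics relating to pools have a pool_id label.

Long-running averages that represent Ceph statistic histograms are represented by paired <name>_sum and <name>_count metrics. This is similar to how histograms are represented in Prometheus, and they are treated in a similar way.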
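
One common way to use such a pair is to divide the increase of the sum by the increase of the count over the same interval, which gives the average value per observation in that interval. The Python sketch below shows the arithmetic; the function name and the sample numbers are made up for illustration.

```python
from typing import Optional


def average_over_interval(sum_start: float, count_start: float,
                          sum_end: float, count_end: float) -> Optional[float]:
    """Average per observation between two samples of <name>_sum/<name>_count."""
    delta_count = count_end - count_start
    if delta_count <= 0:
        return None  # no new observations (or a counter reset) in the interval
    return (sum_end - sum_start) / delta_count
```

For instance, if the sum grew from 10.0 to 16.0 while the count grew from 100 to 400, the average over the interval is 6.0 / 300 = 0.02.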

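Pool and OSD metadata series

Series are exported to facilitate displaying and querying on certain metadata fields.

Pools have a ceph_pool_metadata metric of the following form: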

ceph_pool_metadata{pool_id="2",name="cephfs_metadata_a"} 1.0

OSDs have a ceph_osd_metadata metric of the following form:

ceph_osd_metadata{cluster_addr="172.21.9.34:6802/19096",device_class="ssd",ceph_daemon="osd.0",public_addr="172.21.9.34:6801/19096",weight="1.0"} 1.0

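Correlating drive statistics with node_exporter

Ceph cluster Prometheus metrics are used in conjunction with generic host metrics from the Prometheus node_exporter.

To enable correlation of Ceph OSD statistics with node_exporter's drive statistics, Ceph creates series of the following form: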

ceph_disk_occupation_human{ceph_daemon="osd.0", device="sdd", exported_instance="myhost"}

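To get drive metrics by OSD ID, use either the and operator or the * operator in your Prometheus query. All metadata metrics (such as ceph_disk_occupation_human) have the value 1 so that they combine in a neutral way with the PromQL * operator. Using * allows the use of the group_left and group_right grouping modifiers, so that the resulting metric carries additional labels from one side of the query.

See the Prometheus documentation for more information about constructing PromQL queries and exploring them interactively in the Prometheus expression browser.

For example, we can run a query like this: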

rate(node_disk_written_bytes_total[30s]) and
on (device,instance) ceph_disk_occupation_human{ceph_daemon="osd.0"}

By default, the above query returns no metrics because the instance labels of the two metrics do not match. The instance label of ceph_disk_occupation_human is the currently active Manager.
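
The mismatch can be illustrated with a small Python model of the and on (...) matching step. This is a simplified sketch of the set semantics only, not of PromQL itself; the function and_on and the instance values myhost:9100 and mgr-host:9283 are hypothetical.

```python
from typing import Dict, List, Tuple

Labels = Dict[str, str]


def and_on(left: List[Labels], right: List[Labels],
           on: Tuple[str, ...]) -> List[Labels]:
    """Keep left-hand series whose 'on' labels also occur on the right."""
    right_keys = {tuple(series.get(label, "") for label in on) for series in right}
    return [series for series in left
            if tuple(series.get(label, "") for label in on) in right_keys]


# node_exporter series: instance is the scraped node_exporter endpoint.
node = [{"device": "sdd", "instance": "myhost:9100"}]
# Ceph series: instance refers to the active Manager, so it differs.
ceph = [{"device": "sdd", "instance": "mgr-host:9283", "ceph_daemon": "osd.0"}]

no_match = and_on(node, ceph, ("device", "instance"))   # instance values differ
device_only = and_on(node, ceph, ("device",))           # matches on device alone
```

Matching on device and instance yields an empty result here, while matching on device alone would succeed, which is why the instance labels have to be aligned.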

The following sections outline two approaches to remedy this.

Note

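If you need to group on the ceph_daemon label instead of the device and instance labels, using ceph_disk_occupation_human may not work reliably. It is advised that you use ceph_disk_occupation instead.

The difference is that ceph_disk_occupation_human may group several OSDs into the value of a single ceph_daemon label in cases where multiple OSDs share a disk.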

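Use label_replace

The label_replace function (cp. label_replace documentation) can add a label to, or alter a label of, a metric within a query.

To correlate an OSD with its drive's write rate, a query of the following form can be used: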

label_replace(
    rate(node_disk_written_bytes_total[30s]),
    "exported_instance",
    "$1",
    "instance",
    "(.*):.*"
) and on (device, exported_instance) ceph_disk_occupation_human{ceph_daemon="osd.0"}
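
To show what the label_replace call in this query does to a single series, the following Python sketch mimics its behavior for the simple case used here: the regular expression must match the whole source label value, and $1 in the replacement refers to the first capture group. It is an approximation written for illustration (Prometheus uses RE2 syntax and has additional rules, such as dropping empty labels), and the label values are made up.

```python
import re
from typing import Dict


def label_replace(labels: Dict[str, str], dst_label: str, replacement: str,
                  src_label: str, regex: str) -> Dict[str, str]:
    """Return a copy of labels with dst_label set from src_label via regex."""
    match = re.fullmatch(regex, labels.get(src_label, ""))
    if match is None:
        return dict(labels)  # no match: the series is returned unchanged
    # Expand $1, $2, ... in the replacement with the captured groups.
    value = re.sub(r"\$(\d+)",
                   lambda ref: match.group(int(ref.group(1))) or "",
                   replacement)
    result = dict(labels)
    result[dst_label] = value
    return result


series = {"device": "sdd", "instance": "myhost:9100"}
rewritten = label_replace(series, "exported_instance", "$1", "instance", "(.*):.*")
```

After the rewrite the series carries exported_instance="myhost", which can then be matched against the exported_instance label of ceph_disk_occupation_human.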

Configuring Prometheus server

honor_labels

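To enable Ceph to output properly labeled data relating to any host, use the honor_labels setting when adding the ceph-mgr endpoints to your Prometheus configuration.

This lets Ceph export the proper instance labels without Prometheus overwriting them at ingest. Without this setting, Prometheus applies an instance label that includes the hostname and port of the endpoint from which each metric was scraped. Because Ceph clusters have multiple Manager daemons, this results in an instance label that changes whenever the active Manager daemon changes.

If this is undesirable, a custom instance label can be set in the Prometheus target configuration. You might set it to the hostname of your first Manager, or to something arbitrary such as ceph_cluster.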

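node_exporter hostname labels

Set your instance labels to match what appears in the instance field of Ceph's OSD metadata. This is generally the short hostname of the node.

This is necessary only if you want to correlate Ceph statistics with host statistics, but you may find it useful to do it in all cases so that historical data can be correlated in the future.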

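Example configuration

This example shows a deployment with a Manager and node_exporter placed on a server named senta04. Note that this requires adding an appropriate and unique instance label to each node_exporter target.

This is just an example: there are other ways to configure Prometheus scrape targets and label rewrite rules.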

prometheus.yml

global:
  scrape_interval:     15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'node'
    file_sd_configs:
      - files:
        - node_targets.yml
  - job_name: 'ceph'
    honor_labels: true
    file_sd_configs:
      - files:
        - ceph_targets.yml

ceph_targets.yml

[
    {
        "targets": [ "senta04.mydomain.com:9283" ],
        "labels": {}
    }
]

node_targets.yml

[
    {
        "targets": [ "senta04.mydomain.com:9100" ],
        "labels": {
            "instance": "senta04"
        }
    }
]
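
When there are many hosts, a target file like node_targets.yml can be generated rather than written by hand. The Python sketch below builds one entry per host and sets the instance label to the short hostname, as recommended above. The function name node_targets and the default port 9100 (taken from the example) are assumptions for illustration.

```python
import json
from typing import Dict, List


def node_targets(fqdns: List[str], port: int = 9100) -> List[Dict]:
    """Build file_sd entries with a unique short-hostname instance label."""
    return [
        {
            "targets": [f"{fqdn}:{port}"],
            # The short hostname is the part before the first dot.
            "labels": {"instance": fqdn.split(".")[0]},
        }
        for fqdn in fqdns
    ]


# Render the entries in the same JSON layout as the example file.
rendered = json.dumps(node_targets(["senta04.mydomain.com"]), indent=4)
```

Writing the rendered string to node_targets.yml reproduces the entry shown in the example for senta04.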

Notes

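Counters and gauges are exported. Histograms and long-running averages are currently not exported. It is possible that Ceph's 2-D histograms could be reduced to two separate 1-D histograms, and that long-running averages could be exported as metrics of Prometheus' Summary type.

Timestamps, as with many exporters, are set by Prometheus at ingest to the Prometheus server's scrape time. Prometheus expects that it is polling the actual counter process synchronously. It is possible to supply a timestamp along with the stat report, but the Prometheus team strongly advises against this. This means that timestamps will be delayed by an unpredictable amount. It is not clear whether this will be problematic, but it is worth knowing about.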
