Notice

This document is for a development version of Ceph.

Tracing Ceph With LTTng

Configuring Ceph with LTTng

Use the -DWITH_LTTNG option (default: ON):

./do_cmake -DWITH_LTTNG=ON

The config option for each kind of tracing you want must be set to true in ceph.conf. The available options are:

bluestore_tracing
event_tracing (-DWITH_EVENTTRACE)
osd_function_tracing (-DWITH_OSD_INSTRUMENT_FUNCTIONS)
osd_objectstore_tracing (actually filestore tracing)
rbd_tracing
osd_tracing
rados_tracing
rgw_op_tracing
rgw_rados_tracing
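
As a rough illustration, a ceph.conf fragment enabling a few of these options could look like the sketch below. The option names come from the list above; placing them under [global] and writing to a scratch file are assumptions made for the example, not something this document prescribes.

```shell
# Write a sample fragment to a scratch file; merge it into your real
# ceph.conf by hand. The [global] placement is an assumption.
fragment="$(mktemp)"
cat > "$fragment" <<'EOF'
[global]
osd_tracing = true
rados_tracing = true
rbd_tracing = true
EOF
cat "$fragment"
```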

Testing Trace

Start the LTTng daemon:

lttng-sessiond --daemonize

Run a vstart cluster with the tracing options enabled:

../src/vstart.sh -d -n -l -e -o "osd_tracing = true"

List the available tracepoints:

lttng list --userspace

You will get something like:

UST events:
-------------
PID: 100859 - Name: /path/to/ceph-osd
    pg:queue_op (loglevel: TRACE_DEBUG_LINE (13)) (type: tracepoint)
    osd:do_osd_op_post (loglevel: TRACE_DEBUG_LINE (13)) (type: tracepoint)
    osd:do_osd_op_pre_unknown (loglevel: TRACE_DEBUG_LINE (13)) (type: tracepoint)
    osd:do_osd_op_pre_copy_from (loglevel: TRACE_DEBUG_LINE (13)) (type: tracepoint)
    osd:do_osd_op_pre_copy_get (loglevel: TRACE_DEBUG_LINE (13)) (type: tracepoint)
    ...
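
The listing can get long on a real cluster. One possible way to summarize it is to print only the tracepoint providers (the part before the colon, such as osd or pg). The helper name tracepoint_providers below is hypothetical, and the pattern assumes the output format shown above.

```shell
# Read `lttng list --userspace` output on stdin and print each
# tracepoint provider (the part before the colon) once, sorted.
tracepoint_providers() {
    sed -n 's/^ *\([A-Za-z0-9_][A-Za-z0-9_]*\):[^ ]* (loglevel.*/\1/p' | sort -u
}
```

Usage: lttng list --userspace | tracepoint_providers. The provider names it prints are the prefixes you can pass to lttng enable-event with a wildcard.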

Create a tracing session, enable the tracepoints, and start tracing:

lttng create trace-test
lttng enable-event --userspace 'osd:*'
lttng start

Perform some Ceph operations:

rados bench -p ec 5 write

Stop tracing and view the result:

lttng stop
lttng view

Destroy the tracing session:

lttng destroy
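
The create, enable, start, stop, view, destroy sequence above can be wrapped in a small helper so that one workload is traced per throwaway session. This is only a sketch: the function name trace_workload and the LTTNG override variable are hypothetical, and it uses only the lttng subcommands already shown in this section.

```shell
# Sketch: run one workload inside a throwaway LTTng session.
# LTTNG may be overridden (for example LTTNG=echo) for a dry run that
# only prints the lttng commands instead of executing them.
LTTNG="${LTTNG:-lttng}"

trace_workload() {
    session="$1"; pattern="$2"; shift 2
    "$LTTNG" create "$session" || return 1
    "$LTTNG" enable-event --userspace "$pattern" || return 1
    "$LTTNG" start || return 1
    "$@"                 # the workload to trace
    status=$?
    "$LTTNG" stop
    "$LTTNG" view
    "$LTTNG" destroy
    return "$status"     # report the workload's exit status
}
```

Usage: trace_workload trace-test 'osd:*' rados bench -p ec 5 write. The event pattern is quoted so that the shell passes the wildcard to lttng unchanged.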

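Tracing Ceph With Blkin

Deprecated since version Squid: this feature is deprecated as of the Squid release and will be removed in a later release.

Ceph can use the Blkin library, created by Marios Kogias and others, which enables tracking a specific request from the time it enters the system until it is finally served by RADOS.

In general, Blkin implements the Dapper tracing semantics in order to show the causal relationships between the different processing phases that an IO request may trigger. The goal is an end-to-end visualization of the request's route through the system, accompanied by latency information for each processing phase. Thanks to LTTng, this can happen with minimal overhead and in real time. The LTTng traces can then be visualized with Twitter's Zipkin.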

Configuring Ceph with Blkin

Use the -DWITH_BLKIN option (which requires -DWITH_LTTNG):

./do_cmake -DWITH_LTTNG=ON -DWITH_BLKIN=ON

The Blkin config option for each component you want to trace must be set to true in ceph.conf. The available options are:

rbd_blkin_trace_all
osd_blkin_trace_all
osdc_blkin_trace_all

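Testing Blkin

It is easy to test Ceph's Blkin tracing. Assume that Ceph is not already running, and that you compiled Ceph with Blkin support but did not install it. Launch Ceph with the vstart.sh script in Ceph's src directory and list the available tracepoints: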

OSD=3 MON=3 RGW=1 ../src/vstart.sh -n -o "rbd_blkin_trace_all"
lttng list --userspace

You will see something like:

UST events:
-------------
PID: 8987 - Name: ./ceph-osd
      zipkin:timestamp (loglevel: TRACE_WARNING (4)) (type: tracepoint)
      zipkin:keyval_integer (loglevel: TRACE_WARNING (4)) (type: tracepoint)
      zipkin:keyval_string (loglevel: TRACE_WARNING (4)) (type: tracepoint)
      lttng_ust_tracelog:TRACE_DEBUG (loglevel: TRACE_DEBUG (14)) (type: tracepoint)

PID: 8407 - Name: ./ceph-mon
      zipkin:timestamp (loglevel: TRACE_WARNING (4)) (type: tracepoint)
      zipkin:keyval_integer (loglevel: TRACE_WARNING (4)) (type: tracepoint)
      zipkin:keyval_string (loglevel: TRACE_WARNING (4)) (type: tracepoint)
      lttng_ust_tracelog:TRACE_DEBUG (loglevel: TRACE_DEBUG (14)) (type: tracepoint)

...

Next, stop Ceph so that the tracepoints can be enabled:

../src/stop.sh

Start an LTTng session and enable the tracepoints:

lttng create blkin-test
lttng enable-event --userspace zipkin:timestamp
lttng enable-event --userspace zipkin:keyval_integer
lttng enable-event --userspace zipkin:keyval_string
lttng start

Then start Ceph again:

OSD=3 MON=3 RGW=1 ../src/vstart.sh -n -o "rbd_blkin_trace_all"

You may want to check that Ceph is up:

ceph status

Now put something in using rados, check that it made it, get it back, and remove it:

ceph osd pool create test-blkin
rados put test-object-1 ../src/vstart.sh --pool=test-blkin
rados -p test-blkin ls
ceph osd map test-blkin test-object-1
rados get test-object-1 ./vstart-copy.sh --pool=test-blkin
md5sum vstart*
rados rm test-object-1 --pool=test-blkin
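
The manual md5sum comparison above can also be automated. The sketch below puts a file, fetches it back, and compares the two copies with cmp. The function name roundtrip_ok and the RADOS override variable are hypothetical; the rados invocations mirror the commands above.

```shell
# Sketch: put a file into a pool, fetch it back, and compare the copies.
# RADOS may be overridden to point at a different rados binary.
RADOS="${RADOS:-rados}"

roundtrip_ok() {
    pool="$1"; obj="$2"; src="$3"
    copy="$(mktemp)" || return 1
    "$RADOS" put "$obj" "$src" --pool="$pool" &&
        "$RADOS" get "$obj" "$copy" --pool="$pool" &&
        cmp -s "$src" "$copy"
    status=$?            # 0 only if put, get, and compare all succeeded
    "$RADOS" rm "$obj" --pool="$pool"
    rm -f "$copy"
    return "$status"
}
```

Usage: roundtrip_ok test-blkin test-object-1 ../src/vstart.sh && echo "round trip OK".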

You can also use the example in examples/librados/ or rados bench.

Then stop the LTTng session and see what was collected:

lttng stop
lttng view

You will see something like:

[15:33:08.884275486] (+0.000225472) ubuntu zipkin:timestamp: { cpu_id = 53 }, { trace_name = "op", service_name = "Objecter", port_no = 0, ip = "0.0.0.0", trace_id = 5485970765435202833, span_id = 5485970765435202833, parent_span_id = 0, event = "osd op reply" }
[15:33:08.884614135] (+0.000002839) ubuntu zipkin:keyval_integer: { cpu_id = 10 }, { trace_name = "", service_name = "Messenger", port_no = 6805, ip = "0.0.0.0", trace_id = 7381732770245808782, span_id = 7387710183742669839, parent_span_id = 1205040135881905799, key = "tid", val = 2 }
[15:33:08.884616431] (+0.000002296) ubuntu zipkin:keyval_string: { cpu_id = 10 }, { trace_name = "", service_name = "Messenger", port_no = 6805, ip = "0.0.0.0", trace_id = 7381732770245808782, span_id = 7387710183742669839, parent_span_id = 1205040135881905799, key = "entity type", val = "client" }
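
Each record carries a trace_id, so the raw lttng view output can be grouped per request before it is sent anywhere. The two helpers below are a minimal sketch; their names are hypothetical, and the patterns assume the record layout shown above.

```shell
# Print every record that belongs to one trace_id (records on stdin).
records_for_trace() {
    grep -F "trace_id = $1,"
}

# Count records per trace_id, most frequent first (records on stdin).
count_by_trace() {
    sed -n 's/.*trace_id = \([0-9][0-9]*\),.*/\1/p' | sort | uniq -c | sort -rn
}
```

Usage: lttng view | count_by_trace to find the busiest traces, then lttng view | records_for_trace 7381732770245808782 to inspect one of them.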

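Install Zipkin

One of the benefits of using Blkin is that you can view the traces with Zipkin. Run Zipkin both as a tracepoint collector and as a web service. The executable jar runs the collector on port 9410 and the web interface on port 9411.

Download the Zipkin package: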

git clone https://github.com/openzipkin/zipkin && cd zipkin
wget -O zipkin.jar 'https://search.maven.org/remote_content?g=io.zipkin.java&a=zipkin-server&v=LATEST&c=exec'
java -jar zipkin.jar

Alternatively, start the Docker image:

docker run -d -p 9411:9411 openzipkin/zipkin

Show Ceph's Blkin Traces in Zipkin-web

Download the babeltrace-zipkin project. This project takes the traces generated by Blkin and sends them to a Zipkin collector using scribe:

git clone https://github.com/vears91/babeltrace-zipkin
cd babeltrace-zipkin

Send the LTTng data to Zipkin:

python3 babeltrace_zipkin.py ${lttng-traces-dir}/${blkin-test}/ust/uid/0/64-bit/ -p ${zipkin-collector-port(9410 by default)} -s ${zipkin-collector-ip}

Example:

python3 babeltrace_zipkin.py ~/lttng-traces-dir/blkin-test-20150225-160222/ust/uid/0/64-bit/ -p 9410 -s 127.0.0.1
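
Session directories carry a date and time suffix, as in the example path above, so the most recent one for a given session name can be picked by sorting the names. The helper below is a sketch under that assumption; the name latest_trace_dir is hypothetical.

```shell
# Print the newest "<base>/<session>-YYYYMMDD-HHMMSS" directory.
# Globs expand in sorted order, so the last match is the latest timestamp.
latest_trace_dir() {
    latest=""
    for d in "$1/$2"-*/; do
        [ -d "$d" ] && latest="${d%/}"
    done
    [ -n "$latest" ] && printf '%s\n' "$latest"
}
```

Usage, assuming the traces were written under ~/lttng-traces (substitute your own trace directory): python3 babeltrace_zipkin.py "$(latest_trace_dir ~/lttng-traces blkin-test)/ust/uid/0/64-bit/" -p 9410 -s 127.0.0.1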

Check the Ceph traces on the web page:

Browse http://${zipkin-collector-ip}:9411
Click "Find traces"
