Note
This document is for a development version of Ceph.
ceph-bluestore-tool -- bluestore administrative tool
Synopsis
Description
ceph-bluestore-tool is a utility for performing low-level administrative operations on a BlueStore instance.
Commands
help
show help
fsck [ --deep ] (on|off) or (yes|no) or (1|0) or (true|false)
Run a consistency check on BlueStore metadata. If *--deep* is specified, also read all object data and verify checksums.
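For example, a deep check of a stopped OSD might be invoked as follows (the data directory path is illustrative):
ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-0 --deep true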
repair
Run a consistency check and repair any errors that can be fixed.
qfsck
Run a consistency check on BlueStore metadata, comparing the allocator data (taken from the RocksDB CFB when it exists, and from the allocation file otherwise) with the ONode state.
allocmap
Perform the same check as qfsck, then store a new allocation file (the command is disabled by default and requires a special build).
restore_cfb
Reverse the changes made by the new NCB code (whether by a ceph restart or by running the allocmap command) and restore the RocksDB B column family (allocator map).
bluefs-export
Export the contents of BlueFS (i.e., RocksDB files) to an output directory.
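For example, to dump the RocksDB files into a scratch directory (both paths are illustrative):
ceph-bluestore-tool bluefs-export --path /var/lib/ceph/osd/ceph-0 --out-dir /tmp/osd.0-bluefs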
bluefs-bdev-sizes --path *osd path*
Print the device sizes, as understood by BlueFS, to stdout.
bluefs-bdev-expand --path *osd path*
Instruct BlueFS to check the size of its block devices and, if they have expanded, make use of the additional space. Note that new files created by BlueFS will be allocated on the preferred block device if it has sufficient free space, and existing files that have spilled over to the slow device will be gradually removed as RocksDB performs compactions. In other words, any data that has spilled over to the slow device will be moved to the fast device over time.
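For example, after enlarging the partition or logical volume underlying a stopped OSD, the new space could be claimed with (path is illustrative):
ceph-bluestore-tool bluefs-bdev-expand --path /var/lib/ceph/osd/ceph-0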
bluefs-bdev-new-wal --path *osd path* --dev-target *new-device*
Add a WAL device to BlueFS; fails if a WAL device already exists.
bluefs-bdev-new-db --path *osd path* --dev-target *new-device*
Add a DB device to BlueFS; fails if a DB device already exists.
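For example, to attach a new NVMe partition as a dedicated DB device (the device and path names are illustrative):
ceph-bluestore-tool bluefs-bdev-new-db --path /var/lib/ceph/osd/ceph-0 --dev-target /dev/nvme0n1p1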
bluefs-bdev-migrate --dev-target *new-device* --devs-source *device1* [--devs-source *device2*]
Move BlueFS data from the source device(s) to the target device. Source devices (except the main one) are removed on success. The target storage is expanded (its size label is updated), so the "bluefs-bdev-expand" command is no longer needed. The target device can be either a new device or one that is already attached. If the device is new, it is attached to the OSD and replaces one of the source devices according to the following rules (see the example after this list):
if the source list has a DB volume, the target device replaces it.
if the source list has a WAL volume, the target device replaces it.
if the source list has only a slow volume, the operation is not permitted and requires explicit allocation via a new-DB/new-WAL command.
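For example, to move an existing DB volume onto a new, larger device, replacing it (device names are illustrative):
ceph-bluestore-tool bluefs-bdev-migrate --path /var/lib/ceph/osd/ceph-0 --devs-source /var/lib/ceph/osd/ceph-0/block.db --dev-target /dev/nvme0n1p2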
show-label --dev *device* [...]
Show device label(s). The label may be printed while an OSD is running.
show-label-at --dev *device* --offset *lba* [...]
Show the device label at a specific disk location. Dedicated DB/WAL volumes have a single label at offset 0. The main device can have valid labels at multiple locations: 0/1GiB/10GiB/100GiB/1000GiB, though the labels at some of these locations might not exist. The label may be printed while an OSD is running.
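For example, to inspect the label copy that would normally sit 1 GiB into the main device (assuming the offset is given in bytes; the device name is illustrative):
ceph-bluestore-tool show-label-at --dev /dev/sdb --offset 1073741824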
free-dump --path *osd path* [ --allocator block/bluefs-wal/bluefs-db/bluefs-slow ]
Dump all free regions in the allocator.
free-score --path *osd path* [ --allocator block/bluefs-wal/bluefs-db/bluefs-slow ]
Give a number in [0-1] that represents the fragmentation of the allocator's free space. 0 means that all free space is in a single chunk; 1 represents the worst possible fragmentation.
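For example, to score the fragmentation of the main data allocator (path is illustrative):
ceph-bluestore-tool free-score --path /var/lib/ceph/osd/ceph-0 --allocator block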
bluefs-stats --path *osd path*
Show a summary of the space occupied by BlueFS, split by device (block/db/wal) and by role (wal/log/db).
bluefs-files --path *osd path*
List all BlueFS-managed files, printing each file's name, size, and the space it uses on each device.
reshard --path *osd path* --sharding *new sharding* [ --resharding-ctrl *control string* ]
Change the sharding of BlueStore's RocksDB. Sharding is built on top of RocksDB column families. This option makes it possible to test the performance of a *new sharding* scheme without redeploying the OSD. Resharding is usually a long process that involves walking through the entire RocksDB key space and moving some keys to different column families. The --resharding-ctrl option provides performance control over the resharding process. An interrupted resharding will prevent the OSD from running, but it does not corrupt data; it is always possible to continue the previous resharding, or to select any other sharding scheme, including reverting to the original one.
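For example, a resharding run might look like the following, where the sharding specification shown is only an illustration (consult the RocksDB sharding documentation for the exact scheme you want):
ceph-bluestore-tool reshard --path /var/lib/ceph/osd/ceph-0 --sharding "m(3) p(3,0-12) O(3,0-13)=block_cache={type=binned_lru} L P"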
show-sharding --path *osd path*
Show the sharding currently applied to BlueStore's RocksDB.
trim --path *osd path*
An SSD that has been used heavily may experience performance degradation. This operation uses TRIM / discard to free unused blocks from the BlueStore and BlueFS block devices, allowing the drive to perform more efficient internal housekeeping. If BlueStore runs with discard enabled, this option may not be useful.
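For example, to discard unused blocks of a stopped OSD (path is illustrative):
ceph-bluestore-tool trim --path /var/lib/ceph/osd/ceph-0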
zap-device --dev *device path*
Zero all device label locations. This effectively makes the device appear empty.
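For example, to wipe all label locations on a retired device (the device name is illustrative; this operation is destructive):
ceph-bluestore-tool zap-device --dev /dev/sdb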
revert-wal-to-plain --path *osd path*
Change WAL files from envelope mode to the legacy plain mode. Useful for downgrades, or if you want to disable this new feature (bluefs_wal_envelope_mode).
Options
- --dev *device*
Add *device* to the list of devices to consider.
- -i *osd_id*
Operate as OSD *osd_id*. Connect to the monitor for OSD-specific options. If the monitor is unavailable, add --no-mon-config to read from ceph.conf instead.
- --devs-source *device*
Add *device* to the list of devices to consider as sources for the migrate operation.
- --dev-target *device*
Specify the target *device* for the migrate operation, or the device to add as a new DB/WAL.
- --path *osd path*
Specify an OSD path. In most cases, the device list is inferred from the symlinks present in *osd path*. This is usually simpler than explicitly specifying the device(s) with --dev. Not necessary if -i *osd_id* is provided.
- --out-dir *dir*
Output directory for bluefs-export
- -l, --log-file *log file*
file to log to
- --log-level *num*
debug log level. Default is 30 (extremely verbose), 20 is very verbose, 10 is verbose, and 1 is not very verbose.
- --deep
deep scrub/repair (read and validate object data, not just metadata)
- --allocator *name*
Useful for the free-dump and free-score actions. Selects the allocator(s).
- --resharding-ctrl *control string*
Provides control over the resharding process. Specifies how often to refresh the RocksDB iterator, and how large a batch should be before it is committed to RocksDB. The option format is: <iterator_refresh_bytes>/<iterator_refresh_keys>/<batch_commit_bytes>/<batch_commit_keys> Default: 10000000/10000/1000000/1000
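For example, halving the default batch sizes to reduce the impact of resharding on a busy device could look like the following (the values, path, and sharding scheme are illustrative):
ceph-bluestore-tool reshard --path /var/lib/ceph/osd/ceph-0 --sharding "m(3) p(3,0-12) O(3,0-13)=block_cache={type=binned_lru} L P" --resharding-ctrl 5000000/5000/500000/500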
Additional ceph.conf options
Any configuration option accepted by the OSD can also be passed to ceph-bluestore-tool. This is useful for providing the necessary configuration options when access to the monitor/ceph.conf is impossible and the -i option cannot be used.
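For example, an OSD-level option such as bluefs_buffered_io can be supplied directly on the command line (the option choice and path here are illustrative):
ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-0 --no-mon-config --bluefs_buffered_io=false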
Device labels
Every BlueStore block device has a block label at the beginning of the device. The main device may optionally have additional labels at different locations for the sake of OSD robustness. You can dump the contents of the label with:
ceph-bluestore-tool show-label --dev *device*
The main device will have a lot of metadata, including information that used to be stored in small files in the OSD data directory. The auxiliary devices (db and wal) will only have the minimum required fields (OSD UUID, size, device type, birth time). The main device contains additional label copies at offsets: 1GiB, 10GiB, 100GiB and 1000GiB. Corrupted labels are fixed as part of repair:
ceph-bluestore-tool repair --dev *device*
OSD directory priming
You can generate the content for an OSD data directory that can start up a BlueStore OSD with the prime-osd-dir command:
ceph-bluestore-tool prime-osd-dir --dev *main device* --path /var/lib/ceph/osd/ceph-*id*
BlueFS log rescue
Some versions of BlueStore were susceptible to the BlueFS log growing extremely large, to the point of making it impossible to boot the OSD. This state is indicated by a boot that takes a very long time and fails in the _replay function.
This can be fixed by:
ceph-bluestore-tool fsck --path *osd path* --bluefs_replay_recovery=true
It is advised to first check whether the rescue process will be successful:
ceph-bluestore-tool fsck --path *osd path* --bluefs_replay_recovery=true --bluefs_replay_recovery_disable_compact=true
If the above fsck is successful, the fix procedure can be applied.
Availability
ceph-bluestore-tool is part of Ceph, a massively scalable, open-source, distributed storage system. Please refer to the Ceph documentation at https://docs.ceph.com for more information.
See also
ceph-osd(8)