Version v3.7-DRAFT of the documentation is in DRAFT status. For the latest stable documentation, see v3.6.

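Downgrade etcd from 3.5 to 3.4

Processes, checklists, and notes on downgrading etcd from 3.5 to 3.4

In the general case, downgrading from etcd 3.5 to 3.4 can be a zero-downtime, rolling downgrade:

one by one, stop the etcd 3.5 processes and replace them with etcd 3.4 processes
after starting any 3.4 process, new features in 3.5 are no longer available to the cluster

Before starting a downgrade, read through the rest of this guide to prepare.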

Downgrade checklists

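NOTE: If your cluster has auth enabled, rolling downgrade from 3.5 is not supported, because 3.5 changes the format of auth-related WAL entries. You can follow the authentication instructions to disable auth and delete all users first.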

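Highlighted breaking changes from 3.5 to 3.4:

Difference in flags

If you use any of the following flags in your 3.5 configuration, make sure to remove them, rename them, or change their default values when downgrading to 3.4.

NOTE: The diff below is based on versions 3.5.14 and 3.4.33. The actual diff depends on the patch versions you use, so check it first with diff <(etcd-3.5/bin/etcd -h | grep \\-\\-) <(etcd-3.4/bin/etcd -h | grep \\-\\-).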

# flags not available in 3.4
-etcd --socket-reuse-port
-etcd --socket-reuse-address
-etcd --raft-read-timeout
-etcd --raft-write-timeout
-etcd --v2-deprecation
-etcd --client-cert-file
-etcd --client-key-file
-etcd --peer-client-cert-file
-etcd --peer-client-key-file
-etcd --self-signed-cert-validity
-etcd --enable-log-rotation --log-rotation-config-json=some.json
-etcd --experimental-enable-distributed-tracing --experimental-distributed-tracing-address='localhost:4317' --experimental-distributed-tracing-service-name='etcd' --experimental-distributed-tracing-instance-id='' --experimental-distributed-tracing-sampling-rate='0'
-etcd --experimental-compact-hash-check-enabled --experimental-compact-hash-check-time='1m'
-etcd --experimental-downgrade-check-time
-etcd --experimental-memory-mlock
-etcd --experimental-txn-mode-write-with-shared-buffer
-etcd --experimental-bootstrap-defrag-threshold-megabytes
-etcd --experimental-stop-grpc-service-on-defrag

# same flag with different names
-etcd --backend-bbolt-freelist-type=map
+etcd --experimental-backend-bbolt-freelist-type=array

# same flag different defaults
-etcd --pre-vote=true
+etcd --pre-vote=false

-etcd --logger=zap
+etcd --logger=capnslog
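As a rough aid for auditing an existing 3.5 command line against the list above, the sketch below prints every argument whose flag name appears among the flags not available in 3.4. The function name `find_35_only_flags` and the variable `FLAGS_35_ONLY` are made up for this illustration, and the list only mirrors the 3.5.14 versus 3.4.33 diff shown here, so it may not match other patch versions.

```shell
# Flag names copied from the "flags not available in 3.4" list above
# (based on 3.5.14 vs 3.4.33; other patch versions may differ).
FLAGS_35_ONLY='--socket-reuse-port --socket-reuse-address --raft-read-timeout
--raft-write-timeout --v2-deprecation --client-cert-file --client-key-file
--peer-client-cert-file --peer-client-key-file --self-signed-cert-validity
--enable-log-rotation --log-rotation-config-json
--experimental-enable-distributed-tracing --experimental-distributed-tracing-address
--experimental-distributed-tracing-service-name --experimental-distributed-tracing-instance-id
--experimental-distributed-tracing-sampling-rate
--experimental-compact-hash-check-enabled --experimental-compact-hash-check-time
--experimental-downgrade-check-time --experimental-memory-mlock
--experimental-txn-mode-write-with-shared-buffer
--experimental-bootstrap-defrag-threshold-megabytes
--experimental-stop-grpc-service-on-defrag'

# find_35_only_flags ARGS...: print each argument whose flag name is in the list.
# (Hypothetical helper, not part of etcd.)
find_35_only_flags() {
  for arg in "$@"; do
    name=${arg%%=*}                      # drop "=value" if present
    for flag in $FLAGS_35_ONLY; do
      [ "$name" = "$flag" ] && printf '%s\n' "$name"
    done
  done
  return 0
}

# Example: two of these arguments would have to be removed before starting 3.4.
find_35_only_flags --name s1 --experimental-memory-mlock --log-rotation-config-json=some.json
```

The renamed flag and the flags with changed defaults are not covered by this check and still need a manual review.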

`etcd --logger zap`

3.4 defaults to --logger=capnslog, while 3.5 defaults to --logger=zap.

If you want to keep using zap, specify it explicitly.

+etcd --logger=zap --log-outputs=stderr

+# to write logs to stderr and a.log file at the same time
+etcd --logger=zap --log-outputs=stderr,a.log

Difference in Prometheus metrics

# metrics not available in 3.4
-etcd_debugging_mvcc_db_compaction_last

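Server downgrade checklists

Downgrade requirements

For a smooth rolling downgrade, the running cluster must be healthy. Check the health of the cluster with the etcdctl endpoint health command before proceeding.

The target 3.4 version must be >= 3.4.32.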
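The minimum-version requirement can be checked mechanically before touching any member. The following is a minimal sketch, assuming plain three-part versions such as 3.4.33 with no pre-release suffix; `version_ge` is a helper name invented for this example.

```shell
# version_ge A B: succeed if dotted version A >= B.
# Assumes exactly three numeric components per version (hypothetical helper).
version_ge() {
  a=$1 b=$2
  old_ifs=$IFS; IFS=.
  set -- $a $b            # $1..$3 come from A, $4..$6 from B
  IFS=$old_ifs
  [ "$1" -gt "$4" ] && return 0; [ "$1" -lt "$4" ] && return 1
  [ "$2" -gt "$5" ] && return 0; [ "$2" -lt "$5" ] && return 1
  [ "$3" -ge "$6" ]
}

# Example: refuse to continue with a 3.4 binary older than 3.4.32.
target=3.4.33
if version_ge "$target" 3.4.32; then
  echo "etcd $target is new enough for the downgrade"
else
  echo "etcd $target is too old, need >= 3.4.32"
fi
```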

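Preparation

Before downgrading etcd, always test the services that rely on etcd in a staging environment before deploying the downgrade to production.

Before beginning, download the snapshot backup. Should something go wrong during the downgrade, this backup can be used to roll back to the existing etcd version. Note that the snapshot command backs up only the v3 data. For v2 data, see the backup procedure for the v2 datastore.

Before beginning, download the latest release of etcd 3.4 and make sure its version is >= 3.4.32.

Mixed versions

While downgrading, an etcd cluster supports mixed versions of etcd members and operates with the protocol of the lowest common version. The cluster is considered downgraded once any of its members is downgraded to version 3.4. Internally, etcd members negotiate with each other to determine the overall cluster version, which controls the reported version and the supported features.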
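To make the "lowest common version" rule concrete, the sketch below computes the lowest major.minor among a set of member versions. It only illustrates the rule as described above and is not how etcd negotiates the version internally; `cluster_version` is a hypothetical helper.

```shell
# cluster_version V...: print the lowest major.minor among the given
# server versions (illustration only, not etcd's internal negotiation).
cluster_version() {
  printf '%s\n' "$@" | cut -d. -f1,2 | sort -t. -k1,1n -k2,2n | head -n 1
}

# A three-member cluster in the middle of a rolling downgrade:
cluster_version 3.5.13 3.5.13 3.4.32   # prints 3.4
```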

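Limitations

Note: If the cluster only has v3 data and no v2 data, it is not subject to this limitation.

If the cluster is serving a v2 data set larger than 50MB, each newly downgraded member may take up to two minutes to catch up with the existing cluster. Check the size of a recent snapshot to estimate the total data size. In other words, it is safest to wait 2 minutes between downgrading each member.

For a much larger total data size, 100MB or more, this one-time process might take even more time. Administrators of very large etcd clusters of this magnitude can contact the etcd team before downgrading, and we will be happy to provide advice on the procedure.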

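Rollback

Once any member has been downgraded to 3.4, the cluster version is downgraded to 3.4, and operations are "3.4" compatible. To roll back, follow the instructions for upgrading from 3.4 to 3.5.

Please download the snapshot backup so that the cluster can still be restored to its pre-downgrade state even after it has been completely downgraded.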

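Downgrade procedure

This example shows how to downgrade a 3-member 3.5 etcd cluster running on a local machine.

Step 1: check downgrade requirements

Is the cluster healthy and running 3.5.x?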

etcdctl --endpoints=localhost:2379,localhost:22379,localhost:32379 endpoint health
<<COMMENT
localhost:2379 is healthy: successfully committed proposal: took = 2.118638ms
localhost:22379 is healthy: successfully committed proposal: took = 3.631388ms
localhost:32379 is healthy: successfully committed proposal: took = 2.157051ms
COMMENT

curl http://localhost:2379/version
<<COMMENT
{"etcdserver":"3.5.0","etcdcluster":"3.5.0"}
COMMENT

curl http://localhost:22379/version
<<COMMENT
{"etcdserver":"3.5.0","etcdcluster":"3.5.0"}
COMMENT

curl http://localhost:32379/version
<<COMMENT
{"etcdserver":"3.5.0","etcdcluster":"3.5.0"}
COMMENT
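The three version checks above can be folded into one pass/fail test. This is a minimal sketch that extracts the etcdserver field with sed rather than a JSON parser, assuming the compact single-line response format shown above; `server_version` and `all_on_35` are names made up for this example.

```shell
# server_version JSON: extract the "etcdserver" field from a /version response.
# Assumes the compact one-line JSON shown above (hypothetical helper).
server_version() {
  printf '%s\n' "$1" | sed -n 's/.*"etcdserver":"\([^"]*\)".*/\1/p'
}

# all_on_35 JSON...: succeed only if every response reports a 3.5.x server.
all_on_35() {
  for body in "$@"; do
    case $(server_version "$body") in
      3.5.*) ;;
      *) return 1 ;;
    esac
  done
}

# Example with the sample responses from this step:
if all_on_35 '{"etcdserver":"3.5.0","etcdcluster":"3.5.0"}' \
             '{"etcdserver":"3.5.0","etcdcluster":"3.5.0"}'; then
  echo "all members run 3.5.x"
fi
```

Against a live cluster, each argument could be supplied as "$(curl -s http://localhost:2379/version)" and so on for the other members.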

Step 2: download snapshot backup from leader

Download the snapshot backup to provide a recovery path should any problems occur.
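One way to take the backup is etcdctl snapshot save, pointed at a single endpoint. The sketch below wraps that call in a small function. The `ETCDCTL` variable is a hypothetical override added only so the command can be dry-run without a cluster, and the endpoint and file name are placeholders.

```shell
# snapshot_backup ENDPOINT FILE: save a v3 snapshot from one endpoint.
# ETCDCTL is a hypothetical override for dry runs; it defaults to etcdctl.
snapshot_backup() {
  endpoint=$1 file=$2
  "${ETCDCTL:-etcdctl}" --endpoints="$endpoint" snapshot save "$file"
}

# Dry run: print the arguments that would be passed to etcdctl.
ETCDCTL=echo snapshot_backup localhost:2379 backup.db
```

Dropping the ETCDCTL=echo prefix runs the real command. In this example the leader is localhost:2379, so that is the endpoint to pass.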

Step 3: stop one existing etcd server

Before stopping the server, check if it is the leader.

etcdctl --endpoints=localhost:2379,localhost:22379,localhost:32379 endpoint status -w=table
<<COMMENT
+-----------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|    ENDPOINT     |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+-----------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|  localhost:2379 | 8211f1d0f64f3269 |  3.5.13 |   20 kB |      true |      false |         2 |          9 |                  9 |        |
| localhost:22379 | 91bc3c398fb3c146 |  3.5.13 |   20 kB |     false |      false |         2 |          9 |                  9 |        |
| localhost:32379 | fd422379fda50e48 |  3.5.13 |   20 kB |     false |      false |         2 |          9 |                  9 |        |
+-----------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
COMMENT

If the server to be stopped is the leader, you can avoid some downtime by using move-leader to transfer leadership to another server before stopping this one.

etcdctl --endpoints=localhost:2379,localhost:22379,localhost:32379 move-leader 91bc3c398fb3c146

etcdctl --endpoints=localhost:2379,localhost:22379,localhost:32379 endpoint status -w=table
<<COMMENT
+-----------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|    ENDPOINT     |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+-----------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|  localhost:2379 | 8211f1d0f64f3269 |  3.5.13 |   20 kB |     false |      false |         3 |         11 |                 11 |        |
| localhost:22379 | 91bc3c398fb3c146 |  3.5.13 |   20 kB |      true |      false |         3 |         11 |                 11 |        |
| localhost:32379 | fd422379fda50e48 |  3.5.13 |   20 kB |     false |      false |         3 |         11 |                 11 |        |
+-----------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
COMMENT
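The leader check and the choice of a transfer target can be scripted from the table output. The following sketch assumes the column order shown above (ENDPOINT, ID, VERSION, DB SIZE, IS LEADER) and simply picks the first other member as the target; `pick_transfer_target` is a name invented for this example.

```shell
# pick_transfer_target ENDPOINT: read `endpoint status -w=table` output on
# stdin. If ENDPOINT is the leader, print the ID of the first other member;
# otherwise print nothing. Assumes the column order shown above.
pick_transfer_target() {
  awk -F'|' -v stop="$1" '
    NF > 6 && $2 !~ /ENDPOINT/ {
      ep = $2; id = $3; leader = $6
      gsub(/ /, "", ep); gsub(/ /, "", id); gsub(/ /, "", leader)
      if (ep == stop && leader == "true") need = 1
      if (ep != stop && target == "") target = id
    }
    END { if (need) print target }
  '
}

# Example with the first status table from this step (leader is localhost:2379):
pick_transfer_target localhost:2379 <<'EOF'
|    ENDPOINT     |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
|  localhost:2379 | 8211f1d0f64f3269 |  3.5.13 |   20 kB |      true |      false |         2 |          9 |                  9 |        |
| localhost:22379 | 91bc3c398fb3c146 |  3.5.13 |   20 kB |     false |      false |         2 |          9 |                  9 |        |
| localhost:32379 | fd422379fda50e48 |  3.5.13 |   20 kB |     false |      false |         2 |          9 |                  9 |        |
EOF
# prints 91bc3c398fb3c146
```

A non-empty result can then be passed to etcdctl move-leader, as in the command shown in this step.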

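When each etcd process is stopped, expected errors will be logged by other cluster members. This is normal, since a cluster member connection has been (temporarily) broken.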

{"level":"info","ts":"2024-05-14T20:25:47.051124Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"91bc3c398fb3c146 became leader at term 3"}
{"level":"info","ts":"2024-05-14T20:25:47.051139Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"raft.node: 91bc3c398fb3c146 elected leader 91bc3c398fb3c146 at term 3"}

^C{"level":"warn","ts":"2024-05-14T20:27:09.094119Z","caller":"rafthttp/stream.go:421","msg":"lost TCP streaming connection with remote peer","stream-reader-type":"stream MsgApp v2","local-member-id":"91bc3c398fb3c146","remote-peer-id":"8211f1d0f64f3269","error":"EOF"}
{"level":"warn","ts":"2024-05-14T20:27:09.09427Z","caller":"rafthttp/stream.go:421","msg":"lost TCP streaming connection with remote peer","stream-reader-type":"stream Message","local-member-id":"91bc3c398fb3c146","remote-peer-id":"8211f1d0f64f3269","error":"EOF"}
{"level":"warn","ts":"2024-05-14T20:27:09.095535Z","caller":"rafthttp/peer_status.go:66","msg":"peer became inactive (message send to peer failed)","peer-id":"8211f1d0f64f3269","error":"failed to dial 8211f1d0f64f3269 on stream MsgApp v2 (peer 8211f1d0f64f3269 failed to find local node 91bc3c398fb3c146)"}
{"level":"warn","ts":"2024-05-14T20:27:09.43915Z","caller":"rafthttp/stream.go:223","msg":"lost TCP streaming connection with remote peer","stream-writer-type":"stream Message","local-member-id":"91bc3c398fb3c146","remote-peer-id":"8211f1d0f64f3269"}
{"level":"warn","ts":"2024-05-14T20:27:11.085646Z","caller":"etcdserver/cluster_util.go:294","msg":"failed to reach the peer URL","address":"http://127.0.0.1:12380/version","remote-member-id":"8211f1d0f64f3269","error":"Get \"http://127.0.0.1:12380/version\": dial tcp 127.0.0.1:12380: connect: connection refused"}
{"level":"warn","ts":"2024-05-14T20:27:11.085718Z","caller":"etcdserver/cluster_util.go:158","msg":"failed to get version","remote-member-id":"8211f1d0f64f3269","error":"Get \"http://127.0.0.1:12380/version\": dial tcp 127.0.0.1:12380: connect: connection refused"}
{"level":"warn","ts":"2024-05-14T20:27:13.557385Z","caller":"rafthttp/probing_status.go:68","msg":"prober detected unhealthy status","round-tripper-name":"ROUND_TRIPPER_SNAPSHOT","remote-peer-id":"8211f1d0f64f3269","rtt":"416.079µs","error":"dial tcp 127.0.0.1:12380: connect: connection refused"}

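Step 4: restart the etcd server with the same configuration plus the `--next-cluster-version-compatible` flag

Restart the etcd server with the same configuration but with the new etcd binary, and add the --next-cluster-version-compatible flag.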

-etcd-3.5/bin/etcd --name s1 \
+etcd-3.4/bin/etcd --name s1 \
  --data-dir /tmp/etcd/s1 \
  --listen-client-urls http://localhost:2379 \
  --advertise-client-urls http://localhost:2379 \
  --listen-peer-urls http://localhost:2380 \
  --initial-advertise-peer-urls http://localhost:2380 \
  --initial-cluster s1=http://localhost:2380,s2=http://localhost:22380,s3=http://localhost:32380 \
  --initial-cluster-token tkn \
-  --initial-cluster-state existing
+  --initial-cluster-state existing \
+  --next-cluster-version-compatible

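The new 3.4 etcd will publish its information to the cluster. At this point, the cluster starts to operate with the 3.4 protocol, which is the lowest common version.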

{"level":"info","ts":"2024-05-13T21:05:43.981445Z","caller":"membership/cluster.go:561","msg":"set initial cluster version","cluster-id":"ef37ad9dc622a7c4","local-member-id":"8211f1d0f64f3269","cluster-version":"3.0"}
{"level":"info","ts":"2024-05-13T21:05:43.982188Z","caller":"api/capability.go:77","msg":"enabled capabilities for version","cluster-version":"3.0"}
{"level":"info","ts":"2024-05-13T21:05:43.982312Z","caller":"membership/cluster.go:549","msg":"updated cluster version","cluster-id":"ef37ad9dc622a7c4","local-member-id":"8211f1d0f64f3269","from":"3.0","to":"3.5"}
{"level":"info","ts":"2024-05-13T21:05:43.982376Z","caller":"api/capability.go:77","msg":"enabled capabilities for version","cluster-version":"3.5"}
{"level":"info","ts":"2024-05-13T21:05:44.000672Z","caller":"etcdserver/server.go:2152","msg":"published local member to cluster through raft","local-member-id":"8211f1d0f64f3269","local-member-attributes":"{Name:infra1 ClientURLs:[http://127.0.0.1:2379]}","request-path":"/0/members/8211f1d0f64f3269/attributes","cluster-id":"ef37ad9dc622a7c4","publish-timeout":"7s"}
{"level":"info","ts":"2024-05-13T21:05:46.452631Z","caller":"membership/cluster.go:549","msg":"updated cluster version","cluster-id":"ef37ad9dc622a7c4","local-member-id":"8211f1d0f64f3269","from":"3.5","to":"3.4"}

Verify that each member, and then the entire cluster, becomes healthy with the new 3.4 etcd binary:

etcdctl endpoint health --endpoints=localhost:2379,localhost:22379,localhost:32379
<<COMMENT
localhost:32379 is healthy: successfully committed proposal: took = 2.337471ms
localhost:22379 is healthy: successfully committed proposal: took = 1.130717ms
localhost:2379 is healthy: successfully committed proposal: took = 2.124843ms
COMMENT
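When scripting the rollout, the health output can be reduced to a single pass/fail result before moving on to the next member. This is a minimal sketch that counts lines containing "is healthy", as in the sample output above; `all_healthy` is a hypothetical helper, and the match relies on that exact wording.

```shell
# all_healthy N: read `etcdctl endpoint health` output on stdin and succeed
# only if exactly N lines contain "is healthy" (hypothetical helper).
all_healthy() {
  expected=$1
  count=$(grep -c 'is healthy' || true)   # grep exits 1 when the count is 0
  [ "$count" -eq "$expected" ]
}

# Example with the sample output from this step:
printf '%s\n' \
  'localhost:32379 is healthy: successfully committed proposal: took = 2.337471ms' \
  'localhost:22379 is healthy: successfully committed proposal: took = 1.130717ms' \
  'localhost:2379 is healthy: successfully committed proposal: took = 2.124843ms' |
  all_healthy 3 && echo "cluster healthy, safe to continue"
```

Against a live cluster, the function could be fed from etcdctl endpoint health with 2>&1 appended, in case a given etcdctl version writes these lines to stderr.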

Members that have not been downgraded yet will log messages like the following:

{"level":"info","ts":"2024-05-13T21:05:46.450764Z","caller":"etcdserver/server.go:2633","msg":"updating cluster version using v2 API","from":"3.5","to":"3.4"}
{"level":"info","ts":"2024-05-13T21:05:46.452419Z","caller":"membership/cluster.go:576","msg":"updated cluster version","cluster-id":"ef37ad9dc622a7c4","local-member-id":"91bc3c398fb3c146","from":"3.5","to":"3.4"}
{"level":"info","ts":"2024-05-13T21:05:46.452547Z","caller":"etcdserver/server.go:2652","msg":"cluster version is updated","cluster-version":"3.4"}

Step 5: repeat step 3 and step 4 for the rest of the members

When all members are downgraded, check the health and version of the cluster:

etcdctl endpoint health --endpoints=localhost:2379,localhost:22379,localhost:32379
<<COMMENT
localhost:2379 is healthy: successfully committed proposal: took = 492.834µs
localhost:22379 is healthy: successfully committed proposal: took = 1.015025ms
localhost:32379 is healthy: successfully committed proposal: took = 1.853077ms
COMMENT

curl http://localhost:2379/version
<<COMMENT
{"etcdserver":"3.4.32","etcdcluster":"3.4.0"}
COMMENT

curl http://localhost:22379/version
<<COMMENT
{"etcdserver":"3.4.32","etcdcluster":"3.4.0"}
COMMENT

curl http://localhost:32379/version
<<COMMENT
{"etcdserver":"3.4.32","etcdcluster":"3.4.0"}
COMMENT


Last modified June 3, 2025: Copy v3.6 content to v3.7 recursively (a90b2a6)