repmgr ended up with two primaries and diverged timelines, and I deleted the node on the newest timeline

news 2025/12/19 9:22:11

The problem I ran into:

2023-08-17 20:24:21.566 CST [1556001] LOG: database system was interrupted; last known up at 2023-08-17 20:21:41 CST
2023-08-17 20:24:21.770 CST [1556001] LOG: restored log file "00000009.history" from archive
cp: cannot stat '/home/postgres/pgarch/0000000A.history': No such file or directory
2023-08-17 20:24:21.771 CST [1556001] LOG: entering standby mode
2023-08-17 20:24:21.772 CST [1556001] LOG: restored log file "00000009.history" from archive
cp: cannot stat '/home/postgres/pgarch/000000090000010200000066': No such file or directory
2023-08-17 20:24:21.784 CST [1556001] LOG: restored log file "000000080000010200000066" from archive
2023-08-17 20:24:21.851 CST [1556001] FATAL: requested timeline 9 is not a child of this server's history
2023-08-17 20:24:21.851 CST [1556001] DETAIL: Latest checkpoint is at 102/66000060 on timeline 8, but in the history of the requested timeline, the server forked off from that timeline at 102/580000A0.
2023-08-17 20:24:21.851 CST [1555991] LOG: startup process (PID 1556001) exited with exit code 1
2023-08-17 20:24:21.851 CST [1555991] LOG: aborting startup due to startup process failure
2023-08-17 20:24:21.851 CST [1555991] LOG: database system is shut down
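The FATAL line compares two WAL positions: the latest checkpoint (102/66000060 on timeline 8) lies past the point where timeline 9 forked off (102/580000A0), so timeline 9 cannot be a continuation of this server's history. Below is a minimal sketch of that comparison. It is a simplified reading of the check, not PostgreSQL's actual code, and the function names lsn_to_int and can_follow_child are made up for illustration.

```shell
#!/bin/sh
# Convert an LSN such as 102/66000060 into a single integer.
# The part before "/" is the high 32 bits, the part after is the low 32 bits (both hex).
lsn_to_int() {
    hi=${1%/*}
    lo=${1#*/}
    echo $(( (0x$hi << 32) + 0x$lo ))
}

# Hypothetical helper: succeeds when a checkpoint at LSN $1 on the parent
# timeline lies before the fork point $2 of the requested child timeline.
can_follow_child() {
    [ "$(lsn_to_int "$1")" -lt "$(lsn_to_int "$2")" ]
}

# Example with the values from the log above (prints the second message):
#   if can_follow_child 102/66000060 102/580000A0; then
#       echo "checkpoint is before the fork point"
#   else
#       echo "checkpoint is past the fork point"
#   fi
```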

The cause of the above was that the repmgr cluster had ended up with two primaries.

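On host db206 I set shared_preload_libraries = 'pg_stat_statements' and tried to restart, but the server failed to start. I had not prepared the pg_stat_statements extension beforehand; judging by the error below, the library file itself could not be found on this host.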

[postgres@db206 data]$ vi postgresql.conf
[postgres@db206 data]$ pg_ctl restart
waiting for server to shut down...... done
server stopped
waiting for server to start....2023-08-17 18:11:53.086 CST [6497] FATAL: could not access file "pg_stat_statements": No such file or directory
2023-08-17 18:11:53.086 CST [6497] LOG: database system is shut down
stopped waiting
pg_ctl: could not start server
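A failed restart like this can be avoided by checking that the library file exists before editing shared_preload_libraries. The sketch below is only an illustration: the function name missing_preload_libs is made up, and it assumes a Linux build where loadable modules carry a .so suffix.

```shell
#!/bin/sh
# Hypothetical helper: print every library from a comma-separated
# shared_preload_libraries value ($1) that has no .so file in directory $2.
# Prints nothing when all libraries are present.
missing_preload_libs() {
    echo "$1" | tr ',' '\n' | while read -r lib; do
        [ -n "$lib" ] || continue            # skip empty entries
        [ -f "$2/$lib.so" ] || echo "$lib"   # report libraries that are not installed
    done
}

# Possible usage before a restart (pg_config --pkglibdir prints the directory
# that holds the server's loadable modules):
#   missing_preload_libs 'pg_stat_statements' "$(pg_config --pkglibdir)"
```

If the function prints a name, the contrib module needs to be installed on that host first; only after the server starts with the library loaded does CREATE EXTENSION come into play.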

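I then edited postgresql.conf again and removed shared_preload_libraries = 'pg_stat_statements'. This time the database started. I tried to create the extension, but by then the standby had already taken over as primary.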

It then occurred to me to change shared_preload_libraries = 'pg_stat_statements' on db223 first (that is, add it on the standby first).

[postgres@db223 ~]$ vi pg14/data/postgresql.conf

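At this point I found that the cluster had two primaries (I still do not know why), and the timelines differed: the new primary was on timeline 9, the old one on timeline 8.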

[postgres@db206 data]$ repmgr -f ~/repmgr/repmgr.conf cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+-------+---------+----------------------+----------+----------+----------+----------+------------------------------------------------------------------------
1 | db223 | standby | ! running as primary | | default | 100 | 9 | host=db223 dbname=repmgr user=repmgr password=repmgr connect_timeout=2
2 | db206 | primary | * running | | default | 100 | 8 | host=db206 dbname=repmgr user=repmgr password=repmgr connect_timeout=2

WARNING: following issues were detected
- node "db223" (ID: 1) is registered as standby but running as primary
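Independently of the repmgr metadata, each server can be asked directly whether it is in recovery: pg_is_in_recovery() returns f on a primary and t on a standby. The sketch below counts how many nodes answer f. The function name count_primaries and the "host answer" input format are assumptions made for this example.

```shell
#!/bin/sh
# Hypothetical helper: read lines of the form "<host> <t|f>" on stdin, where the
# second field is the result of SELECT pg_is_in_recovery() on that host, and
# print how many hosts are running as primary (answer "f").
count_primaries() {
    awk '$2 == "f" { n++ } END { print n + 0 }'
}

# Possible usage with the two hosts from this post:
#   for h in db206 db223; do
#       printf '%s %s\n' "$h" \
#           "$(psql -h "$h" -U repmgr -d repmgr -Atc 'SELECT pg_is_in_recovery()')"
#   done | count_primaries
```

Any result greater than 1 means more than one node accepts writes, which is the situation shown in the cluster show output above.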

I tried to unregister the standby node from db206, but the command could not connect: the database has to be started first.

[postgres@db206 data]$ repmgr -f /home/postgres/repmgr/repmgr.conf standby unregister
INFO: connecting to local standby
ERROR: connection to database failed
DETAIL:
connection to server at "db206" (172.20.101.206), port 5432 failed: Connection refused
Is the server running on that host and accepting TCP/IP connections?

DETAIL: attempted to connect using:
user=repmgr password=repmgr connect_timeout=2 dbname=repmgr host=db206 fallback_application_name=repmgr options=-csearch_path=

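After starting the database I ran the command again, and this time it complained that the node is not a standby (it is currently a primary).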

[postgres@db206 data]$ repmgr -f /home/postgres/repmgr/repmgr.conf standby unregister
INFO: connecting to local standby
INFO: connecting to primary database
ERROR: node 2 is not a standby server

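I then tried to unregister the primary node, and was told that node db223 still has this node as its upstream. The hint says to use "repmgr standby follow" to make sure those nodes follow the current primary.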

[postgres@db206 data]$ repmgr -f /home/postgres/repmgr/repmgr.conf primary unregister
ERROR: 1 other node still has this node as its upstream node
HINT: ensure these nodes are following the current primary with "repmgr standby follow"
DETAIL: the affected node(s) are:
db223 (ID: 1)

Next I tried to rejoin db223 to the cluster, only to find that the command cannot be run on a node that is still running.

[postgres@db223 ~]$ repmgr -f ~/repmgr/repmgr.conf node rejoin -d 'host=db206 port=5432 user=repmgr dbname=repmgr password=repmgr'
ERROR: database is still running in state "in production"
HINT: "repmgr node rejoin" cannot be executed on a running node

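After stopping the database I ran it again. This time there was no ERROR, but repmgr only reported that pg_rewind is required and did not actually rejoin the node.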

[postgres@db223 ~]$ repmgr -f ~/repmgr/repmgr.conf node rejoin -d 'host=db206 port=5432 user=repmgr dbname=repmgr password=repmgr' -F
NOTICE: rejoin target is node "db206" (ID: 2)
NOTICE: pg_rewind execution required for this node to attach to rejoin target node 2
HINT: provide --force-rewind

I restarted db223 and it still came up as a primary, which was very frustrating.

pg_ctl start

[postgres@db223 ~]$ repmgr -f ~/repmgr/repmgr.conf cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+-------+---------+-----------+----------+----------+----------+----------+------------------------------------------------------------------------
1 | db223 | primary | * running | | default | 100 | 9 | host=db223 dbname=repmgr user=repmgr password=repmgr connect_timeout=2
2 | db206 | primary | ! running | | default | 100 | 8 | host=db206 dbname=repmgr user=repmgr password=repmgr connect_timeout=2

WARNING: following issues were detected
- node "db206" (ID: 2) is running but the repmgr node record is inactive

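Would adding pg_rewind fix it? No: it failed because it could not read a WAL file from timeline 9. At the time I did not understand why it wanted timeline 9 and guessed the node was still joining as a primary. A more likely explanation is that pg_rewind scans the local WAL of the node being rewound (db223, which is on timeline 9) back to the point of divergence, and that segment was no longer in pg_wal.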

[postgres@db223 ~]$ repmgr -f ~/repmgr/repmgr.conf node rejoin -d 'host=db206 port=5432 user=repmgr dbname=repmgr password=repmgr' --force-rewind
NOTICE: rejoin target is node "db206" (ID: 2)
NOTICE: executing pg_rewind
DETAIL: pg_rewind command is "/home/postgres/pg14/bin/pg_rewind -D '/home/postgres/pg14/data' --source-server='host=db206 dbname=repmgr user=repmgr password=repmgr connect_timeout=2'"
ERROR: pg_rewind execution failed
DETAIL: pg_rewind: servers diverged at WAL location 102/580000A0 on timeline 8
pg_rewind: error: could not open file "/home/postgres/pg14/data/pg_wal/000000090000010200000058": No such file or directory
pg_rewind: fatal: could not find previous WAL record at 102/580000A0
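The WAL file name in the error encodes which timeline and position pg_rewind was looking for: 8 hex digits of timeline ID, 8 of the high LSN part, and 8 of the segment number. A minimal sketch for decoding such names follows. The function name decode_wal_name is made up, and the start LSN assumes the default 16 MB WAL segment size.

```shell
#!/bin/sh
# Hypothetical helper: split a 24-character WAL segment file name into its
# timeline ID and the LSN at which the segment starts.
# Assumes the default wal_segment_size of 16 MB.
decode_wal_name() {
    tli=$(printf '%s' "$1" | cut -c1-8)     # timeline ID (hex)
    log=$(printf '%s' "$1" | cut -c9-16)    # high 32 bits of the LSN (hex)
    seg=$(printf '%s' "$1" | cut -c17-24)   # segment number within that range (hex)
    printf 'timeline=%d start_lsn=%X/%08X\n' \
        "$((0x$tli))" "$((0x$log))" "$((0x$seg * 16 * 1024 * 1024))"
}

# decode_wal_name 000000090000010200000058
#   prints: timeline=9 start_lsn=102/58000000
```

So the missing file is the timeline 9 segment that contains the divergence point 102/580000A0.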

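The last resort was to wipe the data directory and re-clone. What I deleted here was the node on timeline 9 (db223). The clone completed, but pg_ctl start could not bring the server up.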

[postgres@db223 data]$ rm -rf *
[postgres@db223 data]$ ll
total 0
[postgres@db223 data]$ repmgr -h db206 -U repmgr -d repmgr -f /home/postgres/repmgr/repmgr.conf standby clone
NOTICE: destination directory "/home/postgres/pg14/data" provided
INFO: connecting to source node
DETAIL: connection string is: host=db206 user=repmgr dbname=repmgr
DETAIL: current installation size is 12 GB
INFO: replication slot usage not requested; no replication slot will be set up for this standby
NOTICE: checking for available walsenders on the source node (2 required)
NOTICE: checking replication connections can be made to the source server (2 required)
INFO: checking and correcting permissions on existing directory "/home/postgres/pg14/data"
NOTICE: starting backup (using pg_basebackup)...
HINT: this may take some time; consider using the -c/--fast-checkpoint option
INFO: executing:
/home/postgres/pg14/bin/pg_basebackup -l "repmgr base backup" -D /home/postgres/pg14/data -h db206 -p 5432 -U repmgr -X stream
NOTICE: standby clone (using pg_basebackup) complete
NOTICE: you can now start your PostgreSQL server
HINT: for example: pg_ctl -D /home/postgres/pg14/data start
HINT: after starting the server, you need to re-register this standby with "repmgr standby register --force" to update the existing node record
[postgres@db223 data]$ ^C
[postgres@db223 data]$ pg_ctl start
waiting for server to start....2023-08-17 19:48:33.265 CST [1532642] LOG: redirecting log output to logging collector process
2023-08-17 19:48:33.265 CST [1532642] HINT: Future log output will appear in directory "log".
stopped waiting
pg_ctl: could not start server

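The server log is the one shown at the top of this post: it still wants timeline 9, but the primary db206 is on timeline 8 and has no timeline 9. Frustrating again...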

2023-08-17 20:24:21.566 CST [1556001] LOG: database system was interrupted; last known up at 2023-08-17 20:21:41 CST
2023-08-17 20:24:21.770 CST [1556001] LOG: restored log file "00000009.history" from archive
cp: cannot stat '/home/postgres/pgarch/0000000A.history': No such file or directory
2023-08-17 20:24:21.771 CST [1556001] LOG: entering standby mode
2023-08-17 20:24:21.772 CST [1556001] LOG: restored log file "00000009.history" from archive
cp: cannot stat '/home/postgres/pgarch/000000090000010200000066': No such file or directory
2023-08-17 20:24:21.784 CST [1556001] LOG: restored log file "000000080000010200000066" from archive
2023-08-17 20:24:21.851 CST [1556001] FATAL: requested timeline 9 is not a child of this server's history
2023-08-17 20:24:21.851 CST [1556001] DETAIL: Latest checkpoint is at 102/66000060 on timeline 8, but in the history of the requested timeline, the server forked off from that timeline at 102/580000A0.
2023-08-17 20:24:21.851 CST [1555991] LOG: startup process (PID 1556001) exited with exit code 1
2023-08-17 20:24:21.851 CST [1555991] LOG: aborting startup due to startup process failure
2023-08-17 20:24:21.851 CST [1555991] LOG: database system is shut down

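I then looked at db223's parameters, wondering whether the archive path being read was wrong, and noticed the timeline-related recovery parameter recovery_target_timeline.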

archive_mode = on

archive_command = 'scp %p postgres@172.20.101.208:/home/postgres/pgarch/%f'

archive_cleanup_command = 'pg_archivecleanup /home/postgres/pgarch %r'

restore_command = 'cp /home/postgres/pgarch/%f %p'

recovery_target_timeline = 'latest'

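After changing it to recovery_target_timeline = 'current', db223 started normally. With 'latest', recovery was targeting timeline 9, whose history file (00000009.history) was still in the archive, whereas the fresh clone from db206 is on timeline 8.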

[postgres@db206 ~]$ repmgr -f ~/repmgr/repmgr.conf cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+-------+---------+-----------+----------+----------+----------+----------+------------------------------------------------------------------------
1 | db223 | standby | running | db206 | default | 100 | 8 | host=db223 dbname=repmgr user=repmgr password=repmgr connect_timeout=2
2 | db206 | primary | * running | | default | 100 | 8 | host=db206 dbname=repmgr user=repmgr password=repmgr connect_timeout=2
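To see in advance which timeline recovery_target_timeline = 'latest' is likely to pick up from the archive, the timeline history files there can be listed. The sketch below prints the highest timeline ID that has a history file in a given directory. The function name latest_timeline_in_archive is made up, and it only looks at the archive directory, not at pg_wal.

```shell
#!/bin/sh
# Hypothetical helper: print the highest timeline ID for which directory $1
# contains a history file (for example 00000009.history), or 0 if there is none.
latest_timeline_in_archive() {
    max=0
    for f in "$1"/*.history; do
        [ -e "$f" ] || continue              # the glob matched nothing
        stem=${f##*/}
        stem=${stem%.history}
        case $stem in
            *[!0-9A-Fa-f]*) continue ;;      # not a plain hex timeline ID
        esac
        tli=$((0x$stem))
        if [ "$tli" -gt "$max" ]; then
            max=$tli
        fi
    done
    echo "$max"
}

# Possible usage with the archive path from this post:
#   latest_timeline_in_archive /home/postgres/pgarch
```

If this prints a timeline newer than the one the primary is on, a freshly cloned standby configured with 'latest' may try to follow that stale timeline, as happened here.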

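Summary:

1. I still do not know why the cluster ended up with two primaries; I need to reproduce it.

2. I am considering adding a witness node (not sure whether it can prevent a dual-primary situation); to be investigated.

3. I know what recovery_target_timeline did here but not why; I will study it when I have time.

4. After setting recovery_target_timeline to 'current' on the standby, does it need to be changed back to 'latest'? (Personally I think not.)

5. I skimmed the blog post below, where the same kind of problem seems to have been resolved quite smoothly?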

repmgr 集群双主问题处理 [Handling a dual-primary problem in a repmgr cluster], 瀚高PG实验室的博客, CSDN博客
