TypechoJoeTheme

至尊技术网

登录
用户名
密码

centospostgresql故障排查技巧

2025-12-07
/
0 评论
/
60 阅读
/
正在检测是否收录...
12/07

正文:
凌晨2点的告警短信像催命符般震动手机——生产数据库突然拒绝连接。顶着困意打开终端,一场与PostgreSQL的搏斗就此展开。


▌ 阶段一:连接风暴的真相
psql -h 127.0.0.1 -U postgres反复返回Connection refused时,第一反应不是直接重启服务,而是先检查三个关键点:
bash

1. 端口监听状态

ss -tuln | grep 5432

2. 防火墙闪电排查

firewall-cmd --list-ports | grep 5432

3. 服务幽灵进程检测

ps -ef | grep postgres | grep -v grep

发现端口处于LISTEN状态但防火墙未放行,原来同事下午调整安全策略时漏掉了数据库端口。临时解决:
bash firewall-cmd --add-port=5432/tcp --permanent firewall-cmd --reload


▌ 阶段二:查询陷入泥潭
刚解决连接问题,监控面板突然显示CPU飙升至90%。用pg_top定位到某个账单生成进程长期占据资源:
sql SELECT pid, query_start, query FROM pg_stat_activity WHERE state = 'active' ORDER BY query_start;

发现一条每月执行的报表查询竟运行了2小时!进一步用EXPLAIN ANALYZE抓取执行计划,揭示出缺失的索引:
sql CREATE INDEX CONCURRENTLY idx_order_date ON orders(created_at);

避坑指南:在亿级数据表创建索引务必使用CONCURRENTLY,否则会触发表级锁阻塞业务。


▌ 阶段三:神秘的锁争夺战
凌晨4点,客服系统突然报出订单更新异常。通过锁监控视图精准定位死锁链条:
sql SELECT blocked.pid AS blocked_pid, blocking.pid AS blocking_pid, blocked.query AS blocked_query FROM pg_catalog.pg_locks blocked_locks JOIN pg_catalog.pg_stat_activity blocked ON blocked.pid = blocked_locks.pid JOIN pg_catalog.pg_locks blocking_locks ON blocking_locks.locktype = blocked_locks.locktype AND blocking_locks.DATABASE IS NOT DISTINCT FROM blocked_locks.DATABASE AND blocking_locks.relation IS NOT DISTINCT FROM blocked_locks.relation AND blocking_locks.page IS NOT DISTINCT FROM blocked_locks.page AND blocking_locks.tuple IS NOT DISTINCT FROM blocked_locks.tuple AND blocking_locks.virtualxid IS NOT DISTINCT FROM blocked_locks.virtualxid AND blocking_locks.transactionid IS NOT DISTINCT FROM blocked_locks.transactionid AND blocking_locks.classid IS NOT DISTINCT FROM blocked_locks.classid AND blocking_locks.objid IS NOT DISTINCT FROM blocked_locks.objid AND blocking_locks.objsubid IS NOT DISTINCT FROM blocked_locks.objsubid AND blocking_locks.pid != blocked_locks.pid JOIN pg_catalog.pg_stat_activity blocking ON blocking.pid = blocking_locks.pid WHERE NOT blocked_locks.granted;

强制终止阻塞进程后,立即修改应用程序逻辑:将长事务拆解为小批量提交,并增设锁等待超时参数:
postgresql SET lock_timeout = '2s';


▌ 深度调优:从救火到防火
天亮前最后一步——预防性检查:
1. 日志挖矿:放大/var/lib/pgsql/12/data/pg_log/postgresql-*.log中的WARNING级日志
2. 空间定时炸弹:设置监控告警规则
bash df -h /var/lib/pgsql # 存储空间 SELECT * FROM pg_stat_statements ORDER BY total_time DESC LIMIT 10; # 慢查询TOP10
3. 内存暗流:调整shared_buffers为物理内存的25%,并监控vmstat的si/so交换值


当晨光透过机房窗户时,数据库指标终于恢复平稳。这场战役教会我:真正的故障排查不是背命令,而是像侦探一样串联证据链——从网络层到SQL层,从系统指标到事务逻辑,每一步都需要用数据说话。留下最后一道锦囊:定期执行pg_prewarm预热常用数据,让危机化解在黎明之前。

CentOS7日志分析连接失败PostgreSQL12性能骤降锁冲突
朗读
赞(0)
版权属于:

至尊技术网

本文链接:

https://www.zzwws.cn/archives/40631/(转载时请注明本文出处及文章链接)

评论 (0)