找回密码
立即注册
搜索
热搜: Java Python Linux Go
发回帖 发新帖

2047

积分

0

好友

286

主题
发表于 2025-12-25 04:25:25 | 查看: 30| 回复: 0

云原生数据库 PolarDB(基于PostgreSQL引擎)的日常运维中,数据库管理员(DBA)需要借助一系列系统视图和统计函数来掌握集群运行状态。本文将梳理从数据库全局统计到复制延迟、存储空间、事务健康度等维度的核心监控SQL,助力高效运维。

数据库级别活动统计

此部分SQL用于监控数据库实例的整体活动情况,包括事务提交回滚、数据块读写、元数据操作等关键指标。

-- 查看数据库级别统计
SELECT datname, xact_commit, xact_rollback, blks_read, blks_hit,
       tup_returned, tup_fetched, tup_inserted, tup_updated, tup_deleted,
       deadlocks
FROM pg_stat_database
 WHERE datname NOT IN ('template0', 'template1');

-- 查看数据库冲突统计(适用于只读实例)
SELECT datname, confl_tablespace, confl_lock, confl_snapshot,
       confl_bufferpin, confl_deadlock
FROM pg_stat_database_conflicts;

复制与流复制状态监控

监控主从复制链路的状态与延迟是保障高可用性的关键。

-- 主库:查看流复制状态
SELECT pid, state, client_addr, sync_state,
       pg_wal_lsn_diff(pg_current_wal_lsn(), sent_lsn) as send_lag,
       pg_wal_lsn_diff(pg_current_wal_lsn(), write_lsn) as write_lag,
       pg_wal_lsn_diff(pg_current_wal_lsn(), flush_lsn) as flush_lag,
       pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) as replay_lag
FROM pg_stat_replication;

-- 从库:查看 WAL 接收状态
SELECT status, receive_start_lsn, receive_start_tli,
       pg_wal_lsn_diff(pg_last_wal_receive_lsn(), receive_start_lsn) as received_bytes,
       pg_wal_lsn_diff(pg_last_wal_replay_lsn(), receive_start_lsn) as replayed_bytes,
       pg_last_wal_receive_time, pg_last_wal_replay_time
FROM pg_stat_wal_receiver;

-- 计算复制延迟(字节)
SELECT pg_wal_lsn_diff(pg_current_wal_lsn(), pg_last_wal_replay_lsn()) AS replication_lag_bytes;

长时间运行操作进度报告

对于VACUUM、创建索引等可能长时间运行的操作,可以通过进度视图实时了解其完成情况。

-- VACUUM 进度
SELECT pid, datname, relname, phase, heap_blks_total, heap_blks_scanned,
       heap_blks_vacuumed, index_vacuum_count, max_dead_tuples,
       num_dead_tuples
FROM pg_stat_progress_vacuum;

-- CREATE INDEX/REINDEX 进度
SELECT pid, datname, relname, index_relname, phase,
       blocks_total, blocks_done, tuples_total, tuples_done,
       partitions_total, partitions_done
FROM pg_stat_progress_create_index;

-- ANALYZE 进度
SELECT pid, datname, relname, phase,
       sample_blks_total, sample_blks_scanned,
       ext_stats_total, ext_stats_computed
FROM pg_stat_progress_analyze;

系统级后台进程统计

监控检查点、WAL写入、归档等后台进程的活动,有助于评估系统底层性能。

-- 后台写入器统计
SELECT checkpoints_timed, checkpoints_req, checkpoint_write_time,
       checkpoint_sync_time, buffers_checkpoint, buffers_clean,
       maxwritten_clean, buffers_backend, buffers_backend_fsync,
       buffers_alloc
FROM pg_stat_bgwriter;

-- WAL 统计
SELECT wal_records, wal_fpi, wal_bytes, wal_buffers_full,
       wal_write_time, wal_sync_time, wal_write_lag,
       wal_sync_lag, wal_replay_lag
FROM pg_stat_wal;

-- 归档进程统计
SELECT archived_count, failed_count, last_archived_wal,
       last_archived_time, last_failed_wal, last_failed_time,
       stats_reset
FROM pg_stat_archiver;

存储空间使用监控

定期检查数据库、表、索引的大小增长趋势,是容量规划与空间清理的基础。

-- 查看各数据库大小
SELECT pg_database.datname,
       pg_size_pretty(pg_database_size(pg_database.datname)) AS size
FROM pg_database;

-- 查看表大小排序(Top 20)
SELECT schemaname, tablename,
       pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) AS total_size,
       pg_size_pretty(pg_relation_size(schemaname||'.'||tablename)) AS table_size,
       pg_size_pretty(pg_indexes_size(schemaname||'.'||tablename)) AS index_size
FROM pg_tables
 WHERE schemaname NOT IN ('information_schema', 'pg_catalog')
 ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC
 LIMIT 20;

-- WAL 日志大小检查
SELECT pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), '0/0')) AS current_wal_size;

事务年龄与长事务监控

监控事务ID年龄是预防“事务ID回卷”这一严重故障的核心手段,同时需要关注长事务对系统的影响。

-- 查看持有事务ID的长事务
SELECT pid, age(backend_xid) AS xid_age, age(backend_xmin) AS xmin_age,
       query, state, backend_start, xact_start, query_start
FROM pg_stat_activity
 WHERE backend_xid IS NOT NULL OR backend_xmin IS NOT NULL
 ORDER BY greatest(age(backend_xid), age(backend_xmin)) DESC;

-- 查看各数据库的事务年龄
SELECT datname, age(datfrozenxid) AS frozenxid_age,
       pg_size_pretty(pg_database_size(datname)) AS size
FROM pg_database
 ORDER BY age(datfrozenxid) DESC;

-- 检查已准备但未提交的事务(两阶段提交)
SELECT transaction, gid, owner, database, prepared
FROM pg_prepared_xacts
 WHERE age(transaction) > 1000000;

SLRU缓存与数据库缓存命中率

SLRU(简单LRU)缓存用于管理事务提交日志等结构,其命中率及数据库缓冲区的命中率是性能调优的重要参考。

-- SLRU 缓存统计
SELECT name, blks_zeroed, blks_hit, blks_read, blks_written,
       blks_exists, flushes, truncates, stats_reset
FROM pg_stat_slru;

-- 计算各数据库的缓存命中率
SELECT datname,
       blks_hit::float / (blks_hit + blks_read) * 100 AS cache_hit_ratio,
       blks_read, blks_hit
FROM pg_stat_database
 WHERE blks_hit + blks_read > 0
 ORDER BY cache_hit_ratio;

连接会话与等待事件分析

分析连接来源、会话状态及等待事件,是进行连接池优化和排查性能瓶颈的常用方法。

-- 查看空闲事务(可能持有锁)
SELECT pid, state, backend_start, query_start,
       age(clock_timestamp(), query_start) AS idle_duration,
       query
FROM pg_stat_activity
 WHERE state = 'idle in transaction'
 ORDER BY query_start;

关键运维注意事项

  • 事务年龄:务必定期检查,避免其接近20亿的事务ID回卷限制,必要时需执行VACUUM FREEZE
  • 复制延迟:监控时建议同时关注字节延迟(replication_lag_bytes)和时间延迟(replay_lag),以全面评估影响。
  • 进度报告pg_stat_progress_* 视图仅在对应操作执行期间有数据,操作结束即清空。
  • 缓存命中率:若数据库级缓存命中率持续低于95%,可考虑适当调大shared_buffers参数。
  • 操作监控:对于预计运行时间较长的VACUUMCREATE INDEX操作,可通过进度视图实时跟踪。
  • WAL管理:若WAL日志大小增长过快,需检查并调整wal_keep_size参数或归档任务的策略与效率。



上一篇:Spring Boot容器分层构建指南:基于layertools优化Docker镜像CI/CD流程
下一篇:AI代理崛起中的隐私与数据安全挑战及发展趋势
您需要登录后才可以回帖 登录 | 立即注册

手机版|小黑屋|网站地图|云栈社区 ( 苏ICP备2022046150号-2 )

GMT+8, 2026-1-11 18:12 , Processed in 0.241288 second(s), 39 queries , Gzip On.

Powered by Discuz! X3.5

© 2025-2025 云栈社区.

快速回复 返回顶部 返回列表