Concepts

Understand Pigsty’s core concepts, architecture design, and principles. Master high availability, backup recovery, security compliance, and other key capabilities.

Pigsty is a portable, extensible open-source PostgreSQL distribution for building production-grade database services in local environments with declarative configuration and automation. It has a vast ecosystem providing a complete set of tools, scripts, and best practices to bring PostgreSQL to enterprise-grade RDS service levels.

Pigsty’s name comes from PostgreSQL In Great STYle, also understood as Postgres, Infras, Graphics, Service, Toolbox, it’s all Yours—a self-hosted PostgreSQL solution with graphical monitoring that’s all yours. You can find the source code on GitHub, visit the official documentation for more information, or experience the Web UI in the online demo.


Why Pigsty? What Can It Do?

PostgreSQL is an excellent database kernel, but it needs more tools and systems around it to become a truly excellent database service. In production environments, you need to manage every aspect of your database: high availability, backup recovery, monitoring alerts, access control, parameter tuning, extension installation, connection pooling, load balancing…

Wouldn’t it be easier if all this complex operational work could be automated? This is precisely why Pigsty was created.

Pigsty provides:

  • Out-of-the-Box PostgreSQL Distribution

    Pigsty deeply integrates 440+ extensions from the PostgreSQL ecosystem, providing out-of-the-box distributed, time-series, geospatial, graph, vector, search, and other multi-modal database capabilities. It covers the path from kernel to RDS distribution and delivers production-grade database services for PostgreSQL versions 13-18 on EL/Debian/Ubuntu.

  • Self-Healing High Availability Architecture

    A high availability architecture built on Patroni, Etcd, and HAProxy enables automatic failover for hardware failures with seamless traffic handoff. Primary failure recovery time RTO < 30s, data recovery point RPO ≈ 0. You can perform rolling maintenance and upgrades on the entire cluster without application coordination.

  • Complete Point-in-Time Recovery Capability

    Built on pgBackRest and an optional MinIO cluster, Pigsty provides out-of-the-box point-in-time recovery (PITR). You can quickly return to any point in time, which protects against software defects and accidental data deletion.

  • Flexible Service Access and Traffic Management

    HAProxy, Pgbouncer, and VIP provide flexible service access patterns for read-write separation, connection pooling, and automatic routing. The result is a stable, reliable, auto-routing, transaction-pooled, high-performance database service.

  • Stunning Observability

    A modern observability stack based on Prometheus and Grafana provides unparalleled monitoring best practices. Over three thousand types of monitoring metrics describe every aspect of the system, from global dashboards to CRUD operations on individual objects.

  • Declarative Configuration Management

    Following the Infrastructure as Code philosophy, Pigsty uses declarative configuration to describe the entire environment. You just tell Pigsty “what kind of database cluster you want” without worrying about how to implement it; the system automatically adjusts to the desired state.

  • Modular Architecture Design

    A modular architecture design that can be freely combined to suit different scenarios. Beyond the core PostgreSQL module, it also provides optional modules for Redis, MinIO, Etcd, FerretDB, and support for various PG-compatible kernels.

  • Solid Security Best Practices

    Pigsty applies security best practices by default: encryption with certificates issued by a self-signed CA, AES-encrypted backups, scram-sha-256 password encryption, an out-of-the-box ACL model, and HBA rule sets that follow the principle of least privilege.

  • Simple and Easy Deployment

    All dependencies are pre-packaged for one-click installation in environments without internet access. Local sandbox environments can run on micro VMs with 1 core and 2GB RAM, providing functionality identical to production environments. Pigsty also provides Vagrant-based local sandboxes and Terraform-based cloud deployments.


What Pigsty Is Not

Pigsty is not a traditional, all-encompassing PaaS (Platform as a Service) system.

  • Pigsty doesn’t provide basic hardware resources. It runs on nodes you provide, whether bare metal, VMs, or cloud instances, but it doesn’t create or manage these resources itself (though it provides Terraform templates to simplify cloud resource preparation).

  • Pigsty is not a container orchestration system. It runs directly on the operating system, not requiring Kubernetes or Docker as infrastructure. Of course, it can coexist with these systems and provides a Docker module for running stateless applications.

  • Pigsty is not a general database management tool. It focuses on PostgreSQL and its ecosystem. While it also supports peripheral components like Redis, Etcd, and MinIO, the core is always built around PostgreSQL.

  • Pigsty won’t lock you in. It’s built on open-source components, doesn’t modify the PostgreSQL kernel, and introduces no proprietary protocols. You can continue using your well-managed PostgreSQL clusters anytime without Pigsty.

Pigsty doesn’t restrict how you should or shouldn’t build your database services. For example:

  • Pigsty provides good parameter defaults and configuration templates, but you can override any parameter.
  • Pigsty provides a declarative API, but you can still use underlying tools (Ansible, Patroni, pgBackRest, etc.) for manual management.
  • Pigsty can manage the complete lifecycle, or you can use only its monitoring system to observe existing database instances or RDS.

Pigsty provides a different level of abstraction than the hardware layer—it works at the database service layer, focusing on how to deliver PostgreSQL at its best, rather than reinventing the wheel.


Evolution of PostgreSQL Deployment

To understand Pigsty’s value, let’s review the evolution of PostgreSQL deployment approaches.

Manual Deployment Era

In traditional deployment, DBAs needed to manually install and configure PostgreSQL, manually set up replication, manually configure monitoring, and manually handle failures. The problems with this approach are obvious:

  • Low efficiency: Each instance requires repeating many manual operations, prone to errors.
  • Lack of standardization: Databases configured by different DBAs can vary greatly, making maintenance difficult.
  • Poor reliability: Failure handling depends on manual intervention, with long recovery times and susceptibility to human error.
  • Weak observability: Lack of unified monitoring, making problem discovery and diagnosis difficult.

Managed Database Era

To solve these problems, cloud providers offer managed database services (RDS). Cloud RDS does solve some operational issues, but also brings new challenges:

  • High cost: Managed services typically charge several times to dozens of times the underlying hardware cost as “service fees.”
  • Vendor lock-in: Migration is difficult, tied to specific cloud platforms.
  • Limited functionality: Cannot use certain advanced features, extensions are restricted, parameter tuning is limited.
  • Data sovereignty: Data stored in the cloud, reducing autonomy and control.

Local RDS Era

Pigsty represents a third approach: building database services in local environments that match or exceed cloud RDS.

Pigsty combines the advantages of both approaches:

  • High automation: One-click deployment, automatic configuration, self-healing failures—as convenient as cloud RDS.
  • Complete autonomy: Runs on your own infrastructure, data completely in your own hands.
  • Extremely low cost: Run enterprise-grade database services at near-pure-hardware costs.
  • Complete functionality: Unlimited use of PostgreSQL’s full capabilities and ecosystem extensions.
  • Open architecture: Based on open-source components, no vendor lock-in, free to migrate anytime.

This approach is particularly suitable for:

  • Private and hybrid clouds: Enterprises needing to run databases in local environments.
  • Cost-sensitive users: Organizations looking to reduce database TCO.
  • High-security scenarios: Critical data requiring complete autonomy and control.
  • PostgreSQL power users: Scenarios requiring advanced features and rich extensions.
  • Development and testing: Quickly setting up databases locally that match production environments.

What’s Next

Now that you understand Pigsty’s basic concepts, you can:

1 - Architecture

Pigsty’s modular architecture — declarative composition, on-demand customization, flexible deployment.

Pigsty uses a modular architecture with a declarative interface. You can freely combine modules like building blocks as needed.


Modules

Pigsty uses a modular design with six main default modules: PGSQL, INFRA, NODE, ETCD, REDIS, and MINIO.

  • PGSQL: Self-healing HA Postgres clusters powered by Patroni, Pgbouncer, HAProxy, pgBackRest, and more.
  • INFRA: Local software repo, Nginx, Grafana, Victoria, AlertManager, Blackbox Exporter—the complete observability stack.
  • NODE: Tune nodes to desired state—hostname, timezone, NTP, ssh, sudo, haproxy, docker, vector, keepalived.
  • ETCD: Distributed key-value store as DCS for HA Postgres clusters: consensus leader election/config management/service discovery.
  • REDIS: Redis servers supporting standalone primary-replica, sentinel, and cluster modes with full monitoring.
  • MINIO: S3-compatible simple object storage that can serve as an optional backup destination for PG databases.

You can declaratively compose them freely. If you only want host monitoring, installing the INFRA module on infrastructure nodes and the NODE module on managed nodes is sufficient. The ETCD and PGSQL modules are used to build HA PG clusters—installing these modules on multiple nodes automatically forms a high-availability database cluster. You can reuse Pigsty infrastructure and develop your own modules; REDIS and MINIO can serve as examples. More modules will be added—preliminary support for Mongo and MySQL is already on the roadmap.

Note that all modules depend strongly on the NODE module: in Pigsty, nodes must first have the NODE module installed to be managed before deploying other modules. When nodes (by default) use the local software repo for installation, the NODE module has a weak dependency on the INFRA module. Therefore, the admin/infrastructure nodes with the INFRA module complete the bootstrap process in the deploy.yml playbook, resolving the circular dependency.
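The dependency rules above can be sketched as a small ordering check. This is an illustration only, not Pigsty's own code: the `STRONG_DEPS` map and the `install_order` helper are hypothetical names, and the PGSQL-on-ETCD edge is an assumption that reflects the HA use case described above (ETCD acting as DCS for HA Postgres clusters).

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map derived from the text: every module depends on NODE,
# and HA PGSQL clusters additionally rely on ETCD as their DCS (assumption).
STRONG_DEPS = {
    "NODE": set(),
    "INFRA": {"NODE"},
    "ETCD": {"NODE"},
    "MINIO": {"NODE"},
    "REDIS": {"NODE"},
    "PGSQL": {"NODE", "ETCD"},
}

def install_order(deps):
    """Return one valid installation order in which dependencies come first."""
    return list(TopologicalSorter(deps).static_order())

order = install_order(STRONG_DEPS)
print(order)
```

Only the relative order matters here: NODE always comes first, and ETCD always precedes PGSQL. The weak NODE-to-INFRA dependency is deliberately left out of the map, since including it would create the cycle that the bootstrap process is described as resolving.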

pigsty-sandbox


Standalone Installation

By default, Pigsty installs on a single node (physical/virtual machine). The deploy.yml playbook installs INFRA, ETCD, PGSQL, and optionally MINIO modules on the current node, giving you a fully-featured observability stack (Prometheus, Grafana, Loki, AlertManager, PushGateway, BlackboxExporter, etc.), plus a built-in PostgreSQL standalone instance as a CMDB, ready to use out of the box (cluster name pg-meta, database name meta).

This node now has a complete self-monitoring system, visualization tools, and a Postgres database with PITR auto-configured (HA unavailable since you only have one node). You can use this node as a devbox, for testing, running demos, and data visualization/analysis. Or, use this node as an admin node to deploy and manage more nodes!

pigsty-arch


Monitoring

The installed standalone meta node can serve as an admin node and monitoring center to bring more nodes and database servers under its supervision and control.

Pigsty’s monitoring system can be used independently. If you want to install the Prometheus/Grafana observability stack, Pigsty provides best practices! It offers rich dashboards for host nodes and PostgreSQL databases. Whether or not these nodes or PostgreSQL servers are managed by Pigsty, with simple configuration, you immediately have a production-grade monitoring and alerting system, bringing existing hosts and PostgreSQL under management.

pigsty-dashboard.jpg


HA PostgreSQL Clusters

Pigsty helps you own your own production-grade HA PostgreSQL RDS service anywhere.

To create such an HA PostgreSQL cluster/RDS service, you simply describe it with a short config and run the playbook to create it:

pg-test:
  hosts:
    10.10.10.11: { pg_seq: 1, pg_role: primary }
    10.10.10.12: { pg_seq: 2, pg_role: replica }
    10.10.10.13: { pg_seq: 3, pg_role: replica }
  vars: { pg_cluster: pg-test }

$ bin/pgsql-add pg-test  # Initialize cluster 'pg-test'

In less than 10 minutes, you’ll have a PostgreSQL database cluster with service access, monitoring, backup PITR, and HA fully configured.

pigsty-ha.png

Hardware failures are covered by the self-healing HA architecture built on Patroni, etcd, and HAProxy: if the primary fails, automatic failover completes within 30 seconds by default. Clients don’t need to modify their config or restart applications, because HAProxy uses Patroni health checks for traffic distribution and read-write requests are automatically routed to the new cluster primary, avoiding split-brain issues. The process is nearly seamless: during a replica failure or a planned switchover, for example, clients experience only a momentary interruption of in-flight queries.
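The health-check-based routing described above can be sketched roughly as follows. This is a simplified illustration, not HAProxy's or Pigsty's actual logic. It assumes Patroni's REST API (default port 8008) answers HTTP 200 on `/primary` for the current leader and on `/replica` for a healthy replica; to stay runnable offline, the responses are simulated in plain dicts, and `pick_backends` is a hypothetical helper.

```python
# Simulated HTTP status codes that a health check against each member's
# Patroni REST API might observe. 200 means "this member currently holds
# the role"; anything else (or no answer at all) means it does not.
def pick_backends(health, endpoint):
    """Return members whose health check for `endpoint` answered 200."""
    return sorted(ip for ip, codes in health.items() if codes.get(endpoint) == 200)

before = {
    "10.10.10.11": {"/primary": 200, "/replica": 503},
    "10.10.10.12": {"/primary": 503, "/replica": 200},
    "10.10.10.13": {"/primary": 503, "/replica": 200},
}
# After a failover: the old primary is unreachable, 10.10.10.12 was promoted.
after = {
    "10.10.10.11": {},
    "10.10.10.12": {"/primary": 200, "/replica": 503},
    "10.10.10.13": {"/primary": 503, "/replica": 200},
}

print(pick_backends(before, "/primary"))
print(pick_backends(after, "/primary"))
```

Because the backend list is recomputed from health checks rather than from static configuration, clients keep connecting to the same service port while the set of members behind it changes.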

Software failures, human errors, and datacenter-level disasters are covered by pgbackrest and the optional MinIO cluster. This provides local/cloud PITR capabilities and, in case of datacenter failure, offers cross-region replication and disaster recovery.

1.1 - Nodes

A node is an abstraction of hardware/OS resources - physical machines, bare metal, VMs, or containers/pods.

Any machine running a Linux OS with systemd and standard CPU/memory/disk/network resources can be treated as a node.

Nodes can have modules installed. Pigsty has several node types, distinguished by which modules are deployed:

| Type | Description |
| --- | --- |
| Regular Node | A node managed by Pigsty |
| ADMIN Node | The node that runs Ansible to issue management commands |
| INFRA Node | Nodes with the INFRA module installed |
| ETCD Node | Nodes with the ETCD module for DCS |
| MINIO Node | Nodes with the MINIO module for object storage |
| PGSQL Node | Nodes with the PGSQL module installed |
| … | Nodes with other modules… |

In a singleton Pigsty deployment, multiple roles converge on one node: it serves as the regular node, admin node, infra node, ETCD node, and database node simultaneously.
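As a rough sketch of how roles converge on one node, the snippet below classifies a node by the inventory groups it belongs to. It is illustrative only: the `inventory` dict and the `node_types` helper are hypothetical, and the group names `infra`, `etcd`, and `minio` are the ones this document names for those modules. Treating any group that sets `pg_cluster` as a PGSQL cluster is an assumption made for the example.

```python
# Hypothetical single-node inventory: one host appears in several groups.
inventory = {
    "infra":   {"hosts": {"10.10.10.10": {}}},
    "etcd":    {"hosts": {"10.10.10.10": {}}},
    "pg-meta": {"hosts": {"10.10.10.10": {}}, "vars": {"pg_cluster": "pg-meta"}},
}

def node_types(inventory, ip, admin_ip):
    """List the node types that apply to `ip`, based on group membership."""
    types = ["regular"]
    if ip == admin_ip:
        types.append("admin")
    for group, spec in inventory.items():
        if ip not in spec.get("hosts", {}):
            continue
        if group in ("infra", "etcd", "minio"):
            types.append(group)
        elif "pg_cluster" in spec.get("vars", {}):
            # Assumption: a group that defines pg_cluster is a PGSQL cluster.
            types.append("pgsql")
    return types

print(node_types(inventory, "10.10.10.10", admin_ip="10.10.10.10"))
```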


Regular Node

Nodes managed by Pigsty can have modules installed. The node.yml playbook configures nodes to the desired state. A regular node may run the following services:

| Component | Port | Description | Status |
| --- | --- | --- | --- |
| node_exporter | 9100 | Host metrics exporter | Enabled |
| haproxy | 9101 | HAProxy load balancer (admin port) | Enabled |
| vector | 9598 | Log collection agent | Enabled |
| docker | 9323 | Container runtime support | Optional |
| keepalived | n/a | L2 VIP for node cluster | Optional |
| keepalived_exporter | 9650 | Keepalived status monitor | Optional |

Here, node_exporter exposes host metrics, vector sends logs to the collection system, and haproxy provides load balancing. These three are enabled by default. Docker, keepalived, and keepalived_exporter are optional and can be enabled as needed.


ADMIN Node

A Pigsty deployment typically has a single admin node: the node that runs Ansible playbooks and issues control/deployment commands.

This node has ssh/sudo access to all other nodes. Admin node security is critical; ensure access is strictly controlled.

During single-node installation and configuration, the current node becomes the admin node. However, alternatives exist. For example, if your laptop can SSH to all managed nodes and has Ansible installed, it can serve as the admin node—though this isn’t recommended for production.

For instance, you might use your laptop to manage a Pigsty VM in the cloud. In this case, your laptop is the admin node.

In serious production environments, the admin node is typically 1-2 dedicated DBA machines. In resource-constrained setups, INFRA nodes often double as admin nodes since all INFRA nodes have Ansible installed by default.


INFRA Node

A Pigsty deployment may have 1 or more INFRA nodes; large production environments typically have 2-3.

The infra group in the inventory defines which nodes are INFRA nodes. These nodes run the INFRA module with these components:

| Component | Port | Description |
| --- | --- | --- |
| nginx | 80/443 | Web UI, local software repository |
| grafana | 3000 | Visualization platform |
| victoriaMetrics | 8428 | Time-series database (metrics) |
| victoriaLogs | 9428 | Log collection server |
| victoriaTraces | 10428 | Trace collection server |
| vmalert | 8880 | Alerting and derived metrics |
| alertmanager | 9059 | Alert aggregation and routing |
| blackbox_exporter | 9115 | Blackbox probing (ping nodes/VIPs) |
| dnsmasq | 53 | Internal DNS resolution |
| chronyd | 123 | NTP time server |
| ansible | - | Playbook execution |

Nginx serves as the module’s entry point, providing the web UI and local software repository. With multiple INFRA nodes, services on each are independent, but you can access all monitoring data sources from any INFRA node’s Grafana.

Note: The INFRA module is licensed under AGPLv3 due to Grafana. As an exception, if you only use Nginx/Victoria components without Grafana, you’re effectively under Apache-2.0.


ETCD Node

The ETCD module provides the Distributed Configuration Store (DCS) for PostgreSQL high availability.

The etcd group in the inventory defines ETCD nodes. These nodes run etcd servers on two ports:

| Component | Port | Description |
| --- | --- | --- |
| etcd | 2379 | ETCD key-value store (client port) |
| etcd | 2380 | ETCD cluster peer communication |

MINIO Node

The MINIO module provides optional backup storage for PostgreSQL.

The minio group in the inventory defines MinIO nodes. These nodes run MinIO servers on:

| Component | Port | Description |
| --- | --- | --- |
| minio | 9000 | MinIO S3 API endpoint |
| minio | 9001 | MinIO admin console |

PGSQL Node

Nodes with the PGSQL module are called PGSQL nodes. Node and PostgreSQL instance have a 1:1 deployment—one PG instance per node.

PGSQL nodes can borrow identity from their PostgreSQL instance—controlled by node_id_from_pg, defaulting to true, meaning the node name is set to the PG instance name.

PGSQL nodes run these additional components beyond regular node services:

| Component | Port | Description | Status |
| --- | --- | --- | --- |
| postgres | 5432 | PostgreSQL database server | Enabled |
| pgbouncer | 6432 | PgBouncer connection pool | Enabled |
| patroni | 8008 | Patroni HA management | Enabled |
| pg_exporter | 9630 | PostgreSQL metrics exporter | Enabled |
| pgbouncer_exporter | 9631 | PgBouncer metrics exporter | Enabled |
| pgbackrest_exporter | 9854 | pgBackRest metrics exporter | Enabled |
| vip-manager | n/a | Binds L2 VIP to cluster primary | Optional |
| {{ pg_cluster }}-primary | 5433 | HAProxy service: pooled read/write | Enabled |
| {{ pg_cluster }}-replica | 5434 | HAProxy service: pooled read-only | Enabled |
| {{ pg_cluster }}-default | 5436 | HAProxy service: primary direct connection | Enabled |
| {{ pg_cluster }}-offline | 5438 | HAProxy service: offline read | Enabled |
| {{ pg_cluster }}-<service> | 543x | HAProxy service: custom PostgreSQL services | Custom |

The vip-manager is only enabled when users configure a PG VIP. Additional custom services can be defined in pg_services, exposed via haproxy using additional service ports.
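To make the four default HAProxy service ports listed above concrete, here is a small sketch that builds connection URLs for them. It is illustrative rather than part of Pigsty: `service_url` is a hypothetical helper, and the user `dbuser_test` and database `test` are placeholder names chosen for the example.

```python
# Default HAProxy service ports, as listed in the component table.
SERVICE_PORTS = {"primary": 5433, "replica": 5434, "default": 5436, "offline": 5438}

def service_url(host, service, dbname, user):
    """Build a PostgreSQL connection URL for one of the default services."""
    port = SERVICE_PORTS[service]
    return f"postgres://{user}@{host}:{port}/{dbname}"

# Hypothetical user and database names, for illustration only.
print(service_url("10.10.10.11", "primary", "test", "dbuser_test"))
print(service_url("10.10.10.11", "replica", "test", "dbuser_test"))
```

Since every PGSQL node exposes the same service ports, an application can point at any cluster member (or a VIP, if configured) and select read-write or read-only access purely by port.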


Node Relationships

Regular nodes typically reference an INFRA node via the admin_ip parameter as their infrastructure provider. For example, with global admin_ip = 10.10.10.10, all nodes use infrastructure services at this IP.

Parameters that reference ${admin_ip}:

| Parameter | Module | Default Value | Description |
| --- | --- | --- | --- |
| repo_endpoint | INFRA | http://${admin_ip}:80 | Software repo URL |
| repo_upstream.baseurl | INFRA | http://${admin_ip}/pigsty | Local repo baseurl |
| infra_portal.endpoint | INFRA | ${admin_ip}:<port> | Nginx proxy backend |
| dns_records | INFRA | ["${admin_ip} i.pigsty", ...] | DNS records |
| node_default_etc_hosts | NODE | ["${admin_ip} i.pigsty"] | Default static DNS |
| node_etc_hosts | NODE | - | Custom static DNS |
| node_dns_servers | NODE | ["${admin_ip}"] | Dynamic DNS servers |
| node_ntp_servers | NODE | - | NTP servers (optional) |
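The effect of the `${admin_ip}` placeholder can be illustrated with a short sketch. How Pigsty performs the substitution internally is not shown in this document, so the code below simply mimics the outcome with Python's `string.Template`; the `DEFAULTS` dict copies a few default values from the table, and `render` is a hypothetical helper.

```python
from string import Template

# A few default values copied from the table; ${admin_ip} is the placeholder.
DEFAULTS = {
    "repo_endpoint": "http://${admin_ip}:80",
    "node_default_etc_hosts": ["${admin_ip} i.pigsty"],
    "node_dns_servers": ["${admin_ip}"],
}

def render(value, admin_ip):
    """Substitute ${admin_ip} in a string, or in every string of a list."""
    if isinstance(value, list):
        return [render(item, admin_ip) for item in value]
    return Template(value).substitute(admin_ip=admin_ip)

resolved = {key: render(val, "10.10.10.10") for key, val in DEFAULTS.items()}
print(resolved)
```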

Typically the admin node and INFRA node coincide. With multiple INFRA nodes, the admin node is usually the first one; others serve as backups.

In large-scale production deployments, you might separate the Ansible admin node from INFRA module nodes. For example, use 1-2 small dedicated hosts under the DBA team as the control hub (ADMIN nodes), and 2-3 high-spec physical machines as monitoring infrastructure (INFRA nodes).

Typical node counts by deployment scale:

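| Scale | ADMIN | INFRA | ETCD | MINIO | PGSQL |
| --- | --- | --- | --- | --- | --- |
| Single-node | 1 | 1 | 1 | 0 | 1 |
| 3-node | 1 | 3 | 3 | 0 | 3 |
| Small prod | 1 | 2 | 3 | 0 | N |
| Large prod | 2 | 3 | 5 | 4+ | N |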

1.2 - PGSQL Architecture

PostgreSQL module component interactions and data flow.

The PGSQL module organizes PostgreSQL in production as clusters: logical entities composed of a group of database instances associated by primary-replica relationships.

Each cluster is an autonomous business unit consisting of at least one primary instance, exposing capabilities through services.

There are four core entities in Pigsty’s PGSQL module:

  • Cluster: An autonomous PostgreSQL business unit serving as the top-level namespace for other entities.
  • Service: A named abstraction that exposes capabilities, routes traffic, and exposes services using node ports.
  • Instance: A single PostgreSQL server consisting of running processes and database files on a single node.
  • Node: A hardware resource abstraction running Linux + Systemd environment—can be bare metal, VM, container, or Pod.

Along with two business entities—“Database” and “Role”—these form the complete logical view as shown below:

pigsty-er.jpg

Naming Conventions (following Pigsty’s early constraints)

  • Cluster names should be valid DNS domain names without any dots, regex: [a-zA-Z0-9-]+
  • Service names should be prefixed with the cluster name and suffixed with specific words: primary, replica, offline, delayed, connected by -.
  • Instance names are prefixed with the cluster name and suffixed with a positive integer instance number, connected by -, e.g., ${cluster}-${seq}.
  • Nodes are identified by their primary internal IP address; since databases and hosts are deployed 1:1 in the PGSQL module, hostnames typically match instance names.
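The naming conventions above can be expressed as a few small checks. The sketch below is illustrative only: the helper names are hypothetical, and it encodes just the rules stated in the list (the cluster-name regex, the `${cluster}-${seq}` instance pattern, and the listed service suffixes).

```python
import re

CLUSTER_RE = re.compile(r"[a-zA-Z0-9-]+")  # regex quoted in the conventions
SERVICE_SUFFIXES = ("primary", "replica", "offline", "delayed")

def is_valid_cluster(name):
    """A cluster name must consist only of letters, digits, and hyphens."""
    return CLUSTER_RE.fullmatch(name) is not None

def instance_name(cluster, seq):
    """Compose ${cluster}-${seq}; seq must be a positive integer."""
    if not is_valid_cluster(cluster):
        raise ValueError(f"invalid cluster name: {cluster!r}")
    if not isinstance(seq, int) or seq < 1:
        raise ValueError(f"instance number must be a positive integer: {seq!r}")
    return f"{cluster}-{seq}"

def is_valid_service(name, cluster):
    """A service name is the cluster name plus one of the listed suffixes."""
    return any(name == f"{cluster}-{suffix}" for suffix in SERVICE_SUFFIXES)

print(is_valid_cluster("pg-test"), is_valid_cluster("pg.test"))
print(instance_name("pg-test", 1))
print(is_valid_service("pg-test-replica", "pg-test"))
```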

1.3 - INFRA Arch

Infrastructure architecture, components and functionality in Pigsty.

Running production-grade, highly available PostgreSQL clusters typically requires a comprehensive set of infrastructure services (foundation) for support, such as monitoring and alerting, log collection, time synchronization, DNS resolution, and local software repositories. Pigsty provides the INFRA module to solve this problem — it’s an optional module, but we strongly recommend enabling it.


Overview

The diagram below shows the architecture of a single-node deployment. The right half represents the components included in the INFRA module:

| Component | Type | Description |
| --- | --- | --- |
| Nginx | Web Server | Unified entry for WebUI, local repo, reverse proxy |
| CA | Certificate | Issues encryption certificates within the environment |
| Grafana | Visualization | Presents metrics, logs, and traces; hosts dashboards, reports, and custom data apps |
| VictoriaMetrics | Time Series DB | Scrapes all metrics, Prometheus API compatible, provides VMUI query interface |
| VictoriaLogs | Log Platform | Centralized log storage; all nodes run Vector by default, pushing logs here |
| VictoriaTraces | Tracing | Collects slow SQL, service traces, and other tracing data |
| VMAlert | Alert Engine | Evaluates alerting rules, pushes events to Alertmanager |
| AlertManager | Alert Manager | Aggregates alerts, dispatches notifications via email, Webhook, etc. |
| BlackboxExporter | Blackbox Probe | Probes reachability of IPs/VIPs/URLs |
| DNSMASQ | DNS Service | Provides DNS resolution for domains used within Pigsty [Optional] |
| Chronyd | Time Sync | Provides NTP time synchronization to ensure all nodes have consistent time [Optional] |

pigsty-arch


Nginx

Nginx is the access entry point for all WebUI services in Pigsty, using ports 80 / 443 for HTTP/HTTPS by default.

Infrastructure components with WebUIs can be exposed uniformly through Nginx, such as Grafana, VictoriaMetrics (VMUI), AlertManager, and HAProxy console. Additionally, local yum/apt repo and other static resources are served internally via Nginx.

Nginx configures local web servers or reverse proxy servers based on definitions in infra_portal.

infra_portal:
  home : { domain: i.pigsty }

By default, it exposes Pigsty’s admin homepage: i.pigsty. You can expose more services; see Nginx Management for details.

Pigsty allows rich customization of Nginx as a local file server or reverse proxy, with self-signed or real HTTPS certificates. For more information, see: Tutorial: Nginx—Expose Web Services via Proxy and Tutorial: Certbot—Request and Renew HTTPS Certificates


Local Software Repository

Pigsty creates a local software repository on the Infra node during installation to accelerate subsequent software installations.

This repository defaults to the /www/pigsty directory, served by Nginx, mounted at the /pigsty path, accessible via ports 80/443.

  • http://<admin_ip>/pigsty / http://i.pigsty/pigsty

Pigsty supports offline installation, which essentially pre-copies a prepared local software repository to the target environment. When Pigsty performs production deployment and needs to create a local software repository, if it finds the /www/pigsty/repo_complete marker file already exists locally, it skips downloading packages from upstream and uses existing packages directly, avoiding internet downloads.
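The marker-file check described above amounts to a simple existence test. The sketch below is not Pigsty's implementation (which is driven by its playbooks); `needs_download` is a hypothetical helper, and a temporary directory stands in for `/www/pigsty` so the example can run anywhere.

```python
from pathlib import Path
import tempfile

def needs_download(repo_dir):
    """Return True when the repo_complete marker file is absent from repo_dir."""
    return not (Path(repo_dir) / "repo_complete").exists()

# Demonstrate with a temporary directory standing in for /www/pigsty.
with tempfile.TemporaryDirectory() as tmp:
    print(needs_download(tmp))             # marker absent: packages must be fetched
    (Path(tmp) / "repo_complete").touch()  # simulate a pre-copied offline repo
    print(needs_download(tmp))             # marker present: download is skipped
```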

For more information, see: Config: INFRA - REPO


Grafana

Grafana is the core component of Pigsty’s monitoring system, used for visualizing metrics, logs, and various information.

It listens on port 3000 by default, accessible via IP:3000 or http://g.pigsty.

Pigsty provides pre-built Dashboards based on VictoriaMetrics/Logs/Traces, with one-click drill-down and roll-up via URL jumps for rapid troubleshooting.

Grafana can also serve as a low-code visualization platform, so ECharts, victoriametrics-datasource, victorialogs-datasource plugins are installed by default, with Vector/Victoria datasources registered uniformly as vmetrics-*, vlogs-*, vtraces-* for easy custom dashboard extension.

For more information, see: Config: INFRA - GRAFANA.


Victoria Observability Suite

Pigsty v4.0 uses VictoriaMetrics components to replace Prometheus/Loki, providing a unified observability platform:

  • VictoriaMetrics: Listens on port 8428 by default, accessible via http://p.pigsty or https://i.pigsty/vmetrics/ for VMUI, compatible with PromQL, remote read/write protocols, and Alertmanager API.
  • VMAlert: Runs alerting rules on port 8880, sends events to Alertmanager.
  • VictoriaLogs: Listens on port 9428 by default, searchable via https://i.pigsty/vlogs/. Vector on each node pushes structured system logs, PostgreSQL logs, and other logs here.
  • VictoriaTraces: Listens on port 10428, provides Jaeger-compatible interface for slow SQL and trace analysis.
  • Alertmanager: Listens on port 9059, accessible via http://a.pigsty or https://i.pigsty/alertmgr/ for alert routing and notification management.
  • Blackbox Exporter: Listens on port 9115 by default, responsible for ICMP/TCP/HTTP blackbox probing.

For more information, see: Config: INFRA - VICTORIA and Config: INFRA - PROMETHEUS.
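Because VictoriaMetrics is described above as compatible with PromQL and the Prometheus API, a query against it can be sketched with the standard Prometheus instant-query path `/api/v1/query`. The snippet only builds the URL and sends nothing over the network. The base address `10.10.10.10:8428` is an assumed INFRA node, `instant_query_url` is a hypothetical helper, and the `pg_up` metric with its `cls` label is borrowed from the Cluster Model section of this document.

```python
from urllib.parse import urlencode

def instant_query_url(base_url, promql):
    """Build a Prometheus-style instant query URL for a PromQL expression."""
    return f"{base_url}/api/v1/query?{urlencode({'query': promql})}"

# Assumed base URL: an INFRA node at 10.10.10.10 with VictoriaMetrics on port 8428.
url = instant_query_url("http://10.10.10.10:8428", 'pg_up{cls="pg-test"}')
print(url)
```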


Ansible

Pigsty installs Ansible on the meta node by default. Ansible is a popular ops tool with declarative config style and idempotent playbook design, greatly reducing system maintenance complexity.


DNSMASQ

DNSMASQ provides DNS resolution within the environment; domains from other modules are registered with the DNSMASQ service on INFRA nodes.

DNS records are placed in the /etc/hosts.d/ directory on all INFRA nodes by default.

For more information, see: Config: INFRA - DNS and Tutorial: DNS—Configure Domain Resolution


Chronyd

NTP service synchronizes time across all nodes in the environment (optional).

For more information, see: Config: NODES - NTP


Others

| Endpoint | Component | Native Port | Notes | Public Demo |
| --- | --- | --- | --- | --- |
| / | Nginx | 80/443 | Homepage, local repo, file server | demo.pigsty.io |
| /ui/ | Grafana | 3000 | Grafana dashboard entry | demo.pigsty.io/ui/ |
| /vmetrics/ | VictoriaMetrics | 8428 | Time series DB Web UI | demo.pigsty.io/vmetrics/ |
| /vlogs/ | VictoriaLogs | 9428 | Log DB Web UI | demo.pigsty.io/vlogs/ |
| /vtraces/ | VictoriaTraces | 10428 | Tracing Web UI | demo.pigsty.io/vtraces/ |
| /vmalert/ | VMAlert | 8880 | Alert rule management | demo.pigsty.io/vmalert/ |
| /alertmgr/ | AlertManager | 9059 | Alert management Web UI | demo.pigsty.io/alertmgr/ |
| /blackbox/ | Blackbox | 9115 | Blackbox probe | |

A standard Pigsty deployment includes an INFRA module that provides services for managed nodes and database clusters.

The INFRA module is not mandatory for HA PostgreSQL—for example, in slim install mode, the Infra module is not installed. However, the INFRA module provides essential supporting services for running production-grade HA PostgreSQL clusters; it’s strongly recommended to enable it for the complete Pigsty DBaaS experience.

If you already have your own infrastructure (Nginx, local repo, monitoring system, DNS, NTP), you can disable the INFRA module and modify config to use existing infrastructure.

| Component | Port | Default Domain | Description |
| --- | --- | --- | --- |
| Nginx | 80/443 | i.pigsty | Web portal, local repo |
| Grafana | 3000 | g.pigsty | Visualization platform |
| VictoriaMetrics | 8428 | p.pigsty | Time series DB (VMUI, Prometheus compatible) |
| VictoriaLogs | 9428 | - | Log database (receives Vector push) |
| VictoriaTraces | 10428 | - | Trace / slow SQL storage |
| VMAlert | 8880 | - | Metrics computation, alerting rules |
| AlertManager | 9059 | a.pigsty | Alert aggregation and dispatch |
| BlackboxExporter | 9115 | - | Blackbox monitoring probes |
| DNSMasq | 53 | - | DNS server |
| Chronyd | 123 | - | NTP time server |

2 - Cluster Model

How Pigsty abstracts different functionalities into modules, and the logical model of these modules.

In Pigsty, functional modules are organized as “clusters”. Each cluster is an Ansible group containing several node resources with defined instances.

PGSQL Module Overview: Key Concepts and Architecture Details

The PGSQL module is organized as clusters in production environments, which are logical entities composed of a set of database instances associated by primary-replica relationships. Each database cluster is an autonomous business service unit consisting of at least one database (primary) instance.


Entity Relationship

Let’s start with the ER diagram. In Pigsty’s PGSQL module, there are four core entities:

  • Cluster: An autonomous PostgreSQL business unit, serving as the top-level namespace for other entities.
  • Service: A named abstraction of cluster capability that routes traffic and exposes PostgreSQL services using node ports.
  • Instance: A single PostgreSQL server consisting of a running process and database files on a single node.
  • Node: An abstraction of hardware resources, which can be bare metal, virtual machines, or even Kubernetes pods.

Naming Conventions

  • Cluster names should be valid DNS domain names without dots, matching the regex: [a-zA-Z0-9-]+
  • Service names should be prefixed with the cluster name and suffixed with specific words: primary, replica, offline, delayed, connected by -.
  • Instance names are prefixed with the cluster name and suffixed with a positive integer instance number, connected by -, e.g., ${cluster}-${seq}.
  • Nodes are identified by their primary internal IP address. Since databases and hosts are deployed 1:1 in the PGSQL module, the hostname is usually the same as the instance name.

Identity Parameters

Pigsty uses identity parameters to identify entities: PG_ID.

Besides the node IP address, pg_cluster, pg_role, and pg_seq are the minimum required parameters for defining a PostgreSQL cluster. Using the sandbox environment test cluster pg-test as an example:

pg-test:
  hosts:
    10.10.10.11: { pg_seq: 1, pg_role: primary }
    10.10.10.12: { pg_seq: 2, pg_role: replica }
    10.10.10.13: { pg_seq: 3, pg_role: replica }
  vars:
    pg_cluster: pg-test

The three cluster members are shown below:

Cluster  Seq  Role     Host / IP    Instance   Service          Node Name
pg-test  1    primary  10.10.10.11  pg-test-1  pg-test-primary  pg-test-1
pg-test  2    replica  10.10.10.12  pg-test-2  pg-test-replica  pg-test-2
pg-test  3    replica  10.10.10.13  pg-test-3  pg-test-replica  pg-test-3

This includes:

  • One cluster: The cluster is named pg-test.
  • Two roles: primary and replica.
  • Three instances: The cluster consists of three instances: pg-test-1, pg-test-2, pg-test-3.
  • Three nodes: The cluster is deployed on three nodes: 10.10.10.11, 10.10.10.12, and 10.10.10.13.
  • Four services:

In the monitoring system (Prometheus/Grafana/Loki), corresponding metrics will be labeled with these identity parameters:

pg_up{cls="pg-meta", ins="pg-meta-1", ip="10.10.10.10", job="pgsql"}
pg_up{cls="pg-test", ins="pg-test-1", ip="10.10.10.11", job="pgsql"}
pg_up{cls="pg-test", ins="pg-test-2", ip="10.10.10.12", job="pgsql"}
pg_up{cls="pg-test", ins="pg-test-3", ip="10.10.10.13", job="pgsql"}
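
To make the mapping from identity parameters to monitoring labels concrete, here is a small sketch. It assumes the `ins` label is simply the instance name `${cluster}-${seq}`, as the samples above suggest; the function `identity_labels` is a hypothetical helper, not a Pigsty API.

```python
def identity_labels(cluster: str, members: dict) -> list:
    """Derive the cls/ins/ip/job labels for every member of a cluster.

    members maps node IP -> {"pg_seq": int, "pg_role": str}.
    """
    labels = []
    # Order by instance number so the output is stable.
    for ip, params in sorted(members.items(), key=lambda kv: kv[1]["pg_seq"]):
        labels.append({
            "cls": cluster,
            "ins": f"{cluster}-{params['pg_seq']}",
            "ip": ip,
            "job": "pgsql",
        })
    return labels


pg_test = {
    "10.10.10.11": {"pg_seq": 1, "pg_role": "primary"},
    "10.10.10.12": {"pg_seq": 2, "pg_role": "replica"},
    "10.10.10.13": {"pg_seq": 3, "pg_role": "replica"},
}

for label_set in identity_labels("pg-test", pg_test):
    print(label_set)
```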

3 - Infra as Code

Pigsty uses Infrastructure as Code (IaC) philosophy to manage all components, providing declarative management for large-scale clusters.

Pigsty follows the IaC and GitOps philosophy: use a declarative config inventory to describe the entire environment, and materialize it through idempotent playbooks.

Users describe their desired state declaratively through parameters, and playbooks idempotently adjust target nodes to reach that state. This is similar to Kubernetes CRDs & Operators, but Pigsty implements this functionality on bare metal and virtual machines through Ansible.
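
The idea of idempotent convergence can be illustrated with a toy reconcile loop. This is only a conceptual sketch: the state keys are made up, and real playbooks adjust packages, files, and services rather than a dictionary.

```python
def reconcile(actual: dict, desired: dict) -> list:
    """Move `actual` toward `desired` and return the changes that were made."""
    changes = []
    for key, want in desired.items():
        have = actual.get(key)
        if have != want:
            changes.append((key, have, want))
            actual[key] = want
    return changes


# Hypothetical node state and desired state, for illustration only.
node_state = {"postgres": "absent"}
desired_state = {"postgres": "running", "pgbouncer": "running"}

first_run = reconcile(node_state, desired_state)    # applies two changes
second_run = reconcile(node_state, desired_state)   # nothing left to change

print(first_run)
print(second_run)
```

Running the same step twice leaves the second run with no changes, which is the property that makes re-running a playbook safe.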

Pigsty was born to solve the operational management problem of ultra-large-scale PostgreSQL clusters. The idea behind it is simple: we need the ability to replicate the entire infrastructure (100+ database clusters + PG/Redis + observability) on prepared servers within ten minutes. No GUI or ClickOps workflow can complete such a complex task in so short a time, which makes CLI + IaC the only choice, since it provides precise, efficient control.

The config inventory pigsty.yml file describes the state of the entire deployment. Whether it’s production (prod), staging, test, or development (devbox) environments, the difference between infrastructures lies only in the config inventory, while the deployment delivery logic is exactly the same.

You can use git for version control and auditing of this deployment “seed/gene”. Pigsty also supports storing the config inventory as database tables in a PostgreSQL CMDB, further achieving Infra as Data capability, so it integrates seamlessly with your existing workflows.

IaC is designed for professional users and enterprise scenarios but is also deeply optimized for individual developers and SMBs. Even if you’re not a professional DBA, you don’t need to understand these hundreds of adjustment knobs and switches: all parameters come with well-performing default values. You can get an out-of-the-box single-node database with zero configuration; simply add two more IP addresses to get an enterprise-grade high-availability PostgreSQL cluster.


Declare Modules

Take the following default config snippet as an example. This config describes a node 10.10.10.10 with the INFRA, NODE, MINIO, ETCD, and PGSQL modules installed.

# monitoring, alerting, DNS, NTP and other infrastructure cluster...
infra: { hosts: { 10.10.10.10: { infra_seq: 1 } } }

# minio cluster, s3 compatible object storage
minio: { hosts: { 10.10.10.10: { minio_seq: 1 } }, vars: { minio_cluster: minio } }

# etcd cluster, used as DCS for PostgreSQL high availability
etcd: { hosts: { 10.10.10.10: { etcd_seq: 1 } }, vars: { etcd_cluster: etcd } }

# PGSQL example cluster: pg-meta
pg-meta: { hosts: { 10.10.10.10: { pg_seq: 1, pg_role: primary } }, vars: { pg_cluster: pg-meta } }

To actually install these modules, execute the following playbooks:

./infra.yml -l 10.10.10.10  # Initialize infra module on node 10.10.10.10
./etcd.yml  -l 10.10.10.10  # Initialize etcd module on node 10.10.10.10
./minio.yml -l 10.10.10.10  # Initialize minio module on node 10.10.10.10
./pgsql.yml -l 10.10.10.10  # Initialize pgsql module on node 10.10.10.10

Declare Clusters

You can declare a PostgreSQL database cluster by installing the PGSQL module on multiple nodes and making them a single service unit.

For example, to deploy a three-node high-availability PostgreSQL cluster using streaming replication on the following three Pigsty-managed nodes, you can add the following definition to the all.children section of the config file pigsty.yml:

pg-test:
  hosts:
    10.10.10.11: { pg_seq: 1, pg_role: primary }
    10.10.10.12: { pg_seq: 2, pg_role: replica }
    10.10.10.13: { pg_seq: 3, pg_role: offline }
  vars:  { pg_cluster: pg-test }

After defining, you can use playbooks to create the cluster:

bin/pgsql-add pg-test   # Create the pg-test cluster

You can use different instance roles, such as primary, replica, offline, delayed, and sync standby, as well as different cluster types, such as standby clusters, Citus clusters, and even Redis, MinIO, and Etcd clusters.
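
Before handing a cluster definition to the playbooks, it can help to sanity-check it. The sketch below is a hypothetical helper, not something shipped with Pigsty. It assumes two rules that are implied rather than stated here: each cluster has exactly one primary, and pg_seq values are unique within a cluster (otherwise instance names would collide).

```python
def check_pg_cluster(group: dict) -> list:
    """Return a list of problems found in one PGSQL cluster definition."""
    problems = []
    hosts = group.get("hosts", {})

    if not group.get("vars", {}).get("pg_cluster"):
        problems.append("pg_cluster is not set")

    seqs = [params.get("pg_seq") for params in hosts.values()]
    if None in seqs:
        problems.append("every member needs a pg_seq")
    if len(set(seqs)) != len(seqs):
        problems.append("pg_seq values must be unique")

    primaries = [ip for ip, params in hosts.items()
                 if params.get("pg_role") == "primary"]
    if len(primaries) != 1:
        problems.append(f"expected exactly one primary, found {len(primaries)}")
    return problems


pg_test = {
    "hosts": {
        "10.10.10.11": {"pg_seq": 1, "pg_role": "primary"},
        "10.10.10.12": {"pg_seq": 2, "pg_role": "replica"},
        "10.10.10.13": {"pg_seq": 3, "pg_role": "offline"},
    },
    "vars": {"pg_cluster": "pg-test"},
}

print(check_pg_cluster(pg_test))  # [] means no problems were found
```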


Customize Cluster Content

Not only can you define clusters declaratively, but you can also define databases, users, services, and HBA rules within the cluster. For example, the following config file deeply customizes the content of the default pg-meta single-node database cluster.

It declares six business databases and seven business users, adds an extra standby service (synchronous standby, providing read capability with no replication delay), defines an additional pg_hba rule, assigns an L2 VIP address pointing to the cluster primary, and sets a customized backup strategy.

pg-meta:
  hosts: { 10.10.10.10: { pg_seq: 1, pg_role: primary , pg_offline_query: true } }
  vars:
    pg_cluster: pg-meta
    pg_databases:                       # define business databases on this cluster, array of database definition
      - name: meta                      # REQUIRED, `name` is the only mandatory field of a database definition
        baseline: cmdb.sql              # optional, database sql baseline path, (relative path among ansible search path, e.g files/)
        pgbouncer: true                 # optional, add this database to pgbouncer database list? true by default
        schemas: [pigsty]               # optional, additional schemas to be created, array of schema names
        extensions:                     # optional, additional extensions to be installed: array of `{name[,schema]}`
          - { name: postgis , schema: public }
          - { name: timescaledb }
        comment: pigsty meta database   # optional, comment string for this database
        owner: postgres                # optional, database owner, postgres by default
        template: template1            # optional, which template to use, template1 by default
        encoding: UTF8                 # optional, database encoding, UTF8 by default. (MUST same as template database)
        locale: C                      # optional, database locale, C by default.  (MUST same as template database)
        lc_collate: C                  # optional, database collate, C by default. (MUST same as template database)
        lc_ctype: C                    # optional, database ctype, C by default.   (MUST same as template database)
        tablespace: pg_default         # optional, default tablespace, 'pg_default' by default.
        allowconn: true                # optional, allow connection, true by default. false will disable connect at all
        revokeconn: false              # optional, revoke public connection privilege. false by default. (leave connect with grant option to owner)
        register_datasource: true      # optional, register this database to grafana datasources? true by default
        connlimit: -1                  # optional, database connection limit, default -1 disable limit
        pool_auth_user: dbuser_meta    # optional, all connection to this pgbouncer database will be authenticated by this user
        pool_mode: transaction         # optional, pgbouncer pool mode at database level, default transaction
        pool_size: 64                  # optional, pgbouncer pool size at database level, default 64
        pool_size_reserve: 32          # optional, pgbouncer pool size reserve at database level, default 32
        pool_size_min: 0               # optional, pgbouncer pool size min at database level, default 0
        pool_max_db_conn: 100          # optional, max database connections at database level, default 100
      - { name: grafana  ,owner: dbuser_grafana  ,revokeconn: true ,comment: grafana primary database }
      - { name: bytebase ,owner: dbuser_bytebase ,revokeconn: true ,comment: bytebase primary database }
      - { name: kong     ,owner: dbuser_kong     ,revokeconn: true ,comment: kong the api gateway database }
      - { name: gitea    ,owner: dbuser_gitea    ,revokeconn: true ,comment: gitea meta database }
      - { name: wiki     ,owner: dbuser_wiki     ,revokeconn: true ,comment: wiki meta database }
    pg_users:                           # define business users/roles on this cluster, array of user definition
      - name: dbuser_meta               # REQUIRED, `name` is the only mandatory field of a user definition
        password: DBUser.Meta           # optional, password, can be a scram-sha-256 hash string or plain text
        login: true                     # optional, can log in, true by default  (new biz ROLE should be false)
        superuser: false                # optional, is superuser? false by default
        createdb: false                 # optional, can create database? false by default
        createrole: false               # optional, can create role? false by default
        inherit: true                   # optional, can this role use inherited privileges? true by default
        replication: false              # optional, can this role do replication? false by default
        bypassrls: false                # optional, can this role bypass row level security? false by default
        pgbouncer: true                 # optional, add this user to pgbouncer user-list? false by default (production user should be true explicitly)
        connlimit: -1                   # optional, user connection limit, default -1 disable limit
        expire_in: 3650                 # optional, now + n days when this role is expired (OVERWRITE expire_at)
        expire_at: '2030-12-31'         # optional, YYYY-MM-DD 'timestamp' when this role is expired  (OVERWRITTEN by expire_in)
        comment: pigsty admin user      # optional, comment string for this user/role
        roles: [dbrole_admin]           # optional, belonged roles. default roles are: dbrole_{admin,readonly,readwrite,offline}
        parameters: {}                  # optional, role level parameters with `ALTER ROLE SET`
        pool_mode: transaction          # optional, pgbouncer pool mode at user level, transaction by default
        pool_connlimit: -1              # optional, max database connections at user level, default -1 disable limit
      - {name: dbuser_view     ,password: DBUser.Viewer   ,pgbouncer: true ,roles: [dbrole_readonly], comment: read-only viewer for meta database}
      - {name: dbuser_grafana  ,password: DBUser.Grafana  ,pgbouncer: true ,roles: [dbrole_admin]    ,comment: admin user for grafana database   }
      - {name: dbuser_bytebase ,password: DBUser.Bytebase ,pgbouncer: true ,roles: [dbrole_admin]    ,comment: admin user for bytebase database  }
      - {name: dbuser_kong     ,password: DBUser.Kong     ,pgbouncer: true ,roles: [dbrole_admin]    ,comment: admin user for kong api gateway   }
      - {name: dbuser_gitea    ,password: DBUser.Gitea    ,pgbouncer: true ,roles: [dbrole_admin]    ,comment: admin user for gitea service      }
      - {name: dbuser_wiki     ,password: DBUser.Wiki     ,pgbouncer: true ,roles: [dbrole_admin]    ,comment: admin user for wiki.js service    }
    pg_services:                        # extra services in addition to pg_default_services, array of service definition
      # standby service will route {ip|name}:5435 to sync replica's pgbouncer (5435->6432 standby)
      - name: standby                   # required, service name, the actual svc name will be prefixed with `pg_cluster`, e.g: pg-meta-standby
        port: 5435                      # required, service exposed port (work as kubernetes service node port mode)
        ip: "*"                         # optional, service bind ip address, `*` for all ip by default
        selector: "[]"                  # required, service member selector, use JMESPath to filter inventory
        dest: default                   # optional, destination port, default|postgres|pgbouncer|<port_number>, 'default' by default
        check: /sync                    # optional, health check url path, / by default
        backup: "[? pg_role == `primary`]"  # backup server selector
        maxconn: 3000                   # optional, max allowed front-end connection
        balance: roundrobin             # optional, haproxy load balance algorithm (roundrobin by default, other: leastconn)
        options: 'inter 3s fastinter 1s downinter 5s rise 3 fall 3 on-marked-down shutdown-sessions slowstart 30s maxconn 3000 maxqueue 128 weight 100'
    pg_hba_rules:
      - {user: dbuser_view , db: all ,addr: infra ,auth: pwd ,title: 'allow grafana dashboard access cmdb from infra nodes'}
    pg_vip_enabled: true
    pg_vip_address: 10.10.10.2/24
    pg_vip_interface: eth1
    node_crontab:  # make a full backup at 1 am every day
      - '00 01 * * * postgres /pg/bin/pg-backup full'
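
The comments on expire_in and expire_at in the user definition state that expire_in takes precedence when both are set. A minimal sketch of that rule follows; `resolve_expiry` is a hypothetical helper and the way Pigsty computes the date internally may differ.

```python
from datetime import date, timedelta


def resolve_expiry(user: dict, today: date):
    """Return the role expiration date, or None if the role never expires.

    expire_in (days from today) overrides expire_at (a YYYY-MM-DD string).
    """
    if "expire_in" in user:
        return today + timedelta(days=user["expire_in"])
    if "expire_at" in user:
        return date.fromisoformat(user["expire_at"])
    return None


today = date(2025, 1, 1)
print(resolve_expiry({"expire_in": 10, "expire_at": "2030-12-31"}, today))  # 2025-01-11
print(resolve_expiry({"expire_at": "2030-12-31"}, today))                   # 2030-12-31
print(resolve_expiry({}, today))                                            # None
```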

Declare Access Control

You can also deeply customize Pigsty’s access control capabilities through declarative configuration. For example, the following config file provides deep security customization for the pg-meta cluster:

This configuration runs a three-node cluster with the crit.yml template for critical core clusters, which prioritizes data consistency and aims for zero data loss during failover. It enables an L2 VIP and restricts the listening addresses of the database and connection pool to three specific addresses: the local loopback IP, the internal network IP, and the VIP. It enforces SSL for the Patroni API and for Pgbouncer, and the HBA rules require SSL for access to the database cluster. It also enables the $libdir/passwordcheck extension in pg_libs to enforce a password strength policy.

Finally, a separate pg-meta-delay cluster is declared as a delayed replica of pg-meta that lags one hour behind, for emergency recovery from accidental data deletion.

pg-meta:      # 3 instance postgres cluster `pg-meta`
  hosts:
    10.10.10.10: { pg_seq: 1, pg_role: primary }
    10.10.10.11: { pg_seq: 2, pg_role: replica }
    10.10.10.12: { pg_seq: 3, pg_role: replica , pg_offline_query: true }
  vars:
    pg_cluster: pg-meta
    pg_conf: crit.yml
    pg_users:
      - { name: dbuser_meta , password: DBUser.Meta   , pgbouncer: true , roles: [ dbrole_admin ] , comment: pigsty admin user }
      - { name: dbuser_view , password: DBUser.Viewer , pgbouncer: true , roles: [ dbrole_readonly ] , comment: read-only viewer for meta database }
    pg_databases:
      - {name: meta ,baseline: cmdb.sql ,comment: pigsty meta database ,schemas: [pigsty] ,extensions: [{name: postgis, schema: public}, {name: timescaledb}]}
    pg_default_service_dest: postgres
    pg_services:
      - { name: standby ,ip: "*" ,port: 5435 , dest: default ,selector: "[]" , backup: "[? pg_role == `primary`]" }
    pg_vip_enabled: true
    pg_vip_address: 10.10.10.2/24
    pg_vip_interface: eth1
    pg_listen: '${ip},${vip},${lo}'
    patroni_ssl_enabled: true
    pgbouncer_sslmode: require
    pgbackrest_method: minio
    pg_libs: 'timescaledb, $libdir/passwordcheck, pg_stat_statements, auto_explain' # add passwordcheck extension to enforce strong password
    pg_default_roles:                 # default roles and users in postgres cluster
      - { name: dbrole_readonly  ,login: false ,comment: role for global read-only access     }
      - { name: dbrole_offline   ,login: false ,comment: role for restricted read-only access }
      - { name: dbrole_readwrite ,login: false ,roles: [dbrole_readonly]               ,comment: role for global read-write access }
      - { name: dbrole_admin     ,login: false ,roles: [pg_monitor, dbrole_readwrite]  ,comment: role for object creation }
      - { name: postgres     ,superuser: true  ,expire_in: 7300                        ,comment: system superuser }
      - { name: replicator ,replication: true  ,expire_in: 7300 ,roles: [pg_monitor, dbrole_readonly]   ,comment: system replicator }
      - { name: dbuser_dba   ,superuser: true  ,expire_in: 7300 ,roles: [dbrole_admin]  ,pgbouncer: true ,pool_mode: session, pool_connlimit: 16 , comment: pgsql admin user }
      - { name: dbuser_monitor ,roles: [pg_monitor] ,expire_in: 7300 ,pgbouncer: true ,parameters: {log_min_duration_statement: 1000 } ,pool_mode: session ,pool_connlimit: 8 ,comment: pgsql monitor user }
    pg_default_hba_rules:             # postgres host-based auth rules by default
      - {user: '${dbsu}'    ,db: all         ,addr: local     ,auth: ident ,title: 'dbsu access via local os user ident'  }
      - {user: '${dbsu}'    ,db: replication ,addr: local     ,auth: ident ,title: 'dbsu replication from local os ident' }
      - {user: '${repl}'    ,db: replication ,addr: localhost ,auth: ssl   ,title: 'replicator replication from localhost'}
      - {user: '${repl}'    ,db: replication ,addr: intra     ,auth: ssl   ,title: 'replicator replication from intranet' }
      - {user: '${repl}'    ,db: postgres    ,addr: intra     ,auth: ssl   ,title: 'replicator postgres db from intranet' }
      - {user: '${monitor}' ,db: all         ,addr: localhost ,auth: pwd   ,title: 'monitor from localhost with password' }
      - {user: '${monitor}' ,db: all         ,addr: infra     ,auth: ssl   ,title: 'monitor from infra host with password'}
      - {user: '${admin}'   ,db: all         ,addr: infra     ,auth: ssl   ,title: 'admin @ infra nodes with pwd & ssl'   }
      - {user: '${admin}'   ,db: all         ,addr: world     ,auth: cert  ,title: 'admin @ everywhere with ssl & cert'   }
      - {user: '+dbrole_readonly',db: all    ,addr: localhost ,auth: ssl   ,title: 'pgbouncer read/write via local socket'}
      - {user: '+dbrole_readonly',db: all    ,addr: intra     ,auth: ssl   ,title: 'read/write biz user via password'     }
      - {user: '+dbrole_offline' ,db: all    ,addr: intra     ,auth: ssl   ,title: 'allow etl offline tasks from intranet'}
    pgb_default_hba_rules:            # pgbouncer host-based authentication rules
      - {user: '${dbsu}'    ,db: pgbouncer   ,addr: local     ,auth: peer  ,title: 'dbsu local admin access with os ident'}
      - {user: 'all'        ,db: all         ,addr: localhost ,auth: pwd   ,title: 'allow all user local access with pwd' }
      - {user: '${monitor}' ,db: pgbouncer   ,addr: intra     ,auth: ssl   ,title: 'monitor access via intranet with pwd' }
      - {user: '${monitor}' ,db: all         ,addr: world     ,auth: deny  ,title: 'reject all other monitor access addr' }
      - {user: '${admin}'   ,db: all         ,addr: intra     ,auth: ssl   ,title: 'admin access via intranet with pwd'   }
      - {user: '${admin}'   ,db: all         ,addr: world     ,auth: deny  ,title: 'reject all other admin access addr'   }
      - {user: 'all'        ,db: all         ,addr: intra     ,auth: ssl   ,title: 'allow all user intra access with pwd' }

# OPTIONAL delayed cluster for pg-meta
pg-meta-delay:                    # delayed instance for pg-meta (1 hour ago)
  hosts: { 10.10.10.13: { pg_seq: 1, pg_role: primary, pg_upstream: 10.10.10.10, pg_delay: 1h } }
  vars: { pg_cluster: pg-meta-delay }
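
The pg_listen value '${ip},${vip},${lo}' in the config above is a placeholder pattern. The sketch below shows one way such a pattern could expand. It is an assumption, not Pigsty’s actual implementation: it takes ${lo} to mean 127.0.0.1 and strips the netmask from pg_vip_address.

```python
from string import Template


def expand_listen(pattern: str, ip: str, vip_cidr: str = None,
                  lo: str = "127.0.0.1") -> str:
    """Expand ${ip}, ${vip} and ${lo} placeholders into a listen address list."""
    values = {"ip": ip, "lo": lo}
    if vip_cidr:
        # Assumption: only the address part of e.g. 10.10.10.2/24 is used.
        values["vip"] = vip_cidr.split("/")[0]
    # substitute() raises KeyError if the pattern needs a value we lack.
    return Template(pattern).substitute(values)


print(expand_listen("${ip},${vip},${lo}", "10.10.10.10", "10.10.10.2/24"))
# 10.10.10.10,10.10.10.2,127.0.0.1
```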

Citus Distributed Cluster

Below is a declarative configuration for a Citus distributed cluster with four groups (one coordinator and three data nodes) spread over five hosts:

all:
  children:
    pg-citus0: # citus coordinator, pg_group = 0
      hosts: { 10.10.10.10: { pg_seq: 1, pg_role: primary } }
      vars: { pg_cluster: pg-citus0 , pg_group: 0 }
    pg-citus1: # citus data node 1
      hosts: { 10.10.10.11: { pg_seq: 1, pg_role: primary } }
      vars: { pg_cluster: pg-citus1 , pg_group: 1 }
    pg-citus2: # citus data node 2
      hosts: { 10.10.10.12: { pg_seq: 1, pg_role: primary } }
      vars: { pg_cluster: pg-citus2 , pg_group: 2 }
    pg-citus3: # citus data node 3, with an extra replica
      hosts:
        10.10.10.13: { pg_seq: 1, pg_role: primary }
        10.10.10.14: { pg_seq: 2, pg_role: replica }
      vars: { pg_cluster: pg-citus3 , pg_group: 3 }
  vars:                               # global parameters for all citus clusters
    pg_mode: citus                    # pgsql cluster mode: citus
    pg_shard: pg-citus                # citus shard name: pg-citus
    patroni_citus_db: meta            # citus distributed database name
    pg_dbsu_password: DBUser.Postgres # all dbsu password access for citus cluster
    pg_users: [ { name: dbuser_meta ,password: DBUser.Meta ,pgbouncer: true ,roles: [ dbrole_admin ] } ]
    pg_databases: [ { name: meta ,extensions: [ { name: citus }, { name: postgis }, { name: timescaledb } ] } ]
    pg_hba_rules:
      - { user: 'all' ,db: all  ,addr: 127.0.0.1/32 ,auth: ssl ,title: 'all user ssl access from localhost' }
      - { user: 'all' ,db: all  ,addr: intra        ,auth: ssl ,title: 'all user ssl access from intranet'  }
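
To see how the groups in this definition fit together, the following sketch indexes the clusters by pg_group and marks group 0 as the coordinator, as the comments in the config indicate. The function `citus_topology` is a hypothetical helper for illustration.

```python
def citus_topology(children: dict) -> dict:
    """Index Citus clusters by pg_group; group 0 is the coordinator."""
    topology = {}
    for cluster, group in children.items():
        group_id = group["vars"]["pg_group"]
        topology[group_id] = {
            "cluster": cluster,
            "kind": "coordinator" if group_id == 0 else "data node",
            "hosts": list(group["hosts"]),
        }
    return topology


children = {
    "pg-citus0": {"hosts": {"10.10.10.10": {"pg_seq": 1, "pg_role": "primary"}},
                  "vars": {"pg_cluster": "pg-citus0", "pg_group": 0}},
    "pg-citus1": {"hosts": {"10.10.10.11": {"pg_seq": 1, "pg_role": "primary"}},
                  "vars": {"pg_cluster": "pg-citus1", "pg_group": 1}},
    "pg-citus2": {"hosts": {"10.10.10.12": {"pg_seq": 1, "pg_role": "primary"}},
                  "vars": {"pg_cluster": "pg-citus2", "pg_group": 2}},
    "pg-citus3": {"hosts": {"10.10.10.13": {"pg_seq": 1, "pg_role": "primary"},
                            "10.10.10.14": {"pg_seq": 2, "pg_role": "replica"}},
                  "vars": {"pg_cluster": "pg-citus3", "pg_group": 3}},
}

topology = citus_topology(children)
for group_id in sorted(topology):
    print(group_id, topology[group_id])
```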

Redis Clusters

Below are declarative configuration examples for Redis primary-replica cluster, sentinel cluster, and Redis Cluster:

redis-ms: # redis classic primary & replica
  hosts: { 10.10.10.10: { redis_node: 1 , redis_instances: { 6379: { }, 6380: { replica_of: '10.10.10.10 6379' } } } }
  vars: { redis_cluster: redis-ms ,redis_password: 'redis.ms' ,redis_max_memory: 64MB }

redis-meta: # redis sentinel x 3
  hosts: { 10.10.10.11: { redis_node: 1 , redis_instances: { 26379: { } ,26380: { } ,26381: { } } } }
  vars:
    redis_cluster: redis-meta
    redis_password: 'redis.meta'
    redis_mode: sentinel
    redis_max_memory: 16MB
    redis_sentinel_monitor: # primary list for redis sentinel, use cls as name, primary ip:port
      - { name: redis-ms, host: 10.10.10.10, port: 6379 ,password: redis.ms, quorum: 2 }

redis-test: # redis native cluster: 3m x 3s
  hosts:
    10.10.10.12: { redis_node: 1 ,redis_instances: { 6379: { } ,6380: { } ,6381: { } } }
    10.10.10.13: { redis_node: 2 ,redis_instances: { 6379: { } ,6380: { } ,6381: { } } }
  vars: { redis_cluster: redis-test ,redis_password: 'redis.test' ,redis_mode: cluster, redis_max_memory: 32MB }

ETCD Cluster

Below is a declarative configuration example for a three-node Etcd cluster:

etcd: # dcs service for postgres/patroni ha consensus
  hosts:  # 1 node for testing, 3 or 5 for production
    10.10.10.10: { etcd_seq: 1 }  # etcd_seq required
    10.10.10.11: { etcd_seq: 2 }  # assign from 1 ~ n
    10.10.10.12: { etcd_seq: 3 }  # odd number please
  vars: # cluster level parameter override roles/etcd
    etcd_cluster: etcd  # mark etcd cluster name etcd
    etcd_safeguard: false # safeguard against purging
    etcd_clean: true # purge etcd during init process
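
The “odd number please” hint follows from majority-quorum arithmetic: etcd uses Raft, which needs more than half of the members to agree. A short worked example shows that an even member count adds no extra fault tolerance:

```python
def quorum(members: int) -> int:
    """Smallest majority of a cluster with the given number of members."""
    return members // 2 + 1


def fault_tolerance(members: int) -> int:
    """How many members can fail while a majority is still available."""
    return members - quorum(members)


for n in (1, 2, 3, 4, 5):
    print(f"members={n} quorum={quorum(n)} tolerated_failures={fault_tolerance(n)}")
```

Three members tolerate one failure and so do four, while five tolerate two, which is why 3 or 5 nodes are the usual production sizes.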

MinIO Cluster

Below is a declarative configuration example for a three-node MinIO cluster:

minio:
  hosts:
    10.10.10.10: { minio_seq: 1 }
    10.10.10.11: { minio_seq: 2 }
    10.10.10.12: { minio_seq: 3 }
  vars:
    minio_cluster: minio
    minio_data: '/data{1...2}'          # use two disks per node
    minio_node: '${minio_cluster}-${minio_seq}.pigsty' # node name pattern
    haproxy_services:
      - name: minio                     # [required] service name, must be unique
        port: 9002                      # [required] service port, must be unique
        options:
          - option httpchk
          - option http-keep-alive
          - http-check send meth OPTIONS uri /minio/health/live
          - http-check expect status 200
        servers:
          - { name: minio-1 ,ip: 10.10.10.10 , port: 9000 , options: 'check-ssl ca-file /etc/pki/ca.crt check port 9000' }
          - { name: minio-2 ,ip: 10.10.10.11 , port: 9000 , options: 'check-ssl ca-file /etc/pki/ca.crt check port 9000' }
          - { name: minio-3 ,ip: 10.10.10.12 , port: 9000 , options: 'check-ssl ca-file /etc/pki/ca.crt check port 9000' }
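
The minio_data value '/data{1...2}' uses MinIO’s three-dot expansion notation to describe several directories in one string. The sketch below expands a single numeric range to show what the pattern stands for; it is a simplified illustration and does not cover everything MinIO accepts.

```python
import re

ELLIPSIS = re.compile(r"\{(\d+)\.\.\.(\d+)\}")


def expand_ellipsis(pattern: str) -> list:
    """Expand one {start...end} range in a path pattern into concrete paths."""
    match = ELLIPSIS.search(pattern)
    if match is None:
        return [pattern]
    start, end = int(match.group(1)), int(match.group(2))
    prefix, suffix = pattern[:match.start()], pattern[match.end():]
    return [f"{prefix}{i}{suffix}" for i in range(start, end + 1)]


print(expand_ellipsis("/data{1...2}"))  # ['/data1', '/data2']
```

With three nodes and two disks each, the cluster above therefore uses six data directories in total.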

3.1 - Inventory

Describe your infrastructure and clusters using declarative configuration files

Every Pigsty deployment corresponds to an Inventory that describes key properties of the infrastructure and database clusters.


Configuration File

Pigsty uses Ansible YAML configuration format by default, with a single YAML configuration file pigsty.yml as the inventory.

~/pigsty
  ^---- pigsty.yml   # <---- Default configuration file

You can directly edit this configuration file to customize your deployment, or use the configure wizard script provided by Pigsty to automatically generate an appropriate configuration file.


Configuration Structure

The inventory uses standard Ansible YAML configuration format, consisting of two parts: global parameters (all.vars) and multiple groups (all.children).

You can define new clusters in all.children and describe the infrastructure with global variables in all.vars. The overall structure looks like this:

all:                  # Top-level object: all
  vars: {...}         # Global parameters
  children:           # Group definitions
    infra:            # Group definition: 'infra'
      hosts: {...}        # Group members: 'infra'
      vars:  {...}        # Group parameters: 'infra'
    etcd:    {...}    # Group definition: 'etcd'
    pg-meta: {...}    # Group definition: 'pg-meta'
    pg-test: {...}    # Group definition: 'pg-test'
    redis-test: {...} # Group definition: 'redis-test'
    # ...

Cluster Definition

Each Ansible group may represent a cluster, which can be a node cluster, PostgreSQL cluster, Redis cluster, Etcd cluster, MinIO cluster, etc.

A cluster definition consists of two parts: cluster members (hosts) and cluster parameters (vars). You can define cluster members in <cls>.hosts and describe the cluster using configuration parameters in <cls>.vars. Here’s an example of a 3-node high-availability PostgreSQL cluster definition:

all:
  children:    # Ansible group list
    pg-test:   # Ansible group name
      hosts:   # Ansible group instances (cluster members)
        10.10.10.11: { pg_seq: 1, pg_role: primary } # Host 1
        10.10.10.12: { pg_seq: 2, pg_role: replica } # Host 2
        10.10.10.13: { pg_seq: 3, pg_role: offline } # Host 3
      vars:    # Ansible group variables (cluster parameters)
        pg_cluster: pg-test

Cluster-level vars (cluster parameters) override global parameters, and instance-level vars override both cluster parameters and global parameters.
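
The precedence rule can be pictured as three dictionaries merged from least to most specific. This is a simplified sketch of the behaviour described above, with `effective_params` as a hypothetical helper; Ansible itself has more precedence levels than the three shown here.

```python
def effective_params(global_vars: dict, cluster_vars: dict, host_vars: dict) -> dict:
    """Merge parameters so that host > cluster > global."""
    merged = dict(global_vars)
    merged.update(cluster_vars)   # cluster-level overrides global
    merged.update(host_vars)      # instance-level overrides both
    return merged


# Example values are illustrative only.
global_vars = {"pg_version": 17, "pg_conf": "oltp.yml"}
cluster_vars = {"pg_cluster": "pg-test", "pg_version": 16}
host_vars = {"pg_seq": 3, "pg_role": "offline"}

print(effective_params(global_vars, cluster_vars, host_vars))
```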


Splitting Configuration

If your deployment is large or you want to better organize configuration files, you can split the inventory into multiple files for easier management and maintenance.

inventory/
├── hosts.yml              # Host and cluster definitions
├── group_vars/
│   ├── all.yml            # Global default variables (corresponds to all.vars)
│   ├── infra.yml          # infra group variables
│   ├── etcd.yml           # etcd group variables
│   └── pg-meta.yml        # pg-meta cluster variables
└── host_vars/
    ├── 10.10.10.10.yml    # Specific host variables
    └── 10.10.10.11.yml

You can place cluster member definitions in the hosts.yml file and put cluster-level configuration parameters in corresponding files under the group_vars directory.


Switching Configuration

You can temporarily specify a different inventory file when running playbooks using the -i parameter.

./pgsql.yml -i another_config.yml
./infra.yml -i nginx_config.yml

Ansible supports multiple inventory sources: you can use local yaml|ini configuration files, or use a CMDB or any dynamic inventory script as the configuration source.

In Pigsty, the ansible.cfg file in the Pigsty home directory sets pigsty.yml in the same directory as the default inventory. You can modify it as needed.

[defaults]
inventory = pigsty.yml
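
If you want to check which inventory a given ansible.cfg points to, the file can be read with Python’s standard configparser, since it is INI-formatted. This is a small sketch; the fallback path /etc/ansible/hosts is Ansible’s stock default, and `default_inventory` is a hypothetical helper name.

```python
import configparser


def default_inventory(cfg_text: str, fallback: str = "/etc/ansible/hosts") -> str:
    """Return the inventory configured in the [defaults] section."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    return parser.get("defaults", "inventory", fallback=fallback)


cfg = """
[defaults]
inventory = pigsty.yml
"""
print(default_inventory(cfg))  # pigsty.yml
```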

Pigsty also supports using a CMDB metabase to store the inventory, which facilitates integration with existing systems.

3.2 - Configure

Use the configure script to automatically generate recommended configuration files based on your environment.

Pigsty provides a configure script as a configuration wizard that automatically generates an appropriate pigsty.yml configuration file based on your current environment.

This is an optional script: if you already understand how to configure Pigsty, you can directly edit the pigsty.yml configuration file and skip the wizard.


Quick Start

Enter the pigsty source home directory and run ./configure to automatically start the configuration wizard. Without any arguments, it defaults to the meta single-node configuration template:

cd ~/pigsty
./configure          # Interactive configuration wizard, auto-detect environment and generate config

This command will use the selected template as a base, detect the current node’s IP address and region, and generate a pigsty.yml configuration file suitable for the current environment.

Features

The configure script performs the following adjustments based on environment and input, generating a pigsty.yml configuration file in the current directory.

  • Detects the current node IP address; if multiple IPs exist, prompts the user to input a primary IP address as the node’s identity
  • Uses the IP address to replace the placeholder 10.10.10.10 in the configuration template and sets it as the admin_ip parameter value
  • Detects the current region, setting region to default (global default repos) or china (using Chinese mirror repos)
  • For micro instances (vCPU < 4), uses the tiny parameter template for node_tune and pg_conf to optimize resource usage
  • If -v PG major version is specified, sets pg_version and all PG alias parameters to the corresponding major version
  • If -g is specified, replaces all default passwords with randomly generated strong passwords for enhanced security (strongly recommended)
  • When PG major version ≥ 17, prioritizes the built-in C.UTF-8 locale, or the OS-supported C.UTF-8
  • Checks if the core dependency ansible for deployment is available in the current environment
  • Also checks if the deployment target node is SSH-reachable and can execute commands with sudo (-s to skip)
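
Two of the adjustments above, replacing the placeholder IP and choosing the tiny template on small machines, can be sketched as follows. This is an illustration only, not the logic of the real configure script; the function names are hypothetical.

```python
import re

PLACEHOLDER = "10.10.10.10"
# Match the placeholder only as a whole address, so 10.10.10.100 stays intact.
PLACEHOLDER_RE = re.compile(r"(?<![\d.])" + re.escape(PLACEHOLDER) + r"(?!\.?\d)")


def render_config(template_text: str, ip: str) -> str:
    """Replace every standalone occurrence of the placeholder with `ip`."""
    return PLACEHOLDER_RE.sub(ip, template_text)


def pick_tuning(vcpu: int):
    """Return 'tiny' for micro instances (vCPU < 4), else keep the template default."""
    return "tiny" if vcpu < 4 else None


template = "admin_ip: 10.10.10.10\nhosts: { 10.10.10.10: {}, 10.10.10.100: {} }"
print(render_config(template, "192.168.1.5"))
print(pick_tuning(2), pick_tuning(8))
```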

Usage Examples

# Basic usage
./configure                       # Interactive configuration wizard
./configure -i 10.10.10.10        # Specify primary IP address

# Specify configuration template
./configure -c meta               # Use default single-node template (default)
./configure -c rich               # Use feature-rich single-node template
./configure -c slim               # Use minimal template (PGSQL + ETCD only)
./configure -c ha/full            # Use 4-node HA sandbox template
./configure -c ha/trio            # Use 3-node HA template
./configure -c app/supa           # Use Supabase self-hosted template

# Specify PostgreSQL version
./configure -v 17                 # Use PostgreSQL 17
./configure -v 16                 # Use PostgreSQL 16
./configure -c rich -v 16         # rich template + PG 16

# Region and proxy
./configure -r china              # Use Chinese mirrors
./configure -r europe             # Use European mirrors
./configure -x                    # Import current proxy environment variables

# Skip and automation
./configure -s                    # Skip IP detection, keep placeholder
./configure -n -i 10.10.10.10     # Non-interactive mode with specified IP
./configure -c ha/full -s         # 4-node template, skip IP replacement

# Security enhancement
./configure -g                    # Generate random passwords
./configure -c meta -g -i 10.10.10.10  # Complete production configuration

# Specify output and SSH port
./configure -o prod.yml           # Output to prod.yml
./configure -p 2222               # Use SSH port 2222

Command Arguments

./configure
    [-c|--conf <template>]      # Configuration template name (meta|rich|slim|ha/full|...)
    [-i|--ip <ipaddr>]          # Specify primary IP address
    [-v|--version <pgver>]      # PostgreSQL major version (13|14|15|16|17|18)
    [-r|--region <region>]      # Upstream software repo region (default|china|europe)
    [-o|--output <file>]        # Output configuration file path (default: pigsty.yml)
    [-s|--skip]                 # Skip IP address detection and replacement
    [-x|--proxy]                # Import proxy settings from environment variables
    [-n|--non-interactive]      # Non-interactive mode (don't ask any questions)
    [-p|--port <port>]          # Specify SSH port
    [-g|--generate]             # Generate random passwords
    [-h|--help]                 # Display help information

Argument Details

| Argument | Description |
|----------|-------------|
| -c, --conf | Generate config from conf/<template>.yml, supports subdirectories like ha/full |
| -i, --ip | Replace placeholder 10.10.10.10 in config template with specified IP |
| -v, --version | Specify PostgreSQL major version (13-18), keeps template default if not specified |
| -r, --region | Set software repo mirror region: default, china (Chinese mirrors), europe (European) |
| -o, --output | Specify output file path, defaults to pigsty.yml |
| -s, --skip | Skip IP address detection and replacement, keep 10.10.10.10 placeholder in template |
| -x, --proxy | Write current environment proxy variables (HTTP_PROXY, HTTPS_PROXY, ALL_PROXY, NO_PROXY) to config |
| -n, --non-interactive | Non-interactive mode, don’t ask any questions (requires -i to specify IP) |
| -p, --port | Specify SSH port (when SSH does not listen on the default port 22) |
| -g, --generate | Generate random values for passwords in config file, improving security (strongly recommended) |

Execution Flow

The configure script executes detection and configuration in the following order:

┌─────────────────────────────────────────────────────────────┐
│                  configure Execution Flow                   │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  1. check_region          Detect network region (GFW check) │
│         ↓                                                   │
│  2. check_version         Validate PostgreSQL version       │
│         ↓                                                   │
│  3. check_kernel          Detect OS kernel (Linux/Darwin)   │
│         ↓                                                   │
│  4. check_machine         Detect CPU arch (x86_64/aarch64)  │
│         ↓                                                   │
│  5. check_package_manager Detect pkg manager (dnf/yum/apt)  │
│         ↓                                                   │
│  6. check_vendor_version  Detect OS distro and version      │
│         ↓                                                   │
│  7. check_sudo            Detect passwordless sudo          │
│         ↓                                                   │
│  8. check_ssh             Detect passwordless SSH to self   │
│         ↓                                                   │
│  9. check_proxy           Handle proxy environment vars     │
│         ↓                                                   │
│ 10. check_ipaddr          Detect/input primary IP address   │
│         ↓                                                   │
│ 11. check_admin           Validate admin SSH + Sudo access  │
│         ↓                                                   │
│ 12. check_conf            Select configuration template     │
│         ↓                                                   │
│ 13. check_config          Generate configuration file       │
│         ↓                                                   │
│ 14. check_utils           Check if Ansible etc. installed   │
│         ↓                                                   │
│     ✓ Configuration complete, output pigsty.yml             │
│                                                             │
└─────────────────────────────────────────────────────────────┘

Automatic Behaviors

Region Detection

The script automatically detects the network environment to determine if you’re in mainland China (behind GFW):

# Check network environment by accessing Google
curl -I -s --connect-timeout 1 www.google.com
  • If Google is inaccessible, automatically sets region: china to use domestic mirrors
  • If accessible, uses region: default (the default upstream mirrors)
  • Can manually specify region via -r argument
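The decision above can be sketched in Python. This is only an illustration, not the script's actual code: the function names are hypothetical, and a plain TCP connection with a 1-second timeout stands in for the curl HEAD request.

```python
import socket


def google_reachable(host="www.google.com", port=80, timeout=1.0):
    """Return True if a TCP connection succeeds within the timeout.

    Assumption: a TCP connect approximates `curl -I --connect-timeout 1`.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers DNS failures, refusals, and timeouts
        return False


def detect_region(explicit=None, probe=google_reachable):
    """An explicit -r value wins; otherwise decide from the reachability probe."""
    if explicit:
        return explicit
    return "default" if probe() else "china"
```

Passing the probe as a parameter keeps the decision logic testable without network access.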

IP Address Handling

The script determines the primary IP address in the following priority:

  1. Command line argument: If IP is specified via -i, use it directly
  2. Single IP detection: If the current node has only one IP, use it automatically
  3. Demo IP detection: If 10.10.10.10 is detected, select it automatically (for sandbox environments)
  4. Interactive input: When multiple IPs exist, prompt user to choose or input
[WARN] Multiple IP address candidates found:
    (1) 192.168.1.100   inet 192.168.1.100/24 scope global eth0
    (2) 10.10.10.10     inet 10.10.10.10/24 scope global eth1
[ IN ] INPUT primary_ip address (of current meta node, e.g 10.10.10.10):
=> 10.10.10.10
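The priority order above could be expressed roughly as follows. This is a sketch under assumptions: choose_primary_ip and its ask callback are hypothetical names, and the real script works on the output of the system's IP tooling rather than a Python list.

```python
DEMO_IP = "10.10.10.10"  # sandbox placeholder address used throughout the docs


def choose_primary_ip(cli_ip, candidates, ask):
    """Pick the primary IP following the four-step priority described above.

    cli_ip:     value of -i, or None when not given
    candidates: list of IP addresses found on the current node
    ask:        callback that prompts the user and returns the chosen IP
    """
    if cli_ip:                   # 1. command line argument wins
        return cli_ip
    if len(candidates) == 1:     # 2. a single IP needs no decision
        return candidates[0]
    if DEMO_IP in candidates:    # 3. sandbox address is preferred
        return DEMO_IP
    return ask(candidates)       # 4. otherwise fall back to the prompt
```

In non-interactive mode there is nobody to answer the prompt, which is why -n is documented as requiring -i.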

Low-End Hardware Optimization

When fewer than 4 CPU cores are detected, the script automatically adjusts configuration:

[WARN] replace oltp template with tiny due to cpu < 4

This ensures smooth operation on low-spec virtual machines.
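As a rough illustration of this rule, the sketch below picks a tuning template name from the core count. The function name is hypothetical, and the threshold follows the `cpu < 4` wording of the warning message; treat the exact boundary as an assumption.

```python
import os


def pick_tuning_template(cpu_count=None, default="oltp"):
    """Return 'tiny' on small machines, otherwise keep the default template."""
    if cpu_count is None:
        cpu_count = os.cpu_count() or 1  # os.cpu_count() may return None
    return "tiny" if cpu_count < 4 else default
```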

Locale Settings

The script automatically enables C.UTF-8 as the default locale when:

  • PostgreSQL version ≥ 17 (built-in Locale Provider support)
  • Or the current system supports C.UTF-8 / C.utf8 locale
pg_locale: C.UTF-8
pg_lc_collate: C.UTF-8
pg_lc_ctype: C.UTF-8
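The two conditions can be combined as in the following sketch. The helper names are hypothetical, and the list of OS locales is passed in (for example from `locale -a`) rather than detected.

```python
def wants_c_utf8(pg_version, os_locales):
    """True when PG >= 17 or the OS offers C.UTF-8 / C.utf8."""
    normalized = {name.lower().replace("-", "") for name in os_locales}
    return pg_version >= 17 or "c.utf8" in normalized


def locale_params(pg_version, os_locales):
    """Return the three locale parameters shown above, or nothing."""
    if not wants_c_utf8(pg_version, os_locales):
        return {}
    return {
        "pg_locale": "C.UTF-8",
        "pg_lc_collate": "C.UTF-8",
        "pg_lc_ctype": "C.UTF-8",
    }
```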

China Region Special Handling

When region is set to china, the script automatically:

  • Enables docker_registry_mirrors Docker mirror acceleration
  • Enables PIP_MIRROR_URL Python mirror acceleration

Password Generation

When using the -g argument, the script generates 24-character random strings for the following passwords:

| Password Parameter | Description |
|--------------------|-------------|
| grafana_admin_password | Grafana admin password |
| pg_admin_password | PostgreSQL admin password |
| pg_monitor_password | PostgreSQL monitor user password |
| pg_replication_password | PostgreSQL replication user password |
| patroni_password | Patroni API password |
| haproxy_admin_password | HAProxy admin password |
| minio_secret_key | MinIO Secret Key |
| etcd_root_password | ETCD Root password |

It also replaces the following placeholder passwords:

  • DBUser.Meta → random password
  • DBUser.Viewer → random password
  • S3User.Backup → random password
  • S3User.Meta → random password
  • S3User.Data → random password
$ ./configure -g
[INFO] generating random passwords...
    grafana_admin_password   : xK9mL2nP4qR7sT1vW3yZ5bD8
    pg_admin_password        : aB3cD5eF7gH9iJ1kL2mN4oP6
    ...
[INFO] random passwords generated, check and save them

Configuration Templates

The script reads configuration templates from the conf/ directory, supporting the following templates:

Core Templates

| Template | Description |
|----------|-------------|
| meta | Default template: Single-node installation with INFRA + NODE + ETCD + PGSQL |
| rich | Feature-rich version: Includes almost all extensions, MinIO, local repo |
| slim | Minimal version: PostgreSQL + ETCD only, no monitoring infrastructure |
| fat | Complete version: rich base with more extensions installed |
| pgsql | Pure PostgreSQL template |
| infra | Pure infrastructure template |

HA Templates (ha/)

| Template | Description |
|----------|-------------|
| ha/dual | 2-node HA cluster |
| ha/trio | 3-node HA cluster |
| ha/full | 4-node complete sandbox environment |
| ha/safe | Security-hardened HA configuration |
| ha/simu | 42-node large-scale simulation environment |

Application Templates (app/)

| Template | Description |
|----------|-------------|
| supabase | Supabase self-hosted configuration |
| app/dify | Dify AI platform configuration |
| app/odoo | Odoo ERP configuration |
| app/teable | Teable table database configuration |
| app/registry | Docker Registry configuration |

Special Kernel Templates

| Template | Description |
|----------|-------------|
| ivory | IvorySQL: Oracle-compatible PostgreSQL |
| mssql | Babelfish: SQL Server-compatible PostgreSQL |
| polar | PolarDB: Alibaba Cloud open-source distributed PostgreSQL |
| citus | Citus: Distributed PostgreSQL |
| oriole | OrioleDB: Next-generation storage engine |

Demo Templates (demo/)

| Template | Description |
|----------|-------------|
| demo/demo | Demo environment configuration |
| demo/redis | Redis cluster demo |
| demo/minio | MinIO cluster demo |

Output Example

$ ./configure
configure pigsty v4.0.0 begin
[ OK ] region = china
[ OK ] kernel  = Linux
[ OK ] machine = x86_64
[ OK ] package = rpm,dnf
[ OK ] vendor  = rocky (Rocky Linux)
[ OK ] version = 9 (9.5)
[ OK ] sudo = vagrant ok
[ OK ] ssh = vagrant@127.0.0.1 ok
[WARN] Multiple IP address candidates found:
    (1) 192.168.121.193	    inet 192.168.121.193/24 brd 192.168.121.255 scope global dynamic noprefixroute eth0
    (2) 10.10.10.10	    inet 10.10.10.10/24 brd 10.10.10.255 scope global noprefixroute eth1
[ OK ] primary_ip = 10.10.10.10 (from demo)
[ OK ] admin = vagrant@10.10.10.10 ok
[ OK ] mode = meta (el9)
[ OK ] locale  = C.UTF-8
[ OK ] ansible = ready
[ OK ] pigsty configured
[WARN] don't forget to check it and change passwords!
proceed with ./deploy.yml

Environment Variables

The script supports the following environment variables:

| Environment Variable | Description | Default |
|----------------------|-------------|---------|
| PIGSTY_HOME | Pigsty installation directory | ~/pigsty |
| METADB_URL | Metabase connection URL | service=meta |
| HTTP_PROXY | HTTP proxy | - |
| HTTPS_PROXY | HTTPS proxy | - |
| ALL_PROXY | Universal proxy | - |
| NO_PROXY | Proxy whitelist | Built-in default |

Notes

  1. Passwordless access: Before running configure, ensure the current user has passwordless sudo privileges and passwordless SSH to localhost. This can be automatically configured via the bootstrap script.

  2. IP address selection: Choose an internal IP as the primary IP address, not a public IP or 127.0.0.1.

  3. Password security: In production environments, always modify default passwords in the configuration file, or use the -g argument to generate random passwords.

  4. Configuration review: After the script completes, it’s recommended to review the generated pigsty.yml file to confirm the configuration meets expectations.

  5. Multiple executions: You can run configure multiple times to regenerate configuration; each run will overwrite the existing pigsty.yml.

  6. macOS limitations: When running on macOS, the script skips some Linux-specific checks and uses placeholder IP 10.10.10.10. macOS can only serve as an admin node.


FAQ

How to use a custom configuration template?

Place your configuration file in the conf/ directory, then specify it with the -c argument:

cp my-config.yml ~/pigsty/conf/myconf.yml
./configure -c myconf

How to generate different configurations for multiple clusters?

Use the -o argument to specify different output files:

./configure -c ha/full -o cluster-a.yml
./configure -c ha/trio -o cluster-b.yml

Then specify the configuration file when running playbooks:

./deploy.yml -i cluster-a.yml

How to handle multiple IPs in non-interactive mode?

You must explicitly specify the IP address using the -i argument:

./configure -n -i 10.10.10.10

How to keep the placeholder IP in the template?

Use the -s argument to skip IP replacement:

./configure -c ha/full -s   # Keep 10.10.10.10 placeholder

  • Inventory: Understand the Ansible inventory structure
  • Parameters: Understand Pigsty parameter hierarchy and priority
  • Templates: View all available configuration templates
  • Installation: Understand the complete installation process
  • Metabase: Use PostgreSQL as a dynamic configuration source

3.3 - Parameters

Fine-tune Pigsty customization using configuration parameters

In the inventory, you can use various parameters to fine-tune Pigsty customization. These parameters cover everything from infrastructure settings to database configuration.


Parameter List

Pigsty provides 380+ configuration parameters distributed across 8 default modules, giving fine-grained control over every aspect of the system. See Reference - Parameter List for the complete list.

| Module | Groups | Params | Description |
|--------|--------|--------|-------------|
| PGSQL | 9 | 123 | Core configuration for PostgreSQL database clusters |
| INFRA | 10 | 82 | Infrastructure: repos, Nginx, DNS, monitoring, Grafana, etc. |
| NODE | 11 | 83 | Host node tuning: identity, DNS, packages, tuning, security, admin, time, VIP, etc. |
| ETCD | 2 | 13 | Distributed configuration store and service discovery |
| REDIS | 1 | 21 | Redis cache and data structure server |
| MINIO | 2 | 21 | S3-compatible object storage service |
| FERRET | 1 | 9 | MongoDB-compatible database FerretDB |
| DOCKER | 1 | 8 | Docker container engine |

Parameter Form

Parameters are key-value pairs that describe entities. The Key is a string, and the Value can be one of five types: boolean, string, number, array, or object.

all:                            # <------- Top-level object: all
  vars:
    admin_ip: 10.10.10.10       # <------- Global configuration parameter
  children:
    pg-meta:                    # <------- pg-meta group
      vars:
        pg_cluster: pg-meta     # <------- Cluster-level parameter
      hosts:
        10.10.10.10:            # <------- Host node IP
          pg_seq: 1
          pg_role: primary      # <------- Instance-level parameter

Parameter Priority

Parameters can be set at different levels with the following priority:

| Level | Location | Description | Priority |
|-------|----------|-------------|----------|
| CLI | -e command line argument | Passed via command line | Highest (5) |
| Host/Instance | <group>.hosts.<host> | Parameters specific to a single host | Higher (4) |
| Group/Cluster | <group>.vars | Parameters shared by hosts in group/cluster | Medium (3) |
| Global | all.vars | Parameters shared by all hosts | Lower (2) |
| Default | <roles>/defaults/main.yml | Role implementation defaults | Lowest (1) |

Here are some examples of parameter priority:

  • Use command line parameter -e grafana_clean=true when running playbooks to wipe Grafana data
  • Use instance-level parameter pg_role on host variables to override pg instance role
  • Use cluster-level parameter pg_cluster on group variables to override pg cluster name
  • Use global parameter node_ntp_servers on global variables to specify global NTP servers
  • If pg_version is not set, Pigsty will use the default value from the pgsql role implementation (default is 18)

Except for identity parameters, every parameter has an appropriate default value, so explicit setting is not required.
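The five levels can be pictured as dictionaries applied from lowest to highest priority, so that later layers overwrite earlier ones. The sketch below is a simplification with a hypothetical function name: Ansible's real precedence rules have more levels, but whole values replacing each other (no deep merge) matches its default behavior.

```python
def resolve_params(defaults, global_vars, group_vars, host_vars, cli_vars):
    """Merge parameter layers; later (higher-priority) layers win."""
    merged = {}
    for layer in (defaults, global_vars, group_vars, host_vars, cli_vars):
        merged.update(layer)  # whole values are replaced, not deep-merged
    return merged
```

For example, with a role default of pg_version 18, a global pg_version of 17, and -e pg_version=16 on the command line, the effective value is 16; drop the command line argument and it becomes 17.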


Identity Parameters

Identity parameters are special parameters that serve as entity ID identifiers, therefore they have no default values and must be explicitly set.

| Module | Identity Parameters |
|--------|---------------------|
| PGSQL | pg_cluster, pg_seq, pg_role, … |
| NODE | nodename, node_cluster |
| ETCD | etcd_cluster, etcd_seq |
| MINIO | minio_cluster, minio_seq |
| REDIS | redis_cluster, redis_node, redis_instances |
| INFRA | infra_seq |

Exceptions are etcd_cluster and minio_cluster which have default values. This assumes each deployment has only one etcd cluster for DCS and one optional MinIO cluster for centralized backup storage, so they are assigned default cluster names etcd and minio. However, you can still deploy multiple etcd or MinIO clusters using different names.
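Because identity parameters have no defaults, it can help to check for them before running a playbook. The sketch below covers only the three PGSQL parameters named in the table; the function name and the level at which each parameter is expected (pg_cluster on the group, pg_seq and pg_role on each host) are assumptions drawn from the examples in this section, not a validator shipped with Pigsty.

```python
def missing_pgsql_identity(group_vars, hosts):
    """Return (location, parameter) pairs for missing identity parameters.

    group_vars: dict of group-level variables
    hosts:      dict mapping host IP -> dict of host-level variables
    """
    missing = []
    if "pg_cluster" not in group_vars:
        missing.append(("group", "pg_cluster"))
    for ip, host_vars in hosts.items():
        for key in ("pg_seq", "pg_role"):
            if key not in (host_vars or {}):
                missing.append((ip, key))
    return missing
```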

3.4 - Conf Templates

Use pre-made configuration templates to quickly generate configuration files adapted to your environment

In Pigsty, deployment blueprint details are defined by the inventory, which is the pigsty.yml configuration file. You can customize it through declarative configuration.

However, writing configuration files directly can be daunting for new users. To address this, we provide some ready-to-use configuration templates covering common usage scenarios.

Each template is a predefined pigsty.yml configuration file containing reasonable defaults suitable for specific scenarios.

You can choose a template as your customization starting point, then modify it as needed to meet your specific requirements.


Using Templates

Pigsty provides the configure script as an optional configuration wizard that generates an inventory with good defaults based on your environment and input.

Use ./configure -c <conf> to specify a configuration template, where <conf> is the path relative to the conf directory (the .yml suffix can be omitted).

./configure                     # Default to meta.yml configuration template
./configure -c meta             # Explicitly specify meta.yml single-node template
./configure -c rich             # Use feature-rich template with all extensions and MinIO
./configure -c slim             # Use minimal single-node template

# Use different database kernels
./configure -c pgsql            # Native PostgreSQL kernel, basic features (13~18)
./configure -c citus            # Citus distributed HA PostgreSQL (14~17)
./configure -c mssql            # Babelfish kernel, SQL Server protocol compatible (15)
./configure -c polar            # PolarDB PG kernel, Aurora/RAC style (15)
./configure -c ivory            # IvorySQL kernel, Oracle syntax compatible (18)
./configure -c mysql            # OpenHalo kernel, MySQL compatible (14)
./configure -c pgtde            # Percona PostgreSQL Server transparent encryption (18)
./configure -c oriole           # OrioleDB kernel, OLTP enhanced (17)
./configure -c supabase         # Supabase self-hosted configuration (15~18)

# Use multi-node HA templates
./configure -c ha/dual          # Use 2-node HA template
./configure -c ha/trio          # Use 3-node HA template
./configure -c ha/full          # Use 4-node HA template

If no template is specified, Pigsty defaults to the meta.yml single-node configuration template.
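The mapping from a -c value to a template file can be sketched as below. The function name is hypothetical, and the sketch only builds the path; it does not check that the file exists.

```python
from pathlib import PurePosixPath


def template_path(conf="meta", conf_dir="conf"):
    """Map a -c value to a file under conf/, adding .yml when omitted."""
    name = conf if conf.endswith(".yml") else conf + ".yml"
    return str(PurePosixPath(conf_dir) / name)
```

With this rule, ha/full resolves to conf/ha/full.yml, and calling it with no argument resolves to conf/meta.yml.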


Template List

Main Templates

The following are single-node configuration templates for installing Pigsty on a single server:

| Template | Description |
|----------|-------------|
| meta.yml | Default template, single-node PostgreSQL online installation |
| rich.yml | Feature-rich template with local repo, MinIO, and more examples |
| slim.yml | Minimal template, PostgreSQL only without monitoring and infrastructure |

Database Kernel Templates

Templates for various database management systems and kernels:

| Template | Description |
|----------|-------------|
| pgsql.yml | Native PostgreSQL kernel, basic features (13~18) |
| citus.yml | Citus distributed HA PostgreSQL (14~17) |
| mssql.yml | Babelfish kernel, SQL Server protocol compatible (15) |
| polar.yml | PolarDB PG kernel, Aurora/RAC style (15) |
| ivory.yml | IvorySQL kernel, Oracle syntax compatible (17) |
| mysql.yml | OpenHalo kernel, MySQL compatible (14) |
| pgtde.yml | Percona PostgreSQL Server transparent encryption (17) |
| oriole.yml | OrioleDB kernel, OLTP enhanced (17, Debian pkg pending) |
| supabase.yml | Supabase self-hosted configuration (15~17) |

You can add more nodes later or use HA templates to plan your cluster from the start.


HA Templates

You can configure Pigsty to run on multiple nodes, forming a high-availability (HA) cluster:

| Template | Description |
|----------|-------------|
| dual.yml | 2-node semi-HA deployment |
| trio.yml | 3-node standard HA deployment |
| full.yml | 4-node standard deployment |
| safe.yml | 4-node security-enhanced deployment with delayed replica |
| simu.yml | 20-node production environment simulation |

Application Templates

You can use the following templates to run Docker applications/software:

| Template | Description |
|----------|-------------|
| supa.yml | Start single-node Supabase |
| odoo.yml | Start Odoo ERP system |
| dify.yml | Start Dify AI workflow system |
| electric.yml | Start Electric sync engine |

Demo Templates

Besides main templates, Pigsty provides a set of demo templates for different scenarios:

| Template | Description |
|----------|-------------|
| el.yml | Full-parameter config file for EL 8/9 systems |
| debian.yml | Full-parameter config file for Debian/Ubuntu systems |
| remote.yml | Example config for monitoring remote PostgreSQL clusters or RDS |
| redis.yml | Redis cluster example configuration |
| minio.yml | 3-node MinIO cluster example configuration |
| demo.yml | Configuration file for Pigsty public demo site |

Build Templates

The following configuration templates are for development and testing purposes:

| Template | Description |
|----------|-------------|
| build.yml | Open source build config for EL 9/10, Debian 12/13, Ubuntu 22.04/24.04 |

3.5 - Use CMDB as Config Inventory

Use PostgreSQL as a CMDB metabase to store Ansible inventory.

Pigsty allows you to use a PostgreSQL metabase as a dynamic configuration source, replacing static YAML configuration files for more powerful configuration management capabilities.


Overview

CMDB (Configuration Management Database) is a method of storing configuration information in a database for management.

In Pigsty, the default configuration source is a static YAML file pigsty.yml, which serves as Ansible’s inventory.

This approach is simple and direct, but when infrastructure scales and requires complex, fine-grained management and external integration, a single static file becomes insufficient.

| Feature | Static YAML File | CMDB Metabase |
|---------|------------------|---------------|
| Querying | Manual search/grep | SQL queries with any conditions, aggregation analysis |
| Versioning | Depends on Git or manual backup | Database transactions, audit logs, time-travel snapshots |
| Access Control | File system permissions, coarse-grained | PostgreSQL fine-grained access control |
| Concurrent Editing | Requires file locking or resolving merge conflicts | Database transactions naturally support concurrency |
| External Integration | Requires YAML parsing | Standard SQL interface, easy integration with any language |
| Scalability | Difficult to maintain when file becomes too large | Scales to physical limits |
| Dynamic Generation | Static file, changes require manual application | Immediate effect, real-time configuration changes |

Pigsty provides the CMDB database schema as part of the baseline schema definition of the sample database pg-meta.meta.


How It Works

The core idea of CMDB is to replace the static configuration file with a dynamic script. Ansible supports using executable scripts as inventory, as long as the script outputs inventory data in JSON format. When you enable CMDB, Pigsty creates a dynamic inventory script named inventory.sh:

#!/bin/bash
psql ${METADB_URL} -AXtwc 'SELECT text FROM pigsty.inventory;'

This script’s function is simple: every time Ansible needs to read the inventory, it queries configuration data from the PostgreSQL database’s pigsty.inventory view and returns it in JSON format.

The overall architecture is as follows:

flowchart LR
    conf["bin/inventory_conf"]
    tocmdb["bin/inventory_cmdb"]
    load["bin/inventory_load"]
    ansible["🚀 Ansible"]

    subgraph static["📄 Static Config Mode"]
        yml[("pigsty.yml")]
    end

    subgraph dynamic["🗄️ CMDB Dynamic Mode"]
        sh["inventory.sh"]
        cmdb[("PostgreSQL CMDB")]
    end

    conf -->|"switch"| yml
    yml -->|"load config"| load
    load -->|"write"| cmdb
    tocmdb -->|"switch"| sh
    sh --> cmdb

    yml --> ansible
    cmdb --> ansible

Data Model

The CMDB database schema is defined in files/cmdb.sql, with all objects in the pigsty schema.

Core Tables

| Table | Description | Primary Key |
|-------|-------------|-------------|
| pigsty.group | Cluster/group definitions, corresponds to Ansible groups | cls |
| pigsty.host | Host definitions, belongs to a group | (cls, ip) |
| pigsty.global_var | Global variables, corresponds to all.vars | key |
| pigsty.group_var | Group variables, corresponds to all.children.<cls>.vars | (cls, key) |
| pigsty.host_var | Host variables, host-level variables | (cls, ip, key) |
| pigsty.default_var | Default variable definitions, stores parameter metadata | key |
| pigsty.job | Job records table, records executed tasks | id |

Table Structure Details

Cluster Table pigsty.group

CREATE TABLE pigsty.group (
    cls     TEXT PRIMARY KEY,        -- Cluster name, primary key
    ctime   TIMESTAMPTZ DEFAULT now(), -- Creation time
    mtime   TIMESTAMPTZ DEFAULT now()  -- Modification time
);

Host Table pigsty.host

CREATE TABLE pigsty.host (
    cls    TEXT NOT NULL REFERENCES pigsty.group(cls),  -- Parent cluster
    ip     INET NOT NULL,                               -- Host IP address
    ctime  TIMESTAMPTZ DEFAULT now(),
    mtime  TIMESTAMPTZ DEFAULT now(),
    PRIMARY KEY (cls, ip)
);

Global Variables Table pigsty.global_var

CREATE TABLE pigsty.global_var (
    key   TEXT PRIMARY KEY,           -- Variable name
    value JSONB NULL,                 -- Variable value (JSON format)
    mtime TIMESTAMPTZ DEFAULT now()   -- Modification time
);

Group Variables Table pigsty.group_var

CREATE TABLE pigsty.group_var (
    cls   TEXT NOT NULL REFERENCES pigsty.group(cls),
    key   TEXT NOT NULL,
    value JSONB NULL,
    mtime TIMESTAMPTZ DEFAULT now(),
    PRIMARY KEY (cls, key)
);

Host Variables Table pigsty.host_var

CREATE TABLE pigsty.host_var (
    cls   TEXT NOT NULL,
    ip    INET NOT NULL,
    key   TEXT NOT NULL,
    value JSONB NULL,
    mtime TIMESTAMPTZ DEFAULT now(),
    PRIMARY KEY (cls, ip, key),
    FOREIGN KEY (cls, ip) REFERENCES pigsty.host(cls, ip)
);

Core Views

CMDB provides a series of views for querying and displaying configuration data:

| View | Description |
|------|-------------|
| pigsty.inventory | Core view: Generates Ansible dynamic inventory JSON |
| pigsty.raw_config | Raw configuration in JSON format |
| pigsty.global_config | Global config view, merges defaults and global vars |
| pigsty.group_config | Group config view, includes host list and group vars |
| pigsty.host_config | Host config view, merges group and host-level vars |
| pigsty.pg_cluster | PostgreSQL cluster view |
| pigsty.pg_instance | PostgreSQL instance view |
| pigsty.pg_database | PostgreSQL database definition view |
| pigsty.pg_users | PostgreSQL user definition view |
| pigsty.pg_service | PostgreSQL service definition view |
| pigsty.pg_hba | PostgreSQL HBA rules view |
| pigsty.pg_remote | Remote PostgreSQL instance view |

pigsty.inventory is the core view that converts database configuration data to the JSON format required by Ansible:

SELECT text FROM pigsty.inventory;
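To make the target format concrete, the sketch below assembles rows shaped like the five core tables into the JSON structure Ansible expects from a dynamic inventory script (group entries with hosts and vars, plus _meta.hostvars). It illustrates the output format only; it is not the definition of the pigsty.inventory view, and the function name and argument shapes are hypothetical.

```python
import json


def build_inventory(groups, hosts, global_vars, group_vars, host_vars):
    """Assemble an Ansible dynamic-inventory dict.

    groups:      list of cls names
    hosts:       list of (cls, ip) pairs
    global_vars: {key: value}
    group_vars:  {cls: {key: value}}
    host_vars:   {(cls, ip): {key: value}}
    """
    inventory = {
        "all": {"vars": dict(global_vars), "children": sorted(groups)},
        "_meta": {"hostvars": {}},
    }
    for cls in groups:
        inventory[cls] = {"hosts": [], "vars": dict(group_vars.get(cls, {}))}
    for cls, ip in hosts:
        inventory[cls]["hosts"].append(ip)
        # Ansible keys hostvars by host only, so a host that appears in
        # several groups shares a single variable dictionary.
        merged = inventory["_meta"]["hostvars"].setdefault(ip, {})
        merged.update(host_vars.get((cls, ip), {}))
    return inventory


def render(inventory):
    """Serialize the inventory the way a dynamic inventory script prints it."""
    return json.dumps(inventory, indent=2, sort_keys=True)
```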

Utility Scripts

Pigsty provides three convenience scripts for managing CMDB:

| Script | Function |
|--------|----------|
| bin/inventory_load | Load YAML configuration file into PostgreSQL database |
| bin/inventory_cmdb | Switch configuration source to CMDB (dynamic inventory script) |
| bin/inventory_conf | Switch configuration source to static config file pigsty.yml |

inventory_load

Parse and import YAML configuration file into CMDB:

bin/inventory_load                     # Load default pigsty.yml to default CMDB
bin/inventory_load -p /path/to/conf.yml  # Specify configuration file path
bin/inventory_load -d "postgres://..."   # Specify database connection URL
bin/inventory_load -n myconfig           # Specify configuration name

The script performs the following operations:

  1. Clears existing data in the pigsty schema
  2. Parses the YAML configuration file
  3. Writes global variables to the global_var table
  4. Writes cluster definitions to the group table
  5. Writes cluster variables to the group_var table
  6. Writes host definitions to the host table
  7. Writes host variables to the host_var table

Environment Variables

  • PIGSTY_HOME: Pigsty installation directory, defaults to ~/pigsty
  • METADB_URL: Database connection URL, defaults to service=meta

inventory_cmdb

Switch Ansible to use CMDB as the configuration source:

bin/inventory_cmdb

The script performs the following operations:

  1. Creates dynamic inventory script ${PIGSTY_HOME}/inventory.sh
  2. Modifies ansible.cfg to set inventory to inventory.sh

The generated inventory.sh contents:

#!/bin/bash
psql ${METADB_URL} -AXtwc 'SELECT text FROM pigsty.inventory;'

inventory_conf

Switch back to using static YAML configuration file:

bin/inventory_conf

The script modifies ansible.cfg to set inventory back to pigsty.yml.


Usage Workflow

First-time CMDB Setup

  1. Initialize CMDB schema (usually done automatically during Pigsty installation):
psql -f ~/pigsty/files/cmdb.sql
  2. Load configuration to database:
bin/inventory_load
  3. Switch to CMDB mode:
bin/inventory_cmdb
  4. Verify configuration:
ansible all --list-hosts          # List all hosts
ansible-inventory --list          # View complete inventory

Query Configuration

After enabling CMDB, you can flexibly query configuration using SQL:

-- View all clusters
SELECT cls FROM pigsty.group;

-- View all hosts in a cluster
SELECT ip FROM pigsty.host WHERE cls = 'pg-meta';

-- View global variables
SELECT key, value FROM pigsty.global_var;

-- View cluster variables
SELECT key, value FROM pigsty.group_var WHERE cls = 'pg-meta';

-- View all PostgreSQL clusters
SELECT cls, name, pg_databases, pg_users FROM pigsty.pg_cluster;

-- View all PostgreSQL instances
SELECT cls, ins, ip, seq, role FROM pigsty.pg_instance;

-- View all database definitions
SELECT cls, datname, owner, encoding FROM pigsty.pg_database;

-- View all user definitions
SELECT cls, name, login, superuser FROM pigsty.pg_users;

Modify Configuration

You can modify configuration directly via SQL:

-- Add new cluster
INSERT INTO pigsty.group (cls) VALUES ('pg-new');

-- Add cluster variable
INSERT INTO pigsty.group_var (cls, key, value)
VALUES ('pg-new', 'pg_cluster', '"pg-new"');

-- Add host
INSERT INTO pigsty.host (cls, ip) VALUES ('pg-new', '10.10.10.20');

-- Add host variables
INSERT INTO pigsty.host_var (cls, ip, key, value)
VALUES ('pg-new', '10.10.10.20', 'pg_seq', '1'),
       ('pg-new', '10.10.10.20', 'pg_role', '"primary"');

-- Modify global variable
UPDATE pigsty.global_var SET value = '"new-value"' WHERE key = 'some_param';

-- Delete cluster (also removes its hosts and variables, provided the foreign keys are declared ON DELETE CASCADE)
DELETE FROM pigsty.group WHERE cls = 'pg-old';

Changes are visible to Ansible the next time it reads the inventory, without reloading or restarting any service.

Switch Back to Static Configuration

To switch back to static configuration file mode:

bin/inventory_conf

Advanced Usage

Export Configuration

Export CMDB configuration in JSON format:

psql service=meta -AXtwc "SELECT jsonb_pretty(jsonb_build_object('all', jsonb_build_object('children', children, 'vars', vars))) FROM pigsty.raw_config;"

Or use the ansible-inventory command to export it as YAML:

ansible-inventory --list --yaml > exported_config.yml

Configuration Auditing

Track configuration changes using the mtime field:

-- View recently modified global variables
SELECT key, value, mtime FROM pigsty.global_var
ORDER BY mtime DESC LIMIT 10;

-- View changes after a specific time
SELECT * FROM pigsty.group_var
WHERE mtime > '2024-01-01'::timestamptz;

Integration with External Systems

CMDB uses standard PostgreSQL, making it easy to integrate with other systems:

  • Web Management Interface: Expose configuration data through REST API (e.g., PostgREST)
  • CI/CD Pipelines: Read/write database directly in deployment scripts
  • Monitoring & Alerting: Generate monitoring rules based on configuration data
  • ITSM Systems: Sync with enterprise CMDB systems

Considerations

  1. Data Consistency: After modifying configuration, you need to re-run the corresponding Ansible playbooks to apply changes to the actual environment

  2. Backup: Configuration data in CMDB is critical, ensure regular backups

  3. Permissions: Configure appropriate database access permissions for CMDB to avoid accidental modifications

  4. Transactions: When making batch configuration changes, perform them within a transaction for rollback on errors

  5. Connection Pooling: The inventory.sh script creates a new connection on each execution; if Ansible runs frequently, consider using connection pooling


Summary

CMDB is Pigsty’s advanced configuration management solution, suitable for scenarios requiring large-scale cluster management, complex queries, external integration, or fine-grained access control. By storing configuration data in PostgreSQL, you can fully leverage the database’s powerful capabilities to manage infrastructure configuration.

| Feature | Description |
|---------|-------------|
| Storage | PostgreSQL pigsty schema |
| Dynamic Inventory | inventory.sh script |
| Config Load | bin/inventory_load |
| Switch to CMDB | bin/inventory_cmdb |
| Switch to YAML | bin/inventory_conf |
| Core View | pigsty.inventory |

4 - High Availability

Pigsty uses Patroni to implement PostgreSQL high availability, ensuring automatic failover when the primary becomes unavailable.

Overview

Pigsty’s PostgreSQL clusters come with out-of-the-box high availability, powered by Patroni, Etcd, and HAProxy.

When your PostgreSQL cluster has two or more instances, you automatically have self-healing database high availability without any additional configuration — as long as any instance in the cluster survives, the cluster can provide complete service. Clients only need to connect to any node in the cluster to get full service without worrying about primary-replica topology changes.

With default configuration, the primary failure Recovery Time Objective (RTO) ≈ 30s, and Recovery Point Objective (RPO) < 1MB; for replica failures, RPO = 0 and RTO ≈ 0 (brief interruption). In consistency-first mode, failover can guarantee zero data loss: RPO = 0. All these metrics can be configured as needed based on your actual hardware conditions and reliability requirements.

Pigsty includes built-in HAProxy load balancers for automatic traffic switching and offers DNS, VIP, LVS, and other access methods for clients. Apart from a brief interruption, failover and switchover are almost transparent to the business side: applications do not need to modify connection strings or restart. Because maintenance windows can be kept minimal, you can perform rolling maintenance and upgrades on the entire cluster without application coordination. And because hardware failures can wait until the next day to be handled, developers, operations staff, and DBAs can sleep well during incidents.

pigsty-ha

Many large organizations and core institutions have been using Pigsty in production for extended periods. The largest deployment has 25K CPU cores and 220+ PostgreSQL ultra-large instances (64c / 512g / 3TB NVMe SSD). In this deployment case, dozens of hardware failures and various incidents occurred over five years, yet overall availability of over 99.999% was maintained.
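To put an availability figure such as 99.999% in perspective, it can be converted into a downtime budget. The sketch below is plain arithmetic, not a measurement from any deployment; the function names are made up for illustration, and the 30-second failover duration is taken from the default RTO quoted above.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # average year length, including leap years


def downtime_budget_seconds(availability: float, years: float = 1.0) -> float:
    """Maximum cumulative downtime permitted by an availability target."""
    return (1.0 - availability) * SECONDS_PER_YEAR * years


def failovers_within_budget(availability: float, rto_seconds: float, years: float = 1.0) -> int:
    """How many failovers of `rto_seconds` each fit into the downtime budget."""
    return int(downtime_budget_seconds(availability, years) // rto_seconds)


if __name__ == "__main__":
    per_year = downtime_budget_seconds(0.99999)
    five_years = downtime_budget_seconds(0.99999, years=5)
    print(f"99.999% allows about {per_year / 60:.2f} minutes of downtime per year")
    print(f"99.999% allows about {five_years / 60:.1f} minutes over five years")
    print(f"30s failovers per year within budget: {failovers_within_budget(0.99999, 30)}")
```

Under these assumptions, "five nines" leaves room for only a handful of 30-second failovers per year, which is why automatic failover matters for this availability class.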


What problems does High Availability solve?

  • Elevates availability, the A in the C/I/A data security triad, to a new level: RPO ≈ 0, RTO < 30s.
  • Gains seamless rolling maintenance capability, minimizing maintenance window requirements and bringing great convenience.
  • Hardware failures can self-heal immediately without human intervention, allowing operations and DBAs to sleep well.
  • Replicas can handle read-only requests, offloading primary load and fully utilizing resources.

What are the costs of High Availability?

  • Infrastructure dependency: HA requires DCS (etcd/zk/consul) for consensus.
  • Higher starting threshold: A meaningful HA deployment requires at least three nodes.
  • Extra resource consumption: Each new replica consumes additional resources, though this is usually not a major concern.
  • Significantly increased complexity: Backup costs increase significantly, requiring tools to manage complexity.

Limitations of High Availability

Since replication happens in real-time, all changes are immediately applied to replicas. Therefore, streaming replication-based HA solutions cannot handle data deletion or modification caused by human errors and software defects (e.g., DROP TABLE or DELETE). Such failures require using delayed clusters or performing point-in-time recovery using previous base backups and WAL archives.

| Configuration Strategy | RTO | RPO |
|---|---|---|
| Standalone + Nothing | Data permanently lost, unrecoverable | All data lost |
| Standalone + Base Backup | Depends on backup size and bandwidth (hours) | Lose data since last backup (hours to days) |
| Standalone + Base Backup + WAL Archive | Depends on backup size and bandwidth (hours) | Lose unarchived data (tens of MB) |
| Primary-Replica + Manual Failover | ~10 minutes | Lose data in replication lag (~100KB) |
| Primary-Replica + Auto Failover | Within 1 minute | Lose data in replication lag (~100KB) |
| Primary-Replica + Auto Failover + Sync Commit | Within 1 minute | No data loss |

How It Works

In Pigsty, the high availability architecture works as follows:

  • PostgreSQL uses standard streaming replication to build physical replicas; replicas take over when the primary fails.
  • Patroni manages PostgreSQL server processes and handles high availability matters.
  • Etcd provides distributed configuration storage (DCS) capability and is used for leader election after failures.
  • Patroni relies on Etcd to reach cluster leader consensus and provides health check interfaces externally.
  • HAProxy exposes cluster services externally and uses Patroni health check interfaces to automatically distribute traffic to healthy nodes.
  • vip-manager provides an optional Layer 2 VIP, retrieves leader information from Etcd, and binds the VIP to the node where the cluster primary resides.

When the primary fails, a new round of leader election is triggered. The healthiest replica in the cluster (highest LSN position, minimum data loss) wins and is promoted to the new primary. After the winning replica is promoted, read-write traffic is immediately routed to the new primary. The impact of primary failure is brief write service unavailability: write requests will be blocked or fail directly from primary failure until new primary promotion, with unavailability typically lasting 15 to 30 seconds, usually not exceeding 1 minute.

When a replica fails, read-only traffic is routed to other replicas. Only when all replicas fail will read-only traffic ultimately be handled by the primary. The impact of replica failure is partial read-only query interruption: queries currently running on that replica will abort due to connection reset and be immediately taken over by other available replicas.
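The routing behaviour described in the two paragraphs above can be sketched as a small function. This is a simplified model for illustration only: the `Member` type and `backends` function are hypothetical names, and the real routing is done by HAProxy using Patroni health checks, not by code like this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Member:
    name: str
    role: str      # "primary" or "replica"
    healthy: bool  # result of the health check for this instance


def backends(service: str, members: list[Member]) -> list[str]:
    """Return the instances that should receive traffic for a service."""
    alive = [m for m in members if m.healthy]
    primaries = [m.name for m in alive if m.role == "primary"]
    replicas = [m.name for m in alive if m.role == "replica"]
    if service == "primary":
        return primaries              # writes only ever go to the primary
    if service == "replica":
        return replicas or primaries  # fall back to the primary if no replica is left
    raise ValueError(f"unknown service: {service}")


cluster = [
    Member("pg-test-1", "primary", True),
    Member("pg-test-2", "replica", True),
    Member("pg-test-3", "replica", False),  # a failed replica
]
print(backends("primary", cluster))  # read-write traffic
print(backends("replica", cluster))  # read-only traffic skips the failed replica
```

In this model, losing one replica only shrinks the read-only pool; the primary starts serving read-only traffic only once every replica is unhealthy.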

Failure detection is performed jointly by Patroni and Etcd. The cluster leader holds a lease; if the cluster leader fails to renew the lease in time (10s) due to failure, the lease is released, triggering a Failover and new cluster election.

Even without any failures, you can proactively change the cluster primary through Switchover. In this case, write queries on the primary will experience a brief interruption and be immediately routed to the new primary. This operation is typically used for rolling maintenance/upgrades of database servers.


Tradeoffs

Recovery Time Objective (RTO) and Recovery Point Objective (RPO) are two parameters that require careful tradeoffs when designing high availability clusters.

The default RTO and RPO values used by Pigsty meet reliability requirements for most scenarios. You can adjust them based on your hardware level, network quality, and business requirements.

The upper limit of unavailability during failover is controlled by the pg_rto parameter. RTO defaults to 30s. Increasing it will result in longer primary failure write unavailability, while decreasing it will increase the rate of false positive failovers (e.g., repeated switching due to brief network jitter).

The upper limit of potential data loss is controlled by the pg_rpo parameter, defaulting to 1MB. Reducing this value can lower the data loss ceiling during failover but also increases the probability of refusing automatic failover when replicas are not healthy enough (lagging too far behind).

Pigsty uses availability-first mode by default, meaning it will failover as quickly as possible when the primary fails, and data not yet replicated to replicas may be lost (under typical 10GbE networks, replication lag is usually a few KB to 100KB).

If you need to guarantee zero data loss during failover, you can use the crit.yml template, at the cost of some performance.


pg_rto

Parameter name: pg_rto, Type: int, Level: C

Recovery Time Objective (RTO) in seconds. This is used to calculate Patroni’s TTL value, defaulting to 30 seconds.

If the primary instance is missing for this long, a new leader election will be triggered. Lower is not always better; the value involves a tradeoff. Reducing it shortens the window of write unavailability during failover, but makes the cluster more sensitive to short-term network jitter and increases the probability of false-positive failovers. Configure this value based on network conditions and business constraints, trading off failure probability against failure impact.
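A toy model can make the pg_rto tradeoff concrete. It assumes, as a deliberate simplification, that a failover happens whenever the leader cannot renew its lease for longer than the TTL; the jitter durations are invented sample data, and real Patroni timing also depends on its loop and retry settings.

```python
def failovers_triggered(outages_s: list[float], ttl_s: float) -> int:
    """Count interruptions that outlast the leader lease and would trigger a failover."""
    return sum(1 for outage in outages_s if outage > ttl_s)


# Hypothetical durations (seconds) of transient network interruptions,
# after each of which the old primary would have recovered on its own.
jitter = [2, 5, 12, 18, 45]

for ttl in (10, 30, 60):
    count = failovers_triggered(jitter, ttl)
    print(f"pg_rto={ttl:>2}s  unnecessary failovers={count}  "
          f"write outage on a real failure: up to ~{ttl}s")
```

A shorter TTL reacts faster to a genuine primary failure but turns more transient interruptions into failovers; a longer TTL does the opposite.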

pg_rpo

Parameter name: pg_rpo, Type: int, Level: C

Recovery Point Objective (RPO) in bytes, default: 1048576.

Defaults to 1MiB, meaning up to 1MiB of data loss can be tolerated during failover.

When the primary goes down and all replicas are lagging, you must make a difficult choice: either promote a replica immediately, accepting a bounded amount of data loss (e.g., less than 1MB) to restore service as quickly as possible; or wait for the primary to come back online (which may never happen) to avoid any data loss; or abandon automatic failover and wait for human intervention to make the final decision. Configure this value based on business preference, trading off availability against consistency.
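The choice described above can be sketched as a candidate filter: replicas lagging more than the RPO threshold are not eligible, and among the rest the one with the least lag wins. This is an illustrative model, not Patroni's actual election code; `pick_candidate` and the sample LSN values are hypothetical, while the LSN arithmetic follows PostgreSQL's `hi/lo` hexadecimal notation.

```python
from typing import Optional, Tuple


def parse_lsn(lsn: str) -> int:
    """Convert a PostgreSQL LSN such as '0/7C82CB8' to a byte position."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def pick_candidate(last_primary_lsn: str, replicas: dict[str, str],
                   max_lag_bytes: int = 1048576) -> Tuple[Optional[str], Optional[int]]:
    """Return (name, lag) of the least-lagging replica within the RPO, or (None, None)."""
    best_name, best_lag = None, None
    for name, lsn in replicas.items():
        lag = parse_lsn(last_primary_lsn) - parse_lsn(lsn)
        if lag <= max_lag_bytes and (best_lag is None or lag < best_lag):
            best_name, best_lag = name, lag
    return best_name, best_lag


# Hypothetical replay positions at the moment the primary disappears.
replicas = {"pg-test-2": "0/7C80000", "pg-test-3": "0/7A00000"}
print(pick_candidate("0/7C82CB8", replicas))                    # within the 1 MiB default
print(pick_candidate("0/7C82CB8", {"pg-test-3": "0/7A00000"}))  # too far behind: no automatic failover
```

With the default 1 MiB threshold, a replica roughly 11 KB behind is promoted, while a cluster whose only replica is about 2.5 MB behind refuses automatic failover and waits for the primary or an operator.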

Additionally, you can always ensure RPO = 0 by enabling synchronous commit (e.g., using the crit.yml template), sacrificing some cluster latency/throughput performance to guarantee data consistency.

4.1 - Service Access

Pigsty uses HAProxy to provide service access, with optional pgBouncer for connection pooling, and optional L2 VIP and DNS access.

Split read and write operations, route traffic correctly, and deliver PostgreSQL cluster capabilities reliably.

Service is an abstraction: it represents the form in which database clusters expose their capabilities externally, encapsulating underlying cluster details.

Services are crucial for stable access in production environments, showing their value during automatic failover in high availability clusters. Personal users typically don’t need to worry about this concept.


Personal Users

The concept of “service” is for production environments. Personal users with single-node clusters can skip the complexity and directly use instance names or IP addresses to access the database.

For example, Pigsty’s default single-node pg-meta.meta database can be connected directly using three different users:

psql postgres://dbuser_dba:DBUser.DBA@10.10.10.10/meta     # Connect directly with DBA superuser
psql postgres://dbuser_meta:DBUser.Meta@10.10.10.10/meta   # Connect with default business admin user
psql postgres://dbuser_view:DBUser.View@pg-meta/meta       # Connect with default read-only user via instance domain name

Service Overview

In real-world production environments, we use primary-replica database clusters based on replication. Within a cluster, one and only one instance serves as the leader (primary) that can accept writes. Other instances (replicas) continuously fetch change logs from the cluster leader to stay synchronized. Replicas can also handle read-only requests, significantly offloading the primary in read-heavy, write-light scenarios. Therefore, distinguishing write requests from read-only requests is a common practice.

Additionally, for production environments with high-frequency, short-lived connections, we pool requests through connection pool middleware (Pgbouncer) to reduce connection and backend process creation overhead. However, for scenarios like ETL and change execution, we need to bypass the connection pool and directly access the database. Meanwhile, high-availability clusters may undergo failover during failures, causing cluster leadership changes. Therefore, high-availability database solutions require write traffic to automatically adapt to cluster leadership changes. These varying access needs (read-write separation, pooled vs. direct connections, failover auto-adaptation) ultimately lead to the abstraction of the Service concept.

Typically, database clusters must provide this most basic service:

  • Read-write service (primary): Can read from and write to the database

For production database clusters, at least these two services should be provided:

  • Read-write service (primary): Write data: Can only be served by the primary.
  • Read-only service (replica): Read data: Can be served by replicas; falls back to primary when no replicas are available

Additionally, depending on specific business scenarios, there may be other services, such as:

  • Default direct service (default): Allows (admin) users to bypass the connection pool and directly access the database
  • Offline replica service (offline): Dedicated replica not serving online read traffic, used for ETL and analytical queries
  • Sync replica service (standby): Read-only service with no replication delay, handled by synchronous standby/primary for read queries
  • Delayed replica service (delayed): Access data from the same cluster as it was some time ago, handled by delayed replicas

Access Services

Pigsty’s service delivery boundary stops at the cluster’s HAProxy. Users can access these load balancers through various means.

The typical approach is to use DNS or VIP access, binding them to all or any number of load balancers in the cluster.

pigsty-access.jpg

You can use different host & port combinations, which provide PostgreSQL service in different ways.

Host

| Type | Sample | Description |
|---|---|---|
| Cluster Domain Name | pg-test | Access via cluster domain name (resolved by dnsmasq @ infra nodes) |
| Cluster VIP Address | 10.10.10.3 | Access via L2 VIP address managed by vip-manager, bound to primary node |
| Instance Hostname | pg-test-1 | Access via any instance hostname (resolved by dnsmasq @ infra nodes) |
| Instance IP Address | 10.10.10.11 | Access any instance’s IP address |

Port

Pigsty uses different ports to distinguish pg services

| Port | Service | Type | Description |
|---|---|---|---|
| 5432 | postgres | Database | Direct access to postgres server |
| 6432 | pgbouncer | Middleware | Access postgres through connection pool middleware |
| 5433 | primary | Service | Access primary pgbouncer (or postgres) |
| 5434 | replica | Service | Access replica pgbouncer (or postgres) |
| 5436 | default | Service | Access primary postgres |
| 5438 | offline | Service | Access offline postgres |

Combinations

# Access via cluster domain
postgres://test@pg-test:5432/test # DNS -> L2 VIP -> primary direct connection
postgres://test@pg-test:6432/test # DNS -> L2 VIP -> primary connection pool -> primary
postgres://test@pg-test:5433/test # DNS -> L2 VIP -> HAProxy -> primary connection pool -> primary
postgres://test@pg-test:5434/test # DNS -> L2 VIP -> HAProxy -> replica connection pool -> replica
postgres://dbuser_dba@pg-test:5436/test # DNS -> L2 VIP -> HAProxy -> primary direct connection (for admin)
postgres://dbuser_stats@pg-test:5438/test # DNS -> L2 VIP -> HAProxy -> offline direct connection (for ETL/personal queries)

# Access via cluster VIP directly
postgres://test@10.10.10.3:5432/test # L2 VIP -> primary direct access
postgres://test@10.10.10.3:6432/test # L2 VIP -> primary connection pool -> primary
postgres://test@10.10.10.3:5433/test # L2 VIP -> HAProxy -> primary connection pool -> primary
postgres://test@10.10.10.3:5434/test # L2 VIP -> HAProxy -> replica connection pool -> replica
postgres://dbuser_dba@10.10.10.3:5436/test # L2 VIP -> HAProxy -> primary direct connection (for admin)
postgres://dbuser_stats@10.10.10.3:5438/test # L2 VIP -> HAProxy -> offline direct connection (for ETL/personal queries)

# Directly specify any cluster instance name
postgres://test@pg-test-1:5432/test # DNS -> database instance direct connection (singleton access)
postgres://test@pg-test-1:6432/test # DNS -> connection pool -> database
postgres://test@pg-test-1:5433/test # DNS -> HAProxy -> connection pool -> database read/write
postgres://test@pg-test-1:5434/test # DNS -> HAProxy -> connection pool -> database read-only
postgres://dbuser_dba@pg-test-1:5436/test # DNS -> HAProxy -> database direct connection
postgres://dbuser_stats@pg-test-1:5438/test # DNS -> HAProxy -> database offline read/write

# Directly specify any cluster instance IP access
postgres://test@10.10.10.11:5432/test # Database instance direct connection (directly specify instance, no automatic traffic distribution)
postgres://test@10.10.10.11:6432/test # Connection pool -> database
postgres://test@10.10.10.11:5433/test # HAProxy -> connection pool -> database read/write
postgres://test@10.10.10.11:5434/test # HAProxy -> connection pool -> database read-only
postgres://dbuser_dba@10.10.10.11:5436/test # HAProxy -> database direct connection
postgres://dbuser_stats@10.10.10.11:5438/test # HAProxy -> database offline read-write

# Smart client: read/write separation via URL
postgres://test@10.10.10.11:6432,10.10.10.12:6432,10.10.10.13:6432/test?target_session_attrs=primary
postgres://test@10.10.10.11:6432,10.10.10.12:6432,10.10.10.13:6432/test?target_session_attrs=prefer-standby
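The host and port combinations above follow a regular pattern, so connection strings can be generated rather than hand-written. The sketch below only assembles strings from the port table in this section; the helper names `service_url` and `multi_host_url` are hypothetical, and the user and database default to the `test` examples used above.

```python
# Port numbers as listed in the Port table of this section.
SERVICE_PORTS = {
    "postgres": 5432,   # direct access to the postgres server
    "pgbouncer": 6432,  # connection pool middleware
    "primary": 5433,    # read-write service
    "replica": 5434,    # read-only service
    "default": 5436,    # direct connection to the primary
    "offline": 5438,    # offline instance for ETL/analytics
}


def service_url(host: str, service: str, user: str = "test", database: str = "test") -> str:
    """Build a connection URL for one host and one named service."""
    return f"postgres://{user}@{host}:{SERVICE_PORTS[service]}/{database}"


def multi_host_url(hosts: list[str], port: int, target_session_attrs: str,
                   user: str = "test", database: str = "test") -> str:
    """Build a multi-host URL and let the client library pick a suitable node."""
    host_list = ",".join(f"{host}:{port}" for host in hosts)
    return f"postgres://{user}@{host_list}/{database}?target_session_attrs={target_session_attrs}"


print(service_url("pg-test", "primary"))
print(service_url("10.10.10.3", "offline", user="dbuser_stats"))
print(multi_host_url(["10.10.10.11", "10.10.10.12", "10.10.10.13"], 6432, "prefer-standby"))
```

Keeping the port mapping in one place makes it harder to mix up the pooled services (5433, 5434) with the direct ones (5436, 5438).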

5 - Point-in-Time Recovery

Pigsty uses pgBackRest to implement PostgreSQL point-in-time recovery, allowing users to roll back to any point in time within the backup policy window.

Overview

You can restore and roll back your cluster to any point in the past, avoiding data loss caused by software defects and human errors.

Pigsty’s PostgreSQL clusters come with auto-configured Point-in-Time Recovery (PITR) capability, powered by the backup component pgBackRest and optional object storage repository MinIO.

High availability solutions can address hardware failures but are powerless against data deletion/overwriting/database drops caused by software defects and human errors. For such situations, Pigsty provides out-of-the-box Point-in-Time Recovery (PITR) capability, enabled by default without additional configuration.

Pigsty provides default configurations for base backups and WAL archiving. Backups can be stored on local directories and disks, or on a dedicated MinIO cluster or S3 object storage service for geo-redundant disaster recovery. With local disks, the default policy retains the ability to recover to any point within the past day. With MinIO or S3, the default policy retains the ability to recover to any point within the past week. The recoverable window can be extended arbitrarily, limited only by storage space and budget.


What problems does PITR solve?

  • Enhanced disaster recovery: RPO drops from ∞ to tens of MB, RTO drops from ∞ to hours/minutes.
  • Ensures data integrity (the I in C/I/A): avoids data consistency issues caused by accidental deletion.
  • Ensures data availability (the A in C/I/A): provides a fallback for “permanently unavailable” disaster scenarios.

| Standalone Configuration Strategy | Event | RTO | RPO |
|---|---|---|---|
| Nothing | Crash | Permanently lost | All lost |
| Base Backup | Crash | Depends on backup size and bandwidth (hours) | Lose data since last backup (hours to days) |
| Base Backup + WAL Archive | Crash | Depends on backup size and bandwidth (hours) | Lose unarchived data (tens of MB) |

What are the costs of PITR?

  • Reduces confidentiality (the C in C/I/A): backups create additional leak points and require additional protection.
  • Extra resource consumption: Local storage or network traffic/bandwidth overhead, usually not a concern.
  • Increased complexity: Users need to pay backup management costs.

Limitations of PITR

If only PITR is used for failure recovery, RTO and RPO metrics are inferior compared to high availability solutions, and typically both should be used together.

  • RTO: With only standalone + PITR, recovery time depends on backup size and network/disk bandwidth, ranging from tens of minutes to hours or days.
  • RPO: With only standalone + PITR, some data may be lost during crashes - one or several WAL segment files may not yet be archived, losing 16 MB to tens of MB of data.

Besides PITR, you can also use delayed clusters in Pigsty to address data deletion/modification caused by human errors or software defects.


How It Works

Point-in-time recovery allows you to restore and roll back your cluster to “any point” in the past, avoiding data loss caused by software defects and human errors. To achieve this, two preparations are needed: Base Backup and WAL Archiving. Having a base backup allows users to restore the database to its state at backup time, while having WAL archives starting from a base backup allows users to restore the database to any point after the base backup time.
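The relationship between base backups, WAL archives, and the recovery target can be sketched as a selection rule: use the latest base backup completed at or before the target, then replay archived WAL up to the target. This is a conceptual model with invented timestamps and a hypothetical `choose_base_backup` helper; pgBackRest performs this selection itself during a real restore.

```python
from datetime import datetime


def choose_base_backup(backup_times: list[datetime], wal_archived_until: datetime,
                       target: datetime) -> datetime:
    """Pick the base backup from which WAL replay can reach the target time."""
    if target > wal_archived_until:
        raise ValueError("target is beyond the end of the archived WAL")
    candidates = [t for t in backup_times if t <= target]
    if not candidates:
        raise ValueError("target predates the oldest base backup")
    return max(candidates)  # the latest backup minimizes the amount of WAL to replay


# Hypothetical daily backups completed at 01:00 and WAL archived until 15:00.
backup_times = [datetime(2022, 12, 28, 1), datetime(2022, 12, 29, 1), datetime(2022, 12, 30, 1)]
wal_archived_until = datetime(2022, 12, 30, 15)

target = datetime(2022, 12, 30, 14, 44, 44)
print("restore from backup taken at", choose_base_backup(backup_times, wal_archived_until, target))
```

The two error cases mark the edges of the recoverable window: nothing before the oldest retained base backup, and nothing after the last archived WAL segment.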

fig-10-02.png

For detailed principles, see: Base Backup and Point-in-Time Recovery; for specific operations, refer to PGSQL Admin: Backup and Recovery.

Base Backup

Pigsty uses pgBackRest to manage PostgreSQL backups. pgBackRest initializes empty repositories on all cluster instances but only actually uses the repository on the cluster primary.

pgBackRest supports three backup modes: full, incremental, and differential, with the first two being most commonly used. A full backup takes a complete physical snapshot of the database cluster at the current moment; an incremental backup records only the changes since the most recent backup of any type, while a differential backup records the changes since the most recent full backup.

Pigsty provides a wrapper command for backups: /pg/bin/pg-backup [full|incr]. You can schedule regular base backups as needed through Crontab or any other task scheduling system.

WAL Archiving

Pigsty enables WAL archiving on the cluster primary by default and uses the pgbackrest command-line tool to continuously push WAL segment files to the backup repository.

pgBackRest automatically manages required WAL files and timely cleans up expired backups and their corresponding WAL archive files based on the backup retention policy.

If you don’t need PITR functionality, you can disable WAL archiving by configuring the cluster: archive_mode: off and remove node_crontab to stop scheduled backup tasks.


Implementation

Pigsty provides two preset backup strategies. The default uses a local filesystem backup repository and performs one full backup daily, so users can roll back to any point within the past day. The alternative uses a dedicated MinIO cluster or S3 storage, with weekly full backups, daily incremental backups, and two weeks of backup and WAL archive retention by default.

Pigsty uses pgBackRest to manage backups, receive WAL archives, and perform PITR. Backup repositories can be flexibly configured (pgbackrest_repo): defaults to primary’s local filesystem (local), but can also use other disk paths, or the included optional MinIO service (minio) and cloud S3 services.

pgbackrest_enabled: true          # enable pgBackRest on pgsql host?
pgbackrest_clean: true            # remove pg backup data during init?
pgbackrest_log_dir: /pg/log/pgbackrest # pgbackrest log dir, `/pg/log/pgbackrest` by default
pgbackrest_method: local          # pgbackrest repo method: local, minio, [user-defined...]
pgbackrest_repo:                  # pgbackrest repo: https://pgbackrest.org/configuration.html#section-repository
  local:                          # default pgbackrest repo with local posix fs
    path: /pg/backup              # local backup directory, `/pg/backup` by default
    retention_full_type: count    # retention full backup by count
    retention_full: 2             # keep at most 3 full backup, at least 2, when using local fs repo
  minio:                          # optional minio repo for pgbackrest
    type: s3                      # minio is s3-compatible, so use s3
    s3_endpoint: sss.pigsty       # minio endpoint domain name, `sss.pigsty` by default
    s3_region: us-east-1          # minio region, us-east-1 by default, not used for minio
    s3_bucket: pgsql              # minio bucket name, `pgsql` by default
    s3_key: pgbackrest            # minio user access key for pgbackrest
    s3_key_secret: S3User.Backup  # minio user secret key for pgbackrest
    s3_uri_style: path            # use path style uri for minio rather than host style
    path: /pgbackrest             # minio backup path, `/pgbackrest` by default
    storage_port: 9000            # minio port, 9000 by default
    storage_ca_file: /etc/pki/ca.crt  # minio ca file path, `/etc/pki/ca.crt` by default
    bundle: y                     # bundle small files into a single file
    cipher_type: aes-256-cbc      # enable AES encryption for remote backup repo
    cipher_pass: pgBackRest       # AES encryption password, default is 'pgBackRest'
    retention_full_type: time     # retention full backup by time on minio repo
    retention_full: 14            # keep full backup for last 14 days
  # You can also add other optional backup repos, such as S3, for geo-redundant disaster recovery

Each target repository defined in the pgbackrest_repo parameter is converted to a repository definition in the /etc/pgbackrest/pgbackrest.conf configuration file. For example, to define a US West S3 repository for storing cold backups, you can use the following reference configuration.

s3:    # ------> /etc/pgbackrest/pgbackrest.conf
  repo1-type: s3                                   # ----> repo1-type=s3
  repo1-s3-region: us-west-1                       # ----> repo1-s3-region=us-west-1
  repo1-s3-endpoint: s3-us-west-1.amazonaws.com    # ----> repo1-s3-endpoint=s3-us-west-1.amazonaws.com
  repo1-s3-key: '<your_access_key>'                # ----> repo1-s3-key=<your_access_key>
  repo1-s3-key-secret: '<your_secret_key>'         # ----> repo1-s3-key-secret=<your_secret_key>
  repo1-s3-bucket: pgsql                           # ----> repo1-s3-bucket=pgsql
  repo1-s3-uri-style: host                         # ----> repo1-s3-uri-style=host
  repo1-path: /pgbackrest                          # ----> repo1-path=/pgbackrest
  repo1-bundle: y                                  # ----> repo1-bundle=y
  repo1-cipher-type: aes-256-cbc                   # ----> repo1-cipher-type=aes-256-cbc
  repo1-cipher-pass: pgBackRest                    # ----> repo1-cipher-pass=pgBackRest
  repo1-retention-full-type: time                  # ----> repo1-retention-full-type=time
  repo1-retention-full: 90                         # ----> repo1-retention-full=90
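The conversion from a repository definition to configuration lines can be illustrated with a short sketch. It assumes the mapping is a plain rename (underscores become hyphens and a `repoN-` prefix is added), which matches the option names shown above but is not taken from Pigsty's actual template; `repo_conf_lines` is a hypothetical helper.

```python
def repo_conf_lines(repo: dict, index: int = 1) -> list[str]:
    """Render one repository definition as pgbackrest.conf 'key=value' lines."""
    return [f"repo{index}-{key.replace('_', '-')}={value}" for key, value in repo.items()]


# A trimmed-down repository definition using keys from the examples above.
s3_repo = {
    "type": "s3",
    "s3_region": "us-west-1",
    "s3_bucket": "pgsql",
    "s3_uri_style": "host",
    "path": "/pgbackrest",
    "retention_full_type": "time",
    "retention_full": 90,
}

print("\n".join(repo_conf_lines(s3_repo)))
```

Passing a different `index` would produce `repo2-...` options, which is how pgBackRest distinguishes multiple repositories in one configuration file.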

Recovery

You can directly use the following wrapper commands for PostgreSQL database cluster point-in-time recovery.

Pigsty uses incremental differential parallel recovery by default, allowing you to recover to a specified point in time at maximum speed.

pg-pitr                                 # Restore to the end of WAL archive stream (e.g., for entire datacenter failure)
pg-pitr -i                              # Restore to the most recent backup completion time (rarely used)
pg-pitr --time="2022-12-30 14:44:44+08" # Restore to a specified point in time (for database or table drops)
pg-pitr --name="my-restore-point"       # Restore to a named restore point created with pg_create_restore_point
pg-pitr --lsn="0/7C82CB8" -X            # Restore to immediately before the LSN
pg-pitr --xid="1234567" -X -P           # Restore to immediately before the specified transaction ID, then promote cluster to primary
pg-pitr --backup=latest                 # Restore to the latest backup set
pg-pitr --backup=20221108-105325F       # Restore to a specific backup set, backup sets can be listed with pgbackrest info

pg-pitr                                 # pgbackrest --stanza=pg-meta restore
pg-pitr -i                              # pgbackrest --stanza=pg-meta --type=immediate restore
pg-pitr -t "2022-12-30 14:44:44+08"     # pgbackrest --stanza=pg-meta --type=time --target="2022-12-30 14:44:44+08" restore
pg-pitr -n "my-restore-point"           # pgbackrest --stanza=pg-meta --type=name --target=my-restore-point restore
pg-pitr -b 20221108-105325F             # pgbackrest --stanza=pg-meta --type=name --set=20221108-105325F restore
pg-pitr -l "0/7C82CB8" -X               # pgbackrest --stanza=pg-meta --type=lsn --target="0/7C82CB8" --target-exclusive restore
pg-pitr -x 1234567 -X -P                # pgbackrest --stanza=pg-meta --type=xid --target="1234567" --target-exclusive --target-action=promote restore
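The option mapping listed above can be expressed as a small command builder. This is a sketch for understanding how the recovery target options combine, not the source of the pg-pitr wrapper; `restore_command` is a hypothetical function, and it only assembles an argument list without running anything.

```python
from typing import Optional


def restore_command(stanza: str, target_type: Optional[str] = None, target: Optional[str] = None,
                    exclusive: bool = False, promote: bool = False,
                    backup_set: Optional[str] = None) -> list[str]:
    """Assemble a pgbackrest restore command line as an argument list."""
    cmd = ["pgbackrest", f"--stanza={stanza}"]
    if backup_set:
        cmd.append(f"--set={backup_set}")        # restore from a specific backup set
    if target_type:
        cmd.append(f"--type={target_type}")      # time, name, lsn, xid, immediate
    if target is not None:
        cmd.append(f"--target={target}")
    if exclusive:
        cmd.append("--target-exclusive")         # stop just before the target
    if promote:
        cmd.append("--target-action=promote")    # promote once the target is reached
    cmd.append("restore")
    return cmd


print(" ".join(restore_command("pg-meta")))
print(" ".join(restore_command("pg-meta", "time", "2022-12-30 14:44:44+08")))
print(" ".join(restore_command("pg-meta", "xid", "1234567", exclusive=True, promote=True)))
```

Building the command as a list keeps targets containing spaces, such as timestamps, as a single argument if the list is later passed to a process launcher.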

When performing PITR, you can use Pigsty’s monitoring system to observe the cluster LSN position status and determine whether recovery to the specified point in time, transaction point, LSN position, or other point was successful.

pitr




6 - Monitoring System

How Pigsty’s monitoring system is architected and how monitored targets are automatically managed.

7 - Security and Compliance

Authentication, access control, encrypted communication, audit logs—meeting SOC2 compliance requirements.

Pigsty’s Security Philosophy

Secure by Default: Out-of-the-box security configuration—basic protection without additional setup.

Progressive Configuration: Enterprise users can gradually enhance security measures through configuration.

Defense in Depth: Multiple security layers—even if one layer is breached, others remain protective.

Least Privilege: Grant users only the minimum permissions needed to complete tasks, reducing risk.


Default Security Configuration

Pigsty enables these security features by default:

| Feature | Default Config | Description |
|---|---|---|
| Password Encryption | scram-sha-256 | PostgreSQL’s most secure password hash algorithm |
| SSL Support | Enabled | Clients can optionally use SSL encrypted connections |
| Local CA | Auto-generated | Self-signed CA issues server certificates |
| HBA Layering | Source-based control | Different auth strength for different sources |
| Role System | Four-tier permissions | Read-only/Read-write/Admin/Offline |
| Data Checksums | Enabled | Detects storage-layer data corruption |
| Audit Logs | Enabled | Records connections and slow queries |

Enhanced Configuration

Additional configuration enables higher security levels:

| Feature | Configuration Method | Security Level |
|---|---|---|
| Password strength check | Enable passwordcheck extension | Enterprise |
| Enforce SSL | HBA uses hostssl | Enterprise |
| Client certificates | HBA uses cert auth | Financial-grade |
| Backup encryption | Configure cipher_type | Compliance |
| Firewall | Configure node_firewall_mode | Infrastructure |

If you only have one minute, remember this diagram:

flowchart TB
    subgraph L1["Layer 1: Network Security"]
        L1A["Firewall + SSL/TLS Encryption + HAProxy Proxy"]
        L1B["Who can connect? Is the connection encrypted?"]
    end

    subgraph L2["Layer 2: Authentication"]
        L2A["HBA Rules + SCRAM-SHA-256 Passwords + Certificate Auth"]
        L2B["Who are you? How do you prove it?"]
    end

    subgraph L3["Layer 3: Access Control"]
        L3A["Role System + Object Permissions + Database Isolation"]
        L3B["What can you do? What data can you access?"]
    end

    subgraph L4["Layer 4: Data Security"]
        L4A["Data Checksums + Backup Encryption + Audit Logs"]
        L4B["Is data intact? Are operations logged?"]
    end

    L1 --> L2 --> L3 --> L4

Core Value: Enterprise-grade security configuration out of the box, best practices enabled by default, additional configuration achieves SOC 2 compliance.


Contents

| Section | Description | Core Question |
|---|---|---|
| Security Overview | Security capability overview and checklist | What’s the overall security architecture? |
| Authentication | HBA rules, password policies, certificate auth | How to verify user identity? |
| Access Control | Role system, permission model, database isolation | How to control user permissions? |
| Encrypted Communication | SSL/TLS, local CA, certificate management | How to protect data in transit? |
| Compliance Checklist | Detailed SOC2 mapping | How to meet compliance requirements? |

Why Security Matters

The Cost of Data Breaches

flowchart LR
    Breach["Data Breach"]

    subgraph Direct["Direct Losses"]
        D1["Regulatory Fines<br/>GDPR up to 4% global revenue"]
        D2["Legal Costs"]
        D3["Customer Compensation"]
    end

    subgraph Indirect["Indirect Losses"]
        I1["Brand Reputation Damage"]
        I2["Customer Trust Loss"]
        I3["Business Disruption"]
    end

    subgraph Compliance["Compliance Risk"]
        C1["Liability"]
        C2["SOC 2: Certification Revocation"]
        C3["Industry Access: Banned from Operating"]
    end

    Breach --> Direct
    Breach --> Indirect
    Breach --> Compliance

Default Users and Passwords

Pigsty creates these system users by default:

| User | Purpose | Default Password | Post-Deploy Action |
|---|---|---|---|
| postgres | System superuser | No password (local only) | Keep passwordless |
| dbuser_dba | Admin user | DBUser.DBA | Must change |
| dbuser_monitor | Monitor user | DBUser.Monitor | Must change |
| replicator | Replication user | DBUser.Replicator | Must change |

# pigsty.yml - Change default passwords
pg_admin_password: 'YourSecurePassword123!'
pg_monitor_password: 'AnotherSecurePass456!'
pg_replication_password: 'ReplicationPass789!'

Important: After production deployment, immediately change these default passwords!
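One way to avoid weak or reused replacements is to generate the passwords randomly. The sketch below uses Python's standard `secrets` module and prints lines for the three parameters shown above; the 24-character length and the alphanumeric alphabet are arbitrary choices for illustration, not Pigsty requirements.

```python
import secrets
import string

ALPHABET = string.ascii_letters + string.digits  # no quoting issues in YAML or URLs
PARAMETERS = ("pg_admin_password", "pg_monitor_password", "pg_replication_password")


def random_password(length: int = 24) -> str:
    """Generate a password from a cryptographically secure random source."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def password_overrides() -> dict[str, str]:
    """Return a fresh random password for each default account parameter."""
    return {name: random_password() for name in PARAMETERS}


for name, password in password_overrides().items():
    print(f"{name}: '{password}'")
```

The printed lines can be pasted into the configuration file; store the generated values in a secrets manager rather than in shell history or chat logs.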


Role and Permission System

Pigsty provides a four-tier role system out of the box:

flowchart TB
    subgraph Admin["dbrole_admin (Admin)"]
        A1["Inherits dbrole_readwrite"]
        A2["Can CREATE/DROP/ALTER objects (DDL)"]
        A3["For: Business admins, apps needing table creation"]
    end

    subgraph RW["dbrole_readwrite (Read-Write)"]
        RW1["Inherits dbrole_readonly"]
        RW2["Can INSERT/UPDATE/DELETE"]
        RW3["For: Production business accounts"]
    end

    subgraph RO["dbrole_readonly (Read-Only)"]
        RO1["Can SELECT all tables"]
        RO2["For: Reporting, data analysis"]
    end

    subgraph Offline["dbrole_offline (Offline)"]
        OFF1["Can only access offline instances"]
        OFF2["For: ETL, personal analysis, slow queries"]
    end

    Admin --> |inherits| RW
    RW --> |inherits| RO

Creating Business Users

pg_users:
  # Read-only user - for reporting
  - name: dbuser_report
    password: ReportUser123
    roles: [dbrole_readonly]
    pgbouncer: true

  # Read-write user - for production
  - name: dbuser_app
    password: AppUser456
    roles: [dbrole_readwrite]
    pgbouncer: true

  # Admin user - for DDL operations
  - name: dbuser_admin
    password: AdminUser789
    roles: [dbrole_admin]
    pgbouncer: true

HBA Access Control

HBA (Host-Based Authentication) controls “who can connect from where”:

flowchart LR
    subgraph Sources["Connection Sources"]
        S1["Local Socket"]
        S2["localhost"]
        S3["Intranet CIDR"]
        S4["Admin Nodes"]
        S5["External"]
    end

    subgraph Auth["Auth Methods"]
        A1["ident/peer<br/>OS user mapping, most secure"]
        A2["scram-sha-256<br/>Password auth"]
        A3["scram-sha-256 + SSL<br/>Enforce SSL"]
    end

    S1 --> A1
    S2 --> A2
    S3 --> A2
    S4 --> A3
    S5 --> A3

    Note["Rules matched in order<br/>First matching rule applies"]
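
The first-match behavior can be illustrated with a small model. This is a simplified sketch, not how PostgreSQL actually parses pg_hba.conf: the rule list, the `match_rule` helper, and the example addresses are all hypothetical.

```python
from ipaddress import ip_address, ip_network

# Hypothetical, simplified rule list: (user, database, CIDR, auth method).
# "all" matches any user or database; the order of entries is significant.
RULES = [
    ("dbuser_app", "mydb", "10.10.10.0/24", "scram-sha-256"),
    ("all",        "all",  "10.0.0.0/8",    "scram-sha-256 + SSL"),
    ("all",        "all",  "0.0.0.0/0",     "reject"),
]

def match_rule(user: str, db: str, client_ip: str) -> str:
    """Return the auth method of the first matching rule, scanning top to bottom."""
    addr = ip_address(client_ip)
    for rule_user, rule_db, cidr, auth in RULES:
        if (rule_user in ("all", user)
                and rule_db in ("all", db)
                and addr in ip_network(cidr)):
            return auth  # first match wins; later rules are never consulted
    return "reject"      # no rule matched: the connection is refused

if __name__ == "__main__":
    print(match_rule("dbuser_app", "mydb", "10.10.10.5"))
    print(match_rule("dbuser_app", "mydb", "10.20.0.1"))
```

Because evaluation stops at the first match, a broad rule placed above a narrow one shadows it. Moving the final reject rule to the top of this list would refuse every connection.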

Custom HBA Rules

pg_hba_rules:
  # Allow app servers from intranet
  - {user: dbuser_app, db: mydb, addr: '10.10.10.0/24', auth: scram-sha-256}

  # Force SSL for certain users
  - {user: admin, db: all, addr: world, auth: ssl}

  # Require certificate auth (highest security)
  - {user: secure_user, db: all, addr: world, auth: cert}

Encrypted Communication

SSL/TLS Architecture

sequenceDiagram
    participant Client as Client
    participant Server as PostgreSQL

    Client->>Server: 1. ClientHello
    Server->>Client: 2. ServerHello
    Server->>Client: 3. Server Certificate
    Client->>Server: 4. Client Key
    Client->>Server: 5. Encrypted Channel Established
    Server->>Client: 5. Encrypted Channel Established

    rect rgb(200, 255, 200)
        Note over Client,Server: Encrypted Data Transfer
        Client->>Server: 6. Application Data (encrypted)
        Server->>Client: 6. Application Data (encrypted)
    end

    Note over Client,Server: Prevents eavesdropping, tampering, verifies server identity

Local CA

Pigsty automatically generates a local CA and issues certificates:

/etc/pki/
├── ca.crt              # CA certificate (public)
├── ca.key              # CA private key (keep secret!)
└── server.crt/key      # Server certificate/key

Important: Securely back up ca.key—if lost, all certificates must be reissued!


Compliance Mapping

SOC 2 Type II

| Control Point | Pigsty Support | Description |
|---|---|---|
| CC6.1 Logical Access Control | Yes | HBA + Role System |
| CC6.6 Transmission Encryption | Yes | SSL/TLS |
| CC7.2 System Monitoring | Yes | Prometheus + Grafana |
| CC9.1 Business Continuity | Yes | HA + PITR |
| A1.2 Data Recovery | Yes | pgBackRest Backup |

Legend: Yes = Satisfied by default · Partial = Needs additional config


Security Checklist

Before Deployment

  • Prepare strong passwords (use password manager)
  • Plan network partitions (intranet/external CIDRs)
  • Decide SSL strategy (self-signed/external CA)

After Deployment (Required)

  • Change all default passwords
  • Verify HBA rules match expectations
  • Test SSL connections work
  • Configure auth failure alerts
  • Securely back up CA private key

Regular Maintenance

  • Audit user permissions
  • Check for expired accounts
  • Update certificates (if needed)
  • Review audit logs

Quick Config Examples

Production Security Configuration

# pigsty.yml - Production security config example
all:
  vars:
    # Change default passwords (required!)
    pg_admin_password: 'SecureDBAPassword2024!'
    pg_monitor_password: 'SecureMonitorPass2024!'
    pg_replication_password: 'SecureReplPass2024!'

    # Enable password strength check
    pg_libs: 'passwordcheck, pg_stat_statements, auto_explain'

    # Custom HBA rules
    pg_hba_rules:
      # App servers
      - {user: app, db: appdb, addr: '10.10.10.0/24', auth: scram-sha-256}
      # Admin enforce SSL
      - {user: dbuser_dba, db: all, addr: world, auth: ssl}

Financial-Grade Security Configuration

# Financial-grade config - enable certificate auth
pg_hba_rules:
  # Trading system uses certificate auth
  - {user: trade_user, db: trade, addr: world, auth: cert}
  # Other systems use SSL + password
  - {user: all, db: all, addr: world, auth: ssl}

# Enable backup encryption
pgbackrest_repo:
  minio:
    cipher_type: aes-256-cbc
    cipher_pass: 'YourBackupEncryptionKey'


7.1 - Local CA

Pigsty includes a self-signed CA PKI infrastructure for issuing SSL certificates and encrypting network traffic.

Pigsty enables security best practices by default: network traffic is encrypted with SSL and web interfaces are served over HTTPS, using certificates issued by a local self-signed CA.

By default, SSL and HTTPS are enabled but not enforced. For environments with higher security requirements, you can enforce SSL and HTTPS usage.


Local CA

During initialization, Pigsty generates a self-signed CA in the Pigsty source directory (~/pigsty) on the ADMIN node. This CA can be used for SSL, HTTPS, digital signatures, issuing database client certificates, and advanced security features.

Each Pigsty deployment uses a unique CA—CAs from different Pigsty deployments are not mutually trusted.

The local CA consists of two files, located in the files/pki/ca directory by default:

  • ca.crt: Self-signed CA root certificate, distributed to all managed nodes for certificate verification.
  • ca.key: CA private key for issuing certificates and verifying CA identity—keep this file secure and prevent leakage!

Using an Existing CA

If you already have your own CA PKI infrastructure, Pigsty can be configured to use your existing CA.

Simply place your CA certificate and private key files in the files/pki/ca directory:

files/pki/ca/ca.key     # Core CA private key file, must exist; if missing, a new one is randomly generated
files/pki/ca/ca.crt     # If certificate file is missing, Pigsty auto-generates a new root certificate from the CA private key

When Pigsty executes the install.yml or infra.yml playbooks, if a ca.key private key file exists in files/pki/ca, the existing CA will be used. Since ca.crt can be generated from the ca.key private key, Pigsty will automatically regenerate the root certificate file if it’s missing.
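
The decision described above can be summarized as a small function. This is only a sketch of the documented behavior, not Pigsty's playbook logic; `plan_ca` is a hypothetical name, and treating a leftover ca.crt as replaced when ca.key is missing is an assumption (a newly generated key cannot match an old certificate).

```python
def plan_ca(key_exists: bool, crt_exists: bool) -> str:
    """Describe what happens for a given state of files/pki/ca (illustrative only)."""
    if not key_exists:
        # Assumption: without ca.key, any existing ca.crt is not reused.
        return "generate new CA"
    if not crt_exists:
        # The root certificate can be derived again from the private key.
        return "reuse ca.key, regenerate ca.crt"
    return "reuse existing CA"

if __name__ == "__main__":
    for key, crt in [(False, False), (True, False), (True, True)]:
        print(f"ca.key={key!s:5} ca.crt={crt!s:5} -> {plan_ca(key, crt)}")
```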


Trusting the CA

During Pigsty installation, the node_ca task in the node.yml playbook distributes ca.crt to /etc/pki/ca.crt on all nodes.

EL-family and Debian-family operating systems have different default trusted CA certificate paths, so the distribution path and update methods differ:

# EL-family
rm -rf /etc/pki/ca-trust/source/anchors/ca.crt
ln -s /etc/pki/ca.crt /etc/pki/ca-trust/source/anchors/ca.crt
/bin/update-ca-trust

# Debian-family (Debian, Ubuntu)
rm -rf /usr/local/share/ca-certificates/ca.crt
ln -s /etc/pki/ca.crt /usr/local/share/ca-certificates/ca.crt
/usr/sbin/update-ca-certificates

Pigsty issues HTTPS certificates for domain names used by web systems on infrastructure nodes by default, allowing HTTPS access to Pigsty’s web interfaces.

If you want to avoid “untrusted CA certificate” warnings in client browsers, distribute ca.crt to the trusted certificate directory on client machines.

On macOS, for example, double-click the ca.crt file to add it to your system keychain, then open “Keychain Access,” search for pigsty-ca, and mark this root certificate as trusted.


Viewing Certificate Contents

Use the following command to view the Pigsty CA certificate contents:

openssl x509 -text -in /etc/pki/ca.crt

Local CA root certificate content example:

Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number:
            50:29:e3:60:96:93:f4:85:14:fe:44:81:73:b5:e1:09:2a:a8:5c:0a
        Signature Algorithm: sha256WithRSAEncryption
        Issuer: O=pigsty, OU=ca, CN=pigsty-ca
        Validity
            Not Before: Feb  7 00:56:27 2023 GMT
            Not After : Jan 14 00:56:27 2123 GMT
        Subject: O=pigsty, OU=ca, CN=pigsty-ca
        Subject Public Key Info:
            Public Key Algorithm: rsaEncryption
                Public-Key: (4096 bit)
                Modulus:
                    00:c1:41:74:4f:28:c3:3c:2b:13:a2:37:05:87:31:
                    ....
                    e6:bd:69:a5:5b:e3:b4:c0:65:09:6e:84:14:e9:eb:
                    90:f7:61
                Exponent: 65537 (0x10001)
        X509v3 extensions:
            X509v3 Subject Alternative Name:
                DNS:pigsty-ca
            X509v3 Key Usage:
                Digital Signature, Certificate Sign, CRL Sign
            X509v3 Basic Constraints: critical
                CA:TRUE, pathlen:1
            X509v3 Subject Key Identifier:
                C5:F6:23:CE:BA:F3:96:F6:4B:48:A5:B1:CD:D4:FA:2B:BD:6F:A6:9C
    Signature Algorithm: sha256WithRSAEncryption
    Signature Value:
        89:9d:21:35:59:6b:2c:9b:c7:6d:26:5b:a9:49:80:93:81:18:
        ....
        9e:dd:87:88:0d:c4:29:9e
-----BEGIN CERTIFICATE-----
...
cXyWAYcvfPae3YeIDcQpng==
-----END CERTIFICATE-----

Issuing Certificates

If you want to use client certificate authentication, you can use the local CA and the cert.yml playbook to manually issue PostgreSQL client certificates.

Set the certificate’s CN field to the database username:

./cert.yml -e cn=dbuser_dba
./cert.yml -e cn=dbuser_monitor

Issued certificates are generated in files/pki/misc/<cn>.{key,crt} by default.
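
Once a certificate is issued, a client needs to point at three files: the CA root certificate, the client certificate, and the client key. The sketch below builds a libpq keyword/value connection string from the default paths mentioned in this document. The helper name, the host address, and the choice of `sslmode=verify-full` are assumptions; whether `verify-full` succeeds depends on the names in the server certificate, and values containing spaces would need quoting that this sketch does not do.

```python
def client_cert_dsn(cn: str, host: str, dbname: str = "postgres",
                    pki_dir: str = "files/pki") -> str:
    """Build a libpq connection string for certificate auth (hypothetical helper)."""
    params = {
        "host": host,
        "dbname": dbname,
        "user": cn,                             # certificate CN equals the database username
        "sslmode": "verify-full",               # verify the server certificate and host name
        "sslrootcert": f"{pki_dir}/ca/ca.crt",  # local CA root certificate
        "sslcert": f"{pki_dir}/misc/{cn}.crt",  # issued client certificate
        "sslkey": f"{pki_dir}/misc/{cn}.key",   # issued client private key
    }
    return " ".join(f"{key}={value}" for key, value in params.items())

if __name__ == "__main__":
    # 10.10.10.10 is a placeholder address, not a documented endpoint.
    print(client_cert_dsn("dbuser_dba", "10.10.10.10"))
```

The resulting string can be passed to psql or any libpq-based driver as a single argument.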

7.2 - Access Control

Pigsty provides standard security practices with an out-of-the-box role and permission model.

Pigsty provides an out-of-the-box access control model based on the Role System and Permission System.

Access control is important, but many users struggle to implement it properly. Pigsty provides a streamlined access control model that serves as a security baseline for your cluster.


Role System

Pigsty’s default role system includes four default roles and four default users:

| Role Name | Attributes | Member Of | Description |
|---|---|---|---|
| dbrole_readonly | NOLOGIN | | Role: Global read-only |
| dbrole_readwrite | NOLOGIN | dbrole_readonly | Role: Global read-write |
| dbrole_admin | NOLOGIN | pg_monitor, dbrole_readwrite | Role: Admin/Object creation |
| dbrole_offline | NOLOGIN | | Role: Restricted read-only |
| postgres | SUPERUSER | | System superuser |
| replicator | REPLICATION | pg_monitor, dbrole_readonly | System replication user |
| dbuser_dba | SUPERUSER | dbrole_admin | PostgreSQL admin user |
| dbuser_monitor | | pg_monitor | PostgreSQL monitor user |

These roles and users are defined as follows:

pg_default_roles:                 # Global default roles and system users
  - { name: dbrole_readonly  ,login: false ,comment: role for global read-only access     }
  - { name: dbrole_offline   ,login: false ,comment: role for restricted read-only access }
  - { name: dbrole_readwrite ,login: false ,roles: [dbrole_readonly] ,comment: role for global read-write access }
  - { name: dbrole_admin     ,login: false ,roles: [pg_monitor, dbrole_readwrite] ,comment: role for object creation }
  - { name: postgres     ,superuser: true  ,comment: system superuser }
  - { name: replicator ,replication: true  ,roles: [pg_monitor, dbrole_readonly] ,comment: system replicator }
  - { name: dbuser_dba   ,superuser: true  ,roles: [dbrole_admin]  ,pgbouncer: true ,pool_mode: session, pool_connlimit: 16 ,comment: pgsql admin user }
  - { name: dbuser_monitor ,roles: [pg_monitor] ,pgbouncer: true ,parameters: {log_min_duration_statement: 1000 } ,pool_mode: session ,pool_connlimit: 8 ,comment: pgsql monitor user }
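
Because memberships chain, a user ends up with more roles than the ones listed directly in its definition. The sketch below walks the `roles` entries from the definition above to compute that closure. It is an illustration only: `effective_roles` is a hypothetical helper, and the further memberships of PostgreSQL's predefined `pg_monitor` role are not modeled.

```python
# Direct memberships copied from the pg_default_roles definition above.
MEMBER_OF = {
    "dbrole_readonly":  [],
    "dbrole_offline":   [],
    "dbrole_readwrite": ["dbrole_readonly"],
    "dbrole_admin":     ["pg_monitor", "dbrole_readwrite"],
    "postgres":         [],
    "replicator":       ["pg_monitor", "dbrole_readonly"],
    "dbuser_dba":       ["dbrole_admin"],
    "dbuser_monitor":   ["pg_monitor"],
}

def effective_roles(name: str) -> set:
    """Return every role reachable from `name` through membership, excluding itself."""
    seen = set()
    stack = list(MEMBER_OF.get(name, []))
    while stack:
        role = stack.pop()
        if role not in seen:
            seen.add(role)
            # Predefined roles such as pg_monitor have no entry and end the walk.
            stack.extend(MEMBER_OF.get(role, []))
    return seen

if __name__ == "__main__":
    print(sorted(effective_roles("dbuser_dba")))
```

For `dbuser_dba` this yields `dbrole_admin`, `dbrole_readwrite`, `dbrole_readonly`, and `pg_monitor`, while `dbrole_offline` stays outside the chain.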

Default Roles

Pigsty has four default roles:

  • Business Read-Only (dbrole_readonly): Role for global read-only access. Use this if other services need read-only access to this database.
  • Business Read-Write (dbrole_readwrite): Role for global read-write access. Production accounts for primary business should have database read-write permissions.
  • Business Admin (dbrole_admin): Role with DDL permissions. Typically used for business administrators or scenarios requiring table creation in applications.
  • Offline Read-Only (dbrole_offline): Restricted read-only access role (can only access offline instances). Usually for personal users or ETL tool accounts.

Default roles are defined in pg_default_roles. Unless you know what you’re doing, don’t change the default role names.

- { name: dbrole_readonly  , login: false , comment: role for global read-only access  }
- { name: dbrole_offline ,   login: false , comment: role for restricted read-only access (offline instance) }
- { name: dbrole_readwrite , login: false , roles: [dbrole_readonly], comment: role for global read-write access }
- { name: dbrole_admin , login: false , roles: [pg_monitor, dbrole_readwrite] , comment: role for object creation }

Default Users

Pigsty also has four default users (system users):

  • Superuser (postgres): Cluster owner and creator, same name as OS dbsu.
  • Replication User (replicator): System user for primary-replica replication.
  • Monitor User (dbuser_monitor): User for monitoring database and connection pool metrics.
  • Admin User (dbuser_dba): Administrator for daily operations and database changes.

The usernames and passwords of these four default users are defined by four pairs of dedicated parameters, which are referenced in many places. Remember to change these passwords in production deployments; don’t use the defaults!

pg_dbsu: postgres                             # Database superuser name, recommended not to change
pg_dbsu_password: ''                          # Database superuser password, recommended to leave empty!
pg_replication_username: replicator           # System replication username
pg_replication_password: DBUser.Replicator    # System replication password, must change!
pg_monitor_username: dbuser_monitor           # System monitor username
pg_monitor_password: DBUser.Monitor           # System monitor password, must change!
pg_admin_username: dbuser_dba                 # System admin username
pg_admin_password: DBUser.DBA                 # System admin password, must change!

Permission System

Pigsty has an out-of-the-box permission model that works with default roles.

  • All users can access all schemas.
  • Read-only users (dbrole_readonly) can read from all tables. (SELECT, EXECUTE)
  • Read-write users (dbrole_readwrite) can write to all tables and run DML. (INSERT, UPDATE, DELETE)
  • Admin users (dbrole_admin) can create objects and run DDL. (CREATE, USAGE, TRUNCATE, REFERENCES, TRIGGER)
  • Offline users (dbrole_offline) have the same privileges as read-only users but can only access offline instances (pg_role = 'offline' or pg_offline_query = true).
  • Objects created by admin users will have correct permissions.
  • Default privileges are configured on all databases, including template databases.
  • Database connection permissions are managed by database definition.
  • CREATE privilege on databases and public schema is revoked from PUBLIC by default.

Object Privileges

Default privileges for newly created objects are controlled by pg_default_privileges:

- GRANT USAGE      ON SCHEMAS   TO dbrole_readonly
- GRANT SELECT     ON TABLES    TO dbrole_readonly
- GRANT SELECT     ON SEQUENCES TO dbrole_readonly
- GRANT EXECUTE    ON FUNCTIONS TO dbrole_readonly
- GRANT USAGE      ON SCHEMAS   TO dbrole_offline
- GRANT SELECT     ON TABLES    TO dbrole_offline
- GRANT SELECT     ON SEQUENCES TO dbrole_offline
- GRANT EXECUTE    ON FUNCTIONS TO dbrole_offline
- GRANT INSERT     ON TABLES    TO dbrole_readwrite
- GRANT UPDATE     ON TABLES    TO dbrole_readwrite
- GRANT DELETE     ON TABLES    TO dbrole_readwrite
- GRANT USAGE      ON SEQUENCES TO dbrole_readwrite
- GRANT UPDATE     ON SEQUENCES TO dbrole_readwrite
- GRANT TRUNCATE   ON TABLES    TO dbrole_admin
- GRANT REFERENCES ON TABLES    TO dbrole_admin
- GRANT TRIGGER    ON TABLES    TO dbrole_admin
- GRANT CREATE     ON SCHEMAS   TO dbrole_admin

Objects newly created by admins will have the above privileges by default. Use \ddp+ to view these default privileges:

| Type | Access Privileges |
|---|---|
| Function | =X, dbrole_readonly=X, dbrole_offline=X, dbrole_admin=X |
| Schema | dbrole_readonly=U, dbrole_offline=U, dbrole_admin=UC |
| Sequence | dbrole_readonly=r, dbrole_offline=r, dbrole_readwrite=wU, dbrole_admin=rwU |
| Table | dbrole_readonly=r, dbrole_offline=r, dbrole_readwrite=awd, dbrole_admin=arwdDxt |
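
The letters in the listing above are PostgreSQL's standard privilege abbreviations. The sketch below expands them into keyword form; it covers only the letters that appear in this listing, and `decode_acl` is a hypothetical helper name.

```python
# Subset of PostgreSQL's ACL privilege abbreviations (only those used here).
ACL_LETTERS = {
    "r": "SELECT", "w": "UPDATE", "a": "INSERT", "d": "DELETE",
    "D": "TRUNCATE", "x": "REFERENCES", "t": "TRIGGER",
    "X": "EXECUTE", "U": "USAGE", "C": "CREATE",
}

def decode_acl(entry: str):
    """Split 'grantee=letters' and expand each letter; an empty grantee means PUBLIC."""
    grantee, _, letters = entry.partition("=")
    letters = letters.split("/")[0]  # drop an optional '/grantor' suffix
    # '*' marks a grant option on the preceding privilege and is skipped here.
    privileges = [ACL_LETTERS[ch] for ch in letters if ch != "*"]
    return grantee or "PUBLIC", privileges

if __name__ == "__main__":
    for entry in ["=X", "dbrole_readwrite=awd", "dbrole_admin=arwdDxt"]:
        print(decode_acl(entry))
```

Read this way, `dbrole_readwrite=awd` on tables is INSERT, UPDATE, and DELETE, and the leading `=X` on functions is EXECUTE granted to PUBLIC.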

Default Privileges

The SQL statement ALTER DEFAULT PRIVILEGES lets you set privileges for future objects. It doesn’t affect existing objects or objects created by non-admin users.

In Pigsty, default privileges are defined for three roles:

{% for priv in pg_default_privileges %}
ALTER DEFAULT PRIVILEGES FOR ROLE {{ pg_dbsu }} {{ priv }};
{% endfor %}

{% for priv in pg_default_privileges %}
ALTER DEFAULT PRIVILEGES FOR ROLE {{ pg_admin_username }} {{ priv }};
{% endfor %}

-- For other business admins, they should SET ROLE dbrole_admin before executing DDL
{% for priv in pg_default_privileges %}
ALTER DEFAULT PRIVILEGES FOR ROLE "dbrole_admin" {{ priv }};
{% endfor %}
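
To see what the template expands to, the three loops can be reproduced in a few lines. This is an illustrative sketch rather than Pigsty's rendering code: `render_default_privileges` is a hypothetical helper, the default usernames are the ones documented above, and every role name is double-quoted here for uniformity.

```python
def render_default_privileges(privs, pg_dbsu="postgres", pg_admin_username="dbuser_dba"):
    """Expand the three template loops into concrete SQL statements."""
    statements = []
    for role in (pg_dbsu, pg_admin_username, "dbrole_admin"):
        for priv in privs:
            clause = " ".join(priv.split())  # collapse column-aligned spacing
            statements.append(f'ALTER DEFAULT PRIVILEGES FOR ROLE "{role}" {clause};')
    return statements

if __name__ == "__main__":
    sample = [
        "GRANT SELECT     ON TABLES    TO dbrole_readonly",
        "GRANT INSERT     ON TABLES    TO dbrole_readwrite",
    ]
    for statement in render_default_privileges(sample):
        print(statement)
```

Each entry in pg_default_privileges therefore produces three statements, one per creating role.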

To maintain correct object permissions, you must execute DDL with admin users:

  1. {{ pg_dbsu }}, defaults to postgres
  2. {{ pg_admin_username }}, defaults to dbuser_dba
  3. Business admin users granted dbrole_admin role (using SET ROLE to switch to dbrole_admin)

Using postgres as global object owner is wise. If creating objects as business admin, use SET ROLE dbrole_admin before creation to maintain correct permissions.
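
The reason SET ROLE matters is that PostgreSQL applies default privileges according to the role that creates the object, not according to the roles that user is a member of. A minimal model of that rule, with a hypothetical helper name and the business admin `dbuser_admin` used purely as an example:

```python
from typing import Optional

# Roles that have default privileges attached, per the template above
# (assuming the default values of pg_dbsu and pg_admin_username).
DEFAULT_PRIV_ROLES = {"postgres", "dbuser_dba", "dbrole_admin"}

def new_object_gets_grants(login_user: str, set_role: Optional[str] = None) -> bool:
    """True if an object created in this session picks up the default privileges."""
    # The creating role is the current role after any SET ROLE;
    # membership in dbrole_admin alone is not enough.
    creator = set_role or login_user
    return creator in DEFAULT_PRIV_ROLES

if __name__ == "__main__":
    print(new_object_gets_grants("dbuser_admin"))
    print(new_object_gets_grants("dbuser_admin", set_role="dbrole_admin"))
```

A table created by `dbuser_admin` without switching roles would be owned by that user and carry none of the default grants, so read-only and read-write accounts could not use it until privileges were granted manually.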


Database Privileges

In Pigsty, database-level privileges are covered in database definition.

Databases have three privileges (CONNECT, CREATE, TEMP) plus a special ‘privilege’: OWNERSHIP.

- name: meta         # Required, `name` is the only required field
  owner: postgres    # Optional, database owner, defaults to postgres
  allowconn: true    # Optional, allow connections, default true
  revokeconn: false  # Optional, revoke public connect privilege, default false

  • If owner parameter exists, it becomes database owner instead of {{ pg_dbsu }}
  • If revokeconn is false, all users have database CONNECT privilege (default behavior)
  • If revokeconn is explicitly true:
    • Database CONNECT privilege is revoked from PUBLIC
    • CONNECT privilege is explicitly granted to {{ pg_replication_username }}, {{ pg_monitor_username }}, {{ pg_admin_username }}
    • CONNECT privilege with GRANT OPTION is granted to database owner
  • revokeconn can isolate cross-database access within the same cluster
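
The effect of revokeconn: true can be expressed as a handful of SQL statements. The sketch below only generates text that mirrors the bullets above; the exact statements Pigsty runs may differ, `revokeconn_sql` is a hypothetical helper, and the usernames are the documented defaults.

```python
def revokeconn_sql(dbname, owner="postgres",
                   system_users=("replicator", "dbuser_monitor", "dbuser_dba")):
    """List SQL statements mirroring the documented revokeconn: true behavior."""
    sql = [f'REVOKE CONNECT ON DATABASE "{dbname}" FROM PUBLIC;']
    # Replication, monitor, and admin users keep explicit CONNECT access.
    sql += [f'GRANT CONNECT ON DATABASE "{dbname}" TO "{user}";' for user in system_users]
    # The owner can pass CONNECT on to selected business users.
    sql.append(f'GRANT CONNECT ON DATABASE "{dbname}" TO "{owner}" WITH GRANT OPTION;')
    return sql

if __name__ == "__main__":
    print("\n".join(revokeconn_sql("meta")))
```

After this, a business user can connect only if the owner (or a superuser) grants CONNECT explicitly, which is what isolates databases in the same cluster from one another.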

CREATE Privilege

For security, Pigsty revokes the CREATE privilege on databases and on the public schema from PUBLIC by default. For the public schema, this has also been PostgreSQL’s default behavior since version 15.

Database owners can always adjust CREATE privileges based on actual needs.