This is the multi-page printable view of this section. Click here to print.

Return to the regular view of this page.

Kernel Forks

How to use other PostgreSQL kernel forks in Pigsty? Such as Citus, Babelfish, IvorySQL, PolarDB, etc.

1: PostgreSQL
2: Supabase
3: Percona
4: OpenHalo
5: OrioleDB
6: Citus
7: Babelfish
8: IvorySQL
9: PolarDB PG
10: PolarDB Oracle
11: PostgresML
12: Greenplum
13: Cloudberry
14: Neon

In Pigsty, you can replace the “native PG kernel” with different “flavors” of PostgreSQL forks to achieve special features and effects.

Pigsty supports various PostgreSQL kernels and compatible forks, enabling you to simulate different database systems while leveraging PostgreSQL’s ecosystem. Each kernel provides unique capabilities and compatibility layers.

Kernel	Key Feature	Description
PostgreSQL	Original Flavor	Vanilla PostgreSQL with 440 extensions
Citus	Horizontal Scaling	Distributed PostgreSQL via native extension
WiltonDB	SQL Server Compatible	SQL Server wire-protocol compatibility
IvorySQL	Oracle Compatible	Oracle syntax and PL/SQL compatibility
OpenHalo	MySQL Compatible	MySQL wire-protocol compatibility
Percona	Transparent Encryption	Percona Distribution with pg_tde
FerretDB	MongoDB Migration	MongoDB wire-protocol compatibility
OrioleDB	OLTP Optimization	Zheap, No bloat, S3 Storage
PolarDB	Aurora-style RAC	RAC, China domestic compliance
Supabase	Backend as a Service	BaaS based on PostgreSQL, Firebase alternative
Cloudberry	MPP DW & Analytics	Massively parallel processing data warehouse

1 - PostgreSQL

Vanilla PostgreSQL kernel with 440 extensions

PostgreSQL is the world’s most advanced and popular open-source database.

Pigsty supports PostgreSQL 13 ~ 18 and provides 440 PG extensions.

Quick Start

Install Pigsty using the pgsql configuration template.

./configure -c pgsql     # Use postgres kernel
./deploy.yml             # Set up everything with pigsty

Most configuration templates use PostgreSQL kernel by default, for example:

meta : Default, postgres with core extensions (vector, postgis, timescale)
rich : postgres with all extensions installed
slim : postgres only, no monitoring infrastructure
full : 4-node sandbox for HA demonstration
pgsql : minimal postgres kernel configuration example

Configuration

Vanilla PostgreSQL kernel requires no special adjustments:

pg-meta:
  hosts:
    10.10.10.10: { pg_seq: 1, pg_role: primary }
  vars:
    pg_cluster: pg-meta
    pg_users:
      - { name: dbuser_meta ,password: DBUser.Meta   ,pgbouncer: true ,roles: [dbrole_admin   ] ,comment: pigsty admin user }
      - { name: dbuser_view ,password: DBUser.Viewer ,pgbouncer: true ,roles: [dbrole_readonly] ,comment: read-only viewer  }
    pg_databases:
      - { name: meta, baseline: cmdb.sql ,comment: pigsty meta database ,schemas: [pigsty] ,extensions: [ vector ]}
    pg_hba_rules:
      - { user: dbuser_view , db: all ,addr: infra ,auth: pwd ,title: 'allow grafana dashboard access cmdb from infra nodes' }
    node_crontab: [ '00 01 * * * postgres /pg/bin/pg-backup full' ] # Full backup at 1 AM daily
    pg_packages: [ pgsql-main, pgsql-common ]   # pg kernel and common utilities
    #pg_extensions: [ pg18-time ,pg18-gis ,pg18-rag ,pg18-fts ,pg18-olap ,pg18-feat ,pg18-lang ,pg18-type ,pg18-util ,pg18-func ,pg18-admin ,pg18-stat ,pg18-sec ,pg18-fdw ,pg18-sim ,pg18-etl]

Version Selection

To use a different PostgreSQL major version, you can configure it using the -v parameter:

./configure -c pgsql            # Default is postgresql 18, no need to specify explicitly
./configure -c pgsql -v 17      # Use postgresql 17
./configure -c pgsql -v 16      # Use postgresql 16
./configure -c pgsql -v 15      # Use postgresql 15
./configure -c pgsql -v 14      # Use postgresql 14
./configure -c pgsql -v 13      # Use postgresql 13

If a PostgreSQL cluster is already installed, you need to uninstall it before installing a new version:

./pgsql-rm.yml # -l pg-meta

Extension Ecosystem

Pigsty provides a rich extension ecosystem for PostgreSQL, including:

Time-series: timescaledb, pg_cron, periods
Geospatial: postgis, h3, pgrouting
Vector: pgvector, pgml, vchord
Search: pg_trgm, zhparser, pgroonga
Analytics: citus, pg_duckdb, pg_analytics
Features: age, pg_graphql, rum
Languages: plpython3u, pljava, plv8
Types: hstore, ltree, citext
Utilities: http, pg_net, pgjwt
Functions: pgcrypto, uuid-ossp, pg_uuidv7
Administration: pg_repack, pgagent, pg_squeeze
Statistics: pg_stat_statements, pg_qualstats, auto_explain
Security: pgaudit, pgcrypto, pgsodium
Foreign: postgres_fdw, mysql_fdw, oracle_fdw
Compatibility: orafce, babelfishpg_tds
Data: pglogical, wal2json, decoderbufs

For details, please refer to Extension Catalog.

2 - Supabase

How to self-host Supabase with Pigsty, deploy an open-source Firebase alternative with a complete backend stack in one click.

Supabase — Build in a weekend, Scale to millions

Supabase is an open-source Firebase alternative that wraps PostgreSQL and provides authentication, out-of-the-box APIs, edge functions, real-time subscriptions, object storage, and vector embedding capabilities. This is a low-code all-in-one backend platform that lets you skip most backend development work, requiring only database design and frontend knowledge to quickly ship products!

Supabase’s motto is: “Build in a weekend, Scale to millions”. Indeed, Supabase is extremely cost-effective at small to micro scales (4c8g), like a cyber bodhisattva. — But when you really scale to millions of users — you should seriously consider self-hosting Supabase — whether for functionality, performance, or cost considerations.

Pigsty provides you with a complete one-click self-hosting solution for Supabase. Self-hosted Supabase enjoys full PostgreSQL monitoring, IaC, PITR, and high availability, and compared to Supabase cloud services, it provides up to 440 out-of-the-box PostgreSQL extensions and can more fully utilize the performance and cost advantages of modern hardware.

For the complete self-hosting tutorial, please refer to: Supabase Self-Hosting Manual

Quick Start

Pigsty’s default supa.yml configuration template defines a single-node Supabase.

First, use Pigsty’s standard installation process to install the MinIO and PostgreSQL instances required for Supabase:

 curl -fsSL https://repo.pigsty.io/get | bash
./bootstrap          # environment check, install dependencies
./configure -c supa  # Important: please modify passwords and other key information in the configuration file!
./deploy.yml         # install Pigsty, deploy PGSQL and MINIO!

Before deploying Supabase, please modify the Supabase parameters in the pigsty.yml configuration file according to your actual situation (mainly passwords!)

Then, run supabase.yml to complete the remaining work and deploy Supabase containers

./supabase.yml       # install Docker and deploy stateless Supabase components!

For users in China, please configure appropriate Docker mirror sites or proxy servers to bypass GFW to pull DockerHub images. For professional subscriptions, we provide the ability to offline install Pigsty and Supabase without internet access.

Pigsty exposes web services through Nginx on the admin node/INFRA node by default. You can add DNS resolution for supa.pigsty pointing to this node locally, then access https://supa.pigsty through a browser to enter the Supabase Studio management interface.

Default username and password: supabase / pigsty

Architecture Overview

Pigsty uses the Docker Compose template provided by Supabase as a blueprint, extracting the stateless components to be handled by Docker Compose. The stateful database and object storage containers are replaced with external PostgreSQL clusters and MinIO services managed by Pigsty.

Supabase: Self-Hosting with Docker

After transformation, Supabase itself is stateless, so you can run, stop, or even run multiple stateless Supabase containers simultaneously on the same PGSQL/MINIO for scaling.

Pigsty uses a single-node PostgreSQL instance on the local machine as Supabase’s core backend database by default. For serious production deployments, we recommend using Pigsty to deploy a PG high-availability cluster with at least three nodes. Or at least use external object storage as a PITR backup repository for failover.

Pigsty uses the SNSD MinIO service on the local machine as file storage by default. For serious production environment deployments, you can use external S3-compatible object storage services, or use other multi-node multi-drive MinIO clusters independently deployed by Pigsty.

Configuration Details

When self-hosting Supabase, the directory app/supabase containing resources required for Docker Compose will be copied entirely to the target node (default supabase group) at /opt/supabase, and deployed in the background using docker compose up -d.

All configuration parameters are defined in the .env file and docker-compose.yml template. But you usually don’t need to modify these two templates directly. You can specify parameters in .env in supa_config, and these configurations will automatically override or append to the final /opt/supabase/.env core configuration file.

The most critical parameters here are jwt_secret, and the corresponding anon_key and service_role_key. For serious production use, please be sure to refer to the instructions and tools in the Supabase Self-Hosting Manual for settings. If you want to provide services using a domain name, you can specify your domain name in site_url, api_external_url, and supabase_public_url.

Pigsty uses local MinIO by default. If you want to use S3 or MinIO as file storage, you need to configure parameters such as s3_bucket, s3_endpoint, s3_access_key, s3_secret_key.

Generally speaking, you also need to use an external SMTP service to send emails. Email services are not recommended for self-hosting, please consider using mature third-party services such as Mailchimp, Aliyun Mail Push, etc.

For users in mainland China, we recommend you configure docker_registry_mirrors mirror sites, or use proxy_env to specify available proxy servers to bypass GFW, otherwise pulling images from DockerHub may fail or be extremely slow!

# launch supabase stateless part with docker compose:
# ./supabase.yml
supabase:
  hosts:
    10.10.10.10: { supa_seq: 1 }  # instance id
  vars:
    supa_cluster: supa            # cluster name
    docker_enabled: true          # enable docker

    # use these to pull docker images via proxy and mirror registries
    #docker_registry_mirrors: ['https://docker.xxxxx.io']
    #proxy_env:   # add [OPTIONAL] proxy env to /etc/docker/daemon.json configuration file
    #  no_proxy: "localhost,127.0.0.1,10.0.0.0/8,192.168.0.0/16,*.pigsty,*.aliyun.com,mirrors.*,*.myqcloud.com,*.tsinghua.edu.cn"
    #  #all_proxy: http://user:pass@host:port

    # these configuration entries will OVERWRITE or APPEND to /opt/supabase/.env file (src template: app/supabase/.env)
    # check https://github.com/pgsty/pigsty/blob/main/app/supabase/.env for default values
    supa_config:

      # IMPORTANT: CHANGE JWT_SECRET AND REGENERATE CREDENTIAL ACCORDING!!!!!!!!!!!
      # https://supabase.com/docs/guides/self-hosting/docker#securing-your-services
      jwt_secret: your-super-secret-jwt-token-with-at-least-32-characters-long
      anon_key: eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyAgCiAgICAicm9sZSI6ICJhbm9uIiwKICAgICJpc3MiOiAic3VwYWJhc2UtZGVtbyIsCiAgICAiaWF0IjogMTY0MTc2OTIwMCwKICAgICJleHAiOiAxNzk5NTM1NjAwCn0.dc_X5iR_VP_qT0zsiyj_I_OZ2T9FtRU2BBNWN8Bu4GE
      service_role_key: eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyAgCiAgICAicm9sZSI6ICJzZXJ2aWNlX3JvbGUiLAogICAgImlzcyI6ICJzdXBhYmFzZS1kZW1vIiwKICAgICJpYXQiOiAxNjQxNzY5MjAwLAogICAgImV4cCI6IDE3OTk1MzU2MDAKfQ.DaYlNEoUrrEn2Ig7tqibS-PHK5vgusbcbo7X36XVt4Q
      dashboard_username: supabase
      dashboard_password: pigsty

      # postgres connection string (use the correct ip and port)
      postgres_host: 10.10.10.10
      postgres_port: 5436             # access via the 'default' service, which always route to the primary postgres
      postgres_db: postgres
      postgres_password: DBUser.Supa  # password for supabase_admin and multiple supabase users

      # expose supabase via domain name
      site_url: http://supa.pigsty
      api_external_url: http://supa.pigsty
      supabase_public_url: http://supa.pigsty

      # if using s3/minio as file storage
      s3_bucket: supa
      s3_endpoint: https://sss.pigsty:9000
      s3_access_key: supabase
      s3_secret_key: S3User.Supabase
      s3_force_path_style: true
      s3_protocol: https
      s3_region: stub
      minio_domain_ip: 10.10.10.10  # sss.pigsty domain name will resolve to this ip statically

      # if using SMTP (optional)
      #smtp_admin_email: admin@example.com
      #smtp_host: supabase-mail
      #smtp_port: 2500
      #smtp_user: fake_mail_user
      #smtp_pass: fake_mail_password
      #smtp_sender_name: fake_sender
      #enable_anonymous_users: false

3 - Percona

Percona Postgres distribution with TDE transparent encryption support

Percona Postgres is a patched Postgres kernel with pg_tde (Transparent Data Encryption) extension.

It’s compatible with PostgreSQL 18.1 and available on all Pigsty-supported platforms.

Performance Test for Percona Transparent Data Encryption (TDE)

Quick Start

Use Pigsty’s standard installation process with the pgtde configuration template.

curl -fsSL https://repo.pigsty.io/get | bash; cd ~/pigsty;
./configure -c pgtde     # Use percona postgres kernel
./deploy.yml             # Set up everything with pigsty

Configuration

The following parameters need to be adjusted to deploy a Percona cluster:

pg-meta:
  hosts:
    10.10.10.10: { pg_seq: 1, pg_role: primary }
  vars:
    pg_cluster: pg-meta
    pg_users:
      - { name: dbuser_meta ,password: DBUser.Meta   ,pgbouncer: true ,roles: [dbrole_admin   ] ,comment: pgsql admin user }
      - { name: dbuser_view ,password: DBUser.Viewer ,pgbouncer: true ,roles: [dbrole_readonly] ,comment: read-only viewer  }
    pg_databases:
      - name: meta
        baseline: cmdb.sql
        comment: pigsty tde database
        schemas: [pigsty]
        extensions: [ vector, postgis, pg_tde ,pgaudit, { name: pg_stat_monitor, schema: monitor } ]
    pg_hba_rules:
      - { user: dbuser_view , db: all ,addr: infra ,auth: pwd ,title: 'allow grafana dashboard access cmdb from infra nodes' }
    node_crontab: [ '00 01 * * * postgres /pg/bin/pg-backup full' ] # Full backup at 1 AM daily

    # Percona PostgreSQL TDE specific settings
    pg_packages: [ percona-main, pgsql-common ]  # Install percona postgres packages
    pg_libs: 'pg_tde, pgaudit, pg_stat_statements, pg_stat_monitor, auto_explain'

Extensions

Percona provides 80 available extensions, including pg_tde, pgvector, postgis, pgaudit, set_user, pg_stat_monitor, and other useful third-party extensions.

Extension	Version	Description
pg_tde	2.1	Percona transparent data encryption access method
vector	0.8.1	Vector data type and ivfflat and hnsw access methods
postgis	3.5.4	PostGIS geometry and geography types and functions
pgaudit	18.0	Provides auditing functionality
pg_stat_monitor	2.3	PostgreSQL query performance monitoring tool
set_user	4.2.0	Similar to SET ROLE but with additional logging
pg_repack	1.5.3	Reorganize tables in PostgreSQL databases with minimal locks
hstore	1.8	Data type for storing sets of (key, value) pairs
ltree	1.3	Data type for hierarchical tree-like structures
pg_trgm	1.6	Text similarity measurement and index searching based on trigrams

For the complete list of 80 extensions, please refer to the Percona Postgres official documentation.

Key Features

Transparent Data Encryption: Provides data-at-rest encryption using the pg_tde extension
PostgreSQL 18 Compatible: Based on the latest PostgreSQL 18 version
Enterprise Extensions: Includes enterprise-grade features like pgaudit, pg_stat_monitor
Complete Ecosystem: Supports popular extensions like pgvector, PostGIS

Note: Currently in stable stage - thoroughly evaluate before production use.

4 - OpenHalo

MySQL compatible Postgres 14 fork

OpenHalo is an open-source PostgreSQL kernel that provides MySQL wire protocol compatibility.

OpenHalo is based on PostgreSQL 14.10 kernel version and provides wire protocol compatibility with MySQL 5.7.32-log / 8.0 versions.

Pigsty provides deployment support for OpenHalo on all supported Linux platforms.

Quick Start

Use Pigsty’s standard installation process with the mysql configuration template.

curl -fsSL https://repo.pigsty.io/get | bash; cd ~/pigsty;
./configure -c mysql    # Use MySQL (openHalo) configuration template
./deploy.yml            # Install, for production deployment please modify passwords in pigsty.yml first

For production deployment, ensure you modify the password parameters in the pigsty.yml configuration file before running the install playbook.

Configuration

pg-meta:
  hosts:
    10.10.10.10: { pg_seq: 1, pg_role: primary }
  vars:
    pg_cluster: pg-meta
    pg_users:
      - {name: dbuser_meta ,password: DBUser.Meta   ,pgbouncer: true ,roles: [dbrole_admin]    ,comment: pigsty admin user }
      - {name: dbuser_view ,password: DBUser.Viewer ,pgbouncer: true ,roles: [dbrole_readonly] ,comment: read-only viewer for meta database }
    pg_databases:
      - {name: postgres, extensions: [aux_mysql]} # mysql compatible database
      - {name: meta ,baseline: cmdb.sql ,comment: pigsty meta database ,schemas: [pigsty]}
    pg_hba_rules:
      - {user: dbuser_view , db: all ,addr: infra ,auth: pwd ,title: 'allow grafana dashboard access cmdb from infra nodes'}
    node_crontab: [ '00 01 * * * postgres /pg/bin/pg-backup full' ] # Full backup at 1 AM daily

    # OpenHalo specific settings
    pg_mode: mysql                    # HaloDB's MySQL compatibility mode
    pg_version: 14                    # Current HaloDB compatible PG major version 14
    pg_packages: [ openhalodb, pgsql-common ]  # Install openhalodb instead of postgresql kernel

Usage

When accessing MySQL, the actual connection uses the postgres database. Please note that the concept of “database” in MySQL actually corresponds to “Schema” in PostgreSQL. Therefore, use mysql actually uses the mysql Schema within the postgres database.

The username and password for MySQL are the same as in PostgreSQL. You can manage users and permissions using standard PostgreSQL methods.

Client Access

OpenHalo provides MySQL wire protocol compatibility, listening on port 3306 by default, allowing MySQL clients and drivers to connect directly.

Pigsty’s conf/mysql configuration installs the mysql client tool by default.

You can access MySQL using the following command:

mysql -h 127.0.0.1 -u dbuser_dba

Currently, OpenHalo officially ensures Navicat can properly access this MySQL port, but Intellij IDEA’s DataGrip access will cause errors.

Modification Notes

The OpenHalo kernel installed by Pigsty is based on the HaloTech-Co-Ltd/openHalo kernel with minor modifications:

Changed the default database name from halo0root back to postgres
Removed the 1.0. prefix from the default version number, restoring it to 14.10
Modified the default configuration file to enable MySQL compatibility and listen on port 3306 by default

Please note that Pigsty does not provide any warranty for using the OpenHalo kernel. Any issues or requirements encountered when using this kernel should be addressed with the original vendor.

Warning: Currently experimental - thoroughly evaluate before production use.

5 - OrioleDB

Next-generation OLTP engine for PostgreSQL

OrioleDB is a PostgreSQL storage engine extension that claims to provide 4x OLTP performance, no xid wraparound and table bloat issues, and “cloud-native” (data stored in S3) capabilities.

OrioleDB’s latest version is based on a patched PostgreSQL 17.0 and an additional extension

You can run OrioleDB as an RDS using Pigsty. It’s compatible with PG 17 and available on all supported Linux platforms. The latest version is beta12, based on PG 17_11 patch.

Quick Start

Follow Pigsty’s standard installation process using the oriole configuration template.

curl -fsSL https://repo.pigsty.io/get | bash; cd ~/pigsty;
./configure -c oriole    # Use OrioleDB configuration template
./deploy.yml             # Install Pigsty with OrioleDB

For production deployment, ensure you modify the password parameters in the pigsty.yml configuration before running the install playbook.

Configuration

pg-meta:
  hosts:
    10.10.10.10: { pg_seq: 1, pg_role: primary }
  vars:
    pg_cluster: pg-meta
    pg_users:
      - {name: dbuser_meta ,password: DBUser.Meta   ,pgbouncer: true ,roles: [dbrole_admin]    ,comment: pigsty admin user }
      - {name: dbuser_view ,password: DBUser.Viewer ,pgbouncer: true ,roles: [dbrole_readonly] ,comment: read-only viewer for meta database }
    pg_databases:
      - {name: meta ,baseline: cmdb.sql ,comment: pigsty meta database ,schemas: [pigsty], extensions: [orioledb]}
    pg_hba_rules:
      - {user: dbuser_view , db: all ,addr: infra ,auth: pwd ,title: 'allow grafana dashboard access cmdb from infra nodes'}
    node_crontab: [ '00 01 * * * postgres /pg/bin/pg-backup full' ] # Full backup at 1 AM daily

    # OrioleDB specific settings
    pg_mode: oriole                                         # oriole compatibility mode
    pg_packages: [ orioledb, pgsql-common ]                 # Install OrioleDB kernel
    pg_libs: 'orioledb, pg_stat_statements, auto_explain'   # Load OrioleDB extension

Usage

To use OrioleDB, you need to install the orioledb_17 and oriolepg_17 packages (currently only RPM versions are available).

Initialize TPC-B-like tables with pgbench using 100 warehouses:

pgbench -is 100 meta
pgbench -nv -P1 -c10 -S -T1000 meta
pgbench -nv -P1 -c50 -S -T1000 meta
pgbench -nv -P1 -c10    -T1000 meta
pgbench -nv -P1 -c50    -T1000 meta

Next, you can rebuild these tables using the orioledb storage engine and observe the performance difference:

-- Create OrioleDB tables
CREATE TABLE pgbench_accounts_o (LIKE pgbench_accounts INCLUDING ALL) USING orioledb;
CREATE TABLE pgbench_branches_o (LIKE pgbench_branches INCLUDING ALL) USING orioledb;
CREATE TABLE pgbench_history_o (LIKE pgbench_history INCLUDING ALL) USING orioledb;
CREATE TABLE pgbench_tellers_o (LIKE pgbench_tellers INCLUDING ALL) USING orioledb;

-- Copy data from regular tables to OrioleDB tables
INSERT INTO pgbench_accounts_o SELECT * FROM pgbench_accounts;
INSERT INTO pgbench_branches_o SELECT * FROM pgbench_branches;
INSERT INTO pgbench_history_o SELECT  * FROM pgbench_history;
INSERT INTO pgbench_tellers_o SELECT * FROM pgbench_tellers;

-- Drop original tables and rename OrioleDB tables
DROP TABLE pgbench_accounts, pgbench_branches, pgbench_history, pgbench_tellers;
ALTER TABLE pgbench_accounts_o RENAME TO pgbench_accounts;
ALTER TABLE pgbench_branches_o RENAME TO pgbench_branches;
ALTER TABLE pgbench_history_o RENAME TO pgbench_history;
ALTER TABLE pgbench_tellers_o RENAME TO pgbench_tellers;

Key Features

No XID Wraparound: Eliminates transaction ID wraparound maintenance
No Table Bloat: Advanced storage management prevents table bloat
Cloud Storage: Native support for S3-compatible object storage
OLTP Optimized: Designed for transactional workloads
Improved Performance: Better space utilization and query performance

Note: Currently in Beta stage - thoroughly evaluate before production use.

6 - Citus

Deploy native high-availability Citus horizontally sharded clusters with Pigsty, seamlessly scaling PostgreSQL across multiple shards and accelerating OLTP/OLAP queries.

Pigsty natively supports Citus. This is a distributed horizontal scaling extension based on the native PostgreSQL kernel.

Installation

Citus is a PostgreSQL extension plugin that can be installed and enabled on a native PostgreSQL cluster following the standard plugin installation process.

./pgsql.yml -t pg_extension -e '{"pg_extensions":["citus"]}'

Configuration

To define a citus cluster, you need to specify the following parameters:

pg_mode must be set to citus instead of the default pgsql
You must define the shard name pg_shard and shard number pg_group on each shard cluster
You must define pg_primary_db to specify the database managed by Patroni
If you want to use postgres from pg_dbsu instead of the default pg_admin_username to execute admin commands, then pg_dbsu_password must be set to a non-empty plaintext password

Additionally, you need extra hba rules to allow SSL access from localhost and other data nodes.

You can define each Citus cluster as a separate group, like standard PostgreSQL clusters, as shown in conf/dbms/citus.yml:

all:
  children:
    pg-citus0: # citus shard 0
      hosts: { 10.10.10.10: { pg_seq: 1, pg_role: primary } }
      vars: { pg_cluster: pg-citus0 , pg_group: 0 }
    pg-citus1: # citus shard 1
      hosts: { 10.10.10.11: { pg_seq: 1, pg_role: primary } }
      vars: { pg_cluster: pg-citus1 , pg_group: 1 }
    pg-citus2: # citus shard 2
      hosts: { 10.10.10.12: { pg_seq: 1, pg_role: primary } }
      vars: { pg_cluster: pg-citus2 , pg_group: 2 }
    pg-citus3: # citus shard 3
      hosts:
        10.10.10.13: { pg_seq: 1, pg_role: primary }
        10.10.10.14: { pg_seq: 2, pg_role: replica }
      vars: { pg_cluster: pg-citus3 , pg_group: 3 }
  vars:                               # Global parameters for all Citus clusters
    pg_mode: citus                    # pgsql cluster mode must be set to: citus
    pg_shard: pg-citus                # citus horizontal shard name: pg-citus
    pg_primary_db: meta               # citus database name: meta
    pg_dbsu_password: DBUser.Postgres # If using dbsu, you need to configure a password for it
    pg_users: [ { name: dbuser_meta ,password: DBUser.Meta ,pgbouncer: true ,roles: [ dbrole_admin ] } ]
    pg_databases: [ { name: meta ,extensions: [ { name: citus }, { name: postgis }, { name: timescaledb } ] } ]
    pg_hba_rules:
      - { user: 'all' ,db: all  ,addr: 127.0.0.1/32 ,auth: ssl ,title: 'all user ssl access from localhost' }
      - { user: 'all' ,db: all  ,addr: intra        ,auth: ssl ,title: 'all user ssl access from intranet'  }

You can also specify identity parameters for all Citus cluster members within a single group, as shown in prod.yml:

#==========================================================#
# pg-citus: 10 node citus cluster (5 x primary-replica pair)
#==========================================================#
pg-citus: # citus group
  hosts:
    10.10.10.50: { pg_group: 0, pg_cluster: pg-citus0 ,pg_vip_address: 10.10.10.60/24 ,pg_seq: 0, pg_role: primary }
    10.10.10.51: { pg_group: 0, pg_cluster: pg-citus0 ,pg_vip_address: 10.10.10.60/24 ,pg_seq: 1, pg_role: replica }
    10.10.10.52: { pg_group: 1, pg_cluster: pg-citus1 ,pg_vip_address: 10.10.10.61/24 ,pg_seq: 0, pg_role: primary }
    10.10.10.53: { pg_group: 1, pg_cluster: pg-citus1 ,pg_vip_address: 10.10.10.61/24 ,pg_seq: 1, pg_role: replica }
    10.10.10.54: { pg_group: 2, pg_cluster: pg-citus2 ,pg_vip_address: 10.10.10.62/24 ,pg_seq: 0, pg_role: primary }
    10.10.10.55: { pg_group: 2, pg_cluster: pg-citus2 ,pg_vip_address: 10.10.10.62/24 ,pg_seq: 1, pg_role: replica }
    10.10.10.56: { pg_group: 3, pg_cluster: pg-citus3 ,pg_vip_address: 10.10.10.63/24 ,pg_seq: 0, pg_role: primary }
    10.10.10.57: { pg_group: 3, pg_cluster: pg-citus3 ,pg_vip_address: 10.10.10.63/24 ,pg_seq: 1, pg_role: replica }
    10.10.10.58: { pg_group: 4, pg_cluster: pg-citus4 ,pg_vip_address: 10.10.10.64/24 ,pg_seq: 0, pg_role: primary }
    10.10.10.59: { pg_group: 4, pg_cluster: pg-citus4 ,pg_vip_address: 10.10.10.64/24 ,pg_seq: 1, pg_role: replica }
  vars:
    pg_mode: citus                    # pgsql cluster mode: citus
    pg_shard: pg-citus                # citus shard name: pg-citus
    pg_primary_db: test               # primary database used by citus
    pg_dbsu_password: DBUser.Postgres # all dbsu password access for citus cluster
    pg_vip_enabled: true
    pg_vip_interface: eth1
    pg_extensions: [ 'citus postgis timescaledb pgvector' ]
    pg_libs: 'citus, timescaledb, pg_stat_statements, auto_explain' # citus will be added by patroni automatically
    pg_users: [ { name: test ,password: test ,pgbouncer: true ,roles: [ dbrole_admin ] } ]
    pg_databases: [ { name: test ,owner: test ,extensions: [ { name: citus }, { name: postgis } ] } ]
    pg_hba_rules:
      - { user: 'all' ,db: all  ,addr: 10.10.10.0/24 ,auth: trust ,title: 'trust citus cluster members'        }
      - { user: 'all' ,db: all  ,addr: 127.0.0.1/32  ,auth: ssl   ,title: 'all user ssl access from localhost' }
      - { user: 'all' ,db: all  ,addr: intra         ,auth: ssl   ,title: 'all user ssl access from intranet'  }

Usage

You can access any node just like accessing a regular cluster:

pgbench -i postgres://test:test@pg-citus0/test
pgbench -nv -P1 -T1000 -c 2 postgres://test:test@pg-citus0/test

By default, changes you make to one Shard only occur on that cluster and are not synchronized to other Shards.

If you want to distribute writes across all Shards, you can use the API functions provided by Citus to mark tables as:

Distributed tables (automatic partitioning, requires specifying partition key)
Reference tables (full replication: does not require specifying partition key)

Starting from Citus 11.2, any Citus database node can play the role of coordinator, meaning any primary node can write:

psql -h pg-citus0 -d test -c "SELECT create_distributed_table('pgbench_accounts', 'aid'); SELECT truncate_local_data_after_distributing_table('public.pgbench_accounts');"
psql -h pg-citus0 -d test -c "SELECT create_reference_table('pgbench_branches')         ; SELECT truncate_local_data_after_distributing_table('public.pgbench_branches');"
psql -h pg-citus0 -d test -c "SELECT create_reference_table('pgbench_history')          ; SELECT truncate_local_data_after_distributing_table('public.pgbench_history');"
psql -h pg-citus0 -d test -c "SELECT create_reference_table('pgbench_tellers')          ; SELECT truncate_local_data_after_distributing_table('public.pgbench_tellers');"

After distributing the tables, you can also access them on other nodes:

psql -h pg-citus1 -d test -c '\dt+'

For example, a full table scan will show that the execution plan has become a distributed plan:

vagrant@meta-1:~$ psql -h pg-citus3 -d test -c 'explain select * from pgbench_accounts'
                                               QUERY PLAN
---------------------------------------------------------------------------------------------------------
 Custom Scan (Citus Adaptive)  (cost=0.00..0.00 rows=100000 width=352)
   Task Count: 32
   Tasks Shown: One of 32
   ->  Task
         Node: host=10.10.10.52 port=5432 dbname=test
         ->  Seq Scan on pgbench_accounts_102008 pgbench_accounts  (cost=0.00..81.66 rows=3066 width=97)
(6 rows)

You can initiate writes from several different primary nodes:

pgbench -nv -P1 -T1000 -c 2 postgres://test:test@pg-citus1/test
pgbench -nv -P1 -T1000 -c 2 postgres://test:test@pg-citus2/test
pgbench -nv -P1 -T1000 -c 2 postgres://test:test@pg-citus3/test
pgbench -nv -P1 -T1000 -c 2 postgres://test:test@pg-citus4/test

When a node fails, the native high availability support provided by Patroni will promote the standby node and automatically take over.

test=# select * from  pg_dist_node;
 nodeid | groupid |  nodename   | nodeport | noderack | hasmetadata | isactive | noderole | nodecluster | metadatasynced | shouldhaveshards
--------+---------+-------------+----------+----------+-------------+----------+----------+-------------+----------------+------------------
      1 |       0 | 10.10.10.51 |     5432 | default  | t           | t        | primary  | default     | t              | f
      2 |       2 | 10.10.10.54 |     5432 | default  | t           | t        | primary  | default     | t              | t
      5 |       1 | 10.10.10.52 |     5432 | default  | t           | t        | primary  | default     | t              | t
      3 |       4 | 10.10.10.58 |     5432 | default  | t           | t        | primary  | default     | t              | t
      4 |       3 | 10.10.10.56 |     5432 | default  | t           | t        | primary  | default     | t              | t

7 - Babelfish

Create Microsoft SQL Server compatible PostgreSQL clusters using WiltonDB and Babelfish! (Wire protocol level compatibility)

Babelfish is an MSSQL (Microsoft SQL Server) compatibility solution based on PostgreSQL, open-sourced by AWS.

Overview

Pigsty allows users to create Microsoft SQL Server compatible PostgreSQL clusters using Babelfish and WiltonDB!

Babelfish: An MSSQL (Microsoft SQL Server) compatibility extension plugin open-sourced by AWS
WiltonDB: A PostgreSQL kernel distribution focusing on integrating Babelfish

Babelfish is a PostgreSQL extension, but it only works on a slightly modified PostgreSQL kernel fork. WiltonDB provides compiled fork kernel binaries and extension binary packages on EL/Ubuntu systems.

Pigsty can replace the native PostgreSQL kernel with WiltonDB, providing an out-of-the-box MSSQL compatible cluster. Using and managing an MSSQL cluster is no different from a standard PostgreSQL 15 cluster. You can use all the features provided by Pigsty, such as high availability, backup, monitoring, etc.

WiltonDB comes with several extension plugins including Babelfish, but cannot use native PostgreSQL extension plugins.

After the MSSQL compatible cluster starts, in addition to listening on the PostgreSQL default port, it also listens on the MSSQL default port 1433, providing MSSQL services via the TDS Wire Protocol on this port. You can connect to the MSSQL service provided by Pigsty using any MSSQL client, such as SQL Server Management Studio, or using the sqlcmd command-line tool.

Installation

WiltonDB conflicts with the native PostgreSQL kernel. Only one kernel can be installed on a node. Use the following command to install the WiltonDB kernel online.

./node.yml -t node_install -e '{"node_repo_modules":"local,mssql","node_packages":["wiltondb"]}'

Please note that WiltonDB is only available on EL and Ubuntu systems. Debian support is not currently provided.

The Pigsty Professional Edition provides offline installation packages for WiltonDB, which can be installed from local software sources.

Configuration

When installing and deploying the MSSQL module, please pay special attention to the following:

WiltonDB is available on EL (7/8/9) and Ubuntu (20.04/22.04), but not available on Debian systems.
WiltonDB is currently compiled based on PostgreSQL 15, so you need to specify pg_version: 15.
On EL systems, the wiltondb binary is installed by default in the /usr/bin/ directory, while on Ubuntu systems it is installed in the /usr/lib/postgresql/15/bin/ directory, which is different from the official PostgreSQL binary placement.
In WiltonDB compatibility mode, the HBA password authentication rule needs to use md5 instead of scram-sha-256. Therefore, you need to override Pigsty’s default HBA rule set and insert the md5 authentication rule required by SQL Server before the dbrole_readonly wildcard authentication rule.
WiltonDB can only be enabled for one primary database, and you should designate a user as the Babelfish superuser, allowing Babelfish to create databases and users. The default is mssql and dbuser_mssql. If you change this, please also modify the user in files/mssql.sql.
The WiltonDB TDS wire protocol compatibility plugin babelfishpg_tds needs to be enabled in shared_preload_libraries.
After enabling the WiltonDB extension, it listens on the MSSQL default port 1433. You can override Pigsty’s default service definitions to point the primary and replica services to port 1433 instead of 5432 / 6432.

The following parameters need to be configured for the MSSQL database cluster:

#----------------------------------#
# PGSQL & MSSQL (Babelfish & Wilton)
#----------------------------------#
# PG Installation
node_repo_modules: local,node,mssql # add mssql and os upstream repos
pg_mode: mssql                      # Microsoft SQL Server Compatible Mode
pg_libs: 'babelfishpg_tds, pg_stat_statements, auto_explain' # add timescaledb to shared_preload_libraries
pg_version: 15                      # The current WiltonDB major version is 15
pg_packages:
  - wiltondb                        # install forked version of postgresql with babelfishpg support
  - patroni pgbouncer pgbackrest pg_exporter pgbadger vip-manager
pg_extensions: []                   # do not install any vanilla postgresql extensions

# PG Provision
pg_default_hba_rules:               # overwrite default HBA rules for babelfish cluster
- {user: '${dbsu}'    ,db: all         ,addr: local     ,auth: ident ,title: 'dbsu access via local os user ident'  }
- {user: '${dbsu}'    ,db: replication ,addr: local     ,auth: ident ,title: 'dbsu replication from local os ident' }
- {user: '${repl}'    ,db: replication ,addr: localhost ,auth: pwd   ,title: 'replicator replication from localhost'}
- {user: '${repl}'    ,db: replication ,addr: intra     ,auth: pwd   ,title: 'replicator replication from intranet' }
- {user: '${repl}'    ,db: postgres    ,addr: intra     ,auth: pwd   ,title: 'replicator postgres db from intranet' }
- {user: '${monitor}' ,db: all         ,addr: localhost ,auth: pwd   ,title: 'monitor from localhost with password' }
- {user: '${monitor}' ,db: all         ,addr: infra     ,auth: pwd   ,title: 'monitor from infra host with password'}
- {user: '${admin}'   ,db: all         ,addr: infra     ,auth: ssl   ,title: 'admin @ infra nodes with pwd & ssl'   }
- {user: '${admin}'   ,db: all         ,addr: world     ,auth: ssl   ,title: 'admin @ everywhere with ssl & pwd'    }
- {user: dbuser_mssql ,db: mssql       ,addr: intra     ,auth: md5   ,title: 'allow mssql dbsu intranet access'     } # <--- use md5 auth method for mssql user
- {user: '+dbrole_readonly',db: all    ,addr: localhost ,auth: pwd   ,title: 'pgbouncer read/write via local socket'}
- {user: '+dbrole_readonly',db: all    ,addr: intra     ,auth: pwd   ,title: 'read/write biz user via password'     }
- {user: '+dbrole_offline' ,db: all    ,addr: intra     ,auth: pwd   ,title: 'allow etl offline tasks from intranet'}
pg_default_services:                # route primary & replica service to mssql port 1433
- { name: primary ,port: 5433 ,dest: 1433  ,check: /primary   ,selector: "[]" }
- { name: replica ,port: 5434 ,dest: 1433  ,check: /read-only ,selector: "[]" , backup: "[? pg_role == `primary` || pg_role == `offline` ]" }
- { name: default ,port: 5436 ,dest: postgres ,check: /primary   ,selector: "[]" }
- { name: offline ,port: 5438 ,dest: postgres ,check: /replica   ,selector: "[? pg_role == `offline` || pg_offline_query ]" , backup: "[? pg_role == `replica` && !pg_offline_query]"}

You can define MSSQL business databases and business users:

#----------------------------------#
# pgsql (singleton on current node)
#----------------------------------#
# this is an example single-node postgres cluster with postgis & timescaledb installed, with one biz database & two biz users
pg-meta:
  hosts:
    10.10.10.10: { pg_seq: 1, pg_role: primary } # <---- primary instance with read-write capability
  vars:
    pg_cluster: pg-test
    pg_users:                           # create MSSQL superuser
      - {name: dbuser_mssql ,password: DBUser.MSSQL ,superuser: true, pgbouncer: true ,roles: [dbrole_admin], comment: superuser & owner for babelfish  }
    pg_primary_db: mssql                # use `mssql` as the primary sql server database
    pg_databases:
      - name: mssql
        baseline: mssql.sql             # init babelfish database & user
        extensions:
          - { name: uuid-ossp          }
          - { name: babelfishpg_common }
          - { name: babelfishpg_tsql   }
          - { name: babelfishpg_tds    }
          - { name: babelfishpg_money  }
          - { name: pg_hint_plan       }
          - { name: system_stats       }
          - { name: tds_fdw            }
        owner: dbuser_mssql
        parameters: { 'babelfishpg_tsql.migration_mode' : 'multi-db' }
        comment: babelfish cluster, a MSSQL compatible pg cluster

Access

You can use any SQL Server compatible client tool to access this database cluster.

Microsoft provides sqlcmd as the official command-line tool.

In addition, they also provide a Go version command-line tool go-sqlcmd.

Install go-sqlcmd:

curl -LO https://github.com/microsoft/go-sqlcmd/releases/download/v1.4.0/sqlcmd-v1.4.0-linux-amd64.tar.bz2
tar xjvf sqlcmd-v1.4.0-linux-amd64.tar.bz2
sudo mv sqlcmd* /usr/bin/

Quick start with go-sqlcmd:

$ sqlcmd -S 10.10.10.10,1433 -U dbuser_mssql -P DBUser.MSSQL
1> select @@version
2> go
version
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Babelfish for PostgreSQL with SQL Server Compatibility - 12.0.2000.8
Oct 22 2023 17:48:32
Copyright (c) Amazon Web Services
PostgreSQL 15.4 (EL 1:15.4.wiltondb3.3_2-2.el8) on x86_64-redhat-linux-gnu (Babelfish 3.3.0)

(1 row affected)

Using the service mechanism provided by Pigsty, you can use ports 5433 / 5434 to always connect to port 1433 on the primary/replica.

# Access port 5433 on any cluster member, pointing to port 1433 MSSQL port on the primary
sqlcmd -S 10.10.10.11,5433 -U dbuser_mssql -P DBUser.MSSQL

# Access port 5434 on any cluster member, pointing to port 1433 MSSQL port on any readable replica
sqlcmd -S 10.10.10.11,5434 -U dbuser_mssql -P DBUser.MSSQL

Extensions

Most of the PGSQL module’s extension plugins (non-pure SQL class) cannot be directly used on the WiltonDB kernel of the MSSQL module and need to be recompiled.

Currently, WiltonDB comes with the following extension plugins. In addition to PostgreSQL Contrib extensions and the four BabelfishPG core extensions, it also provides three third-party extensions: pg_hint_plan, tds_fdw, and system_stats.

Extension Name	Version	Description
dblink	1.2	connect to other PostgreSQL databases from within a database
adminpack	2.1	administrative functions for PostgreSQL
dict_int	1.0	text search dictionary template for integers
intagg	1.1	integer aggregator and enumerator (obsolete)
dict_xsyn	1.0	text search dictionary template for extended synonym processing
amcheck	1.3	functions for verifying relation integrity
autoinc	1.0	functions for autoincrementing fields
bloom	1.0	bloom access method - signature file based index
fuzzystrmatch	1.1	determine similarities and distance between strings
intarray	1.5	functions, operators, and index support for 1-D arrays of integers
btree_gin	1.3	support for indexing common datatypes in GIN
btree_gist	1.7	support for indexing common datatypes in GiST
hstore	1.8	data type for storing sets of (key, value) pairs
hstore_plperl	1.0	transform between hstore and plperl
isn	1.2	data types for international product numbering standards
hstore_plperlu	1.0	transform between hstore and plperlu
jsonb_plperl	1.0	transform between jsonb and plperl
citext	1.6	data type for case-insensitive character strings
jsonb_plperlu	1.0	transform between jsonb and plperlu
jsonb_plpython3u	1.0	transform between jsonb and plpython3u
cube	1.5	data type for multidimensional cubes
hstore_plpython3u	1.0	transform between hstore and plpython3u
earthdistance	1.1	calculate great-circle distances on the surface of the Earth
lo	1.1	Large Object maintenance
file_fdw	1.0	foreign-data wrapper for flat file access
insert_username	1.0	functions for tracking who changed a table
ltree	1.2	data type for hierarchical tree-like structures
ltree_plpython3u	1.0	transform between ltree and plpython3u
pg_walinspect	1.0	functions to inspect contents of PostgreSQL Write-Ahead Log
moddatetime	1.0	functions for tracking last modification time
old_snapshot	1.0	utilities in support of old_snapshot_threshold
pgcrypto	1.3	cryptographic functions
pgrowlocks	1.2	show row-level locking information
pageinspect	1.11	inspect the contents of database pages at a low level
pg_surgery	1.0	extension to perform surgery on a damaged relation
seg	1.4	data type for representing line segments or floating-point intervals
pgstattuple	1.5	show tuple-level statistics
pg_buffercache	1.3	examine the shared buffer cache
pg_freespacemap	1.2	examine the free space map (FSM)
postgres_fdw	1.1	foreign-data wrapper for remote PostgreSQL servers
pg_prewarm	1.2	prewarm relation data
tcn	1.0	Triggered change notifications
pg_trgm	1.6	text similarity measurement and index searching based on trigrams
xml2	1.1	XPath querying and XSLT
refint	1.0	functions for implementing referential integrity (obsolete)
pg_visibility	1.2	examine the visibility map (VM) and page-level visibility info
pg_stat_statements	1.10	track planning and execution statistics of all SQL statements executed
sslinfo	1.2	information about SSL certificates
tablefunc	1.0	functions that manipulate whole tables, including crosstab
tsm_system_rows	1.0	TABLESAMPLE method which accepts number of rows as a limit
tsm_system_time	1.0	TABLESAMPLE method which accepts time in milliseconds as a limit
unaccent	1.1	text search dictionary that removes accents
uuid-ossp	1.1	generate universally unique identifiers (UUIDs)
plpgsql	1.0	PL/pgSQL procedural language
babelfishpg_money	1.1.0	babelfishpg_money
system_stats	2.0	EnterpriseDB system statistics for PostgreSQL
tds_fdw	2.0.3	Foreign data wrapper for querying a TDS database (Sybase or Microsoft SQL Server)
babelfishpg_common	3.3.3	Transact SQL Datatype Support
babelfishpg_tds	1.0.0	TDS protocol extension
pg_hint_plan	1.5.1
babelfishpg_tsql	3.3.1	Transact SQL compatibility

The Pigsty Professional Edition provides offline installation capabilities for MSSQL compatible modules
Pigsty Professional Edition provides optional MSSQL compatible kernel extension porting and customization services, which can port extensions available in the PGSQL module to MSSQL clusters.

8 - IvorySQL

Use HighGo’s open-source IvorySQL kernel to achieve Oracle syntax/PLSQL compatibility based on PostgreSQL clusters.

IvorySQL is an open-source PostgreSQL kernel fork that aims to provide “Oracle compatibility” based on PG.

Overview

The IvorySQL kernel is supported in the Pigsty open-source version. Your server needs internet access to download relevant packages directly from IvorySQL’s official repository.

Please note that adding IvorySQL directly to Pigsty’s default software repository will affect the installation of the native PostgreSQL kernel. Pigsty Professional Edition provides offline installation solutions including the IvorySQL kernel.

The current latest version of IvorySQL is 5.0, corresponding to PostgreSQL version 18. Please note that IvorySQL is currently only available on EL8/EL9.

The last IvorySQL version supporting EL7 was 3.3, corresponding to PostgreSQL 16.3; the last version based on PostgreSQL 17 is IvorySQL 4.4

Installation

If your environment has internet access, you can add the IvorySQL repository directly to the node using the following method, then execute the PGSQL playbook for installation:

./node.yml -t node_repo -e '{"node_repo_modules":"local,node,pgsql,ivory"}'

Configuration

The following parameters need to be configured for IvorySQL database clusters:

#----------------------------------#
# Ivory SQL Configuration
#----------------------------------#
node_repo_modules: local,node,pgsql,ivory  # add ivorysql upstream repo
pg_mode: ivory                    # IvorySQL Oracle Compatible Mode
pg_packages: [ 'ivorysql patroni pgbouncer pgbackrest pg_exporter pgbadger vip-manager' ]
pg_libs: 'liboracle_parser, pg_stat_statements, auto_explain'
pg_extensions: [ ]                # do not install any vanilla postgresql extensions

When using Oracle compatibility mode, you need to dynamically load the liboracle_parser extension plugin.

Client Access

IvorySQL is equivalent to PostgreSQL 16, and any client tool compatible with the PostgreSQL wire protocol can access IvorySQL clusters.

Extension List

Most of the PGSQL module’s extensions (non-pure SQL types) cannot be used directly on the IvorySQL kernel. If you need to use them, please recompile and install from source for the new kernel.

Currently, the IvorySQL kernel comes with the following 101 extension plugins.

(The extension table remains unchanged as it’s already in English)

Please note that Pigsty does not assume any warranty responsibility for using the IvorySQL kernel. Any issues or requirements encountered when using this kernel should be addressed with the original vendor.

9 - PolarDB PG

Using Alibaba Cloud’s open-source PolarDB for PostgreSQL kernel to provide domestic innovation qualification support, with Oracle RAC-like user experience.

Overview

Pigsty allows you to create PostgreSQL clusters with “domestic innovation qualification” credentials using PolarDB!

PolarDB for PostgreSQL is essentially equivalent to PostgreSQL 15. Any client tool compatible with the PostgreSQL wire protocol can access PolarDB clusters.

Pigsty’s PGSQL repository provides PolarDB PG open-source installation packages for EL7 / EL8, but they are not downloaded to the local software repository during Pigsty installation.

If you need offline installation support for PolarDB PG, please consider our professional subscription service

Installation

If your environment has internet access, you can add the Pigsty PGSQL and dependency repositories to the node using the following method:

node_repo_modules: local,node,pgsql

Then in pg_packages, replace the native postgresql package with polardb.

Configuration

The following parameters need special configuration for PolarDB database clusters:

#----------------------------------#
# PGSQL & PolarDB
#----------------------------------#
pg_version: 15
pg_packages: [ 'polardb patroni pgbouncer pgbackrest pg_exporter pgbadger vip-manager' ]
pg_extensions: [ ]                # do not install any vanilla postgresql extensions
pg_mode: polar                    # PolarDB Compatible Mode
pg_default_roles:                 # default roles and users in postgres cluster
  - { name: dbrole_readonly  ,login: false ,comment: role for global read-only access     }
  - { name: dbrole_offline   ,login: false ,comment: role for restricted read-only access }
  - { name: dbrole_readwrite ,login: false ,roles: [dbrole_readonly] ,comment: role for global read-write access }
  - { name: dbrole_admin     ,login: false ,roles: [pg_monitor, dbrole_readwrite] ,comment: role for object creation }
  - { name: postgres     ,superuser: true  ,comment: system superuser }
  - { name: replicator   ,superuser: true  ,replication: true ,roles: [pg_monitor, dbrole_readonly] ,comment: system replicator } # <- superuser is required for replication
  - { name: dbuser_dba   ,superuser: true  ,roles: [dbrole_admin]  ,pgbouncer: true ,pool_mode: session, pool_connlimit: 16 ,comment: pgsql admin user }
  - { name: dbuser_monitor ,roles: [pg_monitor] ,pgbouncer: true ,parameters: {log_min_duration_statement: 1000 } ,pool_mode: session ,pool_connlimit: 8 ,comment: pgsql monitor user }

Note particularly that PolarDB PG requires the replicator replication user to be a Superuser, unlike native PG.

Extension List

Most PGSQL module extension plugins (non-pure SQL types) cannot be used directly on the PolarDB kernel. If needed, please recompile and install from source for the new kernel.

Currently, the PolarDB kernel comes with the following 61 extension plugins. Apart from Contrib extensions, the additional extensions provided include:

polar_csn 1.0 : polar_csn
polar_monitor 1.2 : examine the polardb information
polar_monitor_preload 1.1 : examine the polardb information
polar_parameter_check 1.0 : kernel extension for parameter validation
polar_px 1.0 : Parallel Execution extension
polar_stat_env 1.0 : env stat functions for PolarDB
polar_stat_sql 1.3 : Kernel statistics gathering, and sql plan nodes information gathering
polar_tde_utils 1.0 : Internal extension for TDE
polar_vfs 1.0 : polar_vfs
polar_worker 1.0 : polar_worker
timetravel 1.0 : functions for implementing time travel
vector 0.5.1 : vector data type and ivfflat and hnsw access methods
smlar 1.0 : compute similary of any one-dimensional arrays

Complete list of available PolarDB plugins:

name	version	comment
hstore_plpython2u	1.0	transform between hstore and plpython2u
dict_int	1.0	text search dictionary template for integers
adminpack	2.0	administrative functions for PostgreSQL
hstore_plpython3u	1.0	transform between hstore and plpython3u
amcheck	1.1	functions for verifying relation integrity
hstore_plpythonu	1.0	transform between hstore and plpythonu
autoinc	1.0	functions for autoincrementing fields
insert_username	1.0	functions for tracking who changed a table
bloom	1.0	bloom access method - signature file based index
file_fdw	1.0	foreign-data wrapper for flat file access
dblink	1.2	connect to other PostgreSQL databases from within a database
btree_gin	1.3	support for indexing common datatypes in GIN
fuzzystrmatch	1.1	determine similarities and distance between strings
lo	1.1	Large Object maintenance
intagg	1.1	integer aggregator and enumerator (obsolete)
btree_gist	1.5	support for indexing common datatypes in GiST
hstore	1.5	data type for storing sets of (key, value) pairs
intarray	1.2	functions, operators, and index support for 1-D arrays of integers
citext	1.5	data type for case-insensitive character strings
cube	1.4	data type for multidimensional cubes
hstore_plperl	1.0	transform between hstore and plperl
isn	1.2	data types for international product numbering standards
jsonb_plperl	1.0	transform between jsonb and plperl
dict_xsyn	1.0	text search dictionary template for extended synonym processing
hstore_plperlu	1.0	transform between hstore and plperlu
earthdistance	1.1	calculate great-circle distances on the surface of the Earth
pg_prewarm	1.2	prewarm relation data
jsonb_plperlu	1.0	transform between jsonb and plperlu
pg_stat_statements	1.6	track execution statistics of all SQL statements executed
jsonb_plpython2u	1.0	transform between jsonb and plpython2u
jsonb_plpython3u	1.0	transform between jsonb and plpython3u
jsonb_plpythonu	1.0	transform between jsonb and plpythonu
pg_trgm	1.4	text similarity measurement and index searching based on trigrams
pgstattuple	1.5	show tuple-level statistics
ltree	1.1	data type for hierarchical tree-like structures
ltree_plpython2u	1.0	transform between ltree and plpython2u
pg_visibility	1.2	examine the visibility map (VM) and page-level visibility info
ltree_plpython3u	1.0	transform between ltree and plpython3u
ltree_plpythonu	1.0	transform between ltree and plpythonu
seg	1.3	data type for representing line segments or floating-point intervals
moddatetime	1.0	functions for tracking last modification time
pgcrypto	1.3	cryptographic functions
pgrowlocks	1.2	show row-level locking information
pageinspect	1.7	inspect the contents of database pages at a low level
pg_buffercache	1.3	examine the shared buffer cache
pg_freespacemap	1.2	examine the free space map (FSM)
tcn	1.0	Triggered change notifications
plperl	1.0	PL/Perl procedural language
uuid-ossp	1.1	generate universally unique identifiers (UUIDs)
plperlu	1.0	PL/PerlU untrusted procedural language
refint	1.0	functions for implementing referential integrity (obsolete)
xml2	1.1	XPath querying and XSLT
plpgsql	1.0	PL/pgSQL procedural language
plpython3u	1.0	PL/Python3U untrusted procedural language
pltcl	1.0	PL/Tcl procedural language
pltclu	1.0	PL/TclU untrusted procedural language
polar_csn	1.0	polar_csn
sslinfo	1.2	information about SSL certificates
polar_monitor	1.2	examine the polardb information
polar_monitor_preload	1.1	examine the polardb information
polar_parameter_check	1.0	kernel extension for parameter validation
polar_px	1.0	Parallel Execution extension
tablefunc	1.0	functions that manipulate whole tables, including crosstab
polar_stat_env	1.0	env stat functions for PolarDB
smlar	1.0	compute similary of any one-dimensional arrays
timetravel	1.0	functions for implementing time travel
tsm_system_rows	1.0	TABLESAMPLE method which accepts number of rows as a limit
polar_stat_sql	1.3	Kernel statistics gathering, and sql plan nodes information gathering
tsm_system_time	1.0	TABLESAMPLE method which accepts time in milliseconds as a limit
polar_tde_utils	1.0	Internal extension for TDE
polar_vfs	1.0	polar_vfs
polar_worker	1.0	polar_worker
unaccent	1.1	text search dictionary that removes accents
postgres_fdw	1.0	foreign-data wrapper for remote PostgreSQL servers

Pigsty Professional Edition provides PolarDB offline installation support, extension plugin compilation support, and monitoring and management support specifically adapted for PolarDB clusters.
Pigsty collaborates with the Alibaba Cloud kernel team and can provide paid kernel backup support services.

10 - PolarDB Oracle

Using Alibaba Cloud’s commercial PolarDB for Oracle kernel (closed source, PG14, only available in special enterprise edition customization)

Pigsty allows you to create PolarDB for Oracle clusters with “domestic innovation qualification” credentials using PolarDB!

According to the Security and Reliability Evaluation Results Announcement (No. 1, 2023), Appendix 3, Centralized Database. PolarDB v2.0 is an autonomous, controllable, secure, and reliable domestic innovation database.

PolarDB for Oracle is an Oracle-compatible version developed based on PolarDB for PostgreSQL. Both share the same kernel, distinguished by the --compatibility-mode parameter.

We collaborate with the Alibaba Cloud kernel team to provide a complete database solution based on PolarDB v2.0 kernel and Pigsty v3.0 RDS. Please contact sales for inquiries, or purchase on Alibaba Cloud Marketplace.

The PolarDB for Oracle kernel is currently only available on EL systems.

Extensions

Currently, the PolarDB 2.0 (Oracle compatible) kernel comes with the following 188 extension plugins:

name	default_version	comment
cube	1.5	data type for multidimensional cubes
ip4r	2.4	NULL
adminpack	2.1	administrative functions for PostgreSQL
dict_xsyn	1.0	text search dictionary template for extended synonym processing
amcheck	1.4	functions for verifying relation integrity
autoinc	1.0	functions for autoincrementing fields
hstore	1.8	data type for storing sets of (key, value) pairs
bloom	1.0	bloom access method - signature file based index
earthdistance	1.1	calculate great-circle distances on the surface of the Earth
hstore_plperl	1.0	transform between hstore and plperl
bool_plperl	1.0	transform between bool and plperl
file_fdw	1.0	foreign-data wrapper for flat file access
bool_plperlu	1.0	transform between bool and plperlu
fuzzystrmatch	1.1	determine similarities and distance between strings
hstore_plperlu	1.0	transform between hstore and plperlu
btree_gin	1.3	support for indexing common datatypes in GIN
hstore_plpython2u	1.0	transform between hstore and plpython2u
btree_gist	1.6	support for indexing common datatypes in GiST
hll	2.17	type for storing hyperloglog data
hstore_plpython3u	1.0	transform between hstore and plpython3u
citext	1.6	data type for case-insensitive character strings
hstore_plpythonu	1.0	transform between hstore and plpythonu
hypopg	1.3.1	Hypothetical indexes for PostgreSQL
insert_username	1.0	functions for tracking who changed a table
dblink	1.2	connect to other PostgreSQL databases from within a database
decoderbufs	0.1.0	Logical decoding plugin that delivers WAL stream changes using a Protocol Buffer format
intagg	1.1	integer aggregator and enumerator (obsolete)
dict_int	1.0	text search dictionary template for integers
intarray	1.5	functions, operators, and index support for 1-D arrays of integers
isn	1.2	data types for international product numbering standards
jsonb_plperl	1.0	transform between jsonb and plperl
jsonb_plperlu	1.0	transform between jsonb and plperlu
jsonb_plpython2u	1.0	transform between jsonb and plpython2u
jsonb_plpython3u	1.0	transform between jsonb and plpython3u
jsonb_plpythonu	1.0	transform between jsonb and plpythonu
lo	1.1	Large Object maintenance
log_fdw	1.0	foreign-data wrapper for csvlog
ltree	1.2	data type for hierarchical tree-like structures
ltree_plpython2u	1.0	transform between ltree and plpython2u
ltree_plpython3u	1.0	transform between ltree and plpython3u
ltree_plpythonu	1.0	transform between ltree and plpythonu
moddatetime	1.0	functions for tracking last modification time
old_snapshot	1.0	utilities in support of old_snapshot_threshold
oracle_fdw	1.2	foreign data wrapper for Oracle access
oss_fdw	1.1	foreign-data wrapper for OSS access
pageinspect	2.1	inspect the contents of database pages at a low level
pase	0.0.1	ant ai similarity search
pg_bigm	1.2	text similarity measurement and index searching based on bigrams
pg_freespacemap	1.2	examine the free space map (FSM)
pg_hint_plan	1.4	controls execution plan with hinting phrases in comment of special form
pg_buffercache	1.5	examine the shared buffer cache
pg_prewarm	1.2	prewarm relation data
pg_repack	1.4.8-1	Reorganize tables in PostgreSQL databases with minimal locks
pg_sphere	1.0	spherical objects with useful functions, operators and index support
pg_cron	1.5	Job scheduler for PostgreSQL
pg_jieba	1.1.0	a parser for full-text search of Chinese
pg_stat_kcache	2.2.1	Kernel statistics gathering
pg_stat_statements	1.9	track planning and execution statistics of all SQL statements executed
pg_surgery	1.0	extension to perform surgery on a damaged relation
pg_trgm	1.6	text similarity measurement and index searching based on trigrams
pg_visibility	1.2	examine the visibility map (VM) and page-level visibility info
pg_wait_sampling	1.1	sampling based statistics of wait events
pgaudit	1.6.2	provides auditing functionality
pgcrypto	1.3	cryptographic functions
pgrowlocks	1.2	show row-level locking information
pgstattuple	1.5	show tuple-level statistics
pgtap	1.2.0	Unit testing for PostgreSQL
pldbgapi	1.1	server-side support for debugging PL/pgSQL functions
plperl	1.0	PL/Perl procedural language
plperlu	1.0	PL/PerlU untrusted procedural language
plpgsql	1.0	PL/pgSQL procedural language
plpython2u	1.0	PL/Python2U untrusted procedural language
plpythonu	1.0	PL/PythonU untrusted procedural language
plsql	1.0	Oracle compatible PL/SQL procedural language
pltcl	1.0	PL/Tcl procedural language
pltclu	1.0	PL/TclU untrusted procedural language
polar_bfile	1.0	The BFILE data type enables access to binary file LOBs that are stored in file systems outside Database
polar_bpe	1.0	polar_bpe
polar_builtin_cast	1.1	Internal extension for builtin casts
polar_builtin_funcs	2.0	implement polar builtin functions
polar_builtin_type	1.5	polar_builtin_type for PolarDB
polar_builtin_view	1.5	polar_builtin_view
polar_catalog	1.2	polardb pg extend catalog
polar_channel	1.0	polar_channel
polar_constraint	1.0	polar_constraint
polar_csn	1.0	polar_csn
polar_dba_views	1.0	polar_dba_views
polar_dbms_alert	1.2	implement polar_dbms_alert - supports asynchronous notification of database events.
polar_dbms_application_info	1.0	implement polar_dbms_application_info - record names of executing modules or transactions in the database.
polar_dbms_pipe	1.1	implements polar_dbms_pipe - package lets two or more sessions in the same instance communicate.
polar_dbms_aq	1.2	implement dbms_aq - provides an interface to Advanced Queuing.
polar_dbms_lob	1.3	implement dbms_lob - provides subprograms to operate on BLOBs, CLOBs, and NCLOBs.
polar_dbms_output	1.2	implement polar_dbms_output - enables you to send messages from stored procedures.
polar_dbms_lock	1.0	implement polar_dbms_lock - provides an interface to Oracle Lock Management services.
polar_dbms_aqadm	1.3	polar_dbms_aqadm - procedures to manage Advanced Queuing configuration and administration information.
polar_dbms_assert	1.0	implement polar_dbms_assert - provide an interface to validate properties of the input value.
polar_dbms_metadata	1.0	implement polar_dbms_metadata - provides a way for you to retrieve metadata from the database dictionary.
polar_dbms_random	1.0	implement polar_dbms_random - a built-in random number generator, not intended for cryptography
polar_dbms_crypto	1.1	implement dbms_crypto - provides an interface to encrypt and decrypt stored data.
polar_dbms_redact	1.0	implement polar_dbms_redact - provides an interface to mask data from queries by an application.
polar_dbms_debug	1.1	server-side support for debugging PL/SQL functions
polar_dbms_job	1.0	polar_dbms_job
polar_dbms_mview	1.1	implement polar_dbms_mview - enables to refresh materialized views.
polar_dbms_job_preload	1.0	polar_dbms_job_preload
polar_dbms_obfuscation_toolkit	1.1	implement polar_dbms_obfuscation_toolkit - enables an application to get data md5.
polar_dbms_rls	1.1	implement polar_dbms_rls - a fine-grained access control administrative built-in package
polar_multi_toast_utils	1.0	polar_multi_toast_utils
polar_dbms_session	1.2	implement polar_dbms_session - support to set preferences and security levels.
polar_odciconst	1.0	implement ODCIConst - Provide some built-in constants in Oracle.
polar_dbms_sql	1.2	implement polar_dbms_sql - provides an interface to execute dynamic SQL.
polar_osfs_toolkit	1.0	osfs library tools and functions extension
polar_dbms_stats	14.0	stabilize plans by fixing statistics
polar_monitor	1.5	monitor functions for PolarDB
polar_osfs_utils	1.0	osfs library utils extension
polar_dbms_utility	1.3	implement polar_dbms_utility - provides various utility subprograms.
polar_parameter_check	1.0	kernel extension for parameter validation
polar_dbms_xmldom	1.0	implement dbms_xmldom and dbms_xmlparser - support standard DOM interface and xml parser object
polar_parameter_manager	1.1	Extension to select parameters for manger.
polar_faults	1.0.0	simulate some database faults for end user or testing system.
polar_monitor_preload	1.1	examine the polardb information
polar_proxy_utils	1.0	Extension to provide operations about proxy.
polar_feature_utils	1.2	PolarDB feature utilization
polar_global_awr	1.0	PolarDB Global AWR Report
polar_publication	1.0	support polardb pg logical replication
polar_global_cache	1.0	polar_global_cache
polar_px	1.0	Parallel Execution extension
polar_serverless	1.0	polar serverless extension
polar_resource_manager	1.0	a background process that forcibly frees user session process memory
polar_sys_context	1.1	implement polar_sys_context - returns the value of parameter associated with the context namespace at the current instant.
polar_gpc	1.3	polar_gpc
polar_tde_utils	1.0	Internal extension for TDE
polar_gtt	1.1	polar_gtt
polar_utl_encode	1.2	implement polar_utl_encode - provides functions that encode RAW data into a standard encoded format
polar_htap	1.1	extension for PolarDB HTAP
polar_htap_db	1.0	extension for PolarDB HTAP database level operation
polar_io_stat	1.0	polar io stat in multi dimension
polar_utl_file	1.0	implement utl_file - support PL/SQL programs can read and write operating system text files
polar_ivm	1.0	polar_ivm
polar_sql_mapping	1.2	Record error sqls and mapping them to correct one
polar_stat_sql	1.0	Kernel statistics gathering, and sql plan nodes information gathering
tds_fdw	2.0.2	Foreign data wrapper for querying a TDS database (Sybase or Microsoft SQL Server)
xml2	1.1	XPath querying and XSLT
polar_upgrade_catalogs	1.1	Upgrade catalogs for old version instance
polar_utl_i18n	1.1	polar_utl_i18n
polar_utl_raw	1.0	implement utl_raw - provides SQL functions for manipulating RAW datatypes.
timescaledb	2.9.2	Enables scalable inserts and complex queries for time-series data
polar_vfs	1.0	polar virtual file system for different storage
polar_worker	1.0	polar_worker
postgres_fdw	1.1	foreign-data wrapper for remote PostgreSQL servers
refint	1.0	functions for implementing referential integrity (obsolete)
roaringbitmap	0.5	support for Roaring Bitmaps
tsm_system_time	1.0	TABLESAMPLE method which accepts time in milliseconds as a limit
vector	0.5.0	vector data type and ivfflat and hnsw access methods
rum	1.3	RUM index access method
unaccent	1.1	text search dictionary that removes accents
seg	1.4	data type for representing line segments or floating-point intervals
sequential_uuids	1.0.2	generator of sequential UUIDs
uuid-ossp	1.1	generate universally unique identifiers (UUIDs)
smlar	1.0	compute similary of any one-dimensional arrays
varbitx	1.1	varbit functions pack
sslinfo	1.2	information about SSL certificates
tablefunc	1.0	functions that manipulate whole tables, including crosstab
tcn	1.0	Triggered change notifications
zhparser	1.0	a parser for full-text search of Chinese
address_standardizer	3.3.2	Ganos PostGIS address standardizer
address_standardizer_data_us	3.3.2	Ganos PostGIS address standardizer data us
ganos_fdw	6.0	Ganos Spatial FDW extension for POLARDB
ganos_geometry	6.0	Ganos geometry lite extension for POLARDB
ganos_geometry_pyramid	6.0	Ganos Geometry Pyramid extension for POLARDB
ganos_geometry_sfcgal	6.0	Ganos geometry lite sfcgal extension for POLARDB
ganos_geomgrid	6.0	Ganos geometry grid extension for POLARDB
ganos_importer	6.0	Ganos Spatial importer extension for POLARDB
ganos_networking	6.0	Ganos networking
ganos_pointcloud	6.0	Ganos pointcloud extension For POLARDB
ganos_pointcloud_geometry	6.0	Ganos_pointcloud LIDAR data and ganos_geometry data for POLARDB
ganos_raster	6.0	Ganos raster extension for POLARDB
ganos_scene	6.0	Ganos scene extension for POLARDB
ganos_sfmesh	6.0	Ganos surface mesh extension for POLARDB
ganos_spatialref	6.0	Ganos spatial reference extension for POLARDB
ganos_trajectory	6.0	Ganos trajectory extension for POLARDB
ganos_vomesh	6.0	Ganos volumn mesh extension for POLARDB
postgis_tiger_geocoder	3.3.2	Ganos PostGIS tiger geocoder
postgis_topology	3.3.2	Ganos PostGIS topology

11 - PostgresML

How to deploy PostgresML with Pigsty: ML, training, inference, Embedding, RAG inside DB.

PostgresML is a PostgreSQL extension that supports the latest large language models (LLM), vector operations, classical machine learning, and traditional Postgres application workloads.

PostgresML (pgml) is a PostgreSQL extension written in Rust. You can run standalone Docker images, but this documentation is not a docker-compose template introduction, for reference only.

PostgresML officially supports Ubuntu 22.04, but we also maintain RPM versions for EL 8/9, if you don’t need CUDA and NVIDIA-related features.

You need internet access on database nodes to download Python dependencies from PyPI and models from HuggingFace.

PostgresML is Deprecated

Because the company behind it has ceased operations.

Configuration

PostgresML is an extension written in Rust, officially supporting Ubuntu. Pigsty maintains RPM versions of PostgresML on EL8 and EL9.

Creating a New Cluster

PostgresML 2.7.9 is available for PostgreSQL 15, supporting Ubuntu 22.04 (official), Debian 12, and EL 8/9 (maintained by Pigsty). To enable pgml, you first need to install the extension:

pg-meta:
  hosts: { 10.10.10.10: { pg_seq: 1, pg_role: primary } }
  vars:
    pg_cluster: pg-meta
    pg_users:
      - {name: dbuser_meta     ,password: DBUser.Meta     ,pgbouncer: true ,roles: [dbrole_admin]    ,comment: pigsty admin user }
      - {name: dbuser_view     ,password: DBUser.Viewer   ,pgbouncer: true ,roles: [dbrole_readonly] ,comment: read-only viewer for meta database }
    pg_databases:
      - { name: meta ,baseline: cmdb.sql ,comment: pigsty meta database ,schemas: [pigsty] ,extensions: [{name: postgis, schema: public}, {name: timescaledb}]}
    pg_hba_rules:
      - {user: dbuser_view , db: all ,addr: infra ,auth: pwd ,title: 'allow grafana dashboard access cmdb from infra nodes'}
    pg_libs: 'pgml, pg_stat_statements, auto_explain'
    pg_extensions: [ 'pgml_15 pgvector_15 wal2json_15 repack_15' ]  # ubuntu
    #pg_extensions: [ 'postgresql-pgml-15 postgresql-15-pgvector postgresql-15-wal2json postgresql-15-repack' ]  # ubuntu

On EL 8/9, the extension name is pgml_15, corresponding to the Ubuntu/Debian name postgresql-pgml-15. You also need to add pgml to pg_libs.

Enabling on an Existing Cluster

To enable pgml on an existing cluster, you can install it using Ansible’s package module:

ansible pg-meta -m package -b -a 'name=pgml_15'
# ansible el8,el9 -m package -b -a 'name=pgml_15'           # EL 8/9
# ansible u22 -m package -b -a 'name=postgresql-pgml-15'    # Ubuntu 22.04 jammy

Python Dependencies

You also need to install PostgresML’s Python dependencies on cluster nodes. Official tutorial: Installation Guide

Install Python and PIP

Ensure python3, pip, and venv are installed:

# Ubuntu 22.04 (python3.10), need to install pip and venv using apt
sudo apt install -y python3 python3-pip python3-venv

For EL 8 / EL9 and compatible distributions, you can use python3.11:

# EL 8/9, can upgrade the default pip and virtualenv
sudo yum install -y python3.11 python3.11-pip       # install latest python3.11
python3.11 -m pip install --upgrade pip virtualenv  # use python3.11 on EL8 / EL9

Using PyPI Mirrors

For users in mainland China, we recommend using Tsinghua University’s PyPI mirror.

pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple    # set global mirror (recommended)
pip install -i https://pypi.tuna.tsinghua.edu.cn/simple some-package        # use for single installation

Install Dependencies

Create a Python virtual environment and use pip to install dependencies from requirements.txt and requirements-xformers.txt.

If you’re using EL 8/9, replace python3 with python3.11 in the following commands.

su - postgres;                          # create virtual environment as database superuser
mkdir -p /data/pgml; cd /data/pgml;     # create virtual environment directory
python3    -m venv /data/pgml           # create virtual environment directory (Ubuntu 22.04)
source /data/pgml/bin/activate          # activate virtual environment

# write Python dependencies and install with pip
cat > /data/pgml/requirments.txt <<EOF
accelerate==0.22.0
auto-gptq==0.4.2
bitsandbytes==0.41.1
catboost==1.2
ctransformers==0.2.27
datasets==2.14.5
deepspeed==0.10.3
huggingface-hub==0.17.1
InstructorEmbedding==1.0.1
lightgbm==4.1.0
orjson==3.9.7
pandas==2.1.0
rich==13.5.2
rouge==1.0.1
sacrebleu==2.3.1
sacremoses==0.0.53
scikit-learn==1.3.0
sentencepiece==0.1.99
sentence-transformers==2.2.2
tokenizers==0.13.3
torch==2.0.1
torchaudio==2.0.2
torchvision==0.15.2
tqdm==4.66.1
transformers==4.33.1
xgboost==2.0.0
langchain==0.0.287
einops==0.6.1
pynvml==11.5.0
EOF

# install dependencies using pip in the virtual environment
python3 -m pip install -r /data/pgml/requirments.txt
python3 -m pip install xformers==0.0.21 --no-dependencies

# additionally, 3 Python packages need to be installed globally using sudo!
sudo python3 -m pip install xgboost lightgbm scikit-learn

Enable PostgresML

After installing the pgml extension and Python dependencies on all cluster nodes, you can enable pgml on the PostgreSQL cluster.

Use the patronictl command to configure the cluster, add pgml to shared_preload_libraries, and specify your virtual environment directory in pgml.venv:

shared_preload_libraries: pgml, timescaledb, pg_stat_statements, auto_explain
pgml.venv: '/data/pgml'

Then restart the database cluster and create the extension using SQL commands:

CREATE EXTENSION vector;        -- also recommend installing pgvector!
CREATE EXTENSION pgml;          -- create PostgresML in the current database
SELECT pgml.version();          -- print PostgresML version information

If everything is normal, you should see output similar to the following:

# create extension pgml;
INFO:  Python version: 3.11.2 (main, Oct  5 2023, 16:06:03) [GCC 8.5.0 20210514 (Red Hat 8.5.0-18)]
INFO:  Scikit-learn 1.3.0, XGBoost 2.0.0, LightGBM 4.1.0, NumPy 1.26.1
CREATE EXTENSION

# SELECT pgml.version(); -- print PostgresML version information
 version
---------
 2.7.8

Done! For more details, please refer to the official PostgresML documentation: https://postgresml.org/docs/guides/use-cases/

12 - Greenplum

Deploy/Monitor Greenplum clusters with Pigsty, build Massively Parallel Processing (MPP) PostgreSQL data warehouse clusters!

Pigsty supports deploying Greenplum clusters and its derivative distribution YMatrixDB, and provides the capability to integrate existing Greenplum deployments into Pigsty monitoring.

Overview

Greenplum / YMatrix cluster deployment capabilities are only available in the professional/enterprise editions and are not currently open source.

Installation

Pigsty provides installation packages for Greenplum 6 (@el7) and Greenplum 7 (@el8). Open source users can install and configure them manually.

# EL 7 Only (Greenplum6)
./node.yml -t node_install  -e '{"node_repo_modules":"pgsql","node_packages":["open-source-greenplum-db-6"]}'

# EL 8 Only (Greenplum7)
./node.yml -t node_install  -e '{"node_repo_modules":"pgsql","node_packages":["open-source-greenplum-db-7"]}'

Configuration

To define a Greenplum cluster, you need to use pg_mode = gpsql and additional identity parameters pg_shard and gp_role.

#================================================================#
#                        GPSQL Clusters                          #
#================================================================#

#----------------------------------#
# cluster: mx-mdw (gp master)
#----------------------------------#
mx-mdw:
  hosts:
    10.10.10.10: { pg_seq: 1, pg_role: primary , nodename: mx-mdw-1 }
  vars:
    gp_role: master          # this cluster is used as greenplum master
    pg_shard: mx             # pgsql sharding name & gpsql deployment name
    pg_cluster: mx-mdw       # this master cluster name is mx-mdw
    pg_databases:
      - { name: matrixmgr , extensions: [ { name: matrixdbts } ] }
      - { name: meta }
    pg_users:
      - { name: meta , password: DBUser.Meta , pgbouncer: true }
      - { name: dbuser_monitor , password: DBUser.Monitor , roles: [ dbrole_readonly ], superuser: true }

    pgbouncer_enabled: true                # enable pgbouncer for greenplum master
    pgbouncer_exporter_enabled: false      # enable pgbouncer_exporter for greenplum master
    pg_exporter_params: 'host=127.0.0.1&sslmode=disable'  # use 127.0.0.1 as local monitor host

#----------------------------------#
# cluster: mx-sdw (gp master)
#----------------------------------#
mx-sdw:
  hosts:
    10.10.10.11:
      nodename: mx-sdw-1        # greenplum segment node
      pg_instances:             # greenplum segment instances
        6000: { pg_cluster: mx-seg1, pg_seq: 1, pg_role: primary , pg_exporter_port: 9633 }
        6001: { pg_cluster: mx-seg2, pg_seq: 2, pg_role: replica , pg_exporter_port: 9634 }
    10.10.10.12:
      nodename: mx-sdw-2
      pg_instances:
        6000: { pg_cluster: mx-seg2, pg_seq: 1, pg_role: primary , pg_exporter_port: 9633  }
        6001: { pg_cluster: mx-seg3, pg_seq: 2, pg_role: replica , pg_exporter_port: 9634  }
    10.10.10.13:
      nodename: mx-sdw-3
      pg_instances:
        6000: { pg_cluster: mx-seg3, pg_seq: 1, pg_role: primary , pg_exporter_port: 9633 }
        6001: { pg_cluster: mx-seg1, pg_seq: 2, pg_role: replica , pg_exporter_port: 9634 }
  vars:
    gp_role: segment               # these are nodes for gp segments
    pg_shard: mx                   # pgsql sharding name & gpsql deployment name
    pg_cluster: mx-sdw             # these segment clusters name is mx-sdw
    pg_preflight_skip: true        # skip preflight check (since pg_seq & pg_role & pg_cluster not exists)
    pg_exporter_config: pg_exporter_basic.yml                             # use basic config to avoid segment server crash
    pg_exporter_params: 'options=-c%20gp_role%3Dutility&sslmode=disable'  # use gp_role = utility to connect to segments

Additionally, PG Exporter requires extra connection parameters to connect to Greenplum Segment instances for metric collection.

13 - Cloudberry

Deploy/Monitor Cloudberry clusters with Pigsty, an MPP data warehouse cluster forked from Greenplum!

Installation

Pigsty provides installation packages for Greenplum 6 (@el7) and Greenplum 7 (@el8). Open source users can install and configure them manually.

# EL 7 Only (Greenplum6)
./node.yml -t node_install  -e '{"node_repo_modules":"pgsql","node_packages":["cloudberrydb"]}'

# EL 8 Only (Greenplum7)
./node.yml -t node_install  -e '{"node_repo_modules":"pgsql","node_packages":["cloudberrydb"]}'

14 - Neon

Use Neon’s open-source Serverless PostgreSQL kernel to build flexible, scale-to-zero, forkable PG services.

Neon adopts a storage and compute separation architecture, providing seamless autoscaling, scale to zero, and unique database branching capabilities.

Neon official website: https://neon.tech/

The compiled binaries of Neon are excessively large and are currently not available to open-source users. It is currently in the pilot stage. If you have requirements, please contact Pigsty sales.