In the database world, there are many common concepts like High Availability, Failover, and Connection Pooling. All of them are useful to implement on any system, and in some cases they are a must.
Connection pooling is a method of creating a pool of connections and reusing them, which avoids opening a new connection to the database every time and can increase the performance of your applications considerably. PgBouncer is a popular connection pooler designed for PostgreSQL, but it is not enough by itself to achieve PostgreSQL High Availability, as it doesn't provide multi-host configuration, failover, or failure detection.
Using a Load Balancer is one way to achieve High Availability in your database topology. It can be useful for redirecting traffic to healthy database nodes, distributing the traffic across multiple servers to improve performance, or simply having a single endpoint configured in your application for an easier configuration and failover process. For this, HAProxy is a good option to complement your connection pooler, as it is an open-source proxy that can be used to implement high availability, load balancing, and proxying for TCP- and HTTP-based applications.
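To make the load-balancing idea concrete, here is a minimal Python sketch of round-robin selection that skips nodes marked as unhealthy. It is an illustration under simplifying assumptions, not how HAProxy is implemented: the class name and the node addresses are hypothetical, and the health state is set by hand instead of coming from real health checks.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Toy round-robin balancer that skips backends marked as unhealthy."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.healthy = {b: True for b in self.backends}
        self._ring = cycle(self.backends)

    def mark(self, backend, is_healthy):
        # Assumption: in a real setup this state comes from periodic health checks.
        self.healthy[backend] = is_healthy

    def pick(self):
        # Visit each backend at most once per call, skipping unhealthy ones.
        for _ in range(len(self.backends)):
            candidate = next(self._ring)
            if self.healthy[candidate]:
                return candidate
        raise RuntimeError("no healthy backend available")


# Hypothetical PgBouncer nodes behind the balancer.
lb = RoundRobinBalancer(["10.0.0.1:6432", "10.0.0.2:6432", "10.0.0.3:6432"])
lb.mark("10.0.0.2:6432", False)
print([lb.pick() for _ in range(4)])
```

With the second node marked down, requests alternate between the first and third nodes, which is the "redirect traffic to healthy nodes" behavior described above in its simplest form.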
In this blog, we will use both concepts, Load Balancer and Connection pooling (HAProxy + PgBouncer), to deploy a High Availability environment for your PostgreSQL database.
How PgBouncer Works
PgBouncer acts as a PostgreSQL server, so you just need to access your database using the PgBouncer information (IP Address/Hostname and Port), and PgBouncer will create a connection to the PostgreSQL server, or it will reuse one if it exists.
When PgBouncer receives a connection, it performs the authentication, which depends on the method specified in the configuration file. PgBouncer supports all the authentication mechanisms that the PostgreSQL server supports. After this, PgBouncer checks for a cached connection with the same username+database combination. If a cached connection is found, it returns that connection to the client; if not, it creates a new connection. Depending on the PgBouncer configuration and the number of active connections, the new connection may be queued until it can be created, or even aborted.
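The lookup flow described above can be modeled with a toy pool keyed by (user, database). This is a hypothetical sketch, not PgBouncer's code: `KeyedPool`, `pool_size`, and the `connect` factory are illustrative names, and the queue-or-abort step is reduced to returning `None` when the limit is reached.

```python
from collections import defaultdict


class KeyedPool:
    """Toy model of a pooler that caches server connections per (user, database)."""

    def __init__(self, connect, pool_size=2):
        self._connect = connect          # factory that opens a new server connection
        self._pool_size = pool_size      # max server connections per (user, database)
        self._idle = defaultdict(list)   # (user, database) -> idle connections
        self._open = defaultdict(int)    # (user, database) -> connections opened so far

    def acquire(self, user, database):
        key = (user, database)
        if self._idle[key]:
            return self._idle[key].pop()  # reuse a cached connection
        if self._open[key] < self._pool_size:
            self._open[key] += 1
            return self._connect(user, database)  # open a new one
        # A real pooler would queue the client here (or abort it after a timeout).
        return None

    def release(self, user, database, conn):
        # Put the server connection back so the next client can reuse it.
        self._idle[(user, database)].append(conn)
```

Because the cache is keyed by both user and database, a client logging in as a different user gets its own server connection even for the same database.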
The PgBouncer behavior depends on the pooling mode configured:
- session pooling (default): When a client connects, a server connection will be assigned to it for the whole duration the client stays connected. When the client disconnects, the server connection will be put back into the pool.
- transaction pooling: A server connection is assigned to a client only during a transaction. When PgBouncer notices that the transaction is over, the server connection will be put back into the pool.
- statement pooling: The server connection will be put back into the pool immediately after a query completes. Multi-statement transactions are disallowed in this mode as they would break.
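The difference between the three modes comes down to when a server connection is released. The following Python sketch replays a sequence of client events and reports after which ones the connection would go back to the pool. It is a simplified model under stated assumptions (events are limited to begin, query, commit, rollback, and disconnect), and `release_points` is a hypothetical helper.

```python
def release_points(pool_mode, events):
    """Return indexes of events after which the server connection returns to the pool.

    events: sequence of "begin", "query", "commit", "rollback", "disconnect".
    """
    if pool_mode not in ("session", "transaction", "statement"):
        raise ValueError(f"unknown pool_mode: {pool_mode}")
    in_tx = False      # is an explicit transaction open?
    assigned = False   # does the client currently hold a server connection?
    released = []
    for i, event in enumerate(events):
        if event == "disconnect":
            release = True                 # every mode releases on disconnect
        else:
            assigned = True                # any statement needs a server connection
            if event == "begin":
                if pool_mode == "statement":
                    raise ValueError("multi-statement transactions are not allowed in statement mode")
                in_tx = True
            elif event in ("commit", "rollback"):
                in_tx = False
            if pool_mode == "session":
                release = False            # kept until the client disconnects
            elif pool_mode == "transaction":
                release = not in_tx        # kept until the transaction is over
            else:
                release = True             # statement mode: released after each query
        if release and assigned:
            released.append(i)
            assigned = False
    return released


print(release_points("transaction", ["begin", "query", "commit", "query", "disconnect"]))
```

For the same event sequence, session pooling releases only at the disconnect, while transaction pooling releases after the `commit` and again after the standalone query.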
To balance queries between several servers, on the PgBouncer side, it may be a good idea to make server_lifetime smaller and also turn server_round_robin on. By default, idle connections are reused in LIFO order, which may not work well when a load balancer is used.
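Why LIFO reuse can concentrate load is easy to see in a small model: if each idle server connection happens to be routed to a different node behind the load balancer, LIFO keeps handing out the most recently returned connection, while a round-robin order rotates through all of them. This is a simplified Python illustration, not PgBouncer's internal algorithm; `reuse_counts` and the node names are hypothetical.

```python
from collections import Counter, deque


def reuse_counts(idle, requests, round_robin):
    """Count how often each idle server connection is handed out.

    idle: initial idle connections (here, each one routed to a different node).
    round_robin=False models LIFO reuse; True models a rotating (FIFO) order.
    """
    pool = deque(idle)
    used = Counter()
    for _ in range(requests):
        conn = pool.popleft() if round_robin else pool.pop()
        used[conn] += 1
        pool.append(conn)  # the connection goes back to the idle list
    return used


print(reuse_counts(["nodeA", "nodeB", "nodeC"], 6, round_robin=False))
print(reuse_counts(["nodeA", "nodeB", "nodeC"], 6, round_robin=True))
```

With three idle connections and six requests, LIFO sends all six to the same connection, while rotation gives each connection two.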
How to Install PgBouncer
We assume that your PostgreSQL cluster and HAProxy are already deployed and running; otherwise, you can follow this blog post to easily deploy PostgreSQL for High Availability.
You can install PgBouncer on each database node or on an external machine. In either case, you will have something like this:
To get the PgBouncer software you can go to the PgBouncer download section, or use the RPM or DEB repositories. For this example, we will use CentOS 8 and will install it from the official PostgreSQL repository.
First, download and install the corresponding repository from the PostgreSQL site (if you don’t have it in place yet):
$ wget https://download.postgresql.org/pub/repos/yum/reporpms/EL-8-x86_64/pgdg-redhat-repo-latest.noarch.rpm
$ rpm -Uvh pgdg-redhat-repo-latest.noarch.rpm
Then, install the PgBouncer package:
$ yum install pgbouncer
Verify the installation:
$ pgbouncer --version
PgBouncer 1.14.0
libevent 2.1.8-stable
adns: c-ares 1.13.0
tls: OpenSSL 1.1.1c FIPS 28 May 2019
When it is completed, you will have a new configuration file located in /etc/pgbouncer/pgbouncer.ini:
[databases]

[users]

[pgbouncer]
logfile = /var/log/pgbouncer/pgbouncer.log
pidfile = /var/run/pgbouncer/pgbouncer.pid
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = trust
auth_file = /etc/pgbouncer/userlist.txt
admin_users = postgres
stats_users = stats, postgres
Let’s see these parameters one by one:
- Databases section [databases]: This contains key=value pairs where the key will be taken as a database name and the value as a libpq connection string style list of key=value pairs.
- User section [users]: This contains key=value pairs where the key will be taken as a user name and the value as a libpq connection string style list of key=value pairs of configuration settings specific for this user.
- logfile: Specifies the log file. The log file is kept open, so after rotation you should run kill -HUP or RELOAD; on the console.
- pidfile: Specifies the PID file. Without a pidfile, running as a daemon is not allowed.
- listen_addr: Specifies a list of addresses on which to listen for TCP connections. You may also use * meaning “listen on all addresses”. When not set, only Unix socket connections are accepted.
- listen_port: Which port to listen on. Applies to both TCP and Unix sockets. The default port is 6432.
- auth_type: How to authenticate users.
- auth_file: The name of the file to load usernames and passwords from.
- admin_users: Comma-separated list of database users that are allowed to connect and run all commands on the console.
- stats_users: Comma-separated list of database users that are allowed to connect and run read-only queries on the console.
This is just a sample of the default configuration file, as the original has 359 lines, but the rest of the lines are commented out by default. To get all the available parameters, you can check the official documentation.
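Because the file uses an INI-style layout, it can be inspected from a script. The sketch below uses Python's standard `configparser` to load the `[databases]` and `[pgbouncer]` sections and splits each database value into its libpq-style key=value pairs. It assumes simple values without quoting and no `%include` directives, both of which the real PgBouncer parser accepts; the function names are hypothetical.

```python
import configparser


def parse_conninfo(value):
    """Split a libpq-style 'key=value key=value' string into a dict.

    Assumption: values contain no spaces or quotes.
    """
    return dict(item.split("=", 1) for item in value.split())


def load_pgbouncer_ini(text):
    """Return ({db_name: conninfo_dict}, {setting: value}) from pgbouncer.ini text."""
    parser = configparser.ConfigParser(interpolation=None, delimiters=("=",))
    parser.optionxform = str  # keep database and setting names case-sensitive
    parser.read_string(text)
    databases = {name: parse_conninfo(value) for name, value in parser.items("databases")}
    settings = dict(parser.items("pgbouncer"))
    return databases, settings
```

Only the first `=` on each line separates the key from the value, so a line such as `world = host=127.0.0.1 port=5432 dbname=world` yields the key `world` and a connection string that `parse_conninfo` then splits into `host`, `port`, and `dbname`.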
How to Use PgBouncer
Now, let’s see a basic configuration to make it work.
The pgbouncer.ini configuration file:
$ cat /etc/pgbouncer/pgbouncer.ini
[databases]
world = host=127.0.0.1 port=5432 dbname=world

[pgbouncer]
logfile = /var/log/pgbouncer/pgbouncer.log
pidfile = /var/run/pgbouncer/pgbouncer.pid
listen_addr = *
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
admin_users = admindb
And the authentication file:
$ cat /etc/pgbouncer/userlist.txt
"admindb" "root123"
So, in this case, I have installed PgBouncer on the same database node, listening on all IP addresses, and it connects to a PostgreSQL database called “world”. I am also managing the allowed users in the userlist.txt file with a plain-text password, which can be stored as a hash instead if needed.
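To avoid keeping the password in plain text, `userlist.txt` can hold an MD5 secret instead, in the same format PostgreSQL uses: the string `md5` followed by the MD5 hex digest of the password concatenated with the user name. A minimal Python sketch, where `userlist_md5_entry` is a hypothetical helper name:

```python
import hashlib


def userlist_md5_entry(username, password):
    """Build an auth_file line with a PostgreSQL-style MD5 secret instead of plain text.

    The secret is "md5" + md5(password + username), the format PostgreSQL
    uses for MD5 passwords.
    """
    digest = hashlib.md5((password + username).encode("utf-8")).hexdigest()
    return f'"{username}" "md5{digest}"'


print(userlist_md5_entry("admindb", "root123"))
```

The printed line would replace `"admindb" "root123"` in the file; clients keep logging in with the original password under `auth_type = md5`, assuming the PostgreSQL server also accepts MD5 authentication for that user.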
To start the PgBouncer service, you just need to run the following command:
$ pgbouncer -d /etc/pgbouncer/pgbouncer.ini
Where -d means “daemon”, so it will run in the background.
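Once the daemon is running, a script can confirm that it accepts TCP connections, which is the same thing the `netstat` output reports. This minimal Python sketch only tests whether a TCP connection can be opened; it does not authenticate or run a query, and `is_listening` is a hypothetical helper name.

```python
import socket


def is_listening(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


print(is_listening("127.0.0.1", 6432))
```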
$ netstat -pltn
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 0.0.0.0:6432 0.0.0.0:* LISTEN 4274/pgbouncer
tcp6 0 0 :::6432 :::* LISTEN 4274/pgbouncer
As you can see, PgBouncer is up and waiting for connections on port 6432. To access the PostgreSQL database, run the following command using your local information (port, host, username, and database name):
$ psql -p 6432 -h 127.0.0.1 -U admindb world
Password for user admindb:
psql (12.4)
Type "help" for help.

world=#
Keep in mind that the database name (world) is the database configured in your PgBouncer configuration file:
[databases]
world = host=127.0.0.1 port=5432 dbname=world
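The client-side database name is therefore an alias that PgBouncer maps to a backend connection. The following Python sketch models that lookup on an already-parsed `[databases]` section. It assumes two behaviors described in the PgBouncer documentation: a `*` entry acts as a fallback for names that are not listed, and a missing `dbname` defaults to the requested name. `resolve_database` is a hypothetical helper, not part of PgBouncer.

```python
def resolve_database(databases, requested):
    """Map the database name a client asks for to backend connection settings.

    databases: parsed [databases] section, e.g. {"world": {"host": ..., "dbname": ...}}.
    """
    if requested in databases:
        target = dict(databases[requested])
    elif "*" in databases:
        target = dict(databases["*"])  # assumed fallback entry
    else:
        raise LookupError(f"no such database: {requested}")
    # Assumption: without an explicit dbname, the requested name is used on the server.
    target.setdefault("dbname", requested)
    return target


dbs = {"world": {"host": "127.0.0.1", "port": "5432", "dbname": "world"}}
print(resolve_database(dbs, "world"))
```

With the configuration above, asking for any database other than `world` fails, because there is no fallback entry.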
Monitoring and Managing PgBouncer
Instead of accessing your PostgreSQL database, you can connect directly to PgBouncer to manage or monitor it. For this, use the same command that you used previously, but change the database to “pgbouncer”:
$ psql -p 6432 -h 127.0.0.1 -U admindb pgbouncer
Password for user admindb:
psql (12.4, server 1.14.0/bouncer)
Type "help" for help.

pgbouncer=# SHOW HELP;
NOTICE: Console usage
DETAIL:
SHOW HELP|CONFIG|DATABASES|POOLS|CLIENTS|SERVERS|USERS|VERSION
SHOW FDS|SOCKETS|ACTIVE_SOCKETS|LISTS|MEM
SHOW DNS_HOSTS|DNS_ZONES
SHOW STATS|STATS_TOTALS|STATS_AVERAGES|TOTALS
SET key = arg
RELOAD
PAUSE [<db>]
RESUME [<db>]
DISABLE <db>
ENABLE <db>
RECONNECT [<db>]
KILL <db>
SUSPEND
SHUTDOWN
SHOW
Now, you can run different PgBouncer commands to monitor it:
pgbouncer=# SHOW STATS_TOTALS;
 database | xact_count | query_count | bytes_received | bytes_sent | xact_time | query_time | wait_time
-----------+------------+-------------+----------------+------------+-----------+------------+-----------
 pgbouncer | 1 | 1 | 0 | 0 | 0 | 0 | 0
 world | 2 | 2 | 59 | 234205 | 8351 | 8351 | 4828
(2 rows)
pgbouncer=# SHOW SERVERS;
 type | user | database | state | addr | port | local_addr | local_port | connect_time | request_time | wait | wait_us | close_needed | ptr | link | remote_pid | tls
------+---------+----------+--------+-----------+------+------------+------------+-------------------------+-------------------------+------+---------+--------------+----------------+----------------+------------+-----
 S | admindb | world | active | 127.0.0.1 | 5432 | 127.0.0.1 | 45052 | 2020-09-09 18:31:57 UTC | 2020-09-09 18:32:04 UTC | 0 | 0 | 0 | 0x55b04a51b3d0 | 0x55b04a514810 | 5738 |
(1 row)
pgbouncer=# SHOW CLIENTS;
 type | user | database | state | addr | port | local_addr | local_port | connect_time | request_time | wait | wait_us | close_needed | ptr | link | remote_pid | tls
------+---------+-----------+--------+-----------+-------+------------+------------+-------------------------+-------------------------+------+---------+--------------+----------------+----------------+------------+-----
 C | admindb | pgbouncer | active | 127.0.0.1 | 46950 | 127.0.0.1 | 6432 | 2020-09-09 18:29:46 UTC | 2020-09-09 18:55:11 UTC | 1441 | 855140 | 0 | 0x55b04a5145e0 | | 0 |
 C | admindb | world | active | 127.0.0.1 | 47710 | 127.0.0.1 | 6432 | 2020-09-09 18:31:41 UTC | 2020-09-09 18:32:04 UTC | 0 | 0 | 0 | 0x55b04a514810 | 0x55b04a51b3d0 | 0 |
(2 rows)
pgbouncer=# SHOW POOLS;
 database | user | cl_active | cl_waiting | sv_active | sv_idle | sv_used | sv_tested | sv_login | maxwait | maxwait_us | pool_mode
-----------+-----------+-----------+------------+-----------+---------+---------+-----------+----------+---------+------------+-----------
 pgbouncer | pgbouncer | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | statement
 world | admindb | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | session
(2 rows)
And to manage it…
pgbouncer=# PAUSE world;
PAUSE
pgbouncer=# RESUME world;
RESUME
Those commands are just an example. For a complete list of commands, please refer to the official documentation.
Using a combination of PgBouncer + HAProxy + PostgreSQL is a good way to achieve High Availability for your PostgreSQL cluster while improving your database performance at the same time.
As you can see, if you have your PostgreSQL environment in place, which you can deploy using ClusterControl in just a few clicks, you can easily add PgBouncer to take advantage of having a connection pooler for your systems.