We recently decided to factor out PostgreSQL authentication into dedicated PgBouncers for different teams, and no longer allow direct remote access to the database cluster.
Apart from its main use as a connection pooler, we've realized that the isolation afforded by a PgBouncer layer simplifies the management of each client team's authentication and connection settings, and frees PostgreSQL of these mundane duties.
Motivations
In the highly distributed and multi-tenant environment that our databases participate in, even seemingly minor issues easily bloat into an unsustainable mess. We got to the stage where we realized that we needed some formal overarching patterns across all our database servers.
Visibility within HBA Rules
Since the application servers of different teams do not necessarily have contiguous IP addresses, the IP masks in pg_hba.conf were made to cover very wide subnets. Apart from the potential security issues, this meant that within the HBA rules we lost track of where our clients were expected to come from.
We could list the individual IPs and separate the different team servers with comments, but then the HBA file would become huge and unmanageable, since HBA rules are evaluated top-down and the first match wins: a reject on line 1 could shadow a rule on line 500, as in the sketch below.
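A contrived illustration (hypothetical subnets and names):
# first match wins, so this early reject...
host all all 10.0.0.0/8 reject
# ...silently shadows this later rule: a client at 10.1.2.3 never reaches it
host service1_db service1_user 10.1.2.3/32 md5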
Without a scalable formalism, a central datateam could not manage all HBAs of all teams, for all our PostgreSQL clusters.
Security
In addition to HBA rules allowing connections from very wide subnets, our security setup was generally not ideal with respect to dedicated users per use-case, appropriate authentication methods, etc. Going forward, we wanted to enforce the following constraints:
- Avoid giving client teams SSH access to database servers just for managing authentication.
- Local connections should use the peer or md5 auth-method.
- Remote connections should come from specific users and IP addresses, and preferably use the SCRAM or md5 auth-method.
- Remote connections which cannot present a password (e.g. haproxy's pgsql-check option) must have very limited privileges. This means that postgres foreign-data-wrapper superusers (before PostgreSQL 13, passwordless connections are only allowed for superusers) must first be switched to non-superusers before connecting to other remote database servers, as sketched below.
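A minimal sketch of such a user switch, assuming the team's local PgBouncer listens on a hypothetical socket directory and port 7432, and exposes the remote1_db entry shown in the INI extract further down:
-- the FDW superuser connects (passwordless, peer-auth) to the local PgBouncer,
-- whose remote1_db entry rewrites the user to a dedicated non-superuser
CREATE SERVER remote1 FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host '/run/pgbouncer', port '7432', dbname 'remote1_db');
CREATE USER MAPPING FOR postgres SERVER remote1 OPTIONS (user 'postgres');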
Connection settings
Most connection settings can be made at the PostgreSQL user or database level, but that's not the case at least with max_connections. We nevertheless want to be able to allot different connection limits to different users, and possibly even on different databases. This way one service user cannot exhaust all available connections, thereby DoS-ing all other service users.
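On the PgBouncer side this maps to pool limits; a minimal sketch with hypothetical values:
[databases]
; per-database server pool cap
service1_db = host=/run/postgresql port=5432 pool_size=20

[pgbouncer]
max_client_conn = 500
; no single user may hold more than this many server connections
max_user_connections = 30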
Maintenance
Given its clear conceptual separation, it’s desirable to pull out authentication and connection settings from PostgreSQL into a separate dedicated layer. Furthermore, in order to delineate ownership and scope, it is desirable to group assets of different teams into different files and instances. Such isolation helps scope deployment, monitoring and troubleshooting. Eventually, we wanted a paradigm where we could present an authentication and connection configuration interface which could be deployed independently, preferably even by the teams themselves, without the chance of breaking another team’s instance, and without touching PostgreSQL itself.
Hence we decided that, on each database server, every team should have a dedicated PgBouncer instance to manage their authentication and connections.
PgBouncer Authentication Layer
Since most teams are already using PgBouncer for connection pooling, it’s not such a surprise that we pounced on it as our authentication layer. As to the specific setup, there is not much wiggle room given our motivations.
NB: it is highly recommended to use different users for different use-cases, e.g. for foreign-data-wrappers (FDW), for migration, for application traffic, etc.; this is so that the search_path and other contextual settings within transactional pools remain consistent.
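A minimal sketch of such per-user pinning (hypothetical users and schemas):
-- every server connection in a user's pool starts with the same context,
-- so transaction pooling cannot leak one use-case's settings into another
ALTER ROLE fdw_user SET search_path = public;
ALTER ROLE migration_user SET search_path = app_schema, public;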
NB: with a PgBouncer layer atop PostgreSQL, teams which use an additional application-side connection pooler could hijack database connections from some client servers, while other client servers are deprived of a PgBouncer pool slot. This materializes as stale PostgreSQL connections which are idle and haven't seen a pg_stat_activity.state_change for days, and eventually as PgBouncer's queue of waiting clients (cl_waiting) spilling over. The remedy is to pg_terminate_backend() such stale connections, and to figure out suitable timeout settings on the application connection pooler.
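A minimal sketch of such a cleanup (the two-day threshold is an assumption):
-- terminate client backends that have been idle for days
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE state = 'idle'
  AND state_change < now() - interval '2 days';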
Below are extracts from the different PgBouncer configuration files.
INI File
[databases]
; leave an interface for haproxy health-check with pgsql-check option
; we'll install a user_lookup function in the database=postgres
; and exclude user=haproxy from the auth_file
; hence both pgbouncer and postgresql are included in the haproxy health check
haproxy = host=/run/postgresql/ port=5432 dbname=postgres user=haproxy pool_size=10
; incoming database connections
service1_db = host=/run/postgresql port=5432
...
; outgoing FDW connections to pgbouncers on remote servers
haproxy_remote1 = host=remote1.adjust.com port=8432 dbname=haproxy user=haproxy pool_size=10
remote1_db = host=remote1.adjust.com port=8432 user=$dedicated_non_superuser1
...
[pgbouncer]
logfile = /var/log/pgbouncer/pgbouncer_<%= $instance_name %>.log
pidfile = /run/pgbouncer/pgbouncer_<%= $instance_name %>.pid
; NB: pgbouncer crashes in certain cases when auth_user=pgbouncer
; https://github.com/pgbouncer/pgbouncer/issues/568#issuecomment-840101192
auth_user = pgbouncer_auth
auth_query = SELECT * FROM public.user_lookup($1)
auth_type = hba
auth_file = /etc/pgbouncer/userlist_<%= $instance_name %>.txt
auth_hba_file = /etc/pgbouncer/hba_<%= $instance_name %>.txt
admin_users = pgbouncer, postgres, etc
stats_users = zabbix, etc
; we reserved the port 6432
;listen_port = 7432
...
Auth File
; NB: the haproxy user MUST NOT be included here, so that health lookups fall through to PostgreSQL
; other users MUST be listed here, otherwise the user_lookup function has to be created in their respective databases
"zabbix" ""
"postgres" ""
"pgbouncer" ""
"pgbouncer_auth" ""
"etc" ""
"service1_user" "scram..."
"human1_user" "scram..."
...
HBA File
# TYPE DATABASE USER ADDRESS METHOD
local haproxy all peer
local pgbouncer all peer
# add rules for FDW superusers to connect to a local pgbouncer for user switching
local all postgres peer
# allow connections from each client-server
# allow connection for auth_user for user_lookup of remote haproxy
# it might be necessary to add rules to Trust remote FDW connections
host haproxy haproxy $app_server_ip1 trust # hostname1.adjust.com
host haproxy pgbouncer_auth $app_server_ip1 trust # hostname1.adjust.com
host all all $app_server_ip1 md5 # hostname1.adjust.com
host haproxy haproxy $app_server_ip2 trust # hostname2.adjust.com
host haproxy pgbouncer_auth $app_server_ip2 trust # hostname2.adjust.com
host all all $app_server_ip2 md5 # hostname2.adjust.com
...
PostgreSQL Authentication Layer
pg_hba.conf and pg_ident.conf have to be adapted in order to allow the pgbouncer OS user to proxy peer connections for auth_user and haproxy.
pg_ident.conf
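# MAPNAME SYSTEM-USERNAME PG-USERNAME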
ospg pgbouncer haproxy
ospg pgbouncer pgbouncer_auth
pg_hba.conf
# TYPE DATABASE USER ADDRESS METHOD
# replication and other special connections
host replication postgres 198.168.100.1/32 trust
# proxy peer-auth; pg_ident mappings are required
local postgres haproxy peer map=ospg
local postgres pgbouncer_auth peer map=ospg
# pgbouncer instance groups
# the human and service users and hashes could be synced into PostgreSQL from each pgbouncer instance's auth_file
# each such user should be granted the respective role of the pgbouncer instance
# e.g. GRANT pgb_instance_name1 TO service1_user; (sketched below)
local all +pgb_instance_name1 md5
local all +pgb_instance_name2 md5
...
local all +pgb_instance_nameN md5
# default peer-auth
local all all peer
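A minimal sketch of the PostgreSQL side of this sync (hypothetical names; the SCRAM verifier is taken verbatim from the team's auth_file):
-- one group role per pgbouncer instance, granted to each synced login user
CREATE ROLE pgb_instance_name1 NOLOGIN;
CREATE ROLE service1_user LOGIN PASSWORD 'SCRAM-SHA-256$...';
GRANT pgb_instance_name1 TO service1_user;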
The lookup function
The lookup function is required only in the postgres database, where it is used for haproxy health checks.
-- the auth_user has to be granted execute on the user_lookup function
-- NB: pgbouncer crashes in certain cases when the function returns record and the user doesn't exist => return SETOF record
-- https://github.com/pgbouncer/pgbouncer/issues/568#issuecomment-1198500665
CREATE OR REPLACE FUNCTION public.user_lookup(i_username text, OUT uname text, OUT phash text)
 RETURNS SETOF record
 LANGUAGE sql
 SECURITY DEFINER
 SET search_path TO 'public', 'pg_temp'
AS $function$
  SELECT usename::text, passwd
  FROM pg_catalog.pg_shadow
  WHERE usename = i_username;
$function$;
REVOKE ALL ON FUNCTION public.user_lookup(text) FROM public;
GRANT EXECUTE ON FUNCTION public.user_lookup(text) TO pgbouncer_auth;
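A quick, hypothetical smoke test (run as a superuser, since execute is revoked from public):
-- should return exactly one (uname, phash) row for an existing user
SELECT * FROM public.user_lookup('haproxy');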
Putting it all together
We can capture all the above configs in the following yaml interface:
pgbouncer:
  instances:
    adjust_teamA:
      databases:
        haproxy:
          host: /run/postgresql/
          port: 5432
          dbname: postgres
          user: haproxy
          pool_size: 10
        service1_db:
          host: /run/postgresql
          port: 5432
        haproxy_remote1:
          host: remote1.adjust.com
          port: 8432
          dbname: haproxy
          user: haproxy
          pool_size: 10
          remote: true
        remote1_db:
          host: remote1.adjust.com
          port: 8432
          user: $dedicated_non_superuser1
          remote: true
      config:
        scalar:
          listen_port: 7432
          pool_size: 20
          ...
        vector:
          stats_users:
            - zabbix
          admin_users:
            - pgbouncer
            - postgres
          ...
      hba:
        client_groups: # ip-addresses are interpreted within the configuration management
          - address: [service1_clients]
            database: all
            user: all
            method: scram
            type: host
          ...
        client_hosts: # ip-addresses are listed verbatim
          - address: [198.168.100.1]
            database: all
            user: all
            method: scram
            type: host
      userlist:
        - user: zabbix
          pgb_only: true # user only connects to pgbouncer database
        - user: pgbouncer
          pgb_only: true
        - user: postgres
          pgb_only: true
        - user: service1_user
          hash: scram...
          acl: # postgres authorization configuration could go here
        - user: human1_user
          hash: scram...
    applovin_teamA:
      ...
Summary
With the capable authentication facilities of PgBouncer, we managed to strip down the PostgreSQL HBA to a bare minimum and shoved most of the details into multiple independent PgBouncer instances. At a glance, the scope of the connections the datateam allows is apparent, and we can direct our auditors to the different owners of the PgBouncer instances for more context on which team connections are allowed and for which reasons.
The user-authentication process no longer requires oversight or manual input from the datateam. Instead, autonomous teams, within their own confines, can deploy their own users and password-hashes and tune other connection settings (they have better contextual information about their services); a local cronjob then checks the deployed PgBouncer auth_files and ensures PostgreSQL logins are in sync with them.
This setup works even for teams with isolated configuration-management repositories; the only requirement is to successfully deploy a PgBouncer instance to the database server.
Teams can also deploy haproxies on their application servers to check the health of their PgBouncer connection endpoint; the health of PostgreSQL is checked implicitly.
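For reference, a minimal sketch of such a check (hypothetical backend and server names), using haproxy's pgsql-check option:
backend pgbouncer_team1
    option pgsql-check user haproxy
    server db1 hostname1.adjust.com:7432 check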
The current setup doesn’t address authorization on PostgreSQL objects; perhaps we can talk about this sometime in the future.