We recently decided to factor out PostgreSQL authentication into dedicated PgBouncers for different teams, and no longer allow direct remote access to the database cluster.
Apart from its main use as a connection pooler, we’ve realized that the isolation afforded by a PgBouncer layer simplifies the management of authentication and connection settings of the different client teams, and frees PostgreSQL of these mundane duties.
In the highly distributed and multi-tenant environment that our databases participate in, even seemingly minor issues easily bloat into an unsustainable mess. We got to the stage where we realized that we needed some formal overarching patterns across all our database servers.
Visibility within HBA Rules
Since the application servers of teams do not necessarily have contiguous IP addresses, the IP masks in `pg_hba.conf` were made to cover very wide subnets. Apart from the potential security issues, this meant that the HBA rules no longer told us where our clients were expected to come from.
We could list the individual IPs and use comments to separate the servers of different teams, but then the HBA file would become huge and unmanageable, since HBA rules are evaluated top-down with first-match-wins semantics: a `reject` rule on line 1 could shadow a different rule on line 500.
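To make the first-match pitfall concrete, here is a toy matcher in Python (not PgBouncer's or PostgreSQL's actual code) that mimics the top-down, first-match-wins evaluation of HBA rules:

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network

@dataclass
class Rule:
    database: str
    user: str
    cidr: str
    method: str  # e.g. "md5", "scram-sha-256", "trust", "reject"

def match_method(rules, database, user, addr):
    """Return the method of the FIRST matching rule, like HBA evaluation."""
    for rule in rules:
        if rule.database not in ("all", database):
            continue
        if rule.user not in ("all", user):
            continue
        if ip_address(addr) not in ip_network(rule.cidr):
            continue
        return rule.method  # first match wins; later rules are never consulted
    return None  # no rule matched: the connection is refused

rules = [
    Rule("all", "all", "10.0.0.0/8", "reject"),          # "line 1"
    Rule("all", "service1_user", "10.1.2.3/32", "md5"),  # never reached for 10.x clients
]
```

Even though the second rule looks like an exact match for `service1_user`, the wide `reject` above it wins, which is exactly why a long, comment-separated HBA file is hard to reason about.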
Without a scalable formalism, a central datateam could not manage all HBAs of all teams, for all our PostgreSQL clusters.
In addition to HBA rules allowing connections from very wide subnets, we generally didn't have an ideal security setup with regard to dedicated users per use-case, appropriate authentication methods, etc. Going forward, we wanted to enforce the following constraints:
- Avoid granting client teams SSH access to database servers merely for managing authentication.
- Local connections should use the peer authentication method over Unix sockets.
- Remote connections should come from specific users and IP addresses, and preferably be password-authenticated.
- Remote connections which cannot present a password (e.g. haproxy’s pgsql-check option) must have very limited privileges. This means that PostgreSQL foreign-data-wrapper superusers (before PostgreSQL 13, passwordless connections are only allowed for superusers) must first be switched to non-superusers before connecting to other remote database servers.
Most connection settings can be made at the PostgreSQL user or database level, but that’s not the case for connection limits: PostgreSQL can cap connections per role or per database, but not per role-and-database combination. We, however, want to be able to allot different connection limits to different users, and possibly even per database, so that no single service user can exhaust all available connections and thereby DoS all the other service users.
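PgBouncer, by contrast, exposes such limits directly. A sketch of the relevant knobs (values illustrative):

```ini
[databases]
; per-database pool cap on the server side
service1_db = host=/run/postgresql port=5432 pool_size=10

[pgbouncer]
max_client_conn = 500
default_pool_size = 20
; cap on server connections any single user may hold
max_user_connections = 30
; cap on server connections per database
max_db_connections = 60
```

With one PgBouncer instance per team, these caps also become per-team caps without touching PostgreSQL.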
Given its clear conceptual separation, it’s desirable to pull out authentication and connection settings from PostgreSQL into a separate dedicated layer. Furthermore, in order to delineate ownership and scope, it is desirable to group assets of different teams into different files and instances. Such isolation helps scope deployment, monitoring and troubleshooting. Eventually, we wanted a paradigm where we could present an authentication and connection configuration interface which could be deployed independently, preferably even by the teams themselves, without the chance of breaking another team’s instance, and without touching PostgreSQL itself.
Hence we decided that, on each database server, every team should have a dedicated PgBouncer instance to manage their authentication and connections.
PgBouncer Authentication Layer
Since most teams are already using PgBouncer for connection pooling, it’s not such a surprise that we pounced on it as our authentication layer. As to the specific setup, there is not much wiggle room given our motivations.
NB: it is highly recommended to use different users for different use-cases, e.g. for foreign-data-wrappers (FDW), for migrations, for applications, etc.; this is so that `search_path` and other contextual settings within transaction pools remain consistent.
NB: with a PgBouncer layer atop PostgreSQL, teams which use an additional application-side connection pooler could hijack database connections from some client servers, while other client servers are deprived of a PgBouncer pool slot. This materializes as stale PostgreSQL connections which are `idle` and haven’t seen a `pg_stat_activity.state_change` for days, and eventually as PgBouncer’s `client_wait` queue spilling over. The remedy is to `pg_terminate_backend()` such stale connections and to figure out suitable timeout settings on the application connection pooler.
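A hedged example of such a cleanup against `pg_stat_activity` (the two-day threshold is illustrative):

```sql
-- inspect connections idle for more than 2 days
SELECT pid, usename, application_name, state_change
FROM pg_stat_activity
WHERE state = 'idle'
  AND state_change < now() - interval '2 days';

-- terminate them (same predicate)
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE state = 'idle'
  AND state_change < now() - interval '2 days';
```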
Below are extracts from the different PgBouncer configuration files.
```ini
[databases]
; leave an interface for haproxy health-check with pgsql-check option
; we'll install a user_lookup function in the database=postgres
; and exclude user=haproxy from the auth_file
; hence both pgbouncer and postgresql are included in the haproxy health check
haproxy = host=/run/postgresql/ port=5432 dbname=postgres user=haproxy pool_size=10

; incoming database connections
service1_db = host=/run/postgresql port=5432
...

; outgoing FDW connections to pgbouncers on remote servers
haproxy_remote1 = host=remote1.adjust.com port=8432 dbname=haproxy user=haproxy pool_size=10
remote1_db = host=remote1.adjust.com port=8432 user=$dedicated_non_superuser1
...

[pgbouncer]
logfile = /var/log/pgbouncer/pgbouncer_<%= $instance_name %>.log
pidfile = /run/pgbouncer/pgbouncer_<%= $instance_name %>.pid

; NB: pgbouncer crashes in certain cases when auth_user=pgbouncer
; https://github.com/pgbouncer/pgbouncer/issues/568#issuecomment-840101192
auth_user = pgbouncer_auth
auth_query = SELECT * FROM public.user_lookup($1)
auth_type = hba
auth_file = /etc/pgbouncer/userlist_<%= $instance_name %>.txt
auth_hba_file = /etc/pgbouncer/hba_<%= $instance_name %>.txt

admin_users = pgbouncer, postgres, etc
stats_users = zabbix, etc

; we reserved the port 6432
;listen_port = 7432
...
```
```ini
; NB: haproxy user MUST NOT be included here, so that health lookups to PG are made
; other users MUST be listed here, otherwise the user_lookup function has to be created in their respective databases
"zabbix" ""
"postgres" ""
"pgbouncer" ""
"pgbouncer_auth" ""
"etc" ""
"service1_user" "scram..."
"human1_user" "scram..."
...
```
```
# TYPE  DATABASE   USER            ADDRESS           METHOD
local   haproxy    all                               peer
local   pgbouncer  all                               peer
# add rules for FDW superusers to connect to a local pgbouncer for user switching
local   all        postgres                          peer

# allow connections from each client-server
# allow connection for auth_user for user_lookup of remote haproxy
# it might be necessary to add rules to Trust remote FDW connections
host    haproxy    haproxy         $app_server_ip1   trust   # hostname1.adjust.com
host    haproxy    pgbouncer_auth  $app_server_ip1   trust   # hostname1.adjust.com
host    all        all             $app_server_ip1   md5     # hostname1.adjust.com
host    haproxy    haproxy         $app_server_ip2   trust   # hostname2.adjust.com
host    haproxy    pgbouncer_auth  $app_server_ip2   trust   # hostname2.adjust.com
host    all        all             $app_server_ip2   md5     # hostname2.adjust.com
...
```
PostgreSQL Authentication Layer
The `pg_ident.conf` has to be adapted to allow the pgbouncer OS user to proxy peer connections for the `haproxy` and `pgbouncer_auth` database users:

```
# MAPNAME  SYSTEM-USERNAME  PG-USERNAME
ospg       pgbouncer        haproxy
ospg       pgbouncer        pgbouncer_auth
```
```
# TYPE  DATABASE     USER                 ADDRESS           METHOD
# replication and other special connections
host    replication  postgres             198.168.100.1/32  trust

# proxy peer-auth; pg_ident mappings are required
local   postgres     haproxy                                peer map=ospg
local   postgres     pgbouncer_auth                         peer map=ospg

# pgbouncer instance groups
# the human and service users and hashes could be synced into PostgreSQL from each pgbouncer instance's auth_file
# each such user should be granted the respective Role of the pgbouncer instance
# e.g. GRANT pgbouncer_instance_name1 TO service1_user;
local   all          +pgb_instance_name1                    md5
local   all          +pgb_instance_name2                    md5
...
local   all          +pgb_instance_nameN                    md5

# default peer-auth
local   all          all                                    peer
```
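The `+role` entries rely on group membership. The granting pattern sketched in the comments above can be spelled out as follows (names mirror the extract and are illustrative):

```sql
-- one NOLOGIN group role per pgbouncer instance
CREATE ROLE pgb_instance_name1 NOLOGIN;

-- service users synced from the instance's auth_file
CREATE ROLE service1_user LOGIN PASSWORD 'SCRAM-SHA-256$...';

-- membership makes service1_user match the "+pgb_instance_name1" HBA lines
GRANT pgb_instance_name1 TO service1_user;
```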
The lookup function
The lookup function is required only in the `postgres` database; it is used solely for haproxy health checks.
```sql
-- the auth_user has to be granted execute on the user_lookup function
-- NB: pgbouncer crashes in certain cases when the function returns Record and the user doesn't exist => return SETOF record
-- https://github.com/pgbouncer/pgbouncer/issues/568#issuecomment-1198500665
CREATE OR REPLACE FUNCTION public.user_lookup(i_username text, OUT uname text, OUT phash text)
 RETURNS SETOF record
 LANGUAGE sql
 SECURITY DEFINER
 SET search_path TO 'public', 'pg_temp'
AS $function$
    SELECT usename::text, passwd
    FROM pg_catalog.pg_shadow
    WHERE usename = i_username;
$function$;

REVOKE ALL ON FUNCTION public.user_lookup(text) FROM public;
```
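After the blanket `REVOKE`, the `auth_user` from the PgBouncer config still needs execute rights on the function for `auth_query` to work; a minimal grant, assuming the `pgbouncer_auth` user from the extracts above:

```sql
-- pgbouncer_auth is the auth_user configured in pgbouncer.ini
GRANT EXECUTE ON FUNCTION public.user_lookup(text) TO pgbouncer_auth;
```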
Putting it all together
We can capture all the above configs in the following yaml interface:
```yaml
pgbouncer:
  instances:
    adjust_teamA:
      databases:
        haproxy:
          host: /run/postgresql/
          port: 5432
          dbname: postgres
          user: haproxy
          pool_size: 10
        service1_db:
          host: /run/postgresql
          port: 5432
        haproxy_remote1:
          host: remote1.adjust.com
          port: 8432
          dbname: haproxy
          user: haproxy
          pool_size: 10
          remote: true
        remote1_db:
          host: remote1.adjust.com
          port: 8432
          user: $dedicated_non_superuser1
          remote: true
      config:
        scalar:
          listen_port: 7432
          pool_size: 20
          ...
        vector:
          stats_users:
            - zabbix
          admin_users:
            - pgbouncer
            - postgres
          ...
      hba:
        client_groups:
          # ip-addresses are interpreted within the configuration management
          - address: [service1_clients]
            database: all
            user: all
            method: scram
            type: host
          ...
        client_hosts:
          # ip-addresses are listed verbatim
          - address: [198.168.100.1]
            database: all
            user: all
            method: scram
            type: host
      userlist:
        - user: zabbix
          pgb_only: true  # user only connects to pgbouncer database
        - user: pgbouncer
          pgb_only: true
        - user: postgres
          pgb_only: true
        - user: service1_user
          hash: scram...
          acl:  # postgres authorization configuration could go here
        - user: human1_user
          hash: scram...
    applovin_teamA:
      ...
```
With the capable authentication facilities of PgBouncer, we managed to strip the PostgreSQL HBA down to a bare minimum and move most of the details into multiple independent PgBouncer instances. At a glance, the scope of the connections the datateam allows is apparent, and we can direct our auditors to the owners of the different PgBouncer instances for more context on which team connections are allowed and for which reasons.
The user-authentication process no longer requires oversight or manual input from the datateam. Instead, autonomous teams can, within their own confines, deploy their own users and password-hashes and tune other connection settings (they have better contextual info about their services); a local cronjob then checks the deployed PgBouncer `auth_file`s and ensures PostgreSQL logins are in sync with them.
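The core of such a sync job can be sketched as follows; this is a minimal illustration rather than our actual tooling, assuming the standard `userlist.txt` format and leaving out role creation and proper SQL quoting:

```python
import re

# a userlist.txt line looks like: "username" "password-hash"
AUTH_LINE = re.compile(r'^"([^"]+)"\s+"([^"]*)"')

def parse_auth_file(text):
    """Parse pgbouncer userlist.txt content into a {user: hash} dict."""
    users = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(";"):
            continue  # skip blanks and comments
        m = AUTH_LINE.match(line)
        if m:
            users[m.group(1)] = m.group(2)
    return users

def sync_statements(users):
    """Emit ALTER ROLE statements for users that carry a hash;
    pgbouncer-only users have empty passwords and are skipped."""
    return [
        f"ALTER ROLE \"{user}\" PASSWORD '{phash}';"
        for user, phash in sorted(users.items())
        if phash
    ]
```

The emitted statements would then be applied through a superuser connection over the local socket, keeping PostgreSQL logins in lockstep with each instance's `auth_file`.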
This setup works even for teams with isolated configuration-management repositories; the only requirement is to successfully deploy a PgBouncer instance to the database server.
Teams can also deploy haproxies on their application servers to check the health of their PgBouncer connection endpoint; the health of PostgreSQL is checked implicitly.
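A corresponding haproxy extract might look like this; backend and server names are illustrative, and the port matches the instance-specific PgBouncer `listen_port` from the config extracts:

```
# haproxy.cfg extract (names and addresses illustrative)
backend pg_service1
    option pgsql-check user haproxy
    server db1 db1.adjust.com:7432 check
    server db2 db2.adjust.com:7432 check backup
```

Since the `haproxy` database entry in PgBouncer points at the local PostgreSQL socket and the `user_lookup` function lives in PostgreSQL, a passing check exercises both layers at once.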
The current setup doesn’t address authorization on PostgreSQL objects; perhaps we can talk about this sometime in the future.