Extensions for PostgreSQL, how we participate at Adjust?

PostgreSQL is an open-source relational database management system. A large number of PostgreSQL functions are available as standard, and documentation for these can be found in the official documentation. However, it is possible to create your own extensions for specific needs. You can check PGXN, the PostgreSQL Extension network. This is a central distribution system for open-source PostgreSQL extensions. In this blog post I’m going to talk about the different PostgreSQL extensions developed by Adjust.

At Adjust, we use distinct extensions for different needs. In the list of extensions we have, there are public extensions that have been developed by us and shared to the PostgreSQL community. We also have private extensions, but these contain sensitive data that cannot be divulged, so they cannot be shared here.

We have about 30 extensions, 13 of which are public. Let me show you the public ones and explain what they are for.

elephants_postgres_extension Image from pixbay

Open Source Extensions

ajbool

The ajbool extension introduces a special boolean data type that allows for a distinct “unknown” value separate from NULL. With the states of “true” (t), “false” (f), and “unknown” (u). Using “ajbool” simplifies the manipulation of boolean values in databases while providing advanced features for complex environments.

SELECT true::ajbool, false::ajbool, NULL::bool::ajbool;
 ajbool | ajbool | ajbool
--------+--------+--------
 t      | f      | u

first_last_agg

The “first_last_agg” extension offers a simple yet powerful solution for retrieving the first and last values within a group. Since the default ordering of groups created by a GROUP BY expression is not defined in PostgreSQL, it is advisable to use an ODER BY clause inside the aggregate expression. This extension provides the necessary aggregate functions to obtain the desired results reliably. By incorporating an ORDER BY clause within the aggregate expression, users can ensure consistent and accurate retrieval of the first and last values within each group.

SELECT akey, first(val1 ORDER BY val2) AS first, last(val1 ORDER BY val2) AS last FROM agg_test GROUP BY akey ORDER BY akey;

Link to Extension
Link to PGXN Entry (Not published by adjust because already existe on PGXN)

hashtypes

This extension is actually a fork of shatypes which adds some other data types as crc32 and provides some fixes to original implementations.

istore

This extension lets you store and manipulate key-value pairs where both keys and values are integers. It provides operators and functions to manipulate this data efficiently. The main objective of the istore extension is to optimize analytical workloads that require element distributions represented as integers.

Suppose you have an event log table where you aggregate events with an identifier and segmentation by date. Using the istore extension would allow you to use the identifier as a key and the account or revenue as values.

The extension provides operators to perform specific operations on key-value pairs.

It also offers aggregation functions such assum, min and max to perform calculations on the values of a collection of istores.

In summary, the istore extension offers an efficient alternative for storing and manipulating integer key-value pairs, particularly suited to analytical workloads requiring fast calculation and aggregation operations.

pg-base62 and pg-base36

These extensions let you encode and decode values using the`base62/base36 encoding scheme. It provides specific data types for storing these encoded values.

The extension lets you convert values between base62/base36 and integer types (int or bigint). It also supports conversion from type bytea to type hugebase62 respective hugebase36.

Example

-- Cast from text
=# select '2LKcb1'::base62::int;
    int4    
------------
 2147483647

The pg-base62 extension enables values to be encoded and decoded using the base62 encoding scheme, providing indexing functionality for efficient use. base62 stores encoded value in 4 bytes integer. The maximum length of string representation is 6 characters and maximum value of numeric representation is 2147483647.

Blog post about base 62: https://helloacm.com/base62

pg-base62

pg-base36

pg-country

The pg-country extension introduces a new data type called country, which is implemented as an enum type.

The values of the country enumeration adhere to the ISO 3166-1 alpha-2 standard.

In addition to the data type, the extension includes comparison operators and operator classes tailored for btree and hash indexes. This enables efficient indexing and searching operations based on the country data type.

pg-currency

The pg-currency extension adds a currency data type for storing and manipulating ISO 4217 currency codes. It offers comparison and indexing operations for currencies. There are also comparison operators to support ordering and btree indexing.

List of supported currencies (around 163 supported)

SELECT * FROM supported_currencies();

pg-device_type

The pg-device_type extension introduces a new data type called device_type, which is implemented as an enum type.

The device_type enum includes the following possible values:

Bot
Console
Ipod
Mac
Pc
Phone
Server
Simulator
Tablet
Tv
Unknown

Furthermore, the extension provides comparison operators and operator classes optimized for btree and hash indexes. These operators facilitate efficient indexing and searching operations based on the device_type data type, enhancing the performance of queries involving device types. This extension is very useful if you work with different devices, and if the type of device you use has an influence on the data.

pg_lock_pool

First of all, it can be useful to set a lock on a pool, which can be used to manage concurrent access, connection control or performance optimization.

The pg-lock_pool extension, as its name suggests, is for waiting on a lock pool. We have functions for session get_lock_pool(pool int, poolsize int, timeout int) and transaction get_xact_lock_pool(pool int, poolsize int, timeout int). The pool size can be defined by the application for each call. Timeout can be defined in seconds (default is 10).

A return value is returned:

positive integer which can be used to manually release the lock
-1 If timeout occurs

Otherwise the lock is automatically released on session end or transaction end

Example with session:

-- wait 30 sec to get a lock from the pool of size 3
SELECT get_lock_pool(999, 3, 30);
 get_lock_pool
---------------
             1
(1 row)

-- release lock
SELECT pg_advisory_unlock(999, 1);
 pg_advisory_unlock
--------------------
 t
(1 row)

pg-mvtbl

As we saw in my previous article, Using Tablespaces in PostgreSQL, it’s not always easy to move tables between different tablespaces.That’s where the pg-mvtbl extension comes in.

This extension enables you to use the mvtbl(tablname, tablespace) function. This function moves a table with all its indexes to a given tablespace. It returns the size of the moved data in bytes.

Example:

SELECT pg_size_pretty(mvtbl('test','mvtbl_test_tblspace'));
 pg_size_pretty 
----------------
 123 MB
(1 row)

pg-telemetry

PostgreSQL is a big tool, and we sometimes need information on how it’s being used. So we have pg_telemetry, which is an extension for monitoring usage of PostgreSQL.

The purpose of this module is to offer a set of functions that extract usage statistics from system statistics and other system administration tasks. We have a collection of metrics functions that are widely applicable and can be reused for monitoring purposes. pg_telemetry is made so that data can be defined as export numbers to be used to supply data to SNMP agents, Zabbix agents and other monitoring programs. This module is designed to be compatible with monitoring and visualization programs like pganalyze, Zabbix, and Munin. These programs typically employ an agent that gathers performance data and transmits it to a server for generating graphics.

In addition to the general stats functions, this module also includes a separate set of functions specifically designed for administrators to troubleshoot and debug performance issues. However these are to be used as needed while the general stats functions are assumed to be run every few min at least.

During the initial phase of this project, there are several areas of focus:

Disk usage
Sources of concurrent queries
WAL throughput
Replication Monitoring

Requirement:

wltree

This is an enhanced version of the Postgres ltree module. The main difference in this patched version is the usage of “::” instead of “.” as the label separator. Additionally, it allows for the inclusion of special characters such as “{”, “}”, “!”, and “*”, within ltree labels. To query and match these labels in ltree queries, you can escape these special characters accordingly.

Summary

We can see that the Adjust IT team is doing its best to participate in the PostgreSQL project. Whenever possible, extensions are made public to make them accessible to other PostgreSQL users. By making select extensions publicly accessible, Adjust is fostering a culture of shared knowledge. We also have a certain amount of confidential information, which explains the number of extensions that must remain private. I strongly encourage you to either utilize these Adjust’s extensions or create your own, and then share them with others to foster collaboration and innovation. As I say in the introduction, the best place to do that for Postgres is PGXN. Following this link you can have the list of our public extensions.

Open Source Extensions#

ajbool#

first_last_agg#

hashtypes#

istore#

pg-base62 and pg-base36#

pg-country#

pg-currency#

pg-device_type#

pg_lock_pool#

pg-mvtbl#

pg-telemetry#

wltree#

Summary#