Postgres for an Oracle DBA

Postgres for an Oracle DBA

An Introduction to Postgres for Oracle DBAs

David Bergmeier, Senior Database Consultant MGA

Introduction

This article seeks to introduce Postgres to the Oracle DBA. I will attempt to describe Postgres features using Oracle concepts and terminology. We will only consider the main community edition of Postgres – the free, open source one.

For a much closer Oracle experience with Postgres, have a look at Enterprise DB. Their version of Postgres has an Oracle compatibility layer that provides many features so that Oracle data and code (packages, procedures, triggers, etc.) can be easily converted. They also provide paid support options (for both EDB and the free Postgres) so arguments for not moving to Postgres due to “lack of support” are no longer valid.

Linux Operating System

Oracle only supports a limited number of Linux distributions, and in my experience, RedHat is the most commonly used. I will make a grand assumption that most Oracle DBAs working on Linux will be familiar with RedHat (RHEL-6 or RHEL-7). I will tend to favour RHEL-6. So if you’re an Oracle DBA looking to learn some Postgres, do yourself a favour and learn on a distro you’re already familiar with.

Postgres supports a much wider range of Linux distributions so it’s not unusual to see Ubuntu, Debian or FreeBSD. This article uses examples from Centos 6 to keep things familiar.

If you’re just starting out, I suggest you grab a copy of Oracle VirtualBox and setup a Centos 6 or 7 virtual machine. Postgres can run adequately (for learning) on a VM with as little as 1GB of RAM.

Software owner

Oracle is usually installed under a Linux account called “oracle” (ignoring databases running SAP systems) with primary group “oinstall”.

Postgres is usually installed under a Linux account called “postgres” with a primary group also called “postgres”. Bear in mind, the startup scripts assume this owner/group and don’t change the primary group to something else, like “dba”.

Installation directory

There’s a long tradition of installing Oracle into /u01/app/oracle/product//

Postgres traditionally goes into /var/lib/postgres
However, most RedHat (or similar) implementations I’ve seen generally use a site standard directory like /data or /app instead.

Home Directory

The Oracle installation procedure requires the “oracle” account to be created and the software is installed as “oracle”. Most sites will end up with the oracle account having a default home directory of /home/oracle

Postgres is often installed as root with a package manager (e.g. yum) which creates the postgres Linux account. This means the home directory is frequently /var/lib/postgres also.

As an Oracle DBA, I find this very annoying since I typically have a bunch of bash scripts, a crontab, and a bunch of SQL scripts to make life easy. I don’t like having these in the same directory as the database home. So creating the postgres user prior to installing the software means I can have a more familiar home directory /home/postgres which is different from the database location.

Incidentally, the $ORACLE_HOME equivalent is $PGDATA.

Multiple Homes

Oracle naturally supports multiple homes in the default directory structure. Postgres requires some forethought and planning and will be covered in a later article, but it is possible and quite easy with RedHat.

Configuration Files

Postgres has two main configuration files:
1. $PGDATA/postgresql.conf
This is the init.ora equivalent in text format.

2. $PGDATA/pg_hba.conf
There isn’t really an Oracle equivalent of this. This is the client authentication configuration file that determines who can access the database and how they authenticate. The default file limits connections to only those originating from localhost.

Note: some Linux distros put these files into /etc but RedHat distros are more like Oracle where the configs are with other database files rather than buried in the O/S config directories.

Startup Script

A standard Postgres startup script is automatically placed in /etc/init.d (Centos 6). There is strong temptation to modify this file to customize settings (like PGDATA or PGPORT) for a specific installation. This is a mistake. The proper solution is to create a file in /etc/sysconfig/pgsql/ with the same name as the startup script with the correct settings. This will be automatically sourced by the startup script and will override the default settings.

Terminology

Finally, we come to the terminology that describes the database. Postgres is a mix of both Oracle and Microsoft SQL Server.

The “postmaster” is equivalent to the main Oracle executable – in Linux it’s just called “oracle” but think of it as “oracle.exe”.

There’s no RAC equivalent in Postgres, so a “Postgres cluster” is just a single instance or a cluster of databases rather than a cluster of instances.

Inside a cluster, we have three or more databases (think SQL Server). There’s the default “postgres” database and two template databases, plus any databases created by the DBA.

Each database has one or more schemas (same as Oracle) but unlike Oracle, schemas have a separate namespace from users, so it’s possible to have a schema called “accounts” without the corresponding user of the same name. The default schema is called “public”.

Users in Postgres are implemented as roles with a login attribute. Normal roles (same as Oracle roles) are just the collection of grants but with no-login. This is probably why Postgres has not implemented any password expiry functionality. However, to lock a user account it’s simply a matter of changing the role from “login” to “nologin”. Users exist at the Postgres cluster level, but access may be restricted to only some databases.

Views are pretty standard in Postgres.

Stored code (packages, procedures and functions) are only implemented as functions in Postgres. A procedure is merely a function that returns zero or null.

Background Processes

An Oracle instance runs several background processes, each doing a specific task (e.g. SMON, PMON, DBWR, LOGWR, ARCH, RECO, etc.) Plus there’s the listener and a process per session (assuming no MTS – which I haven’t seen used in years).

Postgres also runs several background processes in much the same way. Here are the processes from my version 10 primary database:

• postmaster
• logger process
• checkpointer process
• writer process
• wal writer process
• autovacuum launcher process
• archiver process
• stats collector process
• bgworker: logical replication launcher

Postgres also launches a separate process per user session.

Transactions

Redo logs are called Write Ahead Logs or just WAL.

There are the online WAL files that are a uniform size but managed by Postgres – dynamically adding and removing files as required but within the settings of parameters min_wal_size and max_wal_size.

Then there are the archived WAL files. The big difference is that Oracle allow you simply specify a location for the archived logs. With Postgres, you need to provide the command (generally “cp” with the required arguments) to put the WAL copy into the desired location. Oracle provide all the code to do this plus the handling for any error conditions (like directory does not exist, or filesystem full, etc.). With Postgres you need to worry about all that yourself. Oracle generally tells you what the problem is, with Postgres, you must find it out for yourself.

Memory

Memory parameter setup is quite different between Oracle and Postgres. In Oracle (ignoring ASM) you typically set FILESYSTEMIO_OPTIONS to “setall” which enables both direct and asynchronous IO where possible. The buffer cache is generally sized quite high, sometimes as high as 60-80% of the available RAM.

Postgres is different and relies on the operating system IO buffers. So the general advice is to set the buffer cache (parameter shared_buffers) to 25% of the available RAM.

Storage

A basic (simplistic) storage hierarchy in Oracle starts with a table (non-partitioned). The table belongs to one tablespace. Between the table and the tablespace are many datafiles which can be spread across multiple filesystems. So, spreading a single table across many filesystems without partitioning is nothing special in Oracle.
Postgres is different. There is no equivalent concept of Oracle datafiles and a tablespace is implemented as a filesystem. Hence, there is no way to spread a single table across multiple filesystems. This means your largest table must be smaller than your largest filesystem. If that table grows, then you may need to grow the underlying filesystem.

Thankfully in Postgres version 10, partitioning finally got a decent implementation so the lack of datafiles is no longer the problem it used to be.

UNDO

Consistent reads and readers not blocking writers has long been one of the distinguishing features of Oracle. It is implemented by keeping prior versions of updated rows for in-flight SELECTs on a special “undo” tablespace.

Postgres undo data is stored within the relevant object. This is likely a much easier implementation but leads to table and index bloat. The solution is that a background activity named “vacuum” that removes the undo versions and releases the space for reuse. This is a necessary task and turning it off will result (eventually) in a hung database that won’t do anything until the vacuum has run.

Replication

Another recently new feature in Postgres is streaming replication. This is equivalent to Active Data Guard. The WAL records are streamed asynchronously (or synchronously if required) from primary and applied to a standby replica. The standby can be open for read only transactions.

Logical replication is also supported in Postgres. The WAL records are converted into standard SQL and applied to a read/write replica.

Versions

The default RedHat yum repository for many years only had Postgres version 8.4. If you wanted anything newer, you had to find a different yum repo or install from source. Version 8.4 was first released in July 2009. It is over 10 years old and no longer supported. It should be avoided at all costs.

There is enough new functionality and stability in versions 11 and 10 to recommend their use for production database environments. Of special interest are the partitioning and parallel query enhancements.

Other Postgres Tools

One of the benefits of open source software is that anyone can contribute. Whether that’s a direct contribution to the product, documentation, blogs or related products to enhance Postgres.

Postgres alone is arguably not quite “enterprisey” yet. However, there are a few related products released by the community that enhance Postgres and give it that extra boost in support, monitoring and functionality. Here are a few that I personally have found helpful.

• pgbadger
This is a utility that reads the Postgres alert logs and produces a html report. It’s possibly the closest thing to AWR for Postgres. It does require some of the logging parameters to be set a certain way so that the reports are meaningful.

• Repmgr
If you want to use Postgres streaming replication (think active Data Guard), I strongly recommend setting it up manually and trying some failovers. This will help you understand what it is and how it works. Then install repmgr and see how easy it can be managed. This tool is worth it just for the simple switchover functionality (swapping primary and standby roles).

• Barman
Also by the makers of repmgr – this tool is a simple backup and recovery manager. It works best when setup on a different server from the one you are backing up as it uses streaming replication to take the backup and backup the archive logs (archive WAL files). This allows it to achieve near-zero or zero data loss.

• check_postgres.pl
Vanilla Postgres does not have an OEM equivalent. I guess the attitude is that monitoring is a different problem for which there are numerous open source solutions already (Nagios, MRTG, Cacti, etc.) This perl script bundles a whole stack of Postgres checks into a simple script that easily plugs into these monitoring tools. It can even be used standalone with some simple custom-built bash wrapper scripts.

• pgpool
This is one of the more complex products to setup. It basically provides connection pooling for a streaming replication environment. All database connections are made to pgpool where writes get sent to the primary and reads may go to either the primary or standby. It provides automatic failover if the primary goes down unexpectedly and will promote a standby to be a new primary.
The pgpool servers are also clustered, so if the master pgpool server goes down, the slave can take over. It uses a floating IP address so client connections will only experience a short outage, typically less than 30 seconds.