PostgreSQL Installation and Custom Data Directory on Amazon Linux 2:





Overview

In this article we will install PostgreSQL on an AWS EC2 instance with the Amazon Linux 2 AMI. Furthermore, we will be moving the PostgreSQL data directory from its default location to an attached Elastic Block Storage (EBS) Volume that is mounted at /data/. The EBS has already been mounted to the EC2 instance

Installing PostgreSQL

There are a lot of good resources out there that I followed to learn how to do this but none of them on their own describe the entire process (at least on the Amazon Linux 2 OS). In general I followed these three tutorials from Digital Ocean How To Move a PostgreSQL Data Directory to a New Location on Ubuntu 16.04, How To Use PostgreSQL with your Django Application on Ubuntu 14.04, How To Install and Use PostgreSQL on Ubuntu 16.04; this tutorial on basic PostgreSQL installation and functionality from linode that involves the Centos OS which is similar to the Amazon Linux 2 we are using, How to Install PostgreSQL Relational Databases on CentOS 7; and finally this blog post that addresses how to change the data directory location, Configure Customized Data Directory for Postgresql on Ubuntu 14.04. Additionally we are going to be using the systemd service manager and its command systemctl quite a bit here, its really not within the scope of what I'm trying to show here but a good overview is available from Digital Ocean and from the [Fedora Project][9]

The first step is to install the necessary PostgreSQL packages.

$ sudo yum install postgresql postgresql-contrib postgresql-libs postgresql-server

At the time of this writing, this command installs PostgreSQL version 9.2.23-3. Following the other tutorials we can initialize PostgreSQL and start the server, while leaving the data directory in its default location:

$ sudo postgresql-setup initdb
Initializing database ... OK

$ sudo systemctl start postgresql.service

When PostgreSQL is installed it automatically creates a linux user named postgres, you can confirm that the user was created by listing all of the Linux user names:

$ cut -d : -f 1 /etc/passwd
...<other users> ...
postgres

The documentation for this particular installation, i.e. on Amazon Linux 2, is available at /usr/share/doc/postgresql-9.2.23/README.rpm-dist

It creates this linux user because the users of the PostgreSQL database management system (called roles) are paired to a corresponding linux user with the same name and the default super-user for PostgreSQL is named 'postgres'.

We can login to our newly installed PostgreSQL system (via the postgresql linux user with the -u command) and check on the default location of our data directory

$ sudo -u postgres psql
psql (9.2.23)
Type "help" for help.

postgres=#
postgres=# SHOW data_directory;
data_directory    
---------------------
/var/lib/pgsql/data
(1 row)

To even access the data directory you will need to have super user permissions, run $ sudo su, to deactivate use 'ctrl-d'.

So for this installation the default data directory location is at /var/lib/pgsql/data, this is the location of the main configuration file too, /var/lib/pgsql/data/postgresql.conf'. If you open the file you will see that almost everything is commented out, including the variable that controls the location of the data directory, data_directory.

Moving the data directory

Our next goal is to move the PostgreSQL data directory from its current default location, which is on the root EBS, to the EBS mounted at /data/, specifically we will create a new directory within the volume /data/postgresql/. This is where we need to go off the beaten path provided by the other tutorials. According to them, stopping the PostgreSQL service, changing the config file data directory setting, copying the data directory, and restarting the service is sufficient. Here that won't work.

If you would like to you can always try it out and see for yourself. The worst thing that can happen in editing the PostgreSQL files is you mess up your installation so bad that you need to sudo yum remove postgresql* and then go back and reinstall. Just be careful in superuser mode not to edit/delete any other files.

Before we attempt to reset the data directory location we should stop the current PostgreSQL process that we started earlier and confirm that it is off.

$ sudo systemctl stop postgresql.service
$ sudo systemctl status postgresql.service
... Stopped PostgreSQL database server.

In our case we will need to do three things to change the data directory:

  1. Create and change the ownership of /data/postgresql from root to the postgres Linux user:

    Step one is the simplest, the ownership of /data and all subdirectories can be changed to the postgres user and its group (also named postgres) with:

      $ sudo mkdir /data/postgresql
      $ sudo chown -R postgres:postgres /data
  2. Initialize a new database location with the PostgreSQL initdb -D command:

    A command, initdb is provided by the PostgreSQL installation to initialize database locations, but we must execute the command as the postgres user. Using it we can initialize a data directory in our desired location.

      $ sudo -u postgres initdb -D /data/postgresql    
      The files belonging to this database system will be owned by user "postgres".
      This user must also own the server process.
      .
      .
      .
      WARNING: enabling "trust" authentication for local connections
      You can change this by editing pg_hba.conf or using the option -A, or
      --auth-local and --auth-host, the next time you run initdb.
      Success. You can now start the database server using:
          postgres -D /data/postgresql
      or
          pg_ctl -D /data/postgresql -l logfile start
    
      You can confirm that this sets up a data directory in `/data/postgresql/`
      by temporarily logging in as the postgres user and looking around  
    $ sudo -iu postgres
    -bash-4.2$ cd /data/postgresql

    Furthermore, the commands provided at the end of the initdb command output will work to start our server and use our database from it's new location.

    -bash-4.2$ postgres -D /data/postgresql &
    -bash-4.2$ psql
    # SHOW data_directory;
    data_directory  
    ------------------
    /data/postgresql
    (1 row)

    Make sure to stop the process before you logout the postgres user

    -bash-4.2$ fg
    -bash-4.2$ <ctrl-C>
    -bash-4.2$ <ctrl-D>
    $
  3. Update our system management files (i.e. the systemd unit files that the systemctl command reads) to run the PostgreSQL server daemon from the new data directory:

    So now we have the PostgreSQL server running and using our new directory. But we don't want to have to run this as a background process from the command line every time we need our database (i.e. all the time). Ideally we would run it as a linux service controlled by systemd with the systemctl command.

    If we try to restart it that way we run into troubles. If we use the same commands as before to login and look at the data_directory we will see that it hasn't moved:

    $ sudo systemctl start postgresql.service
    $ sudo -u postgres psql
    psql (9.2.23)
    Type "help" for help.
    
    postgres=# SHOW data_directory;
    data_directory    
    ---------------------
    /var/lib/pgsql/data
    (1 row)

    Be sure to stop the service with sudo systemctl stop postgresql.service.

    To fix this we need to update the unit file, postgres.service, with the new data directory location. That file is located at /usr/lib/systemd/system/postgresql.service and you will need to be logged in as a superuser to modify it.
    Once you open it you can see in the commented out header section that it suggests that we not modify this file directly (and a link to the Fedora Project page describing how to use systemctl). We will follow their instructions and make another unit file, with the same name but in /etc/systemd/system/.

    While we are looking at the original file we can see in the [Service] section the variable that we would like to change is PGDATA

    [Service]
    ...
    # Location of database directory
    Environment=PGDATA=/var/lib/pgsql/data

    So following the instructions provided we should create another unit file for PostgreSQL and then we can change the data directory.

    $ sudo su
    root$ touch /etc/systemd/system/postgresql.service
    Then open that file and copy these settings:
    # /etc/systemd/system/postgresql.service
    
    # The custom unit file for postgresql
    # I want to move the data directory
    # to '/data/postgresql'
    
    # Copied from original unit file ...
    # Note: changing PGPORT or PGDATA will typically require adjusting SELinux
    # configuration as well; see /usr/share/doc/postgresql-*/README.rpm-dist.
    
    # In our case SElinux is not enabled
    
    # Load the default unit file
    .include /lib/systemd/system/postgresql.service
    
    # Change data directory
    [Service]
    Environment=PGDATA=/data/postgresql
    
    # Note that this overwrites the setting in the
    # original postgresql unit file

    Now we can restart the service and make sure that the data_directory is located where we want it.

    $ sudo systemctl start postgresql.service
    $ sudo -u postgres psql
    psql (9.2.23)
    Type "help" for help.
    
    postgres=# SHOW data_directory;
      data_directory  
    ------------------
     /data/postgresql
    (1 row)

Now when we create a database and tables we will be saving the data to /data/postgresql just like we wanted. A really good walkthrough of configuring the PostgreSQL database with a Django project is provided in this Digital Ocean tutorial.