Jim Wu's FPGA Blog: 2016

Wednesday, September 7, 2016

Use PGStrom with PostgreSQL on Ubuntu 14.04 LTS

Software Installation

Install useful utilities
sudo apt-get install wget git rpm

Install Cuda

Download Cuda local install package from https://developer.nvidia.com/cuda-downloads

Run the commands below to install Cuda and reboot

$sudo dpkg -i cuda-repo-ubuntu1404-7-5-local_7.5-18_amd64.deb
$sudo apt-get update
$sudo apt-get install cuda

Create/Edit /etc/ld.so.conf.d/cuda-lib64.conf with Cuda library path
```
$echo /usr/local/cuda/lib64 > /etc/ld.so.conf.d/cuda-lib64.conf
```
Build the deviceQuery sample from Cuda installation and verify that the Cuda driver works

$cp -r /usr/local/cuda/samples .
$cd samples/1_Utilities/deviceQuery 
$make
$./deviceQuery 
./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "Quadro K1000M"
  CUDA Driver Version / Runtime Version          7.5 / 7.5
  CUDA Capability Major/Minor version number:    3.0
  Total amount of global memory:                 2047 MBytes (2146762752 bytes)
  ( 1) Multiprocessors, (192) CUDA Cores/MP:     192 CUDA Cores
  GPU Max Clock rate:                            851 MHz (0.85 GHz)
  Memory Clock rate:                             900 Mhz
  Memory Bus Width:                              128-bit
  L2 Cache Size:                                 262144 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
  Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 1 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 7.5, CUDA Runtime Version = 7.5, NumDevs = 1, Device0 = Quadro K1000M
Result = PASS

Install PostgreSQL

Import PostgreSQL repository key

$wget -O - http://apt.postgresql.org/pub/repos/apt/ACCC4CF8.asc | sudo apt-key add -

Create or edit /etc/apt/sources.list.d/pgdg.list with the line below
```
deb http://apt.postgresql.org/pub/repos/apt/ trusty-pgdg main
```

Configure apt's package pinning to prefer the PGDG packages over the Debian ones in /etc/apt/preferences.d/pgdg.pref. this will replace all your Debian/Ubuntu packages with available packages from the PGDG repository.
```
Package: *
Pin: release o=apt.postgresql.org
Pin-Priority: 500
```
Update the package lists, and install the pgdg-keyring package to automatically get repository key updates
```
$sudo apt-get update
$sudo apt-get install pgdg-keyring
```

Install PostgreSQL 9.5

$sudo apt-get install -y postgresql-common
$sudo apt-get install -y postgresql-9.5 postgresql-contrib-9.5 postgresql-server-dev-9.5 libpq-dev

The PostgreSQL server will automatically be started after installation. Run the commands below to manually start or restart the server if needed.
```
$sudo service postgresql restart
$sudo service postgresql start
```
Run the command below to start psql terminal and connect to the server
```
$sudo -u postgre psql
```

Install PGStrom [PGStrom Wiki]

Download and install PGStrom

$git clone https://github.com/pg-strom/devel pg_strom
$cd pg_strom
$which pg_config
/usr/local/pgsql/bin/pg_config
$make
$sudo make install

Edit /etc/postgresql/9.5/main/postgresql.conf to update configuration for PGStrom
- Add /usr/lib/postgresql/9.5/lib/pg_strom.so to shared_preload_libraries
Restart PostgreSQL service
```
$sudo service postgresql restart
```

PostgreSQL log file /var/log/postgresql/postgresql-9.5-main.log shows PGStrom is loaded with supported GPU device

$ cat /var/log/postgresql/postgresql-9.5-main.log 
LOG:  PG-Strom version 1.0devel built for PostgreSQL 9.5
LOG:  CUDA Runtime version: 7.5.0
LOG:  NVIDIA driver version: 352.63
LOG:  GPU0 Quadro K1000M (192 CUDA cores, 850MHz), L2 256KB, RAM 2047MB (128bits, 900MHz), capability 3.0
LOG:  NVRTC - CUDA Runtime Compilation vertion 7.5

Test PGStrom Integration with PostgreSQL

Start psql terminal
```
sudo -u postgres psql
```

Create pg_strom extension

postgres=#CREATE EXTENSION pg_strom;
CREATE EXTENSION

Create a test DB and connect to it

postgres=# create database testdb;
CREATE DATABASE
postgres=# \c testdb
You are now connected to database "testdb" as user "postgres".

Create test tables from SQL script in PGStrom installation. It creates one 100M rows table and 9 of 100K rows table.
```
postgres=# \i pgstrom/test/testdb.sql
```

Check pg_strom parameters used by postgresql:

postgres=# show all;
pg_strom.bulkexec                   | on                                       | Enables the bulk-execution mode of PG-Strom
pg_strom.chunk_limit                | 79360kB                                  | limit size of pgstrom_data_store
pg_strom.chunk_size                 | 15872kB                                  | default size of pgstrom_data_store
pg_strom.chunk_size_margin          | 1.25                                     | margin of chunk size if not predictable exactly
pg_strom.cpu_fallback               | on                                       | Enables CPU fallback if GPU is 
pg_strom.cuda_visible_devices       |                                          | CUDA_VISIBLE_DEVICES of CUDA runtime
pg_strom.debug_cuda_coredump        | off                                      | Turn on/off GPU coredump feature
pg_strom.debug_force_gpupreagg      | off                                      | Force GpuPreAgg regardless of the cost (debug)
pg_strom.debug_force_gpusort        | off                                      | Force GpuSort regardless of the cost (debug)
pg_strom.debug_kernel_source        | off                                      | Turn on/off to display the kernel source path
pg_strom.enable_gpuhashjoin         | on                                       | Enables the use of GpuHashJoin logic
pg_strom.enable_gpunestloop         | on                                       | Enables the use of GpuNestLoop logic
pg_strom.enable_gpupreagg           | off                                      | Enables the use of GPU preprocessed aggregate
pg_strom.enable_gpuscan             | on                                       | Enables the use of GPU accelerated full-scan
pg_strom.enable_gpusort             | off                                      | Enables the use of GPU accelerated sorting
pg_strom.enabled                    | off                                      | Enables the planner's use of PG-Strom
pg_strom.gpu_dma_cost               | 10                                       | Cost to send/recv data via DMA
pg_strom.gpu_operator_cost          | 7.8125e-05                               | Cost of processing each operators by GPU
pg_strom.gpu_setup_cost             | 4000                                     | Cost to setup GPU device to run
pg_strom.gpu_tuple_cost             | 0.0003125                                | Cost of processing each tuple for GPU
pg_strom.max_async_tasks            | 32                                       | max number of GPU tasks to be run asynchronously
pg_strom.num_threads_margin         | 1.1                                      | margin of number of CUDA threads if not predictable exactly
pg_strom.perfmon                    | off                                      | Enables the performance monitor of PG-Strom
pg_strom.program_cache_size         | 48MB                                     | size of shared program cache
pg_strom.pullup_outer_scan          | on                                       | Enables to pull up simple outer scan

Review the execution plan of a query and confirm GPU offload is enabled (GpuPreAgg, GpuJoin, etc)

testdb=# EXPLAIN                 
SELECT cat, avg(ax) FROM t0 NATURAL JOIN t1 NATURAL JOIN t2 GROUP BY cat;
                                         QUERY PLAN                                          
---------------------------------------------------------------------------------------------
 HashAggregate  (cost=3349749.08..3349749.40 rows=26 width=12)
   Group Key: t0.cat
   ->  Custom Scan (GpuPreAgg)  (cost=16325.86..2879097.75 rows=234 width=44)
         Reduction: Local + Global
         GPU Projection: cat, ax
         ->  Custom Scan (GpuJoin) on t0  (cost=12325.86..2851751.79 rows=99599458 width=12)
               GPU Projection: t0.cat, t1.ax
               Depth 1: GpuHashJoin, HashKeys: (t0.aid)
                        JoinQuals: (t0.aid = t1.aid)
                        Nrows (in/out: 99.60%), KDS-Hash (size: 13.47MB, nbatches: 1)
               Depth 2: GpuHashJoin, HashKeys: (t0.bid)
                        JoinQuals: (t0.bid = t2.bid)
                        Nrows (in/out: 100.00%), KDS-Hash (size: 13.47MB, nbatches: 1)
               ->  Seq Scan on t1  (cost=0.00..1935.00 rows=100000 width=12)
               ->  Seq Scan on t2  (cost=0.00..1935.00 rows=100000 width=4)
(15 rows)

Turn off pgstrom and review the same execution plan (default join, hash, etc).

testdb=# set pg_strom.enabled=off; 
SET
testdb=# EXPLAIN                   
SELECT cat, avg(ax) FROM t0 NATURAL JOIN t1 NATURAL JOIN t2 GROUP BY cat;
                                    QUERY PLAN                                    
----------------------------------------------------------------------------------
 HashAggregate  (cost=6843172.91..6843173.24 rows=26 width=12)
   Group Key: t0.cat
   ->  Hash Join  (cost=7250.00..6345175.62 rows=99599458 width=12)
         Hash Cond: (t0.aid = t1.aid)
         ->  Hash Join  (cost=3576.00..4188792.08 rows=99997856 width=8)
               Hash Cond: (t0.bid = t2.bid)
               ->  Seq Scan on t0  (cost=0.00..1833312.56 rows=99997856 width=12)
               ->  Hash  (cost=1935.00..1935.00 rows=100000 width=4)
                     ->  Seq Scan on t2  (cost=0.00..1935.00 rows=100000 width=4)
         ->  Hash  (cost=1935.00..1935.00 rows=100000 width=12)
               ->  Seq Scan on t1  (cost=0.00..1935.00 rows=100000 width=12)
(11 rows)

Sunday, August 28, 2016

Host and Containers

Host: Ubuntu 14.04
- Add <username> to docker group so sudo is not required to run docker command
  sudo usermod -a -G docker username
- Create two directories below. They will be mounted to containers to make DB and development files persist outside the containers
  /opt/db/ws
  /opt/db/postgresql/data
Container: postgresql : https://hub.docker.com/_/postgres/
- This image includes EXPOSE 5432 (the postgres port), so standard container linking will make it automatically available to the linked containers. The default postgres user and database are created in the entry point with initdb.
- The default data directory for PostgreSQL is /var/lib/postgresql/data

Run PostgreSQL Containers for Server and Client

ssh <host>
docker run --name pgs-server -v /opt/db/ws:/opt/db/ws -v /opt/db/postgresql/data:/var/lib/postgresql/data -e POSTGRES_PASSWORD=postgres -d postgres

run
start a new container using the requested image

--name pgs-server
Set container name to pgs-server.

-v /opt/db/postgresql/data:/var/lib/postgresql/data
-v /opt/db/ws:/opt/db/ws
Mount host directory /opt/db/postgresql as /var/lib/postgresql on the container so DB files persist on the hard disk
Mount host directory /opt/db/ws as /opt/db/ws to be used for development.

-e POSTGRES_PASSWORD=pgsdev
Set up the POSTGRES_PASSWORD environment variable, which sets the master PostgreSQL password

-d
Run the container in the background (daemon mode). It will stay alive until it is removed

postgres
Runs postgres docker image
Optional: Start a bash shell on the pgs-server container for miscellaneous tasks
docker exec -it pgs-server bash
Start a PostgreSQL client container
docker run --name pgs-client -v /opt/db/ws:/opt/db/ws -it --rm --link pgs-server:postgres postgres psql -h postgres -U postgres

run
Start a new container using the requested image

--name pgs-client
Set container name to pgs-client. If this option is not provided, docker will assign a random name.

-v /opt/db/ws:/opt/db/ws
Mount host directory /opt/db/ws as /opt/db/ws to be used for development.

-it
Run the container in interactive mode

--rm
Automatically cleanup the container after exit to avoid zombie containers
-link pgs-server:postgres
connects to the pgs-server container from the pgs-cleint container

postgres
Use postgres Docker image

psql -h postgres -U postgres
at the end tells Docker what command to execute when the container starts. In this case, start the interactive postgres terminal
Optional. Start a bash shell on the 2nd container for miscellaneous tasks
$docker exec -it pgs-client bash

Download TPC-H Tool

Download the source code for TPC-H tool from http://www.tpc.org/tpc_documents_current_versions/current_specifications.asp
The instructions below are based on TPC-H tool 2.17.1 and Ubuntu LTS 14.04

Build TPC-H Tool

$mkdir /opt/db/tpch-tool
Save the TCPH Tool zip to /opt/db/tpch-tool directory as tpc-h-tool_2_17_0.zip
$unzip tpc-h-tool_2_17_0.zip
$cd tpch_2_17_0/dbgen
$cp makefile.suite Makefile
Update the Makefile with the lines below
CC=gcc
DATABASE=ORACLE
MACHINE=LINUX
WORKLOAD=TPCH

$make
This will build dbgen (database generation) and qgen (query generation) tools
Add the build directory to PATH environment variable so they can be executed without full path

Generate TPC-H Postgres DB

Generate TPC-H tables with scale factor 1 (1GB)
$dbgen -s 1
Fri Aug 26 12:28:15 PDT 2016
TPC-H Population Generator (Version 2.17.0)
Copyright Transaction Processing Performance Council 1994 - 2010
Below are the tables generated and their sizes
$ls -l *.tbl
-rw-r--r-- 1 use grp 24346144 Aug 26 12:28 customer.tbl
-rw-r--r-- 1 use grp 759863287 Aug 26 12:28 lineitem.tbl
-rw-r--r-- 1 use grp      2224 Aug 26 12:28 nation.tbl
-rw-r--r-- 1 use grp 171952161 Aug 26 12:28 orders.tbl
-rw-r--r-- 1 use grp 118984616 Aug 26 12:28 partsupp.tbl
-rw-r--r-- 1 use grp 24135125 Aug 26 12:28 part.tbl
-rw-r--r-- 1 use grp       389 Aug 26 12:28 region.tbl
-rw-r--r-- 1 use grp   1409184 Aug 26 12:28 supplier.tbl
Below is the TPC-H Schema (http://www.tpc.org/tpc_documents_current_versions/pdf/tpc-h_v2.17.1.pdf)

The .tbl files have a "|" character at the end of each line, which is not compatible with postgresql and causes the error below during data import:
ERROR: extra data after last expected column
CONTEXT: COPY part, line 1: "1|goldenrod lavender spring chocolate lace|Manufacturer#1|Brand#13|PROMO BURNISHED COPPER|7|JUMBO PK..."

Run the command below to remove the last | on each line and save the result to .csv files
```
for i in `ls *.tbl`; do sed 's/|$//' $i > ${i/tbl/csv}; echo $i; done;
```

Start postgresql server and client containers. Check Run PostgreSQL with Docker

Create TPCH DB by running the commands below from psql terminal on postgresal client container:

postgres=# create database tpch;
CREATE DATABASE
postgres=# \c tpch;
You are now connected to database "tpch" as user "postgres".
tpch=# \i /opt/db/ws/tpch/sql/tpch-build-db.sql
BEGIN
CREATE TABLE
COPY 200000
COMMIT
BEGIN
CREATE TABLE
COPY 5
COMMIT
BEGIN
CREATE TABLE
COPY 25
COMMIT
BEGIN
CREATE TABLE
COPY 10000
COMMIT
BEGIN
CREATE TABLE
COPY 150000
COMMIT
BEGIN
CREATE TABLE
COPY 800000
COMMIT
BEGIN
CREATE TABLE
COPY 1500000
COMMIT
BEGIN
CREATE TABLE
COPY 6001215
COMMIT

ALTER TABLE
ALTER TABLE
ALTER TABLE
ALTER TABLE
ALTER TABLE
ALTER TABLE
ALTER TABLE
ALTER TABLE
tpch=# \d+
                                List of relations
 Schema |          Name          |   Type   |  Owner   |    Size    | Description 
--------+------------------------+----------+----------+------------+-------------
 public | customer               | table    | postgres | 29 MB      | 
 public | customer_c_custkey_seq | sequence | postgres | 8192 bytes | 
 public | lineitem               | table    | postgres | 950 MB     | 
 public | nation                 | table    | postgres | 8192 bytes | 
 public | nation_n_nationkey_seq | sequence | postgres | 8192 bytes | 
 public | orders                 | table    | postgres | 216 MB     | 
 public | orders_o_orderkey_seq  | sequence | postgres | 8192 bytes | 
 public | part                   | table    | postgres | 32 MB      | 
 public | part_p_partkey_seq     | sequence | postgres | 8192 bytes | 
 public | partsupp               | table    | postgres | 143 MB     | 
 public | region                 | table    | postgres | 8192 bytes | 
 public | region_r_regionkey_seq | sequence | postgres | 8192 bytes | 
 public | supplier               | table    | postgres | 1848 kB    | 
 public | supplier_s_suppkey_seq | sequence | postgres | 8192 bytes | 
(14 rows)

Generate TPC-H Queries

cd /opt/db/ws/tpch/sql
cp /opt/db/tpch-tool/tpch_2_17_0/dbgen/dists.dss .
Run the command below to generate all 22 TPC-H queries in separate .sql file. Note that DSS_QUERY environment variable in the command line is required for qgen to find query templates

$for q in `seq 1 22`;do DSS_QUERY=/opt/db/tpch-tool/tpch_2_17_0/dbgen/queries qgen $q > $q.sql;done;

The tpc-h query templates don't work with PostgreSQL out of the box. Below is the result when running 1.sql in psql terminal:

psql:/opt/db/ws/tpch/sql/1.sql:49: ERROR: syntax error at or near "(" LINE 15: l_shipdate <= date '1998-12-01' - interval '97' day (3) ^ Time: 0.227 ms psql:/opt/db/ws/tpch/sql/1.sql:50: ERROR: syntax error at or near "where" LINE 1: where rownum <= -1;
Below is the result after the errors are fixed 1.sql

l_returnflag | l_linestatus | sum_qty | sum_base_price | sum_disc_price |     sum_charge     |       avg_qty       |     avg_price      |        avg_disc
       | count_order
--------------+--------------+----------+----------------+------------------+--------------------+---------------------+--------------------+-----------------
-------+-------------
A            | F            | 37734107 | 56586554400.73 | 53758257134.8700 | 55909065222.827692 | 25.5220058532573370 | 38273.129734621672 | 0.04998529583839
761162 |     1478493
(1 row)

Time: 11534.090 ms

The tpch-postgres.7z has all 22 queries that have been modified to run in PostgreSQL.

"where rownum <=" statement is replaced with "limit" statement
Correlated aggregate subqueries with huge outer tables is replaced with JOIN (query 17, 20)

TPC-H Benchmark Result

Below is the

TPC-H Benchmark Result on a Dell Precision T3610 Workstation (CPU @3.5GHz with 64GB RAM) and PostgreSQL 9.5.4 from the latest postgres Docker image (as of August 2016).

TCP-H Query	SF=1 (ms)	Table Header
1	11534.090	Table Cell
2	398.402	Table Cell
3	1085.500	Table Cell
4	551.310	Table Cell
5	650.774	Table Cell
6	1304.219	Table Cell
7	965.973	Table Cell
8	1836.722	Table Cell
9	5330.943	Table Cell
10	1801.567	Table Cell
11	280.123	Table Cell
12	1889.902	Table Cell
13	1775.164	Table Cell
14	1339.493	Table Cell
15	0.192	Table Cell
16	1198.179	Table Cell
17	4607.648	Table Cell
18	4161.339	Table Cell
19	1673.113	Table Cell
20	2308.152	Table Cell
21	2130.221	Table Cell
22	585.929	Table Cell