Monday, December 26, 2016

Xilinx SDx Tips

This blog shows the live tweet stream with #XilinxSDxTips hashtag. I will start sharing my tips with this hashtag. Your contribution is also welcome and it's as simple as tweeting with the same hashtag !

Wednesday, September 7, 2016

Use PGStrom with PostgreSQL on Ubuntu 14.04 LTS

Software Installation

Install useful utilities
sudo apt-get install wget git rpm 

Install Cuda
  • Download Cuda local install package from

  • Run the commands below to install Cuda and reboot
  • $sudo dpkg -i cuda-repo-ubuntu1404-7-5-local_7.5-18_amd64.deb
    $sudo apt-get update
    $sudo apt-get install cuda

  • Create/Edit /etc/ with Cuda library path
    $echo /usr/local/cuda/lib64 > /etc/
  • Build the deviceQuery sample from Cuda installation and verify that the Cuda driver works
  • $cp -r /usr/local/cuda/samples .
    $cd samples/1_Utilities/deviceQuery 
    ./deviceQuery Starting...
     CUDA Device Query (Runtime API) version (CUDART static linking)
    Detected 1 CUDA Capable device(s)
    Device 0: "Quadro K1000M"
      CUDA Driver Version / Runtime Version          7.5 / 7.5
      CUDA Capability Major/Minor version number:    3.0
      Total amount of global memory:                 2047 MBytes (2146762752 bytes)
      ( 1) Multiprocessors, (192) CUDA Cores/MP:     192 CUDA Cores
      GPU Max Clock rate:                            851 MHz (0.85 GHz)
      Memory Clock rate:                             900 Mhz
      Memory Bus Width:                              128-bit
      L2 Cache Size:                                 262144 bytes
      Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
      Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
      Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
      Total amount of constant memory:               65536 bytes
      Total amount of shared memory per block:       49152 bytes
      Total number of registers available per block: 65536
      Warp size:                                     32
      Maximum number of threads per multiprocessor:  2048
      Maximum number of threads per block:           1024
      Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
      Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
      Maximum memory pitch:                          2147483647 bytes
      Texture alignment:                             512 bytes
      Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
      Run time limit on kernels:                     Yes
      Integrated GPU sharing Host Memory:            No
      Support host page-locked memory mapping:       Yes
      Alignment requirement for Surfaces:            Yes
      Device has ECC support:                        Disabled
      Device supports Unified Addressing (UVA):      Yes
      Device PCI Domain ID / Bus ID / location ID:   0 / 1 / 0
      Compute Mode:
         < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
    deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 7.5, CUDA Runtime Version = 7.5, NumDevs = 1, Device0 = Quadro K1000M
    Result = PASS
Install PostgreSQL
  • Import PostgreSQL repository key
    $wget -O - | sudo apt-key add -

  • Create or edit /etc/apt/sources.list.d/pgdg.list with the line below
    deb trusty-pgdg main

  • Configure apt's package pinning to prefer the PGDG packages over the Debian ones in /etc/apt/preferences.d/pgdg.pref. this will replace all your Debian/Ubuntu packages with available packages from the PGDG repository.
    Package: *
    Pin: release
    Pin-Priority: 500
  • Update the package lists, and install the pgdg-keyring package to automatically get repository key updates
    $sudo apt-get update
    $sudo apt-get install pgdg-keyring
  • Install PostgreSQL 9.5
    $sudo apt-get install -y postgresql-common
    $sudo apt-get install -y postgresql-9.5 postgresql-contrib-9.5 postgresql-server-dev-9.5 libpq-dev
  • The PostgreSQL server will automatically be started after installation. Run the commands below to manually start or restart the server if needed.
    $sudo service postgresql restart
    $sudo service postgresql start
  • Run the command below to start psql terminal and connect to the server
    $sudo -u postgre psql

Install PGStrom [PGStrom Wiki]
  • Download and install PGStrom
    $git clone pg_strom
    $cd pg_strom
    $which pg_config
    $sudo make install
  • Edit /etc/postgresql/9.5/main/postgresql.conf to update configuration for PGStrom
    • Add /usr/lib/postgresql/9.5/lib/ to shared_preload_libraries
  • Restart PostgreSQL service
    $sudo service postgresql restart
  • PostgreSQL log file /var/log/postgresql/postgresql-9.5-main.log shows PGStrom is loaded with supported GPU device
    $ cat /var/log/postgresql/postgresql-9.5-main.log 
    LOG:  PG-Strom version 1.0devel built for PostgreSQL 9.5
    LOG:  CUDA Runtime version: 7.5.0
    LOG:  NVIDIA driver version: 352.63
    LOG:  GPU0 Quadro K1000M (192 CUDA cores, 850MHz), L2 256KB, RAM 2047MB (128bits, 900MHz), capability 3.0
    LOG:  NVRTC - CUDA Runtime Compilation vertion 7.5

Test PGStrom Integration with PostgreSQL

  • Start psql terminal
    sudo -u postgres psql
  • Create pg_strom extension
    postgres=#CREATE EXTENSION pg_strom;
  • Create a test DB and connect to it
    postgres=# create database testdb;
    postgres=# \c testdb
    You are now connected to database "testdb" as user "postgres".
  • Create test tables from SQL script in PGStrom installation. It creates one 100M rows table and 9 of 100K rows table.
    postgres=# \i pgstrom/test/testdb.sql
  • Check pg_strom parameters used by postgresql:
    postgres=# show all;
    pg_strom.bulkexec                   | on                                       | Enables the bulk-execution mode of PG-Strom
    pg_strom.chunk_limit                | 79360kB                                  | limit size of pgstrom_data_store
    pg_strom.chunk_size                 | 15872kB                                  | default size of pgstrom_data_store
    pg_strom.chunk_size_margin          | 1.25                                     | margin of chunk size if not predictable exactly
    pg_strom.cpu_fallback               | on                                       | Enables CPU fallback if GPU is 
    pg_strom.cuda_visible_devices       |                                          | CUDA_VISIBLE_DEVICES of CUDA runtime
    pg_strom.debug_cuda_coredump        | off                                      | Turn on/off GPU coredump feature
    pg_strom.debug_force_gpupreagg      | off                                      | Force GpuPreAgg regardless of the cost (debug)
    pg_strom.debug_force_gpusort        | off                                      | Force GpuSort regardless of the cost (debug)
    pg_strom.debug_kernel_source        | off                                      | Turn on/off to display the kernel source path
    pg_strom.enable_gpuhashjoin         | on                                       | Enables the use of GpuHashJoin logic
    pg_strom.enable_gpunestloop         | on                                       | Enables the use of GpuNestLoop logic
    pg_strom.enable_gpupreagg           | off                                      | Enables the use of GPU preprocessed aggregate
    pg_strom.enable_gpuscan             | on                                       | Enables the use of GPU accelerated full-scan
    pg_strom.enable_gpusort             | off                                      | Enables the use of GPU accelerated sorting
    pg_strom.enabled                    | off                                      | Enables the planner's use of PG-Strom
    pg_strom.gpu_dma_cost               | 10                                       | Cost to send/recv data via DMA
    pg_strom.gpu_operator_cost          | 7.8125e-05                               | Cost of processing each operators by GPU
    pg_strom.gpu_setup_cost             | 4000                                     | Cost to setup GPU device to run
    pg_strom.gpu_tuple_cost             | 0.0003125                                | Cost of processing each tuple for GPU
    pg_strom.max_async_tasks            | 32                                       | max number of GPU tasks to be run asynchronously
    pg_strom.num_threads_margin         | 1.1                                      | margin of number of CUDA threads if not predictable exactly
    pg_strom.perfmon                    | off                                      | Enables the performance monitor of PG-Strom
    pg_strom.program_cache_size         | 48MB                                     | size of shared program cache
    pg_strom.pullup_outer_scan          | on                                       | Enables to pull up simple outer scan
  • Review the execution plan of a query and confirm GPU offload is enabled (GpuPreAgg, GpuJoin, etc)
    testdb=# EXPLAIN                 
                                             QUERY PLAN                                          
     HashAggregate  (cost=3349749.08..3349749.40 rows=26 width=12)
       Group Key:
       ->  Custom Scan (GpuPreAgg)  (cost=16325.86..2879097.75 rows=234 width=44)
             Reduction: Local + Global
             GPU Projection: cat, ax
             ->  Custom Scan (GpuJoin) on t0  (cost=12325.86..2851751.79 rows=99599458 width=12)
                   GPU Projection:,
                   Depth 1: GpuHashJoin, HashKeys: (t0.aid)
                            JoinQuals: (t0.aid = t1.aid)
                            Nrows (in/out: 99.60%), KDS-Hash (size: 13.47MB, nbatches: 1)
                   Depth 2: GpuHashJoin, HashKeys: (
                            JoinQuals: ( =
                            Nrows (in/out: 100.00%), KDS-Hash (size: 13.47MB, nbatches: 1)
                   ->  Seq Scan on t1  (cost=0.00..1935.00 rows=100000 width=12)
                   ->  Seq Scan on t2  (cost=0.00..1935.00 rows=100000 width=4)
    (15 rows)
  • Turn off pgstrom and review the same execution plan (default join, hash, etc).
    testdb=# set pg_strom.enabled=off; 
    testdb=# EXPLAIN                   
                                        QUERY PLAN                                    
     HashAggregate  (cost=6843172.91..6843173.24 rows=26 width=12)
       Group Key:
       ->  Hash Join  (cost=7250.00..6345175.62 rows=99599458 width=12)
             Hash Cond: (t0.aid = t1.aid)
             ->  Hash Join  (cost=3576.00..4188792.08 rows=99997856 width=8)
                   Hash Cond: ( =
                   ->  Seq Scan on t0  (cost=0.00..1833312.56 rows=99997856 width=12)
                   ->  Hash  (cost=1935.00..1935.00 rows=100000 width=4)
                         ->  Seq Scan on t2  (cost=0.00..1935.00 rows=100000 width=4)
             ->  Hash  (cost=1935.00..1935.00 rows=100000 width=12)
                   ->  Seq Scan on t1  (cost=0.00..1935.00 rows=100000 width=12)
    (11 rows)

Sunday, August 28, 2016

Run PostgreSQL on Docker

Host and Containers

  • Host: Ubuntu 14.04
    • Add <username> to docker group so sudo is not required to run docker command
      sudo usermod -a -G docker username
    • Create two directories below. They will be mounted to containers to make DB and development files persist outside the containers

  • Container: postgresql :
    • This image includes EXPOSE 5432 (the postgres port), so standard container linking will make it automatically available to the linked containers. The default postgres user and database are created in the entry point with initdb.
    • The default data directory for PostgreSQL is /var/lib/postgresql/data

Run PostgreSQL Containers for Server and Client

  1. ssh <host>
  2. docker run --name pgs-server -v /opt/db/ws:/opt/db/ws -v /opt/db/postgresql/data:/var/lib/postgresql/data -e POSTGRES_PASSWORD=postgres -d postgres 

    start a new container using the requested image

    --name pgs-server
    Set container name to pgs-server.

    -v /opt/db/postgresql/data:/var/lib/postgresql/data
    -v /opt/db/ws:/opt/db/ws
    Mount host directory /opt/db/postgresql as /var/lib/postgresql on the container so DB files persist on the hard disk  
    Mount host directory /opt/db/ws as /opt/db/ws to be used for development.

    Set up the POSTGRES_PASSWORD environment variable, which sets the master PostgreSQL password

    Run the container in the background (daemon mode). It will stay alive until it is removed

    Runs postgres docker image

  3. Optional: Start a bash shell on the pgs-server container for miscellaneous tasks
    docker exec -it pgs-server bash

  4. Start a PostgreSQL client container
    docker run --name pgs-client -v /opt/db/ws:/opt/db/ws -it --rm --link pgs-server:postgres postgres psql -h postgres -U postgres

    Start a new container using the requested image

    --name pgs-client
    Set container name to pgs-client. If this option is not provided, docker will assign a random name.

    -v /opt/db/ws:/opt/db/ws
    Mount host directory /opt/db/ws as /opt/db/ws to be used for development.

    Run the container in interactive mode

    Automatically cleanup the container after exit to avoid zombie containers
    -link pgs-server:postgres
    connects to the pgs-server container from the pgs-cleint container

    Use postgres Docker image

    psql -h postgres -U postgres
    at the end tells Docker what command to execute when the container starts. In this case, start the interactive postgres terminal

  5. Optional. Start a bash shell on the 2nd container for miscellaneous tasks
    $docker exec -it pgs-client bash

TPC-H Queries on PostgreSQL

Download TPC-H Tool

Download the source code for TPC-H tool from
The instructions below are based on TPC-H tool 2.17.1 and Ubuntu LTS 14.04

Build TPC-H Tool

  • $mkdir /opt/db/tpch-tool
  • Save the TCPH Tool zip to /opt/db/tpch-tool directory as
  • $unzip
  • $cd tpch_2_17_0/dbgen
  • $cp makefile.suite Makefile
  • Update the Makefile with the lines below
  • $make
    This will build dbgen (database generation) and qgen (query generation) tools
  • Add the build directory to PATH environment variable so they can be executed without full path

Generate TPC-H Postgres DB

    • Generate TPC-H tables with scale factor 1 (1GB)
      $dbgen -s 1
      Fri Aug 26 12:28:15 PDT 2016
      TPC-H Population Generator (Version 2.17.0)
      Copyright Transaction Processing Performance Council 1994 - 2010

    • Below are the tables generated and their sizes
      $ls -l *.tbl
      -rw-r--r-- 1 use grp  24346144 Aug 26 12:28 customer.tbl
      -rw-r--r-- 1 use grp 759863287 Aug 26 12:28 lineitem.tbl
      -rw-r--r-- 1 use grp      2224 Aug 26 12:28 nation.tbl
      -rw-r--r-- 1 use grp 171952161 Aug 26 12:28 orders.tbl
      -rw-r--r-- 1 use grp 118984616 Aug 26 12:28 partsupp.tbl
      -rw-r--r-- 1 use grp  24135125 Aug 26 12:28 part.tbl
      -rw-r--r-- 1 use grp       389 Aug 26 12:28 region.tbl
      -rw-r--r-- 1 use grp   1409184 Aug 26 12:28 supplier.tbl

    • Below is the TPC-H Schema (

    • The .tbl files have a "|" character at the end of each line, which is not compatible with postgresql and causes the error below during data import:
      ERROR:  extra data after last expected column
      CONTEXT:  COPY part, line 1: "1|goldenrod lavender spring chocolate lace|Manufacturer#1|Brand#13|PROMO BURNISHED COPPER|7|JUMBO PK..."
    • Run the command below to remove the last | on each line and save the result to .csv files
      for i in `ls *.tbl`; do sed 's/|$//' $i > ${i/tbl/csv}; echo $i; done;
    • Start postgresql server and client containers.  Check Run PostgreSQL with Docker
    • Create TPCH DB by running the commands below from psql terminal on postgresal client container:
      postgres=# create database tpch;
      postgres=# \c tpch;
      You are now connected to database "tpch" as user "postgres".
      tpch=# \i /opt/db/ws/tpch/sql/tpch-build-db.sql
      COPY 200000
      COPY 5
      COPY 25
      COPY 10000
      COPY 150000
      COPY 800000
      COPY 1500000
      COPY 6001215
      tpch=# \d+
                                      List of relations
       Schema |          Name          |   Type   |  Owner   |    Size    | Description 
       public | customer               | table    | postgres | 29 MB      | 
       public | customer_c_custkey_seq | sequence | postgres | 8192 bytes | 
       public | lineitem               | table    | postgres | 950 MB     | 
       public | nation                 | table    | postgres | 8192 bytes | 
       public | nation_n_nationkey_seq | sequence | postgres | 8192 bytes | 
       public | orders                 | table    | postgres | 216 MB     | 
       public | orders_o_orderkey_seq  | sequence | postgres | 8192 bytes | 
       public | part                   | table    | postgres | 32 MB      | 
       public | part_p_partkey_seq     | sequence | postgres | 8192 bytes | 
       public | partsupp               | table    | postgres | 143 MB     | 
       public | region                 | table    | postgres | 8192 bytes | 
       public | region_r_regionkey_seq | sequence | postgres | 8192 bytes | 
       public | supplier               | table    | postgres | 1848 kB    | 
       public | supplier_s_suppkey_seq | sequence | postgres | 8192 bytes | 
      (14 rows)

    Generate TPC-H Queries 

    • cd /opt/db/ws/tpch/sql
    • cp /opt/db/tpch-tool/tpch_2_17_0/dbgen/dists.dss .
    • Run the command below to generate all 22 TPC-H queries in separate .sql file. Note that DSS_QUERY environment variable in the command line is required for qgen to find query templates
        $for q in `seq 1 22`;do DSS_QUERY=/opt/db/tpch-tool/tpch_2_17_0/dbgen/queries qgen $q > $q.sql;done;
    • The tpc-h query templates don't work with PostgreSQL out of the box. Below is the result when running 1.sql in psql terminal:

      psql:/opt/db/ws/tpch/sql/1.sql:49: ERROR:  syntax error at or near "("
      LINE 15:  l_shipdate <= date '1998-12-01' - interval '97' day (3)
      Time: 0.227 ms
      psql:/opt/db/ws/tpch/sql/1.sql:50: ERROR:  syntax error at or near "where"
      LINE 1: where rownum <= -1;

    • Below is the result after the errors are fixed 1.sql
    •  l_returnflag | l_linestatus | sum_qty  | sum_base_price |  sum_disc_price  |     sum_charge     |       avg_qty       |     avg_price      |        avg_disc
             | count_order
       A            | F            | 37734107 | 56586554400.73 | 53758257134.8700 | 55909065222.827692 | 25.5220058532573370 | 38273.129734621672 | 0.04998529583839
      761162 |     1478493
      (1 row)

      Time: 11534.090 ms

    • The tpch-postgres.7z has all 22 queries that have been modified to run in PostgreSQL. 
      • "where rownum <=" statement is replaced with "limit" statement
      • Correlated aggregate subqueries with huge outer tables is replaced with JOIN (query 17, 20)

    TPC-H Benchmark Result

    Below is the TPC-H Benchmark Result on a Dell Precision T3610 Workstation (CPU @3.5GHz with 64GB RAM) and PostgreSQL 9.5.4 from the latest postgres Docker image (as of August 2016).

    TCP-H Query SF=1 (ms) Table Header
    1 11534.090 Table Cell
    2 398.402 Table Cell
    3 1085.500 Table Cell
    4 551.310 Table Cell
    5 650.774 Table Cell
    6 1304.219 Table Cell
    7 965.973 Table Cell
    8 1836.722 Table Cell
    9 5330.943 Table Cell
    10 1801.567 Table Cell
    11 280.123 Table Cell
    12 1889.902 Table Cell
    13 1775.164 Table Cell
    14 1339.493 Table Cell
    15 0.192 Table Cell
    16 1198.179 Table Cell
    17 4607.648 Table Cell
    18 4161.339 Table Cell
    19 1673.113 Table Cell
    20 2308.152 Table Cell
    21 2130.221 Table Cell
    22 585.929 Table Cell