Software Installation
Install useful utilitiessudo apt-get install wget git rpm
Install Cuda
- Download Cuda local install package from https://developer.nvidia.com/cuda-downloads
- Run the commands below to install Cuda and reboot
$sudo dpkg -i cuda-repo-ubuntu1404-7-5-local_7.5-18_amd64.deb $sudo apt-get update $sudo apt-get install cuda
$echo /usr/local/cuda/lib64 > /etc/ld.so.conf.d/cuda-lib64.conf
$cp -r /usr/local/cuda/samples . $cd samples/1_Utilities/deviceQuery $make $./deviceQuery ./deviceQuery Starting... CUDA Device Query (Runtime API) version (CUDART static linking) Detected 1 CUDA Capable device(s) Device 0: "Quadro K1000M" CUDA Driver Version / Runtime Version 7.5 / 7.5 CUDA Capability Major/Minor version number: 3.0 Total amount of global memory: 2047 MBytes (2146762752 bytes) ( 1) Multiprocessors, (192) CUDA Cores/MP: 192 CUDA Cores GPU Max Clock rate: 851 MHz (0.85 GHz) Memory Clock rate: 900 Mhz Memory Bus Width: 128-bit L2 Cache Size: 262144 bytes Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096) Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers Total amount of constant memory: 65536 bytes Total amount of shared memory per block: 49152 bytes Total number of registers available per block: 65536 Warp size: 32 Maximum number of threads per multiprocessor: 2048 Maximum number of threads per block: 1024 Max dimension size of a thread block (x,y,z): (1024, 1024, 64) Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535) Maximum memory pitch: 2147483647 bytes Texture alignment: 512 bytes Concurrent copy and kernel execution: Yes with 1 copy engine(s) Run time limit on kernels: Yes Integrated GPU sharing Host Memory: No Support host page-locked memory mapping: Yes Alignment requirement for Surfaces: Yes Device has ECC support: Disabled Device supports Unified Addressing (UVA): Yes Device PCI Domain ID / Bus ID / location ID: 0 / 1 / 0 Compute Mode: < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) > deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 7.5, CUDA Runtime Version = 7.5, NumDevs = 1, Device0 = Quadro K1000M Result = PASSInstall PostgreSQL
- Import PostgreSQL repository key
$wget -O - http://apt.postgresql.org/pub/repos/apt/ACCC4CF8.asc | sudo apt-key add -
- Create or edit /etc/apt/sources.list.d/pgdg.list with the line below
deb http://apt.postgresql.org/pub/repos/apt/ trusty-pgdg main
- Configure apt's package pinning to prefer the PGDG packages over the Debian ones in /etc/apt/preferences.d/pgdg.pref. this will replace all your Debian/Ubuntu packages with available packages from the PGDG repository.
Package: * Pin: release o=apt.postgresql.org Pin-Priority: 500
- Update the package lists, and install the pgdg-keyring package to automatically get repository key updates
$sudo apt-get update $sudo apt-get install pgdg-keyring
- Install PostgreSQL 9.5
$sudo apt-get install -y postgresql-common $sudo apt-get install -y postgresql-9.5 postgresql-contrib-9.5 postgresql-server-dev-9.5 libpq-dev
- The PostgreSQL server will automatically be started after installation. Run the commands below to manually start or restart the server if needed.
$sudo service postgresql restart $sudo service postgresql start
- Run the command below to start psql terminal and connect to the server
$sudo -u postgre psql
Install PGStrom [PGStrom Wiki]
- Download and install PGStrom
$git clone https://github.com/pg-strom/devel pg_strom $cd pg_strom $which pg_config /usr/local/pgsql/bin/pg_config $make $sudo make install
- Edit /etc/postgresql/9.5/main/postgresql.conf to update configuration for PGStrom
- Add /usr/lib/postgresql/9.5/lib/pg_strom.so to shared_preload_libraries
- Restart PostgreSQL service
$sudo service postgresql restart
- PostgreSQL log file /var/log/postgresql/postgresql-9.5-main.log shows PGStrom is loaded with supported GPU device
$ cat /var/log/postgresql/postgresql-9.5-main.log LOG: PG-Strom version 1.0devel built for PostgreSQL 9.5 LOG: CUDA Runtime version: 7.5.0 LOG: NVIDIA driver version: 352.63 LOG: GPU0 Quadro K1000M (192 CUDA cores, 850MHz), L2 256KB, RAM 2047MB (128bits, 900MHz), capability 3.0 LOG: NVRTC - CUDA Runtime Compilation vertion 7.5
Test PGStrom Integration with PostgreSQL
- Start psql terminal
sudo -u postgres psql
- Create pg_strom extension
postgres=#CREATE EXTENSION pg_strom; CREATE EXTENSION
- Create a test DB and connect to it
postgres=# create database testdb; CREATE DATABASE postgres=# \c testdb You are now connected to database "testdb" as user "postgres".
- Create test tables from SQL script in PGStrom installation. It creates one 100M rows table and 9 of 100K rows table.
postgres=# \i pgstrom/test/testdb.sql
- Check pg_strom parameters used by postgresql:
postgres=# show all; pg_strom.bulkexec | on | Enables the bulk-execution mode of PG-Strom pg_strom.chunk_limit | 79360kB | limit size of pgstrom_data_store pg_strom.chunk_size | 15872kB | default size of pgstrom_data_store pg_strom.chunk_size_margin | 1.25 | margin of chunk size if not predictable exactly pg_strom.cpu_fallback | on | Enables CPU fallback if GPU is pg_strom.cuda_visible_devices | | CUDA_VISIBLE_DEVICES of CUDA runtime pg_strom.debug_cuda_coredump | off | Turn on/off GPU coredump feature pg_strom.debug_force_gpupreagg | off | Force GpuPreAgg regardless of the cost (debug) pg_strom.debug_force_gpusort | off | Force GpuSort regardless of the cost (debug) pg_strom.debug_kernel_source | off | Turn on/off to display the kernel source path pg_strom.enable_gpuhashjoin | on | Enables the use of GpuHashJoin logic pg_strom.enable_gpunestloop | on | Enables the use of GpuNestLoop logic pg_strom.enable_gpupreagg | off | Enables the use of GPU preprocessed aggregate pg_strom.enable_gpuscan | on | Enables the use of GPU accelerated full-scan pg_strom.enable_gpusort | off | Enables the use of GPU accelerated sorting pg_strom.enabled | off | Enables the planner's use of PG-Strom pg_strom.gpu_dma_cost | 10 | Cost to send/recv data via DMA pg_strom.gpu_operator_cost | 7.8125e-05 | Cost of processing each operators by GPU pg_strom.gpu_setup_cost | 4000 | Cost to setup GPU device to run pg_strom.gpu_tuple_cost | 0.0003125 | Cost of processing each tuple for GPU pg_strom.max_async_tasks | 32 | max number of GPU tasks to be run asynchronously pg_strom.num_threads_margin | 1.1 | margin of number of CUDA threads if not predictable exactly pg_strom.perfmon | off | Enables the performance monitor of PG-Strom pg_strom.program_cache_size | 48MB | size of shared program cache pg_strom.pullup_outer_scan | on | Enables to pull up simple outer scan
-
Review the execution plan of a query and confirm GPU offload is enabled (GpuPreAgg, GpuJoin, etc)
testdb=# EXPLAIN SELECT cat, avg(ax) FROM t0 NATURAL JOIN t1 NATURAL JOIN t2 GROUP BY cat; QUERY PLAN --------------------------------------------------------------------------------------------- HashAggregate (cost=3349749.08..3349749.40 rows=26 width=12) Group Key: t0.cat -> Custom Scan (GpuPreAgg) (cost=16325.86..2879097.75 rows=234 width=44) Reduction: Local + Global GPU Projection: cat, ax -> Custom Scan (GpuJoin) on t0 (cost=12325.86..2851751.79 rows=99599458 width=12) GPU Projection: t0.cat, t1.ax Depth 1: GpuHashJoin, HashKeys: (t0.aid) JoinQuals: (t0.aid = t1.aid) Nrows (in/out: 99.60%), KDS-Hash (size: 13.47MB, nbatches: 1) Depth 2: GpuHashJoin, HashKeys: (t0.bid) JoinQuals: (t0.bid = t2.bid) Nrows (in/out: 100.00%), KDS-Hash (size: 13.47MB, nbatches: 1) -> Seq Scan on t1 (cost=0.00..1935.00 rows=100000 width=12) -> Seq Scan on t2 (cost=0.00..1935.00 rows=100000 width=4) (15 rows)
-
Turn off pgstrom and review the same execution plan (default join, hash, etc).
testdb=# set pg_strom.enabled=off; SET testdb=# EXPLAIN SELECT cat, avg(ax) FROM t0 NATURAL JOIN t1 NATURAL JOIN t2 GROUP BY cat; QUERY PLAN ---------------------------------------------------------------------------------- HashAggregate (cost=6843172.91..6843173.24 rows=26 width=12) Group Key: t0.cat -> Hash Join (cost=7250.00..6345175.62 rows=99599458 width=12) Hash Cond: (t0.aid = t1.aid) -> Hash Join (cost=3576.00..4188792.08 rows=99997856 width=8) Hash Cond: (t0.bid = t2.bid) -> Seq Scan on t0 (cost=0.00..1833312.56 rows=99997856 width=12) -> Hash (cost=1935.00..1935.00 rows=100000 width=4) -> Seq Scan on t2 (cost=0.00..1935.00 rows=100000 width=4) -> Hash (cost=1935.00..1935.00 rows=100000 width=12) -> Seq Scan on t1 (cost=0.00..1935.00 rows=100000 width=12) (11 rows)