Download the source code for TPC-H tool from
http://www.tpc.org/tpc_documents_current_versions/current_specifications.asp
The instructions below are based on TPC-H tool 2.17.1 and Ubuntu LTS 14.04
- $mkdir /opt/db/tpch-tool
- Save the TCPH Tool zip to /opt/db/tpch-tool directory as tpc-h-tool_2_17_0.zip
- $unzip tpc-h-tool_2_17_0.zip
- $cd tpch_2_17_0/dbgen
- $cp makefile.suite Makefile
-
Update the Makefile with the lines below
CC=gcc
DATABASE=ORACLE
MACHINE=LINUX
WORKLOAD=TPCH
- $make
This will build dbgen (database generation) and qgen (query generation) tools
- Add the build directory to PATH environment variable so they can be executed without full path
Generate TPC-H Postgres DB
- Generate TPC-H tables with scale factor 1 (1GB)
$dbgen -s 1
Fri Aug 26 12:28:15 PDT 2016
TPC-H Population Generator (Version 2.17.0)
Copyright Transaction Processing Performance Council 1994 - 2010
- Below are the tables generated and their sizes
$ls -l *.tbl
-rw-r--r-- 1 use grp 24346144 Aug 26 12:28 customer.tbl
-rw-r--r-- 1 use grp 759863287 Aug 26 12:28 lineitem.tbl
-rw-r--r-- 1 use grp 2224 Aug 26 12:28 nation.tbl
-rw-r--r-- 1 use grp 171952161 Aug 26 12:28 orders.tbl
-rw-r--r-- 1 use grp 118984616 Aug 26 12:28 partsupp.tbl
-rw-r--r-- 1 use grp 24135125 Aug 26 12:28 part.tbl
-rw-r--r-- 1 use grp 389 Aug 26 12:28 region.tbl
-rw-r--r-- 1 use grp 1409184 Aug 26 12:28 supplier.tbl
- Below is the TPC-H Schema (http://www.tpc.org/tpc_documents_current_versions/pdf/tpc-h_v2.17.1.pdf)
- The .tbl files have a "|" character at the end of each line, which is not compatible with postgresql and causes the error below during data import:
ERROR: extra data after last expected column
CONTEXT: COPY part, line 1: "1|goldenrod lavender spring chocolate lace|Manufacturer#1|Brand#13|PROMO BURNISHED COPPER|7|JUMBO PK..."
- Start postgresql server and client containers. Check Run PostgreSQL with Docker
- Create TPCH DB by running the commands below from psql terminal on postgresal client container:
postgres=# create database tpch;
CREATE DATABASE
postgres=# \c tpch;
You are now connected to database "tpch" as user "postgres".
tpch=# \i /opt/db/ws/tpch/sql/tpch-build-db.sql
BEGIN
CREATE TABLE
COPY 200000
COMMIT
BEGIN
CREATE TABLE
COPY 5
COMMIT
BEGIN
CREATE TABLE
COPY 25
COMMIT
BEGIN
CREATE TABLE
COPY 10000
COMMIT
BEGIN
CREATE TABLE
COPY 150000
COMMIT
BEGIN
CREATE TABLE
COPY 800000
COMMIT
BEGIN
CREATE TABLE
COPY 1500000
COMMIT
BEGIN
CREATE TABLE
COPY 6001215
COMMIT
ALTER TABLE
ALTER TABLE
ALTER TABLE
ALTER TABLE
ALTER TABLE
ALTER TABLE
ALTER TABLE
ALTER TABLE
tpch=# \d+
List of relations
Schema | Name | Type | Owner | Size | Description
--------+------------------------+----------+----------+------------+-------------
public | customer | table | postgres | 29 MB |
public | customer_c_custkey_seq | sequence | postgres | 8192 bytes |
public | lineitem | table | postgres | 950 MB |
public | nation | table | postgres | 8192 bytes |
public | nation_n_nationkey_seq | sequence | postgres | 8192 bytes |
public | orders | table | postgres | 216 MB |
public | orders_o_orderkey_seq | sequence | postgres | 8192 bytes |
public | part | table | postgres | 32 MB |
public | part_p_partkey_seq | sequence | postgres | 8192 bytes |
public | partsupp | table | postgres | 143 MB |
public | region | table | postgres | 8192 bytes |
public | region_r_regionkey_seq | sequence | postgres | 8192 bytes |
public | supplier | table | postgres | 1848 kB |
public | supplier_s_suppkey_seq | sequence | postgres | 8192 bytes |
(14 rows)
Generate TPC-H Queries
- cd /opt/db/ws/tpch/sql
- cp /opt/db/tpch-tool/tpch_2_17_0/dbgen/dists.dss .
- Run
the command below to generate all 22 TPC-H queries in separate .sql
file. Note that DSS_QUERY environment variable in the command line is
required for qgen to find query templates
$for q in `seq 1 22`;
do
DSS_QUERY=/opt/db/tpch-tool/tpch_2_17_0/dbgen/queries qgen $q > $q.sql;
done;
The tpc-h query templates don't work with PostgreSQL out of the box. Below is the result when running 1.sql in psql terminal:
psql:/opt/db/ws/tpch/sql/1.sql:49: ERROR: syntax error at or near "("
LINE 15: l_shipdate <= date '1998-12-01' - interval '97' day (3)
^
Time: 0.227 ms
psql:/opt/db/ws/tpch/sql/1.sql:50: ERROR: syntax error at or near "where"
LINE 1: where rownum <= -1;
Below is the result after the errors are fixed 1.sql
l_returnflag
| l_linestatus | sum_qty | sum_base_price | sum_disc_price |
sum_charge | avg_qty | avg_price |
avg_disc
| count_order
--------------+--------------+----------+----------------+------------------+--------------------+---------------------+--------------------+-----------------
-------+-------------
A
| F | 37734107 | 56586554400.73 | 53758257134.8700 |
55909065222.827692 | 25.5220058532573370 | 38273.129734621672 |
0.04998529583839
761162 | 1478493
(1 row)
Time: 11534.090 ms
- The tpch-postgres.7z has all 22 queries that have been modified to run in PostgreSQL.
- "where rownum <=" statement is replaced with "limit" statement
- Correlated aggregate subqueries
with huge outer tables is replaced with JOIN (query 17, 20)
TPC-H Benchmark Result
Below is the
TPC-H Benchmark Result on a Dell Precision T3610 Workstation (CPU @3.5GHz with 64GB RAM) and PostgreSQL 9.5.4 from the latest postgres Docker image (as of August 2016).
TCP-H Query |
SF=1 (ms) |
Table Header |
1 |
11534.090 |
Table Cell |
2 |
398.402 |
Table Cell |
3 |
1085.500 |
Table Cell |
4 |
551.310 |
Table Cell |
5 |
650.774 |
Table Cell |
6 |
1304.219 |
Table Cell |
7 |
965.973 |
Table Cell |
8 |
1836.722 |
Table Cell |
9 |
5330.943 |
Table Cell |
10 |
1801.567 |
Table Cell |
11 |
280.123 |
Table Cell |
12 |
1889.902 |
Table Cell |
13 |
1775.164 |
Table Cell |
14 |
1339.493 |
Table Cell |
15 |
0.192 |
Table Cell |
16 |
1198.179 |
Table Cell |
17 |
4607.648 |
Table Cell |
18 |
4161.339 |
Table Cell |
19 |
1673.113 |
Table Cell |
20 |
2308.152 |
Table Cell |
21 |
2130.221 |
Table Cell |
22 |
585.929 |
Table Cell |