Download TPC-H Tool
Download the source code for TPC-H tool from http://www.tpc.org/tpc_documents_current_versions/current_specifications.aspThe instructions below are based on TPC-H tool 2.17.1 and Ubuntu LTS 14.04
Build TPC-H Tool
- $mkdir /opt/db/tpch-tool
- Save the TCPH Tool zip to /opt/db/tpch-tool directory as tpc-h-tool_2_17_0.zip
- $unzip tpc-h-tool_2_17_0.zip
- $cd tpch_2_17_0/dbgen
- $cp makefile.suite Makefile
-
Update the Makefile with the lines below
CC=gcc
DATABASE=ORACLE
MACHINE=LINUX
WORKLOAD=TPCH
- $make
This will build dbgen (database generation) and qgen (query generation) tools - Add the build directory to PATH environment variable so they can be executed without full path
Generate TPC-H Postgres DB
- Generate TPC-H tables with scale factor 1 (1GB)
$dbgen -s 1
Fri Aug 26 12:28:15 PDT 2016
TPC-H Population Generator (Version 2.17.0)
Copyright Transaction Processing Performance Council 1994 - 2010
- Below are the tables generated and their sizes
$ls -l *.tbl
-rw-r--r-- 1 use grp 24346144 Aug 26 12:28 customer.tbl
-rw-r--r-- 1 use grp 759863287 Aug 26 12:28 lineitem.tbl
-rw-r--r-- 1 use grp 2224 Aug 26 12:28 nation.tbl
-rw-r--r-- 1 use grp 171952161 Aug 26 12:28 orders.tbl
-rw-r--r-- 1 use grp 118984616 Aug 26 12:28 partsupp.tbl
-rw-r--r-- 1 use grp 24135125 Aug 26 12:28 part.tbl
-rw-r--r-- 1 use grp 389 Aug 26 12:28 region.tbl
-rw-r--r-- 1 use grp 1409184 Aug 26 12:28 supplier.tbl
- Below is the TPC-H Schema (http://www.tpc.org/tpc_documents_current_versions/pdf/tpc-h_v2.17.1.pdf)
- The .tbl files have a "|" character at the end of each line, which is not compatible with postgresql and causes the error below during data import:
ERROR: extra data after last expected column
CONTEXT: COPY part, line 1: "1|goldenrod lavender spring chocolate lace|Manufacturer#1|Brand#13|PROMO BURNISHED COPPER|7|JUMBO PK..."
- Run the command below to remove the last | on each line and save the result to .csv files
for i in `ls *.tbl`; do sed 's/|$//' $i > ${i/tbl/csv}; echo $i; done;
- Start postgresql server and client containers. Check Run PostgreSQL with Docker
- Create TPCH DB by running the commands below from psql terminal on postgresal client container:
postgres=# create database tpch; CREATE DATABASE postgres=# \c tpch; You are now connected to database "tpch" as user "postgres". tpch=# \i /opt/db/ws/tpch/sql/tpch-build-db.sql BEGIN CREATE TABLE COPY 200000 COMMIT BEGIN CREATE TABLE COPY 5 COMMIT BEGIN CREATE TABLE COPY 25 COMMIT BEGIN CREATE TABLE COPY 10000 COMMIT BEGIN CREATE TABLE COPY 150000 COMMIT BEGIN CREATE TABLE COPY 800000 COMMIT BEGIN CREATE TABLE COPY 1500000 COMMIT BEGIN CREATE TABLE COPY 6001215 COMMIT
ALTER TABLE ALTER TABLE ALTER TABLE ALTER TABLE ALTER TABLE ALTER TABLE ALTER TABLE ALTER TABLE tpch=# \d+ List of relations Schema | Name | Type | Owner | Size | Description --------+------------------------+----------+----------+------------+------------- public | customer | table | postgres | 29 MB | public | customer_c_custkey_seq | sequence | postgres | 8192 bytes | public | lineitem | table | postgres | 950 MB | public | nation | table | postgres | 8192 bytes | public | nation_n_nationkey_seq | sequence | postgres | 8192 bytes | public | orders | table | postgres | 216 MB | public | orders_o_orderkey_seq | sequence | postgres | 8192 bytes | public | part | table | postgres | 32 MB | public | part_p_partkey_seq | sequence | postgres | 8192 bytes | public | partsupp | table | postgres | 143 MB | public | region | table | postgres | 8192 bytes | public | region_r_regionkey_seq | sequence | postgres | 8192 bytes | public | supplier | table | postgres | 1848 kB | public | supplier_s_suppkey_seq | sequence | postgres | 8192 bytes | (14 rows)
Generate TPC-H Queries
- cd /opt/db/ws/tpch/sql
- cp /opt/db/tpch-tool/tpch_2_17_0/dbgen/dists.dss .
- Run
the command below to generate all 22 TPC-H queries in separate .sql
file. Note that DSS_QUERY environment variable in the command line is
required for qgen to find query templates
for q in `seq 1 22`;
do
DSS_QUERY=/opt/db/tpch-tool/tpch_2_17_0/dbgen/queries qgen $q > $q.sql;
done;
The tpc-h query templates don't work with PostgreSQL out of the box. Below is the result when running 1.sql in psql terminal:
psql:/opt/db/ws/tpch/sql/1.sql:49: ERROR: syntax error at or near "("
LINE 15: l_shipdate <= date '1998-12-01' - interval '97' day (3)
^
Time: 0.227 ms
psql:/opt/db/ws/tpch/sql/1.sql:50: ERROR: syntax error at or near "where"
LINE 1: where rownum <= -1;
Below is the result after the errors are fixed 1.sql
l_returnflag
| l_linestatus | sum_qty | sum_base_price | sum_disc_price |
sum_charge | avg_qty | avg_price |
avg_disc - The tpch-postgres.7z has all 22 queries that have been modified to run in PostgreSQL.
- "where rownum <=" statement is replaced with "limit" statement
- Correlated aggregate subqueries with huge outer tables is replaced with JOIN (query 17, 20)
| count_order
--------------+--------------+----------+----------------+------------------+--------------------+---------------------+--------------------+-----------------
-------+-------------
A | F | 37734107 | 56586554400.73 | 53758257134.8700 | 55909065222.827692 | 25.5220058532573370 | 38273.129734621672 | 0.04998529583839
761162 | 1478493
(1 row)
Time: 11534.090 ms
TPC-H Benchmark Result
Below is theTPC-H Benchmark Result on a Dell Precision T3610 Workstation (CPU @3.5GHz with 64GB RAM) and PostgreSQL 9.5.4 from the latest postgres Docker image (as of August 2016).
TCP-H Query | SF=1 (ms) | Table Header |
---|---|---|
1 | 11534.090 | Table Cell |
2 | 398.402 | Table Cell |
3 | 1085.500 | Table Cell |
4 | 551.310 | Table Cell |
5 | 650.774 | Table Cell |
6 | 1304.219 | Table Cell |
7 | 965.973 | Table Cell |
8 | 1836.722 | Table Cell |
9 | 5330.943 | Table Cell |
10 | 1801.567 | Table Cell |
11 | 280.123 | Table Cell |
12 | 1889.902 | Table Cell |
13 | 1775.164 | Table Cell |
14 | 1339.493 | Table Cell |
15 | 0.192 | Table Cell |
16 | 1198.179 | Table Cell |
17 | 4607.648 | Table Cell |
18 | 4161.339 | Table Cell |
19 | 1673.113 | Table Cell |
20 | 2308.152 | Table Cell |
21 | 2130.221 | Table Cell |
22 | 585.929 | Table Cell |
Dear Jim, Very accurate procedures in this article. I have get success using this procedures. I tried with other tutorials, but I did not find the DDL object creation code, just how to generate the data. Please leave this DDL table-creation script ( /opt/db/ws/tpch/sql/tpch-build-db.sql) available for a community and modify postgresql queries too. Do you have any DBT-3 installation article in so linux using postgresql? best regards
ReplyDeletedear sir/madam, i need this script.script ( /opt/db/ws/tpch/sql/tpch-build-db.sql)
DeleteThanks a ton for the wonderful article.
ReplyDeleteThank you!!!!
ReplyDeleteTHANK YOU. This article has been soooooo helpful :)
ReplyDeleteReally helpful THANKS!!!
ReplyDelete15.sql from your ZIP has a syntax error!
ReplyDeletehi jim. did you have success generating queries with different substitutions using qgen? I find that no matter how I seed the rng (with -r), I am getting the same set of queries (those that appear using the flag -d). Thanks!
ReplyDeleteThanks for the great article! Would you also be able to include the postgresql.conf file when you were running the benchmarks? I'm curious what your config was since my latency results for each test are significantly higher :)
ReplyDeleteAlso could you please share what command line parameters you used when running pgbench (i.e. clients, threads, number of transactions/duration, etc.)? Otherwise, how did you go about executing the query and benchmarking?
ReplyDelete