Sunday, July 16, 2023

torch.fx Notes

make_fx

decomposition_table 

The decomposition_table argument to the make_fx function in PyTorch is a dictionary that maps ATen operators (OpOverload objects such as torch.ops.aten.addmm.default) to Python functions that reimplement them in terms of simpler primitive operators. When tracing encounters an operator in the table, the decomposition runs in its place, so the captured graph contains only the primitives. This is useful for backends that understand only a small core set of operators, and, combined with functionalization, it can help eliminate unnecessary copies and mutations from traced graphs. Note that only operators that actually reach the dispatcher can be intercepted; Python-level wrappers such as torch.nn.functional.dropout are lowered to ATen calls before the table is consulted, so they cannot be used as keys. For example, the following code decomposes torch.addmm into primitive ops during tracing: 

Python 
import torch
from torch.fx.experimental.proxy_tensor import make_fx

# Decomposition for aten.addmm: reimplement the fused op with the
# primitive ops mul, mm and add, so the traced graph contains only those.
def addmm_decomp(bias, mat1, mat2, beta=1, alpha=1):
    return beta * bias + alpha * torch.mm(mat1, mat2)

decomposition_table = {torch.ops.aten.addmm.default: addmm_decomp}

def f(bias, x, w):
    return torch.addmm(bias, x, w)

# make_fx returns a wrapped function; calling it with example inputs
# produces a GraphModule traced with the decomposition applied.
gm = make_fx(f, decomposition_table=decomposition_table)(
    torch.randn(4), torch.randn(4, 3), torch.randn(3, 4)
)
print(gm.code)  # contains aten.mm/aten.add instead of aten.addmm
  • The keys of the table are ATen operator overloads (torch.ops.aten.*), not Python-level functions; make_fx intercepts operators after dispatch, so only operators that reach the dispatcher (here aten.addmm.default) can be decomposed. 
  • When make_fx traces f, it calls addmm_decomp instead of recording the opaque addmm call. The result is a traced graph that contains only the primitive ops mul, mm and add. 
  • The decomposition_table argument can be used with any ATen operator that reaches the dispatcher. However, a decomposed graph is not automatically faster than the original: the decomposed form of some operators may be less efficient than the fused kernel. Therefore, it is important to benchmark the performance of the traced graphs before using them in production.
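As a rough sanity check of that benchmarking advice, the eager function can be timed against the traced GraphModule. This is only a sketch (no warm-up, few iterations), using a hypothetical decomposition of aten.addmm into mul/mm/add primitives:

```python
import timeit
import torch
from torch.fx.experimental.proxy_tensor import make_fx

def f(bias, x, w):
    return torch.addmm(bias, x, w)

args = (torch.randn(128), torch.randn(128, 128), torch.randn(128, 128))

# Hypothetical decomposition: rewrite the fused addmm as mul/mm/add primitives.
table = {
    torch.ops.aten.addmm.default:
        lambda b, m1, m2, beta=1, alpha=1: beta * b + alpha * torch.mm(m1, m2),
}
gm = make_fx(f, decomposition_table=table)(*args)

# Time both versions; the decomposed graph does extra elementwise work,
# so it is not guaranteed to be faster than the fused eager call.
eager = timeit.timeit(lambda: f(*args), number=100)
traced = timeit.timeit(lambda: gm(*args), number=100)
print(f"eager: {eager:.4f}s  decomposed+traced: {traced:.4f}s")
```

Results will vary by hardware and input size; the point is only that the decomposed graph must be measured, not assumed faster.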

get_decompositions 

The get_decompositions function in PyTorch, available from the torch._decomp module, takes a list of ATen operators and returns a dictionary of the decompositions registered for them. This can be used to see which operators have registered decompositions and to reuse PyTorch's own well-tested decompositions instead of writing your own. For example, the following code looks up the registered decomposition for addmm: Python
import torch
from torch._decomp import get_decompositions

# Fetch the registered decompositions for the requested operators.
decompositions = get_decompositions([torch.ops.aten.addmm])
print(decompositions)
  • This code prints a dictionary that maps ATen operator overloads (such as aten.addmm.default) to the Python functions that decompose them. Only the operators you request are included; passing an overload packet such as torch.ops.aten.addmm covers all of its registered overloads. 
  • The returned dictionary has exactly the shape that make_fx expects, so it can be passed directly as the decomposition_table argument. This is useful for understanding how a given operator is decomposed and for debugging traced graphs.
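The two pieces compose naturally: the dictionary returned by get_decompositions can be passed straight to make_fx as its decomposition_table. A minimal sketch, assuming a PyTorch version that provides both torch._decomp and make_fx:

```python
import torch
from torch._decomp import get_decompositions
from torch.fx.experimental.proxy_tensor import make_fx

# Reuse PyTorch's registered decomposition for addmm while tracing.
table = get_decompositions([torch.ops.aten.addmm])

def f(bias, x, w):
    return torch.addmm(bias, x, w)

gm = make_fx(f, decomposition_table=table)(
    torch.randn(4), torch.randn(4, 3), torch.randn(3, 4)
)
# The fused addmm call is replaced by its registered decomposition,
# so the printed code contains mm/add but no addmm.
print(gm.code)
```

This is usually preferable to hand-writing decompositions, since the registered ones are maintained and tested upstream.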

Friday, September 13, 2019

Xilinx SDx/Vitis Tips

This blog shows the live tweet stream with the hashtags #sdxtips and #vitistips. Please help the Xilinx developer community by sharing your tips and experience using the same hashtags.

Thursday, May 30, 2019

Jupyter Notebook Table of Contents Extension

The Jupyter Notebook Table of Contents extension comes in really handy, especially when I work on large notebooks with many different sections and want to navigate to specific sections easily and quickly. Since I had to google the installation steps a few times, I am posting the exact instructions here in case they help others.

  • Install nbextensions Python package
    • pip install jupyter_contrib_nbextensions
  • Install javascript and css files
    • jupyter contrib nbextension install --user
  • Enable toc2 extension
    • jupyter nbextension enable toc2/main


Wednesday, September 7, 2016

Use PGStrom with PostgreSQL on Ubuntu 14.04 LTS

Software Installation

Install useful utilities
sudo apt-get install wget git rpm 

Install Cuda
  • Download Cuda local install package from https://developer.nvidia.com/cuda-downloads

  • Run the commands below to install Cuda and reboot
  • $sudo dpkg -i cuda-repo-ubuntu1404-7-5-local_7.5-18_amd64.deb
    $sudo apt-get update
    $sudo apt-get install cuda
    

  • Create/Edit /etc/ld.so.conf.d/cuda-lib64.conf with the Cuda library path, then refresh the linker cache (the redirect needs root, so use tee)
    $echo /usr/local/cuda/lib64 | sudo tee /etc/ld.so.conf.d/cuda-lib64.conf
    $sudo ldconfig
    
  • Build the deviceQuery sample from Cuda installation and verify that the Cuda driver works
  • $cp -r /usr/local/cuda/samples .
    $cd samples/1_Utilities/deviceQuery 
    $make
    $./deviceQuery 
    ./deviceQuery Starting...
    
     CUDA Device Query (Runtime API) version (CUDART static linking)
    
    Detected 1 CUDA Capable device(s)
    
    Device 0: "Quadro K1000M"
      CUDA Driver Version / Runtime Version          7.5 / 7.5
      CUDA Capability Major/Minor version number:    3.0
      Total amount of global memory:                 2047 MBytes (2146762752 bytes)
      ( 1) Multiprocessors, (192) CUDA Cores/MP:     192 CUDA Cores
      GPU Max Clock rate:                            851 MHz (0.85 GHz)
      Memory Clock rate:                             900 Mhz
      Memory Bus Width:                              128-bit
      L2 Cache Size:                                 262144 bytes
      Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
      Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
      Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
      Total amount of constant memory:               65536 bytes
      Total amount of shared memory per block:       49152 bytes
      Total number of registers available per block: 65536
      Warp size:                                     32
      Maximum number of threads per multiprocessor:  2048
      Maximum number of threads per block:           1024
      Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
      Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
      Maximum memory pitch:                          2147483647 bytes
      Texture alignment:                             512 bytes
      Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
      Run time limit on kernels:                     Yes
      Integrated GPU sharing Host Memory:            No
      Support host page-locked memory mapping:       Yes
      Alignment requirement for Surfaces:            Yes
      Device has ECC support:                        Disabled
      Device supports Unified Addressing (UVA):      Yes
      Device PCI Domain ID / Bus ID / location ID:   0 / 1 / 0
      Compute Mode:
         < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
    
    deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 7.5, CUDA Runtime Version = 7.5, NumDevs = 1, Device0 = Quadro K1000M
    Result = PASS
Install PostgreSQL
  • Import PostgreSQL repository key
    $wget -O - http://apt.postgresql.org/pub/repos/apt/ACCC4CF8.asc | sudo apt-key add -
    

  • Create or edit /etc/apt/sources.list.d/pgdg.list with the line below
    deb http://apt.postgresql.org/pub/repos/apt/ trusty-pgdg main
    

  • Configure apt's package pinning to prefer the PGDG packages over the distribution ones in /etc/apt/preferences.d/pgdg.pref. This will replace your Ubuntu packages with the versions available from the PGDG repository.
    Package: *
    Pin: release o=apt.postgresql.org
    Pin-Priority: 500
    
  • Update the package lists, and install the pgdg-keyring package to automatically get repository key updates
    $sudo apt-get update
    $sudo apt-get install pgdg-keyring
    
  • Install PostgreSQL 9.5
    $sudo apt-get install -y postgresql-common
    $sudo apt-get install -y postgresql-9.5 postgresql-contrib-9.5 postgresql-server-dev-9.5 libpq-dev
    
  • The PostgreSQL server will automatically be started after installation. Run the commands below to manually start or restart the server if needed.
    $sudo service postgresql restart
    $sudo service postgresql start
    
  • Run the command below to start psql terminal and connect to the server
    $sudo -u postgres psql
    

Install PGStrom [PGStrom Wiki]
  • Download and install PGStrom
    $git clone https://github.com/pg-strom/devel pg_strom
    $cd pg_strom
    $which pg_config
    /usr/local/pgsql/bin/pg_config
    $make
    $sudo make install
    
  • Edit /etc/postgresql/9.5/main/postgresql.conf to update configuration for PGStrom
    • Add /usr/lib/postgresql/9.5/lib/pg_strom.so to shared_preload_libraries
  • Restart PostgreSQL service
    $sudo service postgresql restart
    
  • The PostgreSQL log file /var/log/postgresql/postgresql-9.5-main.log shows that PG-Strom is loaded and a supported GPU device is detected
    $ cat /var/log/postgresql/postgresql-9.5-main.log 
    LOG:  PG-Strom version 1.0devel built for PostgreSQL 9.5
    LOG:  CUDA Runtime version: 7.5.0
    LOG:  NVIDIA driver version: 352.63
    LOG:  GPU0 Quadro K1000M (192 CUDA cores, 850MHz), L2 256KB, RAM 2047MB (128bits, 900MHz), capability 3.0
    LOG:  NVRTC - CUDA Runtime Compilation vertion 7.5
    

Test PGStrom Integration with PostgreSQL

  • Start psql terminal
    sudo -u postgres psql
    
  • Create pg_strom extension
    postgres=#CREATE EXTENSION pg_strom;
    CREATE EXTENSION
    
  • Create a test DB and connect to it
    postgres=# create database testdb;
    CREATE DATABASE
    postgres=# \c testdb
    You are now connected to database "testdb" as user "postgres".
    
  • Create test tables from the SQL script in the PGStrom installation. It creates one table with 100M rows and nine tables with 100K rows each.
    postgres=# \i pgstrom/test/testdb.sql
    
  • Check pg_strom parameters used by postgresql:
    postgres=# show all;
    pg_strom.bulkexec                   | on                                       | Enables the bulk-execution mode of PG-Strom
    pg_strom.chunk_limit                | 79360kB                                  | limit size of pgstrom_data_store
    pg_strom.chunk_size                 | 15872kB                                  | default size of pgstrom_data_store
    pg_strom.chunk_size_margin          | 1.25                                     | margin of chunk size if not predictable exactly
    pg_strom.cpu_fallback               | on                                       | Enables CPU fallback if GPU is 
    pg_strom.cuda_visible_devices       |                                          | CUDA_VISIBLE_DEVICES of CUDA runtime
    pg_strom.debug_cuda_coredump        | off                                      | Turn on/off GPU coredump feature
    pg_strom.debug_force_gpupreagg      | off                                      | Force GpuPreAgg regardless of the cost (debug)
    pg_strom.debug_force_gpusort        | off                                      | Force GpuSort regardless of the cost (debug)
    pg_strom.debug_kernel_source        | off                                      | Turn on/off to display the kernel source path
    pg_strom.enable_gpuhashjoin         | on                                       | Enables the use of GpuHashJoin logic
    pg_strom.enable_gpunestloop         | on                                       | Enables the use of GpuNestLoop logic
    pg_strom.enable_gpupreagg           | off                                      | Enables the use of GPU preprocessed aggregate
    pg_strom.enable_gpuscan             | on                                       | Enables the use of GPU accelerated full-scan
    pg_strom.enable_gpusort             | off                                      | Enables the use of GPU accelerated sorting
    pg_strom.enabled                    | off                                      | Enables the planner's use of PG-Strom
    pg_strom.gpu_dma_cost               | 10                                       | Cost to send/recv data via DMA
    pg_strom.gpu_operator_cost          | 7.8125e-05                               | Cost of processing each operators by GPU
    pg_strom.gpu_setup_cost             | 4000                                     | Cost to setup GPU device to run
    pg_strom.gpu_tuple_cost             | 0.0003125                                | Cost of processing each tuple for GPU
    pg_strom.max_async_tasks            | 32                                       | max number of GPU tasks to be run asynchronously
    pg_strom.num_threads_margin         | 1.1                                      | margin of number of CUDA threads if not predictable exactly
    pg_strom.perfmon                    | off                                      | Enables the performance monitor of PG-Strom
    pg_strom.program_cache_size         | 48MB                                     | size of shared program cache
    pg_strom.pullup_outer_scan          | on                                       | Enables to pull up simple outer scan
    
  • Review the execution plan of a query and confirm GPU offload is enabled (GpuPreAgg, GpuJoin, etc)
    testdb=# EXPLAIN                 
    SELECT cat, avg(ax) FROM t0 NATURAL JOIN t1 NATURAL JOIN t2 GROUP BY cat;
                                             QUERY PLAN                                          
    ---------------------------------------------------------------------------------------------
     HashAggregate  (cost=3349749.08..3349749.40 rows=26 width=12)
       Group Key: t0.cat
       ->  Custom Scan (GpuPreAgg)  (cost=16325.86..2879097.75 rows=234 width=44)
             Reduction: Local + Global
             GPU Projection: cat, ax
             ->  Custom Scan (GpuJoin) on t0  (cost=12325.86..2851751.79 rows=99599458 width=12)
                   GPU Projection: t0.cat, t1.ax
                   Depth 1: GpuHashJoin, HashKeys: (t0.aid)
                            JoinQuals: (t0.aid = t1.aid)
                            Nrows (in/out: 99.60%), KDS-Hash (size: 13.47MB, nbatches: 1)
                   Depth 2: GpuHashJoin, HashKeys: (t0.bid)
                            JoinQuals: (t0.bid = t2.bid)
                            Nrows (in/out: 100.00%), KDS-Hash (size: 13.47MB, nbatches: 1)
                   ->  Seq Scan on t1  (cost=0.00..1935.00 rows=100000 width=12)
                   ->  Seq Scan on t2  (cost=0.00..1935.00 rows=100000 width=4)
    (15 rows)
    
  • Turn off PG-Strom and review the execution plan for the same query (standard hash joins, sequential scans, etc.).
    testdb=# set pg_strom.enabled=off; 
    SET
    testdb=# EXPLAIN                   
    SELECT cat, avg(ax) FROM t0 NATURAL JOIN t1 NATURAL JOIN t2 GROUP BY cat;
                                        QUERY PLAN                                    
    ----------------------------------------------------------------------------------
     HashAggregate  (cost=6843172.91..6843173.24 rows=26 width=12)
       Group Key: t0.cat
       ->  Hash Join  (cost=7250.00..6345175.62 rows=99599458 width=12)
             Hash Cond: (t0.aid = t1.aid)
             ->  Hash Join  (cost=3576.00..4188792.08 rows=99997856 width=8)
                   Hash Cond: (t0.bid = t2.bid)
                   ->  Seq Scan on t0  (cost=0.00..1833312.56 rows=99997856 width=12)
                   ->  Hash  (cost=1935.00..1935.00 rows=100000 width=4)
                         ->  Seq Scan on t2  (cost=0.00..1935.00 rows=100000 width=4)
             ->  Hash  (cost=1935.00..1935.00 rows=100000 width=12)
                   ->  Seq Scan on t1  (cost=0.00..1935.00 rows=100000 width=12)
    (11 rows)
    
    

Sunday, August 28, 2016

Run PostgreSQL on Docker

Host and Containers

  • Host: Ubuntu 14.04
    • Add <username> to the docker group so sudo is not required to run docker commands
      sudo usermod -a -G docker <username>
    • Create the two directories below. They will be mounted into the containers so the DB and development files persist outside the containers
      /opt/db/ws
      /opt/db/postgresql/data


  • Container: postgresql : https://hub.docker.com/_/postgres/
    • This image includes EXPOSE 5432 (the postgres port), so standard container linking will make it automatically available to the linked containers. The default postgres user and database are created in the entry point with initdb.
    • The default data directory for PostgreSQL is /var/lib/postgresql/data

Run PostgreSQL Containers for Server and Client

  1. ssh <host>
  2. docker run --name pgs-server -v /opt/db/ws:/opt/db/ws -v /opt/db/postgresql/data:/var/lib/postgresql/data -e POSTGRES_PASSWORD=postgres -d postgres 

    run
    start a new container using the requested image

    --name pgs-server
    Set container name to pgs-server.

    -v /opt/db/postgresql/data:/var/lib/postgresql/data
    -v /opt/db/ws:/opt/db/ws
    Mount host directory /opt/db/postgresql/data as /var/lib/postgresql/data in the container so the DB files persist on the hard disk.
    Mount host directory /opt/db/ws as /opt/db/ws to be used for development.

    -e POSTGRES_PASSWORD=postgres
    Set the POSTGRES_PASSWORD environment variable, which sets the master PostgreSQL password

    -d
    Run the container in the background (detached mode). It will keep running until it is stopped or removed

    postgres
    Run the postgres Docker image

  3. Optional: Start a bash shell on the pgs-server container for miscellaneous tasks
    docker exec -it pgs-server bash

  4. Start a PostgreSQL client container
    docker run --name pgs-client -v /opt/db/ws:/opt/db/ws -it --rm --link pgs-server:postgres postgres psql -h postgres -U postgres

    run
    Start a new container using the requested image

    --name pgs-client
    Set container name to pgs-client. If this option is not provided, docker will assign a random name.

    -v /opt/db/ws:/opt/db/ws
    Mount host directory /opt/db/ws as /opt/db/ws to be used for development.

    -it
    Run the container in interactive mode

    --rm
    Automatically cleanup the container after exit to avoid zombie containers
    --link pgs-server:postgres
    Connect to the pgs-server container from the pgs-client container under the alias postgres

    postgres
    Use postgres Docker image

    psql -h postgres -U postgres
    The trailing arguments tell Docker what command to execute when the container starts. In this case, start the interactive psql terminal, connecting to host postgres as user postgres

  5. Optional: Start a bash shell on the pgs-client container for miscellaneous tasks
    $docker exec -it pgs-client bash