Quick Start - End-to-End Tutorial for NNVM/TVM Pipeline

Author: Yao Wang

This example shows how to build a neural network with the NNVM python frontend and generate a runtime library for an Nvidia GPU and a Raspberry Pi with TVM. (Thanks to Tianqi’s tutorial for CUDA and Ziheng’s tutorial for the Raspberry Pi.) To run this notebook, you need to install tvm and nnvm following these instructions. Note that you need to build TVM with both CUDA and LLVM enabled.

Overview of Supported Hardware Backends of TVM

The image below shows the hardware backends currently supported by TVM:

https://github.com/dmlc/web-data/raw/master/tvm/tutorial/tvm_support_list.png

In this tutorial, we’ll choose cuda and llvm as target backends. To begin with, let’s import NNVM and TVM.

import tvm
import nnvm.compiler
import nnvm.testing

Define Neural Network in NNVM

First, let’s define a neural network with the nnvm python frontend. For simplicity, we’ll use the pre-defined ResNet-18 network in NNVM. Parameters are initialized with the Xavier initializer. NNVM also supports other model formats such as MXNet, CoreML and ONNX (a small MXNet conversion sketch appears after the network dump below).

In this tutorial, we assume we will do inference on our device, and the batch size is set to 1. Input images are RGB color images of size 224 × 224. We can call nnvm.symbol.debug_str to show the network structure.

batch_size = 1
num_class = 1000
image_shape = (3, 224, 224)
data_shape = (batch_size,) + image_shape
out_shape = (batch_size, num_class)

net, params = nnvm.testing.resnet.get_workload(batch_size=batch_size, image_shape=image_shape)
print(net.debug_str())

Out:

Symbol Outputs:
        output[0]=softmax(0)
Variable:data
Variable:bn_data_gamma
Variable:bn_data_beta
Variable:bn_data_moving_mean
Variable:bn_data_moving_var
--------------------
Op:batch_norm, Name=bn_data
Inputs:
        arg[0]=data(0) version=0
        arg[1]=bn_data_gamma(0) version=0
        arg[2]=bn_data_beta(0) version=0
        arg[3]=bn_data_moving_mean(0) version=0
        arg[4]=bn_data_moving_var(0) version=0
Attrs:
        epsilon=2e-05
Variable:conv0_weight
--------------------
Op:conv2d, Name=conv0
Inputs:
        arg[0]=bn_data(0)
        arg[1]=conv0_weight(0) version=0
Attrs:
        channels=64
        kernel_size=(7, 7)
        padding=(3, 3)
        strides=(2, 2)
        use_bias=False
Variable:bn0_gamma
Variable:bn0_beta
Variable:bn0_moving_mean
Variable:bn0_moving_var
--------------------
Op:batch_norm, Name=bn0
Inputs:
        arg[0]=conv0(0)
        arg[1]=bn0_gamma(0) version=0
        arg[2]=bn0_beta(0) version=0
        arg[3]=bn0_moving_mean(0) version=0
        arg[4]=bn0_moving_var(0) version=0
Attrs:
        epsilon=2e-05
--------------------
Op:relu, Name=relu0
Inputs:
        arg[0]=bn0(0)
--------------------
Op:max_pool2d, Name=max_pool2d0
Inputs:
        arg[0]=relu0(0)
Attrs:
        padding=(1, 1)
        pool_size=(3, 3)
        strides=(2, 2)
Variable:stage1_unit1_bn1_gamma
Variable:stage1_unit1_bn1_beta
Variable:stage1_unit1_bn1_moving_mean
Variable:stage1_unit1_bn1_moving_var
--------------------
Op:batch_norm, Name=stage1_unit1_bn1
Inputs:
        arg[0]=max_pool2d0(0)
        arg[1]=stage1_unit1_bn1_gamma(0) version=0
        arg[2]=stage1_unit1_bn1_beta(0) version=0
        arg[3]=stage1_unit1_bn1_moving_mean(0) version=0
        arg[4]=stage1_unit1_bn1_moving_var(0) version=0
Attrs:
        epsilon=2e-05
--------------------
Op:relu, Name=stage1_unit1_relu1
Inputs:
        arg[0]=stage1_unit1_bn1(0)
Variable:stage1_unit1_conv1_weight
--------------------
Op:conv2d, Name=stage1_unit1_conv1
Inputs:
        arg[0]=stage1_unit1_relu1(0)
        arg[1]=stage1_unit1_conv1_weight(0) version=0
Attrs:
        channels=64
        kernel_size=(3, 3)
        padding=(1, 1)
        strides=(1, 1)
        use_bias=False
Variable:stage1_unit1_bn2_gamma
Variable:stage1_unit1_bn2_beta
Variable:stage1_unit1_bn2_moving_mean
Variable:stage1_unit1_bn2_moving_var
--------------------
Op:batch_norm, Name=stage1_unit1_bn2
Inputs:
        arg[0]=stage1_unit1_conv1(0)
        arg[1]=stage1_unit1_bn2_gamma(0) version=0
        arg[2]=stage1_unit1_bn2_beta(0) version=0
        arg[3]=stage1_unit1_bn2_moving_mean(0) version=0
        arg[4]=stage1_unit1_bn2_moving_var(0) version=0
Attrs:
        epsilon=2e-05
--------------------
Op:relu, Name=stage1_unit1_relu2
Inputs:
        arg[0]=stage1_unit1_bn2(0)
Variable:stage1_unit1_conv2_weight
--------------------
Op:conv2d, Name=stage1_unit1_conv2
Inputs:
        arg[0]=stage1_unit1_relu2(0)
        arg[1]=stage1_unit1_conv2_weight(0) version=0
Attrs:
        channels=64
        kernel_size=(3, 3)
        padding=(1, 1)
        strides=(1, 1)
        use_bias=False
Variable:stage1_unit1_sc_weight
--------------------
Op:conv2d, Name=stage1_unit1_sc
Inputs:
        arg[0]=stage1_unit1_relu1(0)
        arg[1]=stage1_unit1_sc_weight(0) version=0
Attrs:
        channels=64
        kernel_size=(1, 1)
        strides=(1, 1)
        use_bias=False
--------------------
Op:elemwise_add, Name=elemwise_add0
Inputs:
        arg[0]=stage1_unit1_conv2(0)
        arg[1]=stage1_unit1_sc(0)
Variable:stage1_unit2_bn1_gamma
Variable:stage1_unit2_bn1_beta
Variable:stage1_unit2_bn1_moving_mean
Variable:stage1_unit2_bn1_moving_var
--------------------
Op:batch_norm, Name=stage1_unit2_bn1
Inputs:
        arg[0]=elemwise_add0(0)
        arg[1]=stage1_unit2_bn1_gamma(0) version=0
        arg[2]=stage1_unit2_bn1_beta(0) version=0
        arg[3]=stage1_unit2_bn1_moving_mean(0) version=0
        arg[4]=stage1_unit2_bn1_moving_var(0) version=0
Attrs:
        epsilon=2e-05
--------------------
Op:relu, Name=stage1_unit2_relu1
Inputs:
        arg[0]=stage1_unit2_bn1(0)
Variable:stage1_unit2_conv1_weight
--------------------
Op:conv2d, Name=stage1_unit2_conv1
Inputs:
        arg[0]=stage1_unit2_relu1(0)
        arg[1]=stage1_unit2_conv1_weight(0) version=0
Attrs:
        channels=64
        kernel_size=(3, 3)
        padding=(1, 1)
        strides=(1, 1)
        use_bias=False
Variable:stage1_unit2_bn2_gamma
Variable:stage1_unit2_bn2_beta
Variable:stage1_unit2_bn2_moving_mean
Variable:stage1_unit2_bn2_moving_var
--------------------
Op:batch_norm, Name=stage1_unit2_bn2
Inputs:
        arg[0]=stage1_unit2_conv1(0)
        arg[1]=stage1_unit2_bn2_gamma(0) version=0
        arg[2]=stage1_unit2_bn2_beta(0) version=0
        arg[3]=stage1_unit2_bn2_moving_mean(0) version=0
        arg[4]=stage1_unit2_bn2_moving_var(0) version=0
Attrs:
        epsilon=2e-05
--------------------
Op:relu, Name=stage1_unit2_relu2
Inputs:
        arg[0]=stage1_unit2_bn2(0)
Variable:stage1_unit2_conv2_weight
--------------------
Op:conv2d, Name=stage1_unit2_conv2
Inputs:
        arg[0]=stage1_unit2_relu2(0)
        arg[1]=stage1_unit2_conv2_weight(0) version=0
Attrs:
        channels=64
        kernel_size=(3, 3)
        padding=(1, 1)
        strides=(1, 1)
        use_bias=False
--------------------
Op:elemwise_add, Name=elemwise_add1
Inputs:
        arg[0]=stage1_unit2_conv2(0)
        arg[1]=elemwise_add0(0)
Variable:stage2_unit1_bn1_gamma
Variable:stage2_unit1_bn1_beta
Variable:stage2_unit1_bn1_moving_mean
Variable:stage2_unit1_bn1_moving_var
--------------------
Op:batch_norm, Name=stage2_unit1_bn1
Inputs:
        arg[0]=elemwise_add1(0)
        arg[1]=stage2_unit1_bn1_gamma(0) version=0
        arg[2]=stage2_unit1_bn1_beta(0) version=0
        arg[3]=stage2_unit1_bn1_moving_mean(0) version=0
        arg[4]=stage2_unit1_bn1_moving_var(0) version=0
Attrs:
        epsilon=2e-05
--------------------
Op:relu, Name=stage2_unit1_relu1
Inputs:
        arg[0]=stage2_unit1_bn1(0)
Variable:stage2_unit1_conv1_weight
--------------------
Op:conv2d, Name=stage2_unit1_conv1
Inputs:
        arg[0]=stage2_unit1_relu1(0)
        arg[1]=stage2_unit1_conv1_weight(0) version=0
Attrs:
        channels=128
        kernel_size=(3, 3)
        padding=(1, 1)
        strides=(2, 2)
        use_bias=False
Variable:stage2_unit1_bn2_gamma
Variable:stage2_unit1_bn2_beta
Variable:stage2_unit1_bn2_moving_mean
Variable:stage2_unit1_bn2_moving_var
--------------------
Op:batch_norm, Name=stage2_unit1_bn2
Inputs:
        arg[0]=stage2_unit1_conv1(0)
        arg[1]=stage2_unit1_bn2_gamma(0) version=0
        arg[2]=stage2_unit1_bn2_beta(0) version=0
        arg[3]=stage2_unit1_bn2_moving_mean(0) version=0
        arg[4]=stage2_unit1_bn2_moving_var(0) version=0
Attrs:
        epsilon=2e-05
--------------------
Op:relu, Name=stage2_unit1_relu2
Inputs:
        arg[0]=stage2_unit1_bn2(0)
Variable:stage2_unit1_conv2_weight
--------------------
Op:conv2d, Name=stage2_unit1_conv2
Inputs:
        arg[0]=stage2_unit1_relu2(0)
        arg[1]=stage2_unit1_conv2_weight(0) version=0
Attrs:
        channels=128
        kernel_size=(3, 3)
        padding=(1, 1)
        strides=(1, 1)
        use_bias=False
Variable:stage2_unit1_sc_weight
--------------------
Op:conv2d, Name=stage2_unit1_sc
Inputs:
        arg[0]=stage2_unit1_relu1(0)
        arg[1]=stage2_unit1_sc_weight(0) version=0
Attrs:
        channels=128
        kernel_size=(1, 1)
        strides=(2, 2)
        use_bias=False
--------------------
Op:elemwise_add, Name=elemwise_add2
Inputs:
        arg[0]=stage2_unit1_conv2(0)
        arg[1]=stage2_unit1_sc(0)
Variable:stage2_unit2_bn1_gamma
Variable:stage2_unit2_bn1_beta
Variable:stage2_unit2_bn1_moving_mean
Variable:stage2_unit2_bn1_moving_var
--------------------
Op:batch_norm, Name=stage2_unit2_bn1
Inputs:
        arg[0]=elemwise_add2(0)
        arg[1]=stage2_unit2_bn1_gamma(0) version=0
        arg[2]=stage2_unit2_bn1_beta(0) version=0
        arg[3]=stage2_unit2_bn1_moving_mean(0) version=0
        arg[4]=stage2_unit2_bn1_moving_var(0) version=0
Attrs:
        epsilon=2e-05
--------------------
Op:relu, Name=stage2_unit2_relu1
Inputs:
        arg[0]=stage2_unit2_bn1(0)
Variable:stage2_unit2_conv1_weight
--------------------
Op:conv2d, Name=stage2_unit2_conv1
Inputs:
        arg[0]=stage2_unit2_relu1(0)
        arg[1]=stage2_unit2_conv1_weight(0) version=0
Attrs:
        channels=128
        kernel_size=(3, 3)
        padding=(1, 1)
        strides=(1, 1)
        use_bias=False
Variable:stage2_unit2_bn2_gamma
Variable:stage2_unit2_bn2_beta
Variable:stage2_unit2_bn2_moving_mean
Variable:stage2_unit2_bn2_moving_var
--------------------
Op:batch_norm, Name=stage2_unit2_bn2
Inputs:
        arg[0]=stage2_unit2_conv1(0)
        arg[1]=stage2_unit2_bn2_gamma(0) version=0
        arg[2]=stage2_unit2_bn2_beta(0) version=0
        arg[3]=stage2_unit2_bn2_moving_mean(0) version=0
        arg[4]=stage2_unit2_bn2_moving_var(0) version=0
Attrs:
        epsilon=2e-05
--------------------
Op:relu, Name=stage2_unit2_relu2
Inputs:
        arg[0]=stage2_unit2_bn2(0)
Variable:stage2_unit2_conv2_weight
--------------------
Op:conv2d, Name=stage2_unit2_conv2
Inputs:
        arg[0]=stage2_unit2_relu2(0)
        arg[1]=stage2_unit2_conv2_weight(0) version=0
Attrs:
        channels=128
        kernel_size=(3, 3)
        padding=(1, 1)
        strides=(1, 1)
        use_bias=False
--------------------
Op:elemwise_add, Name=elemwise_add3
Inputs:
        arg[0]=stage2_unit2_conv2(0)
        arg[1]=elemwise_add2(0)
Variable:stage3_unit1_bn1_gamma
Variable:stage3_unit1_bn1_beta
Variable:stage3_unit1_bn1_moving_mean
Variable:stage3_unit1_bn1_moving_var
--------------------
Op:batch_norm, Name=stage3_unit1_bn1
Inputs:
        arg[0]=elemwise_add3(0)
        arg[1]=stage3_unit1_bn1_gamma(0) version=0
        arg[2]=stage3_unit1_bn1_beta(0) version=0
        arg[3]=stage3_unit1_bn1_moving_mean(0) version=0
        arg[4]=stage3_unit1_bn1_moving_var(0) version=0
Attrs:
        epsilon=2e-05
--------------------
Op:relu, Name=stage3_unit1_relu1
Inputs:
        arg[0]=stage3_unit1_bn1(0)
Variable:stage3_unit1_conv1_weight
--------------------
Op:conv2d, Name=stage3_unit1_conv1
Inputs:
        arg[0]=stage3_unit1_relu1(0)
        arg[1]=stage3_unit1_conv1_weight(0) version=0
Attrs:
        channels=256
        kernel_size=(3, 3)
        padding=(1, 1)
        strides=(2, 2)
        use_bias=False
Variable:stage3_unit1_bn2_gamma
Variable:stage3_unit1_bn2_beta
Variable:stage3_unit1_bn2_moving_mean
Variable:stage3_unit1_bn2_moving_var
--------------------
Op:batch_norm, Name=stage3_unit1_bn2
Inputs:
        arg[0]=stage3_unit1_conv1(0)
        arg[1]=stage3_unit1_bn2_gamma(0) version=0
        arg[2]=stage3_unit1_bn2_beta(0) version=0
        arg[3]=stage3_unit1_bn2_moving_mean(0) version=0
        arg[4]=stage3_unit1_bn2_moving_var(0) version=0
Attrs:
        epsilon=2e-05
--------------------
Op:relu, Name=stage3_unit1_relu2
Inputs:
        arg[0]=stage3_unit1_bn2(0)
Variable:stage3_unit1_conv2_weight
--------------------
Op:conv2d, Name=stage3_unit1_conv2
Inputs:
        arg[0]=stage3_unit1_relu2(0)
        arg[1]=stage3_unit1_conv2_weight(0) version=0
Attrs:
        channels=256
        kernel_size=(3, 3)
        padding=(1, 1)
        strides=(1, 1)
        use_bias=False
Variable:stage3_unit1_sc_weight
--------------------
Op:conv2d, Name=stage3_unit1_sc
Inputs:
        arg[0]=stage3_unit1_relu1(0)
        arg[1]=stage3_unit1_sc_weight(0) version=0
Attrs:
        channels=256
        kernel_size=(1, 1)
        strides=(2, 2)
        use_bias=False
--------------------
Op:elemwise_add, Name=elemwise_add4
Inputs:
        arg[0]=stage3_unit1_conv2(0)
        arg[1]=stage3_unit1_sc(0)
Variable:stage3_unit2_bn1_gamma
Variable:stage3_unit2_bn1_beta
Variable:stage3_unit2_bn1_moving_mean
Variable:stage3_unit2_bn1_moving_var
--------------------
Op:batch_norm, Name=stage3_unit2_bn1
Inputs:
        arg[0]=elemwise_add4(0)
        arg[1]=stage3_unit2_bn1_gamma(0) version=0
        arg[2]=stage3_unit2_bn1_beta(0) version=0
        arg[3]=stage3_unit2_bn1_moving_mean(0) version=0
        arg[4]=stage3_unit2_bn1_moving_var(0) version=0
Attrs:
        epsilon=2e-05
--------------------
Op:relu, Name=stage3_unit2_relu1
Inputs:
        arg[0]=stage3_unit2_bn1(0)
Variable:stage3_unit2_conv1_weight
--------------------
Op:conv2d, Name=stage3_unit2_conv1
Inputs:
        arg[0]=stage3_unit2_relu1(0)
        arg[1]=stage3_unit2_conv1_weight(0) version=0
Attrs:
        channels=256
        kernel_size=(3, 3)
        padding=(1, 1)
        strides=(1, 1)
        use_bias=False
Variable:stage3_unit2_bn2_gamma
Variable:stage3_unit2_bn2_beta
Variable:stage3_unit2_bn2_moving_mean
Variable:stage3_unit2_bn2_moving_var
--------------------
Op:batch_norm, Name=stage3_unit2_bn2
Inputs:
        arg[0]=stage3_unit2_conv1(0)
        arg[1]=stage3_unit2_bn2_gamma(0) version=0
        arg[2]=stage3_unit2_bn2_beta(0) version=0
        arg[3]=stage3_unit2_bn2_moving_mean(0) version=0
        arg[4]=stage3_unit2_bn2_moving_var(0) version=0
Attrs:
        epsilon=2e-05
--------------------
Op:relu, Name=stage3_unit2_relu2
Inputs:
        arg[0]=stage3_unit2_bn2(0)
Variable:stage3_unit2_conv2_weight
--------------------
Op:conv2d, Name=stage3_unit2_conv2
Inputs:
        arg[0]=stage3_unit2_relu2(0)
        arg[1]=stage3_unit2_conv2_weight(0) version=0
Attrs:
        channels=256
        kernel_size=(3, 3)
        padding=(1, 1)
        strides=(1, 1)
        use_bias=False
--------------------
Op:elemwise_add, Name=elemwise_add5
Inputs:
        arg[0]=stage3_unit2_conv2(0)
        arg[1]=elemwise_add4(0)
Variable:stage4_unit1_bn1_gamma
Variable:stage4_unit1_bn1_beta
Variable:stage4_unit1_bn1_moving_mean
Variable:stage4_unit1_bn1_moving_var
--------------------
Op:batch_norm, Name=stage4_unit1_bn1
Inputs:
        arg[0]=elemwise_add5(0)
        arg[1]=stage4_unit1_bn1_gamma(0) version=0
        arg[2]=stage4_unit1_bn1_beta(0) version=0
        arg[3]=stage4_unit1_bn1_moving_mean(0) version=0
        arg[4]=stage4_unit1_bn1_moving_var(0) version=0
Attrs:
        epsilon=2e-05
--------------------
Op:relu, Name=stage4_unit1_relu1
Inputs:
        arg[0]=stage4_unit1_bn1(0)
Variable:stage4_unit1_conv1_weight
--------------------
Op:conv2d, Name=stage4_unit1_conv1
Inputs:
        arg[0]=stage4_unit1_relu1(0)
        arg[1]=stage4_unit1_conv1_weight(0) version=0
Attrs:
        channels=512
        kernel_size=(3, 3)
        padding=(1, 1)
        strides=(2, 2)
        use_bias=False
Variable:stage4_unit1_bn2_gamma
Variable:stage4_unit1_bn2_beta
Variable:stage4_unit1_bn2_moving_mean
Variable:stage4_unit1_bn2_moving_var
--------------------
Op:batch_norm, Name=stage4_unit1_bn2
Inputs:
        arg[0]=stage4_unit1_conv1(0)
        arg[1]=stage4_unit1_bn2_gamma(0) version=0
        arg[2]=stage4_unit1_bn2_beta(0) version=0
        arg[3]=stage4_unit1_bn2_moving_mean(0) version=0
        arg[4]=stage4_unit1_bn2_moving_var(0) version=0
Attrs:
        epsilon=2e-05
--------------------
Op:relu, Name=stage4_unit1_relu2
Inputs:
        arg[0]=stage4_unit1_bn2(0)
Variable:stage4_unit1_conv2_weight
--------------------
Op:conv2d, Name=stage4_unit1_conv2
Inputs:
        arg[0]=stage4_unit1_relu2(0)
        arg[1]=stage4_unit1_conv2_weight(0) version=0
Attrs:
        channels=512
        kernel_size=(3, 3)
        padding=(1, 1)
        strides=(1, 1)
        use_bias=False
Variable:stage4_unit1_sc_weight
--------------------
Op:conv2d, Name=stage4_unit1_sc
Inputs:
        arg[0]=stage4_unit1_relu1(0)
        arg[1]=stage4_unit1_sc_weight(0) version=0
Attrs:
        channels=512
        kernel_size=(1, 1)
        strides=(2, 2)
        use_bias=False
--------------------
Op:elemwise_add, Name=elemwise_add6
Inputs:
        arg[0]=stage4_unit1_conv2(0)
        arg[1]=stage4_unit1_sc(0)
Variable:stage4_unit2_bn1_gamma
Variable:stage4_unit2_bn1_beta
Variable:stage4_unit2_bn1_moving_mean
Variable:stage4_unit2_bn1_moving_var
--------------------
Op:batch_norm, Name=stage4_unit2_bn1
Inputs:
        arg[0]=elemwise_add6(0)
        arg[1]=stage4_unit2_bn1_gamma(0) version=0
        arg[2]=stage4_unit2_bn1_beta(0) version=0
        arg[3]=stage4_unit2_bn1_moving_mean(0) version=0
        arg[4]=stage4_unit2_bn1_moving_var(0) version=0
Attrs:
        epsilon=2e-05
--------------------
Op:relu, Name=stage4_unit2_relu1
Inputs:
        arg[0]=stage4_unit2_bn1(0)
Variable:stage4_unit2_conv1_weight
--------------------
Op:conv2d, Name=stage4_unit2_conv1
Inputs:
        arg[0]=stage4_unit2_relu1(0)
        arg[1]=stage4_unit2_conv1_weight(0) version=0
Attrs:
        channels=512
        kernel_size=(3, 3)
        padding=(1, 1)
        strides=(1, 1)
        use_bias=False
Variable:stage4_unit2_bn2_gamma
Variable:stage4_unit2_bn2_beta
Variable:stage4_unit2_bn2_moving_mean
Variable:stage4_unit2_bn2_moving_var
--------------------
Op:batch_norm, Name=stage4_unit2_bn2
Inputs:
        arg[0]=stage4_unit2_conv1(0)
        arg[1]=stage4_unit2_bn2_gamma(0) version=0
        arg[2]=stage4_unit2_bn2_beta(0) version=0
        arg[3]=stage4_unit2_bn2_moving_mean(0) version=0
        arg[4]=stage4_unit2_bn2_moving_var(0) version=0
Attrs:
        epsilon=2e-05
--------------------
Op:relu, Name=stage4_unit2_relu2
Inputs:
        arg[0]=stage4_unit2_bn2(0)
Variable:stage4_unit2_conv2_weight
--------------------
Op:conv2d, Name=stage4_unit2_conv2
Inputs:
        arg[0]=stage4_unit2_relu2(0)
        arg[1]=stage4_unit2_conv2_weight(0) version=0
Attrs:
        channels=512
        kernel_size=(3, 3)
        padding=(1, 1)
        strides=(1, 1)
        use_bias=False
--------------------
Op:elemwise_add, Name=elemwise_add7
Inputs:
        arg[0]=stage4_unit2_conv2(0)
        arg[1]=elemwise_add6(0)
Variable:bn1_gamma
Variable:bn1_beta
Variable:bn1_moving_mean
Variable:bn1_moving_var
--------------------
Op:batch_norm, Name=bn1
Inputs:
        arg[0]=elemwise_add7(0)
        arg[1]=bn1_gamma(0) version=0
        arg[2]=bn1_beta(0) version=0
        arg[3]=bn1_moving_mean(0) version=0
        arg[4]=bn1_moving_var(0) version=0
Attrs:
        epsilon=2e-05
--------------------
Op:relu, Name=relu1
Inputs:
        arg[0]=bn1(0)
--------------------
Op:global_avg_pool2d, Name=pool1
Inputs:
        arg[0]=relu1(0)
--------------------
Op:flatten, Name=flatten0
Inputs:
        arg[0]=pool1(0)
Variable:fc1_weight
Variable:fc1_bias
--------------------
Op:dense, Name=fc1
Inputs:
        arg[0]=flatten0(0)
        arg[1]=fc1_weight(0) version=0
        arg[2]=fc1_bias(0) version=0
Attrs:
        units=1000
--------------------
Op:softmax, Name=softmax
Inputs:
        arg[0]=fc1(0)
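
As noted above, NNVM can also import models from other frameworks. Below is a minimal sketch of the MXNet frontend, assuming a pretrained checkpoint saved under the (hypothetical) prefix "model"; the converted symbol and parameter dict could then be fed to nnvm.compiler.build just like the ResNet workload.

import mxnet as mx
import nnvm.frontend

# load an MXNet checkpoint: the symbol plus arg/aux parameter dicts
# (the "model" prefix and epoch 0 here are placeholders)
mx_sym, args, auxs = mx.model.load_checkpoint("model", 0)
# convert to an NNVM symbol and a single parameter dict
mx_net, mx_params = nnvm.frontend.from_mxnet(mx_sym, args, auxs)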

Compilation

The next step is to compile the model using the NNVM/TVM pipeline. Users can specify the optimization level of the compilation. Currently this value can be 0 to 2, which correspond to “SimplifyInference”, “OpFusion” and “PrecomputePrune” respectively. In this example we set the optimization level to 0 and first compile for an Nvidia GPU.

nnvm.compiler.build returns three components: the execution graph in json format, the TVM module library of compiled functions specifically for this graph on the target hardware, and the parameter blobs of the model. During the compilation, NNVM does the graph-level optimization while TVM does the tensor-level optimization, resulting in an optimized runtime module for model serving.

We’ll first compile for the Nvidia GPU. To generate the module library, TVM first lowers the graph IR into an intrinsic IR for the specified target backend, which is CUDA in this example. The target backend then generates the module library.

opt_level = 0
target = tvm.target.cuda()
with nnvm.compiler.build_config(opt_level=opt_level):
    graph, lib, params = nnvm.compiler.build(
        net, target, shape={"data": data_shape}, params=params)
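
To see what the CUDA backend generated, one can inspect the device modules imported by the host module. This is only a sketch for the curious, not a required step.

# the host module imports the device (CUDA) module;
# get_source() returns the generated kernel code as text
dev_module = lib.imported_modules[0]
print(dev_module.get_source()[:500])  # print only the first few hundred characters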

Save Compiled Module

After compilation, we can save the graph, lib and params into separate files and deploy them to an Nvidia GPU.

from tvm.contrib import util

temp = util.tempdir()
path_lib = temp.relpath("deploy_lib.so")
lib.export_library(path_lib)
with open(temp.relpath("deploy_graph.json"), "w") as fo:
    fo.write(graph.json())
with open(temp.relpath("deploy_param.params"), "wb") as fo:
    fo.write(nnvm.compiler.save_param_dict(params))
print(temp.listdir())

Out:

['deploy_lib.so', 'deploy_param.params', 'deploy_graph.json']
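
The parameter blob written above can be loaded back into a dict of arrays with nnvm.compiler.load_param_dict; a quick round-trip check might look like this sketch:

# deserialize the saved parameter blob and list a few parameter names
loaded = nnvm.compiler.load_param_dict(
    bytearray(open(temp.relpath("deploy_param.params"), "rb").read()))
print(sorted(loaded.keys())[:3])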

Deploy locally to Nvidia GPU

Now we can load the module back.

import numpy as np
from tvm.contrib import graph_runtime

loaded_lib = tvm.module.load(path_lib)
loaded_json = open(temp.relpath("deploy_graph.json")).read()
loaded_params = bytearray(open(temp.relpath("deploy_param.params"), "rb").read())
module = graph_runtime.create(loaded_json, loaded_lib, tvm.gpu(0))
module.load_params(loaded_params)

input_data = tvm.nd.array(np.random.uniform(size=data_shape).astype("float32"))
module.run(data=input_data)
out = module.get_output(0, out=tvm.nd.empty(out_shape))
# Print first 10 elements of output
print(out.asnumpy()[0][0:10])

Out:

[ 0.00096432  0.00094971  0.00096572  0.00097922  0.00098626  0.00095537
  0.00100671  0.00104772  0.00096336  0.00103682]
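
Because the parameters were initialized randomly, the softmax output is close to uniform. With trained weights you would typically take the argmax over the class axis to obtain the prediction; a minimal sketch:

# index of the highest-probability class (meaningful only with trained weights)
top1 = np.argmax(out.asnumpy()[0])
print("Top-1 class id:", top1)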

Compile and Deploy the Model to Raspberry Pi Remotely with RPC

Following the steps above, we can also compile the model for the Raspberry Pi. TVM provides an RPC module to help with remote deployment.

For demonstration, we simply start an RPC server on the same machine when use_rasp is False. If you have set up the remote environment, change the three lines below: set use_rasp to True, and replace host and port with your device’s host address and port number.

# If we run the example locally for demonstration, we can simply set the
# compilation target as `llvm`.
# To run it on the Raspberry Pi, you need to specify its instruction set.
# `llvm -target=armv7l-none-linux-gnueabihf -mcpu=cortex-a53 -mattr=+neon`
# is the recommended compilation configuration, thanks to Ziheng's work.

from tvm.contrib import rpc

use_rasp = False
host = 'rasp0'
port = 9090

if not use_rasp:
    # run server locally
    host = 'localhost'
    port = 9090
    server = rpc.Server(host=host, port=port, use_popen=True)

# compile and save model library
if use_rasp:
    target = "llvm -target=armv7l-none-linux-gnueabihf -mcpu=cortex-a53 -mattr=+neon"
else:
    target = "llvm"
# use `with tvm.target.rasp()` for target-specific optimization
with tvm.target.rasp():
    graph, lib, params = nnvm.compiler.build(
        net, target, shape={"data": data_shape}, params=params)

temp = util.tempdir()
path_lib = temp.relpath("deploy_lib_rasp.o")
lib.save(path_lib)

# connect the server
remote = rpc.connect(host, port)

# upload the library to remote device and load it
remote.upload(path_lib)
rlib = remote.load_module('deploy_lib_rasp.o')

ctx = remote.cpu(0)
# upload the parameters
rparams = {k: tvm.nd.array(v, ctx) for k, v in params.items()}

# create the remote runtime module
module = graph_runtime.create(graph, rlib, ctx)
# set parameter
module.set_input(**rparams)
# set input data
input_data = np.random.uniform(size=data_shape)
module.set_input('data', tvm.nd.array(input_data.astype('float32')))
# run
module.run()

out = module.get_output(0, out=tvm.nd.empty(out_shape, ctx=ctx))
# Print first 10 elements of output
print(out.asnumpy()[0][0:10])

if not use_rasp:
    # terminate the local server
    server.terminate()

Out:

[ 0.00096451  0.00095024  0.00096367  0.00098417  0.00098744  0.00095434
  0.00100718  0.00104579  0.0009618   0.00103585]
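
To get a rough idea of the inference latency on the remote device, the runtime module’s time_evaluator can average several runs of the compiled "run" function. A hedged sketch, assuming the remote session above is still open:

# benchmark the graph runtime's "run" function over 10 executions
ftimer = module.module.time_evaluator("run", ctx, number=10)
prof_res = ftimer()
print("mean inference time: %.2f ms" % (prof_res.mean * 1000))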

Total running time of the script: ( 0 minutes 13.208 seconds)
