Deploy the Pretrained Model on Raspberry Pi

Author: Ziheng Jiang

This is an example of using NNVM to compile a ResNet model and deploy it on a Raspberry Pi.

To begin with, we import NNVM (for compilation) and TVM (for deployment).

import tvm
import nnvm.compiler
import nnvm.testing
from tvm.contrib import util, rpc
from tvm.contrib import graph_runtime as runtime

Build TVM Runtime on Device

There are some prerequisites: we need to build the TVM runtime and set up an RPC server on the remote device.

To get started, clone the TVM repo from GitHub. It is important to clone the submodules along with it, using the --recursive option (assuming you are in your home directory):

git clone --recursive https://github.com/dmlc/tvm


Usually the device has limited resources, so we only build the runtime. The idea is to use the TVM compiler on the local machine to compile the model, upload the compiled program to the device, and run the device function remotely.

cd tvm
make runtime

After the runtime builds successfully, we need to set environment variables, either in the ~/.bashrc file of your own account or in /etc/profile for system-wide environment variables. Edit ~/.bashrc using vi ~/.bashrc and add the lines below (assuming your TVM directory is in ~/tvm):

export TVM_HOME=~/tvm
export PATH=$PATH:$TVM_HOME/lib

To apply the updated ~/.bashrc, execute source ~/.bashrc.
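To quickly confirm that the runtime-only build is visible to Python on the device, a small standard-library check can help. This is a sketch; runtime_available is a hypothetical helper, not part of TVM:

```python
import importlib.util

def runtime_available(module_name="tvm"):
    # True if the module can be found on this device's Python path
    return importlib.util.find_spec(module_name) is not None

print("tvm runtime importable:", runtime_available())
```

If this prints False, double-check the environment variables set above.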

Set Up RPC Server on Device

To set up a TVM RPC server on the Raspberry Pi (our remote device), we have prepared a one-line script so you only need to run this command after following the installation guide to install TVM on your device:

python -m tvm.exec.rpc_server --host 0.0.0.0 --port=9090

After executing the command above, if you see the lines below, the RPC server has started successfully on your device.

Loading runtime library /home/YOURNAME/code/tvm/lib/libtvm_runtime.so... exec only
INFO:root:RPCServer: bind to 0.0.0.0:9090
For demonstration, this tutorial starts an RPC server locally by default. If you have set up the remote environment, please change the three lines below: change use_rasp to True, and change host and port to your device's host address and port number.

use_rasp = False
host = 'rasp0'
port = 9090

if not use_rasp:
    # run server locally
    host = 'localhost'
    port = 9091
    server = rpc.Server(host=host, port=port, use_popen=True)
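Before going further, it can be useful to check that the RPC port is actually reachable from the host. The sketch below is a generic TCP check using only the standard library; port_open is a hypothetical helper, not a TVM API:

```python
import socket

def port_open(host, port, timeout=2.0):
    # try to open a plain TCP connection to host:port
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# e.g. port_open('rasp0', 9090) should return True once the RPC server is running
```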

Prepare the Pretrained Model

Back on the host machine, we first need to download an MXNet Gluon ResNet model from the model zoo, which is pretrained on ImageNet. You can find more details about this part in the Compile MXNet Models tutorial.

from mxnet.gluon.model_zoo.vision import get_model
from mxnet.gluon.utils import download
from PIL import Image
import numpy as np

# only one line to get the model
block = get_model('resnet18_v1', pretrained=True)

In order to test our model, we download an image of a cat and transform its format.

img_name = 'cat.jpg'
download('', img_name)
image = Image.open(img_name).resize((224, 224))

def transform_image(image):
    # subtract the per-channel ImageNet mean and divide by the std
    image = np.array(image) - np.array([123., 117., 104.])
    image /= np.array([58.395, 57.12, 57.375])
    # HWC -> CHW layout, then add a batch dimension
    image = image.transpose((2, 0, 1))
    image = image[np.newaxis, :]
    return image

x = transform_image(image)
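To sanity-check the preprocessing, we can run the same transform on a synthetic image and verify the output layout. This standalone sketch restates the transform so it can run on its own:

```python
import numpy as np

def transform_image(image):
    # subtract the per-channel ImageNet mean, divide by the std
    image = np.array(image, dtype='float64') - np.array([123., 117., 104.])
    image /= np.array([58.395, 57.12, 57.375])
    # HWC -> CHW, then add a batch dimension -> NCHW
    image = image.transpose((2, 0, 1))
    return image[np.newaxis, :]

fake_image = np.random.randint(0, 256, size=(224, 224, 3), dtype='uint8')
x = transform_image(fake_image)
print(x.shape)  # (1, 3, 224, 224)
```

The (1, 3, 224, 224) shape matches the data_shape the compiler is given below.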

The synset is used to transform the label from an ImageNet class number to a word humans can understand.

synset_url = ''.join(['https://gist.githubusercontent.com/zhreshold/',
                      '4d0b62f3d01426887599d4f7ede23ee5/raw/',
                      '596b27d23537e5a1b5751d2b0481ef172f58b539/',
                      'imagenet1000_clsid_to_human.txt'])
synset_name = 'synset.txt'
download(synset_url, synset_name)
with open(synset_name) as f:
    synset = eval(f.read())
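eval works here because the downloaded file is a Python dict literal, but ast.literal_eval is a safer way to parse such text. A small sketch with inline stand-in data (the two entries are illustrative, not the full synset):

```python
import ast

# tiny stand-in for the synset file contents: a dict literal mapping
# ImageNet class ids to human-readable names
synset_text = "{281: 'tabby, tabby cat', 285: 'Egyptian cat'}"

# literal_eval parses only Python literals; it never executes code
synset = ast.literal_eval(synset_text)
print(synset[281])  # tabby, tabby cat
```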

Now we would like to port the Gluon model to a portable computational graph. It takes only a few lines.

# We support MXNet static graph(symbol) and HybridBlock in mxnet.gluon
net, params = nnvm.frontend.from_mxnet(block)
# we want a probability so add a softmax operator
net = nnvm.sym.softmax(net)

Here are some basic data workload configurations.

batch_size = 1
num_classes = 1000
image_shape = (3, 224, 224)
data_shape = (batch_size,) + image_shape
out_shape = (batch_size, num_classes)

Compile The Graph

To compile the graph, we call the nnvm.compiler.build function with the graph configuration and parameters. However, you cannot deploy an x86 program on a device with an ARM instruction set. This means NNVM also needs to know the compilation options of the target device, apart from the net and params arguments that specify the deep learning workload. The options matter: different options can lead to very different performance.

If we run the example locally for demonstration, we can simply set the target to llvm. To run it on the Raspberry Pi, you need to specify its instruction set. Below is the option I use for my Raspberry Pi, which has proved to be a good compilation configuration.

if use_rasp:
    target = tvm.target.rasp()
else:
    target = tvm.target.create('llvm')
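For reference, on older TVM versions the Raspberry Pi target (tvm.target.rasp()) expands to an LLVM target string roughly like the one below; the exact triple and CPU flags are assumptions that depend on your board (e.g. Cortex-A53 on a Raspberry Pi 3) and TVM version:

```text
llvm -target=armv7l-none-linux-gnueabihf -mcpu=cortex-a53 -mattr=+neon
```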

graph, lib, params = nnvm.compiler.build(
    net, target, shape={"data": data_shape}, params=params)

# After ``nnvm.compiler.build``, you will get three return values: graph,
# library and the new parameters, since we do some optimization that will
# change the parameters but keep the result of the model the same.

# Save the library at a local temporary directory.
tmp = util.tempdir()
lib_fname = tmp.relpath('net.o')
lib.save(lib_fname)

Deploy the Model Remotely by RPC

With RPC, you can deploy the model remotely from your host machine to the remote device.

# connect the server
remote = rpc.connect(host, port)

# upload the library to the remote device and load it
remote.upload(lib_fname)
rlib = remote.load_module('net.o')

ctx = remote.cpu(0)
# upload the parameter
rparams = {k: tvm.nd.array(v, ctx) for k, v in params.items()}

# create the remote runtime module
module = runtime.create(graph, rlib, ctx)
# set parameters
module.set_input(**rparams)
# set input data
module.set_input('data', tvm.nd.array(x.astype('float32')))
# run
module.run()
# get output
out = module.get_output(0, tvm.nd.empty(out_shape, ctx=ctx))
# get top1 result
top1 = np.argmax(out.asnumpy())
print('TVM prediction top-1: {}'.format(synset[top1]))

if not use_rasp:
    # terminate the local server
    server.terminate()


TVM prediction top-1: tabby, tabby cat
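The argmax lookup in the script gives top-1 only; it generalizes to top-k with np.argsort. A standalone numpy sketch (top_k and the tiny label map are illustrative, not part of the tutorial):

```python
import numpy as np

def top_k(scores, labels, k=5):
    # indices of the k largest scores, highest first
    idx = np.argsort(scores)[::-1][:k]
    return [(labels[i], float(scores[i])) for i in idx]

scores = np.array([0.05, 0.75, 0.20])
labels = {0: 'tiger cat', 1: 'tabby, tabby cat', 2: 'Egyptian cat'}
print(top_k(scores, labels, k=2))  # [('tabby, tabby cat', 0.75), ('Egyptian cat', 0.2)]
```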

Total running time of the script: ( 0 minutes 9.571 seconds)

Gallery generated by Sphinx-Gallery