Category Archives: Deep Learning Frameworks

[TVM] Adding New Relay Operation

Official Guide:

While going through this, it is important to have more context.

Attrs and Type Relation

Attrs declare the operator's attributes (arguments that are fixed and known at compile time) and expose them through the operator's final interface.

A type relation is written as a function: it takes the input types and the attrs, and determines the output type.

Defining Compute / Strategy of the Operation

It seems that the operator can be defined in various ways, such as with te.compute or in TIR.
I want to see how the weights are defined… I think this is easier to observe in operations like convolution or matmul.

Importantly, to lower Relay operators to the implementations defined in the TOPI library, a compute function and a schedule function need to be registered for each Relay operator. However, these are usually specialized per target. It is important to provide some schedule so that AutoTVM or AutoScheduler can optimize the operation.

refer to schedule_conv3d_winograd_weight_transform in python/tvm/topi/generic/

The remaining steps are about creating the Python hooks and so on, so let's ignore them.

[TVM] Basics

import logging
import sys

import numpy as np
import tvm

from tvm import autotvm # seems like this is the autotuner

Basic Flow

As in other declarative deep learning frameworks, the computation is declared with placeholders and operations. On top of that, create_schedule creates a schedule for the declared computation.

define_knob + split

The schedule is created in two stages. (1) Get the config object and define the search space; here it has 5×5 = 25 points.

# get the config object
cfg = autotvm.get_config()

# define search space
cfg.define_knob("tile_y", [1, 2, 4, 8, 16])
cfg.define_knob("tile_x", [1, 2, 4, 8, 16])

(2) schedule according to entity in the space.

# yo, yi = s[C].split(y, cfg['tile_y'].val)
# xo, xi = s[C].split(x, cfg['tile_x'].val)
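
To make the 5×5 = 25 count concrete, here is a minimal plain-Python sketch (no TVM needed) that enumerates the same space. The names `knobs` and `space` are illustrative only and are not part of the AutoTVM API.

```python
from itertools import product

# Candidate values for each knob, copied from the define_knob calls above.
knobs = {
    "tile_y": [1, 2, 4, 8, 16],
    "tile_x": [1, 2, 4, 8, 16],
}

# The search space is the Cartesian product of the candidate lists.
space = [dict(zip(knobs, values)) for values in product(*knobs.values())]

print(len(space))  # 25
print(space[0])    # {'tile_y': 1, 'tile_x': 1}
```

Each entry of `space` plays the role of one config that the tuner could pick and pass to split().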

define_split + apply

Another (better) method is to (1) get the config object and define a split knob, which enumerates all the possible ways to split an axis and constructs the space. For an axis of length 32 split in two, the candidates run from (1, 32) to (32, 1), giving 6 in total.

# get the config object
cfg = autotvm.get_config()

# define search space --> each entry in cfg is SplitEntity
cfg.define_split("tile_y", y, num_outputs=2)
cfg.define_split("tile_x", x, num_outputs=2)

(2) schedule according to entity in the space.

# yo, yi = cfg["tile_y"].apply(s, C, y)
# xo, xi = cfg["tile_x"].apply(s, C, x)
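
The count of 6 can be checked with a small plain-Python sketch. It assumes an axis of length 32 and that the candidates are the (outer, inner) factor pairs whose product equals the axis length; `factor_splits` is a hypothetical helper, not a TVM function.

```python
def factor_splits(length):
    """Return every (outer, inner) pair of positive integers with outer * inner == length."""
    return [(outer, length // outer) for outer in range(1, length + 1) if length % outer == 0]


splits = factor_splits(32)
print(splits)       # [(1, 32), (2, 16), (4, 8), (8, 4), (16, 2), (32, 1)]
print(len(splits))  # 6
```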


There are RandomTuner, GridSearchTuner, GATuner (Genetic Algorithm), XGBTuner

GridSearch Tuner

The next_batch() function uses the tuner's counter as the index of the next config and appends that config to ret; the configs in ret are then tested one by one.
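
A minimal sketch of this counter-based behaviour, assuming configs are identified by their index in the space. `GridSearchSketch` is a hypothetical stand-in written for illustration, not the TVM class.

```python
class GridSearchSketch:
    """Hypothetical stand-in for a grid-search tuner (not the TVM implementation)."""

    def __init__(self, space_size):
        self.space_size = space_size
        self.counter = 0  # index of the next config to try

    def next_batch(self, batch_size):
        ret = []
        while len(ret) < batch_size and self.counter < self.space_size:
            ret.append(self.counter)  # a real tuner would append the config at this index
            self.counter += 1
        return ret


tuner = GridSearchSketch(space_size=5)
print(tuner.next_batch(3))  # [0, 1, 2]
print(tuner.next_batch(3))  # [3, 4]
```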

Random Tuner

The next_batch() function draws a random index and never visits the same config twice: each new random index is checked against the entries in the visited set.
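
The same idea as a short sketch, again with configs represented by their index. `RandomSearchSketch` and its `seed` parameter are assumptions made for this illustration, not TVM names.

```python
import random


class RandomSearchSketch:
    """Hypothetical stand-in for a random tuner (not the TVM implementation)."""

    def __init__(self, space_size, seed=0):
        self.space_size = space_size
        self.visited = set()            # indices that were already handed out
        self.rng = random.Random(seed)

    def next_batch(self, batch_size):
        ret = []
        while len(ret) < batch_size and len(self.visited) < self.space_size:
            index = self.rng.randrange(self.space_size)
            if index in self.visited:
                continue                # already tried, draw again
            self.visited.add(index)
            ret.append(index)
        return ret


tuner = RandomSearchSketch(space_size=6)
first = tuner.next_batch(4)
second = tuner.next_batch(4)  # only 2 configs are left
print(sorted(first + second))  # [0, 1, 2, 3, 4, 5]
```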

GA Tuner

point2knob converts a flat index into a knob vector
knob2point converts a knob vector back into a flat index
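
A minimal sketch of the two conversions, treating the index as a mixed-radix number whose digits are the knob choices. The digit ordering (first dimension varies fastest) is an assumption of this sketch; `dims` holds the number of choices per knob.

```python
def point2knob(point, dims):
    """Convert a flat index into a knob vector (one entry per dimension)."""
    knob = []
    for dim in dims:
        knob.append(point % dim)
        point //= dim
    return knob


def knob2point(knob, dims):
    """Convert a knob vector back into the flat index."""
    point = 0
    stride = 1
    for value, dim in zip(knob, dims):
        point += value * stride
        stride *= dim
    return point


dims = [5, 5]
print(point2knob(7, dims))       # [2, 1]
print(knob2point([2, 1], dims))  # 7
```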

At initialization, it makes a list of genes with pop_size elements.

Each next_batch() call picks batch_size genes; they are measured with measure_batch(), which returns their run times.

Each time a batch finishes, the tuner goes through update(), which just appends scores until the whole population of pop_size genes has been tested. Once all pop_size genes have been tested, it picks the best elite_num genes among them using np.argpartition. From these elites it (1) samples two genes using their scores as probabilities, (2) mixes the two together, repeating until it has formed pop_size new genes (tmp_gene), and (3) mutates some dimensions of the knob.

--> It only traverses configs that are likely to be successful, so the probability of hitting an invalid one is small…
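
The update step can be sketched with NumPy as follows. This is an illustration under stated assumptions, not the TVM code: `next_generation`, `mutation_prob`, and the single-point crossover are choices made for this sketch, and scores are assumed to be non-negative with higher meaning better.

```python
import numpy as np


def next_generation(genes, scores, elite_num, mutation_prob, dims, rng):
    """Build a new population from the elite_num best genes (illustrative sketch)."""
    genes = np.asarray(genes)
    scores = np.asarray(scores, dtype=float)
    pop_size = len(genes)

    # Pick the elite_num highest-scoring genes; argpartition avoids a full sort.
    elite_idx = np.argpartition(scores, -elite_num)[-elite_num:]
    elites = genes[elite_idx]
    elite_scores = scores[elite_idx] + 1e-8  # keep every probability positive
    probs = elite_scores / elite_scores.sum()

    children = []
    for _ in range(pop_size):
        # (1) Sample two distinct parents, weighted by score.
        p1, p2 = rng.choice(elite_num, size=2, replace=False, p=probs)
        # (2) Mix them with a single-point crossover.
        point = rng.integers(len(dims))
        child = np.concatenate([elites[p1][:point], elites[p2][point:]])
        # (3) Mutate each dimension with a small probability.
        for j, dim in enumerate(dims):
            if rng.random() < mutation_prob:
                child[j] = rng.integers(dim)
        children.append(child)
    return np.array(children)


rng = np.random.default_rng(0)
dims = [5, 5]                             # number of choices per knob
genes = rng.integers(0, 5, size=(8, 2))   # pop_size = 8 knob vectors
scores = rng.random(8)
children = next_generation(genes, scores, elite_num=3, mutation_prob=0.1, dims=dims, rng=rng)
print(children.shape)  # (8, 2)
```

Because children are built only from elites, most of the new population stays close to configs that already measured well, which matches the note above about rarely hitting invalid ones.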

XGBoost Tuner

CostModel: predicts the speed of a config…
ModelOptimizer: find optimal points of a cost model…
ModelBasedTuner: fit a cost model and use an optimizer to find the maximums…


itervar: use features extracted from IterVar (default)
knob: use the flattened ConfigEntity directly
curve: use sampled curve feature

Details of encoding are programmed in C/C++ in src/autotvm/

Transfer Learning

Transfer learning is implemented by reading past logs… The entries it reads have names of the form "topi_nn_conv2d". Reading from file is implemented in python/tvm/autotvm/

Invalid Configurations

MeasureResult(costs=(InstantiationError(‘Skipped because of invalid gpu kernel’,),)

LocalBuilder --> default_build_func(measure_input, tmp_dir, **kwargs) --> _build_func_common(measure_input, **kwargs) --> gpu_verify_pass(**check_gpu = 1st part of **kwargs)

LocalBuilder’s build calls self.executor.submit(self.build_func, inp, self.tmp_dir, **self.build_kwargs)

Verification code finally goes to src/pass/


On the Firefly board… run each of the following commands in its own screen or tmux session…

python3 -m tvm.exec.rpc_tracker
python3 -m tvm.exec.rpc_server --tracker=[HOST_IP]:9190 --key=rk3399

[TensorFlow] How to use Graph Editor

import tensorflow as tf
from tensorflow.contrib import graph_editor as ge

g = tf.Graph()

with g.as_default():
    a = tf.constant(1.0, shape=[2, 3], name="a")
    b = tf.constant(2.0, shape=[2, 3], name="b")
    a_pl = tf.placeholder(dtype=tf.float32)
    b_pl = tf.placeholder(dtype=tf.float32)
    c = tf.add(a_pl, b_pl, name='c')

    print(g.get_operations())
    with tf.Session(graph=g) as sess:
        # print(sess.run(c))  # this raises an error, since the inputs to c (a_pl and b_pl) are not fed any values
        print(sess.run(c, feed_dict={a_pl: [[1, 1, 1], [1, 1, 1]], b_pl: [[2, 2, 2], [2, 2, 2]]}))  # we could feed values...

    # after swapping the inputs of c to a and b --> disconnects a_pl and b_pl from c
    ge.swap_inputs(c.op, [a, b])
    print(g.get_operations())
    with tf.Session(graph=g) as sess:
        print(sess.run(c))  # no feed_dict needed: c now reads the constants a and b

    # after replacing the inputs a and b of c with a_pl and b_pl
    c_1 = ge.graph_replace(c, {a: a_pl, b: b_pl})  # this appends a new node called c_1 to the graph with inputs a_pl and b_pl
    print(g.get_operations())
    with tf.Session(graph=g) as sess:
        # print(sess.run(c_1))  # this raises an error, since the inputs to c_1 (a_pl and b_pl) are not fed any values
        print(sess.run(c_1, feed_dict={a_pl: [[1, 1, 1], [1, 1, 1]], b_pl: [[2, 2, 2], [2, 2, 2]]}))  # we could feed values...


[TensorFlow] Generating TFRecord from image files

import tensorflow as tf

def _float_feature(value):
  return tf.train.Feature(float_list=tf.train.FloatList(value=value))

def _int64_feature(value):
  return tf.train.Feature(int64_list=tf.train.Int64List(value=value))

def _bytes_feature(value):
  return tf.train.Feature(bytes_list=tf.train.BytesList(value=value))

def write_examples(image_data, output_path):
  """Create a tfrecord file.

  Args:
    image_data (List[(image_file_path (str), label (int), instance_id (str)]): the data to store in the tfrecord file.
      The `image_file_path` should be the full path to the image, accessible by the machine that will be running the
      TensorFlow network. The `label` should be an integer in the range [0, number_of_classes). `instance_id` should be
      some unique identifier for this example (such as a database identifier).
    output_path (str): the full path name for the tfrecord file.
  """
  writer = tf.python_io.TFRecordWriter(output_path)

  for image_path, label, instance_id in image_data:
    # BytesList needs bytes, so the str fields are encoded first.
    example = tf.train.Example(features=tf.train.Features(
      feature={
        'label': _int64_feature([label]),
        'path': _bytes_feature([image_path.encode('utf-8')]),
        'instance': _bytes_feature([instance_id.encode('utf-8')])
      }))
    # Assumed completion: serialize each example and write it to the file.
    writer.write(example.SerializeToString())

  writer.close()



[Caffe] Issues while installing Caffe


  1. Caffe cannot be installed with GCC version above 6, so use export to bypass the problem.
  2. Various dependencies cause problems; refer to their GitHub pages.

[Keras] How to get less messages while training in Keras

To reduce the messages printed during training, pass verbose=0 to fit() and record the loss in a callback instead:

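import keras

class LossHistory(keras.callbacks.Callback):
    def on_train_begin(self, logs=None):
        self.losses = []

    def on_batch_end(self, batch, logs=None):
        # Assumed body: record the loss reported for each batch.
        self.losses.append((logs or {}).get('loss'))

# `model` and `x_train` are assumed to be defined elsewhere.
history = LossHistory()
model.fit(x_train, y_train, batch_size=128, epochs=20, verbose=0, callbacks=[history])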