import logging
import sys
import numpy as np
import tvm
from tvm import autotvm  # the autotuning module
Basic Flow
There are placeholders and operations, which makes the declaration look like other declarative deep learning frameworks. On top of that, create_schedule creates a schedule for the computation.
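As a concrete example, here is a minimal sketch of that flow for a matrix multiply (assuming the tvm.te API of recent TVM releases; older versions expose tvm.placeholder / tvm.compute directly):
from tvm import te

N, L, M = 512, 512, 512
A = te.placeholder((N, L), name="A")
B = te.placeholder((L, M), name="B")
k = te.reduce_axis((0, L), name="k")
# declarative description of C = A * B
C = te.compute((N, M), lambda i, j: te.sum(A[i, k] * B[k, j], axis=k), name="C")
# default schedule over that computation
s = te.create_schedule(C.op)
y, x = s[C].op.axis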
define_knob + split
The schedule is built in two stages: (1) get the config object and define the search space -> 5 × 5 = 25 candidate configurations
# get the config object
cfg = autotvm.get_config()
# define search space
cfg.define_knob("tile_y", [1, 2, 4, 8, 16])
cfg.define_knob("tile_x", [1, 2, 4, 8, 16])
(2) schedule according to an entity in the space.
yo, yi = s[C].split(y, cfg['tile_y'].val)
xo, xi = s[C].split(x, cfg['tile_x'].val)
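Putting the pieces together, both stages live inside one tunable template function. A sketch along the lines of the official matmul tutorial, reusing the te import from the sketch above (the template name string passed to @autotvm.template is arbitrary here, and older TVM versions take no name argument at all):
@autotvm.template("notes/matmul_knob")
def matmul_knob(N, L, M, dtype):
    A = te.placeholder((N, L), name="A", dtype=dtype)
    B = te.placeholder((L, M), name="B", dtype=dtype)
    k = te.reduce_axis((0, L), name="k")
    C = te.compute((N, M), lambda i, j: te.sum(A[i, k] * B[k, j], axis=k), name="C")
    s = te.create_schedule(C.op)
    y, x = s[C].op.axis
    k = s[C].op.reduce_axis[0]
    # (1) get the config object and define the 5 x 5 = 25 point space
    cfg = autotvm.get_config()
    cfg.define_knob("tile_y", [1, 2, 4, 8, 16])
    cfg.define_knob("tile_x", [1, 2, 4, 8, 16])
    # (2) schedule according to the entity chosen by the tuner
    yo, yi = s[C].split(y, cfg["tile_y"].val)
    xo, xi = s[C].split(x, cfg["tile_x"].val)
    s[C].reorder(yo, xo, k, yi, xi)
    return s, [A, B, C]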
define_split + apply
Another (better) method is to (1) get the config object and define a split knob, which enumerates all possible ways to split an axis and constructs the space -> for an axis of length 32, {(1, 32), (2, 16), ..., (32, 1)} = 6 entries
# get the config object
cfg = autotvm.get_config()
# define search space --> each entry for these knobs is a SplitEntity
cfg.define_split("tile_y", y, num_outputs=2)
cfg.define_split("tile_x", x, num_outputs=2)
(2) schedule according to an entity in the space.
yo, yi = cfg["tile_y"].apply(s, C, y)
xo, xi = cfg["tile_x"].apply(s, C, x)
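Once a template is registered, a tuning task can be created from it and the generated search space inspected. A sketch ("notes/matmul_knob" is the hypothetical template name registered above; recent TVM creates the task from the name string, older versions pass the function object itself):
task = autotvm.task.create(
    "notes/matmul_knob", args=(512, 512, 512, "float32"), target="llvm")
# prints every knob and the total number of candidate configurations
print(task.config_space)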
Tuning
There are four tuners: RandomTuner, GridSearchTuner, GATuner (genetic algorithm), and XGBTuner.
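Whichever tuner is used, the driving code looks the same. A sketch following the tutorial pattern (matmul.log is a placeholder file name; task is the tuning task created above):
# build each candidate locally and run it 5 times to measure it
measure_option = autotvm.measure_option(
    builder=autotvm.LocalBuilder(),
    runner=autotvm.LocalRunner(number=5))

# any of the four tuners can be dropped in here
tuner = autotvm.tuner.RandomTuner(task)
tuner.tune(n_trial=20,
           measure_option=measure_option,
           callbacks=[autotvm.callback.log_to_file("matmul.log")])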
GridSearch Tuner
The next_batch() function uses the tuner's counter as an index into the config space to fetch new configs and appends them to ret; the returned configs are then measured one by one.
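Roughly, that logic reads like this (a paraphrased sketch of the behavior described above, not the actual TVM source; grid_next_batch is a hypothetical free-function version, with space standing for task.config_space):
def grid_next_batch(space, counter, batch_size):
    # the counter just walks the config space front to back
    ret = []
    while len(ret) < batch_size and counter < len(space):
        ret.append(space.get(counter))
        counter += 1
    return ret, counter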
Random Tuner
The next_batch() function draws a random index and never visits the same config twice, by checking each new random index against the visited set.
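Again as a paraphrased sketch, not the actual source (random_next_batch is a hypothetical name; np comes from the imports at the top of these notes):
def random_next_batch(space, visited, batch_size):
    ret = []
    while len(ret) < batch_size and len(visited) < len(space):
        index = np.random.randint(len(space))
        if index not in visited:   # never measure the same config twice
            ret.append(space.get(index))
            visited.add(index)
    return ret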
GA Tuner
point2knob converts a flat index into a per-knob vector.
knob2point converts a per-knob vector back into a flat index (see the sketch at the end of this subsection).
At initialization, it creates a list of genes with pop_size elements.
Each next_batch() call selects batch_size genes to run experiments on via measure_batch(), which returns their execution times.
Every time a batch finishes, update() is called. It simply accumulates scores until the whole pop_size has been tested. Once that happens, it picks the best elite_num genes among the measured ones using np.argpartition. From these it (1) samples two parents with probability proportional to their scores, (2) crosses the two to form pop_size tmp_genes, and (3) mutates some dimensions of the knobs.
-> the tuner only traverses configurations that are likely to perform well, so the probability of hitting an invalid one is small.
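The two index conversions are plain mixed-radix arithmetic. A sketch of what they do (dims holds the number of choices for each knob):
def point2knob(point, dims):
    # flat index -> one choice index per knob (least-significant knob first)
    knob = []
    for d in dims:
        knob.append(point % d)
        point //= d
    return knob

def knob2point(knob, dims):
    # one choice index per knob -> flat index
    point, stride = 0, 1
    for choice, d in zip(knob, dims):
        point += choice * stride
        stride *= d
    return point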
XGBoost Tuner
CostModel: predicts the running speed of a config.
ModelOptimizer: finds the optimal points of a cost model.
ModelBasedTuner: fits a cost model and uses an optimizer to find its maximums as the next configs to try.
Glossary
These are the feature_type options for the XGBoost cost model:
itervar: use features extracted from IterVar (default)
knob: use the flattened ConfigEntity directly
curve: use sampled curve features
The details of the feature encoding are implemented in C++ in src/autotvm/touch_extractor.cc.
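In code, these pieces surface as constructor arguments of XGBTuner. A sketch (the values shown are what I recall as the defaults and should be checked against the installed version):
tuner = autotvm.tuner.XGBTuner(
    task,
    plan_size=64,            # how many promising configs the model proposes per round
    feature_type="itervar",  # "itervar", "knob", or "curve", see the glossary above
    loss_type="rank")        # train the cost model with a ranking loss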
Transfer Learning
Transfer learning is implemented by reading past logs. It reads task.name, which takes forms like "topi_nn_conv2d". Reading from file is implemented in python/tvm/autotvm/record.py.
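In practice this looks roughly like the following (conv2d.log is a placeholder for the old log file):
import os

tuner = autotvm.tuner.XGBTuner(task, loss_type="rank")
if os.path.isfile("conv2d.log"):
    # seed the cost model with records collected in earlier runs
    tuner.load_history(autotvm.record.load_from_file("conv2d.log"))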
Invalid Configurations
An invalid configuration comes back as MeasureResult(costs=(InstantiationError('Skipped because of invalid gpu kernel',),), ...).
LocalBuilder -> default_build_func(measure_input, tmp_dir, **kwargs) -> _build_func_common(measure_input, **kwargs) -> gpu_verify_pass(**check_gpu), where check_gpu is the first part of **kwargs.
LocalBuilder's build calls self.executor.submit(self.build_func, inp, self.tmp_dir, **self.build_kwargs).
The verification code ultimately lives in src/pass/verify_gpu_code.cc.
ARM
On the Firefly board, run each of the following commands in its own screen or tmux session:
python3 -m tvm.exec.rpc_tracker
python3 -m tvm.exec.rpc_server --tracker=[HOST_IP]:9190 --key=rk3399
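The tuning script on the host then points its runner at the tracker using the same key. A sketch (HOST_IP is a placeholder, as above):
measure_option = autotvm.measure_option(
    builder=autotvm.LocalBuilder(),
    runner=autotvm.RPCRunner(
        "rk3399",               # must match the --key passed to rpc_server
        host="HOST_IP", port=9190,
        number=5, timeout=10))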