#### Abstract

Conventional sparse matrix formats involve irregular index structures with large storage requirements and a sequential reconstruction process, resulting in inefficient use of highly parallel computing resources.
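To make the index overhead concrete, here is a minimal sketch of the CSR (Compressed Sparse Row) format, one of the conventional formats alluded to above; the example matrix is made up. Besides the nonzero values, CSR stores one column index per nonzero plus a row-pointer array, and rows must be reconstructed by walking these arrays in order:

```python
# Sketch: CSR storage for a small example matrix. Note that CSR keeps
# two index arrays (col_indices, row_ptr) alongside the values, and a
# row can only be reconstructed by scanning row_ptr sequentially.

def to_csr(dense):
    """Convert a dense 2-D list into (values, col_indices, row_ptr)."""
    values, col_indices, row_ptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_indices.append(j)
        row_ptr.append(len(values))  # running count of nonzeros
    return values, col_indices, row_ptr

dense = [
    [0, 2, 0, 0],
    [3, 0, 0, 4],
    [0, 0, 0, 0],
    [0, 5, 6, 0],
]
values, col_indices, row_ptr = to_csr(dense)
print(values)       # [2, 3, 4, 5, 6]
print(col_indices)  # [1, 0, 3, 1, 2]
print(row_ptr)      # [0, 1, 3, 3, 5]
```

For this 5-nonzero matrix the two index arrays together are as large as the value array itself, which is the storage overhead the paper targets.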

Numerous sparse matrix candidates are first generated by the Viterbi encoder, and the candidate that aims to minimize the model accuracy degradation is then selected by the Viterbi algorithm.

#### Introduction

It has been shown that pruning can achieve a 9x to 13x reduction in the number of connections (Han et al., 2015).

A DNN after pruning heavily involves sparse matrix-vector and matrix-matrix multiplications (SpMV and SpMM, respectively). Despite the sparsity, the computation time for SpMM is longer than that of dense matrix multiplication on modern graphics processing units (GPUs), due to the serialized index decoding process and irregular memory access patterns.
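The irregular access pattern is visible in a minimal CSR SpMV sketch (hypothetical data): the inner loop gathers `x[col_indices[k]]`, a data-dependent indirect load that maps poorly onto wide parallel hardware:

```python
# Sketch: sparse matrix-vector multiply (SpMV) over CSR arrays.
# The gather x[col_indices[k]] is the irregular, data-dependent access
# that serializes index decoding on parallel hardware.

def csr_spmv(values, col_indices, row_ptr, x):
    y = []
    for i in range(len(row_ptr) - 1):
        acc = 0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_indices[k]]  # indirect load
        y.append(acc)
    return y

# Matrix [[0, 2], [3, 0]] in CSR form, multiplied by x = [1, 10]:
print(csr_spmv([2, 3], [1, 0], [0, 1, 2], [1, 10]))  # [20, 3]
```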

#### Pruning using Viterbi-based Approach

If the input sequence length and the total output sequence length of a VD are denoted as p and q, respectively, then the index compression ratio can be calculated as q/p (q >> p).
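A quick arithmetic sketch of the ratio, with hypothetical numbers (p and q are not taken from the paper):

```python
# Sketch: the VD expands a p-bit input sequence into a q-bit output
# mask, so only p bits of index information need to be stored for
# every q mask bits reconstructed. Numbers are hypothetical.
p = 50        # input bits fed to the Viterbi Decompressor
q = 2000      # pruning-mask bits it reconstructs (q >> p)
print(q / p)  # 40.0 -> a 40x index compression ratio
```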

**Viterbi Decompressor (VD) Design Considerations**

The goal of the VD is to act as a random number generator driven by the input sequence. Notably, a similar effort has already been studied in error-correcting code (ECC) design.

A fixed encoding rate was proposed to simulate random coding within an allowed decoding complexity.

The randomness of the VD outputs is determined by the number of flip-flops (FFs) and the configuration of the XOR gates.
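A minimal sketch of this structure, assuming a convolutional-encoder-like form: a k-bit shift register of FFs, with each output bit produced by XORing the incoming bit with a fixed subset of FF taps. The tap positions and register size below are made up, not taken from the paper:

```python
# Sketch of a VD-like structure (assumed form): a k-FF shift register;
# each output bit is the XOR of the input bit with selected FF taps.
# The tap choices are hypothetical -- they govern how "random" the
# generated mask bits look.

def vd_outputs(bits, taps_per_output, k=4):
    state = [0] * k                   # k flip-flops, initially zero
    outputs = []
    for b in bits:
        for taps in taps_per_output:  # one XOR gate per output bit
            out = b
            for t in taps:
                out ^= state[t]       # XOR with selected FF values
            outputs.append(out)
        state = [b] + state[:-1]      # shift the register
    return outputs

# Two output bits per input bit -> 2x expansion with made-up taps.
mask = vd_outputs([1, 0, 1, 1], taps_per_output=[(0, 2), (1, 3)])
print(mask)  # [1, 1, 1, 0, 1, 0, 1, 1]
```

Note how 4 input bits already yield 8 pseudo-random mask bits; more XOR outputs per step raise the q/p ratio.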

**Viterbi Algorithm for Pruning**

Assign a cost function to each pruning case enabled by the VD, and evaluate all 2^p possible pruning cases with a “Branch-and-Cut” algorithm.

The pruning case with the optimal cost function should lead to minimal accuracy degradation.

The Viterbi algorithm computes the maximum-likelihood state sequence in a hidden Markov model, and can be utilized as a fast and efficient exploration technique for this pruning method.

- Construct the trellis diagram, a time-indexed version of the state diagram.
- The number of states is 2^k (k = number of FFs).

- Compute the cost function for possible transitions using the branch metric and the path metric.
**Some scheme I don’t understand**
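The trellis search described above can be sketched as follows. This is a minimal, assumed reconstruction: the VD logic (`vd_step`), tap positions, and the branch metric (reward the weight magnitude kept by each transition) are illustrative stand-ins, not the paper's exact cost function:

```python
# Sketch: Viterbi search over the trellis of VD states. A state is the
# k-bit FF register; each input bit (0/1) is a branch. The branch
# metric rewards keeping large-magnitude weights, and the best path
# metric at the final step selects the pruning case.

def vd_step(state, b, k, taps_per_output):
    """One VD transition: emit mask bits, return (outputs, next_state)."""
    regs = [(state >> i) & 1 for i in range(k)]
    outs = []
    for taps in taps_per_output:
        out = b
        for t in taps:
            out ^= regs[t]
        outs.append(out)
    next_state = ((state << 1) | b) & ((1 << k) - 1)
    return outs, next_state

def viterbi_prune(weights, k=3, taps_per_output=((0, 2), (1,))):
    r = len(taps_per_output)       # mask bits emitted per step
    steps = len(weights) // r
    path = {0: (0.0, [])}          # state -> (path metric, input bits)
    for t in range(steps):
        chunk = weights[t * r:(t + 1) * r]
        new_path = {}
        for state, (metric, bits) in path.items():
            for b in (0, 1):       # two branches per state
                outs, nxt = vd_step(state, b, k, taps_per_output)
                # branch metric: magnitude of weights this branch keeps
                bm = sum(abs(w) for w, m in zip(chunk, outs) if m)
                cand = (metric + bm, bits + [b])
                if nxt not in new_path or cand[0] > new_path[nxt][0]:
                    new_path[nxt] = cand  # keep the best path per state
        path = new_path
    return max(path.values())      # best (path metric, input bits)

metric, bits = viterbi_prune([0.9, -0.1, 0.05, 1.2, -0.7, 0.02])
print(metric, bits)
```

Because only the best path into each of the 2^k states survives each step, the search runs in O(steps x 2^k) time instead of enumerating all 2^p input sequences.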

#### Result

Pruning results are similar to magnitude-based pruning, but with a much smaller matrix for indexing.