PhD Research

University of California, San Diego

From Statement of Purpose for admission to UCSD

My Ph.D will be an adventure about various ways to optimize systems 🙂
The following are the projects I work on.

Machine Learning for Compiler Optimization

The success of Deep Neural Networks (DNNs) and their computational intensity have heralded a Cambrian explosion of DNN hardware. While hardware design has advanced significantly, optimizing code for this hardware is still an open challenge. Recent research has moved past traditional compilation techniques toward stochastic search, which blindly generates random samples of binaries and measures them on real hardware to guide the search.

Chameleon leverages reinforcement learning, whose solution takes fewer steps to converge, and develops an adaptive sampling algorithm that not only focuses the costly samples (real hardware measurements) on representative points but also uses domain-knowledge-inspired logic to improve the samples themselves.
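As a toy illustration of what "focusing the costly samples on representative points" can look like, the sketch below clusters candidate configurations and keeps one real candidate per cluster for hardware measurement. This is not the Chameleon implementation: the use of k-means, the two-knob configuration tuples, and the names `kmeans` and `pick_representatives` are all assumptions made for this example, and the reinforcement learning and sample-improvement parts are left out.

```python
import math


def kmeans(points, k, iters=20):
    """Plain k-means with a deterministic, evenly spaced initialization."""
    centroids = [points[i * len(points) // k] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest centroid.
            j = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[j].append(p)
        # Move each centroid to the mean of its cluster (keep it if empty).
        centroids = [
            tuple(sum(xs) / len(c) for xs in zip(*c)) if c else centroids[j]
            for j, c in enumerate(clusters)
        ]
    return centroids


def pick_representatives(candidates, k):
    """Return at most k candidates, one nearest to each cluster centroid."""
    centroids = kmeans(candidates, k)
    reps = {min(candidates, key=lambda p: math.dist(p, c)) for c in centroids}
    return sorted(reps)


# Hypothetical candidates: (tile_x, tile_y) knob settings proposed by a search.
candidates = [
    (1, 1), (1, 2), (2, 1),
    (16, 16), (16, 17), (17, 16),
    (32, 1), (32, 2), (33, 1),
]
representatives = pick_representatives(candidates, k=3)
print(representatives)  # only these would be measured on real hardware
```

In this example nine proposed configurations collapse to three hardware measurements, which is the kind of saving the adaptive sampling idea is after.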

Glimpse opens a new dimension by incorporating mathematical embeddings of the hardware specifications of GPU accelerators, dubbed Blueprint, to better guide the search algorithm and focus it on sub-spaces with higher potential for yielding high-performance binaries.

Overview of the Adaptive Code Optimization for Expedited Deep Neural Network Compilation (ICLR’20)
  • Reinforcement Learning and Adaptive Sampling for Optimized DNN Compilation (2019.02 – 2020.02) ICLR’20, ICML-W’19
  • Use of Mathematical Embeddings of Hardware Specifications for Faster Neural Compilation (2021.07 – 2022.02) DAC’22

Spatial Multi-Tenancy for DNN Accelerators

Accelerator-based inference-as-a-service, or INFaaS (Google’s TPU, NVIDIA T4, Microsoft Brainwave, etc.), has become the backbone of many real-life applications. However, as the demand for such services grows, merely scaling out the number of accelerators is not cost-effective. Although multi-tenancy has propelled datacenter scalability, it has not been a primary factor in designing DNN accelerators due to the arms race for higher speed and efficiency.

Planaria is the first work to explore spatial multi-tenancy in accelerators. Dynamic architecture fission and its associated flexibility enable an extra degree of freedom for task scheduling, which even allows breaking up the accelerator according to the server load, DNN topology, and task priority. As such, it can co-locate multiple DNNs on the same accelerator to enhance utilization, throughput, QoS, and fairness.

Illustration of Dynamic Architecture Fission for Spatial Multi-Tenant Acceleration of DNNs (MICRO’20)
  • Dynamic Architecture Fission for Spatial Multi-Tenant Acceleration of DNNs (2020.02 – 2020.09) MICRO’20

Others

  • Neural Architecture Search on Resource-limited Platforms IEEE SMC’21 – with Mälardalen University in Sweden

Qualcomm AI Research

During my internships at Qualcomm, I worked on developing compiler optimization techniques to reduce the memory footprint of executing deep neural networks. Compiler optimization research at Qualcomm forms an important part of my PhD dissertation research.

Compiler Optimization for Machine Learning

Recent advances demonstrate that irregularly wired neural networks from Neural Architecture Search (NAS) and Random Wiring can not only automate the design of deep neural networks but also emit models that outperform previous manual designs. These networks are especially effective under hard resource constraints (memory, MACs, ...), which highlights the importance of this class of neural network design. However, such a move complicates the previously streamlined pattern of execution. In fact, one of the main challenges is that the order in which the nodes are executed significantly affects the memory footprint of the intermediate activations. Current compilers do not schedule with regard to activation memory footprint, so their peak memory is significantly higher than the optimum, which makes these networks impractical on edge devices.

Serenity uses dynamic programming to find a schedule with optimal peak memory footprint. Our solution also includes a graph rewriting technique that reduces the memory footprint even further, beyond the optimum for the original graph.
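To make the effect of node ordering concrete, here is a minimal sketch of memory-aware scheduling by dynamic programming over sets of already-executed nodes. It is a toy model and not the Serenity implementation: the memory model (an output stays live until all of its consumers have run, and graph outputs stay live), the example graph, and the names `live_bytes`, `peak_of_order`, and `min_peak_schedule` are assumptions for illustration. The search is exponential in the worst case, so it is only meant for small graphs, and graph rewriting is not covered.

```python
from functools import lru_cache


def live_bytes(done, sizes, consumers):
    """Memory held by outputs that are graph outputs or still awaited."""
    return sum(
        sizes[n] for n in done
        if not consumers[n] or not consumers[n] <= done
    )


def peak_of_order(order, sizes, edges):
    """Peak activation memory of one given execution order."""
    consumers = {n: set() for n in sizes}
    for u, v in edges:
        consumers[u].add(v)
    done, peak = frozenset(), 0
    for n in order:
        # While n runs, everything live so far plus n's output is resident.
        peak = max(peak, live_bytes(done, sizes, consumers) + sizes[n])
        done = done | {n}
    return peak


def min_peak_schedule(sizes, edges):
    """Return (peak, order) minimizing peak memory over topological orders."""
    consumers = {n: set() for n in sizes}
    producers = {n: set() for n in sizes}
    for u, v in edges:
        consumers[u].add(v)
        producers[v].add(u)
    everything = frozenset(sizes)

    @lru_cache(maxsize=None)
    def best(done):
        # The best remaining schedule depends only on the set already executed.
        if done == everything:
            return 0, ()
        base = live_bytes(done, sizes, consumers)
        options = []
        for n in sorted(everything - done):
            if producers[n] <= done:  # all inputs of n are available
                rest_peak, rest_order = best(done | {n})
                options.append((max(base + sizes[n], rest_peak), (n,) + rest_order))
        return min(options)

    peak, order = best(frozenset())
    return peak, list(order)


# Hypothetical two-branch graph: x feeds b -> c and d -> e, which join in out.
sizes = {"x": 1, "b": 8, "c": 1, "d": 8, "e": 1, "out": 1}
edges = [("x", "b"), ("x", "d"), ("b", "c"), ("d", "e"), ("c", "out"), ("e", "out")]

naive_peak = peak_of_order(["x", "b", "d", "c", "e", "out"], sizes, edges)
best_peak, best_order = min_peak_schedule(sizes, edges)
print(naive_peak, best_peak, best_order)
```

In this example, running both wide nodes back to back keeps two large activations resident at once, while finishing one branch before starting the other lets the first large activation be freed early.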

Overview of the Memory-Aware Scheduling of Irregularly Wired Neural Networks (MLSys’20)
  • Memory-Aware Scheduling of Irregularly Wired Neural Networks for Edge Devices (2019.07 – 2019.09) MLSys’20
  • AI-Enabled Memory Management Algorithms (2020.06 – 2020.09)