Projects in Industry

Apple (2022)

During my internship at Apple, I worked on Apple Neural Engine (ANE) compilers. Specifically, I worked on developing optimization passes to improve ANE performance.

Protopia AI (2020 – present)

Protopia AI is a start-up with a novel technology that generates curated noise around the source signal, safeguarding data from unauthorized inference.

Noise for Privacy-Aware AI

INFerence-as-a-Service (INFaaS) has become a pervasive backbone technology for many use cases, from mobile assistants to enterprise applications. IBM estimates that 95% of all data processed for AI is used for inference services. This data currently contains a staggering amount of privileged and private information, which could hinder timely deployment of deep learning models or pose risks for businesses and their customers. While data is protected at rest and in motion through encryption, it is exposed during inference, because latency requirements and Service Level Agreements (SLAs) require it to be processed unencrypted.

Protopia AI addresses this structural gap in inference privacy with a novel obfuscation technology. Our approach leverages gradient mechanisms to find stochastic data obfuscations that also keep the inference service highly performant. We also provide secure APIs that allow inference services to seamlessly integrate this privacy protection solution by adding fewer than 10 lines of code.

  • Development of Protopia AI’s noise training technology for inference privacy as well as PoC on various tasks (2020.10 – 2022.06) NeurIPS-D’21
  • Development of training and inference infrastructure (2021.01 – 2022.06)
  • Development of optimizing compiler for noise injection technology (2021.07 – 2021.12)

Qualcomm AI Research (2019 & 2020)

During my internships at Qualcomm, I worked on developing compiler optimization techniques that reduce the memory footprint of executing deep neural networks.

Compiler Optimization for Machine Learning

Recent advances demonstrate that irregularly wired neural networks from Neural Architecture Search (NAS) and Random Wiring can not only automate the design of deep neural networks but also produce models that outperform previous manual designs. These approaches are especially effective when designing neural architectures under hard resource constraints (memory, MACs, ...), which highlights the importance of this class of networks. However, irregular wiring complicates the previously streamlined pattern of execution. In fact, one of the main challenges is that the order in which the nodes of such a network are executed significantly affects the memory footprint of the intermediate activations. Current compilers do not take activation memory footprint into account when scheduling, so the peak footprint is significantly higher than the optimum, which makes these networks impractical for edge devices.

Serenity uses dynamic programming to find a schedule with optimal peak memory footprint. Our solution also includes a graph rewriting technique that reduces the memory footprint further, beyond the optimum achievable for the original graph.
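To make the scheduling idea concrete, here is a minimal sketch of dynamic programming over sets of already-executed nodes. It is not Serenity's implementation: the function names (`min_peak_schedule`, `peak_of_order`, `live_memory`), the bitmask state, and the memory model (an activation stays live until all of its consumers have run, and graph outputs stay live) are assumptions chosen for illustration, and the search is exponential in the number of nodes, so it only suits small graphs.

```python
from functools import lru_cache

def _masks(preds):
    """Build predecessor and successor bitmasks from per-node predecessor lists."""
    n = len(preds)
    pred_mask, succ_mask = [0] * n, [0] * n
    for v, ps in enumerate(preds):
        for p in ps:
            pred_mask[v] |= 1 << p
            succ_mask[p] |= 1 << v
    return pred_mask, succ_mask

def live_memory(done, sizes, succ_mask):
    """Total size of activations that are produced and still needed in state `done`."""
    total = 0
    for u, size in enumerate(sizes):
        executed = (done >> u) & 1
        # Assumption: outputs of sink nodes are graph outputs and stay live.
        still_needed = succ_mask[u] == 0 or (succ_mask[u] & ~done) != 0
        if executed and still_needed:
            total += size
    return total

def peak_of_order(order, sizes, preds):
    """Peak activation memory when nodes run in the given topological order."""
    _, succ_mask = _masks(preds)
    done, peak = 0, 0
    for v in order:
        # While v runs, its inputs (still live) and its output coexist.
        peak = max(peak, live_memory(done, sizes, succ_mask) + sizes[v])
        done |= 1 << v
    return peak

def min_peak_schedule(sizes, preds):
    """Return (minimal peak memory, one schedule achieving it) for a DAG."""
    n = len(sizes)
    pred_mask, succ_mask = _masks(preds)
    full = (1 << n) - 1

    @lru_cache(maxsize=None)
    def best(done):
        if done == full:
            return 0, ()
        base = live_memory(done, sizes, succ_mask)
        result = None
        for v in range(n):
            if (done >> v) & 1 or (pred_mask[v] & ~done):
                continue  # already executed, or a predecessor is missing
            rest_peak, rest_order = best(done | (1 << v))
            peak = max(base + sizes[v], rest_peak)
            if result is None or peak < result[0]:
                result = (peak, (v,) + rest_order)
        return result

    return best(0)

if __name__ == "__main__":
    # Hypothetical diamond-shaped graph: node 0 feeds two branches
    # (1 -> 2 and 3 -> 4) that are joined by node 5.
    sizes = [1, 10, 1, 10, 1, 1]
    preds = [[], [0], [1], [0], [3], [2, 4]]
    print(peak_of_order([0, 1, 3, 2, 4, 5], sizes, preds))  # breadth-first: 21
    print(min_peak_schedule(sizes, preds)[0])               # optimum: 12
```

In this toy graph, running both large branch heads back to back keeps two size-10 activations alive at once, while finishing one branch before starting the other does not. This is the kind of ordering effect that memory-aware scheduling exploits.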

Overview of the Memory-Aware Scheduling of Irregularly Wired Neural Networks (MLSys’20)
  • Memory-Aware Scheduling of Irregularly Wired Neural Networks for Edge Devices (2019.07 – 2019.09) MLSys’20
  • AI-Enabled Memory Management Algorithms (2020.06 – 2020.09)

Samsung Research (2015 – 2018)

The overall goal of my projects at Samsung was to optimize systems for next-generation products. I tackled the problem at various levels of the system stack. My main role throughout my tenure at Samsung was compiler engineer, but I also worked on simulators and model optimization of DNNs.

Samsung Reconfigurable Processor (SRP)

  • Developed a swing modulo scheduler for VLIW and achieved up to 250% performance improvement
  • Accelerated scheduling time and increased performance of the edge-centric modulo scheduler for CGRA by using an adaptive algorithm to optimize routing in the scheduling process. As a result, achieved smaller Initiation Intervals (II) for many kernels in significantly less scheduling time
  • Ported LLVM compiler’s backends and C standard library for multiple architectures
  • Implemented and debugged various tools to enhance usability of the toolchain

* Samsung Reconfigurable Processor is a DSP with a VLIW plus CGRA architecture, which has been employed in Samsung’s TVs, Galaxy Smartphones, and various other products

Samsung Neural Processor (SNP)

  • Developed SNP system simulator to be used for architecture exploration, hardware verification, and software development
  • Ported, quantized, and optimized neural network algorithms including LeNet, SqueezeNet, and other applications to run on SNP

* Samsung Neural Processor is a Neural Network Processor targeted at Samsung’s future products, including TVs and Smartphones