Custom Hardware Accelerators for Inference on Embedded Devices

Hardware accelerators (GPUs, TPUs, etc.) have been shown to significantly improve training and inference speeds for conventional deep learning. However, embedded platforms lack the power budget to support such accelerators. During the summer and fall of 2021, I interned at OctoML, where I helped develop a system for attaching custom hardware accelerators to an embedded device and running inference on them with OctoML's uTVM compiler. This project is still a work in progress and has inspired my interest in efficient machine learning.
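To give a concrete sense of the compilation flow, here is a minimal sketch using TVM's public microTVM Python API. It is not the project's actual code: the one-layer Relay model and the generic "host" micro target are placeholders, and the custom-accelerator offload itself (which in TVM is typically routed through the Bring Your Own Codegen mechanism) is not shown.

```python
# Minimal sketch: compile a toy model for a microcontroller-class target
# with microTVM and export it for on-device deployment.
import tvm
import tvm.micro
from tvm import relay

# Placeholder model: a single dense layer standing in for a real network,
# which would normally be imported from TFLite, ONNX, etc.
x = relay.var("x", shape=(1, 8), dtype="float32")
w = relay.var("w", shape=(8, 8), dtype="float32")
mod = tvm.IRModule.from_expr(relay.Function([x, w], relay.nn.dense(x, w)))

# Generic micro target; a real board would use a device-specific target,
# e.g. tvm.target.target.micro("stm32f746xx").
target = tvm.target.target.micro("host")

# microTVM's C codegen does not support vectorized loops, so disable them.
with tvm.transform.PassContext(opt_level=3, config={"tir.disable_vectorize": True}):
    lowered = relay.build(mod, target=target)

# Package the compiled operators, graph, and metadata in Model Library
# Format, which downstream tooling integrates into device firmware.
tvm.micro.export_model_library_format(lowered, "model.tar")
```

From there, the exported archive can be built into a firmware project and executed on the device through uTVM's runtime.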

Publications

Work still in progress!

Additional Material

More about uTVM

Collaborators

Andrew Reusch, Josh Fromm, Shwetak Patel