Convolution is the most computationally intensive task in a Convolutional Neural Network (CNN), demanding substantial memory and computational power. Different approaches exist to compute convolution and reduce its computational complexity. In this paper, a matrix multiplication-based convolution (ConvMM) approach is fully parallelized using the concurrent resources of a GPU (Graphics Processing Unit) and optimized, considerably improving the performance of image classifiers and making them applicable to real-time embedded applications. The flow of this CUDA (Compute Unified Device Architecture)-based scheme is optimized using unified memory and hardware-dependent acceleration of matrix multiplication. The proposed flow is evaluated on two different embedded platforms: first on an Nvidia Jetson TX1 embedded board and then on the Tegra K1 GPU of an Nvidia Shield K1 tablet. The performance of this optimized and accelerated convolutional layer is compared with its sequential and heterogeneous versions. Results show that the proposed scheme significantly improves overall results, including energy efficiency, storage requirements and inference performance. In particular, the proposed scheme on embedded GPUs is hundreds of times faster than the sequential version and delivers tens of times higher performance than the heterogeneous approach.
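As a rough illustration of the ConvMM idea (not the paper's implementation), the sketch below unrolls input patches into a column matrix with an im2col CUDA kernel, keeps all buffers in unified memory, and performs the convolution as a single cuBLAS SGEMM. All names, sizes and the stride-1, no-padding setup are illustrative assumptions.

```cuda
// Minimal ConvMM sketch: im2col into unified memory, then one SGEMM.
#include <cstdio>
#include <cublas_v2.h>
#include <cuda_runtime.h>

// Unroll a CxHxW input into a (C*K*K) x (H_out*W_out) column matrix (stride 1, no padding).
__global__ void im2col(const float* in, float* cols,
                       int C, int H, int W, int K, int H_out, int W_out) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;          // one thread per output pixel
    int n_pix = H_out * W_out;
    if (idx >= n_pix) return;
    int out_y = idx / W_out, out_x = idx % W_out;
    for (int c = 0; c < C; ++c)
        for (int ky = 0; ky < K; ++ky)
            for (int kx = 0; kx < K; ++kx) {
                int row = (c * K + ky) * K + kx;              // row in the column matrix
                cols[row * n_pix + idx] =
                    in[(c * H + out_y + ky) * W + out_x + kx];
            }
}

int main() {
    const int C = 3, H = 32, W = 32, K = 3, M = 16;           // channels, input size, kernel, filters (assumed)
    const int H_out = H - K + 1, W_out = W - K + 1;
    const int n_pix = H_out * W_out, patch = C * K * K;

    float *in, *cols, *filt, *out;                            // unified memory: visible to CPU and GPU
    cudaMallocManaged(&in,   C * H * W     * sizeof(float));
    cudaMallocManaged(&cols, patch * n_pix * sizeof(float));
    cudaMallocManaged(&filt, M * patch     * sizeof(float));
    cudaMallocManaged(&out,  M * n_pix     * sizeof(float));
    for (int i = 0; i < C * H * W; ++i) in[i]   = 1.0f;       // dummy input
    for (int i = 0; i < M * patch; ++i) filt[i] = 0.1f;       // dummy filters

    im2col<<<(n_pix + 255) / 256, 256>>>(in, cols, C, H, W, K, H_out, W_out);

    // out (M x n_pix, row-major) = filt (M x patch) * cols (patch x n_pix).
    // cuBLAS is column-major, so compute the transposed product out^T = cols^T * filt^T.
    cublasHandle_t h;
    cublasCreate(&h);
    const float alpha = 1.0f, beta = 0.0f;
    cublasSgemm(h, CUBLAS_OP_N, CUBLAS_OP_N,
                n_pix, M, patch,
                &alpha, cols, n_pix, filt, patch, &beta, out, n_pix);
    cudaDeviceSynchronize();

    printf("out[0] = %f (expected %d * 0.1 = %.1f)\n", out[0], patch, patch * 0.1f);
    cublasDestroy(h);
    cudaFree(in); cudaFree(cols); cudaFree(filt); cudaFree(out);
    return 0;
}
```

Unified memory (cudaMallocManaged) removes explicit host-device copies from the flow, while the hardware-accelerated SGEMM carries the bulk of the arithmetic; these are the two optimizations the abstract refers to, shown here only in schematic form.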