PRUNE: Dynamic and Decidable Dataflow for Signal Processing on Heterogeneous Platforms
The majority of contemporary mobile devices and personal computers are based
on heterogeneous computing platforms that consist of a number of CPU cores and
one or more Graphics Processing Units (GPUs). Despite the high volume of these
devices, there are few existing programming frameworks that target full and
simultaneous utilization of all CPU and GPU devices of the platform.
This article presents a dataflow-flavored Model of Computation (MoC) that has
been developed for deploying signal processing applications to heterogeneous
platforms. The presented MoC is dynamic and allows describing applications with
data dependent run-time behavior. On top of the MoC, formal design rules are
presented that enable application descriptions to be simultaneously dynamic and
decidable. Decidability guarantees compile-time application analyzability for
deadlock freedom and bounded memory.
The presented MoC and the design rules are realized in a novel Open Source
programming environment "PRUNE" and demonstrated with representative
application examples from the domains of image processing, computer vision and
wireless communications. Experimental results show that the proposed approach
outperforms the state-of-the-art in analyzability, flexibility and performance.
Comment: This is the author's version of an article that has been published in
this journal. Changes were made to this version by the publisher prior to
publication.
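The abstract's key idea is that an application can have data-dependent run-time behavior while still being analyzable at compile time, provided every possible behavior is declared up front. The following is a minimal sketch of that principle under simplified assumptions (the `DynamicActor` class, its configuration table, and the placeholder computation are illustrative inventions, not PRUNE's actual API): an actor's port rates are selected at run time by a control value, but because the finite set of rate configurations is known statically, a tool can derive worst-case rates, bound buffer memory, and check for deadlock.

```python
# Sketch (hypothetical, not PRUNE's API): a dataflow actor whose token
# rates depend on a run-time control value, but whose full set of rate
# configurations is declared statically. The static table is what makes
# the dynamic behavior decidable: worst-case rates, and hence buffer
# bounds, can be computed without executing the application.

class DynamicActor:
    def __init__(self, name, configs):
        # configs: control value -> (tokens consumed, tokens produced)
        self.name = name
        self.configs = configs

    def max_rates(self):
        """Worst-case consumption/production over all configurations,
        usable for compile-time buffer-size bounding."""
        return (max(c for c, _ in self.configs.values()),
                max(p for _, p in self.configs.values()))

    def fire(self, control, fifo):
        """Consume and produce tokens according to the selected
        configuration; the summation is a placeholder computation."""
        consume, produce = self.configs[control]
        data = [fifo.pop(0) for _ in range(consume)]
        return [sum(data)] * produce

# A downsampler that passes tokens through (mode 0) or merges four
# input tokens into one (mode 1), chosen by a control token at run time.
downsample = DynamicActor("downsample", {0: (1, 1), 1: (4, 1)})
print(downsample.max_rates())            # (4, 1)
print(downsample.fire(1, [1, 2, 3, 4]))  # [10]
```

A compile-time analyzer only needs the `configs` tables of all actors, never the control-token values themselves, which is the essence of being simultaneously dynamic and decidable.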
On Design and Optimization of Convolutional Neural Network for Embedded Systems
This work presents research on optimizing neural networks and deploying them in real-time practical applications. We analyze three optimization methods, namely binarization, separable convolution and pruning. We implement each method for the application of vehicle classification, and we empirically evaluate and analyze the results. The objective is to make large neural networks suitable for real-time applications by reducing their computational requirements through these optimization approaches. The data set consists of vehicles from 4 classes of vehicle types, and a convolutional model was initially used to solve the problem. Our results show that these optimization methods offer substantial performance benefits in this application, reducing execution time (by up to 5×) and model storage requirements without largely impacting accuracy, making them suitable tools for streamlining heavy neural networks for deployment in resource-constrained environments. The platforms used in the research are a desktop platform and two embedded platforms.
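Of the three methods the abstract names, separable convolution has a particularly easy-to-state cost argument: factoring a standard k×k convolution into a depthwise k×k filter plus a 1×1 pointwise mix shrinks the weight count from C_in·C_out·k² to C_in·k² + C_in·C_out. A minimal sketch of that arithmetic (the function names and the example layer sizes are illustrative, not taken from the paper):

```python
# Sketch: weight counts for a standard convolution versus a
# depthwise-separable factorization, one of the optimizations the
# abstract evaluates. Bias terms are omitted for simplicity.

def standard_conv_params(c_in, c_out, k):
    """Weights of a k x k standard convolution: every output channel
    filters every input channel."""
    return c_in * c_out * k * k

def separable_conv_params(c_in, c_out, k):
    """One depthwise k x k filter per input channel, then a 1x1
    pointwise convolution to mix channels."""
    return c_in * k * k + c_in * c_out

if __name__ == "__main__":
    c_in, c_out, k = 128, 256, 3  # illustrative layer sizes
    std = standard_conv_params(c_in, c_out, k)   # 294912
    sep = separable_conv_params(c_in, c_out, k)  # 33920
    print(f"standard: {std}, separable: {sep}, "
          f"reduction: {std / sep:.1f}x")
```

The multiply-accumulate count shrinks by the same factor, which is why the technique trades a small accuracy cost for large savings in both storage and execution time on embedded targets.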
Mapping streaming applications onto GPU systems
DOI: 10.1109/SC.Companion.2012.279
Proceedings - 2012 SC Companion: High Performance Computing, Networking Storage and Analysis, SCC 2012, pp. 1488-149