4 research outputs found

    PRUNE: Dynamic and Decidable Dataflow for Signal Processing on Heterogeneous Platforms

    Get PDF
    The majority of contemporary mobile devices and personal computers are based on heterogeneous computing platforms that consist of a number of CPU cores and one or more Graphics Processing Units (GPUs). Despite the high volume of these devices, there are few existing programming frameworks that target full and simultaneous utilization of all CPU and GPU devices of the platform. This article presents a dataflow-flavored Model of Computation (MoC) that has been developed for deploying signal processing applications to heterogeneous platforms. The presented MoC is dynamic and allows describing applications with data dependent run-time behavior. On top of the MoC, formal design rules are presented that enable application descriptions to be simultaneously dynamic and decidable. Decidability guarantees compile-time application analyzability for deadlock freedom and bounded memory. The presented MoC and the design rules are realized in a novel Open Source programming environment "PRUNE" and demonstrated with representative application examples from the domains of image processing, computer vision and wireless communications. Experimental results show that the proposed approach outperforms the state-of-the-art in analyzability, flexibility and performance.Comment: This is the author's version of an article that has been published in this journal. Changes were made to this version by the publisher prior to publicatio

    On Design and Optimization of Convolutional Neural Network for Embedded Systems

    Get PDF
    This work presents the research on optimizing neural networks and deploying them for real-time practical applications. We analyze different optimization methods, namely binarization, separable convolution and pruning. We implement each method for the application of vehicle classification and we empirically evaluate and analyze the results. The objective is to make large neural networks suitable for real-time applications by reducing the computation requirements through these optimization approaches. The data set is of vehicles from 4 classes of vehicle types, and a convolutional model was used to solve the problem initially. Our results show that these optimization methods offer many performance benefits in this application in terms of reduced execution time (by up to 5 ×), reduced model storage requirements, with out largely impacting accuracy, making them a suitable tool for use in streamlining heavy neural networks to be used on resource-constrained envrionments. The platforms used in the research are a desktop platform, and two embedded platforms

    Mapping streaming applications onto GPU systems

    No full text
    10.1109/SC.Companion.2012.279Proceedings - 2012 SC Companion: High Performance Computing, Networking Storage and Analysis, SCC 20121488-149
    corecore