140 research outputs found

    PRUNE: Dynamic and Decidable Dataflow for Signal Processing on Heterogeneous Platforms

    Get PDF
    The majority of contemporary mobile devices and personal computers are based on heterogeneous computing platforms that consist of a number of CPU cores and one or more Graphics Processing Units (GPUs). Despite the high volume of these devices, there are few existing programming frameworks that target full and simultaneous utilization of all CPU and GPU devices of the platform. This article presents a dataflow-flavored Model of Computation (MoC) that has been developed for deploying signal processing applications to heterogeneous platforms. The presented MoC is dynamic and allows describing applications with data dependent run-time behavior. On top of the MoC, formal design rules are presented that enable application descriptions to be simultaneously dynamic and decidable. Decidability guarantees compile-time application analyzability for deadlock freedom and bounded memory. The presented MoC and the design rules are realized in a novel Open Source programming environment "PRUNE" and demonstrated with representative application examples from the domains of image processing, computer vision and wireless communications. Experimental results show that the proposed approach outperforms the state-of-the-art in analyzability, flexibility and performance.Comment: This is the author's version of an article that has been published in this journal. Changes were made to this version by the publisher prior to publicatio

    Interfacing and scheduling legacy code within the Canals framework

    No full text
    International audienceThe need for understanding how to distribute computations across multiple cores, have obviously increased in the multi-core era. Scheduling the functional blocks of an application for concurrent execution requires not only a good understanding of data dependencies, but also a structured way to describe the intended scheduling. In this paper we describe how the Canals language and its scheduling framework can be used for the purpose of scheduling and executing legacy code. Additionally a set of translation guidelines for translating RVC-CAL applications into Canals are presented. The proposed approaches are applied to an existing MPEG-4 Simple Profile decoder for evaluation purposes. The inverse discrete cosine transform (IDCT) is accelerated by the means of OpenCL

    Toolflows for Mapping Convolutional Neural Networks on FPGAs: A Survey and Future Directions

    Get PDF
    In the past decade, Convolutional Neural Networks (CNNs) have demonstrated state-of-the-art performance in various Artificial Intelligence tasks. To accelerate the experimentation and development of CNNs, several software frameworks have been released, primarily targeting power-hungry CPUs and GPUs. In this context, reconfigurable hardware in the form of FPGAs constitutes a potential alternative platform that can be integrated in the existing deep learning ecosystem to provide a tunable balance between performance, power consumption and programmability. In this paper, a survey of the existing CNN-to-FPGA toolflows is presented, comprising a comparative study of their key characteristics which include the supported applications, architectural choices, design space exploration methods and achieved performance. Moreover, major challenges and objectives introduced by the latest trends in CNN algorithmic research are identified and presented. Finally, a uniform evaluation methodology is proposed, aiming at the comprehensive, complete and in-depth evaluation of CNN-to-FPGA toolflows.Comment: Accepted for publication at the ACM Computing Surveys (CSUR) journal, 201

    A linguistic approach to concurrent, distributed, and adaptive programming across heterogeneous platforms

    Get PDF
    Two major trends in computing hardware during the last decade have been an increase in the number of processing cores found in individual computer hardware platforms and an ubiquity of distributed, heterogeneous systems. Together, these changes can improve not only the performance of a range of applications, but the types of applications that can be created. Despite the advances in hardware technology, advances in programming of such systems has not kept pace. Traditional concurrent programming has always been challenging, and is only set to be come more so as the level of hardware concurrency increases. The different hardware platforms which make up heterogeneous systems come with domain-specific programming models, which are not designed to interact, or take into account the different resource-constraints present across different hardware devices, motivating a need for runtime reconfiguration or adaptation. This dissertation investigates the actor model of computation as an appropriate abstraction to address the issues present in programming concurrent, distributed, and adaptive applications across different scales and types of computing hardware. Given the limitations of other approaches, this dissertation describes a new actor-based programming language (Ensemble) and its runtime to address these challenges. The goal of this language is to enable non-specialist programmers to take advantage of parallel, distributed, and adaptive programming without the programmer requiring in-depth knowledge of hardware architectures or software frameworks. There is also a description of the design and implementation of the runtime system which executes Ensemble applications across a range of heterogeneous platforms. To show the suitability of the actor-based abstraction in creating applications for such systems, the language and runtime were evaluated in terms of linguistic complexity and performance. These evaluations covered programming embedded, concurrent, distributed, and adaptable applications, as well as combinations thereof. The results show that the actor provides an objectively simple way to program such systems without sacrificing performance

    Edge/Fog Computing Technologies for IoT Infrastructure

    Get PDF
    The prevalence of smart devices and cloud computing has led to an explosion in the amount of data generated by IoT devices. Moreover, emerging IoT applications, such as augmented and virtual reality (AR/VR), intelligent transportation systems, and smart factories require ultra-low latency for data communication and processing. Fog/edge computing is a new computing paradigm where fully distributed fog/edge nodes located nearby end devices provide computing resources. By analyzing, filtering, and processing at local fog/edge resources instead of transferring tremendous data to the centralized cloud servers, fog/edge computing can reduce the processing delay and network traffic significantly. With these advantages, fog/edge computing is expected to be one of the key enabling technologies for building the IoT infrastructure. Aiming to explore the recent research and development on fog/edge computing technologies for building an IoT infrastructure, this book collected 10 articles. The selected articles cover diverse topics such as resource management, service provisioning, task offloading and scheduling, container orchestration, and security on edge/fog computing infrastructure, which can help to grasp recent trends, as well as state-of-the-art algorithms of fog/edge computing technologies

    Cognitive Radio Programming: Existing Solutions and Open Issues

    Get PDF
    Software defined radio (sdr) technology has evolved rapidly and is now reaching market maturity, providing solutions for cognitive radio applications. Still, a lot of issues have yet to be studied. In this paper, we highlight the constraints imposed by recent radio protocols and we present current architectures and solutions for programming sdr. We also list the challenges to overcome in order to reach mastery of future cognitive radios systems.La radio logicielle a évolué rapidement pour atteindre la maturité nécessaire pour être mise sur le marché, offrant de nouvelles solutions pour les applications de radio cognitive. Cependant, beaucoup de problèmes restent à étudier. Dans ce papier, nous présentons les contraintes imposées par les nouveaux protocoles radios, les architectures matérielles existantes ainsi que les solutions pour les programmer. De plus, nous listons les difficultés à surmonter pour maitriser les futurs systèmes de radio cognitive

    Towards Intelligent Runtime Framework for Distributed Heterogeneous Systems

    Get PDF
    Scientific applications strive for increased memory and computing performance, requiring massive amounts of data and time to produce results. Applications utilize large-scale, parallel computing platforms with advanced architectures to accommodate their needs. However, developing performance-portable applications for modern, heterogeneous platforms requires lots of effort and expertise in both the application and systems domains. This is more relevant for unstructured applications whose workflow is not statically predictable due to their heavily data-dependent nature. One possible solution for this problem is the introduction of an intelligent Domain-Specific Language (iDSL) that transparently helps to maintain correctness, hides the idiosyncrasies of lowlevel hardware, and scales applications. An iDSL includes domain-specific language constructs, a compilation toolchain, and a runtime providing task scheduling, data placement, and workload balancing across and within heterogeneous nodes. In this work, we focus on the runtime framework. We introduce a novel design and extension of a runtime framework, the Parallel Runtime Environment for Multicore Applications. In response to the ever-increasing intra/inter-node concurrency, the runtime system supports efficient task scheduling and workload balancing at both levels while allowing the development of custom policies. Moreover, the new framework provides abstractions supporting the utilization of heterogeneous distributed nodes consisting of CPUs and GPUs and is extensible to other devices. We demonstrate that by utilizing this work, an application (or the iDSL) can scale its performance on heterogeneous exascale-era supercomputers with minimal effort. A future goal for this framework (out of the scope of this thesis) is to be integrated with machine learning to improve its decision-making and performance further. As a bridge to this goal, since the framework is under development, we experiment with data from Nuclear Physics Particle Accelerators and demonstrate the significant improvements achieved by utilizing machine learning in the hit-based track reconstruction process

    PiCo: A Domain-Specific Language for Data Analytics Pipelines

    Get PDF
    In the world of Big Data analytics, there is a series of tools aiming at simplifying programming applications to be executed on clusters. Although each tool claims to provide better programming, data and execution models—for which only informal (and often confusing) semantics is generally provided—all share a common under- lying model, namely, the Dataflow model. Using this model as a starting point, it is possible to categorize and analyze almost all aspects about Big Data analytics tools from a high level perspective. This analysis can be considered as a first step toward a formal model to be exploited in the design of a (new) framework for Big Data analytics. By putting clear separations between all levels of abstraction (i.e., from the runtime to the user API), it is easier for a programmer or software designer to avoid mixing low level with high level aspects, as we are often used to see in state-of-the-art Big Data analytics frameworks. From the user-level perspective, we think that a clearer and simple semantics is preferable, together with a strong separation of concerns. For this reason, we use the Dataflow model as a starting point to build a programming environment with a simplified programming model implemented as a Domain-Specific Language, that is on top of a stack of layers that build a prototypical framework for Big Data analytics. The contribution of this thesis is twofold: first, we show that the proposed model is (at least) as general as existing batch and streaming frameworks (e.g., Spark, Flink, Storm, Google Dataflow), thus making it easier to understand high-level data-processing applications written in such frameworks. As result of this analysis, we provide a layered model that can represent tools and applications following the Dataflow paradigm and we show how the analyzed tools fit in each level. Second, we propose a programming environment based on such layered model in the form of a Domain-Specific Language (DSL) for processing data collections, called PiCo (Pipeline Composition). The main entity of this programming model is the Pipeline, basically a DAG-composition of processing elements. This model is intended to give the user an unique interface for both stream and batch processing, hiding completely data management and focusing only on operations, which are represented by Pipeline stages. Our DSL will be built on top of the FastFlow library, exploiting both shared and distributed parallelism, and implemented in C++11/14 with the aim of porting C++ into the Big Data world

    Curracurrong: a stream processing system for distributed environments

    Get PDF
    Advances in technology have given rise to applications that are deployed on wireless sensor networks (WSNs), the cloud, and the Internet of things. There are many emerging applications, some of which include sensor-based monitoring, web traffic processing, and network monitoring. These applications collect large amount of data as an unbounded sequence of events and process them to generate a new sequences of events. Such applications need an adequate programming model that can process large amount of data with minimal latency; for this purpose, stream programming, among other paradigms, is ideal. However, stream programming needs to be adapted to meet the challenges inherent in running it in distributed environments. These challenges include the need for modern domain specific language (DSL), the placement of computations in the network to minimise energy costs, and timeliness in real-time applications. To overcome these challenges we developed a stream programming model that achieves easy-to-use programming interface, energy-efficient actor placement, and timeliness. This thesis presents Curracurrong, a stream data processing system for distributed environments. In Curracurrong, a query is represented as a stream graph of stream operators and communication channels. Curracurrong provides an extensible stream operator library and adapts to a wide range of applications. It uses an energy-efficient placement algorithm that optimises communication and computation. We extend the placement problem to support dynamically changing networks, and develop a dynamic program with polynomially bounded runtime to solve the placement problem. In many stream-based applications, real-time data processing is essential. We propose an approach that measures time delays in stream query processing; this model measures the total computational time from input to output of a query, i.e., end-to-end delay
    corecore