3,082 research outputs found

    Towards Automatic Parallelization of Stream Processing Applications

    Get PDF
    Parallelizing and optimizing codes for recent multi-/many-core processors have been recognized to be a complex task. For this reason, strategies to automatically transform sequential codes into parallel and discover optimization opportunities are crucial to relieve the burden to developers. In this paper, we present a compile-time framework to (semi) automatically find parallel patterns (Pipeline and Farm) and transform sequential streaming applications into parallel using GrPPI, a generic parallel pattern interface. This framework uses a novel pipeline stage-balancing technique which provides the code generator module with the necessary information to produce balanced pipelines. The evaluation, using a synthetic video benchmark and a real-world computer vision application, demonstrates that the presented framework is capable of producing parallel and optimized versions of the application. A comparison study under several thread-core oversubscribed conditions reveals that the framework can bring comparable performance results with respect to the Intel TBB programming framework

    Towards automatic parallelization of stream processing applications

    Get PDF
    Parallelizing and optimizing codes for recent multi-/many-core processors have been recognized to be a complex task. For this reason, strategies to automatically transform sequential codes into parallel and discover optimization opportunities are crucial to relieve the burden to developers. In this paper, we present a compile-time framework to (semi) automatically find parallel patterns (Pipeline and Farm) and transform sequential streaming applications into parallel using GrPPI, a generic parallel pattern interface. This framework uses a novel pipeline stage-balancing technique which provides the code generator module with the necessary information to produce balanced pipelines. The evaluation, using a synthetic video benchmark and a real-world computer vision application, demonstrates that the presented framework is capable of producing parallel and optimized versions of the application. A comparison study under several thread-core oversubscribed conditions reveals that the framework can bring comparable performance results with respect to the Intel TBB programming framework.This work was supported in part by the Spanish Ministerio de Economía y Competitividad through the Project Toward Uni cation of HPC and Big Data Paradigms under Grant TIN2016-79637-P and in part by the EU Project RePhrase: REfactoring Parallel Heterogeneous Resource-Aware Applications under Grant ICT 644235

    Statistical performance analysis with dynamic workload using S-NET

    Get PDF
    Volkmar Wieser, Philip K. F. Hölzenspies, Michael Roßbory, and Raimund Kirner, 'Statistical performance analysis with dynamic workload using S-NET'. Paper presented at the Workshop on Feedback-Directed Compiler Optimization for Multi-Core Architectures. Paris, France 23-25 January 2012In this paper the ADVANCE approach for engineering con- current software systems with well-balanced hardware ef- ficiency is adressed using the stream processing language S-Net. To obtain the cost information in the concurrent system the metrics throughput, latency, and jitter are evalu- ated by analyzing generated synthetical data as well as using an industrial related application in the future. As fall-out an Eclipse plugin for S-Net has been developed to provide sup- port for syntax highlighting, content assistance, hover help, and more, for easier and faster development. The presented results of the current work are on the one hand an indicator for the status quo of the ADVANCE vision and on the other hand used to improve the applied statistical analysis tech- niques within ADVANCE. Like the ADVANCE project, this work is still under development, but further improvements and speedups are expected in the near future

    Programming MPSoC platforms: Road works ahead

    Get PDF
    This paper summarizes a special session on multicore/multi-processor system-on-chip (MPSoC) programming challenges. The current trend towards MPSoC platforms in most computing domains does not only mean a radical change in computer architecture. Even more important from a SW developer´s viewpoint, at the same time the classical sequential von Neumann programming model needs to be overcome. Efficient utilization of the MPSoC HW resources demands for radically new models and corresponding SW development tools, capable of exploiting the available parallelism and guaranteeing bug-free parallel SW. While several standards are established in the high-performance computing domain (e.g. OpenMP), it is clear that more innovations are required for successful\ud deployment of heterogeneous embedded MPSoC. On the other hand, at least for coming years, the freedom for disruptive programming technologies is limited by the huge amount of certified sequential code that demands for a more pragmatic, gradual tool and code replacement strategy

    Skalabilna implementacija dekodera po normi MPEG korištenjem tokovnog programskog jezika

    Get PDF
    In this paper, we describe a scalable and portable parallelized implementation of a MPEG decoder using a streaming computation paradigm, tailored to new generations of multi--core systems. A novel, hybrid approach towards parallelization of both new and legacy applications is described, where only data--intensive and performance--critical parts are implemented in the streaming domain. An architecture--independent \u27StreamIt\u27 language is used for design, optimization and implementation of parallelized segments, while the developed \u27StreamGate\u27 interface provides a communication mechanism between the implementation domains. The proposed hybrid approach was employed in re--factoring of a reference MPEG video decoder implementation; identifying the most performance--critical segments and re-implementing them in \u27StreamIt\u27 language, with \u27StreamGate\u27 interface as a communication mechanism between the host and streaming kernel. We evaluated the scalability of the decoder with respect to the number of cores, video frame formats, sizes and decomposition. Decoder performance was examined in the presence of different processor load configurations and with respect to the number of simultaneously processed frames.U ovom radu opisujemo skalabilnu i prenosivu implementaciju dekodera po normi MPEG ostvarenu korištenjem paradigme tokovnog računarstva, prilagođenu novim generacijama višejezgrenih računala. Opisan je novi, hibridni pristup paralelizaciji novih ili postojećih aplikacija, gdje se samo podatkovno intenzivni i računski zahtjevni dijelovi implementiraju u tokovnoj domeni. Arhitekturno neovisni jezik StreamIt koristi se za oblikovanje, optimiranje i izvedbu paraleliziranih segmenata aplikacije, dok razvijeno sučelje \u27StreamGate\u27 omogućava komunikaciju između domena implementacije. Predloženi hibridni pristup razvoju paraleliziranih aplikacija iskorišten je u preoblikovanju referentnog dekodera video zapisa po normi MPEG; identificirani su računski zahtjevni segmenti aplikacije i ponovno implementirani u jeziku StreamIt, sa sučeljem \u27StreamGate\u27 kao poveznicom između slijedne i tokovne domene. Ispitivana su svojstva skalabilnosti s obzirom na ciljani broj jezgri, format video zapisa i veličinu okvira te dekompoziciju ulaznih podataka. Svojstva dekodera  su praćena u prisustvu različitih opterećenja ispitnog računala, i s obzirom na broj istovremeno obrađivanih okvira
    corecore