28 research outputs found

    PREESM: A Dataflow-Based Rapid Prototyping Framework for Simplifying Multicore DSP Programming

    The high-performance Digital Signal Processors (DSPs) currently manufactured by Texas Instruments are heterogeneous multiprocessor architectures. Programming these architectures is a complex task, often reserved for specialized engineers, because the bottlenecks of both the algorithm and the architecture need to be deeply understood in order to obtain a fairly parallel execution. The objective of the PREESM framework is to simplify the programming of multicore DSP systems by building on dataflow programming methods. The current functionalities of this scalable framework cover memory and time analysis, as well as automatic deadlock-free code generation. Several tutorials are provided with the tool to introduce C programmers quickly to multicore DSP programming. This paper demonstrates PREESM capabilities by comparing simulation and execution performance for a stereo matching algorithm prototyped on the TMS320C6678 8-core DSP device.

    Models of Architecture: Reproducible Efficiency Evaluation for Signal Processing Systems

    The current trend in high-performance and embedded signal processing consists of designing increasingly complex heterogeneous hardware architectures with non-uniform communication resources. To make hardware and software design decisions, early evaluations of the system's non-functional properties are needed. These evaluations of system efficiency require high-level information on both the algorithms and the architecture. In this paper, we define the notion of Model of Architecture (MoA) and study the combination of a Model of Computation (MoC) and an MoA to provide a design space exploration environment for the study of the algorithmic and architectural choices. A cost is computed from the mapping of an application, represented by a model conforming to a MoC, onto an architecture, represented by a model conforming to an MoA. The cost is composed of a processing-related part and a communication-related part. It is an abstract scalar value to be minimized and can represent any non-functional requirement of a system, such as memory, energy, throughput or latency.

    Models of Architecture

    The current trend in high-performance and embedded computing consists of designing increasingly complex heterogeneous hardware architectures with non-uniform communication resources. To make hardware and software design decisions, early evaluations of the system's non-functional properties are needed. These evaluations of system efficiency require high-level information on both the algorithms and the architecture. In state-of-the-art Model Driven Engineering (MDE) methods, different communities have developed custom architecture models associated with languages of substantial complexity. This contrasts with Models of Computation (MoCs), which provide abstract representations of an algorithm's behavior as well as tool interoperability. In this report, we define the notion of Model of Architecture (MoA) and study the combination of a MoC and an MoA to provide a design space exploration environment for the study of the algorithmic and architectural choices. An MoA provides reproducible cost computation for evaluating the efficiency of a system. A new MoA called Linear System-Level Architecture Model (LSLA) is introduced and compared to state-of-the-art models. LSLA aims at representing hardware efficiency with a linear model. The computed cost results from the mapping of an application, represented by a model conforming to a MoC, onto an architecture, represented by a model conforming to an MoA. The cost is composed of a processing-related part and a communication-related part. It is an abstract scalar value to be minimized and can represent any non-functional requirement of a system, such as memory, energy, throughput or latency.
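
The additive cost described in this abstract can be illustrated with a minimal sketch. It is not the LSLA definition from the report: it only assumes that every architecture element contributes a linear term of the form alpha * size + beta, and that the total is the sum of a processing part and a communication part. The class name, function name, and all numeric values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinearCost:
    """Hypothetical linear cost of one architecture element: alpha * size + beta."""
    alpha: float  # assumed cost per unit of data handled
    beta: float   # assumed fixed cost per activation

    def cost(self, size: float) -> float:
        return self.alpha * size + self.beta


def mapping_cost(processing, communication):
    """Return the scalar cost of one mapping.

    Each argument is a list of (LinearCost, size) pairs: one pair per
    activity mapped onto a processing or communication element.
    """
    processing_part = sum(element.cost(size) for element, size in processing)
    communication_part = sum(element.cost(size) for element, size in communication)
    return processing_part + communication_part


if __name__ == "__main__":
    core = LinearCost(alpha=2.0, beta=1.0)  # illustrative values only
    bus = LinearCost(alpha=0.5, beta=0.0)   # illustrative values only
    total = mapping_cost(
        processing=[(core, 10), (core, 4)],
        communication=[(bus, 8)],
    )
    print(total)
```

Because the result is a single scalar, two candidate mappings of the same application can be compared directly by calling mapping_cost on each and keeping the smaller value.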

    Applying the Adaptive Hybrid Flow-Shop Scheduling Method to Schedule a 3GPP LTE Physical Layer Algorithm onto Many-Core Digital Signal Processors

    Currently, multicore Digital Signal Processor (DSP) platforms are commonly used in telecommunications baseband processing. In the next few years, high-performance DSPs are likely to combine many more DSP cores for signal processing with some General-Purpose Processor (GPP) cores for application control. As the number of cores increases in new DSP platform designs, scheduling of applications is becoming a complex operation. Meanwhile, the variability of the scheduled applications also tends to increase as applications become more sophisticated. Such variations require runtime adaptivity of application scheduling. This paper extends the previous work on adaptive scheduling by using the Hybrid Flow-Shop (HFS) scheduling method, which enables the device architecture to be modeled as a pipeline of Processing Elements (PEs) with multiple alternate PEs for each pipeline stage. HFS scheduling is applied to the scheduling of the 3rd Generation Partnership Project (3GPP) Long Term Evolution (LTE) telecommunication standard Uplink Physical Layer data processing (PUSCH). The experiments, conducted on an ARM Cortex-A9 GPP, show that an HFS scheduling algorithm has an overhead that increases very slowly with the number of PEs. This makes the method suitable for executing the adaptive scheduling in less than 1 ms for the 501 actors of an LTE PUSCH dataflow description executed on a 256-core architecture.

    Optimization des algorithmes de calibration sur plate-forme embarquée manycore

    This paper presents the porting and optimization of a full-polarization, direction-independent calibration algorithm for radio interferometry on an embedded many-core platform. In astronomy, calibration algorithms consist of solving for the unknown complex antenna gains using a known model of the sky. Calibration is a key computation for providing images of the sky with good quality and high resolution. In the context of the Square Kilometer Array (SKA) project, real-time and low-power execution of the calibration is challenging. In this paper, we show that the CohJohnes algorithm has good properties for efficient execution on the new generation of many-core embedded platforms. Experimental results are provided using the Kalray MPPA Bostan platform running 288 64-bit VLIW cores and delivering up to 845 GFLOPS at 12 W.

    The spatial distribution of birds and carabid beetles in pine plantation forests: the role of landscape composition and structure

    Aim: To evaluate the joint and independent effects of spatial location, landscape composition and landscape structure on the distribution patterns of bird and carabid beetle assemblages in a mosaic landscape dominated by pine plantation forests. Location: A continuous 3000-ha landscape mosaic with native maritime pine Pinus pinaster plantations of different ages, deciduous woodlands and open habitats, located in the Landes de Gascogne forest of south-western France. Methods: We sampled breeding birds by 20-min point counts and carabid beetles by pitfall trapping using a systematic grid sampling of 200 points every 400 m over the whole landscape. Explanatory variables were composed of three data sets derived from GIS habitat mapping: (1) spatial variables (polynomial terms of geographical coordinates of samples), (2) landscape composition as the percentage cover of the six main habitats, and (3) landscape structure metrics including indices of fragmentation and spatial heterogeneity. We used canonical correspondence analysis with variance partitioning to evaluate the joint and independent effects of the three sets of variables on the ordination of species assemblages. Moran's I correlograms and Mantel tests were used to assess spatial structure in species distribution and relationships with separate landscape attributes. Results: Landscape composition was the main factor explaining the distribution patterns of birds and carabids at the mesoscale of 400 × 400 m. Independent effects of spatial variables and landscape structure were still significant for bird assemblages once landscape composition was controlled for, but not for carabid assemblages. Spatial distributions of birds and carabids were primarily influenced by the amount of heathlands, young pine plantations, herbaceous firebreaks and deciduous woodlands. Deciduous woodland species had positive responses to edge density, while open habitat species were positively associated with mean patch area. Main conclusions: Forest birds were favoured by an increase in deciduous woodland cover and landscape heterogeneity, but there was no evidence for a similar effect on carabid beetles. Fragmentation of open habitats negatively affected both early-successional birds and carabids, specialist species being restricted to large heathlands and young plantations. Several birds of conservation concern were associated with mosaics of woodlands and grasslands, especially meadows and firebreaks. Conserving biodiversity in mosaic plantation landscapes could be achieved by the maintenance of a significant amount of early-successional habitats and deciduous woodland patches within a conifer plantation matrix.

    Hierarchical Dataflow Model for Efficient Programming of Clustered Manycore Processors

    Programming Multiprocessor Systems-on-Chips (MPSoCs) with hundreds of heterogeneous Processing Elements (PEs), complex memory architectures, and Networks-on-Chips (NoCs) remains a challenge for embedded system designers. Dataflow Models of Computation (MoCs) are increasingly used for developing parallel applications, as their high level of abstraction eases the automation of mapping, task scheduling and memory allocation onto MPSoCs. This paper introduces a technique for deploying hierarchical dataflow graphs efficiently onto MPSoCs. The proposed technique exploits different granularities of dataflow parallelism to generate both NoC-based communications and nested OpenMP loops. Deployment of an image processing application on a many-core MPSoC results in speedups of up to 58.7 compared to sequential execution.

    SPIDER: A Synchronous Parameterized and Interfaced Dataflow-Based RTOS for Multicore DSPs

    This paper introduces a novel Real-Time Operating System (RTOS) based on a parameterized dataflow Model of Computation (MoC). This RTOS, called Synchronous Parameterized and Interfaced Dataflow Embedded Runtime (SPiDER), aims at efficiently scheduling Parameterized and Interfaced Synchronous Dataflow (PiSDF) graphs on multicore architectures. It exploits features of PiSDF to locate locally static regions that exhibit predictable application behavior. This paper uses a multicore signal processing benchmark to demonstrate that the SPiDER runtime can exploit more parallelism than a conventional multicore task scheduler. By comparing experimental results of the SPiDER runtime on an 8-core Texas Instruments Keystone I Digital Signal Processor (DSP) with those obtained from the OpenMP framework, latency improvements of up to 26% are demonstrated.

    Implementation of a Fast Fourier Transform Algorithm onto a Manycore Processor

    The Fourier transform is the main processing step applied to data collected from the Square Kilometre Array (SKA) receivers. The requirement is to compute a Fourier transform of 2^19 real byte samples in real time, while minimizing the power consumption. We address this challenge by optimizing an FFT implementation for execution on the Kalray MPPA manycore processor. Although this processor delivers high floating-point performance, we use fixed-point number representations in order to reduce the memory consumption and the I/O bandwidth. The result is an execution time of 1.07 ms per FFT, including data transfers. This makes it possible to use only two first-generation MPPA chips per flow of data coming from the receivers, for a total power consumption of 17.4 W.
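
The figures quoted in this abstract can be put in perspective with a short back-of-the-envelope calculation. The sketch below rests on two assumptions that the abstract does not state: that one FFT stream completes transforms back to back, one every 1.07 ms, and that the 17.4 W total is split evenly between the two chips. The function names are illustrative.

```python
SAMPLES_PER_FFT = 2 ** 19   # transform size given in the abstract
FFT_TIME_S = 1.07e-3        # time per FFT, data transfers included
TOTAL_POWER_W = 17.4        # total power for the two MPPA chips
NUM_CHIPS = 2


def sustained_sample_rate(samples: int, seconds: float) -> float:
    """Samples per second if transforms run back to back (assumption)."""
    return samples / seconds


def power_per_chip(total_watts: float, chips: int) -> float:
    """Average power per chip, assuming an even split (assumption)."""
    return total_watts / chips


if __name__ == "__main__":
    rate = sustained_sample_rate(SAMPLES_PER_FFT, FFT_TIME_S)
    print(f"samples per FFT: {SAMPLES_PER_FFT}")
    print(f"sample rate: {rate / 1e6:.0f} Msamples/s")
    print(f"power per chip: {power_per_chip(TOTAL_POWER_W, NUM_CHIPS):.1f} W")
```

Since each sample is one byte, the sample rate in this sketch is also the input bandwidth in bytes per second under the same assumptions.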