1,378 research outputs found

    LEGaTO: first steps towards energy-efficient toolset for heterogeneous computing

    Get PDF
    LEGaTO is a three-year EU H2020 project which started in December 2017. The LEGaTO project will leverage task-based programming models to provide a software ecosystem for Made-in-Europe heterogeneous hardware composed of CPUs, GPUs, FPGAs and dataflow engines. The aim is to attain one order of magnitude energy savings from the edge to the converged cloud/HPC.Peer ReviewedPostprint (author's final draft

    Computation of Buffer Capacities for Throughput Constrained and Data Dependent Inter-Task Communication

    Get PDF
    Streaming applications are often implemented as task graphs. Currently, techniques exist to derive buffer capacities that guarantee satisfaction of a throughput constraint for task graphs in which the inter-task communication is data-independent, i.e. the amount of data produced and consumed is independent of the data values in the processed stream. This paper presents a technique to compute buffer capacities that satisfy a throughput constraint for task graphs with data dependent inter-task communication, given that the task graph is a chain. We demonstrate the applicability of the approach by computing buffer capacities for an MP3 playback application, of which the MP3 decoder has a variable consumption rate. We are not aware of alternative approaches to compute buffer capacities that guarantee satisfaction of the throughput constraint for this application

    Octopus - an energy-efficient architecture for wireless multimedia systems

    Get PDF
    Multimedia computing and mobile computing are two trends that will lead to a new application domain in the near future. However, the technological challenges to establishing this paradigm of computing are non-trivial. Personal mobile computing offers a vision of the future with a much richer and more exciting set of architecture research challenges than extrapolations of the current desktop architectures. In particular, these devices will have limited battery resources, will handle diverse data types, and will operate in environments that are insecure, dynamic and which vary significantly in time and location. The approach we made to achieve such a system is to use autonomous, adaptable modules, interconnected by a switch rather than by a bus, and to offload as much as work as possible from the CPU to programmable modules that is placed in the data streams. A reconfigurable internal communication network switch called Octopus exploits locality of reference and eliminates wasteful data copies

    Predictable embedded multiprocessor architecture for streaming applications

    Get PDF
    The focus of this thesis is on embedded media systems that execute applications from the application domain car infotainment. These applications, which we refer to as jobs, typically fall in the class of streaming, i.e. they process on a stream of data. The jobs are executed on heterogeneous multiprocessor platforms, for performance and power efficiency reasons. Most of these jobs have firm real-time requirements, like throughput and end-to-end latency. Car-infotainment systems become increasingly more complex, due to an increase in the supported number of jobs and an increase of resource sharing. Therefore, it is hard to verify, for each job, that the realtime requirements are satisfied. To reduce the verification effort, we elaborate on an architecture for a predictable system from which we can verify, at design time, that the job’s throughput and end-to-end latency requirements are satisfied. This thesis introduces a network-based multiprocessor system that is predictable. This is achieved by starting with an architecture where processors have private local memories and execute tasks in a static order, so that the uncertainty in the temporal behaviour is minimised. As an interconnect, we use a network that supports guaranteed communication services so that it is guaranteed that data is delivered in time. The architecture is extended with shared local memories, run-time scheduling of tasks, and a memory hierarchy. Dataflow modelling and analysis techniques are used for verification, because they allow cyclic data dependencies that influence the job’s performance. Shown is how to construct a dataflow model from a job that is mapped onto our predictable multiprocessor platforms. This dataflow model takes into account: computation of tasks, communication between tasks, buffer capacities, and scheduling of shared resources. The job’s throughput and end-to-end latency bounds are derived from a self-timed execution of the dataflow graph, by making use of existing dataflow-analysis techniques. It is shown that the derived bounds are tight, e.g. for our channel equaliser job, the accuracy of the derived throughput bound is within 10.1%. Furthermore, it is shown that the dataflow modelling and analysis techniques can be used despite the use of shared memories, run-time scheduling of tasks, and caches

    Architecture Design Space Exploration for Streaming Applications Through Timing Analysis

    Get PDF
    In this paper we compare the maximum achievable throughput of different memory organisations of the processing elements that constitute a multiprocessor system on chip. This is done by modelling the mapping of a task with input and output channels on a processing element as a homogeneous synchronous dataflow graph, and use maximum cycle mean analysis to derive the throughput. In a HiperLAN2 case study we show how these techniques can be used to derive the required clock frequency and communication latencies in order to meet the application's throughput requirement on a multiprocessor system on chip that has one of the investigated memory organisations

    SARA: Self-Aware Resource Allocation for Heterogeneous MPSoCs

    Full text link
    In modern heterogeneous MPSoCs, the management of shared memory resources is crucial in delivering end-to-end QoS. Previous frameworks have either focused on singular QoS targets or the allocation of partitionable resources among CPU applications at relatively slow timescales. However, heterogeneous MPSoCs typically require instant response from the memory system where most resources cannot be partitioned. Moreover, the health of different cores in a heterogeneous MPSoC is often measured by diverse performance objectives. In this work, we propose a Self-Aware Resource Allocation (SARA) framework for heterogeneous MPSoCs. Priority-based adaptation allows cores to use different target performance and self-monitor their own intrinsic health. In response, the system allocates non-partitionable resources based on priorities. The proposed framework meets a diverse range of QoS demands from heterogeneous cores.Comment: Accepted by the 55th annual Design Automation Conference 2018 (DAC'18

    Reconfigurable media coding: a new specification model for multimedia coders

    Get PDF
    Multimedia coding technology, after about 20 years of active research, has delivered a rich variety of different and complex coding algorithms. Selecting an appropriate subset of these algorithms would, in principle, enable a designer to produce the codec supporting any desired functionality as well as any desired trade-off between compression performance and implementation complexity. Currently, interoperability demands that this selection process be hard-wired into the normative descriptions of the codec, or at a lower level, into a predefined number of choices, known as profiles, codified within each standard specification. This paper presents an alternative paradigm for codec deployment that is currently under development by MPEG, known as Reconfigurable Media Coding (RMC). Using the RMC framework, arbitrary combinations of fundamental algorithms may be assembled, without predefined standardization, because everything necessary for specifying the decoding process is delivered alongside the content itself. This side-information consists of a description of the bitstream syntax, as well as a description of the decoder configuration. Decoder configuration information is provided as a description of the interconnections between algorithmic blocks. The approach has been validated by development of an RMC format that matches MPEG-4 Video, and then extending the format by adding new chroma-subsampling patterns
    • …
    corecore