Search CORE

49 research outputs found

Multi-Criteria Optimization of Real-Time DAGs on Heterogeneous Platforms under P-EDF

Author: Amory Alexandre
Ara Gabriele
Cucinotta Tommaso
Natale Marco Di
Paladino Francesco
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2023
Field of study

This paper tackles the problem of optimal placement of complex real-time embedded applications on heterogeneous platforms. Applications are composed of directed acyclic graphs of tasks, with each DAG having a minimum inter-arrival period for its activation requests, and an end-to-end deadline within which all of the computations need to terminate since each activation. The platforms of interest are heterogeneous power-aware multi-core platforms with DVFS capabilities, including big.LITTLE Arm architectures, and platforms with GPU or FPGA hardware accelerators with Dynamic Partial Reconfiguration capabilities. Tasks can be deployed on CPUs using partitioned EDF-based scheduling. Additionally, some of the tasks may have an alternate implementation available for one of the accelerators on the target platform, which are assumed to serve requests in non-preemptive FIFO order. The system can be optimized by: minimizing power consumption, respecting precise timing constraints; maximizing the applications’ slack, respecting given power consumption constraints; or even a combination of these, in a multi-objective formulation. We propose an off-line optimization of the mentioned problem based on mixed-integer quadratic constraint programming (MIQCP). The optimization provides the DVFS configuration of all the CPUs (or accelerators) capable of frequency switching and the placement to be followed by each task in the DAGs, including the software-vs-hardware implementation choice for tasks that can be hardware-accelerated. For relatively big problems, we developed heuristic solvers capable of providing suboptimal solutions in a significantly reduced time compared to the MIQCP strategy, thus widening the applicability of the proposed framework. We validate the approach by running a set of randomly generated DAGs on Linux under SCHED_DEADLINE, deployed onto two real boards, one with Arm big.LITTLE architecture, the other with FPGA acceleration, verifying that the experimental runs meet the theoretical expectations in terms of timing and power optimization goals

Archivio della ricerca della Scuola Superiore Sant'Anna

Recommended from our members

Optimizing Constrainted Concurrent Applications at Run-time

Author: Fay Daniel Riley
Publication venue: CU Scholar
Publication date: 01/01/2011
Field of study

Computer systems are resource constrained. Application adaptation is a useful way to optimize system resource usage while satisfying an application’s performance requirements. Current multicore computer systems supporting these applications, however, are not designed to reliably meet these requirements. Meanwhile, these computer systems are resource-limited, e.g., have power-induced energy and thermal constraints. Compounding the application’s performance requirements are increasingly-stringent microprocessor thermal constraints. Previous application adaptation efforts, however, were ad-hoc, time-consuming, and highly application-specific, with limited portability between computer systems. This thesis presents OCCAM, a software platform for developing multicore adaptable applications. OCCAM’s design-time platform consists of design patterns, APIs, and data structures that allow application developers to specify the performance constraints and application-specific optimization techniques. OCCAM generates a run-time controller offline, using profiling data. It then uses this profiling data to generate an internal model that it subsequently employs to generate a robust Markov Decision Process-based Model Predictive Controller. Using a set of Recognition, Mining, and Synthesis benchmarks, the experimental study demonstrates that OCCAM can successfully optimize the system while meeting the systems performance requirements across a wide range of computer platforms, ranging from an energy-constrained single-core system to a high-performance 16-core system. Finally, OCCAM presents a simulation-based, stochastic model checking-based framework for quantifying the robustness of the controller

CU Scholar Institutional Repository

Scheduling analysis of fixed priority hard real-time systems with multiframe tasks

Author: Zuhily Areej
Publication venue: University of York
Publication date: 01/01/2009
Field of study

EThOS - Electronic Theses Online ServiceGBUnited Kingdo

CiteSeerX

White Rose E-theses Online

OpenGrey Repository

Mixed Criticality Systems - A Review : (13th Edition, February 2022)

Author: Burns Alan
Davis Robert Ian
Publication venue
Publication date: 15/02/2022
Field of study

This review covers research on the topic of mixed criticality systems that has been published since Vestal’s 2007 paper. It covers the period up to end of 2021. The review is organised into the following topics: introduction and motivation, models, single processor analysis (including job-based, hard and soft tasks, fixed priority and EDF scheduling, shared resources and static and synchronous scheduling), multiprocessor analysis, related topics, realistic models, formal treatments, systems issues, industrial practice and research beyond mixed-criticality. A list of PhDs awarded for research relating to mixed-criticality systems is also included

White Rose Research Online

Exploring Processor and Memory Architectures for Multimedia

Author: Iranpour Ali
Publication venue
Publication date: 01/01/2012
Field of study

Multimedia has become one of the cornerstones of our 21st century society and, when combined with mobility, has enabled a tremendous evolution of our society. However, joining these two concepts introduces many technical challenges. These range from having sufficient performance for handling multimedia content to having the battery stamina for acceptable mobile usage. When taking a projection of where we are heading, we see these issues becoming ever more challenging by increased mobility as well as advancements in multimedia content, such as introduction of stereoscopic 3D and augmented reality. The increased performance needs for handling multimedia come not only from an ongoing step-up in resolution going from QVGA (320x240) to Full HD (1920x1080) a 27x increase in less than half a decade. On top of this, there is also codec evolution (MPEG-2 to H.264 AVC) that adds to the computational load increase. To meet these performance challenges there has been processing and memory architecture advances (SIMD, out-of-order superscalarity, multicore processing and heterogeneous multilevel memories) in the mobile domain, in conjunction with ever increasing operating frequencies (200MHz to 2GHz) and on-chip memory sizes (128KB to 2-3MB). At the same time there is an increase in requirements for mobility, placing higher demands on battery-powered systems despite the steady increase in battery capacity (500 to 2000mAh). This leaves negative net result in-terms of battery capacity versus performance advances. In order to make optimal use of these architectural advances and to meet the power limitations in mobile systems, there is a need for taking an overall approach on how to best utilize these systems. The right trade-off between performance and power is crucial. On top of these constraints, the flexibility aspects of the system need to be addressed. All this makes it very important to reach the right architectural balance in the system. The first goal for this thesis is to examine multimedia applications and propose a flexible solution that can meet the architectural requirements in a mobile system. Secondly, propose an automated methodology of optimally mapping multimedia data and instructions to a heterogeneous multilevel memory subsystem. The proposed methodology uses constraint programming for solving a multidimensional optimization problem. Results from this work indicate that using today’s most advanced mobile processor technology together with a multi-level heterogeneous on-chip memory subsystem can meet the performance requirements for handling multimedia. By utilizing the automated optimal memory mapping method presented in this thesis lower total power consumption can be achieved, whilst performance for multimedia applications is improved, by employing enhanced memory management. This is achieved through reduced external accesses and better reuse of memory objects. This automatic method shows high accuracy, up to 90%, for predicting multimedia memory accesses for a given architecture

CiteSeerX

Lund University Publications

Dynamic voltage and frequency scaling for multiprocessors embedded applications with soft delay deadlines

Author: Pepe Pedro Carlos Fazolino, 1978-
Publication venue: [s.n.]
Publication date: 21/08/2018
Field of study

Orientador: Alice Maria Bastos Hubinger TokarniaDissertação (mestrado) - Universidade Estadual de Campinas,Faculdade de Engenharia Elétrica e de ComputaçãoResumo: Este trabalho apresenta quatro algoritmos de escalonamento dinâmico de Tensão e Frequência (DVFS) em sistemas multiprocessador baseado em caminhos de execução. Nossos alvos são aplicações multimídia executadas em sistemas embarcados, com especificação de qualidade por taxa mínima de entradas (QoS) processadas. Uma fração mínima de entradas, geralmente quadros de dados, precisa ser completamente processada no tempo máximo de resposta especificado. O objetivo dos algoritmos é atuar em quatro cenários que correspondem a sistemas com diferentes possibilidades de escalonamento dinâmico de tensão e frequência e diferentes capacidades de monitoramento da qualidade de serviço. No primeiro cenário, todos os pacotes de dados de entrada recebidos devem ser processados dentro do tempo máximo especificado e o nível de tensão/frequência pode ser ajustado no início da execução da aplicação, sendo o mesmo para todos os processadores. Este cenário é referência para comparação de resultados para os outros cenários. Para o segundo cenário, o nível de tensão/frequência pode ser definido individualmente para um processador, no início da execução de cada tarefa, e dados de entrada de classes específicas podem ser descartados. O terceiro cenário possibilita, além do descarte de classes específicas de dados de entrada, o ajuste do nível de tensão/frequência de cada tarefa de acordo com a classe de dados de entrada a ser processada. O algoritmo desenvolvido para o quarto cenário trata dinamicamente de alterações na distribuição probabilística das classes de entrada, calculando novos níveis de tensão/frequência para as tarefas e classes de entrada de modo que a especificação de qualidade continue a ser satisfeita, de forma eficiente. Para uma aplicação de cancelamento de eco acústico, executada em 4 processadores, com taxa mínima de processamento igual a 50%, o algoritmo de escalonamento de tensão e frequência, no cenário 3, conseguiu reduzir o consumo de energia em cerca de 71%, comparado ao cenário 1. No cenário 4, simulamos para esta aplicação uma modificação simultânea de 10 pontos percentuais na distribuição das classes de entrada em 3 tarefas causando aumentos do número de descartes. O algoritmo proposto para o cenário 4 manteve a qualidade mínima com um aumento de apenas 6% no consumo de energia, quando comparado ao consumo de energia da configuração inicial definida para o cenário 3Abstract: This work presents four execution-path based Dynamic Voltage/Frequency Scaling (DVFS) algorithms for multiprocessor systems. The targets are embedded systems multimedia applications, with minimum input data completion rate specification (QoS). A minimum fraction of input data, usually data frames, should be processed within the specified deadline. These algorithms aim to operate in four scenarios corresponding to systems with different possibilities of dynamic voltage and frequency scheduling and different QoS monitoring capabilities. In the first scenario, all received data frames should be treated within the deadline and the voltage/frequency operational level can be adjusted at the beginning of the application execution, and must be the same for all processors. This scenario is a reference for comparison of results obtained for the other scenarios. For the second scenario, the voltage/frequency operational level can be set individually for each processor at the beginning of each task execution, and input data frames of specific input classes can be discarded. The third scenario allows, besides discarding specific classes of input data, it is possible to adjust the operation level for each task, according to the class of the input data to be treated. The algorithm for the fourth scenario operates online, computing new voltage/frequency levels and making new decisions about class discarding to cope with changes in probability distribution of input classes. Its goal is to maintain the specified quality with low energy consumption. In an application of acoustic echo cancellation running on a system with 4 processors, with a rate of inputs completely processed specified as 50%, the algorithm for scenario 3 achieved a reduction in consumption close to 71%, comparing to the results for scenario 1. During simulation, this application has been subjected to simultaneous changes of 10% in the input class distributions of three discarding tasks, reducing system quality. The algorithm for scenario 4, maintained the minimum quality with just 6% increase in power consumption, when compared to the consumption of the initial configuration for scenario 3MestradoEngenharia de ComputaçãoMestre em Engenharia Elétric

Repositorio da Producao Cientifica e Intelectual da Unicamp

NIAC Phase II Orbiting Rainbows: Future Space Imaging with Granular Systems

Author: Arumugam Darmindra
Basinger Scott
Quadrelli Marco B.
Swartzlander Grover
Publication venue
Publication date
Field of study

Inspired by the light scattering and focusing properties of distributed optical assemblies in Nature, such as rainbows and aerosols, and by recent laboratory successes in optical trapping and manipulation, we propose a unique combination of space optics and autonomous robotic system technology, to enable a new vision of space system architecture with applications to ultra-lightweight space optics and, ultimately, in-situ space system fabrication. Typically, the cost of an optical system is driven by the size and mass of the primary aperture. The ideal system is a cloud of spatially disordered dust-like objects that can be optically manipulated: it is highly reconfigurable, fault-tolerant, and allows very large aperture sizes at low cost. This new concept is based on recent understandings in the physics of optical manipulation of small particles in the laboratory and the engineering of distributed ensembles of spacecraft swarms to shape an orbiting cloud of micron-sized objects. In the same way that optical tweezers have revolutionized micro- and nano-manipulation of objects, our breakthrough concept will enable new large scale NASA mission applications and develop new technology in the areas of Astrophysical Imaging Systems and Remote Sensing because the cloud can operate as an adaptive optical imaging sensor. While achieving the feasibility of constructing one single aperture out of the cloud is the main topic of this work, it is clear that multiple orbiting aerosol lenses could also combine their power to synthesize a much larger aperture in space to enable challenging goals such as exo-planet detection. Furthermore, this effort could establish feasibility of key issues related to material properties, remote manipulation, and autonomy characteristics of cloud in orbit. There are several types of endeavors (science missions) that could be enabled by this type of approach, i.e. it can enable new astrophysical imaging systems, exo-planet search, large apertures allow for unprecedented high resolution to discern continents and important features of other planets, hyperspectral imaging, adaptive systems, spectroscopy imaging through limb, and stable optical systems from Lagrange-points. Furthermore, future micro-miniaturization might hold promise of a further extension of our dust aperture concept to other more exciting smart dust concepts with other associated capabilities. Our objective in Phase II was to experimentally and numerically investigate how to optically manipulate and maintain the shape of an orbiting cloud of dust-like matter so that it can function as an adaptable ultra-lightweight surface. Our solution is based on the aperture being an engineered granular medium, instead of a conventional monolithic aperture. This allows building of apertures at a reduced cost, enables extremely fault-tolerant apertures that cannot otherwise be made, and directly enables classes of missions for exoplanet detection based on Fourier spectroscopy with tight angular resolution and innovative radar systems for remote sensing. In this task, we have examined the advanced feasibility of a crosscutting concept that contributes new technological approaches for space imaging systems, autonomous systems, and space applications of optical manipulation. The proposed investigation has matured the concept that we started in Phase I to TRL 3, identifying technology gaps and candidate system architectures for the space-borne cloud as an aperture

NASA Technical Reports Server

Enhancing Real-time Embedded Image Processing Robustness on Reconfigurable Devices for Critical Applications

Author: Trotta Pascal
Publication venue: Politecnico di Torino
Publication date: 01/01/2016
Field of study

Nowadays, image processing is increasingly used in several application fields, such as biomedical, aerospace, or automotive. Within these fields, image processing is used to serve both non-critical and critical tasks. As example, in automotive, cameras are becoming key sensors in increasing car safety, driving assistance and driving comfort. They have been employed for infotainment (non-critical), as well as for some driver assistance tasks (critical), such as Forward Collision Avoidance, Intelligent Speed Control, or Pedestrian Detection. The complexity of these algorithms brings a challenge in real-time image processing systems, requiring high computing capacity, usually not available in processors for embedded systems. Hardware acceleration is therefore crucial, and devices such as Field Programmable Gate Arrays (FPGAs) best fit the growing demand of computational capabilities. These devices can assist embedded processors by significantly speeding-up computationally intensive software algorithms. Moreover, critical applications introduce strict requirements not only from the real-time constraints, but also from the device reliability and algorithm robustness points of view. Technology scaling is highlighting reliability problems related to aging phenomena, and to the increasing sensitivity of digital devices to external radiation events that can cause transient or even permanent faults. These faults can lead to wrong information processed or, in the worst case, to a dangerous system failure. In this context, the reconfigurable nature of FPGA devices can be exploited to increase the system reliability and robustness by leveraging Dynamic Partial Reconfiguration features. The research work presented in this thesis focuses on the development of techniques for implementing efficient and robust real-time embedded image processing hardware accelerators and systems for mission-critical applications. Three main challenges have been faced and will be discussed, along with proposed solutions, throughout the thesis: (i) achieving real-time performances, (ii) enhancing algorithm robustness, and (iii) increasing overall system's dependability. In order to ensure real-time performances, efficient FPGA-based hardware accelerators implementing selected image processing algorithms have been developed. Functionalities offered by the target technology, and algorithm's characteristics have been constantly taken into account while designing such accelerators, in order to efficiently tailor algorithm's operations to available hardware resources. On the other hand, the key idea for increasing image processing algorithms' robustness is to introduce self-adaptivity features at algorithm level, in order to maintain constant, or improve, the quality of results for a wide range of input conditions, that are not always fully predictable at design-time (e.g., noise level variations). This has been accomplished by measuring at run-time some characteristics of the input images, and then tuning the algorithm parameters based on such estimations. Dynamic reconfiguration features of modern reconfigurable FPGA have been extensively exploited in order to integrate run-time adaptivity into the designed hardware accelerators. Tools and methodologies have been also developed in order to increase the overall system dependability during reconfiguration processes, thus providing safe run-time adaptation mechanisms. In addition, taking into account the target technology and the environments in which the developed hardware accelerators and systems may be employed, dependability issues have been analyzed, leading to the development of a platform for quickly assessing the reliability and characterizing the behavior of hardware accelerators implemented on reconfigurable FPGAs when they are affected by such faults

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

PORTO Publications Open Repository TOrino

NASA Tech Briefs, April 1990

Author
Publication venue
Publication date
Field of study

Topics: New Product Ideas; NASA TU Services; Electronic Components and Circuits; Electronic Systems; Physical Sciences; Materials; Computer Programs; Mechanics; Machinery; Fabrication Technology; Mathematics and Information Sciences

NASA Technical Reports Server