67 research outputs found

    Towards resource-aware computing for task-based runtimes and parallel architectures

    Get PDF
    Current large scale systems show increasing power demands, to the point that it has become a huge strain on facilities and budgets. The increasing restrictions in terms of power consumption of High Performance Computing (HPC) systems and data centers have forced hardware vendors to include power capping capabilities in their commodity processors. Power capping opens up new opportunities for applications to directly manage their power behavior at user level. However, constraining power consumption causes the individual sockets of a parallel system to deliver different performance levels under the same power cap, even when they are equally designed, which is an effect caused by manufacturing variability. Modern chips suffer from heterogeneous power consumption due to manufacturing issues, a problem known as manufacturing or process variability. As a result, systems that do not consider such variability caused by manufacturing issues lead to performance degradations and wasted power. In order to avoid such negative impact, users and system administrators must actively counteract any manufacturing variability. In this thesis we show that parallel systems benefit from taking into account the consequences of manufacturing variability, in terms of both performance and energy efficiency. In order to evaluate our work we have also implemented our own task-based version of the PARSEC benchmark suite. This allows to test our methodology using state-of-the-art parallelization techniques and real world workloads. We present two approaches to mitigate manufacturing variability, by power redistribution at runtime level and by power- and variability-aware job scheduling at system-wide level. A parallel runtime system can be used to effectively deal with this new kind of performance heterogeneity by compensating the uneven effects of power capping. In the context of a NUMA node composed of several multi core sockets, our system is able to optimize the energy and concurrency levels assigned to each socket to maximize performance. Applied transparently within the parallel runtime system, it does not require any programmer interaction like changing the application source code or manually reconfiguring the parallel system. We compare our novel runtime analysis with an offline approach and demonstrate that it can achieve equal performance at a fraction of the cost. The next approach presented in this theis, we show that it is possible to predict the impact of this variability on specific applications by using variability-aware power prediction models. Based on these power models, we propose two job scheduling policies that consider the effects of manufacturing variability for each application and that ensures that power consumption stays under a system wide power budget. We evaluate our policies under different power budgets and traffic scenarios, consisting of both single- and multi-node parallel applications.Los sistemas modernos de gran escala muestran crecientes demandas de energía, hasta el punto de que se ha convertido en una gran presión para las instalaciones y los presupuestos. Las restricciones crecientes de consumo de energía de los sistemas de alto rendimiento (HPC) y los centros de datos han obligado a los proveedores de hardware a incluir capacidades de limitación de energía en sus procesadores. La limitación de energía abre nuevas oportunidades para que las aplicaciones administren directamente su comportamiento de energía a nivel de usuario. Sin embargo, la restricción en el consumo de energía de sockets individuales de un sistema paralelo resulta en diferentes niveles de rendimiento, por el mismo límite de potencia, incluso cuando están diseñados por igual. Esto es un efecto causado durante el proceso de la fabricación. Los chips modernos sufren de un consumo de energía heterogéneo debido a problemas de fabricación, un problema conocido como variabilidad del proceso o fabricación. Como resultado, los sistemas que no consideran este tipo de variabilidad causada por problemas de fabricación conducen a degradaciones del rendimiento y desperdicio de energía. Para evitar dicho impacto negativo, los usuarios y administradores del sistema deben contrarrestar activamente cualquier variabilidad de fabricación. En esta tesis, demostramos que los sistemas paralelos se benefician de tener en cuenta las consecuencias de la variabilidad de la fabricación, tanto en términos de rendimiento como de eficiencia energética. Para evaluar nuestro trabajo, también hemos implementado nuestra propia versión del paquete de aplicaciones de prueba PARSEC, basada en tareas paralelos. Esto permite probar nuestra metodología utilizando técnicas avanzadas de paralelización con cargas de trabajo del mundo real. Presentamos dos enfoques para mitigar la variabilidad de fabricación, mediante la redistribución de la energía a durante la ejecución de las aplicaciones y mediante la programación de trabajos a nivel de todo el sistema. Se puede utilizar un sistema runtime paralelo para tratar con eficacia este nuevo tipo de heterogeneidad de rendimiento, compensando los efectos desiguales de la limitación de potencia. En el contexto de un nodo NUMA compuesto de varios sockets y núcleos, nuestro sistema puede optimizar los niveles de energía y concurrencia asignados a cada socket para maximizar el rendimiento. Aplicado de manera transparente dentro del sistema runtime paralelo, no requiere ninguna interacción del programador como cambiar el código fuente de la aplicación o reconfigurar manualmente el sistema paralelo. Comparamos nuestro novedoso análisis de runtime con los resultados óptimos, obtenidos de una análisis manual exhaustiva, y demostramos que puede lograr el mismo rendimiento a una fracción del costo. El siguiente enfoque presentado en esta tesis, muestra que es posible predecir el impacto de la variabilidad de fabricación en aplicaciones específicas mediante el uso de modelos de predicción de potencia conscientes de la variabilidad. Basados ​​en estos modelos de predicción de energía, proponemos dos políticas de programación de trabajos que consideran los efectos de la variabilidad de fabricación para cada aplicación y que aseguran que el consumo se mantiene bajo un presupuesto de energía de todo el sistema. Evaluamos nuestras políticas con diferentes presupuestos de energía y escenarios de tráfico, que consisten en aplicaciones paralelas que corren en uno o varios nodos.Postprint (published version

    Towards resource-aware computing for task-based runtimes and parallel architectures

    Get PDF
    Current large scale systems show increasing power demands, to the point that it has become a huge strain on facilities and budgets. The increasing restrictions in terms of power consumption of High Performance Computing (HPC) systems and data centers have forced hardware vendors to include power capping capabilities in their commodity processors. Power capping opens up new opportunities for applications to directly manage their power behavior at user level. However, constraining power consumption causes the individual sockets of a parallel system to deliver different performance levels under the same power cap, even when they are equally designed, which is an effect caused by manufacturing variability. Modern chips suffer from heterogeneous power consumption due to manufacturing issues, a problem known as manufacturing or process variability. As a result, systems that do not consider such variability caused by manufacturing issues lead to performance degradations and wasted power. In order to avoid such negative impact, users and system administrators must actively counteract any manufacturing variability. In this thesis we show that parallel systems benefit from taking into account the consequences of manufacturing variability, in terms of both performance and energy efficiency. In order to evaluate our work we have also implemented our own task-based version of the PARSEC benchmark suite. This allows to test our methodology using state-of-the-art parallelization techniques and real world workloads. We present two approaches to mitigate manufacturing variability, by power redistribution at runtime level and by power- and variability-aware job scheduling at system-wide level. A parallel runtime system can be used to effectively deal with this new kind of performance heterogeneity by compensating the uneven effects of power capping. In the context of a NUMA node composed of several multi core sockets, our system is able to optimize the energy and concurrency levels assigned to each socket to maximize performance. Applied transparently within the parallel runtime system, it does not require any programmer interaction like changing the application source code or manually reconfiguring the parallel system. We compare our novel runtime analysis with an offline approach and demonstrate that it can achieve equal performance at a fraction of the cost. The next approach presented in this theis, we show that it is possible to predict the impact of this variability on specific applications by using variability-aware power prediction models. Based on these power models, we propose two job scheduling policies that consider the effects of manufacturing variability for each application and that ensures that power consumption stays under a system wide power budget. We evaluate our policies under different power budgets and traffic scenarios, consisting of both single- and multi-node parallel applications.Los sistemas modernos de gran escala muestran crecientes demandas de energía, hasta el punto de que se ha convertido en una gran presión para las instalaciones y los presupuestos. Las restricciones crecientes de consumo de energía de los sistemas de alto rendimiento (HPC) y los centros de datos han obligado a los proveedores de hardware a incluir capacidades de limitación de energía en sus procesadores. La limitación de energía abre nuevas oportunidades para que las aplicaciones administren directamente su comportamiento de energía a nivel de usuario. Sin embargo, la restricción en el consumo de energía de sockets individuales de un sistema paralelo resulta en diferentes niveles de rendimiento, por el mismo límite de potencia, incluso cuando están diseñados por igual. Esto es un efecto causado durante el proceso de la fabricación. Los chips modernos sufren de un consumo de energía heterogéneo debido a problemas de fabricación, un problema conocido como variabilidad del proceso o fabricación. Como resultado, los sistemas que no consideran este tipo de variabilidad causada por problemas de fabricación conducen a degradaciones del rendimiento y desperdicio de energía. Para evitar dicho impacto negativo, los usuarios y administradores del sistema deben contrarrestar activamente cualquier variabilidad de fabricación. En esta tesis, demostramos que los sistemas paralelos se benefician de tener en cuenta las consecuencias de la variabilidad de la fabricación, tanto en términos de rendimiento como de eficiencia energética. Para evaluar nuestro trabajo, también hemos implementado nuestra propia versión del paquete de aplicaciones de prueba PARSEC, basada en tareas paralelos. Esto permite probar nuestra metodología utilizando técnicas avanzadas de paralelización con cargas de trabajo del mundo real. Presentamos dos enfoques para mitigar la variabilidad de fabricación, mediante la redistribución de la energía a durante la ejecución de las aplicaciones y mediante la programación de trabajos a nivel de todo el sistema. Se puede utilizar un sistema runtime paralelo para tratar con eficacia este nuevo tipo de heterogeneidad de rendimiento, compensando los efectos desiguales de la limitación de potencia. En el contexto de un nodo NUMA compuesto de varios sockets y núcleos, nuestro sistema puede optimizar los niveles de energía y concurrencia asignados a cada socket para maximizar el rendimiento. Aplicado de manera transparente dentro del sistema runtime paralelo, no requiere ninguna interacción del programador como cambiar el código fuente de la aplicación o reconfigurar manualmente el sistema paralelo. Comparamos nuestro novedoso análisis de runtime con los resultados óptimos, obtenidos de una análisis manual exhaustiva, y demostramos que puede lograr el mismo rendimiento a una fracción del costo. El siguiente enfoque presentado en esta tesis, muestra que es posible predecir el impacto de la variabilidad de fabricación en aplicaciones específicas mediante el uso de modelos de predicción de potencia conscientes de la variabilidad. Basados ​​en estos modelos de predicción de energía, proponemos dos políticas de programación de trabajos que consideran los efectos de la variabilidad de fabricación para cada aplicación y que aseguran que el consumo se mantiene bajo un presupuesto de energía de todo el sistema. Evaluamos nuestras políticas con diferentes presupuestos de energía y escenarios de tráfico, que consisten en aplicaciones paralelas que corren en uno o varios nodos

    Practical Real-Time with Look-Ahead Scheduling

    Get PDF
    In my dissertation, I present ATLAS — the Auto-Training Look-Ahead Scheduler. ATLAS improves service to applications with regard to two non-functional properties: timeliness and overload detection. Timeliness is an important requirement to ensure user interface responsiveness and the smoothness of multimedia operations. Overload can occur when applications ask for more computation time than the machine can offer. Interactive systems have to handle overload situations dynamically at runtime. ATLAS provides timely service to applications, accessible through an easy-to-use interface. Deadlines specify timing requirements, workload metrics describe jobs. ATLAS employs machine learning to predict job execution times. Deadline misses are detected before they occur, so applications can react early.:1 Introduction 2 Anatomy of a Desktop Application 3 Real Simple Real-Time 4 Execution Time Prediction 5 System Scheduler 6 Timely Service 7 The Road Ahead Bibliography Inde

    Management And Security Of Multi-Cloud Applications

    Get PDF
    Single cloud management platform technology has reached maturity and is quite successful in information technology applications. Enterprises and application service providers are increasingly adopting a multi-cloud strategy to reduce the risk of cloud service provider lock-in and cloud blackouts and, at the same time, get the benefits like competitive pricing, the flexibility of resource provisioning and better points of presence. Another class of applications that are getting cloud service providers increasingly interested in is the carriers\u27 virtualized network services. However, virtualized carrier services require high levels of availability and performance and impose stringent requirements on cloud services. They necessitate the use of multi-cloud management and innovative techniques for placement and performance management. We consider two classes of distributed applications – the virtual network services and the next generation of healthcare – that would benefit immensely from deployment over multiple clouds. This thesis deals with the design and development of new processes and algorithms to enable these classes of applications. We have evolved a method for optimization of multi-cloud platforms that will pave the way for obtaining optimized placement for both classes of services. The approach that we have followed for placement itself is predictive cost optimized latency controlled virtual resource placement for both types of applications. To improve the availability of virtual network services, we have made innovative use of the machine and deep learning for developing a framework for fault detection and localization. Finally, to secure patient data flowing through the wide expanse of sensors, cloud hierarchy, virtualized network, and visualization domain, we have evolved hierarchical autoencoder models for data in motion between the IoT domain and the multi-cloud domain and within the multi-cloud hierarchy

    The Fifth NASA Symposium on VLSI Design

    Get PDF
    The fifth annual NASA Symposium on VLSI Design had 13 sessions including Radiation Effects, Architectures, Mixed Signal, Design Techniques, Fault Testing, Synthesis, Signal Processing, and other Featured Presentations. The symposium provides insights into developments in VLSI and digital systems which can be used to increase data systems performance. The presentations share insights into next generation advances that will serve as a basis for future VLSI design

    Dependable Embedded Systems

    Get PDF
    This Open Access book introduces readers to many new techniques for enhancing and optimizing reliability in embedded systems, which have emerged particularly within the last five years. This book introduces the most prominent reliability concerns from today’s points of view and roughly recapitulates the progress in the community so far. Unlike other books that focus on a single abstraction level such circuit level or system level alone, the focus of this book is to deal with the different reliability challenges across different levels starting from the physical level all the way to the system level (cross-layer approaches). The book aims at demonstrating how new hardware/software co-design solution can be proposed to ef-fectively mitigate reliability degradation such as transistor aging, processor variation, temperature effects, soft errors, etc. Provides readers with latest insights into novel, cross-layer methods and models with respect to dependability of embedded systems; Describes cross-layer approaches that can leverage reliability through techniques that are pro-actively designed with respect to techniques at other layers; Explains run-time adaptation and concepts/means of self-organization, in order to achieve error resiliency in complex, future many core systems

    Data systems elements technology assessment and system specifications, issue no. 2

    Get PDF
    The ability to satisfy the objectives of future NASA Office of Applications programs is dependent on technology advances in a number of areas of data systems. The hardware and software technology of end-to-end systems (data processing elements through ground processing, dissemination, and presentation) are examined in terms of state of the art, trends, and projected developments in the 1980 to 1985 timeframe. Capability is considered in terms of elements that are either commercially available or that can be implemented from commercially available components with minimal development

    Nova combinação de hardware e de software para veículos de desporto automóvel baseada no processamento directo de funções gráficas

    Get PDF
    Doutoramento em Engenharia EletrónicaThe main motivation for the work presented here began with previously conducted experiments with a programming concept at the time named "Macro". These experiments led to the conviction that it would be possible to build a system of engine control from scratch, which could eliminate many of the current problems of engine management systems in a direct and intrinsic way. It was also hoped that it would minimize the full range of software and hardware needed to make a final and fully functional system. Initially, this paper proposes to make a comprehensive survey of the state of the art in the specific area of software and corresponding hardware of automotive tools and automotive ECUs. Problems arising from such software will be identified, and it will be clear that practically all of these problems stem directly or indirectly from the fact that we continue to make comprehensive use of extremely long and complex "tool chains". Similarly, in the hardware, it will be argued that the problems stem from the extreme complexity and inter-dependency inside processor architectures. The conclusions are presented through an extensive list of "pitfalls" which will be thoroughly enumerated, identified and characterized. Solutions will also be proposed for the various current issues and for the implementation of these same solutions. All this final work will be part of a "proof-of-concept" system called "ECU2010". The central element of this system is the before mentioned "Macro" concept, which is an graphical block representing one of many operations required in a automotive system having arithmetic, logic, filtering, integration, multiplexing functions among others. The end result of the proposed work is a single tool, fully integrated, enabling the development and management of the entire system in one simple visual interface. Part of the presented result relies on a hardware platform fully adapted to the software, as well as enabling high flexibility and scalability in addition to using exactly the same technology for ECU, data logger and peripherals alike. Current systems rely on a mostly evolutionary path, only allowing online calibration of parameters, but never the online alteration of their own automotive functionality algorithms. By contrast, the system developed and described in this thesis had the advantage of following a "clean-slate" approach, whereby everything could be rethought globally. In the end, out of all the system characteristics, "LIVE-Prototyping" is the most relevant feature, allowing the adjustment of automotive algorithms (eg. Injection, ignition, lambda control, etc.) 100% online, keeping the engine constantly working, without ever having to stop or reboot to make such changes. This consequently eliminates any "turnaround delay" typically present in current automotive systems, thereby enhancing the efficiency and handling of such systems.A principal motivação para o trabalho que conduziu a esta tese residiu na constatação de que os actuais métodos de modelação de centralinas automóveis conduzem a significativos problemas de desenvolvimento e manutenção. Como resultado dessa constatação, o objectivo deste trabalho centrou-se no desenvolvimento de um conceito de arquitectura que rompe radicalmente com os modelos state-of-the-art e que assenta num conjunto de conceitos que vieram a ser designados de "Macro" e "Celular ECU". Com este modelo pretendeu-se simultaneamente minimizar a panóplia de software e de hardware necessários à obtenção de uma sistema funcional final. Inicialmente, esta tese propõem-se fazer um levantamento exaustivo do estado da arte na área específica do software e correspondente hardware das ferramentas e centralinas automóveis. Os problemas decorrentes de tal software serão identificados e, dessa identificação deverá ficar claro, que praticamente todos esses problemas têm origem directa ou indirecta no facto de se continuar a fazer um uso exaustivo de "tool chains" extremamente compridas e complexas. De forma semelhante, no hardware, os problemas têm origem na extrema complexidade e inter-dependência das arquitecturas dos processadores. As consequências distribuem-se por uma extensa lista de "pitfalls" que também serão exaustivamente enumeradas, identificadas e caracterizadas. São ainda propostas soluções para os diversos problemas actuais e correspondentes implementações dessas mesmas soluções. Todo este trabalho final faz parte de um sistema "proof-of-concept" designado "ECU2010". O elemento central deste sistema é o já referido conceito de “Macro”, que consiste num bloco gráfico que representa uma de muitas operações necessárias num sistema automóvel, como sejam funções aritméticas, lógicas, de filtragem, de integração, de multiplexagem, entre outras. O resultado final do trabalho proposto assenta numa única ferramenta, totalmente integrada que permite o desenvolvimento e gestão de todo o sistema de forma simples numa única interface visual. Parte do resultado apresentado assenta numa plataforma hardware totalmente adaptada ao software, bem como na elevada flexibilidade e escalabilidade, para além de permitir a utilização de exactamente a mesma tecnologia quer para a centralina, como para o datalogger e para os periféricos. Os sistemas actuais assentam num percurso maioritariamente evolutivo, apenas permitindo a calibração online de parâmetros, mas nunca a alteração online dos próprios algoritmos das funcionalidades automóveis. Pelo contrário, o sistema desenvolvido e descrito nesta tese apresenta a vantagem de seguir um "clean-slate approach", pelo que tudo pode ser globalmente repensado. No final e para além de todas as restantes características, o “LIVE-PROTOTYPING” é a funcionalidade mais relevante, ao permitir alterar algoritmos automóveis (ex: injecção, ignição, controlo lambda, etc.) de forma 100% online, mantendo o motor constantemente a trabalhar e sem nunca ter de o parar ou re-arrancar para efectuar tais alterações. Isto elimina consequentemente qualquer "turnaround delay" tipicamente presente em qualquer sistema automóvel actual, aumentando de forma significativa a eficiência global do sistema e da sua utilização

    NASA Tech Briefs, December 1990

    Get PDF
    Topics: New Product Ideas; NASA TU Services; Electronic Components and Circuits; Electronic Systems; Physical Sciences; Materials; Computer Programs; Mechanics; Machinery; Fabrication Technology; Mathematics and Information Sciences; Life Sciences

    Recent Advances in Embedded Computing, Intelligence and Applications

    Get PDF
    The latest proliferation of Internet of Things deployments and edge computing combined with artificial intelligence has led to new exciting application scenarios, where embedded digital devices are essential enablers. Moreover, new powerful and efficient devices are appearing to cope with workloads formerly reserved for the cloud, such as deep learning. These devices allow processing close to where data are generated, avoiding bottlenecks due to communication limitations. The efficient integration of hardware, software and artificial intelligence capabilities deployed in real sensing contexts empowers the edge intelligence paradigm, which will ultimately contribute to the fostering of the offloading processing functionalities to the edge. In this Special Issue, researchers have contributed nine peer-reviewed papers covering a wide range of topics in the area of edge intelligence. Among them are hardware-accelerated implementations of deep neural networks, IoT platforms for extreme edge computing, neuro-evolvable and neuromorphic machine learning, and embedded recommender systems
    corecore