4,527 research outputs found

    Virtus project: A scalable aggregation platform for the intelligent virtual management of distributed energy resources

    Get PDF
    open5noVIRTUS Project, funded by Cassa per i Servizi Energetici e Ambientali (Fund for Energy and Environmental Services-CSEA)-Project code CCSEB_00094.The VIRTUS project aims to create a Virtual Power Plant (VPP) prototype coordinating the Distributed Energy Resources (DERs) of the power system and providing services to the system operators and the various players of the electricity markets, with a particular focus on the industrial sector agents. The VPP will be able to manage a significant number of DERs and simulate realistic plants, components, and market data to study different operating conditions and the future impact of the policy changes of the Balancing Markets (BM). This paper describes the project’s aim, the general structure of the proposed framework, and its optimization and simulation modules. Then, we assess the scalability of the optimization module, designed to provide the maximum possible flexibility to the system operators, exploiting the simulation module of the VPP.openBianchi S.; De Filippo A.; Magnani S.; Mosaico G.; Silvestro F.Bianchi S.; De Filippo A.; Magnani S.; Mosaico G.; Silvestro F

    A scalable multi-core architecture with heterogeneous memory structures for Dynamic Neuromorphic Asynchronous Processors (DYNAPs)

    Full text link
    Neuromorphic computing systems comprise networks of neurons that use asynchronous events for both computation and communication. This type of representation offers several advantages in terms of bandwidth and power consumption in neuromorphic electronic systems. However, managing the traffic of asynchronous events in large scale systems is a daunting task, both in terms of circuit complexity and memory requirements. Here we present a novel routing methodology that employs both hierarchical and mesh routing strategies and combines heterogeneous memory structures for minimizing both memory requirements and latency, while maximizing programming flexibility to support a wide range of event-based neural network architectures, through parameter configuration. We validated the proposed scheme in a prototype multi-core neuromorphic processor chip that employs hybrid analog/digital circuits for emulating synapse and neuron dynamics together with asynchronous digital circuits for managing the address-event traffic. We present a theoretical analysis of the proposed connectivity scheme, describe the methods and circuits used to implement such scheme, and characterize the prototype chip. Finally, we demonstrate the use of the neuromorphic processor with a convolutional neural network for the real-time classification of visual symbols being flashed to a dynamic vision sensor (DVS) at high speed.Comment: 17 pages, 14 figure

    Exploiting asymmetric multi-core systems with flexible system software

    Get PDF
    Asymmetric multi-cores (AMCs) are a successful architectural solution for both mobile devices and supercomputers. These architectures combine different types of processing cores designed at different performance and power optimization points, thus exposing a performance-power trade-off. By maintaining two types of cores, AMCs are able to provide high performance under the facility power budget. However, there are significant challenges when using AMCs such as scheduling and load balancing. This thesis initially explores the potential of AMCs when executing current HPC applications and searches for the most appropriate execution model. Specifically we evaluate several execution models on an Arm big.LITTLE AMC using the PARSEC benchmark suite that includes representative HPC applications. We compare schedulers at the user, OS and runtime system levels, using both static and dynamic options and multiple configurations, and assess the impact of these options on the well-known problem of balancing the load across AMCs. Our results demonstrate that scheduling is more effective when it takes place in the runtime system as it improves the user-level scheduling by 23%, while the heterogeneous-aware OS scheduling solution improves the user-level scheduling by 10%. Following this outcome, this thesis focuses on increasing performance of AMC systems by improving scheduling in the runtime system level. Scheduling in the runtime system level is provided by the use of task-based parallel programming models. These programming models offer programming flexibility as they consist of an interface and a runtime system to manage the underlying resources and threads. In this thesis we improve scheduling with task-based programming models by providing three novel task schedulers for AMCs. These dynamic scheduling policies reduce total execution time either by detecting the longest or the critical path of the dynamic task dependency graph of the application. They use dynamic scheduling and information discoverable during execution, fact that makes them implementable and functional without the need of off-line profiling. In our evaluation we compare these scheduling approaches with an existing state-of the art heterogeneous scheduler and we track their improvement over a FIFO baseline scheduler. We show that the heterogeneous schedulers improve the baseline by up to 1.45x on a real 8-core AMC and up to 2.1x on a simulated 32-core AMC. Another enhancement we provide in task-based programming models is the adaptability to fine grained parallelism. The increasing number of cores on modern CMPs is pushing research towards the use of fine grained workloads, which is an important challenge for task-based programming models. Our study makes the observation that task creation becomes a bottleneck when executing fine grained workloads with task-based programming models. As the number of cores increases, the time spent generating tasks is becoming more critical to the entire execution. To overcome this issue, we propose TaskGenX. TaskGenX minimizes task creation overheads and relies both on the runtime system and a dedicated hardware. On the runtime system side, TaskGenX decouples the task creation from the other runtime activities. It then transfers this part of the runtime to a specialized hardware. From our evaluation using 11 HPC workloads on both symmetric and AMC systems, we obtain performance improvements up to 15x, averaging to 3.1x over the baseline. Finally, this thesis presents a showcase for a real-time CPU scheduler with the goal to increase the frames per second (FPS) of the game-play on mobile devices with AMC systems. We design and implement the RTS scheduler in the Android framework. RTS provides an efficient scheduling policy that takes into account the current temperature of the system to perform task migration. RTS solution increases the median FPS of the baseline mechanisms by up to 7.5% and at the same time it maintains temperature stable.Los procesadores multinúcleos asimétricos (AMC) son una solución arquitectónica exitosa para dispositivos móviles y supercomputadores. Estas arquitecturas combinan diferentes tipos de núcleos de procesamiento diseñados con diferentes propiedades de rendimiento y potencia. Al mantener dos o más tipos de núcleos, los AMCs pueden proporcionar un alto rendimiento con un consumo bajo de energía de las infraestructuras. Sin embargo, existen importantes desafíos al usar los AMC, como la programación y el equilibrio de carga. Esta tesis explora inicialmente el potencial de los AMC al ejecutar aplicaciones actuales de Computacion de Alto Rendimiento (HPC) y busca el modelo de ejecución más apropiado para ellas. Específicamente evaluamos varios modelos de ejecución en un procesador asimétrico Arm big.LITTLE utilizando las aplicaciones PARSEC que son aplicaciones representativas de HPC. En este trabajo se compara la programación en los niveles de usuario, sistema operativo y librería y evaluamos el impacto de estas opciones en el conocido problema de equilibrar la carga entre los AMCs. Nuestros resultados demuestran que la programación es más efectiva cuando se lleva a cabo en el nivel del runtime, ya que mejora la programación del nivel de usuario en un 23%, mientras que la solución de programación del sistema operativo heterogéneo mejora la programación del nivel de usuario en un 10%. Siguiendo este resultado, esta tesis se centra en aumentar el rendimiento de los sistemas AMC mejorando la programación al nivel de librería. La programación en este nivel se proporciona mediante el uso de Modelos de Programación Paralelos Basados en Tareas (MPBT). Estos modelos de programación ofrecen flexibilidad de programación, ya que consisten en una interfaz y un runtime para administrar los recursos e hilos subyacentes. En esta tesis, mejoramos la programación con MPBT al proporcionar tres nuevos planificadores de tareas para AMCs. Estos planificadores dinámicos reducen el tiempo total de ejecución ya sea detectando la camino más largo o el camino crítico del grafo de dependencia de tareas de la aplicación, que es generado dinámicamente. En nuestra evaluación, comparamos estos planificadores con un planificador heterogéneo existente y demonstramos su mejora sobre un planificador FIFO. Mostramos que los planificadores heterogéneos mejoran el planificador FIFO en hasta 1.45x en un AMC real de 8 núcleos y hasta 2.1x en un AMC simulado de 32 núcleos. Otra contribución en los MPBT es la adaptabilidad al paralelismo de grano fino. El creciente número de núcleos en los chip multinúcleos modernos está empujando la investigación hacia el uso de cargas de trabajo de grano fino, que es un desafío importante para los MPBT. Nuestro estudio observa que la creación de tareas bloquea la ejecución con cargas de trabajo de grano fino con MPBT. Cuando el número de núcleos aumenta, el tiempo empleado en generar tareas pasa a ser más crítico para toda la ejecución. Nuestra solución es TaskGenX, que minimiza los costes de creación de tareas y se basa en una extensión del runtime y en un hardware dedicado. En el runtime, TaskGenX desacopla la creación de tareas de las otras actividades del runtime, ejecutando esta actividad en un hardware especializado. Evaluamos 11 aplicaciones de HPC con TaskGenX en sistemas simétricos y AMC y obtenemos mejoras de rendimiento de hasta 15x, con un promedio de 3.1x sobre la implementación de referencia. Finalmente, esta tesis presenta un planificador de CPU con el objetivo de aumentar los fotogramas por segundo (FPS) para juegos en dispositivos móviles con sistemas AMC. Diseñamos e implementamos el planificador de Real-Time Scheduler (RTS) en Android. El RTS proporciona una política de programación eficiente que tiene en cuenta la temperatura actual del sistema para realizar la migración de tareas. La solución RTS aumenta la FPS mediana de los mecanismos de referenci

    Exploiting asymmetric multi-core systems with flexible system software

    Get PDF
    Asymmetric multi-cores (AMCs) are a successful architectural solution for both mobile devices and supercomputers. These architectures combine different types of processing cores designed at different performance and power optimization points, thus exposing a performance-power trade-off. By maintaining two types of cores, AMCs are able to provide high performance under the facility power budget. However, there are significant challenges when using AMCs such as scheduling and load balancing. This thesis initially explores the potential of AMCs when executing current HPC applications and searches for the most appropriate execution model. Specifically we evaluate several execution models on an Arm big.LITTLE AMC using the PARSEC benchmark suite that includes representative HPC applications. We compare schedulers at the user, OS and runtime system levels, using both static and dynamic options and multiple configurations, and assess the impact of these options on the well-known problem of balancing the load across AMCs. Our results demonstrate that scheduling is more effective when it takes place in the runtime system as it improves the user-level scheduling by 23%, while the heterogeneous-aware OS scheduling solution improves the user-level scheduling by 10%. Following this outcome, this thesis focuses on increasing performance of AMC systems by improving scheduling in the runtime system level. Scheduling in the runtime system level is provided by the use of task-based parallel programming models. These programming models offer programming flexibility as they consist of an interface and a runtime system to manage the underlying resources and threads. In this thesis we improve scheduling with task-based programming models by providing three novel task schedulers for AMCs. These dynamic scheduling policies reduce total execution time either by detecting the longest or the critical path of the dynamic task dependency graph of the application. They use dynamic scheduling and information discoverable during execution, fact that makes them implementable and functional without the need of off-line profiling. In our evaluation we compare these scheduling approaches with an existing state-of the art heterogeneous scheduler and we track their improvement over a FIFO baseline scheduler. We show that the heterogeneous schedulers improve the baseline by up to 1.45x on a real 8-core AMC and up to 2.1x on a simulated 32-core AMC. Another enhancement we provide in task-based programming models is the adaptability to fine grained parallelism. The increasing number of cores on modern CMPs is pushing research towards the use of fine grained workloads, which is an important challenge for task-based programming models. Our study makes the observation that task creation becomes a bottleneck when executing fine grained workloads with task-based programming models. As the number of cores increases, the time spent generating tasks is becoming more critical to the entire execution. To overcome this issue, we propose TaskGenX. TaskGenX minimizes task creation overheads and relies both on the runtime system and a dedicated hardware. On the runtime system side, TaskGenX decouples the task creation from the other runtime activities. It then transfers this part of the runtime to a specialized hardware. From our evaluation using 11 HPC workloads on both symmetric and AMC systems, we obtain performance improvements up to 15x, averaging to 3.1x over the baseline. Finally, this thesis presents a showcase for a real-time CPU scheduler with the goal to increase the frames per second (FPS) of the game-play on mobile devices with AMC systems. We design and implement the RTS scheduler in the Android framework. RTS provides an efficient scheduling policy that takes into account the current temperature of the system to perform task migration. RTS solution increases the median FPS of the baseline mechanisms by up to 7.5% and at the same time it maintains temperature stable.Los procesadores multinúcleos asimétricos (AMC) son una solución arquitectónica exitosa para dispositivos móviles y supercomputadores. Estas arquitecturas combinan diferentes tipos de núcleos de procesamiento diseñados con diferentes propiedades de rendimiento y potencia. Al mantener dos o más tipos de núcleos, los AMCs pueden proporcionar un alto rendimiento con un consumo bajo de energía de las infraestructuras. Sin embargo, existen importantes desafíos al usar los AMC, como la programación y el equilibrio de carga. Esta tesis explora inicialmente el potencial de los AMC al ejecutar aplicaciones actuales de Computacion de Alto Rendimiento (HPC) y busca el modelo de ejecución más apropiado para ellas. Específicamente evaluamos varios modelos de ejecución en un procesador asimétrico Arm big.LITTLE utilizando las aplicaciones PARSEC que son aplicaciones representativas de HPC. En este trabajo se compara la programación en los niveles de usuario, sistema operativo y librería y evaluamos el impacto de estas opciones en el conocido problema de equilibrar la carga entre los AMCs. Nuestros resultados demuestran que la programación es más efectiva cuando se lleva a cabo en el nivel del runtime, ya que mejora la programación del nivel de usuario en un 23%, mientras que la solución de programación del sistema operativo heterogéneo mejora la programación del nivel de usuario en un 10%. Siguiendo este resultado, esta tesis se centra en aumentar el rendimiento de los sistemas AMC mejorando la programación al nivel de librería. La programación en este nivel se proporciona mediante el uso de Modelos de Programación Paralelos Basados en Tareas (MPBT). Estos modelos de programación ofrecen flexibilidad de programación, ya que consisten en una interfaz y un runtime para administrar los recursos e hilos subyacentes. En esta tesis, mejoramos la programación con MPBT al proporcionar tres nuevos planificadores de tareas para AMCs. Estos planificadores dinámicos reducen el tiempo total de ejecución ya sea detectando la camino más largo o el camino crítico del grafo de dependencia de tareas de la aplicación, que es generado dinámicamente. En nuestra evaluación, comparamos estos planificadores con un planificador heterogéneo existente y demonstramos su mejora sobre un planificador FIFO. Mostramos que los planificadores heterogéneos mejoran el planificador FIFO en hasta 1.45x en un AMC real de 8 núcleos y hasta 2.1x en un AMC simulado de 32 núcleos. Otra contribución en los MPBT es la adaptabilidad al paralelismo de grano fino. El creciente número de núcleos en los chip multinúcleos modernos está empujando la investigación hacia el uso de cargas de trabajo de grano fino, que es un desafío importante para los MPBT. Nuestro estudio observa que la creación de tareas bloquea la ejecución con cargas de trabajo de grano fino con MPBT. Cuando el número de núcleos aumenta, el tiempo empleado en generar tareas pasa a ser más crítico para toda la ejecución. Nuestra solución es TaskGenX, que minimiza los costes de creación de tareas y se basa en una extensión del runtime y en un hardware dedicado. En el runtime, TaskGenX desacopla la creación de tareas de las otras actividades del runtime, ejecutando esta actividad en un hardware especializado. Evaluamos 11 aplicaciones de HPC con TaskGenX en sistemas simétricos y AMC y obtenemos mejoras de rendimiento de hasta 15x, con un promedio de 3.1x sobre la implementación de referencia. Finalmente, esta tesis presenta un planificador de CPU con el objetivo de aumentar los fotogramas por segundo (FPS) para juegos en dispositivos móviles con sistemas AMC. Diseñamos e implementamos el planificador de Real-Time Scheduler (RTS) en Android. El RTS proporciona una política de programación eficiente que tiene en cuenta la temperatura actual del sistema para realizar la migración de tareas. La solución RTS aumenta la FPS mediana de los mecanismos de referenciaPostprint (published version

    A mathematical programming approach for resource allocation of data analysis workflows on heterogeneous clusters

    Get PDF
    Scientific communities are motivated to schedule their large-scale data analysis workflows in heterogeneous cluster environments because of privacy and financial issues. In such environments containing considerably diverse resources, efficient resource allocation approaches are essential for reaching high performance. Accordingly, this research addresses the scheduling problem of workflows with bag-of-task form to minimize total runtime (makespan). To this aim, we develop a mixed-integer linear programming model (MILP). The proposed model contains binary decision variables determining which tasks should be assigned to which nodes. Also, it contains linear constraints to fulfill the tasks requirements such as memory and scheduling policy. Comparative results show that our approach outperforms related approaches in most cases. As part of the post-optimality analysis, some secondary preferences are imposed on the proposed model to obtain the most preferred optimal solution. We analyze the relaxation of the makespan in the hope of significantly reducing the number of consumed nodes

    A Framework for Flexible Loads Aggregation

    Get PDF
    L'abstract è presente nell'allegato / the abstract is in the attachmen

    Smart handoff technique for internet of vehicles communication using dynamic edge-backup node

    Get PDF
    © 2020 The Authors. Published by MDPI. This is an open access article available under a Creative Commons licence. The published version can be accessed at the following link on the publisher’s website: https://doi.org/10.3390/electronics9030524A vehicular adhoc network (VANET) recently emerged in the the Internet of Vehicles (IoV); it involves the computational processing of moving vehicles. Nowadays, IoV has turned into an interesting field of research as vehicles can be equipped with processors, sensors, and communication devices. IoV gives rise to handoff, which involves changing the connection points during the online communication session. This presents a major challenge for which many standardized solutions are recommended. Although there are various proposed techniques and methods to support seamless handover procedure in IoV, there are still some open research issues, such as unavoidable packet loss rate and latency. On the other hand, the emerged concept of edge mobile computing has gained crucial attention by researchers that could help in reducing computational complexities and decreasing communication delay. Hence, this paper specifically studies the handoff challenges in cluster based handoff using new concept of dynamic edge-backup node. The outcomes are evaluated and contrasted with the network mobility method, our proposed technique, and other cluster-based technologies. The results show that coherence in communication during the handoff method can be upgraded, enhanced, and improved utilizing the proposed technique.Published onlin

    Alternative Processor within Threshold: Flexible Scheduling on Heterogeneous Systems

    Get PDF
    Computing systems have become increasingly heterogeneous contributing to higher performance and power efficiency. However, this is at the cost of increasing the overall complexity of designing such systems. One key challenge in the design of heterogeneous systems is the efficient scheduling of computational load. To address this challenge, this paper thoroughly analyzes state of the art scheduling policies and proposes a new dynamic scheduling heuristic: Alternative Processor within Threshold (APT). This heuristic uses a flexibility factor to attain efficient usage of the available hardware resources, taking advantage of the degree of heterogeneity of the system. In a GPU-CPU-FPGA system, tested on workloads with and without data dependencies, this approach improved overall execution time by 16% and 18% when compared to the second-best heuristic
    corecore