Search CORE

73 research outputs found

Modular Learning and Optimization for Planning of Discrete Event Systems

Author: Hagebring Fredrik
Publication date: 01/01/2021

Optimization of industrial processes, such as manufacturing cells, can have a great impact on their performance. Finding optimal solutions to these large-scale systems is, however, a complex problem. They typically include multiple subsystems, and the search space generally grows exponentially with each subsystem. This is usually referred to as the state explosion problem and is a well-known problem within the control and optimization of automation systems. This thesis proposes two main contributions to improve and simplify the optimization of these systems. The first is a new method of solving these optimization problems using a compositional optimization approach. This integrates optimization with techniques from compositional supervisory control using modular formal models, dividing the optimization of subsystems into separate subproblems. The second is a modular learning approach that alleviates the need for prior knowledge of the systems and for system experts when applying compositional optimization. The key to both techniques is the division of the large system into smaller subsystems and the identification of local behavior in these subsystems, i.e. behavior that is independent of all other subsystems. It is proven in this thesis that this local behavior can be partially optimized individually without affecting the global optimal solution. This is used to reduce the state space in each subsystem and to construct the global optimal solution compositionally. The thesis also shows that the proposed techniques can be integrated to compute global optimal solutions to large-scale optimization problems that are too big to solve with traditional monolithic models.
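To make "local behavior" concrete, here is a minimal Python sketch. It is not the thesis's algorithm, and the dict-based automaton encoding, the event names, and the per-event cost table are all hypothetical. An event counts as local when it appears in exactly one subsystem's alphabet. Shared events must fire jointly in the synchronous composition, which is explored lazily by Dijkstra's algorithm. The toy pruning step drops a costlier local transition that runs in parallel with a cheaper one, in the spirit of optimizing local behavior before composing.

```python
import heapq
from itertools import count

# Hypothetical encoding: an automaton is (initial_state, marked_states, delta),
# where delta maps (state, event) -> next_state. Its alphabet is every event in delta.

def alphabet(automaton):
    return {event for (_, event) in automaton[2]}

def local_events(automata):
    """For each subsystem, the events that no other subsystem shares."""
    alphabets = [alphabet(a) for a in automata]
    return [sigma - set().union(*(alphabets[:i] + alphabets[i + 1:]))
            for i, sigma in enumerate(alphabets)]

def prune_local(automaton, local, cost):
    """Keep only the cheapest local transition between each (source, target) pair.

    Local transitions never synchronize with other subsystems, so dropping a
    costlier parallel one cannot change the global optimum.
    """
    init, marked, delta = automaton
    kept, cheapest = {}, {}
    for (src, event), dst in delta.items():
        if event not in local:
            kept[(src, event)] = dst
        elif (src, dst) not in cheapest or cost[event] < cost[cheapest[(src, dst)]]:
            cheapest[(src, dst)] = event
    for (src, dst), event in cheapest.items():
        kept[(src, event)] = dst
    return init, marked, kept

def successors(automata, state):
    """Yield (event, next_state) in the synchronous composition.

    An event fires only if every subsystem that knows it can take it;
    subsystems whose alphabet lacks the event keep their current state.
    """
    alphabets = [alphabet(a) for a in automata]
    for event in sorted(set().union(*alphabets)):
        nxt = []
        for (_, _, delta), sigma, s in zip(automata, alphabets, state):
            if event not in sigma:
                nxt.append(s)
            elif (s, event) in delta:
                nxt.append(delta[(s, event)])
            else:
                break  # some subsystem blocks this event
        else:
            yield event, tuple(nxt)

def min_cost_to_marked(automata, cost):
    """Dijkstra from the joint initial state to a state where all subsystems are marked."""
    start = tuple(a[0] for a in automata)
    tie = count()  # tie-breaker so states themselves are never compared
    frontier, best = [(0, next(tie), start)], {start: 0}
    while frontier:
        d, _, state = heapq.heappop(frontier)
        if d > best.get(state, float("inf")):
            continue
        if all(s in a[1] for a, s in zip(automata, state)):
            return d
        for event, nxt in successors(automata, state):
            nd = d + cost[event]
            if nd < best.get(nxt, float("inf")):
                best[nxt] = nd
                heapq.heappush(frontier, (nd, next(tie), nxt))
    return None

# Hypothetical two-machine example sharing the event "handover".
m1 = ("idle", {"done"}, {("idle", "a1"): "busy", ("idle", "a1_slow"): "busy",
                         ("busy", "handover"): "done"})
m2 = ("wait", {"fin"}, {("wait", "handover"): "got", ("got", "b2"): "fin"})
cost = {"a1": 2, "a1_slow": 5, "handover": 1, "b2": 3}

local = local_events([m1, m2])
print([sorted(s) for s in local])              # [['a1', 'a1_slow'], ['b2']]
print(min_cost_to_marked([m1, m2], cost))      # 6
pruned = prune_local(m1, local[0], cost)
print(min_cost_to_marked([pruned, m2], cost))  # 6, with a1_slow removed from m1
```

The composition here is still explored monolithically. The point of the compositional approach is to shrink each subsystem first so that this product stays small.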

Chalmers Research

A Concurrency and Time Centered Framework for Certification of Autonomous Space Systems

Author: Dechev Damian

Future space missions, such as the Mars Science Laboratory, call for the engineering of some of the most complex man-rated autonomous software systems. The present process-oriented certification methodologies are becoming prohibitively expensive and are not detailed enough to provide guidelines for the development and validation of concurrent software. Time and concurrency are the most critical notions in an autonomous space system. In this work we present the design and implementation of the first concurrency- and time-centered framework for product-oriented software certification of autonomous space systems. To achieve fast and reliable concurrent interactions, we define and apply the notion of Semantically Enhanced Containers (SEC). SECs are data structures designed to provide the flexibility and usability of the popular ISO C++ STL containers while being hand-crafted to guarantee domain-specific policies, such as conformance to a given concurrency model. The application of nonblocking programming techniques is critical to the implementation of our SEC containers. Lock-free algorithms help avoid the hazards of deadlock, livelock, and priority inversion, and at the same time deliver fast and scalable performance. Practical lock-free algorithms are notoriously difficult to design and implement and pose a number of hard problems, such as ABA avoidance, high complexity, portability, and meeting the linearizability correctness requirements. This dissertation presents the design of the first lock-free dynamically resizable array. Our approach offers a set of practical, portable, lock-free, and linearizable STL vector operations and a fast and space-efficient implementation compared to the alternative lock- and STM-based techniques.
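The ABA hazard is easy to misread, so here is a minimal Python simulation of it. Python has no hardware compare-and-swap, so the hypothetical `AtomicCell` uses a lock only to model the atomicity of a single CAS instruction. It illustrates semantics, not lock freedom. The fix shown is the widely used version-tag technique, in which every successful write bumps a counter stored alongside the value. It is not necessarily the descriptor-based solution presented in the dissertation.

```python
import threading

class AtomicCell:
    """Simulated single-word atomic cell (a lock stands in for hardware CAS)."""

    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()

    def load(self):
        with self._lock:
            return self._value

    def compare_and_swap(self, expected, new):
        with self._lock:
            if self._value == expected:
                self._value = new
                return True
            return False

def plain_cas_scenario():
    """Thread T1 reads A, T2 changes A -> B -> A, then T1's CAS wrongly succeeds."""
    cell = AtomicCell("A")
    seen = cell.load()                        # T1 reads "A", then is preempted
    cell.compare_and_swap("A", "B")           # T2: A -> B
    cell.compare_and_swap("B", "A")           # T2: B -> A
    return cell.compare_and_swap(seen, "C")   # T1 resumes: succeeds despite the change

def tagged_cas_scenario():
    """Same interleaving, but the cell holds (value, version) as one unit."""
    cell = AtomicCell(("A", 0))

    def write(new_value):
        value, version = cell.load()
        return cell.compare_and_swap((value, version), (new_value, version + 1))

    seen = cell.load()                        # T1 reads ("A", 0)
    write("B")                                # T2: ("A", 0) -> ("B", 1)
    write("A")                                # T2: ("B", 1) -> ("A", 2)
    return cell.compare_and_swap(seen, ("C", seen[1] + 1))  # fails: version moved on

print(plain_cas_scenario())   # True  (ABA goes undetected)
print(tagged_cas_scenario())  # False (version tag exposes the intervening writes)
```

On real hardware, the value and tag must fit in one CAS-able word, or a double-width CAS must be used. This is one reason practical designs often turn to descriptors instead.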
Currently, the literature does not offer an explicit analysis of the ABA problem, its relation to the most commonly applied nonblocking programming techniques, or the possibilities for its detection and avoidance. Eliminating the hazards of ABA is left to the ingenuity of the software designer. We present a generic and practical solution to the fundamental ABA problem for lock-free descriptor-based designs. To enable our SEC containers to validate domain-specific invariants, we present Basic Query, our expression template-based library for statically extracting semantic information from C++ source code. The use of static analysis allows a far more efficient implementation of our nonblocking containers than would otherwise have been possible with traditional run-time techniques. Shared data in a real-time cyber-physical system can often be polymorphic (as is the case with a number of components of the Mission Data System's Data Management Services). The use of dynamic cast is important in the design of autonomous real-time systems because the operation allows a direct representation of the management and behavior of polymorphic data. To allow the application of dynamic cast in mission-critical code, we validate and improve a methodology for constant-time dynamic cast that shifts the complexity of the operation to the compiler's static checker. In a case study that demonstrates the applicability of the programming and validation techniques of our certification framework, we show the process of verification and semantic parallelization of the Mission Data System's (MDS) Goal Networks. MDS provides an experimental platform for testing and developing autonomous real-time flight applications.
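Here is one way a dynamic cast check can be made constant-time. A known approach encodes the class hierarchy with prime numbers. Whether it matches the methodology validated here is an assumption, and the `TypeRegistry` class and the class names below are hypothetical. Each class receives a fresh prime, and its type ID is that prime multiplied by its parent's ID (single inheritance only). Because prime factorizations are unique, class C is an ancestor of, or equal to, class D exactly when id(C) divides id(D). The cast check then reduces to one modulo operation.

```python
from itertools import count

def _primes():
    """Yield 2, 3, 5, ... by trial division (adequate for small hierarchies)."""
    found = []
    for n in count(2):
        if all(n % p for p in found):
            found.append(n)
            yield n

class TypeRegistry:
    """Hypothetical registry: type ID = own prime * parent's ID (single inheritance)."""

    def __init__(self):
        self._primes = _primes()
        self.ids = {}

    def register(self, name, parent=None):
        base = self.ids[parent] if parent is not None else 1
        self.ids[name] = next(self._primes) * base
        return self.ids[name]

    def can_cast(self, dynamic_type, target_type):
        # Ancestor-or-self test: a single divisibility check.
        return self.ids[dynamic_type] % self.ids[target_type] == 0

reg = TypeRegistry()
reg.register("Data")                      # 2
reg.register("Telemetry", "Data")         # 3 * 2 = 6
reg.register("Image", "Data")             # 5 * 2 = 10
reg.register("Thermal", "Telemetry")      # 7 * 6 = 42
print(reg.can_cast("Thermal", "Data"))    # True
print(reg.can_cast("Thermal", "Image"))   # False
```

With fixed-width integers, as in C++, the check is O(1). However, the IDs grow multiplicatively with depth, which bounds how deep a hierarchy can be encoded before overflow.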

Texas A&M Repository

Runtime support for load balancing of parallel adaptive and irregular applications

Author: Barker Kevin James
Publication venue: W&M ScholarWorks
Publication date: 01/01/2004

Applications critical to today's engineering research often must make use of the increased memory and processing power of a parallel machine. While advances in architecture design are leading to more and more powerful parallel systems, the software tools needed to realize their full potential are in a much less advanced state. In particular, efficient, robust, and high-performance runtime support software is critical in the area of dynamic load balancing. While the load balancing of loosely synchronous codes, such as field solvers, has been studied extensively for the past 15 years, there exists a class of problems, known as asynchronous and highly adaptive, for which the dynamic load balancing problem remains open. As we discuss, the characteristics of this class of problems render compile-time or static analysis of little benefit and complicate the dynamic load balancing task immensely.
We make two contributions to this area of research. The first is the design and development of a runtime software toolkit, known as the Parallel Runtime Environment for Multi-computer Applications, or PREMA, which provides interprocessor communication, a global namespace, a framework for the implementation of customized scheduling policies, and several such policies that are prevalent in the load balancing literature. The PREMA system is designed to support coarse-grained domain decompositions with the goals of portability, flexibility, and maintainability in mind, so that developers will quickly feel comfortable incorporating it into existing codes and developing new codes that make use of its functionality. We demonstrate that the programming model and implementation are efficient and lead to the development of robust and high-performance applications.
Our second contribution is in the area of performance modeling. In order to make the most effective use of the PREMA runtime software, certain parameters governing its execution must be set off-line. Optimal values for these parameters may be determined through repeated executions of the target application; however, this is not always possible, particularly in large-scale environments and for long-running applications. We present an analytic model that allows the user to quickly and inexpensively predict application performance and fine-tune applications built on the PREMA platform.
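Here is a minimal Python sketch of one scheduling policy common in the load balancing literature, "steal half". It is a hypothetical stand-in, not PREMA's API or one of its actual policies. Processors hold queues of unit-cost tasks. At each step, an idle processor first takes half of the most loaded queue, and then every busy processor runs one task. The trivial bound ceil(total work / processors) is shown only for comparison. It is not the analytic model from the dissertation.

```python
from collections import deque
from math import ceil

def simulate(queues, steal=True):
    """Return (time steps, tasks executed) to drain per-processor queues of unit tasks."""
    queues = [deque(q) for q in queues]
    steps = executed = 0
    while any(queues):
        if steal:
            for q in queues:
                if not q:
                    victim = max(queues, key=len)        # most loaded processor
                    for _ in range(len(victim) // 2):    # never steal a lone last task
                        q.append(victim.pop())           # take from the victim's tail
        for q in queues:
            if q:
                q.popleft()                              # run one task from the head
                executed += 1
        steps += 1
    return steps, executed

work = [list(range(8)), [], [], []]
print(simulate(work, steal=False))   # (8, 8)
print(simulate(work))                # (2, 8)
print(ceil(8 / len(work)))           # 2, the lower bound total_work / processors
```

Stealing from the tail while owners pop from the head mirrors the usual deque discipline, which keeps owners and thieves working at opposite ends of a queue.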

College of William & Mary: W&M Publish

Scalable Synchronization with Mindicators

Author: Liu Yujie
McNamara Logan
Publication venue: Lehigh Preserve

The Mindicator is a shared object that stores one value for each thread in a system and can return the minimum of all threads’ values in constant time. In this paper, we explore applications of the Mindicator in synchronization algorithms. We introduce three new algorithms, designed for scalable Read-Copy-Update (RCU), fair Readers-Writer locking, and Group Mutual Exclusion. Experimental evaluation shows that these algorithms perform well while avoiding contention.
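Here is a sequential, single-threaded Python sketch of the Mindicator's interface: one slot per thread, and a minimum query in constant time. One simple way to get O(1) queries is a tree of partial minima, where updates walk from a leaf to the root in O(log n). The `MinTree` class is hypothetical, and it does not reproduce the concurrent, contention-avoiding structure from the paper.

```python
import math

class MinTree:
    """One slot per thread; query() is O(1), set()/clear() are O(log n).

    Slots hold +inf while a thread publishes no value.
    """

    def __init__(self, num_threads):
        size = 1
        while size < num_threads:
            size *= 2
        self.size = size
        self.tree = [math.inf] * (2 * size)  # leaves live at indices size..2*size-1

    def set(self, thread_id, value):
        i = self.size + thread_id
        self.tree[i] = value
        i //= 2
        while i >= 1:                        # recompute partial minima up to the root
            self.tree[i] = min(self.tree[2 * i], self.tree[2 * i + 1])
            i //= 2

    def clear(self, thread_id):
        self.set(thread_id, math.inf)

    def query(self):
        return self.tree[1]                  # root holds the global minimum

m = MinTree(4)
m.set(0, 7)       # e.g. thread 0 enters a read-side section at epoch 7
m.set(2, 3)       # thread 2 entered earlier, at epoch 3
print(m.query())  # 3
m.clear(2)        # thread 2 leaves
print(m.query())  # 7
```

In an RCU-style use (an illustrative assumption), a writer could treat `query()` as the oldest epoch still observed by a reader and reclaim only objects retired before it.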

Lehigh University: Lehigh Preserve

The Paragraph: Design and Implementation of the STAPL Parallel Task Graph

Author: Thomas Nathan 1977-
Publication date: 11/01/2021

Parallel programming is becoming mainstream due to the increased availability of multiprocessor and multicore architectures and the need to solve larger and more complex problems. Languages and tools available for the development of parallel applications are often difficult to learn and use. The Standard Template Adaptive Parallel Library (STAPL) is being developed to help programmers address these difficulties. STAPL is a parallel C++ library with functionality similar to STL, the ISO-adopted C++ Standard Template Library. STAPL provides a collection of parallel pContainers for data storage and pViews that provide uniform data access operations by abstracting away the details of the pContainer data distribution. Generic pAlgorithms are written in terms of PARAGRAPHs, high-level task graphs expressed as a composition of common parallel patterns. These task graphs define a set of operations on pViews as well as any ordering (i.e., dependences) on these operations that must be enforced by STAPL for a valid execution. The subject of this dissertation is the PARAGRAPH Executor, a framework that manages the runtime instantiation and execution of STAPL PARAGRAPHs. We address several challenges present when using a task graph program representation and discuss a novel approach to dependence specification that allows task graph creation and execution to proceed concurrently. This overlapping increases scalability and reduces the resources required by the PARAGRAPH Executor. We also describe the interface for task specification as well as optimizations that address issues such as data locality. We evaluate the performance of the PARAGRAPH Executor on several parallel machines, including massively parallel Cray XT4 and Cray XE6 systems and an IBM Power5 cluster.
Using tests including generic parallel algorithms, kernels from the NAS NPB suite, and a nuclear particle transport application written in STAPL, we demonstrate that the PARAGRAPH Executor enables STAPL to exhibit good scalability on more than 10^4 processors.
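Here is a minimal Python sketch of dependence-counting task graph execution where tasks may be added after others have already run. This loosely mirrors the idea of overlapping task graph creation with execution. The `TaskGraphExecutor` class and its methods are hypothetical and are not STAPL's PARAGRAPH interface. Each task waits for a count of unfinished predecessors and becomes ready when that count reaches zero. A dependence on a task that has already finished is satisfied immediately.

```python
from collections import deque

class TaskGraphExecutor:
    """Hypothetical dependence-counting executor (not STAPL's API)."""

    def __init__(self):
        self.results = {}      # task id -> result of a finished task
        self.pending = {}      # task id -> (function, predecessor ids)
        self.waiting_on = {}   # task id -> number of unfinished predecessors
        self.successors = {}   # task id -> ids of tasks that consume its result
        self.ready = deque()

    def add_task(self, tid, fn, deps=()):
        deps = list(deps)
        self.pending[tid] = (fn, deps)
        unfinished = [d for d in deps if d not in self.results]
        self.waiting_on[tid] = len(unfinished)
        for d in unfinished:
            self.successors.setdefault(d, []).append(tid)
        if not unfinished:
            self.ready.append(tid)

    def run_ready(self):
        """Run tasks until none are ready; return their ids in execution order."""
        order = []
        while self.ready:
            tid = self.ready.popleft()
            fn, deps = self.pending.pop(tid)
            self.results[tid] = fn(*(self.results[d] for d in deps))
            order.append(tid)
            for s in self.successors.pop(tid, []):
                self.waiting_on[s] -= 1
                if self.waiting_on[s] == 0:
                    self.ready.append(s)
        return order

ex = TaskGraphExecutor()
ex.add_task("load0", lambda: [1, 2])
ex.add_task("load1", lambda: [3, 4])
ex.add_task("sum0", sum, ["load0"])
print(ex.run_ready())           # ['load0', 'load1', 'sum0']
# More of the graph is created after part of it has already executed.
ex.add_task("sum1", sum, ["load1"])
ex.add_task("total", lambda a, b: a + b, ["sum0", "sum1"])
print(ex.run_ready())           # ['sum1', 'total']
print(ex.results["total"])      # 10
```

A parallel executor would hand ready tasks to worker threads or processes. The sequential loop here only shows the dependence bookkeeping.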

Texas A&M Repository

Scaling finite difference methods in large eddy simulation of jet engine noise to the petascale: numerical methods and their efficient and automated implementation

Author: Situ Yingchong
Publication venue: Purdue University (bepress)
Publication date: 01/01/2014

Reduction of jet engine noise has recently become a new arena of competition between aircraft manufacturers. As a relatively new field of research in computational fluid dynamics (CFD), computational aeroacoustics (CAA) prediction of jet engine noise based on large eddy simulation (LES) is a robust and accurate tool that complements the existing theoretical and experimental approaches. In order to satisfy the stringent requirements of CAA on numerical accuracy, finite difference methods in LES-based jet engine noise prediction rely on implicitly formulated compact spatial partial differentiation and spatial filtering schemes, a crucial component of which is an embedded solver for tridiagonal linear systems oriented along each of the three coordinate directions of the computational space. Traditionally, researchers and engineers in CAA have employed manually crafted implementations of solvers, including the transposition method, the multiblock method, and the Schur complement method. Algorithmically, these solvers force a trade-off between numerical accuracy and parallel scalability. From a programming standpoint, implementing them for each of the three coordinate directions is tediously repetitive and error-prone.
In this study, we attempt to tackle both of these challenges. We first describe an accurate and scalable tridiagonal linear system solver as a specialization of the truncated SPIKE algorithm, together with strategies for efficient implementation of the compact spatial partial differentiation and spatial filtering schemes. We then elaborate on two programming models tailored for composing regular grid-based numerical applications, including finite difference-based LES of jet engine noise: one based on generalized elemental subroutines and the other based on functional array programming, along with the accompanying code optimization and generation methodologies.
Through empirical experiments, we demonstrate that truncated SPIKE-based spatial partial differentiation and spatial filtering deliver the theoretically promised optimal scalability under weak scaling conditions and can be implemented using the two programming models with performance on par with handwritten code, while significantly reducing the required programming effort.
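Here is a minimal Python sketch of the serial Thomas algorithm, the kind of per-line tridiagonal solve that compact schemes need and that partitioned solvers such as SPIKE build on within each partition. It does not implement SPIKE's partitioning or the truncated reduced system. It also assumes a diagonally dominant matrix so that no pivoting is required, which is typical of compact finite-difference operators.

```python
def thomas_solve(a, b, c, d):
    """Solve a tridiagonal system by forward elimination and back substitution.

    a: sub-diagonal (a[0] unused), b: main diagonal,
    c: super-diagonal (c[-1] unused), d: right-hand side.
    Assumes diagonal dominance (no pivoting).
    """
    n = len(b)
    cp = [0.0] * n   # modified super-diagonal
    dp = [0.0] * n   # modified right-hand side
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    for i in range(1, n):
        denom = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / denom if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / denom
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

# A = [[4,1,0],[1,4,1],[0,1,4]] with exact solution x = [1, 2, 3], so d = A @ x = [6, 12, 14].
x = thomas_solve([0.0, 1.0, 1.0], [4.0, 4.0, 4.0], [1.0, 1.0, 0.0], [6.0, 12.0, 14.0])
print([round(v, 10) for v in x])
```

The forward sweep is inherently sequential along each line. That dependence is what partitioned methods break up in order to scale across processors.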

Purdue E-Pubs

Performance and Energy Efficiency Optimization in Massively Parallel Systems (Optimización del rendimiento y la eficiencia energética en sistemas masivamente paralelos)

Author: Nozal Raúl
Publication date: 21/01/2022

Heterogeneous systems are becoming increasingly relevant due to their performance and energy efficiency capabilities, and they are present in all types of computing platforms, from embedded devices and servers to HPC nodes in large data centers. Their complexity means that they are usually used under the task paradigm and the host-device programming model. This strongly penalizes accelerator utilization and system energy consumption, and it makes applications difficult to adapt. Co-execution allows all devices to simultaneously compute the same problem, cooperating to consume less time and energy. However, programmers must handle all device management, workload distribution, and code portability between systems, which significantly complicates programming. This thesis offers contributions to improve performance and energy efficiency in these massively parallel systems. The proposals address the following generally conflicting objectives: usability and programmability are improved, while ensuring enhanced system abstraction and extensibility, and at the same time performance, scalability, and energy efficiency are increased. To achieve this, two runtime systems with completely different approaches are proposed. EngineCL, focused on OpenCL and with a high-level API, provides an extensible modular system and favors maximum compatibility between all types of devices.
Its versatility allows it to be adapted to environments for which it was not originally designed, including applications with time-constrained executions or molecular dynamics HPC simulators, such as the one used in an international research center. Considering industrial trends and emphasizing professional applicability, CoexecutorRuntime offers a flexible C++/SYCL-based system that adds co-execution support to oneAPI technology. This runtime brings programmers closer to the problem domain, enabling the exploitation of dynamic adaptive strategies that improve efficiency in all types of applications.
Funding: This PhD has been supported by the Spanish Ministry of Education (FPU16/03299 grant) and by the Spanish Science and Technology Commission under contracts TIN2016-76635-C2-2-R and PID2019-105660RB-C22. This work has also been partially supported by the Mont-Blanc 3: European Scalable and Power Efficient HPC Platform based on Low-Power Embedded Technology project (G.A. No. 671697) from the European Union’s Horizon 2020 Research and Innovation Programme (H2020 Programme). Some activities have also been funded by the Spanish Science and Technology Commission under contract TIN2016-81840-REDT (CAPAP-H6 network). The work in Chapter 4 (Integration II: Hybrid programming models) has been partially performed under the Project HPC-EUROPA3 (INFRAIA-2016-1-730897), with the support of the EC Research Innovation Action under the H2020 Programme. In particular, the author gratefully acknowledges the support of the SPMT Department of the High Performance Computing Center Stuttgart (HLRS).
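Here is a minimal Python sketch of two generic workload-distribution strategies used in co-execution. The first is a static split proportional to each device's measured throughput. The second is a dynamic scheme in which whichever device becomes free next takes a fixed-size chunk. The function names, throughput figures, and chunk size are hypothetical. These are not the APIs or scheduling algorithms of EngineCL or CoexecutorRuntime, and launch and transfer overheads are ignored.

```python
import heapq

def static_split(total_items, throughputs):
    """Split work-items in proportion to device throughput.

    Largest-remainder rounding keeps the shares summing exactly to total_items.
    """
    weight = sum(throughputs)
    exact = [total_items * t / weight for t in throughputs]
    shares = [int(x) for x in exact]
    leftover = total_items - sum(shares)
    by_remainder = sorted(range(len(exact)), key=lambda i: exact[i] - shares[i], reverse=True)
    for i in by_remainder[:leftover]:
        shares[i] += 1
    return shares

def dynamic_schedule(total_items, throughputs, chunk):
    """Simulate chunked co-execution; return (items per device, completion time).

    Throughput is in items per time unit; the next free device takes the next chunk.
    """
    free_at = [(0.0, dev) for dev in range(len(throughputs))]
    heapq.heapify(free_at)
    done = [0] * len(throughputs)
    remaining, finish = total_items, 0.0
    while remaining > 0:
        t, dev = heapq.heappop(free_at)        # device that becomes idle first
        size = min(chunk, remaining)
        remaining -= size
        done[dev] += size
        end = t + size / throughputs[dev]
        finish = max(finish, end)
        heapq.heappush(free_at, (end, dev))
    return done, finish

# Hypothetical CPU (10 items/unit) and GPU (30 items/unit) sharing 120 work-items.
print(static_split(120, [10, 30]))                # [30, 90]
print(dynamic_schedule(120, [10.0, 30.0], 30))    # ([30, 90], 3.0)
```

The static split is only as good as the throughput estimates. The dynamic scheme adapts at run time, at the cost of one scheduling decision per chunk, so chunk size trades overhead against balance.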

UCrea

Wireless Communication System to Support Environmental Monitoring (Sistema de comunicação sem fios de suporte à monitorização ambiental)

Author: Moura Tatiana Filipa Gomes
Publication date: 18/12/2018

Poor indoor air quality in classrooms can lead to decreased student performance and affect the health and comfort of the occupants. The purpose of this dissertation is to deploy a system that supports environmental monitoring through wireless communication technologies and long-range networks. The prototype developed allows continuous measurement of temperature, relative humidity, volatile organic compounds (VOC), air pressure, oxygen, and carbon dioxide. Evaluations were carried out using the LoRaWAN protocol in selected classrooms during the winter semester at the University of Aveiro. The work demonstrates how to collect, integrate, analyse, and visualize real-time air quality data.
Master's degree in Computer and Telematics Engineering
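LoRaWAN application payloads are small, and the permitted size depends on the data rate and region. For this reason, sensor readings are commonly packed into a compact binary format rather than sent as text. Here is a minimal Python sketch of such an encoding for the six quantities listed above. The 12-byte layout, scaling factors, and field order are hypothetical and are not taken from the dissertation.

```python
import struct

# Hypothetical 12-byte uplink layout (big-endian), not from the dissertation:
#   temperature int16  (0.01 degC)   humidity uint16 (0.01 %RH)
#   VOC         uint16 (ppb)         pressure uint16 (0.1 hPa)
#   oxygen      uint16 (0.01 %vol)   CO2      uint16 (ppm)
FORMAT = ">hHHHHH"

def encode(temp_c, rh_pct, voc_ppb, pressure_hpa, o2_pct, co2_ppm):
    """Scale each reading to an integer and pack it into 12 bytes."""
    return struct.pack(FORMAT, round(temp_c * 100), round(rh_pct * 100), voc_ppb,
                       round(pressure_hpa * 10), round(o2_pct * 100), co2_ppm)

def decode(payload):
    """Inverse of encode(): unpack and undo the scaling."""
    t, rh, voc, p, o2, co2 = struct.unpack(FORMAT, payload)
    return {"temperature_c": t / 100, "humidity_pct": rh / 100, "voc_ppb": voc,
            "pressure_hpa": p / 10, "oxygen_pct": o2 / 100, "co2_ppm": co2}

payload = encode(21.37, 45.5, 120, 1013.2, 20.9, 850)
print(len(payload))      # 12
print(decode(payload))
```

Fixed-point scaling keeps the resolution explicit and avoids sending floats. The receiving side must apply the same layout to recover physical units.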

Repositório Institucional da Universidade de Aveiro

Fast Internet-Wide Scanning: A New Security Perspective

Author: Durumeric Zakir

Techniques like passive observation and random sampling let researchers understand many aspects of day-to-day Internet operation, yet these methodologies often focus on popular services or a small demographic of users rather than providing a comprehensive view of the devices and services that constitute the Internet. As the diversity of devices and the role they play in critical infrastructure increase, so does the importance of understanding the dynamics of these hosts and securing them. This dissertation shows how fast Internet-wide scanning provides a near-global perspective of edge hosts that enables researchers to uncover security weaknesses that only emerge at scale. First, I show that it is possible to efficiently scan the IPv4 address space. ZMap, a network scanner specifically architected for large-scale research studies, can survey the entire IPv4 address space from a single machine in under an hour at 97% of the theoretical maximum speed of gigabit Ethernet, with an estimated 98% coverage of publicly available hosts. Building on ZMap, I introduce Censys, a public service that maintains up-to-date and legacy snapshots of the hosts and services running across the public IPv4 address space. Censys enables researchers to efficiently ask a range of security questions. Next, I present four case studies that highlight how Internet-wide scanning can identify new classes of weaknesses that only emerge at scale, uncover unexpected attacks, shed light on previously opaque distributed systems on the Internet, and help understand the impact of consequential vulnerabilities. Finally, I explore how increased contention over IPv4 addresses introduces new challenges for performing large-scale empirical studies. I conclude with suggested directions that the research community needs to consider to retain the degree of visibility that Internet-wide scanning currently provides.
PhD, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies
https://deepblue.lib.umich.edu/bitstream/2027.42/138660/1/zakir_1.pd
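To scan every address once in a pseudorandom order without keeping per-address state, the ZMap paper describes iterating over a cyclic multiplicative group modulo a prime just above 2^32. Here is a minimal Python sketch of that idea using a toy prime. It involves no networking, and the function names, prime, generator, and starting point are illustrative. Repeatedly multiplying by a primitive root g modulo a prime p visits every element of {1, ..., p-1} exactly once before returning to the start. Elements that fall outside the target address space are simply skipped.

```python
def is_primitive_root(g, p):
    """True if g generates the multiplicative group modulo the prime p."""
    phi = p - 1
    factors, n, f = set(), phi, 2
    while f * f <= n:                 # factor p - 1 by trial division
        while n % f == 0:
            factors.add(f)
            n //= f
        f += 1
    if n > 1:
        factors.add(n)
    return all(pow(g, phi // q, p) != 1 for q in factors)

def address_order(space_size, p, g, start):
    """Yield each integer in [0, space_size) exactly once, in pseudorandom order.

    Requires p prime, p > space_size, g a primitive root mod p, and 1 <= start < p.
    """
    x = start
    while True:
        if x <= space_size:
            yield x - 1               # group element 1..space_size -> address 0..space_size-1
        x = x * g % p
        if x == start:                # completed the full cycle of length p - 1
            return

print(is_primitive_root(2, 11))              # True
print(list(address_order(10, 11, 2, 3)))     # [2, 5, 0, 1, 3, 7, 4, 9, 8, 6]
```

The scanner needs to remember only the current group element and the starting point, which is what makes this approach attractive at Internet scale.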

Deep Blue Documents at the University of Michigan