53 research outputs found
LIPIcs, Volume 261, ICALP 2023, Complete Volume
LIPIcs, Volume 261, ICALP 2023, Complete Volum
Design and Code Optimization for Systems with Next-generation Racetrack Memories
With the rise of computationally expensive application domains such as machine learning, genomics, and fluids simulation, the quest for performance and energy-efficient computing has gained unprecedented momentum. The significant increase in computing and memory devices in modern systems has resulted in an unsustainable surge in energy consumption, a substantial portion of which is attributed to the memory system. The scaling of conventional memory technologies and their suitability for the next-generation system is also questionable. This has led to the emergence and rise of nonvolatile memory ( NVM ) technologies. Today, in different development stages, several NVM technologies are competing for their rapid access to the market.
Racetrack memory ( RTM ) is one such nonvolatile memory technology that promises SRAM -comparable latency, reduced energy consumption, and unprecedented density compared to other technologies. However, racetrack memory ( RTM ) is sequential in nature, i.e., data in an RTM cell needs to be shifted to an access port before it can be accessed. These shift operations incur performance and energy penalties. An ideal RTM , requiring at most one shift per access, can easily outperform SRAM . However, in the worst-cast shifting scenario, RTM can be an order of magnitude slower than SRAM .
This thesis presents an overview of the RTM device physics, its evolution, strengths and challenges, and its application in the memory subsystem. We develop tools that allow the programmability and modeling of RTM -based systems. For shifts minimization, we propose a set of techniques including optimal, near-optimal, and evolutionary algorithms for efficient scalar and instruction placement in RTMs . For array accesses, we explore schedule and layout transformations that eliminate the longer overhead shifts in RTMs . We present an automatic compilation framework that analyzes static control flow programs and transforms the loop traversal order and memory layout to maximize accesses to consecutive RTM locations and minimize shifts. We develop a simulation framework called RTSim that models various RTM parameters and enables accurate architectural level simulation.
Finally, to demonstrate the RTM potential in non-Von-Neumann in-memory computing paradigms, we exploit its device attributes to implement logic and arithmetic operations. As a concrete use-case, we implement an entire hyperdimensional computing framework in RTM to accelerate the language recognition problem. Our evaluation shows considerable performance and energy improvements compared to conventional Von-Neumann models and state-of-the-art accelerators
Logical methods for the hierarchy of hyperlogics
In this thesis, we develop logical methods for reasoning about hyperproperties. Hyperproperties describe relations between multiple executions of a system. Unlike trace properties, hyperproperties comprise relational properties like noninterference, symmetry, and robustness. While trace properties have been studied extensively, hyperproperties form a relatively new concept that is far from fully understood. We study the expressiveness of various hyperlogics and develop algorithms for their satisfiability and synthesis problems. In the first part, we explore the landscape of hyperlogics based on temporal logics, first-order and second-order logics, and logics with team semantics. We establish that first-order/second-order and temporal hyperlogics span a hierarchy of expressiveness, whereas team logics constitute a radically different way of specifying hyperproperties. Furthermore, we introduce the notion of temporal safety and liveness, from which we obtain fragments of HyperLTL (the most prominent hyperlogic) with a simpler satisfiability problem. In the second part, we develop logics and algorithms for the synthesis of smart contracts. We introduce two extensions of temporal stream logic to express (hyper)properties of infinite-state systems. We study the realizability problem of these logics and define approximations of the problem in LTL and HyperLTL. Based on these approximations, we develop algorithms to construct smart contracts directly from their specifications.In dieser Arbeit beschreiben wir logische Methoden, um ĂŒber Hypereigenschaften zu argumentieren. Hypereigenschaften beschreiben Relationen zwischen mehreren AusfĂŒhrungen eines Systems. Anders als pfadbasierte Eigenschaften können Hypereigenschaften relationale Eigenschaften wie Symmetrie, Robustheit und die Abwesenheit von Informationsfluss ausdrĂŒcken. WĂ€hrend pfadbasierte Eigenschaften in den letzten Jahrzehnten ausfĂŒhrlich erforscht wurden, sind Hypereigenschaften ein relativ neues Konzept, das wir noch nicht vollstĂ€ndig verstehen. Wir untersuchen die AusdrucksmĂ€chtigkeit verschiedener Hyperlogiken und entwickeln ausfĂŒhrbare Algorithmen, um deren ErfĂŒllbarkeits- und Syntheseproblem zu lösen. Im ersten Teil erforschen wir die Landschaft der Hyperlogiken basierend auf temporalen Logiken, Logiken erster und zweiter Ordnung und Logiken mit Teamsemantik. Wir stellen fest, dass temporale Logiken und Logiken erster und zweiter Ordnung eine Hierarchie an AusdrucksmĂ€chtigkeit aufspannen. Teamlogiken hingegen spezifieren Hypereigenschaften auf eine radikal andere Art. Wir fĂŒhren auĂerdem das Konzept von temporalen Sicherheits- und Lebendigkeitseigenschaften ein, durch die Fragmente der bedeutensten Logik HyperLTL entstehen, fĂŒr die das ErfĂŒllbarkeitsproblem einfacher ist. Im zweiten Teil entwickeln wir Logiken und Algorithmen fĂŒr die Synthese digitaler VertrĂ€ge. Wir fĂŒhren zwei Erweiterungen temporaler Stromlogik ein, um (Hyper)eigenschaften in unendlichen Systemen auszudrĂŒcken. Wir untersuchen das Realisierungsproblem dieser Logiken und definieren Approximationen des Problems in LTL und HyperLTL. Basierend auf diesen Approximationen entwickeln wir Algorithmen, die digitale VertrĂ€ge direkt aus einer Spezifikation erstellen
Control-Theoretical Perspective in Feedback-Based Systems Testing
Self-Adaptive Systems (SAS) and Cyber-Physical Systems (CPS) have received significant attention in recent computer engineering research. This is due to their ability to improve the level of autonomy of engineering artefacts. In both cases, this autonomy increase is achieved through feedback. Feedback is the iteration of sens- ing and actuation to respectively acquire knowledge about the current state of said artefacts and steer them toward a desired state or behaviour. In this thesis we dis- cuss the challenges that the introduction of feedback poses on the verification and validation process for such systems, more specifically, on their testing. We highlight three types of new challenges with respect to traditional software testing: alteration of testing input and output definition, and intertwining of components with different nature. Said challenges affect the ways we can define different elements of the test- ing process: coverage criteria, testing set-ups, test-case generation strategies, and oracles in the testing process. This thesis consists of a collection of three papers and contributes to the definition of each of the mentioned testing elements. In terms of coverage criteria for SAS, Paper I proposes the casting of the testing problem, to a semi-infinite optimisation problem. This allows to leverage the Scenario Theory from the field of robust control, and provide a worst-case probabilistic bound on a given performance metric of the system under test. For what concerns the definition of testing set-ups for control-based CPS, Paper II investigates the implications of the use of different abstractions (i.e., the use of implemented or emulated compo- nents) on the significance of the testing. The paper provides evidence that confutes the common assumption present in previous literature on the existence of a hierar- chy among commonly used testing set-ups. Finally, regarding the test-case gener- ation and oracle definition, Paper III defines the problem of stress testing control- based CPS software. We contribute to the generation and identification of stress test cases for such software by proposing a novel test case parametrisation. Leveraging the proposed parametrisation we define metamorphic relations on the expected be- haviour of the system under test. We use said relations for the development of stress testing approach and sanity checks on the testing results
Computer Aided Verification
This open access two-volume set LNCS 10980 and 10981 constitutes the refereed proceedings of the 30th International Conference on Computer Aided Verification, CAV 2018, held in Oxford, UK, in July 2018. The 52 full and 13 tool papers presented together with 3 invited papers and 2 tutorials were carefully reviewed and selected from 215 submissions. The papers cover a wide range of topics and techniques, from algorithmic and logical foundations of verification to practical applications in distributed, networked, cyber-physical, and autonomous systems. They are organized in topical sections on model checking, program analysis using polyhedra, synthesis, learning, runtime verification, hybrid and timed systems, tools, probabilistic systems, static analysis, theory and security, SAT, SMT and decisions procedures, concurrency, and CPS, hardware, industrial applications
Smart Wireless Sensor Networks
The recent development of communication and sensor technology results in the growth of a new attractive and challenging area - wireless sensor networks (WSNs). A wireless sensor network which consists of a large number of sensor nodes is deployed in environmental fields to serve various applications. Facilitated with the ability of wireless communication and intelligent computation, these nodes become smart sensors which do not only perceive ambient physical parameters but also be able to process information, cooperate with each other and self-organize into the network. These new features assist the sensor nodes as well as the network to operate more efficiently in terms of both data acquisition and energy consumption. Special purposes of the applications require design and operation of WSNs different from conventional networks such as the internet. The network design must take into account of the objectives of specific applications. The nature of deployed environment must be considered. The limited of sensor nodesïżœ resources such as memory, computational ability, communication bandwidth and energy source are the challenges in network design. A smart wireless sensor network must be able to deal with these constraints as well as to guarantee the connectivity, coverage, reliability and security of network's operation for a maximized lifetime. This book discusses various aspects of designing such smart wireless sensor networks. Main topics includes: design methodologies, network protocols and algorithms, quality of service management, coverage optimization, time synchronization and security techniques for sensor networks
Optimal program variant generation for hybrid manycore systems
Field Programmable Gate Arrays promise to deliver superior energy efficiency in heterogeneous high performance computing, as compared to multicore CPUs and GPUs. The rate of adoption is however hampered by the relative difficulty of programming FPGAs. High-level synthesis tools such as Xilinx Vivado, Altera OpenCL or Intel's HLS address a large part of the programmability issue by synthesizing a Hardware Description Languages representation from a high-level specification of the application, given in programming languages such as OpenCL C, typically used to program CPUs and GPUs. Although HLS solutions make programming easier, they fail to also lighten the burden of optimization. Application developers must rely on expert knowledge to manually optimize their applications for each target device, meaning that traditional HLS solutions do not offer a solution to the issue of performance portability. This state of fact prompted the development of compiler frameworks such as TyTra that operate at an even higher level of abstraction that is amenable to the use of Design Space Exploration (DSE). With DSE the initial program specification can be seen as the starting location in a search-space of correct-by-construction program transformations. In TyTra the search-space is generated from the transitive-closure of term-level transformations derived from type-level transformations. Compiler frameworks such as TyTra theoretically solve the issue of performance portability by providing a way to automatically generate alternative correct program variants. They however suffer from the very practical issue that the generated space is often too large to fully explore. As a consequence, the globally optimal solution may be overlooked.
In this work we provide a novel solution to issue performance portability by deriving an efficient yet effective DSE strategy for the TyTra compiler framework. We make use of categorical data types to derive categorical semantics for the formal languages that describe the terms, types, cost-performance estimates and their transformations. From these we define a category of interpretations for TyTra applications, from which we derive a DSE strategy that finds the globally optimal transformation sequence in polynomial time. This is achieved by reducing the size of the generated search space. We formally state and prove a theorem for this claim and then show that the polynomial run-time for our DSE strategy has practically negligible coefficients leading to sub-second exploration times for realistic applications
Recommended from our members
Enhancing Usability and Explainability of Data Systems
The recent growth of data science expanded its reach to an ever-growing user base of nonexperts, increasing the need for usability, understandability, and explainability in these systems. Enhancing usability makes data systems accessible to people with different skills and backgrounds alike, leading to democratization of data systems. Furthermore, proper understanding of data and data-driven systems is necessary for the users to trust the function of the systems that learn from data. Finally, data systems should be transparent: when a data system behaves unexpectedly or malfunctions, the users deserve proper explanation of what caused the observed incident. Unfortunately, most existing data systems offer limited usability and support for explanations: these systems are usable only by experts with sound technical skills, and even expert users are hindered by the lack of transparency into the systems\u27 inner workings and functions. The aim of my thesis is to bridge the usability gap between nonexpert users and complex data systems, aid all sort of users, including the expert ones, in data and system understanding, and provide explanations that help reason about unexpected outcomes involving data systems. Specifically, my thesis has the following three goals: (1) enhancing usability of data systems for nonexperts, (2) enable data understanding that can assist users in a variety of tasks such as achieving trust in data-driven machine learning, gaining data understanding, and data cleaning, and (3) explaining causes of unexpected outcomes involving data and data systems.
For enhancing usability, we focus on example-driven user intent discovery. We develop systems based on example-driven interactions in two different settings: querying relational databases and personalized document summarization. Towards data understanding, we develop a new data-profiling primitive that can characterize tuples for which a machine-learned model is likely to produce untrustworthy predictions. We also develop an explanation framework to explain causes of such untrustworthy predictions. Additionally, this new data-profiling primitive enables interactive data cleaning. Finally, we develop two explanation frameworks, tailored to provide explanations in debugging data system components, including the data itself. The explanation frameworks focus on explaining the root cause of a concurrent application\u27s intermittent failure and exposing issues in the data that cause a data-driven system to malfunction
Service Boosters: Library Operating Systems For The Datacenter
Cloud applications are taking an increasingly important place our technology and economic landscape. Consequently, they are subject to stringent performance requirements. High tail latency â percentiles at the tail of the response time distribution â is a threat to these requirements. As little as 0.01% slow requests in one microservice can significantly degrade performance for the entire application. The conventional wisdom is that application-awareness is crucial to design optimized performance management systems, but comes at the cost of maneuverability. Consequently, existing execution environments are often general-purpose and ignore important application features such as the architecture of request processing pipelines or the type of requests being served. These one-size-fits-all solutions are missing crucial information to identify and remove sources of high tail latency. This thesis aims to develop a lightweight execution environment exploiting application semantics to optimize tail performance for cloud services. This system, dubbed Service Boosters, is a library operating system exposing application structure and semantics to the underlying resource management stack. Using Service Boosters, programmers use a generic programming model to build, declare and an-notate their request processing pipeline, while performance engineers can program advanced management strategies. Using Service Boosters, I present three systems, FineLame, PersĂ©phone, and DeDoS, that exploit application awareness to provide real time anomaly detection; tail-tolerant RPC scheduling; and resource harvesting. FineLame leverages awareness of the request processing pipeline to deploy monitoring and anomaly detection probes. Using these, FineLame can detect abnormal requests in-flight whenever they depart from the expected behavior and alerts other resource management modules. Pers Ìephone exploits an understanding of request types to dynamically allocate resources to each type and forbid pathological head-of-line blocking from heavy-tailed workloads, without the need for interrupts. Pers Ìephone is a low overhead solution well suited for microsecond scale workloads. Finally, DeDoS can identify overloaded components and dynamically scale them, harvesting only the resources needed to quench the overload. Service Boosters is a powerful framework to handle tail latency in the datacenter. Service Boosters clearly separates the roles of application development and performance engineering, proposing a general purpose application programming model while enabling the development of specialized resource management modules such as PersĂ©phone and DeDoS
Advances in ILP-based Modulo Scheduling for High-Level Synthesis
In today's heterogenous computing world, field-programmable gate arrays (FPGA) represent the energy-efficient alternative to generic processor cores and graphics accelerators. However, due to their radically different computing model, automatic design methods, such as high-level synthesis (HLS), are needed to harness their full power. HLS raises the abstraction level to behavioural descriptions of algorithms, thus freeing designers from dealing with tedious low-level concerns, and enabling a rapid exploration of different microarchitectures for the same input specification. In an HLS tool, scheduling is the most influential step for the performance of the generated accelerator. Specifically, modulo schedulers enable a pipelined execution, which is a key technique to speed up the computation by extracting more parallelism from the input description.
In this thesis, we make a case for the use of integer linear programming (ILP) as a framework for modulo scheduling approaches.
First, we argue that ILP-based modulo schedulers are practically usable in the HLS context. Secondly, we show that the ILP framework enables a novel approach for the automatic design of FPGA accelerators.
We substantiate the first claim by proposing a new, flexible ILP formulation for the modulo scheduling problem, and evaluate it experimentally with a diverse set of realistic test instances. While solving an ILP may incur an exponential runtime in the worst case, we observe that simple countermeasures, such as setting a time limit, help to contain the practical impact of outlier instances. Furthermore, we present an algorithm to compress problems before the actual scheduling.
An HLS-generated microarchitecture is comprised of operators, i.e. single-purpose functional units such as a floating-point multiplier. Usually, the allocation of operators is determined before scheduling, even though both problems are interdependent. To that end, we investigate an extension of the modulo scheduling problem that combines both concerns in a single model. Based on the extension, we present a novel multi-loop scheduling approach capable of finding the fastest microarchitecture that still fits on a given FPGA device - an optimisation problem that current commercial HLS tools cannot solve. This proves our second claim
- âŠ