210 research outputs found

    Towards Zero-Overhead Disambiguation of Deep Priority Conflicts

    Full text link
    **Context** Context-free grammars are widely used for language prototyping and implementation. They allow formalizing the syntax of domain-specific or general-purpose programming languages concisely and declaratively. However, the natural and concise way of writing a context-free grammar is often ambiguous. Therefore, grammar formalisms support extensions in the form of *declarative disambiguation rules* to specify operator precedence and associativity, solving ambiguities that are caused by the subset of the grammar that corresponds to expressions. **Inquiry** Implementing support for declarative disambiguation within a parser typically comes with one or more of the following limitations in practice: a lack of parsing performance, or a lack of modularity (i.e., disallowing the composition of grammar fragments of potentially different languages). The latter subject is generally addressed by scannerless generalized parsers. We aim to equip scannerless generalized parsers with novel disambiguation methods that are inherently performant, without compromising the concerns of modularity and language composition. **Approach** In this paper, we present a novel low-overhead implementation technique for disambiguating deep associativity and priority conflicts in scannerless generalized parsers with lightweight data-dependency. **Knowledge** Ambiguities with respect to operator precedence and associativity arise from combining the various operators of a language. While *shallow conflicts* can be resolved efficiently by one-level tree patterns, *deep conflicts* require more elaborate techniques, because they can occur arbitrarily nested in a tree. Current state-of-the-art approaches to solving deep priority conflicts come with a severe performance overhead. **Grounding** We evaluated our new approach against state-of-the-art declarative disambiguation mechanisms. By parsing a corpus of popular open-source repositories written in Java and OCaml, we found that our approach yields speedups of up to 1.73x over a grammar rewriting technique when parsing programs with deep priority conflicts--with a modest overhead of 1-2 % when parsing programs without deep conflicts. **Importance** A recent empirical study shows that deep priority conflicts are indeed wide-spread in real-world programs. The study shows that in a corpus of popular OCaml projects on Github, up to 17 % of the source files contain deep priority conflicts. However, there is no solution in the literature that addresses efficient disambiguation of deep priority conflicts, with support for modular and composable syntax definitions

    On Solving Solved Problems

    Get PDF
    Some problems are considered solved by the research community. But are they really and does that mean we should stop investigating them? In this essay, I argue that "solved" problems often only appear solved on the surface, while fundamental open research problems lurk below the surface. It requires dedicated researchers to discover those open problems by applying the existing solutions and putting them to the test

    Decompose and Conquer: Addressing Evasive Errors in Systems on Chip

    Full text link
    Modern computer chips comprise many components, including microprocessor cores, memory modules, on-chip networks, and accelerators. Such system-on-chip (SoC) designs are deployed in a variety of computing devices: from internet-of-things, to smartphones, to personal computers, to data centers. In this dissertation, we discuss evasive errors in SoC designs and how these errors can be addressed efficiently. In particular, we focus on two types of errors: design bugs and permanent faults. Design bugs originate from the limited amount of time allowed for design verification and validation. Thus, they are often found in functional features that are rarely activated. Complete functional verification, which can eliminate design bugs, is extremely time-consuming, thus impractical in modern complex SoC designs. Permanent faults are caused by failures of fragile transistors in nano-scale semiconductor manufacturing processes. Indeed, weak transistors may wear out unexpectedly within the lifespan of the design. Hardware structures that reduce the occurrence of permanent faults incur significant silicon area or performance overheads, thus they are infeasible for most cost-sensitive SoC designs. To tackle and overcome these evasive errors efficiently, we propose to leverage the principle of decomposition to lower the complexity of the software analysis or the hardware structures involved. To this end, we present several decomposition techniques, specific to major SoC components. We first focus on microprocessor cores, by presenting a lightweight bug-masking analysis that decomposes a program into individual instructions to identify if a design bug would be masked by the program's execution. We then move to memory subsystems: there, we offer an efficient memory consistency testing framework to detect buggy memory-ordering behaviors, which decomposes the memory-ordering graph into small components based on incremental differences. We also propose a microarchitectural patching solution for memory subsystem bugs, which augments each core node with a small distributed programmable logic, instead of including a global patching module. In the context of on-chip networks, we propose two routing reconfiguration algorithms that bypass faulty network resources. The first computes short-term routes in a distributed fashion, localized to the fault region. The second decomposes application-aware routing computation into simple routing rules so to quickly find deadlock-free, application-optimized routes in a fault-ridden network. Finally, we consider general accelerator modules in SoC designs. When a system includes many accelerators, there are a variety of interactions among them that must be verified to catch buggy interactions. To this end, we decompose such inter-module communication into basic interaction elements, which can be reassembled into new, interesting tests. Overall, we show that the decomposition of complex software algorithms and hardware structures can significantly reduce overheads: up to three orders of magnitude in the bug-masking analysis and the application-aware routing, approximately 50 times in the routing reconfiguration latency, and 5 times on average in the memory-ordering graph checking. These overhead reductions come with losses in error coverage: 23% undetected bug-masking incidents, 39% non-patchable memory bugs, and occasionally we overlook rare patterns of multiple faults. In this dissertation, we discuss the ideas and their trade-offs, and present future research directions.PHDComputer Science & EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttps://deepblue.lib.umich.edu/bitstream/2027.42/147637/1/doowon_1.pd

    Scalable system software for high performance large-scale applications

    Get PDF
    In the last decades, high-performance large-scale systems have been a fundamental tool for scientific discovery and engineering advances. The sustained growth of supercomputing performance and the concurrent reduction in cost have made this technology available for a large number of scientists and engineers working on many different problems. The design of next-generation supercomputers will include traditional HPC requirements as well as the new requirements to handle data-intensive computations. Data intensive applications will hence play an important role in a variety of fields, and are the current focus of several research trends in HPC. Due to the challenges of scalability and power efficiency, next-generation of supercomputers needs a redesign of the whole software stack. Being at the bottom of the software stack, system software is expected to change drastically to support the upcoming hardware and to meet new application requirements. This PhD thesis addresses the scalability of system software. The thesis start at the Operating System level: first studying general-purpose OS (ex. Linux) and then studying lightweight kernels (ex. CNK). Then, we focus on the runtime system: we implement a runtime system for distributed memory systems that includes many of the system services required by next-generation applications. Finally we focus on hardware features that can be exploited at user-level to improve applications performance, and potentially included into our advanced runtime system. The thesis contributions are the following: Operating System Scalability: We provide an accurate study of the scalability problems of modern Operating Systems for HPC. We design and implement a methodology whereby detailed quantitative information may be obtained for each OS noise event. We validate our approach by comparing it to other well-known standard techniques to analyze OS noise, such FTQ (Fixed Time Quantum). Evaluation of the address translation management for a lightweight kernel: we provide a performance evaluation of different TLB management approaches Âż dynamic memory mapping, static memory mapping with replaceable TLB entries, and static memory mapping with fixed TLB entries (no TLB misses) on a IBM BlueGene/P system. Runtime System Scalability: We show that a runtime system can efficiently incorporate system services and improve scalability for a specific class of applications. We design and implement a full-featured runtime system and programming model to execute irregular appli- cations on a commodity cluster. The runtime library is called Global Memory and Threading library (GMT) and integrates a locality-aware Partitioned Global Address Space communication model with a fork/join program structure. It supports massive lightweight multi-threading, overlapping of communication and computation and small messages aggregation to tolerate network latencies. We compare GMT to other PGAS models, hand-optimized MPI code and custom architectures (Cray XMT) on a set of large scale irregular applications: breadth first search, random walk and concurrent hash map access. Our runtime system shows performance orders of magnitude higher than other solutions on commodity clusters and competitive with custom architectures. User-level Scalability Exploiting Hardware Features: We show the high complexity of low-level hardware optimizations for single applications, as a motivation to incorporate this logic into an adaptive runtime system. We evaluate the effects of controllable hardware-thread priority mechanism that controls the rate at which each hardware-thread decodes instruction on IBM POWER5 and POWER6 processors. Finally, we show how to effectively exploits cache locality and network-on-chip on the Tilera many-core architecture to improve intra-core scalability

    Trumpeting M Dwarfs with CONCH-SHELL: a Catalog of Nearby Cool Host-Stars for Habitable ExopLanets and Life

    Get PDF
    We present an all-sky catalog of 2970 nearby (d≲50d \lesssim 50 pc), bright (J<9J< 9) M- or late K-type dwarf stars, 86% of which have been confirmed by spectroscopy. This catalog will be useful for searches for Earth-size and possibly Earth-like planets by future space-based transit missions and ground-based infrared Doppler radial velocity surveys. Stars were selected from the SUPERBLINK proper motion catalog according to absolute magnitudes, spectra, or a combination of reduced proper motions and photometric colors. From our spectra we determined gravity-sensitive indices, and identified and removed 0.2% of these as interloping hotter or evolved stars. Thirteen percent of the stars exhibit H-alpha emission, an indication of stellar magnetic activity and possible youth. The mean metallicity is [Fe/H] = -0.07 with a standard deviation of 0.22 dex, similar to nearby solar-type stars. We determined stellar effective temperatures by least-squares fitting of spectra to model predictions calibrated by fits to stars with established bolometric temperatures, and estimated radii, luminosities, and masses using empirical relations. Six percent of stars with images from integral field spectra are resolved doubles. We inferred the planet population around M dwarfs using KeplerKepler data and applied this to our catalog to predict detections by future exoplanet surveys.Comment: Accepted to MNRAS 22 figures, 3 tables, 2 electronic tables. Electronic tables are available as links on this pag

    A methodology for hardware-software codesign

    Get PDF
    Thesis (Ph. D.)--Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2013.Cataloged from PDF version of thesis.Includes bibliographical references (pages 150-156).Special purpose hardware is vital to embedded systems as it can simultaneously improve performance while reducing power consumption. The integration of special purpose hardware into applications running in software is difficult for a number of reasons. Some of the difficulty is due to the difference between the models used to program hardware and software, but great effort is also required to coordinate the simultaneous execution of the application running on the microprocessor with the accelerated kernel(s) running in hardware. To further compound the problem, current design methodologies for embedded applications require an early determination of the design partitioning which allows hardware and software to be developed simultaneously, each adhering to a rigid interface contract. This approach is problematic because often a good hardware-software decomposition is not known until deep into the design process. Fixed interfaces and the burden of reimplementation prevent the migration of functionality motivated by repartitioning. This thesis presents a two-part solution to the integration of special purpose hardware into applications running in software. The first part addresses the problem of generating infrastructure for hardware-accelerated applications. We present a methodology in which the application is represented as a dataflow graph and the computation at each node is specified for execution either in software or as specialized hardware using the programmer's language of choice. An interface compiler as been implemented which takes as input the FIFO edges of the graph and generates code to connect all the different parts of the program, including those which communicate across the hardware/software boundary. This methodology, which we demonstrate on an FPGA platform, enables programmers to effectively exploit hardware acceleration without ever leaving the application space. The second part of this thesis presents an implementation of the Bluespec Codesign Language (BCL) to address the difficulty of experimenting with hardware/software partitioning alternatives. Based on guarded atomic actions, BCL can be used to specify both hardware and low-level software. Based on Bluespec SystemVerilog (BSV) for which a hardware compiler by Bluespec Inc. is commercially available, BCL has been augmented with extensions to support more efficient software generation. In BCL, the programmer specifies the entire design, including the partitioning, allowing the compiler to synthesize efficient software and hardware, along with transactors for communication between the partitions. The benefit of using a single language to express the entire design is that a programmer can easily experiment with many different hardware/software decompositions without needing to re-write the application code. Used together, the BCL and interface compilers represent a comprehensive solution to the task of integrating specialized hardware into an application.by Myron King.Ph.D
    • …
    corecore