
    Locality-Adaptive Parallel Hash Joins Using Hardware Transactional Memory

    Previous work [1] has claimed that the best-performing implementation of in-memory hash joins is based on (radix-)partitioning of the build-side input. Indeed, despite the overhead of partitioning, the benefits from increased cache locality and synchronization-free parallelism in the build phase outweigh the costs when the input data is randomly ordered. However, many datasets already exhibit significant spatial locality (i.e., non-randomness) due to the way data items enter the database: through periodic ETL or trickle loading in the form of transactions. In such cases, the first benefit of partitioning, increased locality, is largely irrelevant. In this paper, we demonstrate how hardware transactional memory (HTM) can render the other benefit, freedom from synchronization, irrelevant as well. Specifically, using careful analysis and engineering, we develop an adaptive hash join implementation that outperforms parallel radix-partitioned hash joins as well as sort-merge joins on data with high spatial locality. In addition, we show how, through lightweight (less than 1% overhead) runtime monitoring of the transaction abort rate, our implementation can detect inputs with low spatial locality and dynamically fall back to radix-partitioning of the build-side input. The result is a hash join implementation that is more than 3 times faster than the state of the art on high-locality data and never more than 1% slower otherwise.
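
    The adaptive mechanism lends itself to a compact illustration. Below is a minimal C++ sketch, not the paper's implementation, of an HTM-protected hash-table insert with lightweight abort-rate monitoring; the names, table layout, and 10% threshold are illustrative assumptions, and an RTM-capable CPU is assumed (compile with -mrtm).

        // Sketch: HTM-protected insert with a lock fallback and abort counting.
        #include <immintrin.h>
        #include <atomic>
        #include <vector>

        struct Bucket { std::vector<long> keys; };
        std::vector<Bucket> table(1 << 20);        // power-of-two bucket count

        std::atomic<bool> locked{false};           // single fallback spinlock
        std::atomic<long> attempts{0}, aborts{0};  // lightweight abort-rate monitor

        void insert(long key) {
            attempts.fetch_add(1, std::memory_order_relaxed);
            unsigned status = _xbegin();
            if (status == _XBEGIN_STARTED) {
                if (locked.load(std::memory_order_relaxed))
                    _xabort(0xff);                 // a fallback writer is active
                table[key & (table.size() - 1)].keys.push_back(key);
                _xend();                           // commit: no lock was taken
                return;
            }
            // Transaction aborted (conflict, capacity, ...): count it, take the lock.
            aborts.fetch_add(1, std::memory_order_relaxed);
            while (locked.exchange(true, std::memory_order_acquire)) { /* spin */ }
            table[key & (table.size() - 1)].keys.push_back(key);
            locked.store(false, std::memory_order_release);
        }

        // A sustained high abort rate signals low spatial locality; the adaptive
        // join would then fall back to radix-partitioning the build-side input.
        bool should_fall_back() {
            long a = attempts.load(), x = aborts.load();
            return a > 10000 && x * 10 > a;        // >10% abort rate (illustrative)
        }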

    NPEFix: Automatic Runtime Repair of Null Pointer Exceptions in Java

    Null pointer exceptions, also known as null dereferences, are the number one exception in the field. In this paper, we propose 9 alternative execution semantics to apply when a null pointer exception is about to happen. We implement these alternative execution strategies using code transformation in a tool called NPEfix. We evaluate our prototype implementation on 11 field null dereference bugs and 519 seeded failures and show that NPEfix is able to repair, at runtime, 10 of the 11 bugs and 318 of the 519 seeded failures.
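
    The tool itself transforms Java code; purely to illustrate the idea in this document's single sketch language, here is a C++ rendering of three representative strategies (skip the statement, inject a fresh instance, reuse an existing one). The helper resolve_null and the strategy names are hypothetical, not NPEfix's API.

        #include <iostream>
        #include <string>

        enum class Strategy { Skip, NewInstance, ReuseExisting };

        // Hypothetical helper standing in for the repair code NPEfix injects.
        template <typename T>
        T* resolve_null(T* ptr, Strategy s, T* existing) {
            if (ptr) return ptr;                      // no repair needed
            switch (s) {
                case Strategy::Skip:          return nullptr;   // caller skips the statement
                case Strategy::NewInstance:   return new T();   // inject a fresh, well-typed value
                case Strategy::ReuseExisting: return existing;  // reuse a compatible live object
            }
            return nullptr;
        }

        int main() {
            std::string* s = nullptr;                 // would normally crash on dereference
            std::string backup = "fallback";
            // Instrumented dereference site: apply a repair strategy instead of failing.
            if (std::string* safe = resolve_null(s, Strategy::ReuseExisting, &backup))
                std::cout << safe->size() << '\n';    // runs with the reused instance
            // With Strategy::Skip, the guarded statement is simply not executed.
        }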

    Holistic debugging - enabling instruction set simulation for software quality assurance

    We present holistic debugging, a novel method for observing the execution of complex and distributed software. It builds on an instruction set simulator, which provides reproducible experiments and non-intrusive probing of state in a distributed system. Instruction set simulators, however, only provide low-level information, so a holistic debugger contains a translation framework that maps this information to higher-abstraction-level observation tools, such as source code debuggers. We have created Nornir, a proof-of-concept holistic debugger, built on the simulator Simics. For each observed process in the simulated system, Nornir creates an abstraction translation stack, with virtual machine translators that map machine-level storage contents (e.g., physical memory, registers) provided by Simics to application-level data (e.g., virtual memory contents) by parsing the data structures of operating systems and virtual machines. Nornir includes a modified version of the GNU debugger (GDB), which supports non-intrusive symbolic debugging of distributed applications. Nornir's main interface is a debugger shepherd, a programmable interface that controls multiple debuggers and allows users to coherently inspect the entire state of heterogeneous, distributed applications. It provides a robust observation platform for the construction of new observation tools.
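
    As an illustration of the translation-stack idea, here is a hypothetical C++ sketch of one layer: a translator that uses a parsed page table to present a process's virtual address space on top of the simulator's physical memory. The interfaces are invented for this sketch and are not Nornir's or Simics' actual API.

        #include <cstdint>
        #include <map>
        #include <optional>
        #include <vector>

        // Lowest layer: raw state exposed by the instruction set simulator.
        struct SimulatorState {
            virtual uint8_t read_physical(uint64_t paddr) const = 0;
            virtual ~SimulatorState() = default;
        };

        // One translation layer: a parsed page table presents a process's virtual
        // address space, which a source-level debugger such as GDB can consume.
        class VirtualMemoryTranslator {
        public:
            VirtualMemoryTranslator(const SimulatorState& sim,
                                    std::map<uint64_t, uint64_t> page_table)
                : sim_(sim), pages_(std::move(page_table)) {}

            // Virtual page -> physical page; 4 KiB pages assumed for the sketch.
            std::optional<uint64_t> translate(uint64_t vaddr) const {
                auto it = pages_.find(vaddr >> 12);
                if (it == pages_.end()) return std::nullopt;   // page unmapped
                return (it->second << 12) | (vaddr & 0xfff);
            }

            // Reads never perturb the simulated machine: observation is non-intrusive.
            std::optional<uint8_t> read_virtual(uint64_t vaddr) const {
                if (auto p = translate(vaddr)) return sim_.read_physical(*p);
                return std::nullopt;
            }
        private:
            const SimulatorState& sim_;
            std::map<uint64_t, uint64_t> pages_;  // filled by parsing OS structures
        };

        // Tiny stand-in simulator, for demonstration only.
        struct FakeSim : SimulatorState {
            std::vector<uint8_t> ram = std::vector<uint8_t>(1 << 20, 0x42);
            uint8_t read_physical(uint64_t p) const override { return ram.at(p); }
        };

        int main() {
            FakeSim sim;
            VirtualMemoryTranslator vm(sim, {{0x400, 0x10}});  // vpage 0x400 -> ppage 0x10
            auto byte = vm.read_virtual((0x400ull << 12) | 0x5);
            return byte ? 0 : 1;                               // mapped read succeeds
        }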

    Monatomic phase change memory

    Phase change memory has been developed into a mature technology capable of storing information in a fast and non-volatile way, with potential for neuromorphic computing applications. However, its future impact in electronics depends crucially on how the materials at the core of this technology adapt to the requirements arising from continued scaling towards higher device densities. A common strategy to fine-tune the properties of phase change memory materials, which reaches reasonable thermal stability in optical data storage, relies on mixing precise amounts of different dopants, often resulting in quaternary or even more complicated compounds. Here we show how the simplest material imaginable, a single element (in this case, antimony), can become a valid alternative when confined in extremely small volumes. This compositional simplification eliminates problems related to unwanted deviations from the optimized stoichiometry in the switching volume, which become increasingly pressing as devices are aggressively miniaturized. Removing compositional optimization issues may allow one to capitalize on nanosize effects in information storage.

    Functional programming languages for verification tools: experiences with ML and Haskell

    We compare Haskell with ML as programming languages for verification tools, based on our experience developing TRUTH in Haskell and the Edinburgh Concurrency Workbench (CWB) in ML. We discuss not only technical language features but also the "worlds" of the languages, for example the availability of tools and libraries.

    Outlier Mining Methods Based on Graph Structure Analysis

    Outlier detection in high-dimensional datasets is a fundamental and challenging problem across disciplines that also has practical implications, as removing outliers from the training set improves the performance of machine learning algorithms. While many outlier mining algorithms have been proposed in the literature, they tend to be valid or efficient only for specific types of datasets (time series, images, videos, etc.). Here we propose two methods that can be applied to generic datasets, as long as there is a meaningful measure of distance between pairs of elements of the dataset. Both methods start by defining a graph, where the nodes are the elements of the dataset and the links have associated weights equal to the distances between the nodes. The first method then assigns an outlier score based on the percolation (i.e., the fragmentation) of the graph. The second method uses the popular IsoMap non-linear dimensionality reduction algorithm and assigns an outlier score by comparing the geodesic distances with the distances in the reduced space. We test these algorithms on real and synthetic datasets and show that they either outperform, or perform on par with, other popular outlier detection methods. A main advantage of the percolation method is that it is parameter-free and therefore requires no training; the IsoMap method has two integer parameters, and when they are appropriately selected, it performs similarly to or better than all the other methods tested.
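
    As an illustration, here is one plausible C++ reading of the percolation idea, not the authors' code: edges are added from shortest to longest, and each point is scored by the distance at which it first joins a sufficiently large connected component, so outliers are absorbed late. The 'large' threshold and the use of all pairwise edges are assumptions of this sketch.

        #include <algorithm>
        #include <functional>
        #include <numeric>
        #include <vector>

        struct Edge { int a, b; double d; };

        // Score each of the n points by the edge distance at which it first joins
        // a component of at least 'large' nodes; unabsorbed points get the maximum.
        std::vector<double> percolation_scores(int n, std::vector<Edge> edges,
                                               int large = 10) {
            std::sort(edges.begin(), edges.end(),
                      [](const Edge& x, const Edge& y) { return x.d < y.d; });
            std::vector<int> parent(n), size(n, 1);
            std::iota(parent.begin(), parent.end(), 0);
            std::vector<double> score(n, -1.0);             // -1 = not yet absorbed

            std::function<int(int)> find = [&](int x) {
                return parent[x] == x ? x : parent[x] = find(parent[x]);
            };
            for (const Edge& e : edges) {
                int ra = find(e.a), rb = find(e.b);
                if (ra == rb) continue;
                if (size[ra] < size[rb]) std::swap(ra, rb);
                parent[rb] = ra;
                size[ra] += size[rb];
                if (size[ra] >= large)                      // component is (now) large:
                    for (int v = 0; v < n; ++v)             // mark newly absorbed points
                        if (score[v] < 0.0 && find(v) == ra) score[v] = e.d;
            }
            for (double& s : score)                         // never absorbed: max score
                if (s < 0.0 && !edges.empty()) s = edges.back().d;
            return score;                                   // higher = more outlying
        }

    In practice one would pass the pairwise (or k-nearest-neighbour) distance edges for the n points and flag the highest-scoring points as outliers; the inner marking loop is kept quadratic for clarity rather than efficiency.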

    Automatic aggregation of subtask accesses for nested OpenMP-style tasks

    Task-based programming is a high-performance and productive model for expressing parallelism. Tasks encapsulate work to be executed across multiple cores or offloaded to GPUs, FPGAs, other accelerators or other nodes. In order to maintain parallelism and afford maximum freedom to the scheduler, the task dependency graph should be created in parallel and well in advance of task execution. A key limitation of OpenMP and OmpSs-2 tasking is that a task cannot be created until all its accesses and its descendants' accesses are known. Current approaches work around this limitation either by stopping task creation and execution with a taskwait or by substituting "fake" accesses known as sentinels. This paper proposes the auto clause, which indicates that the task may create subtasks that access unspecified memory regions, or may allocate and return memory at addresses that are not yet known. Unlike approaches using taskwaits, there is no interruption to the concurrent creation and execution of tasks, maintaining parallelism and the scheduler's ability to optimize load balance and data locality. Unlike existing approaches using sentinels, all tasks can be given a precise specification of their own data accesses, so that a single mechanism is used to control task ordering, program data transfers on distributed memory and optimize data locality, e.g. on NUMA systems. The auto clause also provides an incremental path for developing programs with nested tasks, by removing the need for every parent task to carry a complete specification of the accesses of its descendant tasks; this redundant information can be time-consuming and error-prone to describe. We present a straightforward runtime implementation that achieves a 1.4 times speedup for n-body with OmpSs-2@Cluster task offloading to 32 nodes and <4% slowdown for three benchmarks with task offloading to 8 nodes. All code is open source. This research has received funding from the European Union's Horizon 2020/EuroHPC research and innovation programme under grant agreements No 955606 (DEEP-SEA) and 754337 (EuroEXA). It is supported by the Spanish State Research Agency - Ministry of Science and Innovation (contract PID2019-107255GB-C21/MCIN/AEI/10.13039/501100011033 and Ramon y Cajal fellowship RYC2018-025628-I/MCIN/AEI/10.13039/501100011033 and by "ESF Investing in your future"), as well as by the Generalitat de Catalunya (2017-SGR-1414).
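
    To make the proposal concrete, here is a hedged OmpSs-2-style sketch of how the auto clause might look in source code. The syntax is inferred from the description above and the exact spelling may differ; child and the block partitioning are illustrative.

        void child(double* block, long n) {
            for (long i = 0; i < n; ++i) block[i] *= 2.0;   // placeholder work
        }

        void parent(double* data, long n, long blocks) {
            long bs = n / blocks;
            // 'auto' declares that this task's subtasks may access regions that
            // are unknown when the task is created, so no taskwait or sentinel
            // accesses are needed and task creation stays concurrent.
            #pragma oss task auto
            {
                for (long b = 0; b < blocks; ++b) {
                    // Each subtask still declares its precise accesses; the runtime
                    // aggregates them for ordering, transfers and data locality.
                    #pragma oss task inout(data[b * bs; bs])
                    child(data + b * bs, bs);
                }
            }
        }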