3,739 research outputs found

    Locality-Adaptive Parallel Hash Joins Using Hardware Transactional Memory

    Get PDF
    Previous work [1] has claimed that the best performing implementation of in-memory hash joins is based on (radix-)partitioning of the build-side input. Indeed, despite the overhead of partitioning, the benefits from increased cache-locality and synchronization free parallelism in the build-phase outweigh the costs when the input data is randomly ordered. However, many datasets already exhibit significant spatial locality (i.e., non-randomness) due to the way data items enter the database: through periodic ETL or trickle loaded in the form of transactions. In such cases, the first benefit of partitioning — increased locality — is largely irrelevant. In this paper, we demonstrate how hardware transactional memory (HTM) can render the other benefit, freedom from synchronization, irrelevant as well. Specifically, using careful analysis and engineering, we develop an adaptive hash join implementation that outperforms parallel radix-partitioned hash joins as well as sort-merge joins on data with high spatial locality. In addition, we show how, through lightweight (less than 1% overhead) runtime monitoring of the transaction abort rate, our implementation can detect inputs with low spatial locality and dynamically fall back to radix-partitioning of the build-side input. The result is a hash join implementation that is more than 3 times faster than the state-of-the-art on high-locality data and never more than 1% slower

    Towards a High-Level Implementation of Execution Primitives for Unrestricted, Independent And-Parallelism

    Get PDF
    Most efficient implementations of parallel logic programming rely on complex low-level machinery which is arguably difficult to implement and modify. We explore an alternative approach aimed at taming that complexity by raising core parts of the implementation to the source language level for the particular case of and-parallellism. We handle a significant portion of the parallel implementation at the Prolog level with the help of a comparatively small number of concurrency.related primitives which take case of lower-level tasks such as locking, thread management, stack set management, etc. The approach does not eliminate altogether modifications to the abstract machine, but it does greatly simplify them and it also facilitates experimenting with different alternatives. We show how this approach allows implementing both restricted and unrestricted (i.e., non fork-join) parallelism. Preliminary esperiments show thay the performance safcrifieced is reasonable, although granularity of unrestricted parallelism contributes to better observed speedups

    A Type-Safe Model of Adaptive Object Groups

    Full text link
    Services are autonomous, self-describing, technology-neutral software units that can be described, published, discovered, and composed into software applications at runtime. Designing software services and composing services in order to form applications or composite services requires abstractions beyond those found in typical object-oriented programming languages. This paper explores service-oriented abstractions such as service adaptation, discovery, and querying in an object-oriented setting. We develop a formal model of adaptive object-oriented groups which offer services to their environment. These groups fit directly into the object-oriented paradigm in the sense that they can be dynamically created, they have an identity, and they can receive method calls. In contrast to objects, groups are not used for structuring code. A group exports its services through interfaces and relies on objects to implement these services. Objects may join or leave different groups. Groups may dynamically export new interfaces, they support service discovery, and they can be queried at runtime for the interfaces they support. We define an operational semantics and a static type system for this model of adaptive object groups, and show that well-typed programs do not cause method-not-understood errors at runtime.Comment: In Proceedings FOCLASA 2012, arXiv:1208.432

    Time series analysis acceleration with advanced vectorization extensions

    Get PDF
    Time series analysis is an important research topic and a key step in monitoring and predicting events in many fields. Recently, the Matrix Profile method, and particularly two of its Euclidean-distance-based implementations—SCRIMP and SCAMP—have become the state-of-the-art approaches in this field. Those algorithms bring the possibility of obtaining exact motifs and discords from a time series, which can be used to infer events, predict outcomes, detect anomalies and more. While matrix profile is embarrassingly parallelizable, we find that auto-vectorization techniques fail to fully exploit the SIMD capabilities of modern CPU architectures. In this paper, we develop custom-vectorized SCRIMP and SCAMP implementations based on AVX2 and AVX-512 extensions, which we combine with multithreading techniques aimed at exploiting the potential of the underneath architectures. Our experimental evaluation, conducted using real data, shows a performance improvement of more than 4× with respect to the auto-vectorization.This work has been supported by the Government of Spain under project PID2019-105396RB-I00, and Junta de Andalucía under projects P18-FR-3433, and UMA18-FEDERJA-197.Peer ReviewedPostprint (published version

    The Raincore Distributed Session Service for Networking Elements

    Get PDF
    Motivated by the explosive growth of the Internet, we study efficient and fault-tolerant distributed session layer protocols for networking elements. These protocols are designed to enable a network cluster to share the state information necessary for balancing network traffic and computation load among a group of networking elements. In addition, in the presence of failures, they allow network traffic to fail-over from failed networking elements to healthy ones. To maximize the overall network throughput of the networking cluster, we assume a unicast communication medium for these protocols. The Raincore Distributed Session Service is based on a fault-tolerant token protocol, and provides group membership, reliable multicast and mutual exclusion services in a networking environment. We show that this service provides atomic reliable multicast with consistent ordering. We also show that Raincore token protocol consumes less overhead than a broadcast-based protocol in this environment in terms of CPU task-switching. The Raincore technology was transferred to Rainfinity, a startup company that is focusing on software for Internet reliability and performance. Rainwall, Rainfinity’s first product, was developed using the Raincore Distributed Session Service. We present initial performance results of the Rainwall product that validates our design assumptions and goals

    Time series analysis acceleration with advanced vectorization extensions

    Get PDF
    Time series analysis is an important research topic and a key step in monitoring and predicting events in many felds. Recently, the Matrix Profle method, and particularly two of its Euclidean-distance-based implementations—SCRIMP and SCAMP—have become the state-of-the-art approaches in this feld. Those algorithms bring the possibility of obtaining exact motifs and discords from a time series, which can be used to infer events, predict outcomes, detect anomalies and more. While matrix profle is embarrassingly parallelizable, we fnd that auto-vectorization techniques fail to fully exploit the SIMD capabilities of modern CPU architectures. In this paper, we develop custom-vectorized SCRIMP and SCAMP implementations based on AVX2 and AVX-512 extensions, which we combine with multithreading techniques aimed at exploiting the potential of the underneath architectures. Our experimental evaluation, conducted using real data, shows a performance improvement of more than 4× with respect to the auto-vectorization.Funding for open access publishing: Universidad Málaga/CBU

    Exploring multiprocessor approaches to time series analysis

    Get PDF
    A time series is a chronologically ordered set of samples of a real-valued variable that can have millions of observations. Time series analysis seeks extracting models in a large variety of domains [31] such as epidemiology, DNA analysis, economics, geophysics, speech recognition, etc. Particularly, motif [4] (similarity) and discord [13] (anomaly) discovery has become one of the most frequently used primitives in time series data mining [20], [2], [32], [7], [34], [1]. It poses the problem of solving the all-pairs-similarity-search (also known as similarity join). Specifically, given a time series broken down into subsequences, retrieve the most similar subsequences (motifs) and the most different ones (discords). One of the state-of-the-art methods for motif and discord discovery is Matrix Profile [35]. It solves the similarity join problem and allows time-manageable computation of very large time series. In this work, we focus on this technique, which features the possibility of detecting similarities, anomalies, and predicting outcomes. It provides full joins without the need for specifying a similarity threshold, which is a very challenging task in this domain. The matrix profile is another time series representing the minimum distance subsequence for each subsequence in the time series (motifs). Maximum distance values of the profile highlight the most dissimilar subsequences (discords).Funding for open Access charge: Universidad de Málaga / CBUA. This work has been supported by the Spanish Government under projects PID2019-105396RB-I00 and PID2022-136575OB-I00
    • …
    corecore