35 research outputs found

    One Table to Count Them All: Parallel Frequency Estimation on Single-Board Computers

    Sketches are probabilistic data structures that can provide approximate results within mathematically proven error bounds while using orders of magnitude less memory than traditional approaches. They are tailored for streaming data analysis on architectures with limited memory, such as the single-board computers widely used for IoT and edge computing. Since these devices offer multiple cores, efficient parallel sketching schemes allow them to manage high volumes of data streams. However, since their caches are relatively small, careful parallelization is required. In this work, we focus on the frequency estimation problem and evaluate the performance of a high-end server, a 4-core Raspberry Pi, and an 8-core Odroid. As the sketch, we employed the widely used Count-Min Sketch. To hash the stream in parallel and in a cache-friendly way, we applied a novel tabulation approach and rearranged the auxiliary tables into a single one. To parallelize the process efficiently, we modified the workflow and applied a form of buffering between hash computations and sketch updates. Today, many single-board computers have heterogeneous processors in which slow and fast cores are equipped together. To utilize all these cores to their full potential, we proposed a dynamic load-balancing mechanism which significantly increased the performance of frequency estimation. Comment: 12 pages, 4 figures, 3 algorithms, 1 table, submitted to EuroPar'1
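A minimal, single-threaded sketch of the Count-Min Sketch update/query logic the abstract builds on (the paper's tabulation hashing, buffering, and load balancing are omitted; Python's built-in `hash` with per-row seeds stands in for a pairwise-independent hash family, which is an assumption of this sketch, not the paper's method):

```python
import random

class CountMinSketch:
    """Count-Min Sketch: depth rows of width counters; estimate = min over rows."""

    def __init__(self, width=2048, depth=4, seed=42):
        rng = random.Random(seed)
        self.width, self.depth = width, depth
        self.tables = [[0] * width for _ in range(depth)]
        # One seed per row; stands in for independent hash functions.
        self.seeds = [rng.getrandbits(64) for _ in range(depth)]

    def _index(self, row, item):
        return hash((self.seeds[row], item)) % self.width

    def update(self, item, count=1):
        for r in range(self.depth):
            self.tables[r][self._index(r, item)] += count

    def estimate(self, item):
        # Never underestimates; overestimation is bounded w.h.p. by the
        # classic eps/delta guarantees via the choice of width and depth.
        return min(self.tables[r][self._index(r, item)] for r in range(self.depth))
```

The error bounds mentioned in the abstract come from sizing: width w = ceil(e/eps) and depth d = ceil(ln(1/delta)) give an overestimate of at most eps times the stream length with probability at least 1 - delta.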

    Efficient approximation of correlated sums on data streams



    Cross-Disciplinary Collaborations in Data Quality Research

    Data quality has been the target of research and development for over four decades and, due to its cross-disciplinary nature, has been approached by business analysts, solution architects, database experts, and statisticians, to name a few. As data quality increases in importance and complexity, there is a need to exploit synergies across diverse research communities in order to form holistic solutions that span its organizational, architectural, and computational aspects. As a first step towards bridging gaps between the various research communities, we undertook a comprehensive literature study of data quality research published in the last two decades. In this study we considered a broad range of Information Systems (IS) and Computer Science (CS) publication outlets. The main aims of the study were to understand the current landscape of data quality research, create better awareness of the (lack of) synergies between various research communities, and, subsequently, direct attention towards holistic solutions. In this paper, we present a summary of the findings from the study that outline the overlaps and distinctions between the two communities from various points of view, including publication outlets, topics and themes of research, highly cited or influential contributors, and the strength and nature of co-authorship networks.

    Adaptive Time Synchronization for Homogeneous WSNs

    Wireless sensor networks (WSNs) are used for observing real-world phenomena. Sensor nodes (SNs) must be synchronized to a common time in order to precisely map the data they collect. Clock synchronization is very challenging in WSNs because sensor networks are resource-constrained. Clock synchronization protocols designed for WSNs must therefore be lightweight, i.e., SNs must be synchronized with few synchronization message exchanges. In this paper, we propose a clock synchronization protocol for WSNs in which the cluster heads (CHs) are first synchronized with the sink, and the cluster nodes (CNs) are then synchronized with their respective CHs. CNs are synchronized with the help of a time synchronization node (TSN) chosen by the respective CH. Simulation results show that the proposed protocol requires considerably fewer synchronization messages than the reference broadcast synchronization (RBS) protocol and the minimum variance unbiased estimation (MUVE) method. The clock skew correction mechanism applied in the proposed protocol guarantees long-term stability and hence decreases re-synchronization frequency, thereby conserving more energy.
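Clock skew correction of the kind the abstract mentions is commonly modeled as a linear relation between a node's local clock and the reference clock, fit from exchanged timestamp pairs. A minimal sketch under that assumption (the `fit_clock` name and the least-squares fit are illustrative, not the paper's exact mechanism):

```python
def fit_clock(local_times, ref_times):
    """Least-squares fit of ref ~= skew * local + offset from timestamp pairs.

    After fitting, a node maps a local reading t to reference time via
    skew * t + offset, so it can stay synchronized between message exchanges.
    """
    n = len(local_times)
    mean_l = sum(local_times) / n
    mean_r = sum(ref_times) / n
    cov = sum((l - mean_l) * (r - mean_r)
              for l, r in zip(local_times, ref_times))
    var = sum((l - mean_l) ** 2 for l in local_times)
    skew = cov / var          # relative clock rate
    offset = mean_r - skew * mean_l
    return skew, offset
```

Because the skew term extrapolates between synchronization rounds, a node with a stable fit can re-synchronize less often, which is the energy saving the abstract describes.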

    Enhanced Stream Processing in a DBMS Kernel

    Continuous query processing has emerged as a promising query processing paradigm with numerous applications. A recent development is the need to handle both streaming queries and typical one-time queries in the same application. For example, data warehousing can greatly benefit from the integration of stream semantics, i.e., online analysis of incoming data and its combination with existing data. This is especially useful for providing low latency in data-intensive analysis in big data warehouses that are augmented with new data on a daily basis. However, state-of-the-art database technology cannot handle streams efficiently due to their "continuous" nature. At the same time, state-of-the-art stream technology is purely focused on stream applications. Research efforts are mostly geared towards the creation of specialized stream management systems built with a different philosophy than a DBMS. The drawback of this approach is the limited opportunity to exploit successful past data processing technology, e.g., query optimization techniques. For this new problem we need to combine the best of both worlds. Here we take a completely different route by designing a stream engine on top of an existing relational database kernel. This includes reuse of both its storage/execution engine and its optimizer infrastructure. The major challenge then becomes the efficient support for specialized stream features. This paper focuses on incremental window-based processing, arguably the most crucial stream-specific requirement. In order to maintain and reuse the generic storage and execution model of the DBMS, we elevate the problem to the query plan level. Proper op
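Incremental window-based processing, the stream-specific requirement this abstract singles out, amounts to updating an aggregate as tuples arrive and expire instead of recomputing it over the whole window. A minimal sketch of the idea for a count-based sliding-window sum (class and method names are illustrative, not the paper's operators):

```python
from collections import deque

class SlidingWindowSum:
    """Incremental sliding-window aggregate: O(1) work per arriving tuple."""

    def __init__(self, size):
        self.size = size
        self.buf = deque()   # tuples currently inside the window
        self.total = 0

    def insert(self, value):
        self.buf.append(value)
        self.total += value
        if len(self.buf) > self.size:
            # Expire the oldest tuple incrementally; no full recomputation.
            self.total -= self.buf.popleft()
        return self.total
```

This works for invertible aggregates such as SUM and COUNT; non-invertible ones (e.g., MAX) need auxiliary structures, which is part of what makes efficient window support in a DBMS kernel challenging.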