430 research outputs found

    Efficient methods for trace analysis parallelization

    Get PDF
    Tracing provides a low-impact, high-resolution way to observe the execution of a system. As the amount of parallelism in traced systems increases, so does the data generated by the trace. Most trace analysis tools work in a single thread, which hinders their performance as the scale of data increases. In this paper, we explore parallelization as an approach to speedup system trace analysis. We propose a solution which uses the inherent aspects of the CTF trace format to create balanced and parallelizable workloads. Our solution takes into account key factors of parallelization, such as good load balancing, low synchronization overhead and an efficient resolution of data dependencies. We also propose an algorithm to detect and resolve data dependencies during trace analysis, with minimal locking and synchronization. Using this approach, we implement three different trace analysis programs: event counting, CPU usage analysis and I/O usage analysis, to assess the scalability in terms of parallel efficiency. The parallel implementations achieve parallel efficiency above 56% with 32 cores, which translates to a speedup of 18 times the serial speed, when running the parallel trace analyses and using trace data stored on consumer-grade solid state storage devices. We also show the scalability and potential of our approach by measuring the effect of future improvements to trace decoding on parallel efficiency

    In support of workload-aware streaming state management

    Full text link
    Modern distributed stream processors predominantly rely on LSM-based key-value stores to manage the state of long-running computations. We question the suitability of such general-purpose stores for streaming workloads and argue that they incur unnecessary overheads in exchange for state management capabilities. Since streaming operators are instantiated once and are long-running, state types, sizes, and access patterns, can either be inferred at compile time or learned during execution. This paper surfaces the limitations of established practices for streaming state management and advocates for configurable streaming backends, tailored to the state requirements of each operator. Using workload-aware state management, we achieve an order of magnitude improvement in p99 latency and 2x higher throughput.https://www.usenix.org/system/files/hotstorage20_paper_kalavri.pdfPublished versio

    A Survey on Transactional Stream Processing

    Full text link
    Transactional stream processing (TSP) strives to create a cohesive model that merges the advantages of both transactional and stream-oriented guarantees. Over the past decade, numerous endeavors have contributed to the evolution of TSP solutions, uncovering similarities and distinctions among them. Despite these advances, a universally accepted standard approach for integrating transactional functionality with stream processing remains to be established. Existing TSP solutions predominantly concentrate on specific application characteristics and involve complex design trade-offs. This survey intends to introduce TSP and present our perspective on its future progression. Our primary goals are twofold: to provide insights into the diverse TSP requirements and methodologies, and to inspire the design and development of groundbreaking TSP systems

    Support Efficient, Scalable, and Online Social Spam Detection in System

    Get PDF
    The broad success of online social networks (OSNs) has created fertile soil for the emergence and fast spread of social spam. Fake news, malicious URL links, fraudulent advertisements, fake reviews, and biased propaganda are bringing serious consequences for both virtual social networks and human life in the real world. Effectively detecting social spam is a hot topic in both academia and industry. However, traditional social spam detection techniques are limited to centralized processing on top of one specific data source but ignore the social spam correlations of distributed data sources. Moreover, a few research efforts are conducting in integrating the stream system (e.g., Storm, Spark) with the large-scale social spam detection, but they typically ignore the specific details in managing and recovering interim states during the social stream data processing. We observed that social spammers who aim to advertise their products or post victim links are more frequently spreading malicious posts during a very short period of time. They are quite smart to adapt themselves to old models that were trained based on historical records. Therefore, these bring a question: how can we uncover and defend against these online spam activities in an online and scalable manner? In this dissertation, we present there systems that support scalable and online social spam detection from streaming social data: (1) the first part introduces Oases, a scalable system that can support large-scale online social spam detection, (2) the second part introduces a system named SpamHunter, a novel system that supports efficient online scalable spam detection in social networks. The system gives novel insights in guaranteeing the efficiency of the modern stream applications by leveraging the spam correlations at scale, and (3) the third part refers to the state recovery during social spam detection, it introduces a customizable state recovery framework that provides fast and scalable state recovery mechanisms for protecting large distributed states in social spam detection applications

    A survey and classification of storage deduplication systems

    Get PDF
    The automatic elimination of duplicate data in a storage system commonly known as deduplication is increasingly accepted as an effective technique to reduce storage costs. Thus, it has been applied to different storage types, including archives and backups, primary storage, within solid state disks, and even to random access memory. Although the general approach to deduplication is shared by all storage types, each poses specific challenges and leads to different trade-offs and solutions. This diversity is often misunderstood, thus underestimating the relevance of new research and development. The first contribution of this paper is a classification of deduplication systems according to six criteria that correspond to key design decisions: granularity, locality, timing, indexing, technique, and scope. This classification identifies and describes the different approaches used for each of them. As a second contribution, we describe which combinations of these design decisions have been proposed and found more useful for challenges in each storage type. Finally, outstanding research challenges and unexplored design points are identified and discussed.This work is funded by the European Regional Development Fund (EDRF) through the COMPETE Programme (operational programme for competitiveness) and by National Funds through the Fundacao para a Ciencia e a Tecnologia (FCT; Portuguese Foundation for Science and Technology) within project RED FCOMP-01-0124-FEDER-010156 and the FCT by PhD scholarship SFRH-BD-71372-2010

    Cloud Based IoT Architecture

    Get PDF
    The Internet of Things (IoT) and cloud computing have grown in popularity over the past decade as the internet becomes faster and more ubiquitous. Cloud platforms are well suited to handle IoT systems as they are accessible and resilient, and they provide a scalable solution to store and analyze large amounts of IoT data. IoT applications are complex software systems and software developers need to have a thorough understanding of the capabilities, limitations, architecture, and design patterns of cloud platforms and cloud-based IoT tools to build an efficient, maintainable, and customizable IoT application. As the IoT landscape is constantly changing, research into cloud-based IoT platforms is either lacking or out of date. The goal of this thesis is to describe the basic components and requirements for a cloud-based IoT platform, to provide useful insights and experiences in implementing a cloud-based IoT solution using Microsoft Azure, and to discuss some of the shortcomings when combining IoT with a cloud platform
    corecore