
    Benchmarking Distributed Stream Data Processing Systems

    Full text link
    The need for scalable and efficient stream analysis has led to the development of many open-source streaming data processing systems (SDPSs) with highly diverging capabilities and performance characteristics. While initial efforts have compared these systems on simple workloads, detailed analyses of their performance characteristics are still largely missing. In this paper, we propose a framework for benchmarking distributed stream processing engines. We use our suite to evaluate in detail the performance of three widely used SDPSs: Apache Storm, Apache Spark, and Apache Flink. Our evaluation focuses in particular on measuring the throughput and latency of windowed operations, the basic type of operation in stream analytics. For this benchmark, we design workloads based on real-life industrial use-cases inspired by the online gaming industry. The contribution of our work is threefold. First, we give a definition of latency and throughput for stateful operators. Second, we carefully separate the system under test from the driver, in order to correctly represent the open-world model of typical stream processing deployments and therefore measure system performance under realistic conditions. Third, we build the first benchmarking framework to define and test the sustainable performance of streaming systems. Our detailed evaluation highlights the individual characteristics and use-cases of each system.
    Comment: Published at ICDE 2018
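
    As an illustration of the measurement the paper formalizes, here is a minimal Python sketch of throughput and latency for a windowed (stateful) sum; the tumbling-window length, the use of event times as watermarks, and the latency definition (emission time minus window-close time) are illustrative assumptions, not the paper's actual framework.

```python
import time
from collections import defaultdict

WINDOW = 5.0  # tumbling event-time window length in seconds (illustrative)

def run(events):
    """events: iterable of (event_time, key, value) with wall-clock
    event times. Returns (throughput in events/s, list of latencies).

    A windowed result's latency is taken as the wall-clock moment it is
    emitted minus the moment its window closed (the window's end
    boundary), one common way to define latency for stateful operators.
    """
    state = defaultdict(float)
    latencies, n, t0 = [], 0, time.time()
    for event_time, key, value in events:
        w_start = event_time - event_time % WINDOW
        state[(w_start, key)] += value               # windowed sum aggregation
        n += 1
        # The current event time acts as a simple watermark: emit every
        # window whose end boundary has already passed.
        for ws, k in [wk for wk in state if wk[0] + WINDOW <= event_time]:
            state.pop((ws, k))                       # the result would be emitted here
            latencies.append(time.time() - (ws + WINDOW))
    return n / (time.time() - t0), latencies
```

    Keeping the driver that generates `events` in a separate process from the system executing `run` mirrors the system-under-test/driver separation the paper argues for.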

    GPU PaaS Computation Model in Aneka Cloud Computing Environment

    Full text link
    Due to the surge in the volume of data generated and the rapid advancement of Artificial Intelligence (AI) techniques such as machine learning and deep learning, traditional computing models have become inadequate for processing enormous volumes of data and the complex application logic needed to extract intrinsic information. Computing accelerators such as Graphics Processing Units (GPUs) have become the de facto SIMD computing systems for many big data and machine learning applications. At the same time, the traditional computing model has gradually shifted from conventional ownership-based computing to the subscription-based cloud computing model. However, the lack of programming models and frameworks for seamlessly developing cloud-native applications that utilize both CPU and GPU resources in the cloud has become a bottleneck for rapid application development. To support this demand for simultaneous heterogeneous resource usage, new programming models and frameworks are needed to manage the underlying resources effectively. Aneka has emerged as a popular PaaS computing model for developing cloud applications using multiple programming models, such as Thread, Task, and MapReduce, in a single container on the .NET platform. Since Aneka addresses MIMD application development using CPU-based resources, while GPU programming models such as CUDA are designed for SIMD application development, this chapter discusses a GPU PaaS computing model for Aneka Clouds that enables rapid cloud application development for .NET platforms. Popular open-source GPU libraries are utilized and integrated into the existing Aneka task programming model, and the scheduling policies are extended to automatically identify GPU-capable machines and schedule the respective tasks accordingly. A case study on image processing, built using the Aneka PaaS SDKs and the CUDA library, is discussed to demonstrate the system.
    Comment: Submitted as a book chapter (under processing), 32 pages
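
    Aneka itself is a .NET platform, but the essence of the extended scheduling policy can be sketched in Python; the node/task structure and the least-loaded tie-breaking below are illustrative assumptions, not Aneka's actual API.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Task:
    task_id: int
    needs_gpu: bool            # e.g. a CUDA kernel wrapped as an Aneka-style task

@dataclass
class Node:
    name: str
    has_gpu: bool              # discovered capability of the worker machine
    queue: List[Task] = field(default_factory=list)

def schedule(tasks, nodes):
    """GPU tasks are placed only on GPU-capable machines; every task goes
    to the least-loaded eligible node (illustrative policy)."""
    gpu_nodes = [n for n in nodes if n.has_gpu]
    for t in tasks:
        pool = gpu_nodes if t.needs_gpu else nodes
        if not pool:
            raise RuntimeError("no GPU-capable node available for a GPU task")
        min(pool, key=lambda n: len(n.queue)).queue.append(t)

nodes = [Node("worker-1", has_gpu=True), Node("worker-2", has_gpu=False)]
schedule([Task(1, needs_gpu=True), Task(2, needs_gpu=False)], nodes)
print({n.name: [t.task_id for t in n.queue] for n in nodes})
```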

    ns3-gym: Extending OpenAI Gym for Networking Research

    Full text link
    OpenAI Gym is a toolkit for reinforcement learning (RL) research. It includes a large number of well-known problems that expose a common interface, allowing the performance of different RL algorithms to be compared directly. For many years, the ns-3 network simulator has been the de facto standard for academic and industry research into networking protocols and communications technology. Numerous scientific papers report results obtained using ns-3, and hundreds of models and modules have been written and contributed to the ns-3 code base. A major trend in network research today is the use of machine learning tools such as RL. What has been missing is the integration of an RL framework like OpenAI Gym with the network simulator ns-3. This paper presents the ns3-gym framework. First, we discuss the design decisions that went into the software. Second, we present two illustrative examples implemented using ns3-gym. Our software package is provided to the community as open source under a GPL license and hence can be easily extended.
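
    The agent side of ns3-gym follows the standard Gym loop; the sketch below is based on the project's documented usage, though the environment id 'ns3-v0' and the package import should be treated as assumptions, and a matching ns-3 simulation script must be available for the environment to attach to.

```python
import gym
import ns3gym  # registers the ns-3 environment with Gym (pip package: ns3gym)

# ns-3 runs the actual network simulation; this process is the RL agent.
env = gym.make('ns3-v0')
obs = env.reset()
done = False
while not done:
    action = env.action_space.sample()         # random policy as a placeholder
    obs, reward, done, info = env.step(action)
env.close()
```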

    Detection of Unknown Anomalies in Streaming Videos with Generative Energy-based Boltzmann Models

    Full text link
    Abnormal event detection is one of the important objectives in research and practical applications of video surveillance. However, most anomaly detection systems still face three challenging problems in practical settings: limited labeled data, the ambiguous definition of "abnormal", and expensive feature engineering steps. This paper introduces a unified detection framework that handles these challenges using energy-based models, which are powerful tools for unsupervised representation learning. Our proposed models are first trained on unlabeled raw pixels of image frames from an input video, rather than on hand-crafted visual features, and then identify the locations of abnormal objects based on the errors between the input video and its reconstruction produced by the models. To handle video streams, we develop an online version of our framework in which the model parameters are updated incrementally as image frames arrive on the fly. Our experiments show that our detectors, using Restricted Boltzmann Machines (RBMs) and Deep Boltzmann Machines (DBMs) as core modules, achieve anomaly detection performance superior to unsupervised baselines and accuracy comparable to state-of-the-art approaches when evaluated at the pixel level. More importantly, we find that our system trained with DBMs is able to perform scene clustering and scene reconstruction simultaneously. This capability not only distinguishes our method from other existing detectors but also offers a unique tool for investigating and understanding how the model works.
    Comment: This manuscript is under consideration at Pattern Recognition Letters
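
    To make the reconstruction-error idea concrete, here is a minimal sketch using scikit-learn's BernoulliRBM on synthetic binary patches; the patch size, single-step Gibbs reconstruction, and the threshold are illustrative assumptions, and the paper's full system (DBMs, online updates, pixel-level localization) goes well beyond this.

```python
import numpy as np
from sklearn.neural_network import BernoulliRBM

rng = np.random.default_rng(0)
# Unlabeled "normal" data: sparse 8x8 binary patches, flattened to 64 dims.
normal_patches = (rng.random((2000, 64)) < 0.2).astype(float)
rbm = BernoulliRBM(n_components=32, learning_rate=0.05, n_iter=20, random_state=0)
rbm.fit(normal_patches)

def anomaly_scores(patches, threshold=0.3):
    """Score each patch by its reconstruction error after one (stochastic)
    Gibbs step v -> h -> v'; errors above the threshold are flagged."""
    recon = rbm.gibbs(patches)
    err = np.mean(np.abs(patches - recon), axis=1)
    return err, err > threshold

# Dense patches look nothing like the training data, so errors should be high.
test = (rng.random((5, 64)) < 0.8).astype(float)
print(anomaly_scores(test))
```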

    Memory Warps for Learning Long-Term Online Video Representations

    Full text link
    This paper proposes a novel memory-based online video representation that is efficient, accurate, and predictive. This is in contrast to prior works that often rely on computationally heavy 3D convolutions, ignore actual motion when aligning features over time, or operate in an offline mode in order to use future frames. In particular, our memory (i) holds the feature representation, (ii) is spatially warped over time to compensate for observer and scene motion, (iii) can carry long-term information, and (iv) enables predicting feature representations in future frames. By exploring a variant that operates at multiple temporal scales, we efficiently learn across even longer time horizons. We apply our online framework to object detection in videos, obtaining a large 2.3x speed-up while losing only 0.9% mAP on the ImageNet-VID dataset, compared to prior works that even use future frames. Finally, we demonstrate the predictive property of our representation in two novel detection setups where features are propagated over time: (i) significantly enhancing a real-time detector by more than 10% mAP in a multi-threaded online setup, and (ii) anticipating objects in future frames.
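
    The spatial-warping step (ii) can be sketched with PyTorch's grid_sample; the backward-flow convention and the absence of any fusion/update step here are simplifying assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def warp_memory(memory, flow):
    """Warp a feature memory (N, C, H, W) by a flow field (N, 2, H, W) that
    gives, for each target pixel, the (dx, dy) offset of its source pixel.
    This compensates for observer/scene motion before the memory is reused."""
    n, _, h, w = memory.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    base = torch.stack((xs, ys), dim=0).float().unsqueeze(0)   # (1, 2, H, W)
    coords = base + flow                                       # source coordinates
    # Normalize to [-1, 1] as grid_sample expects.
    coords[:, 0] = 2 * coords[:, 0] / (w - 1) - 1
    coords[:, 1] = 2 * coords[:, 1] / (h - 1) - 1
    grid = coords.permute(0, 2, 3, 1)                          # (N, H, W, 2)
    return F.grid_sample(memory, grid, align_corners=True)

memory = torch.randn(1, 8, 16, 16)
flow = torch.zeros(1, 2, 16, 16)
flow[:, 0] = 1.0                   # sample each pixel from one pixel to the right
warped = warp_memory(memory, flow)
```

    In the full method, the warped memory would then be fused with features extracted from the incoming frame.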

    Towards Efficient Large-Scale Graph Neural Network Computing

    Full text link
    Recent deep learning models have moved beyond low-dimensional regular grids such as images, video, and speech to high-dimensional graph-structured data such as social networks, brain connections, and knowledge graphs. This evolution has led to large graph-based irregular and sparse models that go beyond what existing deep learning frameworks were designed for. Moreover, these models are not easily amenable to efficient acceleration at scale on parallel hardware (e.g., GPUs). We introduce NGra, the first parallel processing framework for graph-based deep neural networks (GNNs). NGra presents the new SAGA-NN model for expressing deep neural networks as vertex programs, with each layer organized into well-defined graph operation stages (Scatter, ApplyEdge, Gather, ApplyVertex). This model not only allows GNNs to be expressed intuitively but also facilitates mapping to an efficient dataflow representation. NGra addresses the scalability challenge transparently through automatic graph partitioning and chunk-based stream processing out of GPU core memory or across multiple GPUs, carefully considering data locality, data movement, and the overlapping of parallel processing with data movement. NGra further achieves efficiency through highly optimized Scatter/Gather operators on GPUs, despite the sparsity of graph data. Our evaluation shows that NGra scales to large real-world graphs that none of the existing frameworks can handle directly, while achieving speedups of up to about 4x over a multiple-baseline design on TensorFlow, even at small scales.
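
    A toy rendering of one layer in the four SAGA-NN stages (plain NumPy; the per-edge normalization and the linear-plus-ReLU vertex function are illustrative choices, not NGra's actual API):

```python
import numpy as np

def saga_nn_layer(H, edges, W):
    """One GNN layer written as the four SAGA-NN stages.
    H: (num_vertices, d) features; edges: list of (src, dst); W: (d, d)."""
    num_v, d = H.shape
    deg = np.bincount([s for s, _ in edges], minlength=num_v).astype(float)
    # Scatter: place each source vertex's features on its outgoing edges.
    edge_data = np.stack([H[s] for s, _ in edges])             # (num_edges, d)
    # ApplyEdge: per-edge computation (here: normalize by source out-degree).
    edge_data /= np.array([deg[s] for s, _ in edges])[:, None]
    # Gather: accumulate edge values at their destination vertices.
    acc = np.zeros((num_v, d))
    for (s, t), e in zip(edges, edge_data):
        acc[t] += e
    # ApplyVertex: per-vertex neural-network transform (linear + ReLU).
    return np.maximum(acc @ W, 0.0)

H = np.random.rand(4, 3)
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(saga_nn_layer(H, edges, np.random.rand(3, 3)))
```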

    Fast and Accurate Performance Analysis of LTE Radio Access Networks

    Full text link
    An increasing amount of analytics is performed on data procured in real time in order to make real-time decisions. Such tasks range from simple reporting on streams to sophisticated model building. However, the practicality of such analyses is impeded in several domains by a fundamental trade-off between data collection latency and analysis accuracy. In this paper, we study this trade-off in the context of a specific domain, Cellular Radio Access Networks (RANs). Our choice of this domain is influenced by its commonalities with several other domains that produce real-time data, our access to a large live dataset, and its real-time nature and high dimensionality, which make it a natural fit for a popular analysis technique, machine learning (ML). We find that the latency-accuracy trade-off can be resolved using two broad, general techniques: intelligent data grouping and task formulations that leverage domain characteristics. Based on this, we present CellScope, a system that addresses this challenge by applying a domain-specific formulation of Multi-task Learning (MTL) to RAN performance analysis. It achieves this goal using three techniques: feature engineering to transform raw data into effective features, a PCA-inspired similarity metric to group data from geographically nearby base stations that share performance commonalities, and a hybrid online-offline model for efficient model updates. Our evaluation shows that CellScope's accuracy improvements over the direct application of ML range from 2.5x to 4.4x, while it reduces model-update overhead by up to 4.8x. We have also used CellScope to analyze a live LTE network serving over 2 million subscribers over a period of more than 10 months, during which it uncovered several problems and insights, some of them previously unknown.
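
    The grouping technique can be illustrated with principal angles between per-station PCA subspaces; this is only a guess at the flavor of the metric: the paper's exact similarity measure and clustering procedure may differ, and the subspace dimension, threshold, and greedy grouping below are assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA

def subspace(X, k=3):
    """Top-k principal directions of one base station's feature matrix."""
    return PCA(n_components=k).fit(X).components_      # (k, d), orthonormal rows

def similarity(U, V):
    """Mean squared cosine of the principal angles between two subspaces
    (1.0 means identical spans, 0.0 means orthogonal)."""
    s = np.linalg.svd(U @ V.T, compute_uv=False)       # cosines of principal angles
    return float(np.mean(s ** 2))

def group_stations(datasets, threshold=0.9):
    """Greedy grouping: a station joins the first group whose representative
    subspace is similar enough, otherwise it starts a new group."""
    groups, reps = [], []
    for X in datasets:
        U = subspace(X)
        for g, R in zip(groups, reps):
            if similarity(U, R) >= threshold:
                g.append(X)
                break
        else:
            groups.append([X])
            reps.append(U)
    return groups

rng = np.random.default_rng(0)
stations = [rng.normal(size=(200, 8)) for _ in range(5)]
print([len(g) for g in group_stations(stations)])
```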

    noteEd - A web-based lecture capture system

    No full text
    Electronic capture and playback of lectures has long been the aim of many academic projects. Synote is an application developed under the MACFoB (Multimedia Annotation and Community Folksonomy Building) project to synchronize the playback of lecture materials; however, Synote provides no functionality to capture such multimedia. This project involves the creation of a system called noteEd, which captures a range of multimedia from lectures and makes it available to Synote. This report describes the evolution of the noteEd project throughout the design and implementation of the proposed system. The performance of the system was checked in a user acceptance test with the customer, which is discussed after screenshots of our solution. Finally, the project management is presented, concluding with a final project evaluation.

    Massivizing Computer Systems: a Vision to Understand, Design, and Engineer Computer Ecosystems through and beyond Modern Distributed Systems

    Full text link
    Our society is digital: industry, science, governance, and individuals depend, often transparently, on the inter-operation of large numbers of distributed computer systems. Although society takes them almost for granted, these computer ecosystems are not available to all, may not remain affordable for long, and raise numerous other research challenges. Inspired by these challenges and by our experience with distributed computer systems, we envision Massivizing Computer Systems, a domain of computer science focused on understanding, controlling, and successfully evolving such ecosystems. Beyond establishing and growing a body of knowledge about computer ecosystems and their constituent systems, the community in this domain should also aim to educate many about design and engineering for this domain, and all people about its principles. This is a call to the entire community: there is much to discover and achieve.

    Improving the transfer of machine learning-based video QoE estimation across diverse networks

    Get PDF
    With video streaming traffic generally being encrypted end-to-end, network operators have a strong interest in novel ways to evaluate streaming performance at the application layer. Machine learning (ML) has been used extensively to develop solutions that infer application-level Key Performance Indicators (KPIs) and/or Quality of Experience (QoE) from patterns in encrypted traffic. Such insights enable more user-centric traffic management and the mitigation of QoE degradations, thus potentially preventing customer churn. The ML-based QoE/KPI estimation solutions proposed in the literature are typically trained on a limited set of network scenarios, and it is often unclear how the resulting models perform when applied in a previously unseen setting (e.g., at the premises of a different network operator). In this paper, we address this gap by cross-evaluating the performance of QoE/KPI estimation models trained on 4 separate datasets generated from 48,000 video streaming sessions. The paper evaluates a set of methods for improving the performance of models when they are applied in a different network. The analyzed methods require no, or considerably less, application-level ground-truth data collected in the new setting, thus significantly reducing the extent of the required data collection.
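
    The cross-evaluation setup can be mimicked on synthetic data; everything below (the features, labels, random-forest model, and the "pool in a small target sample" mitigation) is an illustrative assumption rather than the paper's actual method or datasets.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

def make_network(shift, n=2000, d=12):
    """Synthetic stand-in for one network's per-session traffic features
    and a binary QoE label (e.g. degraded vs. smooth playback)."""
    X = rng.normal(loc=shift, size=(n, d))
    y = (X.sum(axis=1) > shift * d).astype(int)
    return X, y

X_a, y_a = make_network(shift=0.0)   # source network used for training
X_b, y_b = make_network(shift=0.6)   # previously unseen target network

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_a, y_a)
print("source->target accuracy:", accuracy_score(y_b, model.predict(X_b)))

# One mitigation in this spirit: pool a small labeled sample from the
# target network with the source data and retrain.
X_small, y_small = X_b[:100], y_b[:100]
pooled = RandomForestClassifier(n_estimators=100, random_state=0).fit(
    np.vstack([X_a, X_small]), np.concatenate([y_a, y_small]))
print("pooled accuracy        :", accuracy_score(y_b[100:], pooled.predict(X_b[100:])))
```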