Benchmarking Distributed Stream Data Processing Systems
The need for scalable and efficient stream analysis has led to the
development of many open-source streaming data processing systems (SDPSs) with
highly diverging capabilities and performance characteristics. While first
initiatives have tried to compare the systems for simple workloads, detailed
analyses of the systems' performance characteristics are still lacking. In this
paper, we propose a framework for benchmarking distributed stream processing
engines. We use our suite to evaluate the performance of three widely used
SDPSs in detail, namely Apache Storm, Apache Spark, and Apache Flink. Our
evaluation focuses in particular on measuring the throughput and latency of
windowed operations, which are the basic type of operations in stream
analytics. For this benchmark, we design workloads based on real-life,
industrial use-cases inspired by the online gaming industry. The contribution
of our work is threefold. First, we give a definition of latency and throughput
for stateful operators. Second, we carefully separate the system under test
from the driver in order to correctly represent the open-world model of typical
stream processing deployments, and can therefore measure system performance under
realistic conditions. Third, we build the first benchmarking framework to
define and test the sustainable performance of streaming systems.
Our detailed evaluation highlights the individual characteristics and
use-cases of each system.
Comment: Published at ICDE 201
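The paper's exact definitions are not reproduced in this abstract, but the general idea (latency of a windowed result measured against the window's close, throughput as events per measurement interval) can be sketched as follows. The `WindowResult` structure and its fields are illustrative assumptions, not the benchmark's actual data model.

```python
from dataclasses import dataclass

@dataclass
class WindowResult:
    window_end: float   # event time at which the window closes
    emit_time: float    # processing time at which the result was emitted
    event_count: int    # number of events aggregated into this result

def event_time_latency(r: WindowResult) -> float:
    # Latency of a stateful windowed operator: time between the point the
    # result could first exist (window close) and the point it is emitted.
    return r.emit_time - r.window_end

def throughput(results, interval: float) -> float:
    # Events processed per second over a measurement interval.
    return sum(r.event_count for r in results) / interval

results = [WindowResult(10.0, 10.4, 500), WindowResult(20.0, 20.25, 700)]
latencies = [event_time_latency(r) for r in results]
```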
GPU PaaS Computation Model in Aneka Cloud Computing Environment
Due to the surge in the volume of data generated and rapid advancement in
Artificial Intelligence (AI) techniques like machine learning and deep
learning, the existing traditional computing models have become inadequate to
process an enormous volume of data and the complex application logic for
extracting intrinsic information. Computing accelerators such as Graphics
processing units (GPUs) have become the de facto SIMD computing systems for many big
data and machine learning applications. On the other hand, the traditional
computing model has gradually switched from conventional ownership-based
computing to subscription-based cloud computing model. However, the lack of
programming models and frameworks to develop cloud-native applications in a
seamless manner to utilize both CPU and GPU resources in the cloud has become a
bottleneck for rapid application development. To support this application
demand for simultaneous heterogeneous resource usage, programming models and
new frameworks are needed to manage the underlying resources effectively. Aneka
has emerged as a popular PaaS computing model for the development of cloud
applications using multiple programming models such as Thread, Task, and MapReduce
in a single container on the .NET platform. Since Aneka addresses MIMD
application development using CPU-based resources, while GPU programming models
like CUDA are designed for SIMD application development, this chapter discusses
a GPU PaaS computing model for Aneka Clouds for rapid cloud application
development on .NET platforms. Popular open-source GPU libraries are utilized
and integrated into the existing Aneka task programming model. The scheduling
policies are extended to automatically identify GPU machines and schedule the
respective tasks accordingly. A case study on image processing is discussed to
demonstrate the system, which has been built using the Aneka PaaS SDKs and the
CUDA library.
Comment: Submitted as book chapter, under processing, 32 pages
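A rough sketch of what such an extended, GPU-aware scheduling policy might look like, written in language-neutral Python rather than Aneka's .NET APIs; all names and fields (`needs_gpu`, `has_gpu`, `load`) are hypothetical stand-ins, not part of the Aneka SDK.

```python
def schedule(tasks, machines):
    """Toy GPU-aware scheduling policy: GPU tasks are mapped only onto
    machines that report a CUDA-capable device; CPU tasks may run anywhere.
    Among eligible machines, the least-loaded one is chosen."""
    assignment = {}
    for task in tasks:
        if task["needs_gpu"]:
            candidates = [m for m in machines if m["has_gpu"]]
        else:
            candidates = machines
        chosen = min(candidates, key=lambda m: m["load"])
        assignment[task["id"]] = chosen["name"]
        chosen["load"] += 1  # naive load accounting
    return assignment

machines = [{"name": "cpu-1", "has_gpu": False, "load": 0},
            {"name": "gpu-1", "has_gpu": True, "load": 0}]
tasks = [{"id": "t1", "needs_gpu": True},
         {"id": "t2", "needs_gpu": False}]
assignment = schedule(tasks, machines)
```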
ns3-gym: Extending OpenAI Gym for Networking Research
OpenAI Gym is a toolkit for reinforcement learning (RL) research. It includes
a large number of well-known problems that expose a common interface, allowing
direct comparison of the performance of different RL algorithms. For many
years, the ns-3 network simulator has been the de facto standard for
academic and industry research into networking protocols and communications
technology. Numerous scientific papers were written reporting results obtained
using ns-3, and hundreds of models and modules were written and contributed to
the ns-3 code base. Today as a major trend in network research we see the use
of machine learning tools such as RL. What is missing is the integration of an RL
framework like OpenAI Gym into the network simulator ns-3. This paper presents
the ns3-gym framework. First, we discuss design decisions that went into the
software. Second, two illustrative examples implemented using ns3-gym are
presented. Our software package is provided to the community as open source
under a GPL license and hence can be easily extended.
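The agent-environment loop that ns3-gym exposes follows the standard OpenAI Gym `reset`/`step` contract. The toy environment below illustrates that contract without requiring ns-3 or the gym package installed; it is a hypothetical stand-in, since the real ns3-gym environment forwards observations, actions, and rewards to a running ns-3 simulation.

```python
import random

class ToyNetworkEnv:
    """Stand-in for an ns3-gym environment: same reset/step shape as
    OpenAI Gym, but with a trivially simulated 'channel quality' state."""
    def __init__(self, horizon=5):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return 0.5  # initial observation (e.g., normalized channel quality)

    def step(self, action):
        self.t += 1
        obs = random.random()                 # next observation
        reward = 1.0 if action == 1 else 0.0  # reward an arbitrary action
        done = self.t >= self.horizon
        return obs, reward, done, {}

# Standard Gym interaction loop, with a fixed policy in place of an RL agent.
env = ToyNetworkEnv()
obs = env.reset()
total_reward, done = 0.0, False
while not done:
    action = 1
    obs, reward, done, info = env.step(action)
    total_reward += reward
```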
Detection of Unknown Anomalies in Streaming Videos with Generative Energy-based Boltzmann Models
Abnormal event detection is one of the important objectives in research and
practical applications of video surveillance. However, there are still three
challenging problems for most anomaly detection systems in practical setting:
limited labeled data, ambiguous definition of "abnormal" and expensive feature
engineering steps. This paper introduces a unified detection framework to
handle these challenges using energy-based models, which are powerful tools for
unsupervised representation learning. Our proposed models are firstly trained
on unlabeled raw pixels of image frames from an input video rather than
hand-crafted visual features; and then identify the locations of abnormal
objects based on the errors between the input video and its reconstruction
produced by the models. To handle video stream, we develop an online version of
our framework, wherein the model parameters are updated incrementally with the
image frames arriving on the fly. Our experiments show that our detectors,
using Restricted Boltzmann Machines (RBMs) and Deep Boltzmann Machines (DBMs)
as core modules, achieve superior anomaly detection performance to unsupervised
baselines and obtain accuracy comparable with the state-of-the-art approaches
when evaluating at the pixel-level. More importantly, we discover that our
system trained with DBMs is able to simultaneously perform scene clustering and
scene reconstruction. This capacity not only distinguishes our method from
other existing detectors but also offers a unique tool to investigate and
understand how the model works.
Comment: This manuscript is under consideration at Pattern Recognition Letters
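The detection rule described above (flag locations where the model's reconstruction deviates strongly from the input) can be sketched as follows. Here the reconstruction is simply a given array and the threshold is arbitrary, whereas in the paper it is produced by an RBM or DBM trained on raw pixels.

```python
def anomaly_mask(frame, reconstruction, threshold):
    """Flag pixel locations whose reconstruction error exceeds a threshold.
    Frames are 2D lists of normalized intensities; in the actual system the
    reconstruction comes from an energy-based model, not a given array."""
    return [[abs(p - r) > threshold for p, r in zip(row_f, row_r)]
            for row_f, row_r in zip(frame, reconstruction)]

frame = [[0.1, 0.9],
         [0.2, 0.2]]
recon = [[0.1, 0.2],   # the model failed to reconstruct the 0.9 pixel
         [0.2, 0.2]]
mask = anomaly_mask(frame, recon, threshold=0.3)
```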
Memory Warps for Learning Long-Term Online Video Representations
This paper proposes a novel memory-based online video representation that is
efficient, accurate and predictive. This is in contrast to prior works that
often rely on computationally heavy 3D convolutions, ignore actual motion when
aligning features over time, or operate in an off-line mode to utilize future
frames. In particular, our memory (i) holds the feature representation, (ii) is
spatially warped over time to compensate for observer and scene motions, (iii)
can carry long-term information, and (iv) enables predicting feature
representations in future frames. By exploring a variant that operates at
multiple temporal scales, we efficiently learn across even longer time
horizons. We apply our online framework to object detection in videos,
obtaining a large 2.3x speed-up while losing only 0.9% mAP on the ImageNet-VID
dataset, compared to prior works that even use future frames. Finally, we
demonstrate the predictive property of our representation in two novel
detection setups, where features are propagated over time to (i) significantly
enhance a real-time detector by more than 10% mAP in a multi-threaded online
setup and to (ii) anticipate objects in future frames.
Towards Efficient Large-Scale Graph Neural Network Computing
Recent deep learning models have moved beyond low-dimensional regular grids
such as image, video, and speech, to high-dimensional graph-structured data,
such as social networks, brain connections, and knowledge graphs. This
evolution has led to large graph-based irregular and sparse models that go
beyond what existing deep learning frameworks are designed for. Further, these
models are not easily amenable to efficient acceleration at scale on parallel
hardware (e.g., GPUs). We introduce NGra, the first parallel processing
framework for graph-based deep neural networks (GNNs). NGra presents a new
SAGA-NN model for expressing deep neural networks as vertex programs with each
layer in well-defined (Scatter, ApplyEdge, Gather, ApplyVertex) graph operation
stages. This model not only allows GNNs to be expressed intuitively, but also
facilitates the mapping to an efficient dataflow representation. NGra addresses
the scalability challenge transparently through automatic graph partitioning
and chunk-based stream processing out of GPU core or over multiple GPUs, which
carefully considers data locality, data movement, and overlapping of parallel
processing and data movement. NGra further achieves efficiency through highly
optimized Scatter/Gather operators on GPUs despite the sparsity of the data.
Our evaluation shows that NGra scales to large real-world graphs that none of
the existing frameworks can handle directly, while achieving up to about a 4x
speedup, even at small scales, over a multiple-baseline design on TensorFlow.
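The four SAGA-NN stages can be illustrated on a toy graph with scalar vertex values; the edge and vertex functions below (a fixed edge weight and a ReLU) are deliberately trivial stand-ins for the neural-network operators NGra would map to a GPU dataflow.

```python
def saga_layer(vertex_values, edges, edge_weight, activate):
    """One layer in the SAGA-NN style:
    Scatter vertex values onto edges, ApplyEdge (scale by a weight),
    Gather incoming messages per target vertex, ApplyVertex (activation)."""
    # Scatter + ApplyEdge: each edge (src, dst) carries a transformed message
    messages = [(dst, vertex_values[src] * edge_weight) for src, dst in edges]
    # Gather: sum incoming messages per destination vertex
    gathered = {v: 0.0 for v in vertex_values}
    for dst, msg in messages:
        gathered[dst] += msg
    # ApplyVertex: per-vertex update
    return {v: activate(x) for v, x in gathered.items()}

values = {0: 1.0, 1: 2.0, 2: 3.0}
edges = [(0, 1), (1, 2), (2, 1)]
relu = lambda x: max(0.0, x)
out = saga_layer(values, edges, edge_weight=0.5, activate=relu)
```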
Fast and Accurate Performance Analysis of LTE Radio Access Networks
An increasing amount of analytics is performed on data that is procured in a
real-time fashion to make real-time decisions. Such tasks include simple
reporting on streams to sophisticated model building. However, the practicality
of such analyses is impeded in several domains because they face a
fundamental trade-off between data collection latency and analysis accuracy.
In this paper, we study this trade-off in the context of a specific domain,
Cellular Radio Access Networks (RAN). Our choice of this domain is influenced
by its commonalities with several other domains that produce real-time data,
our access to a large live dataset, and its real-time nature and
dimensionality, which make it a natural fit for a popular analysis technique,
machine learning (ML). We find that the latency-accuracy trade-off can be
resolved using two broad, general techniques: intelligent data grouping and
task formulations that leverage domain characteristics. Based on this, we
present CellScope, a system that addresses this challenge through a
domain-specific formulation and application of Multi-task Learning (MTL) to RAN
performance analysis. It achieves this goal using three techniques: feature
engineering to transform raw data into effective features, a PCA-inspired
similarity metric to group data from geographically nearby base stations
sharing performance commonalities, and a hybrid online-offline model for
efficient model updates. Our evaluation of CellScope shows that its accuracy
improvements over direct application of ML range from 2.5x to 4.4x while
reducing the model update overhead by up to 4.8x. We have also used CellScope
to analyze a live LTE network consisting of over 2 million subscribers for a
period of over 10 months, where it uncovered several problems and insights,
some of them previously unknown.
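The grouping step can be sketched with a greedy, distance-based stand-in for CellScope's PCA-inspired similarity metric; the station features and the threshold are invented for illustration.

```python
import math

def group_stations(features, threshold):
    """Greedy grouping of base stations by feature similarity: a station
    joins the first group whose representative is within `threshold`
    Euclidean distance, otherwise it starts a new group. CellScope's actual
    metric is PCA-inspired; plain distance is a simplified stand-in."""
    groups = []  # list of (representative_vector, [station_ids])
    for station_id, vec in features.items():
        for rep, members in groups:
            if math.dist(rep, vec) <= threshold:
                members.append(station_id)
                break
        else:
            groups.append((vec, [station_id]))
    return [members for _, members in groups]

# Hypothetical 2D feature vectors for three base stations
features = {"bs1": (0.0, 0.0), "bs2": (0.1, 0.0), "bs3": (5.0, 5.0)}
groups = group_stations(features, threshold=1.0)
```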
noteEd - A web-based lecture capture system
Electronic capture and playback of lectures has long been the aim of many academic projects. Synote is an application developed under the MACFoB (Multimedia Annotation and Community Folksonomy Building) project to synchronise the playback of lecture materials. However, Synote provides no functionality to capture such multimedia. This project involves the creation of a system called noteEd, which captures a range of multimedia from lectures and makes it available to Synote. This report describes the evolution of the noteEd project throughout the design and implementation of the proposed system. The performance of the system was checked in a user acceptance test with the customer, which is discussed after screenshots of our solution. Finally, the project management is presented, including a final project evaluation.
Massivizing Computer Systems: a Vision to Understand, Design, and Engineer Computer Ecosystems through and beyond Modern Distributed Systems
Our society is digital: industry, science, governance, and individuals
depend, often transparently, on the inter-operation of large numbers of
distributed computer systems. Although society takes them almost for
granted, these computer ecosystems are not available for all, may not be
affordable for long, and raise numerous other research challenges. Inspired by
these challenges and by our experience with distributed computer systems, we
envision Massivizing Computer Systems, a domain of computer science focusing on
understanding, controlling, and successfully evolving such ecosystems. Beyond
establishing and growing a body of knowledge about computer ecosystems and
their constituent systems, the community in this domain should also aim to
educate many about design and engineering for this domain, and all people about
its principles. This is a call to the entire community: there is much to
discover and achieve.
Improving the transfer of machine learning-based video QoE estimation across diverse networks
With video streaming traffic generally being encrypted end-to-end, there is a lot of interest from network operators in novel ways to evaluate streaming performance at the application layer. Machine learning (ML) has been extensively used to develop solutions that infer application-level Key Performance Indicators (KPIs) and/or Quality of Experience (QoE) from the patterns in encrypted traffic. Having such insights provides the means for more user-centric traffic management and enables the mitigation of QoE degradations, thus potentially preventing customer churn. The ML-based QoE/KPI estimation solutions proposed in the literature are typically trained on a limited set of network scenarios, and it is often unclear how the obtained models perform if applied in a previously unseen setting (e.g., if the model is applied at the premises of a different network operator). In this paper, we address this gap by cross-evaluating the performance of QoE/KPI estimation models trained on 4 separate datasets comprising 48,000 video streaming sessions. The paper evaluates a set of methods for improving the performance of models when applied in a different network. The analyzed methods require no or considerably less application-level ground-truth data collected in the new setting, thus significantly reducing the extensiveness of the required data collection.
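The cross-evaluation setup (fit on one operator's data, apply unchanged to another's) can be sketched with a deliberately trivial one-parameter model; the data and the threshold "model" below are invented for illustration and bear no relation to the paper's actual ML models or datasets.

```python
def fit_threshold(sessions):
    """Fit a one-parameter 'model': predict a stall when throughput falls
    below a cutoff placed midway between the two class means. A trivial
    stand-in for the ML estimators evaluated in the paper."""
    stall = [t for t, y in sessions if y]
    ok = [t for t, y in sessions if not y]
    return (sum(stall) / len(stall) + sum(ok) / len(ok)) / 2

def accuracy(cutoff, sessions):
    # Fraction of sessions where the cutoff rule matches the ground truth.
    correct = sum((t < cutoff) == y for t, y in sessions)
    return correct / len(sessions)

# (throughput_mbps, stalled?) pairs for two hypothetical operators' networks
train = [(1.0, True), (1.5, True), (6.0, False), (7.0, False)]
unseen = [(0.5, True), (2.0, True), (5.0, False), (9.0, False)]

cutoff = fit_threshold(train)         # learned on operator A's data
acc_cross = accuracy(cutoff, unseen)  # applied unchanged to operator B's data
```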