10,993 research outputs found
Real-Time Anomaly Detection for Streaming Analytics
Much of the world's data is streaming, time-series data, where anomalies give
significant information in critical situations. Yet detecting anomalies in
streaming data is a difficult task, requiring detectors to process data in
real-time, and learn while simultaneously making predictions. We present a
novel anomaly detection technique based on an on-line sequence memory algorithm
called Hierarchical Temporal Memory (HTM). We show results from a live
application that detects anomalies in financial metrics in real-time. We also
test the algorithm on NAB, a published benchmark for real-time anomaly
detection, where our algorithm achieves best-in-class results.
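The HTM detector itself is too involved for a short snippet, but the streaming anomaly-scoring pattern it relies on (predict the next value online, then convert the prediction error into an anomaly likelihood) can be sketched as follows. This is a minimal illustration under assumed simplifications: a simple moving-average predictor stands in for the HTM sequence memory, and a rolling-Gaussian tail probability is one common way to turn errors into a score. It is not the authors' algorithm.

```python
import math
from collections import deque

class StreamingAnomalyScorer:
    """Toy streaming anomaly scorer: an online predictor plus a rolling
    Gaussian model of its prediction errors. A simple moving average
    stands in for HTM here (assumption for illustration)."""

    def __init__(self, window=50, error_window=200):
        self.values = deque(maxlen=window)        # recent raw values
        self.errors = deque(maxlen=error_window)  # recent prediction errors

    def score(self, x):
        # Predict the next value from recent history.
        prediction = sum(self.values) / len(self.values) if self.values else x
        error = abs(x - prediction)

        # Model recent errors as Gaussian; use the CDF of the current
        # error as an anomaly likelihood in [0, 1].
        if len(self.errors) >= 5:
            mean = sum(self.errors) / len(self.errors)
            var = sum((e - mean) ** 2 for e in self.errors) / len(self.errors)
            std = math.sqrt(var) or 1e-9
            z = (error - mean) / std
            likelihood = 1.0 - 0.5 * math.erfc(z / math.sqrt(2))
        else:
            likelihood = 0.0  # not enough history yet

        # Update state *after* scoring so the detector learns online.
        self.values.append(x)
        self.errors.append(error)
        return likelihood

# Example: flag points whose anomaly likelihood exceeds a threshold.
scorer = StreamingAnomalyScorer()
stream = [10, 11, 10, 12, 11, 10, 55, 11, 10]   # synthetic metric values
flags = [scorer.score(v) > 0.999 for v in stream]
```

On this synthetic stream only the jump to 55 is flagged; the detector keeps learning from every point, anomalous or not, which mirrors the learn-while-predicting requirement described in the abstract.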
Anomaly Detection for an E-commerce Pricing System
Online retailers execute a very large number of price updates when compared
to brick-and-mortar stores. Even a few mis-priced items can have a significant
business impact and result in a loss of customer trust. Early detection of
anomalies in an automated real-time fashion is an important part of such a
pricing system. In this paper, we describe unsupervised and supervised anomaly
detection approaches we developed and deployed for a large-scale online pricing
system at Walmart. Our system detects anomalies both in batch and real-time
streaming settings, and the items flagged are reviewed and actioned based on
priority and business impact. We found that having the right architecture
design was critical to facilitate model performance at scale, and business
impact and speed were important factors influencing model selection, parameter
choice, and prioritization in a production environment for a large-scale
system. We conducted analyses on the performance of various approaches on a
test set using real-world retail data and fully deployed our approach into
production. We found that our approach was able to detect the most important
anomalies with high precision.
Comment: 10 pages, 4 figures
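The abstract does not spell out the models, but a common starting point for the unsupervised side of such a pricing system is an outlier detector over engineered price features. The sketch below uses scikit-learn's IsolationForest on hypothetical features (relative price change and deviation from a reference price); the feature choices, data, and contamination rate are assumptions for illustration, not Walmart's production pipeline.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Hypothetical features per price update: relative change vs. previous price
# and relative deviation from a reference (e.g. competitor) price.
rng = np.random.default_rng(0)
normal = rng.normal(loc=[0.0, 0.0], scale=[0.05, 0.05], size=(1000, 2))
anomalous = np.array([[0.9, 0.8], [-0.95, -0.7]])   # e.g. a mis-keyed price
updates = np.vstack([normal, anomalous])

# Unsupervised detector: isolate price updates that look unlike the bulk.
detector = IsolationForest(n_estimators=200, contamination=0.01, random_state=0)
labels = detector.fit_predict(updates)              # -1 = flagged anomaly

flagged = np.where(labels == -1)[0]
print(f"{len(flagged)} updates flagged for review")
```

In a deployed system the flagged updates would then be routed for review and prioritized by business impact, as the abstract describes.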
Low-Delay Adaptive Video Streaming Based on Short-Term TCP Throughput Prediction
Recently, HTTP-Based Adaptive Streaming has become the de facto standard for
video streaming over the Internet. It allows the client to adapt media
characteristics to varying network conditions in order to maximize Quality of
Experience (QoE). In the case of live streaming this task becomes particularly
challenging. An important factor that might help improve performance is the
capability to correctly predict network throughput dynamics on short to medium
timescales. It becomes notably difficult in wireless networks that are often
subject to continuous throughput fluctuations.
In the present work, we develop an adaptation algorithm for HTTP-Based
Adaptive Live Streaming that, for each adaptation decision, maximizes a
QoE-based utility function depending on the probability of playback
interruptions, average video quality, and the amount of video quality
fluctuations. To compute the utility function the algorithm leverages
throughput predictions, and dynamically estimated prediction accuracy.
We are trying to close the gap created by the lack of studies analyzing TCP
throughput on short to medium timescales. We study several time series
prediction methods and their error distributions. We observe that Simple Moving
Average performs best in most cases. We also observe that the relative
underestimation error is best represented by a truncated normal distribution,
while the relative overestimation error is best represented by a Lomax
distribution. Moreover, underestimations and overestimations exhibit a temporal
correlation that we use to further improve prediction accuracy.
We compare the proposed algorithm with a baseline approach that uses a fixed
margin between past throughput and selected media bit rate, and an oracle-based
approach that has perfect knowledge over future throughput for a certain time
horizon.
Comment: Technical Report TKN-15-001, Telecommunication Networks Group, Technische Universitaet Berlin. Updated by TR TKN-16-001, available at http://arxiv.org/abs/1603.0085
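As a rough illustration of the prediction component described above, the snippet below computes a Simple Moving Average forecast of per-segment throughput and collects the relative over- and underestimation errors that the paper models with Lomax and truncated normal distributions, respectively. The window size and sample data are illustrative assumptions; the QoE utility maximization itself is not shown.

```python
from collections import deque

def sma_predict(samples, window=5):
    """Simple Moving Average prediction of the next throughput value."""
    recent = list(samples)[-window:]
    return sum(recent) / len(recent)

# Synthetic per-segment throughput measurements in Mbit/s (illustrative).
throughput = [4.1, 3.8, 4.5, 2.9, 3.6, 4.0, 1.8, 2.5, 3.9, 4.2]

history = deque(maxlen=20)
under_errors, over_errors = [], []

for measured in throughput:
    if history:
        predicted = sma_predict(history)
        rel_error = (predicted - measured) / measured
        # The paper fits different distributions to the two cases
        # (truncated normal vs. Lomax); here we only collect samples.
        if rel_error < 0:
            under_errors.append(-rel_error)   # prediction was too low
        else:
            over_errors.append(rel_error)     # prediction was too high
    history.append(measured)

print(f"underestimations: {len(under_errors)}, overestimations: {len(over_errors)}")
```

An adaptation algorithm would use the fitted error distributions to estimate the probability that a candidate bit rate causes a playback interruption before committing to it.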
Machine Learning for Networking: Workflow, Advances and Opportunities
Recently, machine learning has been applied across a wide range of fields to
leverage its power. For a long time, networking and distributed computing
systems have served as the key infrastructure providing efficient computational
resources for machine learning. Networking itself can also benefit from this
promising technology. This article focuses on the application of Machine
Learning techniques for Networking (MLN), which can not only help solve
long-standing, intractable networking problems but also stimulate new network
applications. In this article, we summarize the basic workflow for applying machine
learning technology in the networking domain. Then we provide a selective
survey of the latest representative advances with explanations on their design
principles and benefits. These advances are grouped by network design
objective, and we detail how each performs the steps of the MLN workflow.
Finally, we shed light on new opportunities in network design and in building a
community around this new interdisciplinary area. Our goal
is to provide a broad research guideline on networking with machine learning to
help and motivate researchers to develop innovative algorithms, standards and
frameworks.
Comment: 8 pages, 2 figures
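The workflow the article summarizes (data collection, feature processing, model construction, training and validation, then deployment) is generic enough to sketch end to end. The toy pipeline below uses synthetic "flow records" and scikit-learn purely to make the stages concrete; the task, features, and model choice are illustrative assumptions, not taken from the article.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# 1) Data collection: synthetic flow records (packet count, mean size, duration).
rng = np.random.default_rng(42)
X = rng.normal(size=(2000, 3))
y = (X[:, 0] + 0.5 * X[:, 2] > 0).astype(int)   # stand-in label, e.g. "bulk vs. interactive"

# 2) Feature processing + 3) model construction, combined in one pipeline.
model = make_pipeline(StandardScaler(),
                      RandomForestClassifier(n_estimators=100, random_state=0))

# 4) Training and validation.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
model.fit(X_train, y_train)
print("validation accuracy:", accuracy_score(y_test, model.predict(X_test)))

# 5) Deployment would then apply model.predict() to live network measurements.
```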
Detection of Unknown Anomalies in Streaming Videos with Generative Energy-based Boltzmann Models
Abnormal event detection is one of the important objectives in research and
practical applications of video surveillance. However, there are still three
challenging problems for most anomaly detection systems in practical settings:
limited labeled data, ambiguous definition of "abnormal" and expensive feature
engineering steps. This paper introduces a unified detection framework to
handle these challenges using energy-based models, which are powerful tools for
unsupervised representation learning. Our proposed models are first trained
on unlabeled raw pixels of image frames from an input video rather than
hand-crafted visual features, and then identify the locations of abnormal
objects based on the errors between the input video and its reconstruction
produced by the models. To handle video streams, we develop an online version of
our framework, wherein the model parameters are updated incrementally with the
image frames arriving on the fly. Our experiments show that our detectors,
using Restricted Boltzmann Machines (RBMs) and Deep Boltzmann Machines (DBMs)
as core modules, achieve superior anomaly detection performance to unsupervised
baselines and obtain accuracy comparable with the state-of-the-art approaches
when evaluated at the pixel level. More importantly, we discover that our
system trained with DBMs is able to simultaneously perform scene clustering and
scene reconstruction. This capacity not only distinguishes our method from
other existing detectors but also offers a unique tool to investigate and
understand how the model works.
Comment: This manuscript is under consideration at Pattern Recognition Letters
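A minimal sketch of the reconstruction-error idea, with scikit-learn's BernoulliRBM standing in for the paper's RBM/DBM detectors: train on flattened frame patches, reconstruct each new patch with one Gibbs step, and treat large per-pixel reconstruction error as a candidate abnormal region. The patch size, threshold, and single Gibbs step are simplifying assumptions, not the paper's exact procedure.

```python
import numpy as np
from sklearn.neural_network import BernoulliRBM

# Synthetic stand-in for flattened 8x8 grayscale frame patches in [0, 1].
rng = np.random.default_rng(0)
normal_patches = rng.uniform(0.0, 0.3, size=(500, 64))   # "background" appearance
rbm = BernoulliRBM(n_components=32, learning_rate=0.05, n_iter=20, random_state=0)
rbm.fit(normal_patches)

def anomaly_map(patch, threshold=0.4):
    """Reconstruct a patch with one Gibbs step and flag pixels whose
    reconstruction error exceeds the threshold (illustrative value)."""
    reconstruction = rbm.gibbs(patch.reshape(1, -1)).astype(float)
    error = np.abs(patch.reshape(1, -1) - reconstruction)
    return (error > threshold).reshape(8, 8)

# A patch containing an unusually bright object should light up in the map.
test_patch = rng.uniform(0.0, 0.3, size=64)
test_patch[20:28] = 0.95
print(anomaly_map(test_patch).sum(), "pixels flagged")
```

The online version described in the abstract would additionally update the model parameters incrementally as new frames arrive, rather than training once up front.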
Muffled Semi-Supervised Learning
We explore a novel approach to semi-supervised learning. This approach is
contrary to the common approach in that the unlabeled examples serve to
"muffle," rather than enhance, the guidance provided by the labeled examples.
We provide several variants of the basic algorithm and show experimentally that
they can achieve significantly higher AUC than boosted trees, random forests
and logistic regression when unlabeled examples are available.
Uniqueness of Medical Data Mining: How the new technologies and data they generate are transforming medicine
The paper describes how the new technologies and data they generate are
transforming medicine. It stresses the uniqueness of heterogeneous medical data
and the ways of dealing with them. It lists different sources that generate big
medical data, their security, legal and ethical issues, as well as machine
learning/AI methods of dealing with them. A unique feature of the paper is its
use of case studies to illustrate how the new technologies influence medical
practice.
Real-Time Steganalysis for Stream Media Based on Multi-channel Convolutional Sliding Windows
Previous VoIP steganalysis methods face great challenges in detecting speech
signals at low embedding rates, and they are also generally unable to perform
real-time detection, which makes it hard for them to truly safeguard cyberspace
security. To address these two challenges, in this paper we combine a sliding
window detection algorithm with Convolutional Neural Networks to propose a
real-time VoIP steganalysis method based on multi-channel convolutional sliding
windows. In order to analyze the correlations between frames and their
neighboring frames in a VoIP signal, we define multi-channel sliding detection
windows. Within each sliding window, we design two feature extraction channels,
each containing multiple convolution layers with multiple convolution kernels
per layer, to extract correlation features of the input signal. Then,
based on these extracted features, we use a feed-forward fully connected network for
feature fusion. Finally, by analyzing the statistical distribution of these
features, the discriminator will determine whether the input speech signal
contains covert information or not. We designed several experiments to test the
proposed model's detection ability under various conditions, including
different embedding rates, different speech lengths, etc. Experimental results
showed that the proposed model outperforms all previous methods and achieves
state-of-the-art performance, especially in the case of low embedding rates.
In addition, we also tested the detection efficiency of the proposed model, and
the results showed that it can achieve almost real-time detection of VoIP
speech signals.
Comment: 13 pages, submitted to IEEE Transactions on Information Forensics and Security (TIFS)
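A rough PyTorch sketch of the network shape described above: within one sliding window of frames, two convolutional feature-extraction channels with different kernel sizes run in parallel, their outputs are fused by fully connected layers, and a binary output separates cover from stego speech. The layer sizes, kernel widths, feature dimension, and window length are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class SlidingWindowSteganalyzer(nn.Module):
    """Two parallel 1-D convolutional channels over a window of VoIP frame
    features, fused by fully connected layers (illustrative shapes)."""

    def __init__(self, feature_dim=16, window_len=50):
        super().__init__()
        # Channel A: small kernels, correlations between adjacent frames.
        self.channel_a = nn.Sequential(
            nn.Conv1d(feature_dim, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
        )
        # Channel B: larger kernels, correlations with more distant frames.
        self.channel_b = nn.Sequential(
            nn.Conv1d(feature_dim, 32, kernel_size=7, padding=3), nn.ReLU(),
            nn.Conv1d(32, 32, kernel_size=7, padding=3), nn.ReLU(),
        )
        self.fusion = nn.Sequential(
            nn.Linear(64 * window_len, 128), nn.ReLU(),
            nn.Linear(128, 2),               # cover vs. stego
        )

    def forward(self, x):                    # x: (batch, feature_dim, window_len)
        feats = torch.cat([self.channel_a(x), self.channel_b(x)], dim=1)
        return self.fusion(feats.flatten(start_dim=1))

# One sliding window of 50 frames, each with 16 codec-derived features (synthetic).
model = SlidingWindowSteganalyzer()
logits = model(torch.randn(8, 16, 50))       # batch of 8 windows
print(logits.shape)                          # torch.Size([8, 2])
```

Real-time operation comes from sliding this window along the incoming frame stream and reusing the same small network for each position.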
Natural Disaster Classification using Aerial Photography Explainable for Typhoon Damaged Feature
In recent years, typhoon damage has become a social problem owing to climate
change. On 9 September 2019, Typhoon Faxai passed over Chiba in Japan; its
damage included power outages caused by strong winds that reached a maximum of
45 meters per second. A large number of trees fell, and neighboring electric
poles fell with them. Because of these disaster features, recovery took 18
days, longer than in past events. Immediate responses are important for faster
recovery. Where possible, an aerial survey for global screening of the
devastated region is needed to support decisions on where to prioritize
recovery. This paper proposes a practical method to visualize damaged areas,
focused on typhoon disaster features, using aerial photography. The method
classifies eight classes covering both undamaged land cover and
disaster-affected areas. Using the target feature class probabilities, we
visualize a disaster feature map on a color scale. Furthermore, we obtain an
explainable map for each unit grid image by computing the convolutional
activation map with Grad-CAM. We demonstrate case studies applied to aerial
photographs recorded in the Chiba region after the typhoon.
Comment: 10 pages, 5 figures
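Grad-CAM itself is a standard technique; the sketch below shows one common way to compute a class activation map in PyTorch by hooking the last convolutional block of a classifier. The ResNet-18 backbone, randomly initialized weights, and target layer are assumptions for illustration, not the paper's eight-class damage model.

```python
import torch
import torch.nn.functional as F
from torchvision import models

# Randomly initialized backbone standing in for a trained damage classifier
# (weights=None keeps the sketch self-contained; use a trained model in practice).
model = models.resnet18(weights=None).eval()

activations, gradients = {}, {}
def fwd_hook(module, inp, out):
    activations["feat"] = out.detach()
def bwd_hook(module, grad_in, grad_out):
    gradients["feat"] = grad_out[0].detach()

layer = model.layer4[-1]                       # last conv block
layer.register_forward_hook(fwd_hook)
layer.register_full_backward_hook(bwd_hook)

def grad_cam(image, class_idx=None):
    """Return a Grad-CAM heatmap (H x W) for the given class."""
    logits = model(image)                      # image: (1, 3, H, W)
    if class_idx is None:
        class_idx = logits.argmax(dim=1).item()
    model.zero_grad()
    logits[0, class_idx].backward()
    # Weight each feature map by the mean of its gradients, then ReLU.
    weights = gradients["feat"].mean(dim=(2, 3), keepdim=True)
    cam = F.relu((weights * activations["feat"]).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, size=image.shape[2:], mode="bilinear", align_corners=False)
    return (cam / (cam.max() + 1e-8)).squeeze()

heatmap = grad_cam(torch.randn(1, 3, 224, 224))   # synthetic aerial tile
print(heatmap.shape)                              # torch.Size([224, 224])
```

Per-grid heatmaps like this one are what make the damage map "explainable": they show which pixels within a tile drove the predicted disaster-feature class.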
Fast and Accurate Performance Analysis of LTE Radio Access Networks
An increasing amount of analytics is performed on data that is procured in a
real-time fashion to make real-time decisions. Such tasks range from simple
reporting on streams to sophisticated model building. However, the practicality
of such analyses is impeded in several domains because they are faced with a
fundamental trade-off between data collection latency and analysis accuracy.
In this paper, we study this trade-off in the context of a specific domain,
Cellular Radio Access Networks (RAN). Our choice of this domain is influenced
by its commonalities with several other domains that produce real-time data,
our access to a large live dataset, and its real-time nature and
dimensionality, which make it a natural fit for a popular analysis technique,
machine learning (ML). We find that the latency-accuracy trade-off can be
resolved using two broad, general techniques: intelligent data grouping and
task formulations that leverage domain characteristics. Based on this, we
present CellScope, a system that addresses this challenge by applying a domain
specific formulation and application of Multi-task Learning (MTL) to RAN
performance analysis. It achieves this goal using three techniques: feature
engineering to transform raw data into effective features, a PCA inspired
similarity metric to group data from geographically nearby base stations
sharing performance commonalities, and a hybrid online-offline model for
efficient model updates. Our evaluation of CellScope shows that its accuracy
improvements over direct application of ML range from 2.5x to 4.4x while
reducing the model update overhead by up to 4.8x. We have also used CellScope
to analyze a live LTE network consisting of over 2 million subscribers for a period of
over 10 months, where it uncovered several problems and insights, some of them
previously unknown.
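One of CellScope's three techniques, grouping base stations by a PCA-inspired similarity before learning models, can be illustrated roughly as below: project each base station's feature statistics into a low-dimensional PCA space and cluster nearby stations together. The synthetic data, number of components, and use of k-means are assumptions for illustration; this is not CellScope's actual similarity metric or grouping algorithm.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

# Synthetic per-base-station feature vectors (e.g. aggregated KPI statistics).
rng = np.random.default_rng(1)
stations = np.vstack([
    rng.normal(loc=0.0, scale=0.3, size=(40, 20)),   # one behavioral group
    rng.normal(loc=2.0, scale=0.3, size=(40, 20)),   # another group
])

# Project into a low-dimensional space so distances reflect dominant
# performance patterns rather than raw, noisy KPI dimensions.
embedding = PCA(n_components=3).fit_transform(stations)

# Group nearby stations; a separate model would then be trained per group
# (or, as in the paper, shared across groups via multi-task learning).
groups = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(embedding)
print(np.bincount(groups))   # sizes of the two groups
```

Training per-group (or multi-task) models over such groupings is what lets the system trade less data-collection latency for little loss of accuracy, as the abstract argues.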