10,993 research outputs found
Real-Time Anomaly Detection for Streaming Analytics
Much of the world's data is streaming, time-series data, where anomalies give
significant information in critical situations. Yet detecting anomalies in
streaming data is a difficult task, requiring detectors to process data in
real-time, and learn while simultaneously making predictions. We present a
novel anomaly detection technique based on an on-line sequence memory algorithm
called Hierarchical Temporal Memory (HTM). We show results from a live
application that detects anomalies in financial metrics in real-time. We also
test the algorithm on NAB, a published benchmark for real-time anomaly
detection, where our algorithm achieves best-in-class results.
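The HTM detector itself is too involved for a short snippet, but the streaming anomaly-scoring pattern it relies on (predict the next value online, then convert the prediction error into an anomaly likelihood) can be sketched as follows. This is a minimal illustration under assumed simplifications: a simple moving-average predictor stands in for the HTM sequence memory, and a rolling-Gaussian tail probability is one common way to turn errors into a score. It is not the authors' algorithm.

```python
import math
from collections import deque

class StreamingAnomalyScorer:
    """Toy streaming anomaly scorer: an online predictor plus a rolling
    Gaussian model of its prediction errors. A simple moving average
    stands in for HTM here (assumption for illustration)."""

    def __init__(self, window=50, error_window=200):
        self.values = deque(maxlen=window)        # recent raw values
        self.errors = deque(maxlen=error_window)  # recent prediction errors

    def score(self, x):
        # Predict the next value from recent history.
        prediction = sum(self.values) / len(self.values) if self.values else x
        error = abs(x - prediction)

        # Model recent errors as Gaussian; use the CDF of the current
        # error as an anomaly likelihood in [0, 1].
        if len(self.errors) >= 5:
            mean = sum(self.errors) / len(self.errors)
            var = sum((e - mean) ** 2 for e in self.errors) / len(self.errors)
            std = math.sqrt(var) or 1e-9
            z = (error - mean) / std
            likelihood = 1.0 - 0.5 * math.erfc(z / math.sqrt(2))
        else:
            likelihood = 0.0  # not enough history yet

        # Update state *after* scoring so the detector learns online.
        self.values.append(x)
        self.errors.append(error)
        return likelihood

# Example: flag points whose anomaly likelihood exceeds a threshold.
scorer = StreamingAnomalyScorer()
stream = [10, 11, 10, 12, 11, 10, 55, 11, 10]   # synthetic metric values
flags = [scorer.score(v) > 0.999 for v in stream]
```

On this synthetic stream only the jump to 55 is flagged; the detector keeps learning from every point, anomalous or not, which mirrors the learn-while-predicting requirement described in the abstract.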
Anomaly Detection for an E-commerce Pricing System
Online retailers execute a very large number of price updates when compared
to brick-and-mortar stores. Even a few mis-priced items can have a significant
business impact and result in a loss of customer trust. Early detection of
anomalies in an automated real-time fashion is an important part of such a
pricing system. In this paper, we describe unsupervised and supervised anomaly
detection approaches we developed and deployed for a large-scale online pricing
system at Walmart. Our system detects anomalies both in batch and real-time
streaming settings, and the items flagged are reviewed and actioned based on
priority and business impact. We found that having the right architecture
design was critical to facilitate model performance at scale, and business
impact and speed were important factors influencing model selection, parameter
choice, and prioritization in a production environment for a large-scale
system. We conducted analyses on the performance of various approaches on a
test set using real-world retail data and fully deployed our approach into
production. We found that our approach was able to detect the most important
anomalies with high precision.
Comment: 10 pages, 4 figures
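The abstract does not spell out the models, but a common starting point for the unsupervised side of such a pricing system is an outlier detector over engineered price features. The sketch below uses scikit-learn's IsolationForest on hypothetical features (relative price change and deviation from a reference price); the feature choices, data, and contamination rate are assumptions for illustration, not Walmart's production pipeline.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Hypothetical features per price update: relative change vs. previous price
# and relative deviation from a reference (e.g. competitor) price.
rng = np.random.default_rng(0)
normal = rng.normal(loc=[0.0, 0.0], scale=[0.05, 0.05], size=(1000, 2))
anomalous = np.array([[0.9, 0.8], [-0.95, -0.7]])   # e.g. a mis-keyed price
updates = np.vstack([normal, anomalous])

# Unsupervised detector: isolate price updates that look unlike the bulk.
detector = IsolationForest(n_estimators=200, contamination=0.01, random_state=0)
labels = detector.fit_predict(updates)              # -1 = flagged anomaly

flagged = np.where(labels == -1)[0]
print(f"{len(flagged)} updates flagged for review")
```

In a deployed system the flagged updates would then be routed for review and prioritized by business impact, as the abstract describes.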
Low-Delay Adaptive Video Streaming Based on Short-Term TCP Throughput Prediction
Recently, HTTP-Based Adaptive Streaming has become the de facto standard for
video streaming over the Internet. It allows the client to adapt media
characteristics to varying network conditions in order to maximize Quality of
Experience (QoE). In the case of live streaming this task becomes particularly
challenging. An important factor that might help improve performance is the
capability to correctly predict network throughput dynamics on short to medium
timescales. It becomes notably difficult in wireless networks that are often
subject to continuous throughput fluctuations.
In the present work, we develop an adaptation algorithm for HTTP-Based
Adaptive Live Streaming that, for each adaptation decision, maximizes a
QoE-based utility function depending on the probability of playback
interruptions, average video quality, and the amount of video quality
fluctuations. To compute the utility function the algorithm leverages
throughput predictions, and dynamically estimated prediction accuracy.
We are trying to close the gap created by the lack of studies analyzing TCP
throughput on short to medium timescales. We study several time series
prediction methods and their error distributions. We observe that Simple Moving
Average performs best in most cases. We also observe that the relative
underestimation error is best represented by a truncated normal distribution,
while the relative overestimation error is best represented by a Lomax
distribution. Moreover, underestimations and overestimations exhibit a temporal
correlation that we use to further improve prediction accuracy.
We compare the proposed algorithm with a baseline approach that uses a fixed
margin between past throughput and selected media bit rate, and an oracle-based
approach that has perfect knowledge over future throughput for a certain time
horizon.
Comment: Technical Report TKN-15-001, Telecommunication Networks Group, Technische Universitaet Berlin. Updated by TR TKN-16-001, available at http://arxiv.org/abs/1603.0085
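As a rough illustration of the prediction component described above, the snippet below computes a Simple Moving Average forecast of per-segment throughput and collects the relative over- and underestimation errors that the paper models with Lomax and truncated normal distributions, respectively. The window size and sample data are illustrative assumptions; the QoE utility maximization itself is not shown.

```python
from collections import deque

def sma_predict(samples, window=5):
    """Simple Moving Average prediction of the next throughput value."""
    recent = list(samples)[-window:]
    return sum(recent) / len(recent)

# Synthetic per-segment throughput measurements in Mbit/s (illustrative).
throughput = [4.1, 3.8, 4.5, 2.9, 3.6, 4.0, 1.8, 2.5, 3.9, 4.2]

history = deque(maxlen=20)
under_errors, over_errors = [], []

for measured in throughput:
    if history:
        predicted = sma_predict(history)
        rel_error = (predicted - measured) / measured
        # The paper fits different distributions to the two cases
        # (truncated normal vs. Lomax); here we only collect samples.
        if rel_error < 0:
            under_errors.append(-rel_error)   # prediction was too low
        else:
            over_errors.append(rel_error)     # prediction was too high
    history.append(measured)

print(f"underestimations: {len(under_errors)}, overestimations: {len(over_errors)}")
```

An adaptation algorithm would use the fitted error distributions to estimate the probability that a candidate bit rate causes a playback interruption before committing to it.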
Machine Learning for Networking: Workflow, Advances and Opportunities
Recently, machine learning has been applied across a wide range of fields to
leverage its power. For a long time, networking and distributed computing
systems have served as the key infrastructure providing efficient computational
resources for machine learning. Networking itself can also benefit from this
promising technology. This article focuses on the application of Machine
Learning techniques for Networking (MLN), which can not only help solve
long-standing, intractable networking problems but also stimulate new network
applications. In this article, we summarize the basic workflow for applying machine
learning technology in the networking domain. Then we provide a selective
survey of the latest representative advances with explanations on their design
principles and benefits. These advances are grouped by network design
objective, and we detail how each performs the steps of the MLN workflow.
Finally, we shed light on new opportunities in network design and in building a
community around this new interdisciplinary area. Our goal
is to provide a broad research guideline on networking with machine learning to
help and motivate researchers to develop innovative algorithms, standards and
frameworks.
Comment: 8 pages, 2 figures
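The workflow the article summarizes (data collection, feature processing, model construction, training and validation, then deployment) is generic enough to sketch end to end. The toy pipeline below uses synthetic "flow records" and scikit-learn purely to make the stages concrete; the task, features, and model choice are illustrative assumptions, not taken from the article.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# 1) Data collection: synthetic flow records (packet count, mean size, duration).
rng = np.random.default_rng(42)
X = rng.normal(size=(2000, 3))
y = (X[:, 0] + 0.5 * X[:, 2] > 0).astype(int)   # stand-in label, e.g. "bulk vs. interactive"

# 2) Feature processing + 3) model construction, combined in one pipeline.
model = make_pipeline(StandardScaler(),
                      RandomForestClassifier(n_estimators=100, random_state=0))

# 4) Training and validation.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
model.fit(X_train, y_train)
print("validation accuracy:", accuracy_score(y_test, model.predict(X_test)))

# 5) Deployment would then apply model.predict() to live network measurements.
```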
Detection of Unknown Anomalies in Streaming Videos with Generative Energy-based Boltzmann Models
Abnormal event detection is one of the important objectives in research and
practical applications of video surveillance. However, there are still three
challenging problems for most anomaly detection systems in practical settings:
limited labeled data, ambiguous definition of "abnormal" and expensive feature
engineering steps. This paper introduces a unified detection framework to
handle these challenges using energy-based models, which are powerful tools for
unsupervised representation learning. Our proposed models are first trained
on unlabeled raw pixels of image frames from an input video rather than
hand-crafted visual features, and then identify the locations of abnormal
objects based on the errors between the input video and its reconstruction
produced by the models. To handle video streams, we develop an online version of
our framework, wherein the model parameters are updated incrementally with the
image frames arriving on the fly. Our experiments show that our detectors,
using Restricted Boltzmann Machines (RBMs) and Deep Boltzmann Machines (DBMs)
as core modules, achieve superior anomaly detection performance to unsupervised
baselines and obtain accuracy comparable with the state-of-the-art approaches
when evaluated at the pixel level. More importantly, we discover that our
system trained with DBMs is able to simultaneously perform scene clustering and
scene reconstruction. This capacity not only distinguishes our method from
other existing detectors but also offers a unique tool to investigate and
understand how the model works.
Comment: This manuscript is under consideration at Pattern Recognition Letters
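A minimal sketch of the reconstruction-error idea, with scikit-learn's BernoulliRBM standing in for the paper's RBM/DBM detectors: train on flattened frame patches, reconstruct each new patch with one Gibbs step, and treat large per-pixel reconstruction error as a candidate abnormal region. The patch size, threshold, and single Gibbs step are simplifying assumptions, not the paper's exact procedure.

```python
import numpy as np
from sklearn.neural_network import BernoulliRBM

# Synthetic stand-in for flattened 8x8 grayscale frame patches in [0, 1].
rng = np.random.default_rng(0)
normal_patches = rng.uniform(0.0, 0.3, size=(500, 64))   # "background" appearance
rbm = BernoulliRBM(n_components=32, learning_rate=0.05, n_iter=20, random_state=0)
rbm.fit(normal_patches)

def anomaly_map(patch, threshold=0.4):
    """Reconstruct a patch with one Gibbs step and flag pixels whose
    reconstruction error exceeds the threshold (illustrative value)."""
    reconstruction = rbm.gibbs(patch.reshape(1, -1)).astype(float)
    error = np.abs(patch.reshape(1, -1) - reconstruction)
    return (error > threshold).reshape(8, 8)

# A patch containing an unusually bright object should light up in the map.
test_patch = rng.uniform(0.0, 0.3, size=64)
test_patch[20:28] = 0.95
print(anomaly_map(test_patch).sum(), "pixels flagged")
```

The online version described in the abstract would additionally update the model parameters incrementally as new frames arrive, rather than training once up front.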
Muffled Semi-Supervised Learning
We explore a novel approach to semi-supervised learning. This approach is
contrary to the common approach in that the unlabeled examples serve to
"muffle," rather than enhance, the guidance provided by the labeled examples.
We provide several variants of the basic algorithm and show experimentally that
they can achieve significantly higher AUC than boosted trees, random forests
and logistic regression when unlabeled examples are available.
Uniqueness of Medical Data Mining: How the new technologies and data they generate are transforming medicine
The paper describes how the new technologies and data they generate are
transforming medicine. It stresses the uniqueness of heterogeneous medical data
and the ways of dealing with them. It lists different sources that generate big
medical data, their security, legal and ethical issues, as well as machine
learning/AI methods of dealing with them. A unique feature of the paper is its
use of case studies to illustrate how the new technologies influence medical
practice.
Real-Time Steganalysis for Stream Media Based on Multi-channel Convolutional Sliding Windows
Previous VoIP steganalysis methods face great challenges in detecting speech
signals at low embedding rates, and they are also generally unable to perform
real-time detection, which makes it hard for them to truly safeguard cyberspace
security. To address these two challenges, in this paper we combine a sliding
window detection algorithm with Convolutional Neural Networks to propose a
real-time VoIP steganalysis method based on multi-channel convolutional sliding
windows. In order to analyze the correlations between frames and their
neighboring frames in a VoIP signal, we define multi-channel sliding detection
windows. Within each sliding window, we design two feature extraction channels,
each containing multiple convolution layers with multiple convolution kernels
per layer, to extract correlation features of the input signal. Then,
based on these extracted features, we use a feed-forward fully connected network for
feature fusion. Finally, by analyzing the statistical distribution of these
features, the discriminator will determine whether the input speech signal
contains covert information or not. We designed several experiments to test the
proposed model's detection ability under various conditions, including
different embedding rates, different speech lengths, etc. Experimental results
showed that the proposed model outperforms all previous methods and achieves
state-of-the-art performance, especially in the case of low embedding rates.
In addition, we also tested the detection efficiency of the proposed model, and
the results showed that it can achieve almost real-time detection of VoIP
speech signals.
Comment: 13 pages, submitted to IEEE Transactions on Information Forensics and Security (TIFS)
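A rough PyTorch sketch of the network shape described above: within one sliding window of frames, two convolutional feature-extraction channels with different kernel sizes run in parallel, their outputs are fused by fully connected layers, and a binary output separates cover from stego speech. The layer sizes, kernel widths, feature dimension, and window length are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class SlidingWindowSteganalyzer(nn.Module):
    """Two parallel 1-D convolutional channels over a window of VoIP frame
    features, fused by fully connected layers (illustrative shapes)."""

    def __init__(self, feature_dim=16, window_len=50):
        super().__init__()
        # Channel A: small kernels, correlations between adjacent frames.
        self.channel_a = nn.Sequential(
            nn.Conv1d(feature_dim, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
        )
        # Channel B: larger kernels, correlations with more distant frames.
        self.channel_b = nn.Sequential(
            nn.Conv1d(feature_dim, 32, kernel_size=7, padding=3), nn.ReLU(),
            nn.Conv1d(32, 32, kernel_size=7, padding=3), nn.ReLU(),
        )
        self.fusion = nn.Sequential(
            nn.Linear(64 * window_len, 128), nn.ReLU(),
            nn.Linear(128, 2),               # cover vs. stego
        )

    def forward(self, x):                    # x: (batch, feature_dim, window_len)
        feats = torch.cat([self.channel_a(x), self.channel_b(x)], dim=1)
        return self.fusion(feats.flatten(start_dim=1))

# One sliding window of 50 frames, each with 16 codec-derived features (synthetic).
model = SlidingWindowSteganalyzer()
logits = model(torch.randn(8, 16, 50))       # batch of 8 windows
print(logits.shape)                          # torch.Size([8, 2])
```

Real-time operation comes from sliding this window along the incoming frame stream and reusing the same small network for each position.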
Natural Disaster Classification using Aerial Photography Explainable for Typhoon Damaged Feature
In recent years, typhoon damage has become a social problem owing to climate
change. On 9 September 2019, Typhoon Faxai passed over Chiba in Japan; its
damage included power outages caused by strong winds that reached a maximum of
45 meters per second. A large number of trees fell, and neighboring electric
poles fell with them. Because of these disaster features, recovery took 18
days, longer than in past events. Immediate responses are important for faster
recovery. Where possible, an aerial survey for global screening of the
devastated region is needed to support decisions on where to prioritize
recovery. This paper proposes a practical method to visualize damaged areas,
focused on typhoon disaster features, using aerial photography. The method
classifies eight classes covering both undamaged land cover and
disaster-affected areas. Using the target feature class probabilities, we
visualize a disaster feature map on a color scale. Furthermore, we obtain an
explainable map for each unit grid image by computing the convolutional
activation map with Grad-CAM. We demonstrate case studies applied to aerial
photographs recorded in the Chiba region after the typhoon.
Comment: 10 pages, 5 figures
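Grad-CAM itself is a standard technique; the sketch below shows one common way to compute a class activation map in PyTorch by hooking the last convolutional block of a classifier. The ResNet-18 backbone, randomly initialized weights, and target layer are assumptions for illustration, not the paper's eight-class damage model.

```python
import torch
import torch.nn.functional as F
from torchvision import models

# Randomly initialized backbone standing in for a trained damage classifier
# (weights=None keeps the sketch self-contained; use a trained model in practice).
model = models.resnet18(weights=None).eval()

activations, gradients = {}, {}
def fwd_hook(module, inp, out):
    activations["feat"] = out.detach()
def bwd_hook(module, grad_in, grad_out):
    gradients["feat"] = grad_out[0].detach()

layer = model.layer4[-1]                       # last conv block
layer.register_forward_hook(fwd_hook)
layer.register_full_backward_hook(bwd_hook)

def grad_cam(image, class_idx=None):
    """Return a Grad-CAM heatmap (H x W) for the given class."""
    logits = model(image)                      # image: (1, 3, H, W)
    if class_idx is None:
        class_idx = logits.argmax(dim=1).item()
    model.zero_grad()
    logits[0, class_idx].backward()
    # Weight each feature map by the mean of its gradients, then ReLU.
    weights = gradients["feat"].mean(dim=(2, 3), keepdim=True)
    cam = F.relu((weights * activations["feat"]).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, size=image.shape[2:], mode="bilinear", align_corners=False)
    return (cam / (cam.max() + 1e-8)).squeeze()

heatmap = grad_cam(torch.randn(1, 3, 224, 224))   # synthetic aerial tile
print(heatmap.shape)                              # torch.Size([224, 224])
```

Per-grid heatmaps like this one are what make the damage map "explainable": they show which pixels within a tile drove the predicted disaster-feature class.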
Fast and Accurate Performance Analysis of LTE Radio Access Networks
An increasing amount of analytics is performed on data that is procured in a
real-time fashion to make real-time decisions. Such tasks range from simple
reporting on streams to sophisticated model building. However, the practicality
of such analyses is impeded in several domains because they are faced with a
fundamental trade-off between data collection latency and analysis accuracy.
In this paper, we study this trade-off in the context of a specific domain,
Cellular Radio Access Networks (RAN). Our choice of this domain is influenced
by its commonalities with several other domains that produce real-time data,
our access to a large live dataset, and its real-time nature and
dimensionality, which make it a natural fit for a popular analysis technique,
machine learning (ML). We find that the latency-accuracy trade-off can be
resolved using two broad, general techniques: intelligent data grouping and
task formulations that leverage domain characteristics. Based on this, we
present CellScope, a system that addresses this challenge by applying a domain
specific formulation and application of Multi-task Learning (MTL) to RAN
performance analysis. It achieves this goal using three techniques: feature
engineering to transform raw data into effective features, a PCA inspired
similarity metric to group data from geographically nearby base stations
sharing performance commonalities, and a hybrid online-offline model for
efficient model updates. Our evaluation of CellScope shows that its accuracy
improvements over direct application of ML range from 2.5x to 4.4x while
reducing the model update overhead by up to 4.8x. We have also used CellScope
to analyze a live LTE network consisting of over 2 million subscribers for a period of
over 10 months, where it uncovered several problems and insights, some of them
previously unknown.
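One of CellScope's three techniques, grouping base stations by a PCA-inspired similarity before learning models, can be illustrated roughly as below: project each base station's feature statistics into a low-dimensional PCA space and cluster nearby stations together. The synthetic data, number of components, and use of k-means are assumptions for illustration; this is not CellScope's actual similarity metric or grouping algorithm.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

# Synthetic per-base-station feature vectors (e.g. aggregated KPI statistics).
rng = np.random.default_rng(1)
stations = np.vstack([
    rng.normal(loc=0.0, scale=0.3, size=(40, 20)),   # one behavioral group
    rng.normal(loc=2.0, scale=0.3, size=(40, 20)),   # another group
])

# Project into a low-dimensional space so distances reflect dominant
# performance patterns rather than raw, noisy KPI dimensions.
embedding = PCA(n_components=3).fit_transform(stations)

# Group nearby stations; a separate model would then be trained per group
# (or, as in the paper, shared across groups via multi-task learning).
groups = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(embedding)
print(np.bincount(groups))   # sizes of the two groups
```

Training per-group (or multi-task) models over such groupings is what lets the system trade less data-collection latency for little loss of accuracy, as the abstract argues.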