12,220 research outputs found
Exploring Outliers in Crowdsourced Ranking for QoE
Outlier detection is a crucial part of robust evaluation for crowdsourceable
assessment of Quality of Experience (QoE) and has attracted much attention in
recent years. In this paper, we propose some simple and fast algorithms for
outlier detection and robust QoE evaluation based on the nonconvex optimization
principle. Several iterative procedures are designed with or without knowing
the number of outliers in samples. Theoretical analysis is given to show that
such procedures can reach statistically good estimates under mild conditions.
Finally, experimental results with simulated and real-world crowdsourcing
datasets show that the proposed algorithms could produce similar performance to
Huber-LASSO approach in robust ranking, yet with nearly 8 or 90 times speed-up,
without or with a prior knowledge on the sparsity size of outliers,
respectively. Therefore the proposed methodology provides us a set of helpful
tools for robust QoE evaluation with crowdsourcing data.Comment: accepted by ACM Multimedia 2017 (Oral presentation). arXiv admin
note: text overlap with arXiv:1407.763
ALOJA: A framework for benchmarking and predictive analytics in Hadoop deployments
This article presents the ALOJA project and its analytics tools, which leverages machine learning to interpret Big Data benchmark performance data and tuning. ALOJA is part of a long-term collaboration between BSC and Microsoft to automate the characterization of cost-effectiveness on Big Data deployments, currently focusing on Hadoop. Hadoop presents a complex run-time environment, where costs and performance depend on a large number of configuration choices. The ALOJA project has created an open, vendor-neutral repository, featuring over 40,000 Hadoop job executions and their performance details. The repository is accompanied by a test-bed and tools to deploy and evaluate the cost-effectiveness of different hardware configurations, parameters and Cloud services. Despite early success within ALOJA, a comprehensive study requires automation of modeling procedures to allow an analysis of large and resource-constrained search spaces. The predictive analytics extension, ALOJA-ML, provides an automated system allowing knowledge discovery by modeling environments from observed executions. The resulting models can forecast execution behaviors, predicting execution times for new configurations and hardware choices. That also enables model-based anomaly detection or efficient benchmark guidance by prioritizing executions. In addition, the community can benefit from ALOJA data-sets and framework to improve the design and deployment of Big Data applications.This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement
No 639595). This work is partially supported by the Ministry of Economy of Spain under contracts TIN2012-34557 and 2014SGR1051.Peer ReviewedPostprint (published version
Log-based Anomaly Detection of CPS Using a Statistical Method
Detecting anomalies of a cyber physical system (CPS), which is a complex
system consisting of both physical and software parts, is important because a
CPS often operates autonomously in an unpredictable environment. However,
because of the ever-changing nature and lack of a precise model for a CPS,
detecting anomalies is still a challenging task. To address this problem, we
propose applying an outlier detection method to a CPS log. By using a log
obtained from an actual aquarium management system, we evaluated the
effectiveness of our proposed method by analyzing outliers that it detected. By
investigating the outliers with the developer of the system, we confirmed that
some outliers indicate actual faults in the system. For example, our method
detected failures of mutual exclusion in the control system that were unknown
to the developer. Our method also detected transient losses of functionalities
and unexpected reboots. On the other hand, our method did not detect anomalies
that were too many and similar. In addition, our method reported rare but
unproblematic concurrent combinations of operations as anomalies. Thus, our
approach is effective at finding anomalies, but there is still room for
improvement
Spectral Embedding Norm: Looking Deep into the Spectrum of the Graph Laplacian
The extraction of clusters from a dataset which includes multiple clusters
and a significant background component is a non-trivial task of practical
importance. In image analysis this manifests for example in anomaly detection
and target detection. The traditional spectral clustering algorithm, which
relies on the leading eigenvectors to detect clusters, fails in such
cases. In this paper we propose the {\it spectral embedding norm} which sums
the squared values of the first normalized eigenvectors, where can be
significantly larger than . We prove that this quantity can be used to
separate clusters from the background in unbalanced settings, including extreme
cases such as outlier detection. The performance of the algorithm is not
sensitive to the choice of , and we demonstrate its application on synthetic
and real-world remote sensing and neuroimaging datasets
- …