
4 research outputs found

In Datacenter Performance, The Only Constant Is Change

Authors: Duplyakin Dmitry; Maricq Aleksander; Ricci Robert; Uta Alexandru
Publication date: 10/03/2020

All computing infrastructure suffers from performance variability, be it bare-metal or virtualized. This phenomenon originates from many sources: some transient, such as noisy neighbors, and others more permanent but sudden, such as changes or wear in hardware, changes in the underlying hypervisor stack, or even undocumented interactions between the policies of the computing resource provider and the active workloads. Thus, performance measurements obtained on clouds, HPC facilities, and, more generally, datacenter environments are almost guaranteed to exhibit performance regimes that evolve over time, which leads to undesirable nonstationarities in application performance. In this paper, we present our analysis of the performance of the bare-metal hardware available on the CloudLab testbed, where we focus on quantifying the evolving performance regimes using changepoint detection. We describe our findings, backed by a dataset with nearly 6.9M benchmark results collected from over 1600 machines over a period of 2 years and 9 months. These findings yield a comprehensive characterization of real-world performance variability patterns in one computing facility and a methodology for studying such patterns on other infrastructures, and they contribute to a better understanding of performance variability in general. Comment: To be presented at the 20th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGrid, http://cloudbus.org/ccgrid2020/) on May 11-14, 2020 in Melbourne, Victoria, Australia

arXiv.org e-Print Archive

VU Research Portal

Crossref

Lifelong Machine Learning and root cause analysis for large-scale cancer patient data

Authors: Atkinson Katie; Hong Xianbin; Li Gangmin; Pal Gautam; Wang Zhuo; Wu Hongyi
Publication venue: Springer Science and Business Media LLC
Publication date: 03/12/2019

University of Liverpool Repository

Real-time big data processing for anomaly detection : a survey

Authors: Ahmed Ejaz; Ariyaluran Habeeb Riyaz; Gani Abdullah; Imran Muhammad; Nasaruddin Fariza; Targio Hashem Ibrahim
Publication venue: Elsevier Ltd
Publication date: 01/01/2019

The advent of connected devices and the omnipresence of the Internet have paved the way for intruders to attack networks, which leads to cyber-attacks, financial loss, information theft in healthcare, and cyber war. Hence, network security analytics has become an important area of concern and has of late gained intensive attention among researchers, specifically in the domain of network anomaly detection, which is considered crucial for network security. However, preliminary investigations have revealed that the existing approaches to detecting anomalies in networks are not effective enough, particularly for detecting them in real time. The inefficacy of current approaches is mainly due to the amassment of massive volumes of data through the connected devices. Therefore, it is crucial to propose a framework that effectively handles real-time big data processing and detects anomalies in networks. In this regard, this paper attempts to address the issue of detecting anomalies in real time. Accordingly, this paper surveys the state-of-the-art real-time big data processing technologies related to anomaly detection and the vital characteristics of the associated machine learning algorithms. The paper begins with an explanation of the essential contexts and taxonomy of real-time big data processing, anomaly detection, and machine learning algorithms, followed by a review of big data processing technologies. Finally, the identified research challenges of real-time big data processing in anomaly detection are discussed. © 2018 Elsevier Ltd

ZENODO

Federation ResearchOnline

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

A Comparison of Real Time Stream Processing Frameworks

Author: Curtis Jonathan
Publication venue: Dublin Institute of Technology
Publication date: 01/01/2018

The need to process the ever-expanding volumes of information generated daily in the modern world is driving radical changes in traditional data analysis techniques. As a result, a number of open source tools for handling real-time data streams have become available in recent years. Four, in particular, have gained significant traction: Apache Flink, Apache Samza, Apache Spark and Apache Storm. Despite the rising popularity of these frameworks, however, few studies analyse their performance in terms of important metrics, such as throughput and latency. This study aims to address this gap by running several benchmarks against these frameworks.

Arrow@TUDublin