Review and Analysis of Failure Detection and Prevention Techniques in IT Infrastructure Monitoring
Maintaining the health of IT infrastructure components for improved reliability and availability has been a research and innovation topic for many years. Identifying and handling failures is crucial, yet challenging, due to the complexity of IT infrastructure. System logs are the primary source of information for diagnosing and fixing failures.
In this work, we address three essential research dimensions concerning failures: the need for failure handling in IT infrastructure, the contribution of system-generated logs to failure detection, and the reactive and proactive approaches used to deal with failure situations.
This study performs a comprehensive analysis of the existing literature along three prominent aspects: log preprocessing, anomaly and failure detection, and failure prevention.
With this coherent review, we (1) establish the need for IT infrastructure monitoring to avoid downtime, (2) examine the three types of approaches for anomaly and failure detection, namely rule-based, correlation, and classification methods, and (3) formulate recommendations for researchers as further research guidelines.
To the best of the authors' knowledge, this is the first comprehensive literature review on IT infrastructure monitoring techniques. The review was conducted through a meta-analysis and a comparative study of machine learning and deep learning techniques. This work aims to outline significant research gaps in the area of IT infrastructure failure detection, and will help future researchers understand the advantages and limitations of current methods and select an approach adequate to their problem.
Network problems detection and classification by analyzing syslog data
Network troubleshooting is an important process and a wide research field. The first step in troubleshooting is collecting information in order to diagnose the problems. Syslog messages, which are sent by almost all network devices, contain a massive amount of data related to network problems. Many previous studies have analyzed syslog data as a guide to network problems and their causes. Detecting network problems could be more efficient if the detected problems were classified in
terms of network layers. Classifying syslog data requires identifying the syslog messages that describe the network problems for each layer, taking into account the different syslog formats of various vendors' devices. This study provides a method to classify syslog messages that indicate network problems in terms of network layers. The method uses a data mining tool to classify the syslog messages,
with the description part of each syslog message used for the classification process. Related syslog messages were identified; features were then selected to train the classifiers. Six classification algorithms were trained: LibSVM, SMO, KNN, Naïve Bayes, J48, and Random Forest. A real data set obtained from
Universiti Utara Malaysia's (UUM) network devices was used for the prediction stage. Results indicate that SVM shows the best performance during both the training and prediction stages. This study contributes to the fields of network troubleshooting and text data classification.
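As a hedged illustration of the classification step described above, the sketch below trains a linear SVM (the classifier family the study found strongest) on TF-IDF features of syslog description text. The example messages and layer labels are invented for illustration and are not drawn from the UUM dataset; the study's actual feature selection and tooling may differ.

```python
# Illustrative sketch: classify syslog descriptions by network layer
# using TF-IDF text features and a linear SVM. Messages and labels below
# are made-up examples, not the study's real data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

messages = [
    "%LINK-3-UPDOWN: Interface GigabitEthernet0/1, changed state to down",
    "%LINEPROTO-5-UPDOWN: Line protocol on Interface Vlan10, changed state to down",
    "%OSPF-5-ADJCHG: Process 1, Nbr 10.0.0.2 on Gi0/0 from FULL to DOWN",
    "%BGP-5-ADJCHANGE: neighbor 192.0.2.1 Down BGP Notification sent",
]
layers = ["physical", "data-link", "network", "network"]

# TF-IDF turns each description into a sparse term-weight vector;
# LinearSVC learns a one-vs-rest linear separator per layer label.
model = make_pipeline(TfidfVectorizer(), LinearSVC())
model.fit(messages, layers)

print(model.predict(["%OSPF-5-ADJCHG: Nbr 10.0.0.9 from LOADING to FULL"]))
```

With real data, the description field alone often carries enough signal because vendors embed the failing subsystem's name in it.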
EvLog: Evolving Log Analyzer for Anomalous Logs Identification
Software logs record system activities, aiding maintainers in identifying the
underlying causes for failures and enabling prompt mitigation actions. However,
maintainers need to inspect a large volume of daily logs to identify the
anomalous logs that reveal failure details for further diagnosis. Thus, how to
automatically distinguish these anomalous logs from normal logs becomes a
critical problem. Existing approaches alleviate the burden on software
maintainers, but they are built upon an improper yet critical assumption:
logging statements in the software remain unchanged. While software keeps
evolving, our empirical study finds that evolving software brings three
challenges: log parsing errors, evolving log events, and unstable log
sequences.
In this paper, we propose a novel unsupervised approach named Evolving Log
analyzer (EvLog) to mitigate these challenges. We first build a multi-level
representation extractor to process logs without parsing to prevent errors from
the parser. The multi-level representations preserve the essential semantics of
logs while leaving out insignificant changes in evolving events. EvLog then
implements an anomaly discriminator with an attention mechanism to identify the
anomalous logs and avoid the issue brought by the unstable sequence. EvLog has
shown effectiveness on two real-world system evolution log datasets, with
average F1 scores of 0.955 and 0.847 in the intra-version and inter-version
settings, respectively, outperforming other state-of-the-art approaches by a
wide margin. To the best of our knowledge, this is the first study on tackling
anomalous logs over software evolution. We believe our work sheds new light on
the impact of software evolution and offers corresponding solutions for the
log analysis community.
Report from GI-Dagstuhl Seminar 16394: Software Performance Engineering in the DevOps World
This report documents the program and the outcomes of GI-Dagstuhl Seminar
16394 "Software Performance Engineering in the DevOps World".
The seminar addressed the problem of performance-aware DevOps. Both DevOps
and performance engineering have been growing trends over the past one to two
years, in no small part due to the rise in importance of identifying
performance anomalies in the operations (Ops) of cloud and big data systems and
feeding these back to the development (Dev). However, so far, the research
community has treated software engineering, performance engineering, and cloud
computing mostly as individual research areas. We aimed to identify
opportunities for cross-community collaboration and to set the path for
long-lasting collaborations towards performance-aware DevOps.
The main goal of the seminar was to bring together young researchers (PhD
students in a later stage of their PhD, as well as PostDocs or Junior
Professors) in the areas of (i) software engineering, (ii) performance
engineering, and (iii) cloud computing and big data to present their current
research projects, to exchange experience and expertise, to discuss research
challenges, and to develop ideas for future collaborations.
Fault Injection Analytics: A Novel Approach to Discover Failure Modes in Cloud-Computing Systems
Cloud computing systems fail in complex and unexpected ways due to unexpected
combinations of events and interactions between hardware and software
components. Fault injection is an effective means to bring out these failures
in a controlled environment. However, fault injection experiments produce
massive amounts of data, and manually analyzing these data is inefficient and
error-prone, as the analyst can miss severe failure modes that are yet unknown.
This paper introduces a new paradigm (fault injection analytics) that applies
unsupervised machine learning on execution traces of the injected system, to
ease the discovery and interpretation of failure modes. We evaluated the
proposed approach in the context of fault injection experiments on the
OpenStack cloud computing platform, where we show that the approach can
accurately identify failure modes with a low computational cost.
Comment: IEEE Transactions on Dependable and Secure Computing; 16 pages.
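A minimal sketch of the analytics idea, assuming bag-of-events features and k-means clustering (the paper's actual feature extraction and clustering pipeline may differ): experiments whose execution traces show similar failure symptoms fall into the same cluster, so the analyst inspects one representative per cluster instead of every experiment. The trace strings below are invented.

```python
# Sketch: cluster fault-injection experiments by the events appearing in
# their execution traces, so recurring failure modes can be found without
# reading every trace. Event names are illustrative, not from OpenStack.
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import CountVectorizer

traces = [
    "api_error timeout volume_detach_fail",
    "api_error timeout volume_detach_fail retry",
    "instance_crash kernel_panic reboot",
    "instance_crash kernel_panic",
]

# Bag-of-events vector per experiment, then k-means with k=2 clusters.
X = CountVectorizer().fit_transform(traces)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels)  # experiments with similar symptoms share a cluster label
```

In practice, the number of clusters would itself be estimated (e.g., by silhouette analysis), since the failure modes are unknown in advance.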
Data-Driven Approach for Automatic Telephony Threat Analysis and Campaign Detection
The growth of the telephone network and the availability of Voice over Internet Protocol (VoIP) have both contributed to a flexible and easy-to-use artifact for users, but also to a significant increase in cyber-criminal activity. These criminals use emergent technologies to conduct illegal and suspicious activities. For instance, they exploit VoIP's flexibility to abuse and scam victims.
Considerable interest has been expressed in the analysis and assessment of telephony cyber-threats. A better understanding of these types of abuse is required in order to detect, mitigate, and attribute these attacks. The purpose of this research work is to generate relevant and timely telephony abuse intelligence that can support the mitigation and/or the investigation of such activities. To achieve this objective, we present, in this thesis, the design and implementation of a Telephony Abuse Intelligence Framework (TAINT) that automatically aggregates, analyzes, and reports on telephony abuse activities.
Such a framework monitors and analyzes, in near-real-time, crowd-sourced telephony complaint data from various sources. We deploy our framework on a large dataset of telephony complaints, spanning over seven years, to provide in-depth insights and intelligence about emerging telephony threats. The framework presented in this thesis is of paramount importance when it comes to the mitigation, the prevention, and the attribution of telephony abuse incidents. We analyze the data and report on the complaint distribution, the numbers used, and the spoofed callers' identifiers. In addition, we identify and geo-locate the sources of the phone calls, and further investigate the underlying telephony threats. Moreover, we quantify the similarity between reported phone numbers to unveil potential groups behind specific telephony abuse activities that are actually launched as telephony abuse campaigns.
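One simple way to quantify phone-number similarity for campaign detection, sketched below under the assumption of a shared-prefix measure: numbers bought in blocks by the same abuser tend to share long leading digit sequences. The similarity measure and the 0.7 threshold are illustrative assumptions, not necessarily what TAINT uses.

```python
# Toy sketch: flag pairs of reported numbers with long shared prefixes
# as candidates for the same abuse campaign. Numbers are fictitious.
from itertools import combinations

def prefix_similarity(a, b):
    """Fraction of leading digits two numbers have in common."""
    common = 0
    for x, y in zip(a, b):
        if x != y:
            break
        common += 1
    return common / max(len(a), len(b))

reported = ["18005551234", "18005551299", "14165550000"]
for a, b in combinations(reported, 2):
    if prefix_similarity(a, b) > 0.7:  # assumed threshold
        print(a, b, "possible same campaign")
```

Real pipelines would combine this with call timing, complaint text, and geolocation before declaring a campaign.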
Do Androids Dream of Electric Sheep? On Privacy in the Android Supply Chain
The Android Open Source Project (AOSP) was first released by Google in 2008 and
has since become the most used operating system [Andaf]. Thanks to the openness
of its source code, any smartphone vendor or original equipment manufacturer
(OEM) can modify and adapt Android to their specific needs, adding proprietary
features before installing it on their devices in order to differentiate themselves
from competitors. This has created a complex and diverse supply chain, completely opaque to
end-users, formed by manufacturers, resellers, chipset manufacturers, network operators, and
prominent actors of the online industry that partnered with OEMs. Each of these stakeholders
can pre-install extra apps, or implement proprietary features at the framework level.
However, such customizations can create privacy and security threats to end-users. Preinstalled
apps are privileged by the operating system, and can therefore access system APIs
or personal data more easily than apps installed by the user. Unfortunately, despite these
potential threats, there is currently no end-to-end control over what apps come pre-installed
on a device and why, and no traceability of the different software and hardware components
used in a given Android device. In fact, the landscape of pre-installed software in Android and
its security and privacy implications has largely remained unexplored by researchers.
In this thesis, I investigate the customization of Android devices and their impact on the
privacy and security of end-users. Specifically, I perform the first large-scale and systematic
analysis of pre-installed Android apps and the supply chain. To do so, I first develop an app,
Firmware Scanner [Sca], to crowdsource close to 34,000 Android firmware versions from 1,000
different OEMs from all over the world. This dataset allows us to map the stakeholders involved
in the supply chain and their relationships, from device manufacturers and mobile network operators
to third-party organizations like advertising and tracking services, and social network
platforms. Using this dataset, I identified multiple cases of privacy-invasive and potentially harmful behaviors.
My results show a disturbing lack of transparency and control over the Android supply
chain, showing that it can be damaging, privacy- and security-wise, to end-users.
Next, I study the evolution of the Android permission system, an essential security feature of the Android framework. Coupled with other protection mechanisms such as process sandboxing,
the permission system empowers users to control what sensitive resources (e.g., user
contacts, the camera, location sensors) are accessible to which apps. The research community
has extensively studied the permission system, but most previous studies focus on its limitations
or specific attacks. In this thesis, I present an up-to-date view and longitudinal analysis
of the evolution of the permission system. I study how some lesser-known features of the
permission system, specifically permission flags, can impact the permission-granting process,
making it either more or less restrictive. I then highlight how pre-installed app developers
use said flags in the wild and focus on the privacy and security implications. Specifically, I
show the presence of third-party apps, installed as privileged system apps, potentially using
said features to share resources with other third-party apps.
Another salient feature of the permission system is its extensibility: apps can define their
own custom permissions to expose features and data to other apps. However, little is known
about how widespread the usage of custom permissions is, and what impact these permissions
may have on users’ privacy and security. In the last part of this thesis, I investigate the exposure
and request of custom permissions in the Android ecosystem and their potential for opening
privacy and security risks. I gather a 2.2-million-app-large dataset of both pre-installed and
publicly available apps using both Firmware Scanner and purpose-built app store crawlers.
I find the usage of custom permissions to be pervasive, regardless of the origin of the apps,
and seemingly growing over time. Despite this prevalence, I find that custom permissions are
virtually invisible to end-users, and their purpose is mostly undocumented. While Google recommends
that developers use their reverse domain name as the prefix of their custom permissions
[Gpla], I find widespread violations of this recommendation, making sound attribution
at scale virtually impossible. Through static analysis methods, I demonstrate that custom permissions
can facilitate access to permission-protected system resources to apps that lack those
permissions, without user awareness. Due to the lack of tools for studying such risks, I design
and implement two tools, PermissionTracer [Pere] and PermissionTainter [Perd] to study
custom permissions. I highlight multiple cases of concerning use of custom permissions by
Android apps in the wild.
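The reverse-domain prefix convention mentioned above can be checked mechanically; the toy sketch below shows the idea, with hypothetical package and permission names (it is not the logic of PermissionTracer or PermissionTainter).

```python
# Toy check for Google's recommendation that a custom permission's name
# be prefixed by the declaring app's package name. Names are invented.
def violates_prefix_convention(package, permission):
    """True if the permission name does not start with the package prefix."""
    return not permission.startswith(package + ".")

print(violates_prefix_convention("com.example.app", "com.example.app.READ_DATA"))   # → False
print(violates_prefix_convention("com.example.app", "android.permission.FOO_BAR"))  # → True
```

As the thesis notes, widespread violations of this convention are exactly what makes attributing a custom permission to its declaring vendor so hard at scale.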
In this thesis, I systematically studied, at scale, the vast and overlooked ecosystem of preinstalled
Android apps. My results show a complete lack of control over the supply chain, which
is worrying given the huge potential impact of pre-installed apps on the privacy and security
of end-users. I conclude with a number of open research questions and future avenues for
further research in the ecosystem of the supply chain of Android devices.
This work has been supported by IMDEA Networks Institute. Doctoral Programme in Telematics Engineering (Programa de Doctorado en Ingeniería Telemática), Universidad Carlos III de Madrid. Thesis committee: Chair: Douglas Leith; Secretary: Rubén Cuevas Rumín; Member: Hamed Haddad.
Development and Validation of a Proof-of-Concept Prototype for Analytics-based Malicious Cybersecurity Insider Threat in a Real-Time Identification System
Insider threat has continued to be one of the most difficult cybersecurity threat vectors detectable by contemporary technologies. Most organizations apply standard technology-based practices to detect unusual network activity. While there have been significant advances in intrusion detection systems (IDS) as well as security incident and event management solutions (SIEM), these technologies fail to take into consideration the human aspects of personality and emotion in computer use and network activity, since insider threats are human-initiated. External influencers impact how an end-user interacts with both colleagues and organizational resources. Taking into consideration external influencers, such as personality, changes in organizational polices and structure, along with unusual technical activity analysis, would be an improvement over contemporary detection tools used for identifying at-risk employees. This would allow upper management or other organizational units to intervene before a malicious cybersecurity insider threat event occurs, or mitigate it quickly, once initiated.
The main goal of this research study was to design, develop, and validate a proof-of-concept prototype for a malicious cybersecurity insider threat alerting system that will assist in the rapid detection and prediction of human-centric precursors to malicious cybersecurity insider threat activity. Disgruntled employees or end-users wishing to cause harm to the organization may do so by abusing the trust given to them in their access to available network and organizational resources. Reports on malicious insider threat actions indicated that insider threat attacks make up roughly 23% of all cybercrime incidents, resulting in $2.9 trillion in employee fraud losses globally. The damage and negative impact that insider threats cause was reported to be higher than that of outsider or other types of cybercrime incidents. Consequently, this study utilized weighted indicators to measure and correlate simulated user activity with possible precursors to malicious cybersecurity insider threat attacks. This study consisted of a mixed-method approach utilizing an expert panel, developmental research, and quantitative data analysis using the developed tool on a simulated data set. To assure validity and reliability of the indicators, a panel of subject matter experts (SMEs) reviewed the indicators and indicator categorizations that were collected from prior literature, following the Delphi technique. The SMEs' responses were incorporated into the development of a proof-of-concept prototype. Once the proof-of-concept prototype was completed and fully tested, an empirical simulation research study was conducted utilizing simulated user activity within a 16-month time frame. The results of the empirical simulation study were analyzed and presented. Recommendations resulting from the study were also provided.
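A weighted-indicator scheme like the one described can be sketched as a simple weighted sum over observed behavioral indicators. The indicator names and weights below are invented for illustration; the study's SME-validated indicators and weights differ.

```python
# Toy weighted-indicator risk score: each behavioral indicator carries a
# weight, and an employee's score is the weighted sum of observed counts.
# Indicator names and weights are hypothetical.
weights = {
    "off_hours_logins": 0.3,
    "bulk_file_downloads": 0.5,
    "policy_violations": 0.2,
}

def risk_score(observations):
    """Weighted sum of indicator observations for one user."""
    return sum(weights[name] * value for name, value in observations.items())

print(risk_score({"off_hours_logins": 2,
                  "bulk_file_downloads": 1,
                  "policy_violations": 0}))  # → 1.1
```

A threshold on such a score is what would let the prototype alert management before, rather than after, a malicious event.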
Performance and Reliability Evaluation of Apache Kafka Messaging System
Streaming data is now flowing across various devices and applications around us. This type of data refers to any unbounded, ever-growing, infinite data set that is continuously generated by all kinds of sources. Examples include sensor data transmitted among different Internet of Things (IoT) devices, user activity records collected on websites, and payment requests sent from mobile devices. In many application scenarios, streaming data needs to be processed in real-time because its value can diminish over time. A variety of stream processing systems have been developed in the last decade and are evolving to address rising challenges.
A typical stream processing system consists of multiple processing nodes in the topology of a DAG (directed acyclic graph). To build real-time streaming data pipelines across those nodes, message middleware technology is widely applied. As a distributed messaging system with high durability and scalability, Apache Kafka has become very popular among modern companies. It ingests streaming data from upstream applications and stores the data in its distributed cluster, which provides a fault-tolerant data source for stream processors. Kafka therefore plays a critical role in ensuring the completeness, correctness, and timeliness of streaming data delivery.
However, it is impossible to meet all user requirements in real-time cases with a simple, fixed data delivery strategy. In this thesis, we address the challenge of choosing a proper configuration to guarantee both the performance and the reliability of Kafka for complex streaming application scenarios. We investigate the features that have an impact on the performance and reliability metrics. We propose a queueing-based prediction model to predict the performance metrics of Kafka, including producer throughput and packet latency. We define two reliability metrics: the probability of message loss and the probability of message duplication. We create an ANN model to predict these metrics given unstable network conditions such as network delay and packet loss rate. To collect sufficient training data, we build a Docker-based Kafka testbed with a fault injection module. We use a new quality-of-service metric, timely throughput, to help choose a proper batch size in Kafka. Based on this metric, we propose a dynamic configuration method that reactively guarantees both the performance and the reliability of Kafka under complex operating conditions.
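To illustrate what a queueing-based latency prediction looks like in its simplest form, the sketch below treats the broker as an M/M/1 queue, where the mean sojourn time is 1/(μ − λ). The arrival and service rates are made-up numbers, and Kafka's real behavior (batching, replication acks) needs a richer model than M/M/1, as the thesis's own model reflects.

```python
# Back-of-the-envelope M/M/1 latency estimate for a message broker:
# mean time in system = 1 / (service_rate - arrival_rate).
def mm1_mean_latency(arrival_rate, service_rate):
    """Mean time a message spends in an M/M/1 queue, in seconds."""
    if arrival_rate >= service_rate:
        raise ValueError("queue is unstable: arrival rate >= service rate")
    return 1.0 / (service_rate - arrival_rate)

# e.g. 8,000 msg/s arriving at a broker that serves 10,000 msg/s
print(mm1_mean_latency(8_000, 10_000))  # → 0.0005 (0.5 ms)
```

Even this crude formula captures the key qualitative behavior the thesis exploits: latency explodes as the arrival rate approaches the service rate, which is why batch size and delivery configuration must adapt to load.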