Assessing Data Usefulness for Failure Analysis in Anonymized System Logs
System logs are a valuable source of information for analyzing and
understanding system behavior with the goal of improving performance. Such
logs contain various types of information, including sensitive information.
Sensitive information can either be extracted directly from system log
entries, by correlating several log entries, or be inferred by combining the
(non-sensitive) information contained within system logs with other logs
and/or additional datasets. The analysis of system logs containing sensitive
information compromises data privacy. Therefore, various anonymization
techniques, such as generalization and suppression, have been employed over
the years by data and computing centers to protect the privacy of their
users, their data, and the system as a whole. However, privacy-preserving
data resulting from anonymization via generalization and suppression may have
significantly decreased usefulness, hindering the intended analysis for
understanding the system behavior.
Maintaining a balance between data usefulness and privacy preservation,
therefore, remains an open and important challenge. Irreversible encoding of
system logs using collision-resistant hashing algorithms, such as SHAKE-128, is
a novel approach previously introduced by the authors to mitigate data privacy
concerns. The present work describes a study of the applicability of the
encoding approach from earlier work on the system logs of a production high
performance computing system. Moreover, a metric is introduced to assess the
data usefulness of the anonymized system logs to detect and identify the
failures encountered in the system.

Comment: 11 pages, 3 figures, submitted to the 17th IEEE International Symposium on Parallel and Distributed Computing
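The irreversible SHAKE-128 encoding the abstract refers to can be sketched as follows. This is a minimal illustration using Python's standard `hashlib`; the whitespace tokenization and the choice of which fields are sensitive are assumptions for the example, not the authors' actual scheme:

```python
import hashlib

def anonymize_token(token: str, digest_bytes: int = 8) -> str:
    """Irreversibly encode a sensitive token with SHAKE-128."""
    return hashlib.shake_128(token.encode("utf-8")).hexdigest(digest_bytes)

def anonymize_log_line(line: str, sensitive: set) -> str:
    """Replace whitespace-separated sensitive tokens with their digests."""
    return " ".join(
        anonymize_token(tok) if tok in sensitive else tok
        for tok in line.split()
    )

# Hypothetical log entry; "alice" and the IP are treated as sensitive.
line = "sshd: failed login for user alice from 10.0.0.7"
print(anonymize_log_line(line, {"alice", "10.0.0.7"}))
```

Because the same input always maps to the same digest, correlations across log entries (which failure analysis depends on) are preserved, while collision resistance makes recovering the plaintext infeasible.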
Network problems detection and classification by analyzing syslog data
Network troubleshooting is an important process with a wide research field. The first step in troubleshooting procedures is to collect information in order to diagnose the problems. Syslog messages, which are sent by almost all network devices, contain a massive amount of data related to network problems. Many previous studies have used the analysis of syslog data as a guideline to network problems and their causes. Detecting network problems could be more efficient if the detected problems were classified in terms of network layers. Classifying syslog data requires identifying the syslog messages that describe the network problems for each layer, taking into account the different syslog formats of various vendors' devices. This study provides a method to classify syslog messages that indicate network problems in terms of network layers. The method uses a data mining tool to classify the syslog messages, with the description part of each syslog message used for the classification process. Related syslog messages were identified, and features were then selected to train the classifiers. Six classification algorithms were evaluated: LibSVM, SMO, KNN, Naïve Bayes, J48, and Random Forest. A real data set obtained from Universiti Utara Malaysia's (UUM) network devices was used for the prediction stage. Results indicate that SVM shows the best performance during the training and prediction stages. This study contributes to the field of network troubleshooting and the field of text data classification.
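The description-based classification idea can be illustrated with a small hand-rolled Naïve Bayes over hypothetical Cisco-style syslog descriptions. The training lines, layer labels, and word features below are assumptions for the sketch, not the study's dataset or its actual classifier setup:

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical (message description, network layer) training pairs.
TRAIN = [
    ("%LINK-3-UPDOWN: Interface Gi0/1, changed state to down", "physical"),
    ("%ETHPORT-5-IF_DOWN_LINK_FAILURE: Interface link failure", "physical"),
    ("%OSPF-5-ADJCHG: Nbr 10.0.0.2 on Vlan10 from FULL to DOWN", "network"),
    ("%BGP-3-NOTIFICATION: sent to neighbor 10.0.0.9, hold time expired", "network"),
    ("%TCP-6-BADAUTH: invalid MD5 digest from 10.1.1.1:179", "transport"),
]

def tokens(desc):
    """Word features extracted from the description part of the message."""
    return re.findall(r"[a-z]+", desc.lower())

class_counts = Counter(layer for _, layer in TRAIN)
word_counts = defaultdict(Counter)
for desc, layer in TRAIN:
    word_counts[layer].update(tokens(desc))
vocab = len({w for c in word_counts.values() for w in c})

def classify(desc):
    """Multinomial Naive Bayes with Laplace smoothing."""
    def score(layer):
        total = sum(word_counts[layer].values())
        s = math.log(class_counts[layer] / len(TRAIN))
        for w in tokens(desc):
            s += math.log((word_counts[layer][w] + 1) / (total + vocab))
        return s
    return max(class_counts, key=score)

print(classify("Interface link changed state to down"))  # physical
```

A production pipeline would instead vectorize the descriptions and train the six algorithms the study compares, but the feature source (the free-text description field) is the same.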
Anomaly Detection in High Performance Computers: A Vicinity Perspective
In response to the demand for higher computational power, the number of
computing nodes in high performance computers (HPC) increases rapidly.
Exascale HPC systems are expected to arrive by 2020. With the drastic
increase in the number of HPC system components, a sudden increase in the
number of failures is expected, which consequently poses a threat to the
continuous operation of HPC systems. Detecting failures as early as possible
and, ideally, predicting them, is a necessary step to avoid interruptions in
HPC systems' operation. Anomaly detection is a well-known general-purpose
approach for failure detection in computing systems. The majority of existing
methods are designed for specific architectures, require adjustments to the
computing system's hardware and software, need excessive information, or
pose a threat to users' and systems' privacy. This work proposes a node
failure detection mechanism based on a vicinity-based statistical anomaly
detection approach using passively collected and anonymized system log
entries. Application of the proposed approach on system logs collected over
8 months indicates an anomaly detection precision between 62% and 81%.

Comment: 9 pages, submitted to the 18th IEEE International Symposium on Parallel and Distributed Computing
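The vicinity-based statistical detection can be illustrated with a deliberately simplified sketch: each node's log-event count is compared against the distribution over its vicinity. Here the "vicinity" is flattened to all nodes, and the node names, counts, and z-score threshold are hypothetical, so this is only the statistical core of the idea, not the paper's topology-aware method:

```python
import statistics

def anomalous_nodes(counts, threshold=2.0):
    """Flag nodes whose log-event count deviates from the vicinity mean
    by more than `threshold` standard deviations (a z-score test)."""
    mean = statistics.fmean(counts.values())
    sd = statistics.pstdev(counts.values())
    if sd == 0:
        return []
    return [node for node, c in counts.items() if abs(c - mean) / sd > threshold]

# Hypothetical per-node counts of relevant log events in one time window.
counts = {"n01": 10, "n02": 11, "n03": 12, "n04": 13, "n05": 10,
          "n06": 11, "n07": 12, "n08": 13, "n09": 10, "n10": 100}
print(anomalous_nodes(counts))  # n10 stands out from its vicinity
```

The appeal of this style of detector, as the abstract notes, is that it needs only passively collected (and anonymizable) log counts, with no hardware or software changes on the nodes themselves.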
Comparative analysis of classification techniques for network fault management
Network troubleshooting is a significant process that many studies have addressed. The first step in the troubleshooting procedures is collecting information in order to identify the problems. Syslog messages, which are sent by almost all network devices, include a massive amount of data concerning network problems. Several studies have found that analyzing syslog data can be a guideline to network problems and their causes. The detection of network problems can become more efficient if the detected problems are classified based on the network layers. Classifying syslog data requires identifying the syslog messages that describe the network problems for each layer, while taking into account the syslog formats of different vendors' devices. The present study proposes a method for classifying the syslog messages that identify network problems; this classification is conducted based on the network layers. The method uses a data mining tool to classify the syslog messages, with the description part of each syslog message used to carry out the classification process. The relevant syslog messages were identified, and features were then selected to train the classifiers. Six classification algorithms were evaluated: LibSVM, SMO, KNN, Naïve Bayes, J48, and Random Forest. A real data set obtained from an educational network device was used for the prediction stage. It was found that LibSVM outperforms the other classifiers in terms of the probability rate of the classified instances, which was in the range of 32.80%–89.90%. Furthermore, the validation results indicate that the probability rate of the correctly classified instances is >70%. © 2020 Turkiye Klinikleri. All rights reserved.
Understanding a large-scale IPTV network via system logs
Recently, there has been a global trend in the telecommunication industry toward the rapid deployment of IPTV (Internet Protocol Television) infrastructure and services. While the industry rushes into the IPTV era, comprehensive understanding of the status and dynamics of the IPTV network lags behind. Filling this gap requires in-depth analysis of large amounts of measurement data across the IPTV network. One type of data of particular interest is the device or system log, which has not been systematically studied before. In this dissertation, we explore the possibility of utilizing system logs to serve a wide range of IPTV network management purposes, including health monitoring, troubleshooting, and performance evaluation. In particular, we develop a tool to convert raw router syslogs to meaningful network events. In addition, by analyzing set-top box (STB) logs, we propose a series of models to capture both channel popularity and dynamics, and users' activity on the IPTV network.

Ph.D. dissertation. Committee Chair: Jun Xu; Committee Members: Jia Wang, Mostafa H. Ammar, Nick Feamster, Xiaoli M
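The raw-syslog-to-event conversion the dissertation describes can be sketched as a small parser that extracts structured fields from router log lines. The Cisco-style message layout and field names below are assumptions for illustration, not the actual tool's format:

```python
import re
from collections import Counter

# Hypothetical Cisco-style router syslog lines.
LINES = [
    "Jan 12 03:04:05 r1 %LINK-3-UPDOWN: Interface Gi0/1, changed state to down",
    "Jan 12 03:04:09 r1 %LINK-3-UPDOWN: Interface Gi0/1, changed state to up",
    "Jan 12 03:05:00 r2 %OSPF-5-ADJCHG: Nbr 10.0.0.2 from FULL to DOWN",
]

PATTERN = re.compile(
    r"^(?P<timestamp>\w{3} +\d+ [\d:]+) (?P<router>\S+) "
    r"%(?P<facility>[A-Z]+)-(?P<severity>\d)-(?P<mnemonic>\w+): (?P<message>.*)$"
)

def to_events(lines):
    """Turn raw syslog lines into structured event dicts; skip non-matches."""
    for line in lines:
        m = PATTERN.match(line)
        if m:
            yield m.groupdict()

events = list(to_events(LINES))
print(Counter(e["mnemonic"] for e in events))
```

Once lines become structured events, the downstream analyses the dissertation lists (health monitoring, troubleshooting, performance evaluation) reduce to aggregations and correlations over these event records.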