6 research outputs found

    Try with Simpler -- An Evaluation of Improved Principal Component Analysis in Log-based Anomaly Detection

    The rapid growth of deep learning (DL) has spurred interest in enhancing log-based anomaly detection. This approach aims to extract meaning from log events (log message templates) and develop advanced DL models for anomaly detection. However, these DL methods face challenges such as heavy reliance on training data, labels, and computational resources due to model complexity. In contrast, traditional machine learning and data mining techniques are less data-dependent and more efficient, but also less effective than DL. To make log-based anomaly detection more practical, the goal is to enhance traditional techniques so that they match DL's effectiveness. Previous research in a different domain (linking questions on Stack Overflow) suggests that optimized traditional techniques can rival state-of-the-art DL methods. Drawing inspiration from this idea, we conducted an empirical study. We optimized unsupervised PCA (Principal Component Analysis), a traditional technique, by incorporating a lightweight semantic-based log representation. This addresses the issue of log events unseen in the training data and improves the log representation. Our study compared seven log-based anomaly detection methods, including four DL-based methods, two traditional methods, and the optimized PCA technique, using public and industrial datasets. The results indicate that the optimized unsupervised PCA technique achieves effectiveness similar to that of advanced supervised/semi-supervised DL methods while being more stable with limited training data and more resource-efficient. This demonstrates the adaptability and strength of traditional techniques through small yet impactful adaptations.
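    The abstract does not spell out the implementation, but PCA-based log anomaly detection is commonly realized by projecting per-session log representation vectors onto a low-dimensional principal subspace and flagging sessions with a large reconstruction residual. Below is a minimal Python sketch assuming scikit-learn and a precomputed feature matrix; the variance cutoff and percentile threshold are illustrative choices, not necessarily the paper's.

```python
import numpy as np
from sklearn.decomposition import PCA

def fit_pca_detector(X_train, var_ratio=0.95, pct=99):
    """Fit PCA on (assumed mostly normal) training vectors.

    X_train: (n_sessions, n_features) matrix of log representation
    vectors, e.g. event counts or averaged semantic embeddings.
    """
    pca = PCA(n_components=var_ratio)  # keep 95% of variance
    pca.fit(X_train)
    # Residual = squared distance to the principal subspace.
    recon = pca.inverse_transform(pca.transform(X_train))
    residuals = np.sum((X_train - recon) ** 2, axis=1)
    # Illustrative threshold: a high percentile of training residuals.
    threshold = np.percentile(residuals, pct)
    return pca, threshold

def detect(pca, threshold, X):
    """Flag rows of X whose residual exceeds the threshold."""
    recon = pca.inverse_transform(pca.transform(X))
    residuals = np.sum((X - recon) ** 2, axis=1)
    return residuals > threshold  # True = anomalous
```

    A semantic-based representation of the kind the abstract mentions could, for instance, average pretrained word embeddings of each log template's tokens, so that templates unseen during training still map to meaningful vectors; the exact representation used in the paper is not given here.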

    Dynamic Core Community Detection and Information Diffusion Processes on Networks

    Interest in network science has been increasingly shared among various research communities due to its broad range of applications. Many real-world systems can be abstracted as networks, groups of nodes connected by pairwise edges; examples include friendship networks, metabolic networks, and the World Wide Web, among others. Two of the main research areas in network science that have received much attention are community detection and information diffusion.

    As for community detection, many well-developed algorithms are available for static networks, for example, spectral partitioning and modularity-based optimization algorithms. As real-world data becomes richer, community detection in temporal networks becomes more and more desirable, and algorithms such as tensor decomposition and generalized modularity optimization have been developed. One scenario not well investigated is when the core community structure persists over long periods of time, with possible noisy perturbations, and changes only over short time intervals. The contribution of this thesis in this area is a new algorithm based on low-rank component recovery of adjacency matrices, which identifies phase-transition time points and improves the accuracy of core community structure recovery (a rough illustration appears after this abstract).

    As for information diffusion, it was traditionally studied as an epidemic process using either threshold models or independent interaction models. But the mechanism of information diffusion differs from epidemic processes such as disease transmission because of the reluctance to spread stale news; to address this, other models such as the DK model were proposed, taking into account the decreasing willingness of spreaders to diffuse information over time. However, this does not capture cases such as information receivers losing interest, as in viral marketing. The contribution of this thesis in this area is two new models, coined the susceptible-informed-immunized (SIM) model and the exponentially time-decaying susceptible-informed (SIT) model, which capture the intrinsic time value of information from both the spreader's and the receiver's points of view. Rigorous analysis of the dynamics of the two models was performed, based mainly on mean-field theory (a hypothetical sketch of such dynamics follows the abstract).

    The third contribution of this thesis concerns information diffusion optimization. Controlling information diffusion has been widely studied because of its important applications in areas such as social census, disease control, and marketing. Traditionally the problem is formulated as identifying a set of k seed nodes, informed initially, that maximizes the diffusion size. Heuristic algorithms have been developed to find approximate solutions for this NP-hard problem, and measures such as k-shell, node degree, and centrality have been used to guide the search for optimal solutions. The contribution of this thesis in this field is a more realistic objective function together with a binary particle swarm optimization algorithm for this combinatorial optimization problem. Instead of fixing the seed set size and maximizing the diffusion size, we maximize the profit, defined as the revenue, which is simply the diffusion size, minus the cost of setting those seed nodes, designed as a function of the degrees of the seed nodes or of a measure similar to node centrality (see the sketch after this abstract).
    Because of the powerful algorithm, we were able to study complex scenarios such as information diffusion optimization on multilayer networks.
    PhD, Physics, University of Michigan, Horace H. Rackham School of Graduate Studies
    https://deepblue.lib.umich.edu/bitstream/2027.42/145937/1/wbao_1.pd
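    The low-rank recovery algorithm itself is not detailed in the abstract. As a rough illustration of the idea, one can extract the dominant low-rank eigenspace of each snapshot's adjacency matrix and flag time points where that subspace shifts sharply; the rank choice and subspace-distance criterion below are assumptions, not the thesis's method.

```python
import numpy as np

def core_subspace(A, rank):
    """Dominant rank-r eigenspace of a symmetric adjacency matrix."""
    vals, vecs = np.linalg.eigh(A)          # A assumed undirected/symmetric
    order = np.argsort(np.abs(vals))[::-1]  # largest-magnitude eigenvalues first
    return vecs[:, order[:rank]]

def phase_transitions(snapshots, rank=3, tol=0.5):
    """Flag time steps where the core (low-rank) structure jumps.

    snapshots: list of (n, n) adjacency matrices over time.
    """
    changes = []
    prev = core_subspace(snapshots[0], rank)
    for t, A in enumerate(snapshots[1:], start=1):
        cur = core_subspace(A, rank)
        # Distance between subspaces via principal angles:
        # ||sin(theta)||_F from the singular values of prev^T cur.
        s = np.linalg.svd(prev.T @ cur, compute_uv=False)
        dist = np.sqrt(max(0.0, rank - np.sum(s ** 2)))
        if dist > tol:
            changes.append(t)  # candidate phase-transition point
        prev = cur
    return changes
```

    Averaging or low-rank-filtering the snapshots between detected change points would then give a denoised estimate of the persistent core structure.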
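    The abstract names the SIM and SIT models but does not give their rate equations. Purely as a hypothetical illustration of mean-field dynamics of this flavor, one might write SIR-like equations in which informed nodes stop spreading at a constant rate, plus a variant whose transmission rate decays exponentially in time to model receivers losing interest; every symbol and functional form below is an assumption, not the thesis's formulation.

```python
import numpy as np
from scipy.integrate import solve_ivp

# HYPOTHETICAL mean-field equations. s, i, m: susceptible, informed,
# and immunized fractions; beta: spreading rate; delta: rate at which
# spreaders lose interest and stop diffusing ("immunized").
def sim_rhs(t, y, beta, delta):
    s, i, m = y
    return [-beta * s * i, beta * s * i - delta * i, delta * i]

# SIT-style variant: transmission decays exponentially with time,
# modeling receivers losing interest in stale information.
def sit_rhs(t, y, beta0, lam):
    s, i = y
    beta = beta0 * np.exp(-lam * t)
    return [-beta * s * i, beta * s * i]

sim_sol = solve_ivp(sim_rhs, (0.0, 60.0), [0.99, 0.01, 0.0],
                    args=(0.5, 0.1), max_step=0.5)
sit_sol = solve_ivp(sit_rhs, (0.0, 60.0), [0.99, 0.01],
                    args=(0.5, 0.05), max_step=0.5)
# sim_sol.y[1] and sit_sol.y[1] give the informed fraction over time;
# under the decaying rate, diffusion stalls even without immunization.
```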
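    The profit objective is described above (diffusion size minus a degree-based seed cost), but the diffusion model and the BPSO details are not. The sketch below assumes an independent-cascade diffusion, networkx, and a standard sigmoid-transfer binary PSO; the parameter values and the linear degree cost are illustrative.

```python
import numpy as np
import networkx as nx

def cascade_size(G, seeds, p=0.1, rng=None):
    """One independent-cascade simulation; returns the number informed."""
    rng = rng or np.random.default_rng()
    informed, frontier = set(seeds), list(seeds)
    while frontier:
        nxt = []
        for u in frontier:
            for v in G.neighbors(u):
                if v not in informed and rng.random() < p:
                    informed.add(v)
                    nxt.append(v)
        frontier = nxt
    return len(informed)

def profit(G, bits, cost_per_degree=0.5, runs=20):
    """Revenue (mean diffusion size) minus a degree-based seed cost."""
    seeds = [n for n, b in zip(G.nodes(), bits) if b]
    if not seeds:
        return 0.0
    revenue = np.mean([cascade_size(G, seeds) for _ in range(runs)])
    cost = cost_per_degree * sum(G.degree(n) for n in seeds)
    return revenue - cost

def binary_pso(G, n_particles=20, iters=50, w=0.7, c1=1.5, c2=1.5):
    """Sigmoid-transfer binary PSO over node-selection bit vectors."""
    rng = np.random.default_rng(0)
    n = G.number_of_nodes()
    X = (rng.random((n_particles, n)) < 0.05).astype(float)  # sparse seeds
    V = np.zeros((n_particles, n))
    pbest = X.copy()
    pbest_f = np.array([profit(G, x) for x in X])
    gbest = pbest[np.argmax(pbest_f)].copy()
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, n))
        V = w * V + c1 * r1 * (pbest - X) + c2 * r2 * (gbest - X)
        # Sigmoid transfer: velocity sets the probability of each bit.
        X = (rng.random((n_particles, n)) < 1.0 / (1.0 + np.exp(-V))).astype(float)
        f = np.array([profit(G, x) for x in X])
        better = f > pbest_f
        pbest[better], pbest_f[better] = X[better], f[better]
        gbest = pbest[np.argmax(pbest_f)].copy()
    return gbest, pbest_f.max()

# Example: bits, best = binary_pso(nx.karate_club_graph())
```

    Because the particle encoding is just a bit vector per node, the same loop extends to multilayer networks by concatenating one bit vector per layer, which is in the spirit of the multilayer scenario mentioned above.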