9,192 research outputs found

    A Bayesian Failure Prediction Network Based on Text Sequence Mining and Clustering

    Get PDF
    The purpose of this paper is to predict failures based on textual sequence data. The current failure prediction is mainly based on structured data. However, there are many unstructured data in aircraft maintenance. The failure mentioned here refers to failure types, such as transmitter failure and signal failure, which are classified by the clustering algorithm based on the failure text. For the failure text, this paper uses the natural language processing technology. Firstly, segmentation and the removal of stop words for Chinese failure text data is performed. The study applies the word2vec moving distance model to obtain the failure occurrence sequence for failure texts collected in a fixed period of time. According to the distance, a clustering algorithm is used to obtain a typical number of fault types. Secondly, the failure occurrence sequence is mined using sequence mining algorithms, such as-PrefixSpan. Finally, the above failure sequence is used to train the Bayesian failure network model. The final experimental results show that the Bayesian failure network has higher accuracy for failure prediction

    AI Solutions for MDS: Artificial Intelligence Techniques for Misuse Detection and Localisation in Telecommunication Environments

    Get PDF
    This report considers the application of Articial Intelligence (AI) techniques to the problem of misuse detection and misuse localisation within telecommunications environments. A broad survey of techniques is provided, that covers inter alia rule based systems, model-based systems, case based reasoning, pattern matching, clustering and feature extraction, articial neural networks, genetic algorithms, arti cial immune systems, agent based systems, data mining and a variety of hybrid approaches. The report then considers the central issue of event correlation, that is at the heart of many misuse detection and localisation systems. The notion of being able to infer misuse by the correlation of individual temporally distributed events within a multiple data stream environment is explored, and a range of techniques, covering model based approaches, `programmed' AI and machine learning paradigms. It is found that, in general, correlation is best achieved via rule based approaches, but that these suffer from a number of drawbacks, such as the difculty of developing and maintaining an appropriate knowledge base, and the lack of ability to generalise from known misuses to new unseen misuses. Two distinct approaches are evident. One attempts to encode knowledge of known misuses, typically within rules, and use this to screen events. This approach cannot generally detect misuses for which it has not been programmed, i.e. it is prone to issuing false negatives. The other attempts to `learn' the features of event patterns that constitute normal behaviour, and, by observing patterns that do not match expected behaviour, detect when a misuse has occurred. This approach is prone to issuing false positives, i.e. inferring misuse from innocent patterns of behaviour that the system was not trained to recognise. Contemporary approaches are seen to favour hybridisation, often combining detection or localisation mechanisms for both abnormal and normal behaviour, the former to capture known cases of misuse, the latter to capture unknown cases. In some systems, these mechanisms even work together to update each other to increase detection rates and lower false positive rates. It is concluded that hybridisation offers the most promising future direction, but that a rule or state based component is likely to remain, being the most natural approach to the correlation of complex events. The challenge, then, is to mitigate the weaknesses of canonical programmed systems such that learning, generalisation and adaptation are more readily facilitated

    A network approach to topic models

    Full text link
    One of the main computational and scientific challenges in the modern age is to extract useful information from unstructured texts. Topic models are one popular machine-learning approach which infers the latent topical structure of a collection of documents. Despite their success --- in particular of its most widely used variant called Latent Dirichlet Allocation (LDA) --- and numerous applications in sociology, history, and linguistics, topic models are known to suffer from severe conceptual and practical problems, e.g. a lack of justification for the Bayesian priors, discrepancies with statistical properties of real texts, and the inability to properly choose the number of topics. Here we obtain a fresh view on the problem of identifying topical structures by relating it to the problem of finding communities in complex networks. This is achieved by representing text corpora as bipartite networks of documents and words. By adapting existing community-detection methods -- using a stochastic block model (SBM) with non-parametric priors -- we obtain a more versatile and principled framework for topic modeling (e.g., it automatically detects the number of topics and hierarchically clusters both the words and documents). The analysis of artificial and real corpora demonstrates that our SBM approach leads to better topic models than LDA in terms of statistical model selection. More importantly, our work shows how to formally relate methods from community detection and topic modeling, opening the possibility of cross-fertilization between these two fields.Comment: 22 pages, 10 figures, code available at https://topsbm.github.io

    Data Mining in Internet of Things Systems: A Literature Review

    Get PDF
    The Internet of Things (IoT) and cloud technologies have been the main focus of recent research, allowing for the accumulation of a vast amount of data generated from this diverse environment. These data include without any doubt priceless knowledge if could correctly discovered and correlated in an efficient manner. Data mining algorithms can be applied to the Internet of Things (IoT) to extract hidden information from the massive amounts of data that are generated by IoT and are thought to have high business value. In this paper, the most important data mining approaches covering classification, clustering, association analysis, time series analysis, and outlier analysis from the knowledge will be covered. Additionally, a survey of recent work in in this direction is included. Another significant challenges in the field are collecting, storing, and managing the large number of devices along with their associated features. In this paper, a deep look on the data mining for the IoT platforms will be given concentrating on real applications found in the literatur

    Review and Analysis of Failure Detection and Prevention Techniques in IT Infrastructure Monitoring

    Get PDF
    Maintaining the health of IT infrastructure components for improved reliability and availability is a research and innovation topic for many years. Identification and handling of failures are crucial and challenging due to the complexity of IT infrastructure. System logs are the primary source of information to diagnose and fix failures. In this work, we address three essential research dimensions about failures, such as the need for failure handling in IT infrastructure, understanding the contribution of system-generated log in failure detection and reactive & proactive approaches used to deal with failure situations. This study performs a comprehensive analysis of existing literature by considering three prominent aspects as log preprocessing, anomaly & failure detection, and failure prevention. With this coherent review, we (1) presume the need for IT infrastructure monitoring to avoid downtime, (2) examine the three types of approaches for anomaly and failure detection such as a rule-based, correlation method and classification, and (3) fabricate the recommendations for researchers on further research guidelines. As far as the authors\u27 knowledge, this is the first comprehensive literature review on IT infrastructure monitoring techniques. The review has been conducted with the help of meta-analysis and comparative study of machine learning and deep learning techniques. This work aims to outline significant research gaps in the area of IT infrastructure failure detection. This work will help future researchers understand the advantages and limitations of current methods and select an adequate approach to their problem

    A Latent Dirichlet Allocation and Fuzzy Clustering Based Machine Learning Model for Text Thesaurus

    Get PDF
    It is not quite possible to use manual methods to process the huge amount of structured and semi-structured data. This study aims to solve the problem of processing huge data through machine learning algorithms. We collected the text data of the company’s public opinion through crawlers, and use Latent Dirichlet Allocation (LDA) algorithm to extract the keywords of the text, and uses fuzzy clustering to cluster the keywords to form different topics. The topic keywords will be used as a seed dictionary for new word discovery. In order to verify the efficiency of machine learning in new word discovery, algorithms based on association rules, N-Gram, PMI, andWord2vec were used for comparative testing of new word discovery. The experimental results show that the Word2vec algorithm based on machine learning model has the highest accuracy, recall and F-value indicators
    • …
    corecore