811 research outputs found

    Network Archaeology: Uncovering Ancient Networks from Present-day Interactions

    Get PDF
    Often questions arise about old or extinct networks. What proteins interacted in a long-extinct ancestor species of yeast? Who were the central players in the Last.fm social network 3 years ago? Our ability to answer such questions has been limited by the unavailability of past versions of networks. To overcome these limitations, we propose several algorithms for reconstructing a network's history of growth given only the network as it exists today and a generative model by which the network is believed to have evolved. Our likelihood-based method finds a probable previous state of the network by reversing the forward growth model. This approach retains node identities so that the history of individual nodes can be tracked. We apply these algorithms to uncover older, non-extant biological and social networks believed to have grown via several models, including duplication-mutation with complementarity, forest fire, and preferential attachment. Through experiments on both synthetic and real-world data, we find that our algorithms can estimate node arrival times, identify anchor nodes from which new nodes copy links, and can reveal significant features of networks that have long since disappeared.Comment: 16 pages, 10 figure

    New Fundamental Technologies in Data Mining

    Get PDF
    The progress of data mining technology and large public popularity establish a need for a comprehensive text on the subject. The series of books entitled by "Data Mining" address the need by presenting in-depth description of novel mining algorithms and many useful applications. In addition to understanding each section deeply, the two books present useful hints and strategies to solving problems in the following chapters. The contributing authors have highlighted many future research directions that will foster multi-disciplinary collaborations and hence will lead to significant development in the field of data mining

    Situation recognition using soft computing techniques

    Get PDF
    Includes bibliographical references.The last decades have witnessed the emergence of a large number of devices pervasively launched into our daily lives as systems producing and collecting data from a variety of information sources to provide different services to different users via a variety of applications. These include infrastructure management, business process monitoring, crisis management and many other system-monitoring activities. Being processed in real-time, these information production/collection activities raise an interest for live performance monitoring, analysis and reporting, and call for data-mining methods in the recognition, prediction, reasoning and controlling of the performance of these systems by controlling changes in the system and/or deviations from normal operation. In recent years, soft computing methods and algorithms have been applied to data mining to identify patterns and provide new insight into data. This thesis revisits the issue of situation recognition for systems producing massive datasets by assessing the relevance of using soft computing techniques for finding hidden pattern in these systems

    Normalization Techniques For Improving The Performance Of Knowledge Graph Creation Pipelines

    Get PDF
    With the rapid growth of data within the web, demands on discovering information within data and consecutively exploiting knowledge graphs rise much more than we think it does. Data integration systems can be of great help to meet this precious demand in that they offer transformation of data from various sources and with different volumes. To this end, a data integration system takes advantage of utilizing mapping rules-- specified in a language like RML -- to integrate data collected from various data sources into a knowledge graph. However, large data sources may suffer from various data quality issues, being redundant one of them. Regarding this, the Semantic Web community contributes to Knowledge Engineering with techniques to create a knowledge graph efficiently. The thesis reported in this document tackles creating knowledge graphs in the presence of data sources with redundant data, and a novel normalization theory is proposed to solve this problem. This theory covers not only the characteristics of the data sources but also mapping rules used to integrate the data sources into a knowledge graph. Based on this, three normal forms are proposed and an algorithm for transforming mapping rules and data sources into these normal forms. The proposed approach's performance is evaluated in different testbeds composed of real-world data and synthetic data. The observed results suggest that the proposed techniques can dramatically reduce the execution time of knowledge graph creation. Therefore, this thesis's normalization theory contributes to the repertoire of tools that facilitate the creation of knowledge graphs at scale

    Efficient Learning Machines

    Get PDF
    Computer scienc

    Using artificial immune system and fuzzy logic for alert correlation

    Get PDF
    One of the most important challenges facing the intrusion detection systems (IDSs) is the huge number of generated alerts. A system administrator will be overwhelmed by these alerts in such a way that she/he cannot manage and use the alerts. The best-known solution is to correlate low-level alerts into a higher level attack and then produce a high-level alert for them. In this paper a new automated alert correlation approach is presented. It employs Fuzzy Logic and Artificial Immune System (AIS) to discover and learn the degree of correlation between two alerts and uses this knowledge to extract the attack scenarios. The proposed system doesn't need vast domain knowledge or rule definition efforts. To correlate each new alert with previous alerts, the system first tries to find the correlation probability based on its fuzzy rules. Then, if there is no matching rule with the required matching threshold, it uses the AIRS algorithm. The system is evaluated using DARPA 2000 dataset and a netForensics honeynet data. The completeness, soundness and false alert rate are calculated. The average completeness for LLDoS1.0 and LLDoS2.0, are 0.957 and 0.745 respectively. The system generates the attack graphs with an acceptable accuracy and, the computational complexity of the probability assignment algorithm is linear

    Machine learning for network based intrusion detection: an investigation into discrepancies in findings with the KDD cup '99 data set and multi-objective evolution of neural network classifier ensembles from imbalanced data.

    Get PDF
    For the last decade it has become commonplace to evaluate machine learning techniques for network based intrusion detection on the KDD Cup '99 data set. This data set has served well to demonstrate that machine learning can be useful in intrusion detection. However, it has undergone some criticism in the literature, and it is out of date. Therefore, some researchers question the validity of the findings reported based on this data set. Furthermore, as identified in this thesis, there are also discrepancies in the findings reported in the literature. In some cases the results are contradictory. Consequently, it is difficult to analyse the current body of research to determine the value in the findings. This thesis reports on an empirical investigation to determine the underlying causes of the discrepancies. Several methodological factors, such as choice of data subset, validation method and data preprocessing, are identified and are found to affect the results significantly. These findings have also enabled a better interpretation of the current body of research. Furthermore, the criticisms in the literature are addressed and future use of the data set is discussed, which is important since researchers continue to use it due to a lack of better publicly available alternatives. Due to the nature of the intrusion detection domain, there is an extreme imbalance among the classes in the KDD Cup '99 data set, which poses a significant challenge to machine learning. In other domains, researchers have demonstrated that well known techniques such as Artificial Neural Networks (ANNs) and Decision Trees (DTs) often fail to learn the minor class(es) due to class imbalance. However, this has not been recognized as an issue in intrusion detection previously. This thesis reports on an empirical investigation that demonstrates that it is the class imbalance that causes the poor detection of some classes of intrusion reported in the literature. An alternative approach to training ANNs is proposed in this thesis, using Genetic Algorithms (GAs) to evolve the weights of the ANNs, referred to as an Evolutionary Neural Network (ENN). When employing evaluation functions that calculate the fitness proportionally to the instances of each class, thereby avoiding a bias towards the major class(es) in the data set, significantly improved true positive rates are obtained whilst maintaining a low false positive rate. These findings demonstrate that the issues of learning from imbalanced data are not due to limitations of the ANNs; rather the training algorithm. Moreover, the ENN is capable of detecting a class of intrusion that has been reported in the literature to be undetectable by ANNs. One limitation of the ENN is a lack of control of the classification trade-off the ANNs obtain. This is identified as a general issue with current approaches to creating classifiers. Striving to create a single best classifier that obtains the highest accuracy may give an unfruitful classification trade-off, which is demonstrated clearly in this thesis. Therefore, an extension of the ENN is proposed, using a Multi-Objective GA (MOGA), which treats the classification rate on each class as a separate objective. This approach produces a Pareto front of non-dominated solutions that exhibit different classification trade-offs, from which the user can select one with the desired properties. The multi-objective approach is also utilised to evolve classifier ensembles, which yields an improved Pareto front of solutions. Furthermore, the selection of classifier members for the ensembles is investigated, demonstrating how this affects the performance of the resultant ensembles. This is a key to explaining why some classifier combinations fail to give fruitful solutions

    State of the Art Intrusion Detection System for Cloud Computing

    Get PDF
    The term Cloud computing is not new anymore in computing technology. This form of computing technology previously considered only as marketing term, but today Cloud computing not only provides innovative improvements in resource utilisation but it also creates a new opportunities in data protection mechanisms where the advancement of intrusion detection technologies  are blooming rapidly. From the perspective of security, Cloud computing also introduces concerns about data protection and intrusion detection mechanism. This paper surveys, explores and informs researchers about the latest developed Cloud Intrusion Detection Systems by providing a comprehensive taxonomy and investigating possible solutions to detect intrusions in cloud computing systems. As a result, we provide a comprehensive review of Cloud Intrusion Detection System research, while highlighting the specific properties of Cloud Intrusion Detection System. We also present taxonomy on the key issues in Cloud Intrusion Detection System area and discuss the different approaches taken to solve the issues. We conclude the paper with a critical analysis of challenges that have not fully solved
    • …
    corecore