131,149 research outputs found

    Scalable Mining of Common Routes in Mobile Communication Network Traffic Data

    Get PDF
    A probabilistic method for inferring common routes from mobile communication network traffic data is presented. Besides providing mobility information, valuable in a multitude of application areas, the method has the dual purpose of enabling efficient coarse-graining as well as anonymisation by mapping individual sequences onto common routes. The approach is to represent spatial trajectories by Cell ID sequences that are grouped into routes using locality-sensitive hashing and graph clustering. The method is demonstrated to be scalable, and to accurately group sequences using an evaluation set of GPS tagged data

    NEMICO: Mining network data through cloud-based data mining techniques

    Get PDF
    Thanks to the rapid advances in Internet-based applications, data acquisition and storage technologies, petabyte-sized network data collections are becoming more and more common, thus prompting the need for scalable data analysis solutions. By leveraging today’s ubiquitous many-core computer architectures and the increasingly popular cloud computing paradigm, the applicability of data mining algorithms to these large volumes of network data can be scaled up to gain interesting insights. This paper proposes NEMICO, a comprehensive Big Data mining system targeted to network traffic flow analyses (e.g., traffic flow characterization, anomaly detection, multiplelevel pattern mining). NEMICO comprises new approaches that contribute to a paradigm-shift in distributed data mining by addressing most challenging issues related to Big Data, such as data sparsity, horizontal scaling, and parallel computation

    Data mining as a tool for environmental scientists

    Get PDF
    Over recent years a huge library of data mining algorithms has been developed to tackle a variety of problems in fields such as medical imaging and network traffic analysis. Many of these techniques are far more flexible than more classical modelling approaches and could be usefully applied to data-rich environmental problems. Certain techniques such as Artificial Neural Networks, Clustering, Case-Based Reasoning and more recently Bayesian Decision Networks have found application in environmental modelling while other methods, for example classification and association rule extraction, have not yet been taken up on any wide scale. We propose that these and other data mining techniques could be usefully applied to difficult problems in the field. This paper introduces several data mining concepts and briefly discusses their application to environmental modelling, where data may be sparse, incomplete, or heterogenous

    An experimental study on network intrusion detection systems

    Get PDF
    A signature database is the key component of an elaborate intrusion detection system. The efficiency of signature generation for an intrusion detection system is a crucial requirement because of the rapid appearance of new attacks on the World Wide Web. However, in the commercial applications, signature generation is still a manual process, which requires professional skills and heavy human effort. Knowledge Discovery and Data Mining methods may be a solution to this problem. Data Mining and Machine Learning algorithms can be applied to the network traffic databases, in order to automatically generate signatures. The purpose of this thesis and the work related to it is to construct a feasible architecture for building a database of network traffic data. This database can then be used to generate signatures automatically. This goal is achieved using network traffic data captured on the data communication network at the New Jersey Institute of Technology (NJIT)

    A traffic classification method using machine learning algorithm

    Get PDF
    Applying concepts of attack investigation in IT industry, this idea has been developed to design a Traffic Classification Method using Data Mining techniques at the intersection of Machine Learning Algorithm, Which will classify the normal and malicious traffic. This classification will help to learn about the unknown attacks faced by IT industry. The notion of traffic classification is not a new concept; plenty of work has been done to classify the network traffic for heterogeneous application nowadays. Existing techniques such as (payload based, port based and statistical based) have their own pros and cons which will be discussed in this literature later, but classification using Machine Learning techniques is still an open field to explore and has provided very promising results up till now

    Incremental High Throughput Network Traffic Classifier

    Get PDF
    Today’s network traffic are dynamic and fast. Con-ventional network traffic classification based on flow feature and data mining are not able to process traffic efficiently. Hardware based network traffic classifier is needed to be adaptable to dynamic network state and to provide accurate and updated classification at high speed. In this paper, a hardware architecture of online incremental semi-supervised algorithm is proposed. The hardware architecture is designed such that it is suitable to be incorporated in NetFPGA reference switch design. The experimental results on real datasets show that with only 10% of labeled data, the proposed architecture can perform online classification of network traffic at 1Gbps bitrate with 91% average accuracy without loosing any flows

    Network Digest analysis by means of association rules

    Get PDF
    The continuous growth in connection speed allows huge amounts of data to be transferred through a network. An important issue in this context is network traffic analysis to profile communications and detect security threats. Association rule extraction is a widely used exploratory technique which has been exploited in different contexts (e.g., network traffic characterization). However, to discover (potentially relevant) knowledge a very low support threshold needs to be enforced hence generating a large number of unmanageable rules. To address this issue in network traffic analysis, an efficient technique to reduce traffic volume is needed. This paper presents a NEtwork Digest framework, which performs network traffic analysis by means of data mining techniques to characterize traffic data and detect anomalies. NED exploits continuous queries to efficiently perform realtime aggregation of captured network data and supports filtering operations to further reduce traffic volume focusing on relevant data. Furthermore, NED provides an efficient algorithm to perform refinement analysis by means of association rules to discover traffic features. Extracted rules allow traffic data characterization in terms of correlation and recurrence of feature patterns. Preliminary experimental results performed on different network dumps showed the efficiency and effectiveness of the NED framework to characterize traffic data
    • 

    corecore