    Hybrid multicriteria fuzzy classification of network traffic patterns, anomalies, and protocols

    © 2017, Springer-Verlag London Ltd., part of Springer Nature. Traffic classification in computer networks plays a significant role in network operation, management, and security. Examples include controlling the flow of information, allocating resources effectively, provisioning quality of service, detecting intrusions, and blocking malicious and unauthorized access. This problem has attracted growing attention over the years, and a number of techniques have been proposed, ranging from traditional port-based and payload inspection of TCP/IP packets to supervised, unsupervised, and semi-supervised machine learning paradigms. With the increasing complexity of network environments and support for emerging mobility services and applications, more robust and accurate techniques need to be investigated. In this paper, we propose a new supervised hybrid machine-learning approach for ubiquitous traffic classification based on multicriteria fuzzy decision trees with attribute selection. Moreover, our approach handles imbalanced datasets and zero-day applications (i.e., those without previously known traffic patterns) well. Evaluating the proposed methodology on several benchmark real-world traffic datasets of different nature demonstrated its capability to effectively discriminate a variety of traffic patterns, anomalies, and protocols for unencrypted and encrypted traffic flows. Compared with other methods, the proposed methodology showed remarkably better classification accuracy

    K2 Variable Catalogue II: Machine Learning Classification of Variable Stars and Eclipsing Binaries in K2 Fields 0-4

    We are entering an era of unprecedented quantities of data from current and planned survey telescopes. To maximise the potential of such surveys, automated data analysis techniques are required. Here we implement a new methodology for variable star classification, through the combination of Kohonen Self Organising Maps (SOM, an unsupervised machine learning algorithm) and the more common Random Forest (RF) supervised machine learning technique. We apply this method to data from the K2 mission fields 0-4, finding 154 ab-type RR Lyraes (10 newly discovered), 377 Delta Scuti pulsators, 133 Gamma Doradus pulsators, 183 detached eclipsing binaries, 290 semi-detached or contact eclipsing binaries and 9399 other periodic (mostly spot-modulated) sources, once class significance cuts are taken into account. We present lightcurve features for all K2 stellar targets, including their three strongest detected frequencies, which can be used to study stellar rotation periods where the observed variability arises from spot modulation. The resulting catalogue of variable stars, classes, and associated data features is made available online. We publish our SOM code in Python as part of the open source PyMVPA package, which in combination with already available RF modules can be easily used to recreate the method. Comment: Accepted for publication in MNRAS, 16 pages, 13 figures. Updated with proof corrections. Full catalogue tables available at https://www2.warwick.ac.uk/fac/sci/physics/research/astro/people/armstrong/ or at the CD
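
The "three strongest detected frequencies" feature mentioned in this abstract can be illustrated with a minimal sketch. It assumes an evenly sampled lightcurve and uses a plain discrete Fourier transform; the abstract does not say which periodogram the authors use, and real K2 data has gaps that usually call for a different estimator. The function names `dft_power` and `strongest_frequencies` are hypothetical and are not part of the published code.

```python
import cmath
import math


def dft_power(samples):
    """Power at frequency bins 1 .. N/2 - 1 of an evenly sampled series.

    Assumption: even sampling. The mean is subtracted first so the
    constant (zero-frequency) term does not dominate.
    """
    n = len(samples)
    mean = sum(samples) / n
    centered = [s - mean for s in samples]
    power = {}
    for k in range(1, n // 2):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(centered)
        )
        power[k] = abs(coeff) ** 2
    return power


def strongest_frequencies(samples, dt, count=3):
    """Return the `count` frequencies with the highest power, strongest first.

    `dt` is the sampling interval; frequencies are in cycles per unit of dt.
    """
    n = len(samples)
    power = dft_power(samples)
    top_bins = sorted(power, key=power.get, reverse=True)[:count]
    return [k / (n * dt) for k in top_bins]


# Synthetic lightcurve: three sinusoids with decreasing amplitude.
N = 64
flux = [
    3.0 * math.sin(2 * math.pi * 3 * t / N)
    + 2.0 * math.sin(2 * math.pi * 7 * t / N)
    + 1.0 * math.sin(2 * math.pi * 12 * t / N)
    for t in range(N)
]
top3 = strongest_frequencies(flux, dt=1.0)
```

On this synthetic input the three recovered frequencies correspond to the injected components, ordered by amplitude. Features of this kind could then be passed to a classifier, which is the role the abstract assigns to the SOM and RF stages.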

    Towards a Reliable Comparison and Evaluation of Network Intrusion Detection Systems Based on Machine Learning Approaches

    We live in a hyper-connected world where millions of heterogeneous devices continuously share information in different application contexts, such as wellness, improved communications, and digital business. However, the larger the number of devices and connections, the higher the risk of security threats. To counteract malicious behaviours and preserve essential security services, Network Intrusion Detection Systems (NIDSs) are the most widely used line of defence in communications networks. Nevertheless, there is no standard methodology to evaluate and fairly compare NIDSs. Most proposals omit crucial steps of NIDS validation, which makes their comparison hard or even impossible. This work first presents a comprehensive study of recent NIDSs based on machine learning approaches, concluding that almost none of them follow what the authors of this paper consider mandatory steps for a reliable comparison and evaluation of NIDSs. Second, a structured methodology is proposed and assessed on the UGR'16 dataset to test its suitability for addressing network attack detection problems. The recommended guideline and steps will help the research community to fairly assess NIDSs, although arriving at a definitive framework is not a trivial task and, therefore, some extra effort should still be made to further improve its understandability and usability

    A Very Brief Introduction to Machine Learning With Applications to Communication Systems

    Given the unprecedented availability of data and computing resources, there is widespread renewed interest in applying data-driven machine learning methods to problems for which the development of conventional engineering solutions is challenged by modelling or algorithmic deficiencies. This tutorial-style paper starts by addressing the questions of why and when such techniques can be useful. It then provides a high-level introduction to the basics of supervised and unsupervised learning. For both supervised and unsupervised learning, exemplifying applications to communication networks are discussed by distinguishing tasks carried out at the edge and at the cloud segments of the network at different layers of the protocol stack

    Enhanced Industrial Machinery Condition Monitoring Methodology based on Novelty Detection and Multi-Modal Analysis

    This paper presents a condition-based monitoring methodology based on novelty detection applied to industrial machinery. The proposed approach includes both the classical classification of multiple a priori known scenarios and the innovative capability to detect new operating modes not previously available. The development of condition-based monitoring methodologies that can isolate unexpected scenarios is currently a trending topic that can meet the demanding requirements of future industrial process monitoring systems. First, the method is based on the temporal segmentation of the available physical magnitudes and the estimation of a set of time-based statistical features. Then, a double feature reduction stage based on Principal Component Analysis and Linear Discriminant Analysis is applied in order to optimize the classification and novelty detection performance. The subsequent combination of a Feed-forward Neural Network and a One-Class Support Vector Machine allows the proper interpretation of known and unknown operating conditions. The effectiveness of this novel condition monitoring scheme has been verified by experimental results obtained from an automotive industry machine.

    Towards the Automatic Classification of Documents in User-generated Classifications

    A huge amount of information is scattered across the World Wide Web. Because information flows through the WWW at high speed, it needs to be organized so that users can access it easily. Previously, information was generally organized manually, by matching document contents to pre-defined categories. There are two approaches to this text-based categorization: manual and automatic. In the manual approach, a human expert performs the classification task; in the automatic approach, supervised classifiers are used to classify resources. Supervised classification still requires manual work to create training data before the automatic classification task takes place. In our new approach, we propose to classify documents automatically through semantic keywords and formulas generated from these keywords. We can thus reduce human participation by combining the knowledge of a given classification with the knowledge extracted from the data. The main focus of this PhD thesis, supervised by Prof. Fausto Giunchiglia, is the automatic classification of documents into user-generated classifications. The key benefits foreseen from this automatic document classification relate not only to search engines but also to many other fields, such as document organization, text filtering, and semantic index management

    Data-driven design of intelligent wireless networks: an overview and tutorial

    Data science or "data-driven research" is a research approach that uses real-life data to gain insight about the behavior of systems. It enables the analysis of small, simple systems as well as large and more complex ones, in order to assess whether they function according to the intended design and as seen in simulation. Data science approaches have been successfully applied to analyze networked interactions in several research areas such as large-scale social networks, advanced business and healthcare processes. Wireless networks can exhibit unpredictable interactions between algorithms from multiple protocol layers, interactions between multiple devices, and hardware-specific influences. These interactions can lead to a difference between real-world functioning and design-time functioning. Data science methods can help to detect the actual behavior and possibly help to correct it. Data science is increasingly used in wireless research. To support data-driven research in wireless networks, this paper illustrates the step-by-step methodology that has to be applied to extract knowledge from raw data traces. To this end, the paper (i) clarifies when, why and how to use data science in wireless network research; (ii) provides a generic framework for applying data science in wireless networks; (iii) gives an overview of existing research papers that utilized data science approaches in wireless networks; (iv) illustrates the overall knowledge discovery process through an extensive example in which device types are identified based on their traffic patterns; (v) provides the reader with the necessary datasets and scripts to go through the tutorial steps themselves