1,807 research outputs found

    A traffic classification method using machine learning algorithm

    Get PDF
    Applying concepts of attack investigation in IT industry, this idea has been developed to design a Traffic Classification Method using Data Mining techniques at the intersection of Machine Learning Algorithm, Which will classify the normal and malicious traffic. This classification will help to learn about the unknown attacks faced by IT industry. The notion of traffic classification is not a new concept; plenty of work has been done to classify the network traffic for heterogeneous application nowadays. Existing techniques such as (payload based, port based and statistical based) have their own pros and cons which will be discussed in this literature later, but classification using Machine Learning techniques is still an open field to explore and has provided very promising results up till now

    Analysing similarity assessment in feature-vector case representations

    Get PDF
    Case-Based Reasoning (CBR) is a good technique to solve new problems based in previous experience. Main assumption in CBR relies in the hypothesis that similar problems should have similar solutions. CBR systems retrieve the most similar cases or experiences among those stored in the Case Base. Then, previous solutions given to these most similar past-solved cases can be adapted to fit new solutions for new cases or problems in a particular domain, instead of derive them from scratch. Thus, similarity measures are key elements in obtaining reliable similar cases, which will be used to derive solutions for new cases. This paper describes a comparative analysis of several commonly used similarity measures, including a measure previously developed by the authors, and a study on its performance in the CBR retrieval step for feature-vector case representations. The testing has been done using six-teen data sets from the UCI Machine Learning Database Repository, plus two complex environmental databases.Postprint (published version

    Towards Intelligent Assistance for a Data Mining Process:-

    Get PDF
    A data mining (DM) process involves multiple stages. A simple, but typical, process might include preprocessing data, applying a data-mining algorithm, and postprocessing the mining results. There are many possible choices for each stage, and only some combinations are valid. Because of the large space and non-trivial interactions, both novices and data-mining specialists need assistance in composing and selecting DM processes. Extending notions developed for statistical expert systems we present a prototype Intelligent Discovery Assistant (IDA), which provides users with (i) systematic enumerations of valid DM processes, in order that important, potentially fruitful options are not overlooked, and (ii) effective rankings of these valid processes by different criteria, to facilitate the choice of DM processes to execute. We use the prototype to show that an IDA can indeed provide useful enumerations and effective rankings in the context of simple classification processes. We discuss how an IDA could be an important tool for knowledge sharing among a team of data miners. Finally, we illustrate the claims with a comprehensive demonstration of cost-sensitive classification using a more involved process and data from the 1998 KDDCUP competition.NYU, Stern School of Business, IOMS Department, Center for Digital Economy Researc

    A Decision tree-based attribute weighting filter for naive Bayes

    Get PDF
    The naive Bayes classifier continues to be a popular learning algorithm for data mining applications due to its simplicity and linear run-time. Many enhancements to the basic algorithm have been proposed to help mitigate its primary weakness--the assumption that attributes are independent given the class. All of them improve the performance of naĂŻve Bayes at the expense (to a greater or lesser degree) of execution time and/or simplicity of the final model. In this paper we present a simple filter method for setting attribute weights for use with naive Bayes. Experimental results show that naive Bayes with attribute weights rarely degrades the quality of the model compared to standard naive Bayes and, in many cases, improves it dramatically. The main advantages of this method compared to other approaches for improving naive Bayes is its run-time complexity and the fact that it maintains the simplicity of the final model

    Toward Intelligent Assistance for a Data Mining Process: An Ontology-Based Approach for Cost-Sensitive Classification

    Get PDF
    A data mining (DM) process involves multiple stages. A simple, but typical, process might include preprocessing data, applying a data mining algorithm, and postprocessing the mining results. There are many possible choices for each stage, and only some combinations are valid. Because of the large space and nontrivial interactions, both novices and data mining specialists need assistance in composing and selecting DM processes. Extending notions developed for statistical expert systems we present a prototype intelligent discovery assistant (IDA), which provides users with 1) systematic enumerations of valid DM processes, in order that important, potentially fruitful options are not overlooked, and 2) effective rankings of these valid processes by different criteria, to facilitate the choice of DM processes to execute. We use the prototype to show that an IDA can indeed provide useful enumerations and effective rankings in the context of simple classification processes. We discuss how an IDA could be an important tool for knowledge sharing among a team of data miners. Finally, we illustrate the claims with a demonstration of cost-sensitive classification using a more complicated process and data from the 1998 KDDCUP competition

    Intelligent Assistance for the Data Mining Process: An Ontology-based Approach

    Get PDF
    A data mining (DM) process involves multiple stages. A simple, but typical, process might include preprocessing data, applying a data-mining algorithm, and postprocessing the mining results. There are many possible choices for each stage, and only some combinations are valid. Because of the large space and non-trivial interactions, both novices and data-mining specialists need assistance in composing and selecting DM processes. We present the concept of Intelligent Discovery Assistants (IDAs), which provide users with (i) systematic enumerations of valid DM processes, in order that important, potentially fruitful options are not overlooked, and (ii) effective rankings of these valid processes by different criteria, to facilitate the choice of DM processes to execute. We use a prototype to show that an IDA can indeed provide useful enumerations and effective rankings. We discuss how an IDA is an important tool for knowledge sharing among a team of data miners. Finally, we illustrate all the claims with a comprehensive demonstration using a more involved process and data from the 1998 KDDCUP competition.Information Systems Working Papers Serie

    Intelligent Assistance for the Data Mining Process: An Ontology-based Approach

    Get PDF
    A data mining (DM) process involves multiple stages. A simple, but typical, process might include preprocessing data, applying a data-mining algorithm, and postprocessing the mining results. There are many possible choices for each stage, and only some combinations are valid. Because of the large space and non-trivial interactions, both novices and data-mining specialists need assistance in composing and selecting DM processes. We present the concept of Intelligent Discovery Assistants (IDAs), which provide users with (i) systematic enumerations of valid DM processes, in order that important, potentially fruitful options are not overlooked, and (ii) effective rankings of these valid processes by different criteria, to facilitate the choice of DM processes to execute. We use a prototype to show that an IDA can indeed provide useful enumerations and effective rankings. We discuss how an IDA is an important tool for knowledge sharing among a team of data miners. Finally, we illustrate all the claims with a comprehensive demonstration using a more involved process and data from the 1998 KDDCUP competition.Information Systems Working Papers Serie

    Meta-Analysis of Vaterite Secondary Data Revealed the Synthesis Conditions for Polymorphic Control

    Get PDF
    Acknowledgements This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.Peer reviewedPostprin

    Data Masking, Encryption, and their Effect on Classification Performance: Trade-offs Between Data Security and Utility

    Get PDF
    As data mining increasingly shapes organizational decision-making, the quality of its results must be questioned to ensure trust in the technology. Inaccuracies can mislead decision-makers and cause costly mistakes. With more data collected for analytical purposes, privacy is also a major concern. Data security policies and regulations are increasingly put in place to manage risks, but these policies and regulations often employ technologies that substitute and/or suppress sensitive details contained in the data sets being mined. Data masking and substitution and/or data encryption and suppression of sensitive attributes from data sets can limit access to important details. It is believed that the use of data masking and encryption can impact the quality of data mining results. This dissertation investigated and compared the causal effects of data masking and encryption on classification performance as a measure of the quality of knowledge discovery. A review of the literature found a gap in the body of knowledge, indicating that this problem had not been studied before in an experimental setting. The objective of this dissertation was to gain an understanding of the trade-offs between data security and utility in the field of analytics and data mining. The research used a nationally recognized cancer incidence database, to show how masking and encryption of potentially sensitive demographic attributes such as patients’ marital status, race/ethnicity, origin, and year of birth, could have a statistically significant impact on the patients’ predicted survival. Performance parameters measured by four different classifiers delivered sizable variations in the range of 9% to 10% between a control group, where the select attributes were untouched, and two experimental groups where the attributes were substituted or suppressed to simulate the effects of the data protection techniques. In practice, this represented a corroboration of the potential risk involved when basing medical treatment decisions using data mining applications where attributes in the data sets are masked or encrypted for patient privacy and security concerns
    • 

    corecore