6,333 research outputs found

    An unsupervised approach to feature discretization and selection

    Get PDF
    Many learning problems require handling high dimensional datasets with a relatively small number of instances. Learning algorithms are thus confronted with the curse of dimensionality, and need to address it in order to be effective. Examples of these types of data include the bag-of-words representation in text classification problems and gene expression data for tumor detection/classification. Usually, among the high number of features characterizing the instances, many may be irrelevant (or even detrimental) for the learning tasks. It is thus clear that there is a need for adequate techniques for feature representation, reduction, and selection, to improve both the classification accuracy and the memory requirements. In this paper, we propose combined unsupervised feature discretization and feature selection techniques, suitable for medium and high-dimensional datasets. The experimental results on several standard datasets, with both sparse and dense features, show the efficiency of the proposed techniques as well as improvements over previous related techniques

    Enhancing the Efficiency of Attack Detection System Using Feature selection and Feature Discretization Methods

    Get PDF
    Intrusion detection technologies have grown in popularity in recent years using machine learning. The variety of new security attacks are increasing, necessitating the development of effective and intelligent countermeasures. The existing intrusion detection system (IDS) uses Signature or Anomaly based detection systems with machine learning algorithms to detect malicious activities. The Signature-based detection rely only on signatures that have been pre-programmed into the systems, detect known attacks and cannot detect any new or unusual activity. The Anomaly based detection using supervised machine learning algorithm detects only known threats. To address this issue, the proposed model employs an unsupervised machine learning approach for detecting attacks. This approach combines the Sub Space Clustering and One Class Support Vector Machine algorithms and utilizes feature selection methods such as Chi-square, as well as Feature Discretization Methods like Equal Width Discretization to identify both known and undiscovered assaults. The results of the experiments using proposed model outperforms several of the existing system in terms of detection rate and accuracy and decrease in the computational time

    A traffic classification method using machine learning algorithm

    Get PDF
    Applying concepts of attack investigation in IT industry, this idea has been developed to design a Traffic Classification Method using Data Mining techniques at the intersection of Machine Learning Algorithm, Which will classify the normal and malicious traffic. This classification will help to learn about the unknown attacks faced by IT industry. The notion of traffic classification is not a new concept; plenty of work has been done to classify the network traffic for heterogeneous application nowadays. Existing techniques such as (payload based, port based and statistical based) have their own pros and cons which will be discussed in this literature later, but classification using Machine Learning techniques is still an open field to explore and has provided very promising results up till now

    Mining Heterogeneous Multivariate Time-Series for Learning Meaningful Patterns: Application to Home Health Telecare

    Full text link
    For the last years, time-series mining has become a challenging issue for researchers. An important application lies in most monitoring purposes, which require analyzing large sets of time-series for learning usual patterns. Any deviation from this learned profile is then considered as an unexpected situation. Moreover, complex applications may involve the temporal study of several heterogeneous parameters. In that paper, we propose a method for mining heterogeneous multivariate time-series for learning meaningful patterns. The proposed approach allows for mixed time-series -- containing both pattern and non-pattern data -- such as for imprecise matches, outliers, stretching and global translating of patterns instances in time. We present the early results of our approach in the context of monitoring the health status of a person at home. The purpose is to build a behavioral profile of a person by analyzing the time variations of several quantitative or qualitative parameters recorded through a provision of sensors installed in the home

    Model-based learning of local image features for unsupervised texture segmentation

    Full text link
    Features that capture well the textural patterns of a certain class of images are crucial for the performance of texture segmentation methods. The manual selection of features or designing new ones can be a tedious task. Therefore, it is desirable to automatically adapt the features to a certain image or class of images. Typically, this requires a large set of training images with similar textures and ground truth segmentation. In this work, we propose a framework to learn features for texture segmentation when no such training data is available. The cost function for our learning process is constructed to match a commonly used segmentation model, the piecewise constant Mumford-Shah model. This means that the features are learned such that they provide an approximately piecewise constant feature image with a small jump set. Based on this idea, we develop a two-stage algorithm which first learns suitable convolutional features and then performs a segmentation. We note that the features can be learned from a small set of images, from a single image, or even from image patches. The proposed method achieves a competitive rank in the Prague texture segmentation benchmark, and it is effective for segmenting histological images

    On the role of pre and post-processing in environmental data mining

    Get PDF
    The quality of discovered knowledge is highly depending on data quality. Unfortunately real data use to contain noise, uncertainty, errors, redundancies or even irrelevant information. The more complex is the reality to be analyzed, the higher the risk of getting low quality data. Knowledge Discovery from Databases (KDD) offers a global framework to prepare data in the right form to perform correct analyses. On the other hand, the quality of decisions taken upon KDD results, depend not only on the quality of the results themselves, but on the capacity of the system to communicate those results in an understandable form. Environmental systems are particularly complex and environmental users particularly require clarity in their results. In this paper some details about how this can be achieved are provided. The role of the pre and post processing in the whole process of Knowledge Discovery in environmental systems is discussed

    Surveying human habit modeling and mining techniques in smart spaces

    Get PDF
    A smart space is an environment, mainly equipped with Internet-of-Things (IoT) technologies, able to provide services to humans, helping them to perform daily tasks by monitoring the space and autonomously executing actions, giving suggestions and sending alarms. Approaches suggested in the literature may differ in terms of required facilities, possible applications, amount of human intervention required, ability to support multiple users at the same time adapting to changing needs. In this paper, we propose a Systematic Literature Review (SLR) that classifies most influential approaches in the area of smart spaces according to a set of dimensions identified by answering a set of research questions. These dimensions allow to choose a specific method or approach according to available sensors, amount of labeled data, need for visual analysis, requirements in terms of enactment and decision-making on the environment. Additionally, the paper identifies a set of challenges to be addressed by future research in the field
    • 

    corecore