6 research outputs found

    Unsupervised and semi-supervised fuzzy clustering with multiple kernels.

    For real-world clustering tasks, the input data is often not easily separable, either because the data structure is highly complex or because clusters vary in size, density, and shape. Recently, kernel-based clustering has been proposed to perform clustering in a higher-dimensional feature space spanned by embedding maps and corresponding kernel functions. Although good results were obtained using the Gaussian kernel function, its performance depends on the selection of the scaling parameter among an extensive range of possibilities. This step is often heavily influenced by prior knowledge about the data and by the patterns we expect to discover. Unfortunately, it is often unclear which kernels are more suitable for a particular task. The problem is aggravated for many real-world clustering applications, in which the distributions of the different clusters in the feature space exhibit large variations. Thus, in the absence of a priori knowledge, a single kernel selected from a predefined group is sometimes insufficient to represent the data. One way to learn optimal scaling parameters is through an exhaustive search for one optimal scaling parameter per cluster. However, this approach is not practical since it is computationally expensive, especially when the data includes a large number of clusters and when the dynamic range of possible values of the scaling parameters is large. Moreover, evaluating the resulting partition in order to select the optimal parameters is not an easy task. To overcome the above drawbacks, we introduce two novel fuzzy clustering techniques that use Multiple Kernel Learning to provide an elegant solution for parameter selection. The Fuzzy C-Means with Multiple Kernels algorithm (FCMK) simultaneously finds the optimal partition and the cluster-dependent kernel combination weights that reflect the intrinsic structure of the data. The Relational Fuzzy Clustering with Multiple Kernels algorithm (RFCMK) learns the kernel combination weights by optimizing the relational dissimilarities. Consequently, the learned kernel combination weights reflect the relative density, size, and position of each cluster with respect to the other clusters. We also extended FCMK and RFCMK to the semi-supervised paradigm. We show that incorporating prior knowledge into the unsupervised clustering task, in the form of a small set of constraints on which instances should or should not reside in the same cluster, guides the unsupervised approaches to a better partitioning of the data and helps avoid local minima, especially for high-dimensional real-world data. All of the proposed algorithms are optimized iteratively by dynamically updating the partition and the kernel combination weights in each iteration. This makes these algorithms simple and fast. Moreover, our algorithms are formulated to work on both vector and relational data. This makes them applicable to data where objects cannot be represented by vectors or when clusters of similar objects cannot be represented efficiently by a single prototype. We also introduced two relational fuzzy clustering with multiple kernels algorithms for large data to deal with the scalability issue of RFCMK. The random sample and extend RFCMK (rseRFCMK) computes cluster prototypes from a smaller sample of randomly selected objects, and then extends the partition to the remainder of the data. The single pass RFCMK (spRFCMK) sequentially loads manageable-sized chunks, clusters the chunks in a single pass, and then combines the results from each chunk. Our extensive experiments show that RFCMK and SS-RFCMK outperform existing algorithms. In particular, we show that when data include clusters with various intrinsic structures and densities, learning kernel weights that vary over clusters is crucial in obtaining a good partition.
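
The idea of cluster-dependent kernel combination weights can be illustrated with a minimal sketch. It is not the FCMK or RFCMK update from the abstract above: the weights here are fixed by hand rather than learned, the prototypes are assumed to live in the input space, and the function names (`gaussian_kernel`, `combined_distance`, `memberships`) are hypothetical. The sketch only shows how several Gaussian kernels with different scaling parameters might be mixed per cluster and fed into the standard fuzzy c-means membership formula.

```python
import math

def gaussian_kernel(x, v, sigma):
    """Gaussian kernel between two points for one scaling parameter sigma."""
    sq = sum((a - b) ** 2 for a, b in zip(x, v))
    return math.exp(-sq / (2.0 * sigma ** 2))

def combined_distance(x, v, sigmas, weights):
    """Squared feature-space distance under a convex mix of Gaussian kernels.

    Assumes the weights sum to 1, so the mixed kernel satisfies K(x, x) = 1
    and the distance reduces to 2 * (1 - K(x, v)).
    """
    k = sum(w * gaussian_kernel(x, v, s) for w, s in zip(weights, sigmas))
    return 2.0 * (1.0 - k)

def memberships(points, prototypes, sigmas, weights, m=2.0, eps=1e-12):
    """Standard fuzzy c-means memberships with one weight vector per cluster.

    weights[i] holds the (hand-picked, not learned) kernel weights of cluster i;
    m is the usual fuzzifier, and eps guards against division by zero.
    """
    result = []
    for x in points:
        d = [max(combined_distance(x, v, sigmas, w), eps)
             for v, w in zip(prototypes, weights)]
        inv = [dist ** (-1.0 / (m - 1.0)) for dist in d]
        total = sum(inv)
        result.append([val / total for val in inv])
    return result

# Toy data: two prototypes, two candidate scaling parameters.
points = [(0.0, 0.0), (2.5, 2.5), (5.0, 5.0)]
prototypes = [(0.0, 0.0), (5.0, 5.0)]
sigmas = [0.5, 2.0]
weights = [[0.8, 0.2], [0.2, 0.8]]  # cluster 0 leans on the narrow kernel
for point, row in zip(points, memberships(points, prototypes, sigmas, weights)):
    print(point, [round(u, 3) for u in row])
```

In a full multiple-kernel algorithm the weight rows would be updated in each iteration together with the partition; here they stay constant so that the effect of the mixture on the memberships is easy to inspect.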

    Hybrid intelligent approach for network intrusion detection

    In recent years, computer networks have come into broad use and have become very complicated. A lot of sensitive information passes through various kinds of computer devices, ranging from minicomputers to servers and mobile devices. These changes have led to the conclusion that the number of attacks on important information over network systems is increasing every year. Intrusion is the main threat to the network. It is defined as a series of activities aimed at compromising the security of network systems in terms of confidentiality, integrity, and availability; as a result, intrusion detection is extremely important as a part of the defense. Hence, there must be substantial improvement in network intrusion detection techniques and systems. Because previous intrusion detection approaches are limited in finding novel attacks and suffer from high false detection rates and limited accuracy, this study proposes a hybrid intelligent approach for network intrusion detection based on the k-means clustering algorithm and the support vector machine classification algorithm. The aim of this study is to reduce the false alarm rate and to improve the detection rate compared with existing intrusion detection approaches. In the present study, the NSL-KDD intrusion dataset has been used for training and testing the proposed approach. In order to improve classification performance, some preprocessing steps are taken beforehand. The first unifies the data types and filters the dataset by data transformation. Then, a feature selection algorithm is applied to remove features that are irrelevant or noisy for the purpose of intrusion detection. Feature selection reduces the number of features from 41 to 21, and a normalization method is then employed to reduce the differences among the data. Clustering with the k-means algorithm is the last processing step before classification. For classification, a support vector machine is used. After training and testing the proposed hybrid intelligent approach, the performance evaluation shows that the proposed network intrusion detection approach achieves high accuracy and a low false detection rate. The accuracy is 96.025 percent and the false alarm rate is 3.715 percent.
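
The abstract reports accuracy and false alarm figures without stating how they are computed. The sketch below uses the conventional confusion-matrix definitions, which is an assumption about the study rather than something the abstract confirms. The function name `detection_metrics` and the counts in the usage line are hypothetical and are not taken from the NSL-KDD experiments.

```python
def detection_metrics(tp, tn, fp, fn):
    """Conventional intrusion-detection metrics from confusion-matrix counts.

    tp: attacks flagged as attacks      fn: attacks missed
    tn: normal traffic passed           fp: normal traffic flagged (false alarms)
    """
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("at least one sample is required")
    return {
        # Share of all records that were labeled correctly.
        "accuracy": (tp + tn) / total,
        # Share of real attacks that were caught.
        "detection_rate": tp / (tp + fn) if (tp + fn) else 0.0,
        # Share of normal records that raised an alarm.
        "false_alarm_rate": fp / (fp + tn) if (fp + tn) else 0.0,
    }

# Hypothetical counts, for illustration only.
print(detection_metrics(tp=90, tn=95, fp=5, fn=10))
```

Under these definitions a classifier can reach high accuracy while still producing many false alarms if normal traffic dominates the test set, which is presumably why the two figures are reported separately.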

    Bolstering user authentication: a kernel-based fuzzy-clustering model for typing dynamics

    In most information systems today, static user authentication is accomplished when the user provides a credential (for example, a user ID and the matching password). However, passwords appear to be the most insecure authentication method, as they are vulnerable to attacks chiefly caused by poor password hygiene. We contend that an additional, non-intrusive level of security can be achieved by analyzing keystroke biometrics and deriving a unique biometric template of a user's typing pattern. The paper proposes a new model for representing raw keystroke data collected when analyzing typing biometrics. The model is based on fuzzy sets and kernel functions, and the corresponding algorithm is developed. In the static authentication problem, our model demonstrated relatively higher performance than some classic anomaly-detection algorithms, such as Mahalanobis, Manhattan, nearest neighbor, outlier counting, neural network, and the support-vector machine.

    Ensemble learning method for hidden Markov models.

    For complex classification systems, data are gathered from various sources and potentially have different representations. Thus, data may have large intra-class variations, and modeling each data class with a single model might lead to poor generalization. The classification error can be more severe for temporal data, where each sample is represented by a sequence of observations. Thus, there is a need for building a classification system that takes into account the variations within each class in the data. This dissertation introduces an ensemble learning method for temporal data that uses a mixture of Hidden Markov Model (HMM) classifiers. We hypothesize that the data are generated by K models, each of which reflects a particular trend in the data. Model identification could be achieved through clustering in the feature space or in the parameter space. However, this approach is inappropriate in the context of sequential data. The proposed approach is based on clustering in the log-likelihood space, and has two main steps. First, one HMM is fit to each of the N individual sequences. For each fitted model, we evaluate the log-likelihood of each sequence. This results in an N-by-N log-likelihood distance matrix that is partitioned into K groups using a relational clustering algorithm. In the second step, we learn the parameters of one HMM per group. We propose using and optimizing various training approaches for the different K groups depending on their size and homogeneity. In particular, we investigate the maximum likelihood (ML), the minimum classification error (MCE) based discriminative, and the Variational Bayesian (VB) training approaches. Finally, to test a new sequence, its likelihood is computed under all the models and a final confidence value is assigned by combining the multiple models' outputs using a decision-level fusion method such as an artificial neural network or a hierarchical mixture of experts. Our approach was evaluated on two real-world applications: (1) identification of Cardio-Pulmonary Resuscitation (CPR) scenes in video simulating medical crises; and (2) landmine detection using Ground Penetrating Radar (GPR). Results on both applications show that the proposed method can identify meaningful and coherent HMM mixture components that describe different properties of the data. Each HMM mixture component models a group of data that share common attributes. The results indicate that the proposed method outperforms the baseline HMM that uses one model for each class in the data.
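
The first step described above, scoring every sequence under every per-sequence HMM to obtain an N-by-N matrix, can be sketched as follows. Several things here are assumptions and not the dissertation's method: the HMMs are discrete-observation models whose parameters are written by hand (the fitting step is omitted), the symmetrization used to turn log-likelihoods into distances is one common choice among several, and the names `log_likelihood` and `loglik_distance_matrix` are hypothetical.

```python
import math

def log_likelihood(seq, start, trans, emit):
    """Log-likelihood of a symbol sequence under a discrete HMM.

    Uses the scaled forward algorithm: alpha is renormalized at every step
    and the logs of the scaling factors are accumulated.
    """
    n_states = len(start)
    alpha = [start[s] * emit[s][seq[0]] for s in range(n_states)]
    log_lik = 0.0
    for t in range(len(seq)):
        if t > 0:
            alpha = [
                sum(alpha[r] * trans[r][s] for r in range(n_states)) * emit[s][seq[t]]
                for s in range(n_states)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # the model cannot generate this sequence
        alpha = [a / scale for a in alpha]
        log_lik += math.log(scale)
    return log_lik

def loglik_distance_matrix(models, sequences):
    """Symmetric N-by-N distance matrix; models[i] is paired with sequences[i].

    L[i][j] is the log-likelihood of sequence j under model i. The distance
    averages how much worse each sequence scores under the other's model
    than under its own (an assumed symmetrization).
    """
    n = len(sequences)
    L = [[log_likelihood(sequences[j], *models[i]) for j in range(n)]
         for i in range(n)]
    return [
        [0.5 * ((L[i][i] - L[j][i]) + (L[j][j] - L[i][j])) for j in range(n)]
        for i in range(n)
    ]

# Hand-written (not fitted) two-state models over symbols {0, 1}.
start = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
model_a = (start, trans, [[0.9, 0.1], [0.7, 0.3]])  # favors symbol 0
model_b = (start, trans, [[0.1, 0.9], [0.3, 0.7]])  # favors symbol 1
sequences = [[0, 0, 1, 0], [1, 1, 0, 1]]
D = loglik_distance_matrix([model_a, model_b], sequences)
print(D)
```

A matrix like `D` is the kind of relational input that a relational clustering algorithm could then partition into K groups, which is why no feature vectors are needed at that stage.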

    Fuzzy clustering with Multiple Kernels

    No full text

    Incremental fuzzy clustering with multiple kernels

    No full text