    Fuzzy C-Means Algorithm Based on Common Mahalanobis Distances

    Some well-known fuzzy clustering algorithms are based on the Euclidean distance function, which can only detect spherical clusters. The Gustafson-Kessel (GK) and Gath-Geva (GG) clustering algorithms were developed to detect non-spherical clusters. However, the GK algorithm needs an added constraint on the fuzzy covariance matrix, and the GG algorithm can only be used for data with a multivariate Gaussian distribution. A Fuzzy C-Means algorithm based on the Mahalanobis distance (FCM-M) was proposed in our previous work to overcome these limitations of the GG and GK algorithms, but it is not stable enough when some of its covariance matrices are not equal. In this paper, an improved Fuzzy C-Means algorithm based on a common Mahalanobis distance (FCM-CM) is proposed. Experimental results on three real data sets show that the proposed FCM-CM algorithm performs better than the FCM, GG, GK and FCM-M algorithms.
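
    The contrast between Euclidean and Mahalanobis distance drawn in this abstract can be illustrated with a minimal sketch. This is not the FCM-CM algorithm: the function names, the 2-D restriction, and the example covariance matrix are all assumptions chosen for illustration. Two points equally far from a cluster centre in Euclidean terms get different Mahalanobis distances when the cluster is elongated.

```python
import math

def euclidean(x, y):
    """Euclidean distance between two 2-D points."""
    return math.hypot(x[0] - y[0], x[1] - y[1])

def mahalanobis_2d(x, mean, cov):
    """Mahalanobis distance of a 2-D point x from mean under covariance cov."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det == 0:
        raise ValueError("covariance matrix is singular")
    # Closed-form inverse of a 2x2 matrix.
    inv = ((d / det, -b / det), (-c / det, a / det))
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # Quadratic form (x - mean)^T * inv * (x - mean).
    q = dx * (inv[0][0] * dx + inv[0][1] * dy) + dy * (inv[1][0] * dx + inv[1][1] * dy)
    return math.sqrt(q)

# Illustrative cluster (assumed values): elongated along the x axis.
mean = (0.0, 0.0)
cov = ((9.0, 0.0), (0.0, 1.0))

p_along = (3.0, 0.0)   # lies along the long axis of the cluster
p_across = (0.0, 3.0)  # lies across the short axis

print(euclidean(p_along, mean), euclidean(p_across, mean))
print(mahalanobis_2d(p_along, mean, cov), mahalanobis_2d(p_across, mean, cov))
```

    Under this covariance the point along the long axis is treated as much closer to the centre than the point across it, which is the property that lets Mahalanobis-based methods capture non-spherical clusters.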

    Computational fluids domain reduction to a simplified fluid network

    The primary goal of this project is to demonstrate the practical use of data mining algorithms to cluster a solved steady-state computational fluid dynamics (CFD) flow domain into a simplified lumped-parameter network. A commercial-quality code, “cfdMine”, was created using volume-weighted k-means clustering that can cluster a 20 million cell CFD domain on a single CPU in several hours or less. Additionally, agglomeration and k-means Mahalanobis were added as optional post-processing steps to further enhance the separation of the clusters. The resultant nodal network is considered a reduced-order model and can be solved transiently at minimal computational cost. The reduced-order network is then instantiated in the commercial thermal solver MuSES to perform transient conjugate heat transfer using convection predicted by a lumped network (based on steady-state CFD). When the lumped nodal network is inserted into a MuSES model, the potential for developing a “localized heat transfer coefficient” is shown to be an improvement over existing techniques. The clustering was also found to provide a new flow visualization technique. Finally, fixing clusters near equipment demonstrates a new capability to track temperatures near specific objects (such as equipment in vehicles).
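
    As a rough illustration of what volume-weighted k-means could look like, the sketch below weights each point (standing in for a CFD cell centre) by a cell volume when updating centroids. It is not the cfdMine code: the function names, the deterministic initialisation from the first k points, and the sample data are assumptions made only for this example.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def weighted_kmeans(points, weights, k, iters=50):
    """K-means in which each point contributes to its centroid in proportion to its weight."""
    # Assumption: initialise centroids from the first k points (deterministic).
    centroids = [tuple(points[i]) for i in range(k)]
    labels = None
    dim = len(points[0])
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments unchanged, so centroids are already final
        labels = new_labels
        # Update step: weighted mean of member points (weights act as cell volumes).
        for j in range(k):
            members = [i for i, lab in enumerate(labels) if lab == j]
            total = sum(weights[i] for i in members)
            if total > 0:
                centroids[j] = tuple(
                    sum(weights[i] * points[i][d] for i in members) / total
                    for d in range(dim)
                )
    return centroids, labels

# Hypothetical cell centres and volumes.
cells = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
volumes = [1.0, 3.0, 1.0, 1.0]
centroids, labels = weighted_kmeans(cells, volumes, k=2)
print(centroids, labels)
```

    With these weights the first centroid is pulled toward the larger cell rather than sitting at the unweighted midpoint of its two members.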

    Techniques for clustering gene expression data

    Many clustering techniques have been proposed for the analysis of gene expression data obtained from microarray experiments. However, the choice of suitable method(s) for a given experimental dataset is not straightforward. Common approaches do not translate well and fail to take account of the data profile. This review paper surveys state-of-the-art applications that recognise these limitations and implement procedures to overcome them. It provides a framework for the evaluation of clustering in gene expression analyses. The nature of microarray data is discussed briefly. Selected examples are presented for the clustering methods considered.

    Indicators of Economic Crises : A Data-Driven Clustering Approach

    The determination of reliable early-warning indicators of economic crises is a hot topic in economic sciences. Pinning down recurring patterns or combinations of macroeconomic indicators is indispensable for adequate policy adjustments to prevent a looming crisis. We investigate the ability of several macroeconomic variables to tell crisis countries apart from non-crisis economies. We introduce a self-calibrated clustering algorithm, which accounts for both similarity and dissimilarity in macroeconomic fundamentals across countries. Furthermore, by imposing a desired community structure, we let the data decide which combination of indicators would have most accurately foreseen the exogenously defined network topology. We quantitatively evaluate the degree of matching between the data-generated clustering and the desired community structure.

    Weighted Mahalanobis Distance for Hyper-Ellipsoidal Clustering

    Cluster analysis is widely used in many applications, ranging from image and speech coding to pattern recognition. This thesis presents a new method that uses the weighted Mahalanobis distance (WMD), computed via the covariance matrix of the individual clusters, as the basis for grouping. In this algorithm, the Mahalanobis distance is used as a measure of similarity between the samples in each cluster. The thesis discusses some difficulties associated with using the Mahalanobis distance in clustering, and the proposed method provides solutions to these problems. The new algorithm is an approximation to the well-known expectation maximization (EM) procedure used to find the maximum likelihood estimates in a Gaussian mixture model. Unlike the EM procedure, WMD does not require initial parameters such as the cluster means and variances, because it starts from the raw data set. Properties of the new clustering method are presented by examining the clustering quality for codebooks designed with the proposed method and competing methods on a variety of data sets. The competing methods are the Linde-Buzo-Gray (LBG) algorithm and the Fuzzy c-means (FCM) algorithm, both of which use the Euclidean distance. The neural network for hyperellipsoidal clustering (HEC), which uses the Mahalanobis distance, is also studied and compared with the WMD method and the other techniques. The new method provides better results than the competing methods and is therefore another useful tool for clustering.

    Supervised learning using a symmetric bilinear form for record linkage

    Record linkage is used to link records in two different files that correspond to the same individuals. These algorithms are used for database integration. In data privacy, they are used to evaluate the disclosure risk of a protected data set by linking records that belong to the same individual. The degree of success when linking the original (unprotected) data with the protected data gives an estimate of the disclosure risk. In this paper we propose a new parameterized aggregation operator and a supervised learning method for disclosure risk assessment. The parameterized operator is a symmetric bilinear form, and the supervised learning method is formalized as an optimization problem. The goal of the optimization problem is to find the values of the aggregation parameters that maximize the number of re-identifications (correct links). We evaluate and compare our proposal with non-parameterized variations of record linkage, such as those using the Mahalanobis distance and the Euclidean distance (one of the most widely used approaches for this purpose). We also compare it with previously presented parameterized aggregation operators for record linkage, such as the weighted mean and the Choquet integral. These comparisons show that the proposed aggregation operator outperforms, or at least achieves results similar to, the other parameterized operators. We also study which conditions the optimization problem must satisfy for the described aggregation functions to be metric functions.
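
    The idea of measuring record distance with a symmetric bilinear form can be sketched as follows. A symmetric matrix M defines the squared distance (a - b)^T M (a - b); the identity matrix gives the squared Euclidean distance, and an inverse covariance matrix gives the squared Mahalanobis distance. The code below is only an illustration: the function names, the toy records, and the hand-picked matrix are assumptions, and the paper's supervised optimization of the parameters is not implemented.

```python
def bilinear_sq_dist(a, b, M):
    """Squared distance (a - b)^T M (a - b) for a symmetric matrix M."""
    diff = [x - y for x, y in zip(a, b)]
    n = len(diff)
    return sum(diff[i] * M[i][j] * diff[j] for i in range(n) for j in range(n))

def link_records(originals, protected, M):
    """For each protected record, return the index of the nearest original record."""
    return [
        min(range(len(originals)), key=lambda i: bilinear_sq_dist(p, originals[i], M))
        for p in protected
    ]

def count_correct_links(links):
    """Assumption: protected record i was derived from original record i."""
    return sum(1 for i, linked in enumerate(links) if linked == i)

# Toy data (hypothetical): the second attribute was perturbed far more than the first.
originals = [(1.0, 10.0), (2.0, 20.0), (3.0, 10.0)]
protected = [(1.1, 14.0), (2.1, 13.0), (2.9, 16.0)]

identity = ((1.0, 0.0), (0.0, 1.0))   # reduces to squared Euclidean distance
weighted = ((100.0, 0.0), (0.0, 0.01))  # hand-picked; a learned M would replace this

euclid_hits = count_correct_links(link_records(originals, protected, identity))
weighted_hits = count_correct_links(link_records(originals, protected, weighted))
print(euclid_hits, weighted_hits)
```

    On this toy data, down-weighting the heavily perturbed attribute re-identifies more records than the unweighted Euclidean distance, which is the kind of gain a learned parameterization aims for.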

    Fuzzy Side Information Clustering-Based Framework for Effective Recommendations

    Collaborative filtering (CF) is the most successful and widely implemented algorithm in the area of recommender systems (RSs). It generates recommendations from a set of user-product ratings by matching the similarity between the profiles of different users. Computing similarity among user profiles efficiently in the case of sparse data is the most crucial component of the CF technique. Data sparsity and accuracy are the two major issues associated with the classical CF approach. In this paper, we try to solve these issues using a novel approach based on side information (user-product background content) and the Mahalanobis distance measure. Side information has been incorporated into RSs to further improve their performance, especially in the case of data sparsity. However, incorporating side information into traditional two-dimensional recommender systems increases the dimensionality and complexity of the system. Therefore, to alleviate the dimensionality problem, we cluster users based on their side information using the k-means clustering algorithm, and each user's similarity is computed using the Mahalanobis distance. Additionally, we use fuzzy sets to represent the side information more efficiently. Experimental results on two benchmark datasets show that our framework improves the recommendation quality and predictive accuracy of both traditional and clustering-based collaborative recommendations.

    Transformation-Based Fuzzy Rule Interpolation With Mahalanobis Distance Measures Supported by Choquet Integral

    Fuzzy rule interpolation (FRI) strongly supports approximate inference when a new observation matches no rules, by selecting and subsequently interpolating appropriate rules close to the observation from the given (sparse) rule base. Traditional ways of implementing the critical rule selection process are typically based on Euclidean distances between the observation and the rules. This is conceptually straightforward to implement, but applying this distance metric may systematically lead to inferior results because it fails to reflect the variations in relevance or significance among different domain features. To address this important issue, a novel transformation-based FRI approach is presented, based on the Mahalanobis distance metric. The new FRI method works by transforming a given sparse rule base into a coordinate system where instances of the same category become closer together while instances of different categories become further apart. As a result, when an observation matches no rules, the most relevant neighbouring rules for the required interpolation are more likely to be selected. Following this, the scale and move factors within the classical transformation-based FRI procedure are also modified by the Choquet integral. Systematic experimental investigation over a range of classification problems demonstrates that the proposed approach remarkably outperforms existing state-of-the-art FRI methods in both accuracy and efficiency.