
    A Novel Fuzzy c-Means Clustering Algorithm Using Adaptive Norm

    The fuzzy c-means (FCM) clustering algorithm is an unsupervised learning method that has been widely applied to cluster unlabeled data automatically rather than manually, but it is sensitive to noisy observations because of its inappropriate treatment of noise in the data. In this paper, a novel method built on the existing FCM approach that accounts for noise, called adaptive-FCM, is proposed together with an extended version (adaptive-REFCM) that combines it with relative entropy. Adaptive-FCM, relying on the integration of an adaptive norm, benefits from a robust overall structure. Adaptive-REFCM further integrates the properties of relative entropy and normalized distance to preserve the global details of the dataset. Several experiments are carried out, ...
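    As a reference point for the baseline that adaptive-FCM builds on, the sketch below implements standard fuzzy c-means (alternating center and membership updates with fuzzifier m) in plain Python. It is not the adaptive-norm or relative-entropy variant described in the abstract; the function name `fcm`, the random row-normalized initialization, and the defaults (m = 2, tolerance 1e-6) are illustrative assumptions.

```python
import math
import random


def fcm(points, c, m=2.0, max_iter=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means on a list of equal-length numeric tuples.

    Returns (centers, u) where u[k][i] is the membership of point k in cluster i.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, each row normalized to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        total = sum(row)
        u.append([x / total for x in row])
    centers = []
    for _ in range(max_iter):
        # Center update: mean of the points weighted by u[k][i] ** m.
        centers = []
        for i in range(c):
            w = [u[k][i] ** m for k in range(n)]
            sw = sum(w)
            centers.append(tuple(sum(w[k] * points[k][d] for k in range(n)) / sw
                                 for d in range(dim)))
        # Membership update: u[k][i] = 1 / sum_j (d_ki / d_kj) ** (2 / (m - 1)).
        new_u = []
        for k in range(n):
            dist = [math.dist(points[k], ctr) for ctr in centers]
            if min(dist) == 0.0:
                # The point sits exactly on a center: give it crisp membership there.
                hits = [1.0 if d == 0.0 else 0.0 for d in dist]
                new_u.append([h / sum(hits) for h in hits])
                continue
            new_u.append([1.0 / sum((dist[i] / dist[j]) ** (2.0 / (m - 1.0))
                                    for j in range(c)) for i in range(c)])
        shift = max(abs(new_u[k][i] - u[k][i]) for k in range(n) for i in range(c))
        u = new_u
        if shift < tol:
            break
    return centers, u


pts = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
centers, u = fcm(pts, c=2)
for p, row in zip(pts, u):
    print(p, [round(x, 3) for x in row])
```

    Every row of `u` sums to 1, so each point, including a noisy one far from all clusters, spreads a full unit of membership across the clusters; this constraint is one common explanation for the noise sensitivity of plain FCM.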

    Image Segmentation using Rough Set based Fuzzy K-Means Algorithm

    Image segmentation is critical for many computer vision and information retrieval systems and has received significant attention from industry and academia over the last three decades. Despite notable advances in the area, there is no standard technique for selecting a segmentation algorithm for a particular application, nor is there even an agreed-upon means of comparing the performance of one method with another. This paper explores the Rough-Fuzzy K-means (RFKM) algorithm, a new intelligent technique used to discover data dependencies, perform data reduction and approximate set classification, and induce rules from image databases. Rough sets offer an effective approach to managing uncertainty and are also used for image segmentation, feature identification, dimensionality reduction, and pattern classification. The proposed RFKM algorithm for image segmentation is based on a modified K-means clustering using rough set theory (RST) and is divided into two phases: first, the cluster centers are determined; then they are reduced using RST. The K-means clustering algorithm is then applied to the reduced and optimized set of cluster centers to segment the images. Existing clustering algorithms require initialization of cluster centers, whereas the proposed scheme does not require any such prior information to partition the exact regions. Experimental results show that the proposed method performs well and improves the segmentation results in the vague areas of the image.

    Correlation of Climate Variability and Burned Area in Borneo using Clustering Methods

    The island of Borneo has faced seasonal forest fires for decades. The phenomenon worsens during dry seasons, especially when droughts coincide with the El Niño-Southern Oscillation (ENSO). Climate is therefore one of the drivers of these fires. This paper studies the relationship between climate variables, namely temperature, precipitation, relative humidity, and wind speed, and the occurrence of forest fires using two clustering methods: K-means and Fuzzy C-means (FCM). Borneo is clustered into four areas based on burned-area data obtained from the Global Fire Emissions Database (GFED). It is also clustered according to combinations of climate variables. Both methods yield the highest correlation between the climate-variable clusters and the burned-area clusters in September: -0.54 for K-means and -0.55 for FCM. From August to October, relative humidity is the dominant variable correlated with burned area, although adding precipitation or wind speed slightly increases the correlation in the FCM method. In November, temperature is the main contributor to burned area, with a positive correlation of 0.31 in K-means and 0.33 in FCM. The performance of the methods is evaluated with an internal validation measure, the Silhouette index. Both methods have positive index values ranging from 0.39 to 0.69, and the maximum value is influenced by the wind cluster. This indicates that the clustering methods applied in this paper can group one or a combination of climate variables into dense and well-separated clusters.
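    The Silhouette index used above to validate the clusters can be computed directly from its definition: for each point, a is the mean distance to the other members of its own cluster, b is the smallest mean distance to the members of any other cluster, and s = (b - a) / max(a, b). The minimal sketch below assumes Euclidean distance, hard labels (for FCM, the cluster of maximum membership), and at least two clusters; it is not the paper's code.

```python
import math


def silhouette(points, labels):
    """Mean silhouette width of a hard partition (Euclidean distance, >= 2 clusters)."""
    clusters = sorted(set(labels))
    scores = []
    for k, p in enumerate(points):
        own = labels[k]
        # a: mean distance to the other members of the point's own cluster.
        same = [math.dist(p, q) for j, q in enumerate(points) if labels[j] == own and j != k]
        if not same:
            scores.append(0.0)  # singleton cluster: s = 0 by convention
            continue
        a = sum(same) / len(same)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(p, q) for j, q in enumerate(points) if labels[j] == other)
            / labels.count(other)
            for other in clusters if other != own
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


pts = [(0.0,), (1.0,), (10.0,), (11.0,)]
print(silhouette(pts, [0, 0, 1, 1]))  # well separated, close to 1
print(silhouette(pts, [0, 1, 0, 1]))  # poor partition, negative
```

    Values near 1 indicate dense, well-separated clusters and values near 0 indicate overlapping ones, which is the scale on which the 0.39 to 0.69 range reported above sits.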

    Fuzzy clustering with entropy regularization for interval-valued data with an application to scientific journal citations

    In recent years, research on statistical methods for analyzing complex data structures has increased. In particular, much attention has focused on interval-valued data. Within the classical cluster analysis framework, an interesting line of research has addressed the clustering of interval-valued data with fuzzy approaches. Following the fuzzy partitioning-around-medoids line of research, a new fuzzy clustering model for interval-valued data is proposed. In particular, the model uses entropy as a regularization function in the fuzzy clustering criterion. It relies on a robust weighted dissimilarity measure to smooth noisy data and to weight the center and radius components of the interval-valued data separately. To demonstrate the performance of the proposed clustering model, we provide a simulation study and an application to the clustering of scientific journals in research evaluation.

    A survey on feature weighting based K-Means algorithms

    This is a pre-copyedited, author-produced PDF of an article accepted for publication in Journal of Classification [de Amorim, R. C., 'A survey on feature weighting based K-Means algorithms', Journal of Classification, Vol. 33(2): 210-242, August 25, 2016]. Subject to embargo. Embargo end date: 25 August 2017. The final publication is available at Springer via http://dx.doi.org/10.1007/s00357-016-9208-4 © Classification Society of North America 2016. Peer reviewed; final accepted version.

    In a real-world data set there is always the possibility, rather high in our opinion, that different features may have different degrees of relevance. Most machine learning algorithms deal with this fact by either selecting or deselecting features in the data preprocessing phase. However, we maintain that even among relevant features there may be different degrees of relevance, and this should be taken into account during the clustering process. With over 50 years of history, K-Means is arguably the most popular partitional clustering algorithm there is. The first K-Means based clustering algorithm to compute feature weights was designed just over 30 years ago. Various such algorithms have been designed since, but there has not been, to our knowledge, a survey integrating empirical evidence of cluster recovery ability, common flaws, and possible directions for future research. This paper elaborates on the concept of feature weighting and addresses these issues by critically analysing some of the most popular, or innovative, feature weighting mechanisms based on K-Means.
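    To make the idea of feature weighting concrete, the sketch below follows the weight update of Huang et al.'s W-k-means, one of the mechanisms commonly discussed in this literature: each feature's weight is inversely related to its within-cluster dispersion D_v via w_v = 1 / sum_u (D_v / D_u)^(1/(beta - 1)), and distances use w_v^beta. The function name `wkmeans`, beta = 2, and the deterministic farthest-first initialization are assumptions for illustration; the survey compares several such mechanisms rather than prescribing this one.

```python
def wkmeans(points, k, beta=2.0, max_iter=50):
    """Feature-weighted k-means with Huang et al.-style weight updates (sketch)."""
    n, dim = len(points), len(points[0])

    def sqdist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    # Deterministic farthest-first initialization (an assumption, not part of W-k-means).
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(sqdist(p, c) for c in centers))
        centers.append(list(far))
    weights = [1.0 / dim] * dim
    labels = None
    for _ in range(max_iter):
        # 1) Assign each point to the nearest center under sum_v w_v**beta * (x_v - c_v)**2.
        new_labels = [
            min(range(k), key=lambda c: sum(weights[v] ** beta * (p[v] - centers[c][v]) ** 2
                                            for v in range(dim)))
            for p in points
        ]
        # 2) Recompute centers as per-cluster means (an empty cluster keeps its center).
        for c in range(k):
            members = [points[i] for i in range(n) if new_labels[i] == c]
            if members:
                centers[c] = [sum(mb[v] for mb in members) / len(members) for v in range(dim)]
        # 3) Within-cluster dispersion of each feature, then the weight update;
        #    a feature with zero dispersion gets weight 0, as in Huang et al.
        disp = [sum((points[i][v] - centers[new_labels[i]][v]) ** 2 for i in range(n))
                for v in range(dim)]
        weights = [
            0.0 if disp[v] == 0 else
            1.0 / sum((disp[v] / disp[u]) ** (1.0 / (beta - 1.0))
                      for u in range(dim) if disp[u] > 0)
            for v in range(dim)
        ]
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centers, weights


# Feature 0 separates the two groups; feature 1 is within-cluster noise.
pts = [(0.0, 3.0), (0.1, -3.0), (0.2, 1.0), (10.0, -2.0), (10.1, 2.0), (10.2, -1.0)]
labels, centers, weights = wkmeans(pts, k=2)
print(labels, [round(w, 4) for w in weights])  # the noisy feature gets a weight near 0
```

    Because the weights sum to 1 and shrink as dispersion grows, a feature that varies a lot inside clusters contributes little to the distance, which is the "degrees of relevance" idea the survey argues for.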

    Robust DTW-based entropy fuzzy clustering of time series

    Time series are complex data objects whose partitioning into homogeneous groups is still a challenging task, especially in the presence of outliers or noisy data. To address the problem of robustness against outliers in clustering techniques, this paper proposes a robust fuzzy C-medoids method based on entropy regularization. Specifically, we use an appropriate exponential transformation of the dissimilarity based on Dynamic Time Warping (DTW), which can also be computed for time series of different lengths. In addition, the fuzzy framework provides the flexibility needed to cope with the complexity of the feature space: it allows a time series to be assigned to more than one group, accounting for potential switching behaviours. Moreover, the medoid-based approach identifies observed representative objects within the dataset, which enhances interpretability for practical applications. Through an extensive simulation study, we demonstrate the effectiveness of our proposal, comparing it with other methods and emphasizing its strengths. Finally, the proposed methodology is applied to the daily mean concentrations of three air pollutants in the Province of Rome in 2022. This application highlights its potential, namely its capability to identify outliers and switching time series while preserving group structures.
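    The ingredients named in this abstract can be combined in a short sketch: a dynamic-programming DTW distance (which handles series of different lengths), a robust transformation d' = 1 - exp(-gamma * DTW), entropy-regularized memberships u_ik proportional to exp(-d'_ik / p), and a medoid update that picks, for each cluster, the observed series minimizing the membership-weighted dissimilarity. The specific exponential form, the parameters `gamma` and `p`, the function name `entropy_fcmdd`, and the farthest-first initialization are assumptions, not the authors' exact choices.

```python
import math


def dtw(a, b):
    """Classic dynamic-time-warping distance with absolute-difference cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


def entropy_fcmdd(series, c, p=0.5, gamma=0.1, max_iter=50):
    """Entropy-regularized fuzzy c-medoids on a robust DTW dissimilarity (sketch)."""
    n = len(series)
    # Robust dissimilarity in [0, 1): large DTW distances saturate near 1.
    d = [[1.0 - math.exp(-gamma * dtw(series[i], series[j])) for j in range(n)]
         for i in range(n)]
    # Farthest-first initial medoids (illustrative choice).
    medoids = [0]
    while len(medoids) < c:
        medoids.append(max(range(n), key=lambda i: min(d[i][q] for q in medoids)))
    u = []
    for _ in range(max_iter):
        # Entropy-regularized memberships: u_ik proportional to exp(-d_ik / p).
        u = []
        for i in range(n):
            e = [math.exp(-d[i][q] / p) for q in medoids]
            s = sum(e)
            u.append([x / s for x in e])
        # Medoid update: the observed series minimizing sum_i u_ik * d(i, q).
        new = [min(range(n), key=lambda q: sum(u[i][k] * d[i][q] for i in range(n)))
               for k in range(c)]
        if new == medoids:
            break
        medoids = new
    return medoids, u


series = [[0.0, 0.1, 0.0, 0.2], [0.1, 0.0, 0.1], [0.0, 0.2, 0.1, 0.0, 0.1],
          [5.0, 5.1, 5.0, 5.2, 5.0], [5.1, 5.0, 5.1], [5.0, 4.9, 5.1, 5.0]]
medoids, u = entropy_fcmdd(series, c=2)
print(medoids)
for row in u:
    print([round(x, 2) for x in row])
```

    Because d' stays below 1, a single distant outlier cannot dominate the medoid update, and with p = 0.5 the membership ratio between any two clusters is bounded by e^2; a smaller p gives crisper assignments. The sketch does not guard against two clusters selecting the same medoid in degenerate data.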