344 research outputs found

    Taming Wild High Dimensional Text Data with a Fuzzy Lash

    The bag of words (BOW) model represents a corpus as a matrix whose elements are word frequencies. However, each row of the matrix is a very high-dimensional, sparse vector. Dimension reduction (DR) is a popular way to address sparsity and high dimensionality. Among the strategies for developing DR methods, Unsupervised Feature Transformation (UFT) is a popular one that maps all words onto a new basis to represent the BOW. The recent growth of text data and its associated challenges imply that the DR area still needs new perspectives. Although a wide range of methods based on the UFT strategy has been developed, the fuzzy approach has not been considered for DR under this strategy. This research investigates fuzzy clustering as a UFT-based DR method that collapses the BOW matrix into a lower-dimensional representation of the documents, rather than the words, in a corpus. The quantitative evaluation shows that fuzzy clustering yields better performance and features than Principal Components Analysis (PCA) and Singular Value Decomposition (SVD), two popular DR methods based on the UFT strategy.
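
    The idea in the abstract above, using fuzzy cluster memberships as a low-dimensional document representation, can be sketched as follows. This is a minimal illustration with plain fuzzy c-means and is not the authors' implementation: the function name `fuzzy_c_means`, the toy `bow` matrix, the fuzzifier `m = 2`, and the iteration count are all assumptions made for the example.

```python
import random


def fuzzy_c_means(rows, n_clusters, m=2.0, n_iter=100, seed=0):
    """Plain fuzzy c-means on a list of equal-length numeric rows.

    Returns (memberships, centers), where memberships[k][i] is the degree
    to which row k belongs to cluster i (each row of memberships sums to 1).
    """
    rng = random.Random(seed)
    n_docs, n_terms = len(rows), len(rows[0])
    power = 2.0 / (m - 1.0)

    # Random initial memberships, normalized so each document's row sums to 1.
    u = []
    for _ in range(n_docs):
        w = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(w)
        u.append([x / total for x in w])

    centers = []
    for _ in range(n_iter):
        # Center update: mean of the rows, weighted by membership ** m.
        centers = []
        for i in range(n_clusters):
            weights = [u[k][i] ** m for k in range(n_docs)]
            total = sum(weights)
            centers.append([
                sum(weights[k] * rows[k][j] for k in range(n_docs)) / total
                for j in range(n_terms)
            ])

        # Membership update from relative Euclidean distances to the centers.
        for k in range(n_docs):
            d = [
                max(sum((a - b) ** 2 for a, b in zip(rows[k], c)) ** 0.5, 1e-12)
                for c in centers
            ]
            u[k] = [
                1.0 / sum((d[i] / d[j]) ** power for j in range(n_clusters))
                for i in range(n_clusters)
            ]
    return u, centers


# Toy BOW matrix (hypothetical): 4 documents x 5 terms.
bow = [
    [3, 2, 0, 0, 0],
    [2, 3, 1, 0, 0],
    [0, 0, 0, 3, 2],
    [0, 0, 1, 2, 3],
]

# Each document is now described by 2 membership degrees instead of 5 counts.
reduced, centers = fuzzy_c_means(bow, n_clusters=2)
for doc_id, row in enumerate(reduced):
    print(doc_id, [round(x, 3) for x in row])
```

    In this sketch the number of clusters plays the role of the target dimension, so choosing it is the analogue of choosing the number of components in PCA or SVD.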

    Fuzzy clustering with volume prototypes and adaptive cluster merging

    Two extensions to objective function-based fuzzy clustering are proposed. First, the (point) prototypes are extended to hypervolumes, whose size can be fixed or determined automatically from the data being clustered. It is shown that clustering with hypervolume prototypes can be formulated as the minimization of an objective function. Second, a heuristic cluster merging step is introduced in which the similarity among the clusters is assessed during optimization. Starting from an overestimate of the number of clusters in the data, similar clusters are merged to obtain a suitable partitioning. An adaptive threshold for merging is proposed. The proposed extensions are applied to the Gustafson–Kessel and fuzzy c-means algorithms, and the resulting extended algorithm is given. The properties of the new algorithm are illustrated by various examples.

    Partitioning Relational Matrices of Similarities or Dissimilarities using the Value of Information

    In this paper, we provide an approach to clustering relational matrices whose entries correspond to either similarities or dissimilarities between objects. Our approach is based on the value of information, a parameterized, information-theoretic criterion that measures the change in costs associated with changes in information. Optimizing the value of information yields a deterministic annealing style of clustering with many benefits. For instance, investigators need not specify the number of clusters a priori, as the partitions naturally undergo phase changes during the annealing process, whereby the number of clusters changes in a data-driven fashion. The global-best partition can also often be identified. Comment: Submitted to the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

    Forecasting bus passenger flows by using a clustering-based support vector regression approach

    As a significant component of the intelligent transportation system, forecasting bus passenger flows plays a key role in resource allocation, network planning, and frequency setting. However, it remains challenging to capture the high fluctuations, nonlinearity, and periodicity of bus passenger flows, which arise from varied destinations and departure times. For this reason, a novel forecasting model named affinity propagation-based support vector regression (AP-SVR) is proposed, based on clustering and nonlinear simulation. In this approach, a clustering algorithm is first used to generate clustering-based intervals. A support vector regression (SVR) is then used to forecast the passenger flow for each cluster, with particle swarm optimization (PSO) used to obtain the optimized parameters. Finally, the SVR predictions are rearranged into chronological order. The proposed model is tested on real passenger data from one bus line over four months. Experimental results demonstrate that the proposed model performs better than peer models in terms of absolute percentage error and mean absolute percentage error. The results suggest that a deterministic clustering technique with stable cluster results (AP) can improve forecasting performance significantly.

    Adaptive fuzzy system for 3-D vision

    An adaptive fuzzy system was developed that uses the concept of the Adaptive Resonance Theory (ART) type neural network architecture and incorporates fuzzy c-means (FCM) system equations for reclassification of cluster centers. The Adaptive Fuzzy Leader Clustering (AFLC) architecture is a hybrid neural-fuzzy system which learns on-line in a stable and efficient manner. The system uses a control structure similar to that found in the ART-1 network to identify the cluster centers initially. The initial classification of an input takes place in a two-stage process: a simple competitive stage and a distance metric comparison stage. The cluster prototypes are then incrementally updated by relocating the centroid positions using the FCM system equations for the centroids and the membership values. The operational characteristics of AFLC and the critical parameters involved in its operation are discussed. The performance of the AFLC algorithm is presented through its application to the Anderson Iris data and to laser-luminescent fingerprint image data. The AFLC algorithm successfully classifies features extracted from real data, discrete or continuous, indicating the potential strength of this new clustering algorithm in analyzing complex data sets. The hybrid neuro-fuzzy AFLC algorithm will enhance analysis of a number of difficult recognition and control problems involved with Tethered Satellite Systems and on-orbit space shuttle attitude controller.
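
    For reference, the FCM system equations for the membership values and the centroids mentioned above are usually written in the standard form below. The notation is an assumption made here, not taken from the abstract: $x_k$ is the $k$-th of $n$ samples, $v_i$ the $i$-th of $c$ centroids, $u_{ik}$ the membership of sample $k$ in cluster $i$, and $m > 1$ the fuzzifier. The exact variant used inside AFLC may differ.

```latex
u_{ik} = \left[ \sum_{j=1}^{c}
  \left( \frac{\lVert x_k - v_i \rVert}{\lVert x_k - v_j \rVert} \right)^{\frac{2}{m-1}}
\right]^{-1},
\qquad
v_i = \frac{\sum_{k=1}^{n} u_{ik}^{\,m}\, x_k}{\sum_{k=1}^{n} u_{ik}^{\,m}}
```

    The memberships of each sample sum to one over the $c$ clusters, and each centroid is the mean of the samples weighted by $u_{ik}^{m}$.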

    Information technology for increasing the response speed of automatic compressor surge protection based on data fusion

    The paper substantiates the use of recent data fusion methods for the scientific and practical problem of automatic compressor surge protection, surge being a phenomenon that leads to loss of operability of a gas pumping unit and has a complex nonlinear character. It is shown that applying the information technology of multi-parameter data fusion improves the performance of the system for estimating current values in several respects, such as response speed, accuracy, reliability, and robustness.

    East Asian Currency Area: A Fuzzy Clustering Analysis of Homogeneity

    This paper examines the degree of homogeneity and the plausibility of a currency union in East Asia from the perspective of multiple optimum currency area (OCA) criteria, using the technique of fuzzy clustering analysis. The question of homogeneity is important to the smooth formation and operation of the prospective currency union. We find that East Asia has not been sufficiently homogeneous and can be divided into about four groups with a significant degree of fuzziness. We find no notable trend of convergence in the data. In fact, East Asia appears to have diverged further since the onset of the regional financial crisis. Thus, we doubt that a currency union can be formed in East Asia in the near future. Keywords: Fuzzy Clustering Analysis; East Asia; Currency Area