
    XML documents clustering using a tensor space model

    The traditional Vector Space Model (VSM) cannot represent both the structure and the content of XML documents. This paper introduces a novel method of representing XML documents in a Tensor Space Model (TSM) and uses this representation for clustering. Empirical analysis shows that the proposed method scales to large datasets, and that the factorized matrices it produces improve cluster quality through an enriched document representation that captures both structure and content information.
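The abstract does not spell out how the tensor is built, so the following is only a minimal sketch of one plausible construction, assuming a third-order tensor indexed by (document, root-to-element path, term). Pairing each term with the path of the element that contains it is an assumption made here for illustration, as are the helper names `path_term_counts`, `build_tensor`, and `unfold_mode1`. The factorization step the paper uses is not shown.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def path_term_counts(xml_text):
    """Count (root-to-element path, term) pairs in one XML document.

    Assumption: only each element's leading text is used; tail text is ignored.
    """
    counts = Counter()

    def walk(elem, prefix):
        path = f"{prefix}/{elem.tag}"
        for term in (elem.text or "").lower().split():
            counts[(path, term)] += 1
        for child in elem:
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    return counts


def build_tensor(docs):
    """Return (paths, terms, tensor) where tensor[d][p][t] is a raw count."""
    per_doc = [path_term_counts(d) for d in docs]
    paths = sorted({p for c in per_doc for p, _ in c})
    terms = sorted({t for c in per_doc for _, t in c})
    p_idx = {p: i for i, p in enumerate(paths)}
    t_idx = {t: i for i, t in enumerate(terms)}
    tensor = [[[0] * len(terms) for _ in paths] for _ in docs]
    for d, c in enumerate(per_doc):
        for (p, t), n in c.items():
            tensor[d][p_idx[p]][t_idx[t]] = n
    return paths, terms, tensor


def unfold_mode1(tensor):
    """Mode-1 unfolding: one row per document, |paths| * |terms| columns."""
    return [[x for row in doc for x in row] for doc in tensor]
```

For example, `build_tensor(["<a><title>xml mining</title></a>", "<a><body>xml</body></a>"])` yields paths `["/a/body", "/a/title"]` and terms `["mining", "xml"]`. A plain VSM would merge the two occurrences of "xml". The tensor keeps them apart because one sits under `title` and the other under `body`.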

    3D medical volume segmentation using hybrid multiresolution statistical approaches

    This article is available through the Brunel Open Access Publishing Fund. Copyright © 2010 S AlZu’bi and A Amira. 3D volume segmentation is the process of partitioning voxels into 3D regions (subvolumes) that represent meaningful physical entities, which are easier to analyze and to use in later applications. Multiresolution Analysis (MRA) allows an image to be preserved at different levels of resolution or blurring. Because of their multiresolution properties, wavelets have been deployed in image compression, denoising, and classification. This paper focuses on the implementation of efficient medical volume segmentation techniques. Multiresolution analysis, including 3D wavelets and ridgelets, is used for feature extraction, and the extracted features are modeled with Hidden Markov Models (HMMs) to segment the volume slices. A comparison study of 2D and 3D techniques reveals that the 3D methodologies can accurately detect the Region Of Interest (ROI). Automatic segmentation is achieved using HMMs, which detect the ROI accurately but at the cost of a long computation time.
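The multiresolution idea behind 3D wavelet feature extraction can be sketched with the simplest 3D wavelet, the Haar wavelet. The helpers below are hypothetical and compute only the low-pass (approximation) subband: an unnormalized average over each 2×2×2 voxel block, applied repeatedly to build a resolution pyramid. The seven detail subbands, the ridgelet transform, and the HMM segmentation step from the paper are not shown, and this is not claimed to be the authors' implementation.

```python
def haar3d_approx(vol):
    """One level of an unnormalised 3D Haar low-pass: mean of each 2x2x2 block.

    vol is a nested list indexed as vol[z][y][x]; every dimension must be even.
    """
    nz, ny, nx = len(vol), len(vol[0]), len(vol[0][0])
    if nz % 2 or ny % 2 or nx % 2:
        raise ValueError("each dimension must be even")
    out = []
    for z in range(0, nz, 2):
        plane = []
        for y in range(0, ny, 2):
            row = []
            for x in range(0, nx, 2):
                s = sum(vol[z + dz][y + dy][x + dx]
                        for dz in (0, 1) for dy in (0, 1) for dx in (0, 1))
                row.append(s / 8.0)
            plane.append(row)
        out.append(plane)
    return out


def pyramid(vol, levels):
    """Return [vol, level 1, level 2, ...], each half the size of the previous."""
    result = [vol]
    for _ in range(levels):
        result.append(haar3d_approx(result[-1]))
    return result
```

For a 4×4×4 volume whose lower half is 1.0 and upper half is 3.0, level 1 is a 2×2×2 volume that keeps the two halves separate, and level 2 collapses everything to a single voxel of 2.0. Coarser levels blur boundaries, which is the resolution trade-off the abstract refers to.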

    Fast Robust PCA on Graphs

    Mining useful clusters from high-dimensional data has received significant attention from the computer vision and pattern recognition community in recent years. Linear and non-linear dimensionality reduction has played an important role in overcoming the curse of dimensionality. However, such methods often suffer from three problems: high computational complexity (usually associated with nuclear norm minimization), non-convexity (for matrix factorization methods), and susceptibility to gross corruptions in the data. In this paper we propose a principal component analysis (PCA) based solution that overcomes these three issues and approximates a low-rank recovery method for high-dimensional datasets. We target low-rank recovery by enforcing two types of graph smoothness assumptions, one on the data samples and the other on the features, through a convex optimization problem. The resulting algorithm is fast, efficient, and scalable to huge datasets, with O(n log n) computational complexity in the number of data samples. It is also robust to gross corruptions in the dataset as well as to the choice of model parameters. Clustering experiments on 7 benchmark datasets with different types of corruptions and background separation experiments on 3 video datasets show that our proposed model outperforms 10 state-of-the-art dimensionality reduction models. Our theoretical analysis proves that the proposed model is able to recover approximate low-rank representations with a bounded error for clusterable data.
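A graph smoothness assumption is commonly measured with the Laplacian quadratic form tr(UᵀLU) = ½ Σᵢⱼ wᵢⱼ ‖uᵢ − uⱼ‖², which is small when nodes joined by heavy edges carry similar values. The sketch below assumes a symmetric k-nearest-neighbour graph with Gaussian weights. Both the graph construction and the helper names are illustrative assumptions, and the paper's full objective and fast solver are not reproduced. Applying `dirichlet_energy` to the rows of a matrix (with a sample graph) and to its columns (with a feature graph) gives the two kinds of smoothness the abstract mentions.

```python
import math


def knn_graph(points, k, sigma=1.0):
    """Symmetric k-nearest-neighbour adjacency with Gaussian edge weights."""
    n = len(points)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        dists = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )
        for d, j in dists[:k]:
            w = math.exp(-d * d / (sigma * sigma))
            W[i][j] = W[j][i] = w  # symmetrise the kNN relation
    return W


def laplacian(W):
    """Combinatorial graph Laplacian L = D - W (D = diagonal degree matrix)."""
    n = len(W)
    return [[(sum(W[i]) if i == j else 0.0) - W[i][j] for j in range(n)]
            for i in range(n)]


def dirichlet_energy(U, L):
    """tr(U^T L U), with U given as one row vector per graph node."""
    n, dim = len(U), len(U[0])
    return sum(U[i][c] * L[i][j] * U[j][c]
               for c in range(dim) for i in range(n) for j in range(n))
```

With points 0, 1, 5, 6 on a line and `k=1`, the graph links {0, 1} and {5, 6}. A signal that is constant on each pair has zero energy. A signal that alternates within each pair pays 2·e⁻¹, one e⁻¹ per edge.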

    Clustering and Latent Semantic Indexing Aspects of the Nonnegative Matrix Factorization

    This paper provides theoretical support for the clustering aspect of nonnegative matrix factorization (NMF). Using the Karush-Kuhn-Tucker optimality conditions, we show that the NMF objective is equivalent to a graph clustering objective, so the clustering aspect of NMF has a solid justification. Unlike previous approaches, which usually discard the nonnegativity constraints, our approach guarantees that the stationary point used in deriving the equivalence lies in the feasible region, the nonnegative orthant. Additionally, since the clustering capability of a matrix decomposition technique can sometimes imply a latent semantic indexing (LSI) aspect, we also evaluate the LSI aspect of NMF by showing its capability to solve the synonymy and polysemy problems on synthetic datasets. A more extensive evaluation compares the LSI performance of NMF and of the singular value decomposition (SVD), the standard LSI method, on several standard datasets. Comment: 28 pages, 5 figures
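To make the clustering use of NMF concrete, here is a minimal sketch of the widely used Lee-Seung multiplicative updates for min ‖X − WH‖²_F with W, H ≥ 0. It assumes a term-by-document matrix X and assigns each document to the factor with the largest weight in its column of H. This is a standard NMF algorithm chosen for illustration, not the KKT-based analysis of the paper. The argmax labelling rule and all helper names are assumptions.

```python
import random


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(r) for r in zip(*A)]


def frob_err(X, W, H):
    """Squared Frobenius reconstruction error ||X - WH||_F^2."""
    WH = matmul(W, H)
    return sum((x - y) ** 2 for rx, ry in zip(X, WH) for x, y in zip(rx, ry))


def nmf(X, k, iters=200, seed=0, eps=1e-9):
    """Lee-Seung multiplicative updates; eps guards against division by zero."""
    rng = random.Random(seed)
    m, n = len(X), len(X[0])
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(m)]
    H = [[rng.random() + 0.1 for _ in range(n)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T X) / (W^T W H), element-wise
        WtX = matmul(transpose(W), X)
        WtWH = matmul(matmul(transpose(W), W), H)
        H = [[H[i][j] * WtX[i][j] / (WtWH[i][j] + eps) for j in range(n)]
             for i in range(k)]
        # W <- W * (X H^T) / (W H H^T), element-wise
        XHt = matmul(X, transpose(H))
        WHHt = matmul(W, matmul(H, transpose(H)))
        W = [[W[i][j] * XHt[i][j] / (WHHt[i][j] + eps) for j in range(k)]
             for i in range(m)]
    return W, H


def cluster_labels(H):
    """Assign each column (document) to the factor with the largest weight."""
    return [max(range(len(H)), key=lambda r: H[r][j]) for j in range(len(H[0]))]
```

The updates only multiply by nonnegative ratios, so W and H stay in the nonnegative orthant without any explicit projection. That is the same feasible region the abstract emphasizes.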

    Mining and Analyzing the Italian Parliament: Party Structure and Evolution

    The roll calls of the Italian Parliament in the XVI legislature are studied using multidimensional scaling, hierarchical clustering, and network analysis. To detect changes in voting behavior, the roll calls have been divided into seven periods of six months each. All the methods employed revealed an increasing fragmentation of the political parties supporting the previous government, which culminated in its downfall. Using the concept of modularity at different resolution levels, we identify the community structure of Parliament and its evolution in each of the considered time periods. The analysis proved to be a valuable tool for detecting trends and drifts among Parliamentarians. It was effective at identifying political parties and at providing insights into the temporal evolution of groups and their cohesiveness, without any prior knowledge of the Representatives' political membership. Comment: 27 pages, 14 figures
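Modularity at different resolution levels is often computed with a resolution parameter γ: Q = Σ_c [L_c / m − γ (d_c / 2m)²], where m is the total edge weight, L_c is the weight of edges inside community c, and d_c is the total degree of c. The abstract does not say how the voting network is built, so the sketch below simply assumes a weighted undirected edge list. For example, weights might hypothetically count how often two representatives voted alike. It only scores a given partition and does not search for one.

```python
def modularity(edges, communities, gamma=1.0):
    """Newman-Girvan modularity with resolution gamma for an undirected graph.

    edges: iterable of (u, v, weight); communities: dict mapping node -> label.
    Larger gamma favours smaller communities.
    """
    degree, internal, m = {}, {}, 0.0
    for u, v, w in edges:
        m += w
        degree[u] = degree.get(u, 0.0) + w
        degree[v] = degree.get(v, 0.0) + w
        if communities[u] == communities[v]:
            c = communities[u]
            internal[c] = internal.get(c, 0.0) + w
    total = {}
    for node, k in degree.items():
        c = communities[node]
        total[c] = total.get(c, 0.0) + k
    return sum(internal.get(c, 0.0) / m - gamma * (total[c] / (2 * m)) ** 2
               for c in total)
```

For two disjoint triangles split into their natural communities, Q = 1 − γ/2. That gives 0.5 at γ = 1 and 0 at γ = 2. Sweeping γ in this way is how one can probe community structure at coarser or finer scales.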