Search CORE

4 research outputs found

Evaluation of a hierarchical taxonomy preparation method for document classification

Author: Balke Wolf-Tilo
Gracia Hernández Sara
Publication venue: Universidad de Zaragoza
Publication date: 01/01/2013

The aim of the project is the automatic creation of a hierarchical taxonomy for categorizing mathematical texts, using co-clustering based on the "Consistent Bipartite Spectral Graph Copartitioning" algorithm. Once the taxonomy has been created, its performance must be evaluated by analyzing the results on real data from a mathematical digital library

Repositorio Universidad de Zaragoza

Efficient Mining of Heterogeneous Star-Structured Data

Author: Rege Manjeet
Yu Qi
Publication venue: RIT Scholar Works
Publication date: 01/01/2008

Many real-world clustering problems arising in data mining applications are heterogeneous in nature. Heterogeneous co-clustering involves simultaneous clustering of objects of two or more data types. While pairwise co-clustering of two data types has been well studied in the literature, research on high-order heterogeneous co-clustering is still limited. In this paper, we propose a graph theoretical framework for addressing star-structured co-clustering problems in which a central data type is connected to all the other data types. Partitioning this graph leads to co-clustering of all the data types under the constraints of the star-structure. Although the graph partitioning approach has been adopted before to address star-structured heterogeneous complex problems, the main contribution of this work lies in an efficient algorithm that we propose for partitioning the star-structured graph. Computationally, our algorithm is very quick as it requires a simple solution to a sparse system of overdetermined linear equations. Theoretical analysis and extensive experiments performed on toy and real datasets demonstrate the quality, efficiency and stability of the proposed algorithm.

CiteSeerX

RIT Scholar Works

Hierarchical Taxonomy Preparation for Text Categorization Using Consistent Bipartite Spectral Graph Copartitioning

Author: Bin Gao
Guang Feng
Qian-sheng Cheng
Tao Qin
Tie-yan Liu
Wei-ying Ma
Publication date: 01/01/2005

Multiclass classification has been investigated for many years in the literature. Recently, the scales of real-world multiclass classification applications have become larger and larger. For example, there are hundreds of thousands of categories employed in the Open Directory Project (ODP) and the Yahoo! directory. In such cases, the scalability of classification methods turns out to be a major concern. To tackle this problem, hierarchical classification is proposed and widely adopted to get a better trade-off between effectiveness and efficiency. Unfortunately, many data sets are not explicitly organized in hierarchical forms and, therefore, hierarchical classification cannot be used directly. In this paper, we propose a novel algorithm to automatically mine a hierarchical structure from the flat taxonomy of a data corpus as a preparation for the adoption of hierarchical classification. In particular, we first compute matrices to represent the relations among categories, documents, and terms. Then we cocluster the three substances at different scales through consistent bipartite spectral graph copartitioning, which is formulated as a generalized singular value decomposition problem. Finally, a hierarchical taxonomy is constructed from the category clusters. Our experiments showed that the proposed algorithm could discover a very reasonable taxonomy hierarchy and help improve the classification accuracy.

CiteSeerX

Hierarchical taxonomy preparation for text categorization using consistent bipartite spectral graph copartitioning

Author: Bin Gao
Guang Feng
Qian-Sheng Cheng
Tao Qin
Tie-Yan Liu
Wei-Ying Ma
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)

Crossref