Taming Wild High Dimensional Text Data with a Fuzzy Lash
The bag of words (BOW) represents a corpus as a matrix whose elements are word
frequencies. However, each row in the matrix is a very high-dimensional
sparse vector. Dimension reduction (DR) is a popular method to address sparsity
and high-dimensionality issues. Among the different strategies for developing DR
methods, Unsupervised Feature Transformation (UFT) is a popular strategy that maps
all words onto a new basis to represent the BOW. The recent increase of text data and
its challenges imply that the DR area still needs new perspectives. Although a wide
range of methods based on the UFT strategy has been developed, the fuzzy
approach has not been considered for DR based on this strategy. This research
investigates the application of fuzzy clustering as a DR method based on the
UFT strategy to collapse the BOW matrix and provide a lower-dimensional
representation of documents instead of the words in a corpus. The quantitative
evaluation shows that fuzzy clustering produces superior performance and
features to Principal Components Analysis (PCA) and Singular Value
Decomposition (SVD), two popular DR methods based on the UFT strategy.
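As a rough illustration of the idea in this abstract, the sketch below runs textbook fuzzy c-means on a tiny BOW matrix and uses the resulting membership matrix as the reduced document representation. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not specify the clustering variant, the fuzzifier, or the distance measure, so the function name `fuzzy_c_means`, the parameters `m`, `n_iter`, and `tol`, and the toy matrix `bow` are all hypothetical choices made here.

```python
import numpy as np


def fuzzy_c_means(X, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Textbook fuzzy c-means (assumed variant, not taken from the paper).

    Returns (U, centers), where U[i, k] is the membership of row i of X
    in cluster k and every row of U sums to 1.
    """
    rng = np.random.default_rng(seed)
    # Random initial memberships, normalised so each row sums to 1.
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # Cluster centers are membership-weighted means of the rows.
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Euclidean distance from every row to every center.
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dist = np.maximum(dist, 1e-12)  # avoid division by zero
        # Standard membership update: inverse distances, renormalised.
        inv = dist ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U, centers


# Toy BOW matrix (made-up counts): 4 documents x 6 vocabulary words.
bow = np.array(
    [
        [2, 1, 0, 0, 0, 0],
        [1, 2, 1, 0, 0, 0],
        [0, 0, 0, 2, 1, 1],
        [0, 0, 0, 1, 2, 1],
    ],
    dtype=float,
)

# Each document is now described by 2 membership degrees instead of 6 counts.
reduced, _ = fuzzy_c_means(bow, n_clusters=2)
print(reduced.shape)
```

Here the number of clusters plays the role of the target dimensionality, in the same way the number of retained components does for PCA or SVD.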
Dealing with uncertain entities in ontology alignment using rough sets
This is the author's accepted manuscript. The final published article is available from the link below. Copyright © 2012 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works.
Ontology alignment facilitates the exchange of knowledge among heterogeneous data sources. Many approaches to ontology alignment use multiple similarity measures to map entities between ontologies. However, a key challenge remains in dealing with uncertain entities, for which the employed similarity measures produce conflicting results on the similarity of the mapped entities. This paper presents OARS, a rough-set based approach to ontology alignment which achieves a high degree of accuracy in situations where uncertainty arises because of the conflicting results generated by different similarity measures. OARS employs a combinational approach and considers both lexical and structural similarity measures. OARS is extensively evaluated with the benchmark ontologies of the Ontology Alignment Evaluation Initiative (OAEI) 2010, and performs best in recall in comparison with a number of alignment systems while achieving comparable precision.
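To make the rough-set idea concrete, the sketch below computes the standard lower and upper approximations of a set of accepted mappings, where candidate mappings are indistinguishable if their similarity measures give the same discretised values. This is generic rough set theory offered as an assumption-laden illustration, not the OARS procedure, which the abstract does not describe. The function `rough_approximations`, the candidate names `m1` to `m5`, the "high"/"low" discretisation, and the `accepted` set are all hypothetical.

```python
from collections import defaultdict


def rough_approximations(objects, target):
    """Return (lower, upper) approximations of `target`.

    objects: dict mapping an object name to a tuple of attribute values.
    target:  set of object names to approximate.
    Objects with identical attribute tuples are indiscernible.
    """
    # Group objects into indiscernibility classes by attribute tuple.
    classes = defaultdict(set)
    for name, attrs in objects.items():
        classes[attrs].add(name)

    lower, upper = set(), set()
    for members in classes.values():
        if members <= target:  # class lies entirely inside the target
            lower |= members
        if members & target:   # class overlaps the target
            upper |= members
    return lower, upper


# Hypothetical candidate mappings: (lexical similarity, structural similarity).
candidates = {
    "m1": ("high", "high"),
    "m2": ("high", "high"),
    "m3": ("high", "low"),
    "m4": ("high", "low"),
    "m5": ("low", "low"),
}
# Hypothetical set of mappings known to be correct.
accepted = {"m1", "m2", "m3"}

lower, upper = rough_approximations(candidates, accepted)
boundary = upper - lower  # mappings the measures cannot settle
print(sorted(lower), sorted(boundary))
```

In this toy setting the boundary region holds exactly the candidates on which the two measures disagree, which is the kind of uncertain entity the abstract refers to.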