Semantic distillation: a method for clustering objects by their contextual specificity
Techniques for data mining, latent semantic analysis, contextual search of
databases, etc. were developed long ago by computer scientists working on
information retrieval (IR). Experimental scientists, from all disciplines,
having to analyse large collections of raw experimental data (astronomical,
physical, biological, etc.) have developed powerful methods for their
statistical analysis and for clustering, categorising, and classifying objects.
Finally, physicists have developed a theory of quantum measurement, unifying
the logical, algebraic, and probabilistic aspects of queries into a single
formalism. The purpose of this paper is twofold: first to show that when
formulated at an abstract level, problems from IR, from statistical data
analysis, and from physical measurement theories are very similar and hence can
profitably be cross-fertilised, and, secondly, to propose a novel method of
fuzzy hierarchical clustering, termed \textit{semantic distillation} and
strongly inspired by the theory of quantum measurement, which we developed to
analyse raw data coming from various types of experiments on DNA arrays. We
illustrate the method by analysing DNA array experiments and clustering the
genes of the array according to their specificity. Comment: Accepted for publication in Studies in Computational Intelligence,
Springer-Verla
TopSig: Topology Preserving Document Signatures
Performance comparisons between File Signatures and Inverted Files for text
retrieval have previously shown several significant shortcomings of file
signatures relative to inverted files. The inverted file approach underpins
most state-of-the-art search engine algorithms, such as Language and
Probabilistic models. It has been widely accepted that traditional file
signatures are inferior alternatives to inverted files. This paper describes
TopSig, a new approach to the construction of file signatures. Many advances in
semantic hashing and dimensionality reduction have been made in recent times,
but these have not so far been linked to general-purpose, signature-file-based
search engines. This paper introduces a different signature file approach that
builds upon and extends these recent advances. We are able to demonstrate
significant improvements in the performance of signature file based indexing
and retrieval, performance that is comparable to that of state of the art
inverted file based systems, including Language models and BM25. These findings
suggest that file signatures offer a viable alternative to inverted files in
suitable settings and, from a theoretical perspective, position the file
signature model in the class of Vector Space retrieval models. Comment: 12 pages, 8 figures, CIKM 201
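The abstract does not spell out how TopSig signatures are built, so the following is only a minimal sketch of the general idea it alludes to: a bit signature derived from a dimensionality-reducing random projection of term vectors, searched by Hamming distance. The 64-bit width, the SHA-256 seeding, the raw term-frequency weighting, and all function names are illustrative assumptions, not details taken from the paper.

```python
import hashlib
import random
from collections import Counter

SIGNATURE_BITS = 64  # illustrative width (an assumption, not a value from the paper)


def term_vector(term, bits=SIGNATURE_BITS):
    """Return a deterministic pseudo-random +1/-1 vector for a term."""
    digest = hashlib.sha256(term.encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    return [1 if rng.random() < 0.5 else -1 for _ in range(bits)]


def signature(text, bits=SIGNATURE_BITS):
    """Sum term vectors weighted by term frequency, then keep only the signs."""
    totals = [0] * bits
    for term, count in Counter(text.lower().split()).items():
        for i, value in enumerate(term_vector(term, bits)):
            totals[i] += count * value
    sig = 0
    for i, total in enumerate(totals):
        if total > 0:  # ties at zero are mapped to a 0 bit
            sig |= 1 << i
    return sig


def hamming(a, b):
    """Number of bit positions in which two signatures differ."""
    return bin(a ^ b).count("1")


def rank(query, documents):
    """Order documents by Hamming distance between their signature and the query's."""
    q = signature(query)
    return sorted(documents, key=lambda doc: hamming(q, signature(doc)))
```

In a real index the document signatures would be computed once and stored; `rank` recomputes them here only to keep the sketch short. Because each bit is the sign of a linear projection, documents that share many terms tend to agree on more bits, which is what ties this kind of signature to vector space retrieval.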
View subspaces for indexing and retrieval of 3D models
View-based indexing schemes for 3D object retrieval are gaining popularity
since they provide good retrieval results. These schemes are coherent with the
theory that humans recognize objects based on their 2D appearances. The
view-based techniques also allow users to search with various queries such as
binary images, range images and even 2D sketches. The previous view-based
techniques use classical 2D shape descriptors such as Fourier invariants,
Zernike moments, Scale Invariant Feature Transform-based local features and 2D
Digital Fourier Transform coefficients. These methods describe each object
independently of the others. In this work, we explore data-driven subspace models,
such as Principal Component Analysis, Independent Component Analysis and
Nonnegative Matrix Factorization to describe the shape information of the
views. We treat the depth images obtained from various points of the view
sphere as 2D intensity images and train a subspace to extract the inherent
structure of the views within a database. We also show the benefit of
categorizing shapes according to their eigenvalue spread. Both the shape
categorization and data-driven feature set conjectures are tested on the PSB
database and compared with competing view-based 3D shape retrieval
algorithms. Comment: Three-Dimensional Image Processing (3DIP) and Applications
(Proceedings Volume) Proceedings of SPIE Volume: 7526 Editor(s): Atilla M.
Baskurt ISBN: 9780819479198 Date: 2 February 201
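To make the subspace idea concrete, here is a minimal sketch of the PCA case only: each depth view is flattened into a vector, the views are mean-centered, and the leading principal axis is found by power iteration. The toy four-pixel "views", the function names, and the use of a single component are assumptions for illustration; the abstract does not describe the authors' actual pipeline, and ICA and NMF are not shown.

```python
import math
import random


def mean_center(views):
    """Subtract the mean view from every flattened view."""
    n, dim = len(views), len(views[0])
    mean = [sum(v[j] for v in views) / n for j in range(dim)]
    return mean, [[v[j] - mean[j] for j in range(dim)] for v in views]


def leading_component(centered, iterations=100, seed=0):
    """Power iteration on X^T X, without forming the covariance matrix.

    If the data have no variance, the unnormalized random start is returned.
    """
    dim = len(centered[0])
    rng = random.Random(seed)
    axis = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
    for _ in range(iterations):
        # scores = X @ axis, then update = X^T @ scores
        scores = [sum(x * a for x, a in zip(row, axis)) for row in centered]
        update = [sum(s * row[j] for s, row in zip(scores, centered))
                  for j in range(dim)]
        norm = math.sqrt(sum(u * u for u in update))
        if norm == 0.0:
            break
        axis = [u / norm for u in update]
    return axis


def project(view, mean, axis):
    """Coordinate of one view along the learned axis (a 1-D descriptor)."""
    return sum((x - m) * a for x, m, a in zip(view, mean, axis))


# Toy data: four "depth views" of four pixels each (hypothetical values).
views = [[0, 0, 0, 0], [1, 1, 0, 0], [2, 2, 0, 0], [3, 3, 0, 0]]
mean, centered = mean_center(views)
axis = leading_component(centered)
descriptors = [project(v, mean, axis) for v in views]
```

A practical system would keep several components and compare views by distance between their descriptor vectors; the sign of the returned axis is arbitrary, so only relative positions along it are meaningful.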
Multi-mode partitioning for text clustering to reduce dimensionality and noises
Co-clustering in text mining has been proposed to partition words and documents simultaneously. Although the
main advantage of this approach is that it may improve the interpretation of clusters in the data, there are still few proposals
for such methods, while one-way partitioning is still widely used for information retrieval. In contrast to
structured information, textual data suffer from high dimensionality and sparse matrices, so texts must be
pre-processed before clustering techniques are applied. In this paper, we propose a new procedure to reduce the high
dimensionality of corpora and to remove noise from the unstructured data. We test two different processes
for treating the data, applying two co-clustering algorithms; based on the results, we present the procedure that provides
the best interpretation of the data.
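The abstract does not say what the proposed procedure consists of. As a point of reference only, the sketch below shows one common, generic way to shrink a term-document matrix before clustering: dropping terms that occur in too few or too many documents. The thresholds `min_df` and `max_df_ratio`, the whitespace tokenizer, and the function names are assumptions and should not be read as the authors' method.

```python
from collections import Counter


def tokenize(text):
    """Lowercase, split on whitespace, and keep purely alphabetic tokens."""
    return [t for t in text.lower().split() if t.isalpha()]


def prune_vocabulary(docs, min_df=2, max_df_ratio=0.9):
    """Keep terms whose document frequency lies between the two thresholds."""
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))  # count each term once per document
    return sorted(t for t, c in df.items()
                  if c >= min_df and c / n_docs <= max_df_ratio)


def term_document_matrix(docs, vocabulary):
    """Rows are documents, columns are the retained terms, cells are counts."""
    index = {t: j for j, t in enumerate(vocabulary)}
    matrix = []
    for doc in docs:
        row = [0] * len(vocabulary)
        for t in tokenize(doc):
            if t in index:
                row[index[t]] += 1
        matrix.append(row)
    return matrix
```

The pruned matrix is what a co-clustering algorithm would then partition by rows (documents) and columns (words) at the same time; very rare terms add sparsity and very common ones add noise, which is the motivation for filtering both ends.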