5,853 research outputs found
Unsupervised Graph-based Rank Aggregation for Improved Retrieval
This paper presents a robust and comprehensive graph-based rank aggregation
approach, used to combine results of isolated ranker models in retrieval tasks.
The method follows an unsupervised scheme, which is independent of how the
isolated ranks are formulated. Our approach is able to combine arbitrary
models, defined in terms of different ranking criteria, such as those based on
textual, image or hybrid content representations.
We reformulate the ad-hoc retrieval problem as a document retrieval based on
fusion graphs, which we propose as a new unified representation model capable
of merging multiple ranks and expressing inter-relationships of retrieval
results automatically. By doing so, we claim that the retrieval system can
benefit from learning the manifold structure of datasets, thus leading to more
effective results. Another contribution is that our graph-based aggregation
formulation, unlike existing approaches, allows for encapsulating contextual
information encoded from multiple ranks, which can be directly used for
ranking, without further computations and post-processing steps over the
graphs. Based on the graphs, a novel similarity retrieval score is formulated
using an efficient computation of minimum common subgraphs. Finally, another
benefit over existing approaches is the absence of hyperparameters.
A comprehensive experimental evaluation was conducted considering diverse
well-known public datasets, composed of textual, image, and multimodal
documents. Performed experiments demonstrate that our method reaches top
performance, yielding better effectiveness scores than state-of-the-art
baseline methods and promoting large gains over the rankers being fused, thus
demonstrating the successful capability of the proposal in representing queries
based on a unified graph-based model of rank fusions
Identification of functionally related enzymes by learning-to-rank methods
Enzyme sequences and structures are routinely used in the biological sciences
as queries to search for functionally related enzymes in online databases. To
this end, one usually departs from some notion of similarity, comparing two
enzymes by looking for correspondences in their sequences, structures or
surfaces. For a given query, the search operation results in a ranking of the
enzymes in the database, from very similar to dissimilar enzymes, while
information about the biological function of annotated database enzymes is
ignored.
In this work we show that rankings of that kind can be substantially improved
by applying kernel-based learning algorithms. This approach enables the
detection of statistical dependencies between similarities of the active cleft
and the biological function of annotated enzymes. This is in contrast to
search-based approaches, which do not take annotated training data into
account. Similarity measures based on the active cleft are known to outperform
sequence-based or structure-based measures under certain conditions. We
consider the Enzyme Commission (EC) classification hierarchy for obtaining
annotated enzymes during the training phase. The results of a set of sizeable
experiments indicate a consistent and significant improvement for a set of
similarity measures that exploit information about small cavities in the
surface of enzymes
A semi-supervised learning algorithm for relevance feedback and collaborative image retrieval
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)The interaction of users with search services has been recognized as an important mechanism for expressing and handling user information needs. One traditional approach for supporting such interactive search relies on exploiting relevance feedbacks (RF) in the searching process. For large-scale multimedia collections, however, the user efforts required in RF search sessions is considerable. In this paper, we address this issue by proposing a novel semi-supervised approach for implementing RF-based search services. In our approach, supervised learning is performed taking advantage of relevance labels provided by users. Later, an unsupervised learning step is performed with the objective of extracting useful information from the intrinsic dataset structure. Furthermore, our hybrid learning approach considers feedbacks of different users, in collaborative image retrieval (CIR) scenarios. In these scenarios, the relationships among the feedbacks provided by different users are exploited, further reducing the collective efforts. Conducted experiments involving shape, color, and texture datasets demonstrate the effectiveness of the proposed approach. Similar results are also observed in experiments considering multimodal image retrieval tasks.The interaction of users with search services has been recognized as an important mechanism for expressing and handling user information needs. One traditional approach for supporting such interactive search relies on exploiting relevance feedbacks (RF) i2015FAPESP - FUNDAÇÃO DE AMPARO À PESQUISA DO ESTADO DE SÃO PAULOCNPQ - CONSELHO NACIONAL DE DESENVOLVIMENTO CIENTÍFICO E TECNOLÓGICOCAPES - COORDENAÇÃO DE APERFEIÇOAMENTO DE PESSOAL DE NÍVEL SUPERIORFundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)FAPESP [2013/08645-0, 2013/50169-1]CNPq [306580/2012-8, 484254/2012-0]2013/08645-0; 2013/50169-1306580/2012-8;484254/2012-0SEM INFORMAÇÃ
Local selection of features and its applications to image search and annotation
In multimedia applications, direct representations of data objects typically involve hundreds or thousands of features. Given a query object, the similarity between the query object and a database object can be computed as the distance between their feature vectors. The neighborhood of the query object consists of those database objects that are close to the query object. The semantic quality of the neighborhood, which can be measured as the proportion of neighboring objects that share the same class label as the query object, is crucial for many applications, such as content-based image retrieval and automated image annotation. However, due to the existence of noisy or irrelevant features, errors introduced into similarity measurements are detrimental to the neighborhood quality of data objects.
One way to alleviate the negative impact of noisy features is to use feature selection techniques in data preprocessing. From the original vector space, feature selection techniques select a subset of features, which can be used subsequently in supervised or unsupervised learning algorithms for better performance. However, their performance on improving the quality of data neighborhoods is rarely evaluated in the literature. In addition, most traditional feature selection techniques are global, in the sense that they compute a single set of features across the entire database. As a consequence, the possibility that the feature importance may vary across different data objects or classes of objects is neglected.
To compute a better neighborhood structure for objects in high-dimensional feature spaces, this dissertation proposes several techniques for selecting features that are important to the local neighborhood of individual objects. These techniques are then applied to image applications such as content-based image retrieval and image label propagation. Firstly, an iterative K-NN graph construction method for image databases is proposed. A local variant of the Laplacian Score is designed for the selection of features for individual images. Noisy features are detected and sparsified iteratively from the original standardized feature vectors. This technique is incorporated into an approximate K-NN graph construction method so as to improve the semantic quality of the graph. Secondly, in a content-based image retrieval system, a generalized version of the Laplacian Score is used to compute different feature subspaces for images in the database. For online search, a query image is ranked in the feature spaces of database images. Those database images for which the query image is ranked highly are selected as the query results. Finally, a supervised method for the local selection of image features is proposed, for refining the similarity graph used in an image label propagation framework. By using only the selected features to compute the edges leading from labeled image nodes to unlabeled image nodes, better annotation accuracy can be achieved.
Experimental results on several datasets are provided in this dissertation, to demonstrate the effectiveness of the proposed techniques for the local selection of features, and for the image applications under consideration
- …