An integrated cognitive theory for information retrieval: moving out of the laboratory setting
The paper demonstrates how the Laboratory Research Framework fits into the integrated Cognitive Framework for IR. It first discusses the Laboratory Framework, with emphasis on its underlying assumptions and known limitations. This is followed by a view of the interaction and relevance phenomena that are associated with IR evaluation and central to the understanding of IR. The ensuing section outlines how interactive IR is viewed from a Cognitive Framework, and proposes ‘light’ interactive IR experiments that draw on the latter framework’s contextual possibilities. These include independent variables drawn from a collection, matching principles in a retrieval system, and the searcher’s situation and task context. The paper concludes by summarizing the issues encountered.
Investigating text power in predicting semantic similarity
This article presents an empirical evaluation of the distributional semantic power of abstracts, body text and full text, as different text levels, in predicting semantic similarity, using a collection of open access articles from PubMed. Semantic similarity is measured by two criteria, namely linear MeSH term intersection and hierarchical MeSH term distance. A random sample of 200 queries and 20,000 documents is selected from a test collection built with the CITREC open source code. The SimPack Java library is used to calculate the textual and semantic similarities. The nDCG value for each of the two semantic similarity criteria is calculated at three precision points. Finally, the nDCG values are compared using the Friedman test to determine the power of each text level in predicting semantic similarity. The results show that text is effective in representing semantic similarity: texts with maximum textual similarity are also 77% and 67% semantically similar in terms of the linear and hierarchical criteria, respectively. Furthermore, text length is found to be more effective in representing hierarchical semantic similarity than linear similarity. Based on the findings, it is concluded that when the subjects are homogeneous in the tree of knowledge, abstracts provide effective semantic capabilities, while in heterogeneous milieus, full-text processing or knowledge bases are needed to achieve IR effectiveness.
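The abstract compares nDCG values for the three text levels with the Friedman test. As a rough sketch of that comparison, the functions below compute the Friedman chi-square statistic over per-query scores (one row per query, one column per text level), using average ranks for ties but omitting the tie correction that statistical packages usually apply. The example numbers are made up and are not results from the study; with three text levels there are two degrees of freedom, for which the chi-square upper-tail probability reduces to exp(-Q/2).

```python
import math

def average_ranks(values):
    """Ranks in ascending order (1 = lowest score); tied values share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1   # 0-based positions start..end -> 1-based mean
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks

def friedman_q(blocks):
    """Friedman chi-square statistic (no tie correction).
    blocks: one row per query, one score per treatment (e.g. text level)."""
    n, k = len(blocks), len(blocks[0])
    rank_sums = [0.0] * k
    for row in blocks:
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
    return 12.0 / (n * k * (k + 1)) * sum(R * R for R in rank_sums) - 3.0 * n * (k + 1)

def p_value_df2(q):
    """Upper-tail chi-square probability for 2 degrees of freedom (k = 3 treatments)."""
    return math.exp(-q / 2.0)

# Toy per-query nDCG values for (abstract, body, full text); not data from the study
ndcg_rows = [[0.2, 0.5, 0.6], [0.3, 0.4, 0.7], [0.1, 0.6, 0.5], [0.2, 0.3, 0.9]]
q = friedman_q(ndcg_rows)
print(q, p_value_df2(q))
```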
Trademark image retrieval by local features
The challenge of abstract trademark image retrieval as a test of machine vision algorithms has attracted considerable research interest in the past decade. Current
operational trademark retrieval systems involve manual annotation of the images
(the current ‘gold standard’). Accordingly, current systems require a substantial
amount of time and labour to access, and are therefore expensive to operate. This
thesis focuses on the development of algorithms that mimic aspects of human
visual perception in order to retrieve similar abstract trademark images
automatically. A significant category of trademark images is typically highly
stylised, comprising a collection of distinctive graphical elements that often
include geometric shapes. Therefore, in order to compare the similarity of such
images, the principal aim of this research has been to develop a method for solving
the partial matching and shape perception problems.
There are few useful techniques for partial shape matching in the context of
trademark retrieval, because existing techniques tend not to support multi-component
retrieval. When this work was initiated, most trademark image
retrieval systems represented images by means of global features, which are not
suited to solving the partial matching problem. Instead, the author has
investigated the use of local image features as a means to finding similarities
between trademark images that only partially match in terms of their subcomponents.
During the course of this work, it has been established that the
Harris and Chabat detectors could potentially perform sufficiently well to serve as
the basis for local feature extraction in trademark image retrieval. Early findings
in this investigation indicated that the well-established SIFT (Scale Invariant
Feature Transform) local features, based on the Harris detector, could potentially
serve as an adequate underlying local representation for matching trademark
images.
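The partial-matching idea can be illustrated with nearest-neighbour matching of local descriptors. The sketch below assumes descriptors have already been extracted (in practice they would come from a SIFT implementation such as OpenCV's `cv2.SIFT_create`) and uses Lowe's ratio test to keep only distinctive matches. The 0.8 threshold and the "fraction of query descriptors matched" score are illustrative assumptions, not settings from the thesis. Because each descriptor is matched independently, a query whose descriptors come from only one sub-component of a larger database image can still be matched.

```python
import numpy as np

def ratio_test_matches(desc_query, desc_db, ratio=0.8):
    """Match each query descriptor to its nearest database descriptor, keeping the match
    only if it is clearly closer than the second-nearest one (Lowe's ratio test).
    desc_query: (m, d) array; desc_db: (n, d) array with n >= 2."""
    diff = desc_query[:, None, :] - desc_db[None, :, :]
    dists = np.sqrt((diff ** 2).sum(axis=2))          # (m, n) Euclidean distances
    matches = []
    for i, row in enumerate(dists):
        first, second = np.argsort(row)[:2]
        if row[first] < ratio * row[second]:
            matches.append((i, int(first)))
    return matches

def match_score(desc_query, desc_db, ratio=0.8):
    """Hypothetical similarity score: fraction of query descriptors that find a match."""
    return len(ratio_test_matches(desc_query, desc_db, ratio)) / len(desc_query)

# Synthetic 128-D descriptors: the query reuses 10 of the 50 database descriptors with
# slight noise, standing in for one shared sub-component of a larger database image
rng = np.random.default_rng(0)
db = rng.random((50, 128))
query = db[:10] + 0.01 * rng.standard_normal((10, 128))
print(ratio_test_matches(query, db))
print(match_score(query, db))
```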
Few researchers have used mechanisms based on human
perception for trademark image retrieval, suggesting that the shape representations
utilised in the past to solve this problem do not necessarily reflect the shapes
contained in these images as characterised by human perception. In response, a
practical approach to trademark image retrieval by perceptual grouping has been
developed based on defining meta-features that are calculated from the spatial
configurations of SIFT local image features. This new technique measures certain
visual properties of the appearance of images containing multiple graphical
elements and supports perceptual grouping by exploiting the non-accidental
properties of their configuration.
Our validation experiments indicated that we were indeed able to capture
and quantify the differences in the global arrangement of sub-components evident
when comparing stylised images in terms of their visual appearance properties.
Such visual appearance properties, measured using 17 of the proposed meta-features,
include relative sub-component proximity, similarity, rotation and
symmetry. Similar work on meta-features, based on the above Gestalt proximity,
similarity, and simplicity groupings of local features, had not been reported in the
computer vision literature at the time this work was undertaken.
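The 17 meta-feature definitions are not given in this abstract, so the following is only a hypothetical illustration of how two of the named properties, proximity and similarity, might be computed from the spatial configuration of keypoints. It assumes the keypoints have already been grouped into sub-components (the `labels` array, which must contain at least two distinct ids, each with some spread); both formulas are invented for illustration and are not the thesis's definitions.

```python
import numpy as np

def configuration_features(points, labels, image_size):
    """Two hypothetical meta-features from keypoint positions grouped into sub-components.
    points: (n, 2) keypoint (x, y) coordinates; labels: (n,) sub-component id per keypoint;
    image_size: (width, height)."""
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    ids = np.unique(labels)
    centroids = np.array([points[labels == c].mean(axis=0) for c in ids])
    # Spread of each sub-component: RMS distance of its keypoints from its centroid
    spreads = np.array([np.sqrt(((points[labels == c] - m) ** 2).sum(axis=1).mean())
                        for c, m in zip(ids, centroids)])
    # Proximity: mean pairwise centroid distance, scaled by the image diagonal
    pairwise = np.linalg.norm(centroids[:, None, :] - centroids[None, :, :], axis=2)
    upper = np.triu_indices(len(ids), k=1)
    proximity = pairwise[upper].mean() / np.hypot(*image_size)
    # Similarity: smallest / largest spread (1.0 means equally sized sub-components)
    similarity = spreads.min() / spreads.max()
    return {"proximity": float(proximity), "similarity": float(similarity)}

# Toy logo: two equal squares of keypoints side by side
pts = [(0, 0), (2, 0), (0, 2), (2, 2), (6, 0), (8, 0), (6, 2), (8, 2)]
lab = [0, 0, 0, 0, 1, 1, 1, 1]
print(configuration_features(pts, lab, image_size=(8, 6)))
```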
We decided to adopt relevance feedback to allow the visual appearance
properties of relevant and non-relevant images returned in response to a query to
be determined by example. Since limited training data is available when
constructing a relevance classifier by means of user-supplied relevance feedback,
the intrinsically non-parametric machine learning algorithm ID3 (Iterative
Dichotomiser 3) was selected to construct decision trees by means of dynamic
rule induction. We believe that the above approach to capturing high-level visual
concepts, encoded by means of meta-features specified by example through
relevance feedback and decision tree classification in order to support flexible
trademark image retrieval, is wholly novel.
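A minimal ID3 sketch of the relevance-feedback classifier described above: rows are returned images described by discretised meta-feature values (the feature names and bins below are hypothetical), labels are the user's relevant/non-relevant judgements, and each split greedily picks the feature with the highest information gain. It omits details a real system would need, such as discretising continuous meta-features and pruning; the fallback label for unseen values is also an assumption.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(rows, labels, feature):
    """Entropy reduction obtained by splitting on one discrete feature."""
    remainder = 0.0
    for value in set(r[feature] for r in rows):
        subset = [l for r, l in zip(rows, labels) if r[feature] == value]
        remainder += len(subset) / len(labels) * entropy(subset)
    return entropy(labels) - remainder

def id3(rows, labels, features):
    """Return a leaf label or a nested dict {feature: {value: subtree}}."""
    if len(set(labels)) == 1:
        return labels[0]
    if not features:
        return Counter(labels).most_common(1)[0][0]   # majority vote
    best = max(features, key=lambda f: information_gain(rows, labels, f))
    rest = [f for f in features if f != best]
    branches = {}
    for value in set(r[best] for r in rows):
        idx = [i for i, r in enumerate(rows) if r[best] == value]
        branches[value] = id3([rows[i] for i in idx], [labels[i] for i in idx], rest)
    return {best: branches}

def classify(tree, row, default="non-relevant"):
    """Follow the tree; feature values never seen in training fall back to a default label."""
    while isinstance(tree, dict):
        feature, branches = next(iter(tree.items()))
        if row[feature] not in branches:
            return default
        tree = branches[row[feature]]
    return tree

# Hypothetical feedback on four returned images, with discretised meta-features
rows = [{"symmetry": "high", "proximity": "near"},
        {"symmetry": "high", "proximity": "far"},
        {"symmetry": "low", "proximity": "near"},
        {"symmetry": "low", "proximity": "far"}]
labels = ["relevant", "relevant", "non-relevant", "non-relevant"]
tree = id3(rows, labels, ["proximity", "symmetry"])
print(tree)
print(classify(tree, {"symmetry": "high", "proximity": "near"}))
```

ID3 needs no distributional assumptions, which fits the small amount of feedback a user provides per query.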
The retrieval performance of the above system was compared with two other
state-of-the-art trademark image retrieval systems: Artisan, developed by Eakins
(Eakins et al., 1998), and a system developed by Jiang (Jiang et al., 2006). Using
relevance feedback, our system achieves higher average normalised precision
than either of the systems developed by Eakins or Jiang. However, while our
trademark image query and database set is based on an image dataset used by
Eakins, we employed different numbers of images. It was not possible to access
the same query set and image database used in the evaluation of Jiang’s trademark
image retrieval system. Despite these differences in evaluation
methodology, our approach would appear to have the potential to improve
retrieval effectiveness.
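The comparison above is reported as average normalised precision. Assuming the conventional Salton-style definition (the thesis's exact variant is not stated here), for a query with n relevant items ranked at positions r_1 ... r_n in a full ranking of N items, P_norm = 1 - (Σ ln r_i - Σ ln i) / ln C(N, n). It equals 1 for a perfect ranking and 0 when all relevant items come last. A minimal sketch:

```python
import math

def normalised_precision(relevant_ranks, collection_size):
    """P_norm for one query. relevant_ranks: 1-based positions of all n relevant
    items in a full ranking of collection_size (N) items; requires 0 < n < N."""
    ranks = sorted(relevant_ranks)
    n, N = len(ranks), collection_size
    if not 0 < n < N:
        raise ValueError("need at least one relevant and one non-relevant item")
    actual = sum(math.log(r) for r in ranks)
    ideal = sum(math.log(i) for i in range(1, n + 1))
    # ln C(N, n) via log-gamma, avoiding huge factorials
    log_binom = math.lgamma(N + 1) - math.lgamma(n + 1) - math.lgamma(N - n + 1)
    return 1.0 - (actual - ideal) / log_binom

def average_normalised_precision(per_query):
    """Mean P_norm over queries; per_query is a list of (relevant_ranks, N) pairs."""
    return sum(normalised_precision(r, N) for r, N in per_query) / len(per_query)

print(normalised_precision([1, 2], 10))    # perfect ranking
print(normalised_precision([9, 10], 10))   # relevant items ranked last
print(average_normalised_precision([([1, 3], 10), ([2, 5, 7], 20)]))
```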
Binary and graded relevance in IR evaluations: Comparison of the effects on ranking of IR systems
In this study, the rankings of IR systems based on binary and graded relevance in TREC 7 and 8 data are compared. The relevance of a sample of TREC results is reassessed using a four-level relevance scale: non-relevant, marginally relevant, fairly relevant and highly relevant. The data comprise 21 topics and 90 systems from TREC 7, and 20 topics and 121 systems from TREC 8. The measures compared are binary precision, cumulated gain, discounted cumulated gain and normalised discounted cumulated gain. Different weighting schemes for the relevance levels are tested with the cumulated gain measures. Kendall's rank correlations are computed to determine the extent to which the rankings produced by the different measures are similar. Weighting schemes ranging from binary to those emphasising highly relevant documents form a continuum: the measures correlate strongly at the binary end and less strongly at the heavily weighted end. The results show the different character of the measures. © 2005 Elsevier Ltd. All rights reserved.
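The cumulated gain measures and the rank correlation can be sketched as follows. The code follows the Järvelin-Kekäläinen formulation of DCG, in which ranks below the log base b are not discounted and each later rank's gain is divided by log_b(rank), and normalises by the DCG of an ideal ordering of all judged documents. Kendall's tau is computed as tau-a between two scorings of the same systems. The three weighting schemes for the four relevance levels are illustrative assumptions, and the run is toy data.

```python
import math
from itertools import combinations

# Illustrative gain weights for levels 0-3 (non, marginally, fairly, highly relevant);
# these are example schemes, not necessarily those used in the study.
WEIGHTS = {"binary": (0, 1, 1, 1), "linear": (0, 1, 2, 3), "steep": (0, 1, 5, 10)}

def dcg(gains, b=2):
    """Cumulative DCG vector: ranks i < b are undiscounted, rank i >= b is divided by log_b(i)."""
    out, total = [], 0.0
    for i, g in enumerate(gains, start=1):
        total += g if i < b else g / math.log(i, b)
        out.append(total)
    return out

def ndcg_at_k(run_levels, judged_levels, weights, k, b=2):
    """run_levels: relevance level of each retrieved document in rank order;
    judged_levels: levels of all judged documents for the topic (defines the ideal ranking)."""
    gains = [weights[l] for l in run_levels[:k]]
    gains += [0] * (k - len(gains))
    ideal = sorted((weights[l] for l in judged_levels), reverse=True)[:k]
    ideal += [0] * (k - len(ideal))
    ideal_dcg = dcg(ideal, b)[-1]
    return dcg(gains, b)[-1] / ideal_dcg if ideal_dcg > 0 else 0.0

def kendall_tau(scores_a, scores_b):
    """Kendall's tau-a between two score lists over the same systems; tied pairs count 0."""
    n = len(scores_a)
    s = 0
    for i, j in combinations(range(n), 2):
        prod = (scores_a[i] - scores_a[j]) * (scores_b[i] - scores_b[j])
        s += (prod > 0) - (prod < 0)
    return s / (n * (n - 1) / 2)

run = [3, 0, 1]            # levels of the top three retrieved documents (toy data)
judged = [3, 1, 0, 2]      # all judged documents for the topic
for name, w in WEIGHTS.items():
    print(name, round(ndcg_at_k(run, judged, w, k=3), 3))
```

In a study like this one, each system would get a mean nDCG per weighting scheme, and `kendall_tau` would then compare the resulting system orderings pairwise.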
Extending low-rank matrix factorizations for emerging applications
Low-rank matrix factorizations have become increasingly popular for projecting high-dimensional data into low-dimensional latent spaces, in order to gain a better understanding of the data and thus make more accurate predictions. In particular, they have been widely applied to important applications such as collaborative filtering and social network analysis. In this thesis, I investigate applications and extensions of low-rank matrix factorization to solve several practically important problems that arise in collaborative filtering and social network analysis.
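To make the basic model concrete: a low-rank factorization approximates the user-item rating matrix R by U Vᵀ, with a rank much smaller than either dimension, fitting only the observed entries. The sketch below uses plain stochastic gradient descent with L2 regularisation, which is one common fitting method among several (alternating least squares is another); the hyperparameters are arbitrary choices for a toy example.

```python
import numpy as np

def factorize(ratings, n_users, n_items, rank=2, lr=0.05, reg=0.02, epochs=300, seed=0):
    """Fit R ~ U @ V.T to observed (user, item, rating) triples by SGD on
    squared error with L2 regularisation."""
    rng = np.random.default_rng(seed)
    U = 0.1 * rng.standard_normal((n_users, rank))
    V = 0.1 * rng.standard_normal((n_items, rank))
    for _ in range(epochs):
        for idx in rng.permutation(len(ratings)):
            u, i, r = ratings[idx]
            err = r - U[u] @ V[i]
            # Update both factor vectors from the same (pre-update) values
            U[u], V[i] = (U[u] + lr * (err * V[i] - reg * U[u]),
                          V[i] + lr * (err * U[u] - reg * V[i]))
    return U, V

# Toy rank-1 rating matrix, fully observed
R = np.outer([1.0, 2.0, 3.0], [1.0, 2.0])
ratings = [(u, i, float(R[u, i])) for u in range(3) for i in range(2)]
U, V = factorize(ratings, n_users=3, n_items=2)
rmse = float(np.sqrt(np.mean((U @ V.T - R) ** 2)))
print(rmse)
```

A user with no observed ratings never receives a fitted row of U, which is the cold-start problem discussed next.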
A key challenge in recommendation system research is how to effectively profile new users, a problem generally known as cold-start recommendation.
In the first part of this work, we extend the low-rank matrix factorization by allowing the latent factors to have more complex structures, namely decision trees, to solve the problem of cold-start recommendation. In particular, we present functional matrix factorization (fMF), a novel cold-start recommendation method that solves the problem of adaptive interview construction based on low-rank matrix factorizations.
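In functional matrix factorization, a new user's latent profile is a function of their answers to interview questions, and that function is represented by a decision tree. The sketch below only shows how such a tree would be used once built. It walks a user's answers down a ternary like/dislike/unknown tree to a leaf profile, then scores items by their inner product with the item factors. The tree, its branch labels and the factors are hypothetical, and learning the tree jointly with the item factors is not shown.

```python
import numpy as np

def user_profile(node, answers):
    """Walk a hypothetical ternary interview tree. Internal nodes hold {"item", "children"}
    keyed by "like" / "dislike" / "unknown"; leaves hold {"profile": latent vector}.
    Items the user was not asked about, or did not answer, follow the "unknown" branch."""
    while "profile" not in node:
        node = node["children"][answers.get(node["item"], "unknown")]
    return node["profile"]

def leaf(vec):
    return {"profile": np.asarray(vec, dtype=float)}

# Toy two-level tree: first ask about item 0, then (after "like") about item 1
tree = {"item": 0, "children": {
    "like": {"item": 1, "children": {"like": leaf([1.0, 1.0]),
                                     "dislike": leaf([1.0, -1.0]),
                                     "unknown": leaf([1.0, 0.0])}},
    "dislike": leaf([-1.0, 0.0]),
    "unknown": leaf([0.0, 0.0])}}
V = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])   # toy item factors, one row per item

u = user_profile(tree, {0: "like", 1: "dislike"})
print(V @ u)   # predicted scores for all items
```

The interview is adaptive because the next question depends on the branch taken by the previous answer.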
The second part of this work considers the efficiency problem of making recommendations in the context of large user and item spaces.
Specifically, we address the problem through learning binary codes for collaborative filtering, which can be viewed as restricting the latent factors in low-rank matrix factorizations to be binary vectors that represent the binary codes for both users and items.
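With binary codes, a user-item score becomes a Hamming distance, which is cheap to compute over large item sets. The sketch below uses naive sign quantisation of already-learned real-valued factors purely to illustrate the scoring step. It is a simple baseline and not the code-learning method described above. For ±1 codes of length r, the Hamming distance equals (r minus the inner product) / 2, so ranking by either quantity is equivalent.

```python
import numpy as np

def binarize(factors):
    """Naive sign quantisation of real-valued latent factors to {-1, +1} codes."""
    return np.where(factors >= 0, 1, -1)

def hamming_rank(user_code, item_codes):
    """Return item indices sorted by Hamming distance to the user code, and the distances.
    For {-1, +1} codes of length r, hamming = (r - <user, item>) / 2."""
    r = user_code.shape[0]
    dist = (r - item_codes @ user_code) // 2
    return np.argsort(dist, kind="stable"), dist

# Toy real-valued factors (one user, three items), binarised before ranking
user = binarize(np.array([0.5, -0.2, 0.1]))
items = binarize(np.array([[0.4, -0.1, 0.3],
                           [-0.2, 0.5, -0.3],
                           [0.1, 0.2, 0.3]]))
order, dist = hamming_rank(user, items)
print(order, dist)
```

In practice the codes would usually be packed into machine words, so each distance becomes an XOR followed by a popcount.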
In the third part of this work, we investigate applications of low-rank matrix factorizations in the context of social network analysis. Specifically, we propose a convex optimization approach to discover the hidden network of social influence, which has a low-rank and sparse structure, by modeling the recurrent events of different individuals as multi-dimensional Hawkes processes, emphasizing the mutually exciting nature of the dynamics of event occurrences. The proposed framework combines the estimation of mutually exciting processes and low-rank matrix factorization in a principled manner.
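The abstract does not give the model's exact form. The sketch below assumes the common exponential triggering kernel ω·exp(-ω·Δt), under which user u's intensity is λ_u(t) = μ_u + Σ over earlier events j of A[u, u_j]·ω·exp(-ω(t - t_j)). It writes the objective as the negative log-likelihood plus a nuclear-norm penalty (promoting low rank) and an ℓ1 penalty (promoting sparsity) on the influence matrix A. This is one standard convex way to encode "low-rank and sparse", assumed here rather than taken from the thesis. The code only evaluates the objective; an optimiser such as proximal gradient is not shown, and the fourth part's kernel estimation would replace the fixed parametric kernel.

```python
import numpy as np

def hawkes_loglik(times, dims, mu, A, omega, T):
    """Log-likelihood of events (times[k], dims[k]) on [0, T], times sorted ascending.
    mu: (U,) base rates; A: (U, U) with A[u, v] = influence of v's events on u."""
    times = np.asarray(times, dtype=float)
    dims = np.asarray(dims, dtype=int)
    ll = 0.0
    for k in range(len(times)):
        earlier = times[:k] < times[k]
        lags = times[k] - times[:k][earlier]
        excitation = A[dims[k], dims[:k][earlier]] * omega * np.exp(-omega * lags)
        ll += np.log(mu[dims[k]] + excitation.sum())
    # Compensator: integral of every dimension's intensity over [0, T]
    tails = 1.0 - np.exp(-omega * (T - times))
    compensator = mu.sum() * T + np.sum(A[:, dims].sum(axis=0) * tails)
    return ll - compensator

def penalised_objective(times, dims, mu, A, omega, T, lam_nuc, lam_l1):
    """Negative log-likelihood plus nuclear-norm (low rank) and l1 (sparsity) penalties on A."""
    return (-hawkes_loglik(times, dims, mu, A, omega, T)
            + lam_nuc * np.linalg.norm(A, "nuc") + lam_l1 * np.abs(A).sum())

# Toy two-person example (hypothetical parameters)
mu = np.array([0.2, 0.1])
A = np.array([[0.1, 0.3], [0.2, 0.0]])
print(hawkes_loglik([0.5, 1.0, 1.8], [0, 1, 0], mu, A, omega=1.0, T=3.0))
print(penalised_objective([0.5, 1.0, 1.8], [0, 1, 0], mu, A, 1.0, 3.0, 0.1, 0.1))
```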
In the fourth part of this work, we estimate the triggering kernels for the Hawkes process. In particular, we focus on estimating the triggering kernels from an infinite-dimensional functional space using the Euler-Lagrange equation, which can be viewed as applying the idea of low-rank factorizations in the functional space.
Ph.D.