Improving tag recommendation using social networks
In this paper we address the task of recommending additional tags to partially annotated media objects, in our case images. We propose an extendable framework that can recommend tags using a combination of different personalised and collective contexts. We combine information from four contexts: (1) all the photos in the system, (2) a user's own photos, (3) the photos of a user's social contacts, and (4) the photos posted in the groups of which a user is a member. Variants of methods (1) and (2) have been proposed in previous work, but the use of (3) and (4) is novel.
For each of the contexts we use the same probabilistic model, and a Borda Count based aggregation approach merges the recommendations from the different contexts into a unified ranking of recommended tags. We evaluate our system using a large set of real-world data from Flickr. We show that by using personalised contexts we can significantly improve tag recommendation compared to using collective knowledge alone. We also analyse our experimental results to explore the capabilities of our system with respect to a user's social behaviour.
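The Borda Count aggregation mentioned in the abstract can be illustrated with a minimal sketch. The scoring variant used here (each tag earns the pool size minus its position, tags missing from a context earn nothing, ties broken alphabetically) is an assumption for illustration, as are the function name `borda_aggregate` and the example tags; the abstract does not specify the exact variant the authors use.

```python
from collections import defaultdict


def borda_aggregate(rankings, top_k=None):
    """Merge several ranked tag lists into one ranking using a Borda count.

    Assumed variant: with n distinct candidate tags overall, a tag at
    0-based position p in a list earns n - p points from that list;
    a tag missing from a list earns 0 points from it.
    """
    # Candidate pool: every tag that appears in at least one context.
    candidates = {tag for ranking in rankings for tag in ranking}
    n = len(candidates)

    scores = defaultdict(int)
    for ranking in rankings:
        for position, tag in enumerate(ranking):
            scores[tag] += n - position

    # Highest score first; ties broken alphabetically for determinism.
    merged = sorted(scores, key=lambda tag: (-scores[tag], tag))
    return merged if top_k is None else merged[:top_k]


# Hypothetical ranked recommendations from the four contexts.
contexts = [
    ["beach", "sea", "sunset"],    # (1) all photos in the system
    ["sea", "holiday", "beach"],   # (2) the user's own photos
    ["holiday", "sea"],            # (3) photos of social contacts
    ["sunset", "sea", "beach"],    # (4) photos in the user's groups
]
merged = borda_aggregate(contexts)
print(merged)
```

A tag that is ranked reasonably well in every context ("sea" in this toy input) ends up ahead of tags that top only a single context, which is the usual motivation for rank-based aggregation.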
Building user interest profiles from wikipedia clusters
Users of search systems are often reluctant to explicitly build profiles to indicate their search interests. Automatically building user profiles is therefore an important research area for personalized search. One difficult part of this is accessing a knowledge system that provides broad coverage of user search interests. In this work, we describe a method to build category-ID-based user profiles from a user's historical search data. Our approach makes significant use of Wikipedia as an external knowledge resource.
Fast redshift clustering with the Baire (ultra) metric
The Baire metric induces an ultrametric on a dataset and is of linear
computational complexity, contrasted with the standard quadratic time
agglomerative hierarchical clustering algorithm. We apply the Baire distance to
spectrometric and photometric redshifts from the Sloan Digital Sky Survey
using, in this work, about half a million astronomical objects. We want to know
how well the (more costly to determine) spectrometric redshifts can predict
the (more easily obtained) photometric redshifts, i.e. we seek to regress the
spectrometric on the photometric redshifts, and we develop a clusterwise
nearest neighbor regression procedure for this. Comment: 14 pages, 6 figures
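As a rough illustration of why a Baire-style distance allows linear-time clustering, the sketch below measures distance by the length of the longest common prefix of two digit strings and forms clusters by bucketing values on a shared prefix in a single pass. The base of 2, the representation of values as equal-length digit strings, and the function names are assumptions made for this example; they are not taken from the paper.

```python
from collections import defaultdict


def common_prefix_length(a, b):
    """Number of leading characters shared by the digit strings a and b."""
    k = 0
    for ca, cb in zip(a, b):
        if ca != cb:
            break
        k += 1
    return k


def baire_distance(a, b, base=2.0):
    """Baire-style distance between two equal-length digit strings.

    Assumed definition: base ** (-k), where k is the length of the
    longest common prefix; identical strings have distance 0.
    """
    if a == b:
        return 0.0
    return base ** -common_prefix_length(a, b)


def cluster_by_prefix(values, prefix_len):
    """Group digit strings sharing their first prefix_len digits.

    One pass over the data, so the cost is linear in len(values).
    """
    buckets = defaultdict(list)
    for v in values:
        buckets[v[:prefix_len]].append(v)
    return dict(buckets)


# Hypothetical values, written as the digits after the decimal point.
values = ["1234", "1256", "1299", "4500"]
clusters = cluster_by_prefix(values, prefix_len=2)
print(clusters)
print(baire_distance("1234", "1256"))
```

Increasing `prefix_len` splits each bucket into finer buckets, so the successive prefix lengths trace out a hierarchy without any pairwise distance computation.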
Determining the Characteristic Vocabulary for a Specialized Dictionary using Word2vec and a Directed Crawler
Specialized dictionaries are used to understand concepts in specific domains,
especially where those concepts are not part of the general vocabulary, or
have meanings that differ from ordinary language. The first step in creating
a specialized dictionary involves detecting the characteristic vocabulary of
the domain in question. Classical methods for detecting this vocabulary involve
gathering a domain corpus, calculating statistics on the terms found there, and
then comparing these statistics to a background or general language corpus.
Terms which are found significantly more often in the specialized corpus than
in the background corpus are candidates for the characteristic vocabulary of
the domain. Here we present two tools, a directed crawler, and a distributional
semantics package, that can be used together, circumventing the need for a
background corpus. Both tools are available on the web.
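The classical approach that the abstract contrasts itself with, comparing term statistics in a domain corpus against a background corpus, can be sketched as follows. This is not the paper's crawler or word2vec-based method. The add-one smoothing, the `min_ratio` threshold, the function name `characteristic_terms`, and the toy corpora are all assumptions chosen for illustration.

```python
from collections import Counter


def characteristic_terms(domain_tokens, background_tokens, min_ratio=2.0):
    """Return (term, ratio) pairs for terms over-represented in the domain.

    ratio = smoothed relative frequency in the domain corpus divided by
    the smoothed relative frequency in the background corpus.
    """
    domain = Counter(domain_tokens)
    background = Counter(background_tokens)
    n_dom = len(domain_tokens)
    n_bg = len(background_tokens)
    # Vocabulary size over both corpora, used for add-one smoothing.
    vocab = len(set(domain) | set(background))

    scored = []
    for term, count in domain.items():
        p_dom = (count + 1) / (n_dom + vocab)
        p_bg = (background[term] + 1) / (n_bg + vocab)
        ratio = p_dom / p_bg
        if ratio >= min_ratio:
            scored.append((term, ratio))

    # Most characteristic first; ties broken alphabetically.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


# Toy corpora (hypothetical).
domain_tokens = "the redshift of the galaxy redshift survey".split()
background_tokens = "the cat sat on the mat of the house".split()

result = characteristic_terms(domain_tokens, background_tokens)
terms = [term for term, _ in result]
print(terms)
```

Common words such as "the" and "of" score near 1 and are filtered out, while terms that are rare or absent in the background corpus pass the threshold.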
Supporting polyrepresentation in a quantum-inspired geometrical retrieval framework
The relevance of a document has many facets, going beyond the usual topical one, which have to be considered to satisfy a user's information need. Multiple representations of documents, like user-given reviews or the actual document content, can give evidence towards certain facets of relevance. In this respect polyrepresentation of documents, where such evidence is combined, is a crucial concept to estimate the relevance of a document. In this paper, we discuss how a geometrical retrieval framework inspired by quantum mechanics can be extended to support polyrepresentation. We show by example how different representations of a document can be modelled in a Hilbert space, similar to physical systems known from quantum mechanics. We further illustrate how these representations are combined by means of the tensor product to support polyrepresentation, and discuss the case that representations of documents are not independent from a user point of view. Besides giving a principled framework for polyrepresentation, the potential of this approach is to capture and formalise the complex interdependent relationships that the different representations can have between each other
Text Classification: A Sequential Reading Approach
We propose to model the text classification process as a sequential decision
process. In this process, an agent learns to classify documents into topics
while reading the document sentences sequentially and learns to stop as soon as
enough information has been read to decide. The proposed algorithm is based on
modelling text classification as a Markov Decision Process and learns by
using Reinforcement Learning. Experiments on four different classical
mono-label corpora show that the proposed approach performs comparably to
classical SVM approaches for large training sets, and better for small training
sets. In addition, the model automatically adapts its reading process to the
quantity of training information provided. Comment: ECIR201
A language model for jointly ranking relevant entities in a heterogeneous information network
In this paper, we propose a new model, called BibRank, which aims to jointly rank the heterogeneous resources of a bibliographic network, documents and authors, according to their degree of relevance to a query. The model propagates entity scores while taking into account both the structure of the network and the topic of the query. In addition, it introduces two indicators of topical proximity between connected entities, depending on the types of the linked entities. For relations between homogeneous entities, the indicator detects marginal citations, whereas for relations between heterogeneous entities, it uses two sources of evidence: the topic of the document and the expertise of the author. Experiments conducted on the CiteSeerX bibliographic network show the effectiveness of the proposed ranking model.
A social model for Literature Access: Towards a weighted social network of authors
This paper presents a novel retrieval approach for literature access based on social network analysis. We investigate a social model in which authors represent the main entities and relationships are extracted from co-author and citation links. Moreover, we define a weighting model for social relationships which takes into account the authors' positions in the social network and their mutual collaborations. Assigned weights express influence, knowledge transfer and shared interest between authors. Furthermore, we estimate document relevance by combining the document-query similarity and the document's social importance derived from its corresponding authors. To evaluate the effectiveness of our model, we conduct a series of experiments on a scientific document dataset that includes textual content and social data extracted from the academic social network CiteULike. Final results show that the proposed model improves retrieval effectiveness and outperforms traditional and social information retrieval baselines.