Entity Query Feature Expansion Using Knowledge Base Links
Recent advances in automatic entity linking and knowledge base
construction have resulted in entity annotations for document and
query collections. For example, text can be annotated with entities from large
general-purpose knowledge bases, such as Freebase and the Google
Knowledge Graph. Understanding how to leverage these entity
annotations of text to improve ad hoc document retrieval is an open
research area. Query expansion is a commonly used technique to
improve retrieval effectiveness. Most previous query expansion
approaches focus on text, mainly using unigram concepts. In this
paper, we propose a new technique, called entity query feature
expansion (EQFE), which enriches the query with features from
entities and their links to knowledge bases, including structured
attributes and text. We experiment using both explicit query entity
annotations and latent entities. We evaluate our technique on TREC
text collections automatically annotated with knowledge base entity
links, including the Google Freebase Annotations (FACC1) data.
We find that entity-based feature expansion results in significant
improvements in retrieval effectiveness over state-of-the-art text
expansion approaches.
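The abstract describes enriching a query with features drawn from linked knowledge-base entities. The sketch below shows that general idea under stated assumptions. It is not the EQFE model from the paper. Here, linked entities contribute weighted expansion terms from their KB fields. A document is then scored by interpolating Dirichlet-smoothed query likelihood for the original query terms with the same score for the expansion terms. The KB layout, field weights, entity IDs, `lam` and `mu` values are all hypothetical.

```python
import math
from collections import Counter

def expansion_from_entities(linked, kb, field_weights):
    """Collect weighted expansion terms from the KB fields of linked entities.

    linked: {entity_id: link_confidence}; kb: {entity_id: {field: [terms]}}.
    Both formats are hypothetical, chosen for illustration only.
    """
    weights = Counter()
    for entity_id, confidence in linked.items():
        for field, terms in kb.get(entity_id, {}).items():
            for term in terms:
                weights[term] += confidence * field_weights.get(field, 0.0)
    return weights

def score(doc, query, expansion, collection, lam=0.5, mu=2000.0):
    """Interpolate original-query and expansion-term query likelihood."""
    doc_tf, coll_tf = Counter(doc), Counter(collection)

    def log_p(term):
        # Dirichlet smoothing; floor collection counts at 0.5 so unseen
        # terms never produce log(0).
        p_coll = max(coll_tf[term], 0.5) / len(collection)
        return math.log((doc_tf[term] + mu * p_coll) / (len(doc) + mu))

    original = sum(log_p(t) for t in query) / len(query)
    total = sum(expansion.values())
    expanded = (sum(w * log_p(t) for t, w in expansion.items()) / total) if total else 0.0
    return (1 - lam) * original + lam * expanded

# Toy example: a hypothetical entity for the animal sense of "jaguar".
kb = {"E_jaguar_animal": {"name": ["jaguar"], "category": ["cat", "predator"]}}
expansion = expansion_from_entities({"E_jaguar_animal": 0.9}, kb,
                                    {"name": 1.0, "category": 0.5})
doc_a = ["jaguar", "speed", "cat", "predator"]
doc_b = ["jaguar", "speed", "car", "engine"]
collection = doc_a + doc_b
query = ["jaguar", "speed"]
print(score(doc_a, query, expansion, collection) > score(doc_b, query, expansion, collection))  # True
```

Both toy documents match the original query equally. Only the entity-derived terms separate them. With `lam=0` the two scores coincide.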
The effect of dyslexia on information retrieval: A pilot study
Purpose – The purpose of the paper is to fill a gap in our knowledge of how people with dyslexia interact with Information Retrieval (IR) systems, specifically their information searching behaviour. Very little research has been undertaken with this particular user group, and given the size of the group (an estimated 10% of the population) this lack of knowledge needs to be addressed.
Design/Methodology/Approach – We use elements of the dyslexia cognitive profile to design a logging system that records the differences between two sets of participants: dyslexic users and control users. We use a standard Okapi interface together with two standard TREC topics to record the information searching behaviour of these users. We gather evidence from various sources, including quantitative information from search logs and qualitative information from interviews and questionnaires. We record variables on queries, documents, relevance assessments and sessions in the search logs. We use this evidence to examine the differences in searching between the two sets of users, in order to understand the effect of dyslexia on information searching behaviour. A topic analysis is also conducted on the quantitative data to show any effect of the information need on the results.
Research limitations/implications – As this is a pilot study, only 10 participants were recruited, 5 for each user group. Due to ethical issues, the number of topics per search was restricted to one. The study shows that the methodology applied is useful for distinguishing between the two user groups, taking into account differences between topics. We outline further research building on this pilot study in four main areas. A different approach from the proposed methodology, one that takes account of topic variation, is needed to measure the effect on query variables. More detail on users, such as reading ability, speed of language processing and working memory, is needed to distinguish the user groups. The effect of topic on search interaction must be measured in order to record its potential impact on the dyslexic user group. Work is needed on relevance assessment and its effect on precision and recall for users who may not read many documents.
Findings – Using the log data, we establish differences in the information searching behaviour of control and dyslexic users, i.e. in the way the two groups interact with Okapi, and find that the qualitative information collected (such as experience) may not be able to account for these differences. Query variables did not distinguish between the groups, but differences between topics were recorded for the same variables. Users who viewed more documents tended to judge more documents as relevant, whether grouped by user group or by topic. Session data indicated that there may be an important difference between the user groups in the number of iterations used in a search, as topic may have little effect on this variable.
Originality/Value – This is the first study of the effect of dyslexia on information search behaviour, and it provides some evidence to take the field forward.
Modeling Temporal Evidence from External Collections
Newsworthy events are broadcast through multiple media and prompt
crowds to comment on social media. In this paper, we propose to
leverage these behavioral dynamics to estimate the most relevant time periods
for an event (i.e., a query). Recent advances have shown how to improve the
estimation of the temporal relevance of such topics. Our approach builds
on two main novelties. First, we mine temporal evidence from hundreds of
external sources into topic-based external collections to improve the
robustness of the detection of relevant time periods. Second, we propose a
formal retrieval model that generalizes the use of the temporal dimension
across different aspects of the retrieval process. In particular, we show that
temporal evidence of external collections can be used to (i) infer a topic's
temporal relevance, (ii) select the query expansion terms, and (iii) re-rank
the final results for improved precision. Experiments with TREC Microblog
collections show that the proposed time-aware retrieval model makes
effective and extensive use of the temporal dimension to improve search results
over the most recent temporal models. Interestingly, we observe a strong
correlation between precision and the temporal distribution of retrieved and
relevant documents.
Comment: To appear in WSDM 201
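The abstract says that temporal evidence from external collections can be used to infer a topic's temporal relevance and to re-rank results. The sketch below illustrates the re-ranking step under stated assumptions. It is not the paper's formal model. It swaps in a Gaussian kernel density estimate over the timestamps of external documents as the temporal prior. That prior is then log-linearly interpolated with a text score. The function names, `bandwidth` and `alpha` are hypothetical.

```python
import math

def temporal_density(evidence_times, bandwidth):
    """Gaussian kernel density estimate over timestamps of external evidence.

    evidence_times: timestamps (e.g. hours) of external documents about the topic.
    """
    n = len(evidence_times)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(t):
        return norm * sum(math.exp(-0.5 * ((t - e) / bandwidth) ** 2)
                          for e in evidence_times)
    return density

def rerank(results, density, alpha=0.3, eps=1e-12):
    """Re-rank (doc_id, log_text_score, timestamp) triples with a temporal prior.

    The final score interpolates the text score with log p(time | topic);
    eps keeps log() finite for timestamps far from any evidence.
    """
    scored = [(doc_id, (1 - alpha) * s + alpha * math.log(density(t) + eps))
              for doc_id, s, t in results]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# External evidence clusters around t = 100; doc "b" falls inside that burst.
density = temporal_density([98, 100, 102], bandwidth=5.0)
results = [("a", -5.0, 10), ("b", -5.2, 101)]
print([d for d, _ in rerank(results, density)])             # ['b', 'a']
print([d for d, _ in rerank(results, density, alpha=0.0)])  # ['a', 'b']
```

With `alpha=0.0` the ranking falls back to the text score alone. With a positive `alpha`, a slightly weaker text match posted during the burst overtakes one posted far from it.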
A user evaluation of hierarchical phrase browsing
Phrase browsing interfaces based on hierarchies of phrases extracted automatically from document collections offer a useful compromise between automatic full-text searching and manually created subject indexes. The literature contains descriptions of such systems that many find compelling and persuasive. However, evaluation studies have either been anecdotal, focused on objective measures of the quality of automatically extracted index terms, or been restricted to questions of computational efficiency and feasibility. This paper reports on an empirical, controlled user study that compares hierarchical phrase browsing with full-text searching over a range of information seeking tasks. Users found the results located via phrase browsing to be relevant and useful but preferred keyword searching for certain types of queries. Users' experiences were marred by interface details, including inconsistencies between the phrase browser and the surrounding digital library interface.
Finding co-solvers on Twitter, with a little help from Linked Data
In this paper, we propose a method for suggesting potential collaborators for solving innovation challenges online, based on their competence, similarity of interests and social proximity with the user. We rely on Linked Data to derive a measure of semantic relatedness that we use to enrich both user profiles and innovation problems with additional relevant topics, thereby improving the performance of co-solver recommendation. We evaluate this approach against state-of-the-art methods for query enrichment based on the distribution of topics in user profiles, and demonstrate its usefulness in recommending collaborators who are both complementary in competence and compatible with the user. Our experiments use data from the social networking service Twitter.com.
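The recommendation idea combines three signals: competence, shared interests and social proximity. Topic profiles are enriched with semantically related topics. The sketch below illustrates that combination under stated assumptions. It is not the paper's method. The relatedness table stands in for a measure derived from Linked Data. Competence is approximated as similarity to the enriched problem, so it does not model complementarity with the user. The weights, threshold and input formats are all hypothetical.

```python
import math

def enrich(profile, relatedness, threshold=0.5):
    """Add topics semantically related to those already in a topic-weight profile.

    relatedness: {topic: {related_topic: score in [0, 1]}}, a hypothetical
    stand-in for a relatedness measure derived from Linked Data.
    """
    enriched = dict(profile)
    for topic, weight in profile.items():
        for other, rel in relatedness.get(topic, {}).items():
            if rel >= threshold:
                enriched[other] = max(enriched.get(other, 0.0), weight * rel)
    return enriched

def cosine(a, b):
    """Cosine similarity between two sparse topic-weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def rank_cosolvers(user, problem, candidates, relatedness, weights=(0.4, 0.4, 0.2)):
    """Rank candidates by a weighted mix of competence, interest similarity and proximity.

    candidates: {name: (topic_profile, social_proximity in [0, 1])}.
    """
    w_comp, w_sim, w_soc = weights
    problem_e = enrich(problem, relatedness)
    user_e = enrich(user, relatedness)
    scores = {}
    for name, (profile, proximity) in candidates.items():
        profile_e = enrich(profile, relatedness)
        competence = cosine(profile_e, problem_e)  # fit to the challenge's topics
        similarity = cosine(profile_e, user_e)     # overlap with the user's interests
        scores[name] = w_comp * competence + w_sim * similarity + w_soc * proximity
    return sorted(scores, key=scores.get, reverse=True)

relatedness = {"solar": {"photovoltaics": 0.8}}
candidates = {"A": ({"photovoltaics": 1.0}, 0.5), "B": ({"cooking": 1.0}, 0.5)}
print(rank_cosolvers({"energy": 1.0}, {"solar": 1.0}, candidates, relatedness))  # ['A', 'B']
```

Candidate A's profile shares no literal topic with the problem. Without the relatedness entry, A and B would tie. Enrichment is what surfaces A as a plausible co-solver.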