717 research outputs found
Bibliometric Indicators of Young Authors in Astrophysics: Can Later Stars be Predicted?
We test 16 bibliometric indicators with respect to their validity at the
level of the individual researcher by estimating their power to predict later
successful researchers. We compare the indicators of a sample of astrophysics
researchers who later co-authored highly cited papers before their first
landmark paper with the distributions of these indicators over a random control
group of young authors in astronomy and astrophysics. We find that field and
citation-window normalisation substantially improves the predicting power of
citation indicators. The two indicators of total influence based on citation
numbers normalised with expected citation numbers are the only indicators which
show differences between later stars and random authors significant on a 1%
level. Indicators of paper output are not very useful to predict later stars.
The famous -index makes no difference at all between later stars and the
random control group.Comment: 14 pages, 10 figure
Preliminary Experiments using Subjective Logic for the Polyrepresentation of Information Needs
According to the principle of polyrepresentation, retrieval accuracy may
improve through the combination of multiple and diverse information object
representations about e.g. the context of the user, the information sought, or
the retrieval system. Recently, the principle of polyrepresentation was
mathematically expressed using subjective logic, where the potential
suitability of each representation for improving retrieval performance was
formalised through degrees of belief and uncertainty. No experimental evidence
or practical application has so far validated this model. We extend the work of
Lioma et al. (2010), by providing a practical application and analysis of the
model. We show how to map the abstract notions of belief and uncertainty to
real-life evidence drawn from a retrieval dataset. We also show how to estimate
two different types of polyrepresentation assuming either (a) independence or
(b) dependence between the information objects that are combined. We focus on
the polyrepresentation of different types of context relating to user
information needs (i.e. work task, user background knowledge, ideal answer) and
show that the subjective logic model can predict their optimal combination
prior and independently to the retrieval process
Rhetorical relations for information retrieval
Typically, every part in most coherent text has some plausible reason for its
presence, some function that it performs to the overall semantics of the text.
Rhetorical relations, e.g. contrast, cause, explanation, describe how the parts
of a text are linked to each other. Knowledge about this socalled discourse
structure has been applied successfully to several natural language processing
tasks. This work studies the use of rhetorical relations for Information
Retrieval (IR): Is there a correlation between certain rhetorical relations and
retrieval performance? Can knowledge about a document's rhetorical relations be
useful to IR? We present a language model modification that considers
rhetorical relations when estimating the relevance of a document to a query.
Empirical evaluation of different versions of our model on TREC settings shows
that certain rhetorical relations can benefit retrieval effectiveness notably
(> 10% in mean average precision over a state-of-the-art baseline)
Evaluation Measures for Relevance and Credibility in Ranked Lists
Recent discussions on alternative facts, fake news, and post truth politics
have motivated research on creating technologies that allow people not only to
access information, but also to assess the credibility of the information
presented to them by information retrieval systems. Whereas technology is in
place for filtering information according to relevance and/or credibility, no
single measure currently exists for evaluating the accuracy or precision (and
more generally effectiveness) of both the relevance and the credibility of
retrieved results. One obvious way of doing so is to measure relevance and
credibility effectiveness separately, and then consolidate the two measures
into one. There at least two problems with such an approach: (I) it is not
certain that the same criteria are applied to the evaluation of both relevance
and credibility (and applying different criteria introduces bias to the
evaluation); (II) many more and richer measures exist for assessing relevance
effectiveness than for assessing credibility effectiveness (hence risking
further bias).
Motivated by the above, we present two novel types of evaluation measures
that are designed to measure the effectiveness of both relevance and
credibility in ranked lists of retrieval results. Experimental evaluation on a
small human-annotated dataset (that we make freely available to the research
community) shows that our measures are expressive and intuitive in their
interpretation
A review of the characteristics of 108 author-level bibliometric indicators
An increasing demand for bibliometric assessment of individuals has led to a
growth of new bibliometric indicators as well as new variants or combinations
of established ones. The aim of this review is to contribute with objective
facts about the usefulness of bibliometric indicators of the effects of
publication activity at the individual level. This paper reviews 108 indicators
that can potentially be used to measure performance on the individual author
level, and examines the complexity of their calculations in relation to what
they are supposed to reflect and ease of end-user application.Comment: to be published in Scientometrics, 201
Preliminary study of technical terminology for the retrieval of scientific book metadata records
Books only represented by brief metadata (book records) are particularly hard to retrieve. One way of improving their retrieval is by extracting retrieval enhancing features from them. This work focusses on scientific (physics) book records. We ask if their technical terminology can be used as a retrieval enhancing feature. A study of 18,443 book records shows a strong correlation between their technical terminology and their likelihood of relevance. Using this finding for retrieval yields >+5% precision and recall gains
Deep Learning Relevance: Creating Relevant Information (as Opposed to Retrieving it)
What if Information Retrieval (IR) systems did not just retrieve relevant
information that is stored in their indices, but could also "understand" it and
synthesise it into a single document? We present a preliminary study that makes
a first step towards answering this question. Given a query, we train a
Recurrent Neural Network (RNN) on existing relevant information to that query.
We then use the RNN to "deep learn" a single, synthetic, and we assume,
relevant document for that query. We design a crowdsourcing experiment to
assess how relevant the "deep learned" document is, compared to existing
relevant documents. Users are shown a query and four wordclouds (of three
existing relevant documents and our deep learned synthetic document). The
synthetic document is ranked on average most relevant of all.Comment: Neu-IR '16 SIGIR Workshop on Neural Information Retrieval, July 21,
2016, Pisa, Ital
Entropy and Graph Based Modelling of Document Coherence using Discourse Entities: An Application
We present two novel models of document coherence and their application to
information retrieval (IR). Both models approximate document coherence using
discourse entities, e.g. the subject or object of a sentence. Our first model
views text as a Markov process generating sequences of discourse entities
(entity n-grams); we use the entropy of these entity n-grams to approximate the
rate at which new information appears in text, reasoning that as more new words
appear, the topic increasingly drifts and text coherence decreases. Our second
model extends the work of Guinaudeau & Strube [28] that represents text as a
graph of discourse entities, linked by different relations, such as their
distance or adjacency in text. We use several graph topology metrics to
approximate different aspects of the discourse flow that can indicate
coherence, such as the average clustering or betweenness of discourse entities
in text. Experiments with several instantiations of these models show that: (i)
our models perform on a par with two other well-known models of text coherence
even without any parameter tuning, and (ii) reranking retrieval results
according to their coherence scores gives notable performance gains, confirming
a relation between document coherence and relevance. This work contributes two
novel models of document coherence, the application of which to IR complements
recent work in the integration of document cohesiveness or comprehensibility to
ranking [5, 56]
- …