Relevance judgments and the incremental presentation of document representations
A new approach to the solicitation and measurement of relevance judgments is presented, which attempts to resolve some of the difficulties inherent in the nature of relevance and human judgment, and which further seeks to examine how users' judgments of document representations change as more information about documents is revealed to them. Subjects (university faculty and doctoral students) viewed three incremental versions of documents, and recorded ratio-level relevance judgments for each version. These judgments were analyzed by a variety of methods, including graphical inspection and examination of the number and degree of changes of judgments as new information is seen. A post-questionnaire was also administered to obtain subjects' perceptions of the process and the individual fields of information presented. A consistent pattern of perception and importance of these fields is seen: abstracts are by far the most important field and have the greatest impact, followed by titles, bibliographic information, and indexing. Peer Reviewed. http://deepblue.lib.umich.edu/bitstream/2027.42/29634/1/0000723.pd
Estimating Position Bias without Intrusive Interventions
Presentation bias is one of the key challenges when learning from implicit
feedback in search engines, as it confounds the relevance signal. While it was
recently shown how counterfactual learning-to-rank (LTR) approaches
(Joachims et al., 2017) can provably overcome presentation bias when
observation propensities are known, it remains to show how to effectively
estimate these propensities. In this paper, we propose the first method for
producing consistent propensity estimates without manual relevance judgments,
disruptive interventions, or restrictive relevance modeling assumptions. First,
we show how to harvest a specific type of intervention data from historic
feedback logs of multiple different ranking functions, and show that this data
is sufficient for consistent propensity estimation in the position-based model.
Second, we propose a new extremum estimator that makes effective use of this
data. In an empirical evaluation, we find that the new estimator provides
superior propensity estimates in two real-world systems -- Arxiv Full-text
Search and Google Drive Search. Beyond these two points, we find that the
method is robust to a wide range of settings in simulation studies.
How users assess web pages for information-seeking
In this paper, we investigate the criteria used by online searchers when assessing the relevance of web pages for information-seeking tasks. Twenty-four participants were given three tasks each, and indicated the features of web pages which they employed when deciding about the usefulness of the pages in relation to the tasks. These tasks were presented within the context of a simulated work-task situation. We investigated the relative utility of features identified by participants (web page content, structure and quality), and how the importance of these features is affected by the type of information-seeking task performed and the stage of the search. The results of this study provide a set of criteria used by searchers to decide about the utility of web pages for different types of tasks. Such criteria can have implications for the design of systems that use or recommend web pages.
Machine Learning in Automated Text Categorization
The automated categorization (or classification) of texts into predefined
categories has witnessed a booming interest in the last ten years, due to the
increased availability of documents in digital form and the ensuing need to
organize them. In the research community the dominant approach to this problem
is based on machine learning techniques: a general inductive process
automatically builds a classifier by learning, from a set of preclassified
documents, the characteristics of the categories. The advantages of this
approach over the knowledge engineering approach (consisting in the manual
definition of a classifier by domain experts) are very good effectiveness,
considerable savings in terms of expert manpower, and straightforward
portability to different domains. This survey discusses the main approaches to
text categorization that fall within the machine learning paradigm. We will
discuss in detail issues pertaining to three different problems, namely
document representation, classifier construction, and classifier evaluation. Comment: Accepted for publication in ACM Computing Surveys.
Aerospace medicine and biology: A continuing bibliography with indexes (supplement 341)
This bibliography lists 133 reports, articles and other documents introduced into the NASA Scientific and Technical Information System during September 1990. Subject coverage includes: aerospace medicine and psychology, life support systems and controlled environments, safety equipment, exobiology and extraterrestrial life, and flight crew behavior and performance.
CHORUS Deliverable 2.2: Second report - identification of multi-disciplinary key issues for gap analysis toward EU multimedia search engines roadmap
After addressing the state of the art during the first year of Chorus and establishing the existing landscape in
multimedia search engines, we identified and analyzed gaps within the European research effort during our second year.
In this period we focused on three directions: technological issues, user-centred issues and use cases, and
socio-economic and legal aspects. These were assessed by two central studies: first, a concerted vision of the functional breakdown
of a generic multimedia search engine, and second, representative use-case descriptions with a related discussion of
the requirements for technological challenges. Both studies were carried out in cooperation and consultation with the
community at large through EC concertation meetings (multimedia search engines cluster), several meetings with our
Think-Tank, presentations at international conferences, and surveys addressed to EU project coordinators as well as
national initiative coordinators. Based on the feedback obtained, we identified two types of gaps: core
technological gaps that involve research challenges, and “enablers”, which are not necessarily technical research
challenges but have an impact on innovation progress. New socio-economic trends are presented, as well as emerging legal
challenges.
An Empirical Study of User Navigation during Document Triage
Document triage is the moment in the information seeking
process when the user first decides the relevance of a document to their
information need [17]. This paper reports a study of user behaviour during
document triage. The study reveals two main findings: first, that there
is a small set of common navigational patterns; second, that certain
document features strongly influence users’ navigation.