213 research outputs found
Context Matters: An Analysis of assessments of XML Documents
The paper analyses searchers’ assessments of usefulness and specificity on different levels of granularity in XML-coded documents. Documents are assessed on 10 usefulness/specificity combinations and on the granularity levels of article, section, and subsection. Overlapping judgements show a remarkable lack of consistency between searchers. There is an inverse relationship between articles and sections both in the assessment of specificity and of usefulness, indicating that retrieval on different granularity levels is a useful feature of a retrieval system. Searchers find the full article more useful when they assess the same document on both the article and section levels, indicating that there is a need to provide context to the sections and subsections when presenting result lists of XML documents.
Evaluating implicit feedback models using searcher simulations
In this article we describe an evaluation of relevance feedback (RF) algorithms using searcher simulations. Since these algorithms select additional terms for query modification based on inferences made from searcher interaction, not on relevance information searchers explicitly provide (as in traditional RF), we refer to them as implicit feedback models. We introduce six different models that base their decisions on the interactions of searchers and use different approaches to rank query modification terms. The aim of this article is to determine which of these models should be used to assist searchers in the systems we develop. To evaluate these models we used searcher simulations that afforded us more control over the experimental conditions than experiments with human subjects and allowed complex interaction to be modeled without the need for costly human experimentation. The simulation-based evaluation methodology measures how well the models learn the distribution of terms across relevant documents (i.e., learn what information is relevant) and how well they improve search effectiveness (i.e., create effective search queries). Our findings show that an implicit feedback model based on Jeffrey's rule of conditioning outperformed the other models under investigation.
Geographic information retrieval in a mobile environment: evaluating the needs of mobile individuals
This paper describes research that aims to define the information needs of mobile individuals, to implement a mobile information system that can satisfy those needs, and finally to evaluate the performance of that system with end-users. First, a review of the emerging discipline of geographic information retrieval (GIR) is presented as background to the more specific issue of mobile information retrieval. Following this, a user needs study is described evaluating the requirements of potential users of a mobile information system; the study finds that there is a strong geographic component to users' information needs. Next, four geographic post-query filters are described which attempt to represent the region of space associated with an individual's query made at some specific spatial location. These filters are spatial proximity (distance in space), temporal proximity (travel time), speed-heading prediction surfaces (likelihood of visiting locations) and visibility (locations that can be seen). Two of these filters, spatial proximity and speed-heading prediction surfaces, are implemented in a mobile information system and subsequently evaluated with users in an outdoor setting. The results of the evaluation suggest that retrieved information to which post-query geographic filters have been applied is considered more relevant than unfiltered information, and that users find information sorted by spatial proximity to be more relevant than that sorted by a prediction surface of likely future locations. The paper closes with a discussion of the wider implications of these results for developers of mobile information systems and location-based services.
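As an illustration of the spatial proximity filter named in the abstract above, here is a minimal sketch of a post-query filter that keeps only results within a given distance of the user and sorts them nearest first. The abstract does not say how distance is computed, so the haversine formula, the function names, the `lat`/`lon` keys, and the distance threshold are all assumptions made for this example and not the paper's implementation.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def spatial_proximity_filter(user_pos, results, max_km):
    """Keep results within max_km of user_pos (lat, lon), sorted nearest first.

    `results` is a list of dicts with hypothetical 'lat' and 'lon' keys.
    """
    scored = [
        (haversine_km(user_pos[0], user_pos[1], r["lat"], r["lon"]), r)
        for r in results
    ]
    kept = [(d, r) for d, r in scored if d <= max_km]
    kept.sort(key=lambda pair: pair[0])
    return [r for _, r in kept]


# Hypothetical retrieved items, deliberately not in distance order.
sample_results = [
    {"name": "b", "lat": 0.0, "lon": 0.05},
    {"name": "c", "lat": 0.0, "lon": 1.0},
    {"name": "a", "lat": 0.0, "lon": 0.01},
]
nearby = spatial_proximity_filter((0.0, 0.0), sample_results, max_km=10.0)
```

A temporal proximity filter could reuse the same structure with travel time substituted for distance, although the abstract gives no detail on how travel time was estimated.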
Machine Learning in Automated Text Categorization
The automated categorization (or classification) of texts into predefined categories has witnessed booming interest in the last ten years, due to the increased availability of documents in digital form and the ensuing need to organize them. In the research community the dominant approach to this problem is based on machine learning techniques: a general inductive process automatically builds a classifier by learning, from a set of preclassified documents, the characteristics of the categories. The advantages of this approach over the knowledge engineering approach (consisting in the manual definition of a classifier by domain experts) are very good effectiveness, considerable savings in terms of expert manpower, and straightforward portability to different domains. This survey discusses the main approaches to text categorization that fall within the machine learning paradigm. We will discuss in detail issues pertaining to three different problems, namely document representation, classifier construction, and classifier evaluation.
Comment: Accepted for publication in ACM Computing Surveys
A Markov chain model for changes in users’ assessment of search results
Previous research shows that users tend to change their assessment of search results over time. This is a first study that investigates the factors and reasons for these changes, and describes a stochastic model of user behaviour that may explain these changes. In particular, we hypothesise that most of the changes are local, i.e. between results with similar or close relevance to the query, and thus belong to the same “coarse” relevance category. According to the theory of coarse beliefs and categorical thinking, humans tend to divide the range of values under consideration into coarse categories, and are thus able to distinguish only between cross-category values but not within them. To test this hypothesis we conducted five experiments with about 120 subjects divided into 3 groups. Each student in every group was asked to rank and assign relevance scores to the same set of search results over two or three rounds, with a period of three to nine weeks between each round. The subjects of the last three-round experiment were then exposed to the differences in their judgements and were asked to explain them. We make use of a Markov chain model to measure change in users’ judgements between the different rounds. The Markov chain demonstrates that the changes converge, and that a majority of the changes are local to a neighbouring relevance category. We found that most of the subjects were satisfied with their changes, and did not perceive them as mistakes but rather as a legitimate phenomenon, since they believe that time has influenced their relevance assessment. Both our quantitative analysis and user comments support the hypothesis of the existence of coarse relevance categories resulting from categorical thinking in the context of user evaluation of search results.
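To make the Markov chain idea in the abstract above concrete, the following minimal sketch estimates a transition matrix from relevance categories assigned in two rounds and measures what share of the changes are local, that is, moves to a neighbouring category. The four-point category scale, the function names, and the sample judgements are hypothetical; the abstract does not give the study's data or its exact model.

```python
def transition_matrix(round_a, round_b, n_categories):
    """Row-normalised counts of moves from category i (round A) to j (round B)."""
    counts = [[0] * n_categories for _ in range(n_categories)]
    for i, j in zip(round_a, round_b):
        counts[i][j] += 1
    matrix = []
    for idx, row in enumerate(counts):
        total = sum(row)
        if total == 0:
            # Category never used in round A: assume a self-loop so the row sums to 1.
            matrix.append([1.0 if k == idx else 0.0 for k in range(n_categories)])
        else:
            matrix.append([c / total for c in row])
    return matrix


def local_change_share(round_a, round_b):
    """Among judgements that changed, the share that moved to a neighbouring category."""
    changes = [abs(i - j) for i, j in zip(round_a, round_b) if i != j]
    if not changes:
        return 0.0
    return sum(1 for c in changes if c == 1) / len(changes)


def step(distribution, matrix):
    """Advance a distribution over categories by one round of the chain."""
    n = len(matrix)
    return [sum(distribution[i] * matrix[i][j] for i in range(n)) for j in range(n)]


# Hypothetical judgements on a 0-3 relevance scale for eight results.
first_round = [0, 0, 1, 1, 2, 2, 2, 3]
second_round = [0, 1, 1, 2, 2, 2, 3, 1]

P = transition_matrix(first_round, second_round, n_categories=4)
share = local_change_share(first_round, second_round)
after_one_round = step([1.0, 0.0, 0.0, 0.0], P)
```

Applying `step` repeatedly to a starting distribution is one simple way to inspect whether the distribution settles down over rounds, which is the kind of convergence the abstract reports.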
Information retrieval and text mining technologies for chemistry
Efficient access to chemical information contained in scientific literature, patents, technical reports, or the web is a pressing need shared by researchers and patent attorneys from different chemical disciplines. Retrieval of important chemical information in most cases starts with finding relevant documents for a particular chemical compound or family. Targeted retrieval of chemical documents is closely connected to the automatic recognition of chemical entities in the text, which commonly involves the extraction of the entire list of chemicals mentioned in a document, including any associated information. In this Review, we provide a comprehensive and in-depth description of fundamental concepts, technical implementations, and current technologies for meeting these information demands. A strong focus is placed on community challenges addressing systems performance, more particularly CHEMDNER and CHEMDNER patents tasks of BioCreative IV and V, respectively. Considering the growing interest in the construction of automatically annotated chemical knowledge bases that integrate chemical information and biological data, cheminformatics approaches for mapping the extracted chemical names into chemical structures and their subsequent annotation together with text mining applications for linking chemistry with biological information are also presented. Finally, future trends and current challenges are highlighted as a roadmap proposal for research in this emerging field.
A.V. and M.K. acknowledge funding from the European Community’s Horizon 2020 Program (project reference: 654021 - OpenMinted). M.K. additionally acknowledges the Encomienda MINETAD-CNIO as part of the Plan for the Advancement of Language Technology. O.R. and J.O. thank the Foundation for Applied Medical Research (FIMA), University of Navarra (Pamplona, Spain). This work was partially funded by Consellería de Cultura, Educación e Ordenación Universitaria (Xunta de Galicia), and FEDER (European Union), and the Portuguese Foundation for Science and Technology (FCT) under the scope of the strategic funding of UID/BIO/04469/2013 unit and COMPETE 2020 (POCI-01-0145-FEDER-006684). We thank Iñigo García-Yoldi for useful feedback and discussions during the preparation of the manuscript.
Mapping recent information behavior research: an analysis of co-authorship and cocitation networks
There has been an increase in research published on information behavior in recent years, and this has been accompanied by an increase in its diversity and interaction with other fields, particularly information retrieval (IR). The aims of this study are to determine which researchers have contributed to producing the current body of knowledge on this subject, and to describe its intellectual basis. A bibliometric and network analysis was applied to authorship and co-authorship as well as citation and co-citation. According to these analyses, there is a small number of authors who can be considered to be the most productive and who publish regularly, and a large number of transient ones. Other findings reveal a marked predominance of theoretical works, some examples of qualitative methodology that originate in other areas of social science, and a high incidence of research focused on user interaction with information retrieval systems and the information behavior of doctors.
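Configuração epistemológica da Ciência da Informação na literatura periódica Brasileira por meio de análise de citações (1972-2008) [Epistemological configuration of Information Science in Brazilian periodical literature through citation analysis (1972-2008)]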
- …
