7 research outputs found
Evaluating Information Retrieval and Access Tasks
This open access book summarizes the first two decades of the NII Testbeds and Community for Information access Research (NTCIR). NTCIR is a series of evaluation forums run by a global team of researchers and hosted by the National Institute of Informatics (NII), Japan. The book is unique in that it discusses not just what was done at NTCIR, but also how it was done and the impact it has achieved. For example, in some chapters the reader sees the early seeds of what eventually grew to be the search engines that provide access to content on the World Wide Web, today’s smartphones that can tailor what they show to the needs of their owners, and the smart speakers that enrich our lives at home and on the move. We also get glimpses into how new search engines can be built for mathematical formulae, or for the digital record of a lived human life. Key to the success of the NTCIR endeavor was early recognition that information access research is an empirical discipline and that evaluation therefore lay at the core of the enterprise. Evaluation is thus at the heart of each chapter in this book. They show, for example, how the recognition that some documents are more important than others has shaped thinking about evaluation design. The thirty-three contributors to this volume speak for the many hundreds of researchers from dozens of countries around the world who together shaped NTCIR as organizers and participants. This book is suitable for researchers, practitioners, and students—anyone who wants to learn about past and present evaluation efforts in information retrieval, information access, and natural language processing, as well as those who want to participate in an evaluation task or even to design and organize one
Cross-language Information Retrieval
Two key assumptions shape the usual view of ranked retrieval: (1) that the
searcher can choose words for their query that might appear in the documents
that they wish to see, and (2) that ranking retrieved documents will suffice
because the searcher will be able to recognize those which they wished to find.
When the documents to be searched are in a language not known by the searcher,
neither assumption is true. In such cases, Cross-Language Information Retrieval
(CLIR) is needed. This chapter reviews the state of the art for CLIR and
outlines some open research questions.Comment: 49 pages, 0 figure
Making Presentation Math Computable
This Open-Access-book addresses the issue of translating mathematical expressions from LaTeX to the syntax of Computer Algebra Systems (CAS). Over the past decades, especially in the domain of Sciences, Technology, Engineering, and Mathematics (STEM), LaTeX has become the de-facto standard to typeset mathematical formulae in publications. Since scientists are generally required to publish their work, LaTeX has become an integral part of today's publishing workflow. On the other hand, modern research increasingly relies on CAS to simplify, manipulate, compute, and visualize mathematics. However, existing LaTeX import functions in CAS are limited to simple arithmetic expressions and are, therefore, insufficient for most use cases. Consequently, the workflow of experimenting and publishing in the Sciences often includes time-consuming and error-prone manual conversions between presentational LaTeX and computational CAS formats. To address the lack of a reliable and comprehensive translation tool between LaTeX and CAS, this thesis makes the following three contributions. First, it provides an approach to semantically enhance LaTeX expressions with sufficient semantic information for translations into CAS syntaxes. Second, it demonstrates the first context-aware LaTeX to CAS translation framework LaCASt. Third, the thesis provides a novel approach to evaluate the performance for LaTeX to CAS translations on large-scaled datasets with an automatic verification of equations in digital mathematical libraries. This is an open access book
Making Presentation Math Computable
This Open-Access-book addresses the issue of translating mathematical expressions from LaTeX to the syntax of Computer Algebra Systems (CAS). Over the past decades, especially in the domain of Sciences, Technology, Engineering, and Mathematics (STEM), LaTeX has become the de-facto standard to typeset mathematical formulae in publications. Since scientists are generally required to publish their work, LaTeX has become an integral part of today's publishing workflow. On the other hand, modern research increasingly relies on CAS to simplify, manipulate, compute, and visualize mathematics. However, existing LaTeX import functions in CAS are limited to simple arithmetic expressions and are, therefore, insufficient for most use cases. Consequently, the workflow of experimenting and publishing in the Sciences often includes time-consuming and error-prone manual conversions between presentational LaTeX and computational CAS formats. To address the lack of a reliable and comprehensive translation tool between LaTeX and CAS, this thesis makes the following three contributions. First, it provides an approach to semantically enhance LaTeX expressions with sufficient semantic information for translations into CAS syntaxes. Second, it demonstrates the first context-aware LaTeX to CAS translation framework LaCASt. Third, the thesis provides a novel approach to evaluate the performance for LaTeX to CAS translations on large-scaled datasets with an automatic verification of equations in digital mathematical libraries. This is an open access book
Report from Dagstuhl Seminar 23031: Frontiers of Information Access Experimentation for Research and Education
This report documents the program and the outcomes of Dagstuhl Seminar 23031
``Frontiers of Information Access Experimentation for Research and Education'',
which brought together 37 participants from 12 countries.
The seminar addressed technology-enhanced information access (information
retrieval, recommender systems, natural language processing) and specifically
focused on developing more responsible experimental practices leading to more
valid results, both for research as well as for scientific education.
The seminar brought together experts from various sub-fields of information
access, namely IR, RS, NLP, information science, and human-computer interaction
to create a joint understanding of the problems and challenges presented by
next generation information access systems, from both the research and the
experimentation point of views, to discuss existing solutions and impediments,
and to propose next steps to be pursued in the area in order to improve not
also our research methods and findings but also the education of the new
generation of researchers and developers.
The seminar featured a series of long and short talks delivered by
participants, who helped in setting a common ground and in letting emerge
topics of interest to be explored as the main output of the seminar. This led
to the definition of five groups which investigated challenges, opportunities,
and next steps in the following areas: reality check, i.e. conducting
real-world studies, human-machine-collaborative relevance judgment frameworks,
overcoming methodological challenges in information retrieval and recommender
systems through awareness and education, results-blind reviewing, and guidance
for authors.Comment: Dagstuhl Seminar 23031, report
Cheap IR Evaluation: Fewer Topics, No Relevance Judgements, and Crowdsourced Assessments
To evaluate Information Retrieval (IR) effectiveness, a possible approach is
to use test collections, which are composed of a collection of documents, a set
of description of information needs (called topics), and a set of relevant
documents to each topic. Test collections are modelled in a competition
scenario: for example, in the well known TREC initiative, participants run
their own retrieval systems over a set of topics and they provide a ranked list
of retrieved documents; some of the retrieved documents (usually the first
ranked) constitute the so called pool, and their relevance is evaluated by
human assessors; the document list is then used to compute effectiveness
metrics and rank the participant systems. Private Web Search companies also run
their in-house evaluation exercises; although the details are mostly unknown,
and the aims are somehow different, the overall approach shares several issues
with the test collection approach.
The aim of this work is to: (i) develop and improve some state-of-the-art
work on the evaluation of IR effectiveness while saving resources, and (ii)
propose a novel, more principled and engineered, overall approach to test
collection based effectiveness evaluation.
[...