Evaluating Information Retrieval and Access Tasks
This open access book summarizes the first two decades of the NII Testbeds and Community for Information access Research (NTCIR). NTCIR is a series of evaluation forums run by a global team of researchers and hosted by the National Institute of Informatics (NII), Japan. The book is unique in that it discusses not just what was done at NTCIR, but also how it was done and the impact it has achieved. For example, in some chapters the reader sees the early seeds of what eventually grew to be the search engines that provide access to content on the World Wide Web, today’s smartphones that can tailor what they show to the needs of their owners, and the smart speakers that enrich our lives at home and on the move. We also get glimpses into how new search engines can be built for mathematical formulae, or for the digital record of a lived human life. Key to the success of the NTCIR endeavor was early recognition that information access research is an empirical discipline and that evaluation therefore lay at the core of the enterprise. Evaluation is thus at the heart of each chapter in this book. The chapters show, for example, how the recognition that some documents are more important than others has shaped thinking about evaluation design. The thirty-three contributors to this volume speak for the many hundreds of researchers from dozens of countries around the world who together shaped NTCIR as organizers and participants. This book is suitable for researchers, practitioners, and students: anyone who wants to learn about past and present evaluation efforts in information retrieval, information access, and natural language processing, as well as those who want to participate in an evaluation task or even to design and organize one.
ir_metadata: An Extensible Metadata Schema for IR Experiments
The information retrieval (IR) community has a strong tradition of making the
computational artifacts and resources available for future reuse, allowing the
validation of experimental results. Besides the actual test collections, the
underlying run files are often hosted in data archives as part of conferences
like TREC, CLEF, or NTCIR. Unfortunately, the run data itself does not provide
much information about the underlying experiment. For instance, the single run
file is not of much use without the context of the shared task's website or the
run data archive. In other domains, like the social sciences, it is good
practice to annotate research data with metadata. In this work, we introduce
ir_metadata - an extensible metadata schema for TREC run files based on the
PRIMAD model. We propose to align the metadata annotations to PRIMAD, which
considers components of computational experiments that can affect
reproducibility. Furthermore, we outline important components and information
that should be reported in the metadata and give evidence from the literature.
To demonstrate the usefulness of these metadata annotations, we implement new
features in repro_eval that support the outlined metadata schema for the use
case of reproducibility studies. Additionally, we curate a dataset with run
files derived from experiments with different instantiations of PRIMAD
components and annotate these with the corresponding metadata. In the
experiments, we cover reproducibility experiments that are identified by the
metadata and classified by PRIMAD. With this work, we enable IR researchers to
annotate TREC run files and improve the reuse value of experimental artifacts
even further.
Comment: Resource paper
Joint Upper & Lower Bound Normalization for IR Evaluation
In this paper, we present a novel perspective towards IR evaluation by
proposing a new family of evaluation metrics where the existing popular metrics
(e.g., nDCG, MAP) are customized by introducing a query-specific lower-bound
(LB) normalization term. While original nDCG, MAP etc. metrics are normalized
in terms of their upper bounds based on an ideal ranked list, a corresponding
LB normalization for them has not yet been studied. Specifically, we introduce
two different variants of the proposed LB normalization, where the lower bound
is estimated from a randomized ranking of the corresponding documents present
in the evaluation set. We next conducted two case studies by instantiating the
new framework for two popular IR evaluation metrics (with two variants each,
e.g., DCG_UL_V1,2 and MSP_UL_V1,2) and then comparing against the traditional
metrics without the proposed LB normalization. Experiments on two different data-sets
with eight Learning-to-Rank (LETOR) methods demonstrate the following
properties of the new LB normalized metrics: 1) Statistically significant
differences (between two methods) in terms of the original metric no longer
remain statistically significant in terms of the Upper Lower (UL) Bound
normalized version and vice versa, especially for uninformative query-sets. 2) When compared
against the original metric, our proposed UL normalized metrics demonstrate
higher Discriminatory Power and better Consistency across different data-sets.
These findings suggest that the IR community should consider UL normalization
seriously when computing nDCG and MAP, and that a more in-depth study of UL
normalization for general IR evaluation is warranted.
Comment: 26 pages, 3 figures
On Term Selection Techniques for Patent Prior Art Search
A patent is a set of exclusive rights granted to an inventor to protect his
invention for a limited period of time. Patent prior art search involves
finding previously granted patents, scientific articles, product descriptions,
or any other published work that may be relevant to a new patent application.
Many well-known information retrieval (IR) techniques (e.g., typical query
expansion methods), which are proven effective for ad hoc search, are
unsuccessful for patent prior art search. In this thesis, we mainly investigate
the reasons that generic IR techniques are not effective for prior art search
on the CLEF-IP test collection. First, we analyse the errors caused by data
curation and experimental settings like applying International Patent
Classification codes assigned to the patent topics to filter the search
results. Then, we investigate the influence of term selection on retrieval
performance on the CLEF-IP prior art test collection, starting with the
description section of the reference patent and using language models (LM) and
BM25 scoring functions. We find that an oracular relevance feedback system,
which extracts terms from the judged relevant documents, far outperforms the
baseline (i.e., 0.11 vs. 0.48) and performs twice as well on mean average
precision (MAP) as the best participant in CLEF-IP 2010 (i.e., 0.22 vs. 0.48).
We find a very clear term selection value threshold for use when choosing
terms. We also notice that most of the useful feedback terms are actually
present in the original query and hypothesise that the baseline system can be
substantially improved by removing negative query terms. We try four simple
automated approaches to identify negative terms for query reduction, but we are
unable to improve on the baseline performance with any of them. However, we
show that a simple, minimal feedback interactive approach, where terms are
selected from only the first retrieved relevant document, outperforms the best
result from CLEF-IP 2010, suggesting the promise of interactive methods for
term selection in patent prior art search.
Sounding Together
Sounding Together: Collaborative Perspectives on U.S. Music in the Twenty-First Century is a multi-authored, collaboratively conceived book of essays that tackles key challenges facing scholars studying music of the United States in the early twenty-first century. This book encourages scholars in music circles and beyond to explore the intersections between social responsibility, community engagement, and academic practices through the simple act of working together. The book’s essays, written by a diverse and cross-generational group of scholars, performers, and practitioners, demonstrate how collaboration can harness complementary skills and nourish comparative boundary-crossing through interdisciplinary research. The chapters of the volume address issues of race, nationalism, mobility, cultural domination, and identity, as well as the crisis of the Trump era and the political power of music. Each contribution to the volume is written collaboratively by two scholars, bringing together contributors who represent a mix of career stages and positions. Through the practice of and reflection on collaboration, Sounding Together breaks out of long-established paradigms of solitude in humanities scholarship and works toward social justice in the study of music.
Risk Mitigation, Vulnerability Management and Resilience under Disasters
The Special Issue (SI) discusses the topic of Disaster Risk Management and its cornerstones: vulnerability reduction and resilience building. The focus of the SI is the impact of risk information, communication and representation, risk knowledge as related to science and practice, risk perception and awareness, and risk culture on multi-faceted vulnerability and several aspects of resilience.
Linking Textual Resources to Support Information Discovery
A vast amount of information is today stored in the form of textual documents, many of which are available online. These documents come from different sources and are of different types. They include newspaper articles, books, corporate reports, encyclopedia entries and research papers. At a semantic level, these documents contain knowledge, which was created by explicitly connecting information and expressing it in the form of a natural language. However, a significant amount of knowledge is not explicitly stated in a single document, yet can be derived or discovered by researching, i.e. accessing, comparing, contrasting and analysing, information from multiple documents. Carrying out this work using traditional search interfaces is tedious due to information overload and the difficulty of formulating queries that would help us to discover information we are not aware of.
In order to support this exploratory process, we need to be able to effectively navigate between related pieces of information across documents. While information can be connected using manually curated cross-document links, this approach not only does not scale, but cannot systematically assist us in the discovery of sometimes non-obvious (hidden) relationships. Consequently, there is a need for automatic approaches to link discovery.
This work studies how people link content, investigates the properties of different link types, presents new methods for automatic link discovery, and designs a system in which link discovery is applied to a collection of millions of documents to improve access to public knowledge.
Geological and Mineralogical Sequestration of CO2
The rapid increase in atmospheric concentrations of anthropogenically generated greenhouse gases (primarily CO2) is responsible for global warming and ocean acidification. The Intergovernmental Panel on Climate Change (IPCC) indicates that carbon capture and storage (CCS) techniques are a necessary measure to reduce greenhouse gas emissions in the short-to-medium term. One of the technological solutions is the long-term storage of CO2 in appropriate geological formations, such as deep saline formations and depleted oil and gas reservoirs. Promising alternative options that guarantee the permanent capture of CO2, although on a smaller scale, are the in-situ and ex-situ fixation of CO2 in the form of inorganic carbonates via the carbonation of mafic and ultramafic rocks and of Mg/Ca-rich fly ash, iron and steel slags, cement waste, and mine tailings. Within this general framework, this Special Issue collects articles covering various aspects of recent scientific advances in the geological and mineralogical sequestration of CO2. In particular, it includes the assessment of the storage potential of candidate injection sites in Croatia, Greece, and Norway; numerical modelling of geochemical–mineralogical reactions and CO2 flow; studies of natural analogues providing information on the processes and the physical–chemical conditions characterizing serpentinite carbonation; and experimental investigations to better understand the effectiveness and mechanisms of geological and mineralogical CO2 sequestration.
Internet of Things. Information Processing in an Increasingly Connected World
This open access book constitutes the refereed post-conference proceedings of the First IFIP International Cross-Domain Conference on Internet of Things, IFIPIoT 2018, held at the 24th IFIP World Computer Congress, WCC 2018, in Poznan, Poland, in September 2018. The 12 full papers presented were carefully reviewed and selected from 24 submissions. Also included in this volume are 4 WCC 2018 plenary contributions, an invited talk, and a position paper from the IFIP domain committee on IoT. The papers cover a wide range of topics from a technology to a business perspective and include, among others, hardware, software and management aspects, process innovation, privacy, power consumption, architecture, and applications.