A reproducible approach with R markdown to automatic classification of medical certificates in French
In this paper, we report the ongoing developments of our first participation in the Cross-Language Evaluation Forum (CLEF) eHealth Task 1: “Multilingual Information Extraction - ICD10 coding” (Névéol et al., 2017). The task consists of labelling death certificates, written in French, with international standard codes. In particular, we pursue the goal of the ‘Replication track’ of this Task, which promotes the sharing of tools and the dissemination of solid, reproducible results.
Induced Magnetic Ordering by Proton Irradiation in Graphite
We provide evidence that 2.25 MeV proton irradiation of highly oriented
pyrolytic graphite samples triggers ferro- or ferrimagnetism.
Measurements performed with a superconducting quantum interference device
(SQUID) and magnetic force microscopy (MFM) reveal that the magnetic ordering
is stable at room temperature.
Comment: 3 Figures
Measuring reproducibility of high-throughput experiments
Reproducibility is essential to reliable scientific discovery in
high-throughput experiments. In this work we propose a unified approach to
measure the reproducibility of findings identified from replicate experiments
and identify putative discoveries using reproducibility. Unlike the usual
scalar measures of reproducibility, our approach creates a curve, which
quantitatively assesses when the findings are no longer consistent across
replicates. Our curve is fitted by a copula mixture model, from which we derive
a quantitative reproducibility score, which we call the "irreproducible
discovery rate" (IDR) analogous to the FDR. This score can be computed at each
set of paired replicate ranks and permits the principled setting of thresholds
both for assessing reproducibility and combining replicates. Since our approach
permits an arbitrary scale for each replicate, it provides useful descriptive
measures in a wide variety of situations to be explored. We study the
performance of the algorithm using simulations and give a heuristic analysis of
its theoretical properties. We demonstrate the effectiveness of our method in a
ChIP-seq experiment.
Comment: Published at http://dx.doi.org/10.1214/11-AOAS466 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org).
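The curve-based idea in this abstract can be sketched in a few lines. The helper below is an illustrative toy, not the authors' implementation (it omits the copula mixture model and the IDR score itself): for increasing rank thresholds t, it reports the fraction of items ranked in the top t of both replicates, the kind of correspondence curve whose flattening signals where findings stop being consistent across replicates.

```python
import numpy as np

def correspondence_curve(scores_a, scores_b, steps=20):
    """Toy correspondence curve for two replicate experiments.

    For increasing rank thresholds t, compute the fraction of items
    that appear in the top t of *both* replicates. The function name
    and interface are hypothetical, for illustration only.
    """
    n = len(scores_a)
    # Rank items in each replicate (position 0 = highest score).
    order_a = np.argsort(-np.asarray(scores_a))
    order_b = np.argsort(-np.asarray(scores_b))
    curve = []
    for t in np.linspace(n // steps, n, steps, dtype=int):
        top_a = set(order_a[:t])
        top_b = set(order_b[:t])
        curve.append((int(t), len(top_a & top_b) / n))
    return curve
```

Perfectly reproducible replicates trace the diagonal, while disagreeing replicates stay near zero until t approaches n; the IDR method builds its quantitative score on top of this behaviour.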
ir_metadata: An Extensible Metadata Schema for IR Experiments
The information retrieval (IR) community has a strong tradition of making the
computational artifacts and resources available for future reuse, allowing the
validation of experimental results. Besides the actual test collections, the
underlying run files are often hosted in data archives as part of conferences
like TREC, CLEF, or NTCIR. Unfortunately, the run data itself does not provide
much information about the underlying experiment. For instance, a single run
file is not of much use without the context of the shared task's website or the
run data archive. In other domains, like the social sciences, it is good
practice to annotate research data with metadata. In this work, we introduce
ir_metadata - an extensible metadata schema for TREC run files based on the
PRIMAD model. We propose to align the metadata annotations to PRIMAD, which
considers components of computational experiments that can affect
reproducibility. Furthermore, we outline important components and information
that should be reported in the metadata and give evidence from the literature.
To demonstrate the usefulness of these metadata annotations, we implement new
features in repro_eval that support the outlined metadata schema for the use
case of reproducibility studies. Additionally, we curate a dataset with run
files derived from experiments with different instantiations of PRIMAD
components and annotate these with the corresponding metadata. In the
experiments, we cover reproducibility experiments that are identified by the
metadata and classified by PRIMAD. With this work, we enable IR researchers to
annotate TREC run files and improve the reuse value of experimental artifacts
even further.
Comment: Resource paper
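As a sketch of what such annotations look like, the helper below prepends PRIMAD-aligned metadata to the lines of a TREC run file as comment lines and rejects incomplete annotations. The key names and comment syntax are illustrative assumptions, not the normative ir_metadata schema.

```python
def annotate_run(run_lines, metadata):
    """Prepend PRIMAD-aligned metadata as comment lines to a TREC run.

    `metadata` maps each PRIMAD component to a short description.
    The component keys and the '#' comment convention are hypothetical,
    chosen here only to illustrate the annotation idea.
    """
    primad = ("platform", "research_goal", "implementation",
              "method", "actor", "data")
    missing = [k for k in primad if k not in metadata]
    if missing:
        # Refuse partial annotations: every PRIMAD component that can
        # affect reproducibility should be documented.
        raise ValueError(f"missing PRIMAD components: {missing}")
    header = [f"# {k}: {metadata[k]}" for k in primad]
    return header + list(run_lines)
```

A caller would pass standard six-column run lines ("qid Q0 docid rank score tag") plus a description of each PRIMAD component; tools such as repro_eval could then identify which components differ between two annotated runs.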
From Evaluating to Forecasting Performance: How to Turn Information Retrieval, Natural Language Processing and Recommender Systems into Predictive Sciences
We describe the state-of-the-art in performance modeling and prediction for Information Retrieval
(IR), Natural Language Processing (NLP) and Recommender Systems (RecSys) along with its
shortcomings and strengths. We present a framework for further research, identifying five major
problem areas: understanding measures, performance analysis, making underlying assumptions
explicit, identifying application features determining performance, and the development of prediction
models describing the relationship between assumptions, features, and resulting performance.