Statistical Significance Testing in Information Retrieval: An Empirical Analysis of Type I, Type II and Type III Errors
Statistical significance testing is widely accepted as a means to assess how
well a difference in effectiveness reflects an actual difference between
systems, as opposed to random noise because of the selection of topics.
According to recent surveys on SIGIR, CIKM, ECIR and TOIS papers, the t-test is
the most popular choice among IR researchers. However, previous work has
suggested computer-intensive tests like the bootstrap or the permutation test,
based mainly on theoretical arguments. On empirical grounds, others have
suggested non-parametric alternatives such as the Wilcoxon test. Indeed, the
question of which tests we should use has accompanied IR and related fields for
decades now. Previous theoretical studies on this matter were limited in that
we know that test assumptions are not met in IR experiments, and empirical
studies were limited in that we do not have the necessary control over the null
hypotheses to compute actual Type I and Type II error rates under realistic
conditions. Therefore, not only is it unclear which test to use, but also how
much trust we should put in them. In contrast to past studies, in this paper we
employ a recent simulation methodology based on TREC data to get around these
limitations. Our study comprises over 500 million p-values computed for a range
of tests, systems, effectiveness measures, topic set sizes and effect sizes,
and for both the 2-tail and 1-tail cases. Having such a large supply of IR
evaluation data with full knowledge of the null hypotheses, we are finally in a
position to evaluate how well statistical significance tests really behave with
IR data, and make sound recommendations for practitioners.Comment: 10 pages, 6 figures, SIGIR 201
The Semantic Web MIDI Tape: An Interface for Interlinking MIDI and Context Metadata
The Linked Data paradigm has been used to publish a large number of musical datasets and ontologies on the Semantic Web, such as MusicBrainz, AcousticBrainz, and the Music Ontology. Recently, the MIDI Linked Data Cloud has been added to these datasets, representing more than 300,000 pieces in MIDI format as Linked Data, opening up the possibility for linking fine-grained symbolic music representations to existing music metadata databases. Despite the dataset making MIDI resources available in Web data standard formats such as RDF and SPARQL, the important issue of finding meaningful links between these MIDI resources and relevant contextual metadata in other datasets remains. A fundamental barrier for the provision and generation of such links is the difficulty that users have in adding new MIDI performance data and metadata to the platform. In this paper, we propose the Semantic Web MIDI Tape, a set of tools and an associated interface for interacting with the MIDI Linked Data Cloud by enabling users to record, enrich, and retrieve MIDI performance data and related metadata in native Web data standards. The goal of such interactions is to find meaningful links between published MIDI resources and their relevant contextual metadata. We evaluate the Semantic Web MIDI Tape in various use cases involving user-contributed content, MIDI similarity querying, and entity recognition methods, and discuss their potential for finding links between MIDI resources and metadata.
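The kind of link the abstract describes can be pictured as triple-pattern matching over RDF-style data. The sketch below is a toy illustration in plain Python; all URIs are invented placeholders, not actual MIDI Linked Data Cloud or MusicBrainz identifiers.

```python
# Hypothetical triples linking a MIDI resource to contextual metadata;
# the identifiers below are illustrative, not real dataset IRIs.
TRIPLES = [
    ("midi:track/123", "rdfs:label", "Blue Monk"),
    ("midi:track/123", "owl:sameAs", "musicbrainz:work/abc"),
    ("musicbrainz:work/abc", "dct:creator", "Thelonious Monk"),
]

def match(pattern, triples):
    """Return triples matching an (s, p, o) pattern; None is a wildcard."""
    return [t for t in triples
            if all(p is None or p == v for p, v in zip(pattern, t))]

# Follow an owl:sameAs link from a MIDI track out to external metadata
links = match(("midi:track/123", "owl:sameAs", None), TRIPLES)
```

In a real deployment the same pattern would be expressed as a SPARQL basic graph pattern against the published endpoint rather than an in-memory list.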
MelodyShape at MIREX 2014 Symbolic Melodic Similarity
This short paper describes our three submissions to the 2014 edition of the MIREX Symbolic Melodic Similarity task. All three submissions rely on a geometric model that represents melodies as spline curves in the pitch-time plane. The similarity between two melodies is then computed with a sequence alignment algorithm between sequences of spline spans: the more similar the shape of the curves, the more similar the melodies they represent. As in the previous MIREX 2010, 2011, 2012 and 2013 editions, our systems ranked first for all effectiveness measures. The main difference with last year is that we submitted a re-implementation of all algorithms, contained in the new open source library MelodyShape.
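The shape-alignment idea can be illustrated with a much-simplified stand-in: instead of spline spans, align the sequences of pitch intervals with a Needleman-Wunsch-style dynamic program, so that two melodies moving in similar contours score a low cost. This is a didactic sketch under that simplification, not the MelodyShape algorithm itself.

```python
def alignment_cost(mel_a, mel_b, mismatch_cost=1.0, gap_cost=1.0):
    """Align pitch-interval sequences of two melodies (lower cost = more similar).

    A crude stand-in for comparing spline-span shapes: melodies that
    move by similar intervals align at low cost.
    """
    ints_a = [b - a for a, b in zip(mel_a, mel_a[1:])]
    ints_b = [b - a for a, b in zip(mel_b, mel_b[1:])]
    n, m = len(ints_a), len(ints_b)
    # dp[i][j] = min cost to align first i intervals of A with first j of B
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap_cost
    for j in range(1, m + 1):
        dp[0][j] = j * gap_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = dp[i-1][j-1] + mismatch_cost * abs(ints_a[i-1] - ints_b[j-1])
            dp[i][j] = min(sub, dp[i-1][j] + gap_cost, dp[i][j-1] + gap_cost)
    return dp[n][m]

# MIDI pitch sequences: the second melody is a transposition of the first,
# so the interval contours match exactly and the alignment cost is zero.
cost = alignment_cost([60, 62, 64, 65], [62, 64, 66, 67])
```

Working on intervals makes the measure transposition-invariant for free, one of the properties a melodic similarity measure usually needs.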
One Deep Music Representation to Rule Them All? : A comparative analysis of different representation learning strategies
Inspired by the success of deploying deep learning in the fields of Computer
Vision and Natural Language Processing, this learning paradigm has also found
its way into the field of Music Information Retrieval. In order to benefit from
deep learning in an effective, but also efficient manner, deep transfer
learning has become a common approach. In this approach, it is possible to
reuse the output of a pre-trained neural network as the basis for a new
learning task. The underlying hypothesis is that if the initial and new
learning tasks show commonalities and are applied to the same type of input
data (e.g. music audio), the generated deep representation of the data is also
informative for the new task. Since, however, most of the networks used to
generate deep representations are trained using a single initial learning
source, their representation is unlikely to be informative for all possible
future tasks. In this paper, we present the results of our investigation of
what are the most important factors to generate deep representations for the
data and learning tasks in the music domain. We conducted this investigation
via an extensive empirical study that involves multiple learning sources, as
well as multiple deep learning architectures with varying levels of information
sharing between sources, in order to learn music representations. We then
validate these representations considering multiple target datasets for
evaluation. The results of our experiments yield several insights on how to
approach the design of methods for learning widely deployable deep data
representations in the music domain.
Comment: This work has been accepted to "Neural Computing and Applications:
Special Issue on Deep Learning for Music and Audio".
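The transfer-learning recipe the abstract describes (freeze a pre-trained network, train only a cheap model on its representations) can be sketched with a toy stand-in. Here a fixed deterministic function plays the role of the frozen penultimate layer, and a nearest-centroid classifier is the new-task model; the data and "weights" are invented for illustration.

```python
import math

# Stand-in for a pre-trained network: a fixed mapping from raw inputs to a
# "deep" representation.  In practice this would be the penultimate-layer
# output of a network trained on some initial learning source.
def pretrained_embed(x):
    # frozen "weights": arbitrary but fixed, never updated for the new task
    w = [(0.9, -0.2), (0.1, 1.1)]
    return [math.tanh(a * x[0] + b * x[1]) for a, b in w]

def fit_centroids(inputs, labels):
    """Train only the cheap downstream model on the frozen representations."""
    sums, counts = {}, {}
    for x, y in zip(inputs, labels):
        z = pretrained_embed(x)
        s = sums.setdefault(y, [0.0] * len(z))
        for i, v in enumerate(z):
            s[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in s] for y, s in sums.items()}

def predict(x, centroids):
    z = pretrained_embed(x)
    return min(centroids,
               key=lambda y: sum((a - b) ** 2 for a, b in zip(z, centroids[y])))

# Toy target task: two clusters the frozen embedding keeps separable
X = [(1.0, 0.0), (0.9, 0.1), (-1.0, 0.0), (-0.9, -0.1)]
y = ["up", "up", "down", "down"]
centroids = fit_centroids(X, y)
pred = predict((0.8, 0.0), centroids)
```

The paper's question is precisely when this works: if the frozen representation was shaped by a single learning source, it may not keep the structure a new task needs, which motivates training on multiple sources.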
Information retrieval systems adapted to the biomedical domain
The terminology used in biomedicine has lexical characteristics that have required the elaboration of terminological resources and information retrieval systems with specific functionalities. The main characteristics are the high rates of synonymy and homonymy, due to phenomena such as the proliferation of polysemic acronyms and their interaction with common language. Information retrieval systems in the biomedical domain use techniques oriented to the treatment of these lexical peculiarities. In this paper we review some of these techniques, such as the application of Natural Language Processing (BioNLP), the incorporation of lexical-semantic resources, and the application of Named Entity Recognition (BioNER). Finally, we present the evaluation methods adopted to assess the suitability of these techniques for retrieving biomedical resources.
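The synonymy and acronym problems described above are commonly handled by query expansion against a lexical resource. The sketch below uses a tiny hand-made lexicon; the entries are illustrative stand-ins, not drawn from a real terminology such as UMLS or MeSH.

```python
# Toy lexical resource: acronym expansions and synonyms of the kind
# biomedical IR systems must resolve (entries are illustrative only).
LEXICON = {
    "mi": ["myocardial infarction", "heart attack"],
    "heart attack": ["myocardial infarction", "mi"],
    "aspirin": ["acetylsalicylic acid"],
}

def expand_query(query):
    """Return the query term plus its known synonyms/expansions."""
    terms = [query.lower()]
    for variant in LEXICON.get(query.lower(), []):
        if variant not in terms:
            terms.append(variant)
    return terms

expanded = expand_query("MI")
```

Note that expansion also exposes the homonymy problem: "MI" could equally stand for other concepts, which is why real systems disambiguate acronyms in context before expanding.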
NethServer Configuration for Service Administration in Network Infrastructures
This article documents various services for network infrastructure management using the GNU/Linux operating system. It begins with the installation of NethServer, which allows the configuration of services such as DHCP for the automatic assignment of IP addresses on the network, DNS (the domain name system) for the server, and the implementation of a domain controller as a security measure for administering users on the network. In addition, it addresses the implementation of services that establish an adequate level of security for the infrastructure, followed by the configuration of a proxy and firewall in order to monitor Internet access and prevent unauthorized access to the network. Finally, File Server and Print Server services are integrated in order to share resources within the implemented infrastructure.