Towards an Automated Comparison of OpenStreetMap with Authoritative Road Datasets
OpenStreetMap (OSM) is an extraordinarily large and diverse spatial database of the world. Road networks are amongst the most frequently occurring spatial content within the OSM database. These road network representations are usable in many applications. However, the quality of these representations can vary between locations. Comparing OSM road networks with authoritative road datasets for a given area or region is an important task in assessing OSM’s fitness for use in applications like routing and navigation. Such comparisons can be technically challenging, and no software implementation exists that facilitates them easily and automatically. In this article we develop and propose a flexible methodology for comparing the geometry of OSM road network data with other road datasets. Quantitative measures for the completeness and spatial accuracy of OSM are computed, including the compatibility of OSM road data with other map databases. Our methodology provides users with significant flexibility in how they can adjust the parameterization to suit their needs. The software implementation is built exclusively on open source software, and a significant degree of automation is provided for these comparisons. This software can subsequently be extended and adapted for comparison between OSM and other external road datasets.
A First Step Towards Automatically Building Network Representations
To fully harness Grids, users or middlewares must have some knowledge of the
topology of the platform interconnection network. As such knowledge is usually
not available, one must use tools which automatically build a topological
network model through some measurements. In this article, we define a
methodology to assess the quality of these network model building tools, and we
apply this methodology to representatives of the main classes of model builders
and to two new algorithms. We show that none of the main existing techniques
builds models that enable accurate prediction of the running time of simple
application kernels for actual platforms. However, some of the new algorithms we
propose give excellent results in a wide range of situations.
Fully Automated Fact Checking Using External Sources
Given the constantly growing proliferation of false claims online in recent
years, there has also been growing research interest in automatically
distinguishing false rumors from factually true claims. Here, we propose a
general-purpose framework for fully-automatic fact checking using external
sources, tapping the potential of the entire Web as a knowledge source to
confirm or reject a claim. Our framework uses a deep neural network with LSTM
text encoding to combine semantic kernels with task-specific embeddings that
encode a claim together with pieces of potentially-relevant text fragments from
the Web, taking the source reliability into account. The evaluation results
show good performance on two different tasks and datasets: (i) rumor detection
and (ii) fact checking of the answers to a question in community question
answering forums. Comment: RANLP-201
From Word to Sense Embeddings: A Survey on Vector Representations of Meaning
Over the past years, distributed semantic representations have proved to be
effective and flexible keepers of prior knowledge to be integrated into
downstream applications. This survey focuses on the representation of meaning.
We start from the theoretical background behind word vector space models and
highlight one of their major limitations: the meaning conflation deficiency,
which arises from representing a word with all its possible meanings as a
single vector. Then, we explain how this deficiency can be addressed through a
transition from the word level to the more fine-grained level of word senses
(in its broader acceptation) as a method for modelling unambiguous lexical
meaning. We present a comprehensive overview of the wide range of techniques in
the two main branches of sense representation, i.e., unsupervised and
knowledge-based. Finally, this survey covers the main evaluation procedures and
applications for this type of representation, and provides an analysis of four
of its important aspects: interpretability, sense granularity, adaptability to
different domains and compositionality. Comment: 46 pages, 8 figures. Published in Journal of Artificial Intelligence
Researc
Learning Representations of Emotional Speech with Deep Convolutional Generative Adversarial Networks
Automatically assessing emotional valence in human speech has historically
been a difficult task for machine learning algorithms. The subtle changes in
the voice of the speaker that are indicative of positive or negative emotional
states are often "overshadowed" by voice characteristics relating to emotional
intensity or emotional activation. In this work we explore a representation
learning approach that automatically derives discriminative representations of
emotional speech. In particular, we investigate two machine learning strategies
to improve classifier performance: (1) utilization of unlabeled data using a
deep convolutional generative adversarial network (DCGAN), and (2) multitask
learning. Within our extensive experiments we leverage a multitask annotated
emotional corpus as well as a large unlabeled meeting corpus (around 100
hours). Our speaker-independent classification experiments show that the use
of unlabeled data in particular improves the performance of the classifiers,
and that both fully supervised baseline approaches are outperformed
considerably. We improve the classification of emotional valence
on a discrete 5-point scale to 43.88% and on a 3-point scale to 49.80%, which
is competitive with state-of-the-art performance.
Improving the translation environment for professional translators
When using computer-aided translation systems in a typical, professional translation workflow, there are several stages at which there is room for improvement. The SCATE (Smart Computer-Aided Translation Environment) project investigated several of these aspects, both from a human-computer interaction point of view and from a purely technological one.
This paper describes the SCATE research with respect to improved fuzzy matching, parallel treebanks, the integration of translation memories with machine translation, quality estimation, terminology extraction from comparable texts, the use of speech recognition in the translation process, and human-computer interaction and interface design for the professional translation environment. For each of these topics, we describe the experiments we performed and the conclusions drawn, providing an overview of the highlights of the entire SCATE project.