Comparing human and automatic thesaurus mapping approaches in the agricultural domain
Knowledge organization systems (KOS), like thesauri and other controlled
vocabularies, are used to provide subject access to information systems across
the web. Due to the heterogeneity of these systems, mapping between
vocabularies becomes crucial for retrieving relevant information. However,
mapping thesauri is a laborious task, and thus big efforts are being made to
automate the mapping process. This paper examines two mapping approaches
involving the agricultural thesaurus AGROVOC, one machine-created and one
human-created. We address the basic question "What are the pros and cons of
human and automatic mapping and how can they complement each other?" By
pointing out the difficulties in specific cases or groups of cases and grouping
the sample into simple and difficult types of mappings, we show the limitations
of current automatic methods and come up with some basic recommendations on
what approach to use when.
Comment: 10 pages, Int'l Conf. on Dublin Core and Metadata Applications 200
Taxonomy Induction using Hypernym Subsequences
We propose a novel, semi-supervised approach towards domain taxonomy
induction from an input vocabulary of seed terms. Unlike all previous
approaches, which typically extract direct hypernym edges for terms, our
approach utilizes a novel probabilistic framework to extract hypernym
subsequences. Taxonomy induction from extracted subsequences is cast as an
instance of the minimum-cost flow problem on a carefully designed directed
graph. Through experiments, we demonstrate that our approach outperforms
state-of-the-art taxonomy induction approaches across four languages.
Importantly, we also show that our approach is robust to the presence of noise
in the input vocabulary. To the best of our knowledge, no previous approaches
have been empirically proven to manifest noise-robustness in the input
vocabulary.
Differential effects of internal and external factors on the development of vocabulary, tense morphology and morpho-syntax in successive bilingual children
The present study investigates the effects of child-internal (age/time) and child-external/environmental factors on the development of a wide range of language domains in successive bilingual (L2) Turkish-English children of homogeneously low SES. Forty-three L2 children were tested on standardized assessments examining the acquisition of vocabulary and morpho-syntax. The L2 children exhibited a differential acquisition of the various domains: they were better on the general comprehension of grammar and tense morphology and less accurate on the acquisition of vocabulary and (complex) morpho-syntax. Profile effects were confirmed by the differential effects of internal and external factors on the language domains. The development of vocabulary and complex syntax was affected by internal and external factors, whereas external factors made no contribution to the development of tense morphology. These results are discussed in light of previous studies on the impact of internal and external factors in child L2 acquisition.
Improving Hypernymy Extraction with Distributional Semantic Classes
In this paper, we show how distributionally-induced semantic classes can be
helpful for extracting hypernyms. We present methods for inducing sense-aware
semantic classes using distributional semantics and using these induced
semantic classes for filtering noisy hypernymy relations. Denoising of
hypernyms is performed by labeling each semantic class with its hypernyms. On
the one hand, this allows us to filter out wrong extractions using the global
structure of distributionally similar senses. On the other hand, we infer
missing hypernyms via label propagation to cluster terms. We conduct a
large-scale crowdsourcing study showing that processing of automatically
extracted hypernyms using our approach improves the quality of the hypernymy
extraction in terms of both precision and recall. Furthermore, we show the
utility of our method in the domain taxonomy induction task, achieving
state-of-the-art results on a SemEval'16 task on taxonomy induction.
Comment: In Proceedings of the 11th Conference on Language Resources and
Evaluation (LREC 2018). Miyazaki, Japan
Development of a speech recognition system for Spanish broadcast news
This paper reports on the development process of a speech recognition system for Spanish broadcast news within the MESH FP6 project. The system uses the SONIC recognizer developed at the Center for Spoken Language Research (CSLR), University of Colorado. Acoustic and language models were trained using Hub4 broadcast news data. Experiments and evaluation results are reported.