4,148 research outputs found
Topic Identification for Speech without ASR
Modern topic identification (topic ID) systems for speech use automatic
speech recognition (ASR) to produce speech transcripts, and perform supervised
classification on such ASR outputs. However, under resource-limited conditions,
the manually transcribed speech required to develop standard ASR systems can be
severely limited or unavailable. In this paper, we investigate alternative
unsupervised solutions to obtaining tokenizations of speech in terms of a
vocabulary of automatically discovered word-like or phoneme-like units, without
depending on the supervised training of ASR systems. Moreover, using automatic
phoneme-like tokenizations, we demonstrate that a convolutional neural network
based framework for learning spoken document representations provides
competitive performance compared to a standard bag-of-words representation, as
evidenced by comprehensive topic ID evaluations on both single-label and
multi-label classification tasks.Comment: 5 pages, 2 figures; accepted for publication at Interspeech 201
Sampling Assumptions Affect Use of Indirect Negative Evidence in Language Learning.
A classic debate in cognitive science revolves around understanding how children learn complex linguistic patterns, such as restrictions on verb alternations and contractions, without negative evidence. Recently, probabilistic models of language learning have been applied to this problem, framing it as a statistical inference from a random sample of sentences. These probabilistic models predict that learners should be sensitive to the way in which sentences are sampled. There are two main types of sampling assumptions that can operate in language learning: strong and weak sampling. Strong sampling, as assumed by probabilistic models, assumes the learning input is drawn from a distribution of grammatical samples from the underlying language and aims to learn this distribution. Thus, under strong sampling, the absence of a sentence construction from the input provides evidence that it has low or zero probability of grammaticality. Weak sampling does not make assumptions about the distribution from which the input is drawn, and thus the absence of a construction from the input as not used as evidence of its ungrammaticality. We demonstrate in a series of artificial language learning experiments that adults can produce behavior consistent with both sets of sampling assumptions, depending on how the learning problem is presented. These results suggest that people use information about the way in which linguistic input is sampled to guide their learning
Towards the ontology-based approach for factual information matching
Factual information is information based on facts or relating to facts. The reliability of automatically extracted facts is the main problem of processing factual information. The fact retrieval system remains one of the most effective tools for identifying the information for decision-making. In this work, we explore how can natural language processing methods and problem domain ontology help to check contradictions and mismatches in facts automatically
Speaker-sex discrimination for voiced and whispered vowels at short durations
Whispered vowels, produced with no vocal fold vibration, lack the periodic temporal fine structure which in voiced vowels underlies the perceptual attribute of pitch (a salient auditory cue to speaker sex). Voiced vowels possess no temporal fine structure at very short durations (below two glottal cycles). The prediction was that speaker-sex discrimination performance for whispered and voiced vowels would be similar for very short durations but, as stimulus duration increases, voiced vowel performance would improve relative to whispered vowel performance as pitch information becomes available. This pattern of results was shown for women’s but not for men’s voices. A whispered vowel needs to have a duration three times longer than a voiced vowel before listeners can reliably tell whether it’s spoken by a man or woman (∼30 ms vs. ∼10 ms). Listeners were half as sensitive to information about speaker-sex when it is carried by whispered compared with voiced vowels
Mapping cognitive ontologies to and from the brain
Imaging neuroscience links brain activation maps to behavior and cognition
via correlational studies. Due to the nature of the individual experiments,
based on eliciting neural response from a small number of stimuli, this link is
incomplete, and unidirectional from the causal point of view. To come to
conclusions on the function implied by the activation of brain regions, it is
necessary to combine a wide exploration of the various brain functions and some
inversion of the statistical inference. Here we introduce a methodology for
accumulating knowledge towards a bidirectional link between observed brain
activity and the corresponding function. We rely on a large corpus of imaging
studies and a predictive engine. Technically, the challenges are to find
commonality between the studies without denaturing the richness of the corpus.
The key elements that we contribute are labeling the tasks performed with a
cognitive ontology, and modeling the long tail of rare paradigms in the corpus.
To our knowledge, our approach is the first demonstration of predicting the
cognitive content of completely new brain images. To that end, we propose a
method that predicts the experimental paradigms across different studies.Comment: NIPS (Neural Information Processing Systems), United States (2013
- …