The Unsupervised Acquisition of a Lexicon from Continuous Speech
We present an unsupervised learning algorithm that acquires a
natural-language lexicon from raw speech. The algorithm is based on the optimal
encoding of symbol sequences in an MDL framework, and uses a hierarchical
representation of language that overcomes many of the problems that have
stymied previous grammar-induction procedures. The forward mapping from symbol
sequences to the speech stream is modeled using features based on articulatory
gestures. We present results on the acquisition of lexicons and language models
from raw speech, text, and phonetic transcripts, and demonstrate that our
algorithm compares very favorably to other reported results with respect to
segmentation performance and statistical efficiency.
Comment: 27-page technical report
Stabilizing knowledge through standards - A perspective for the humanities
It is usual to consider that standards generate mixed feelings among
scientists. They are often seen as not really reflecting the state of the art
in a given domain and a hindrance to scientific creativity. Still, scientists
should in theory be best placed to bring their expertise into
standards development, being even more neutral on issues that may typically be
related to competing industrial interests. Even if developing standards may
seem even more complex in the humanities, we
will show how this can be made feasible through the experience gained both
within the Text Encoding Initiative consortium and the International
Organisation for Standardisation. By taking the specific case of lexical
resources, we will try to show how this brings about new ideas for designing
future research infrastructures in the human and social sciences.
Translating near-synonyms: Possibilities and preferences in the interlingua
This paper argues that an interlingual representation must explicitly
represent some parts of the meaning of a situation as possibilities (or
preferences), not as necessary or definite components of meaning (or
constraints). Possibilities enable the analysis and generation of nuance,
something required for faithful translation. Furthermore, the representation of
the meaning of words, especially of near-synonyms, is crucial, because it
specifies which nuances words can convey in which contexts.
Comment: 8 pages, LaTeX2e, 1 eps figure, uses colacl.sty, epsfig.sty, avm.sty,
times.st
Bilingualism and the single route/dual route debate
The debate between single and dual route accounts of cognitive processes has been generated predominantly by the application of connectionist modeling techniques to two areas of psycholinguistics. This paper draws an analogy between this debate and bilingual language processing. A prominent question within bilingual word recognition is whether the bilingual has functionally separate lexicons for each language, or a single system able to recognize the words in both languages. Empirical evidence has been taken to support a model which includes two separate lexicons working in parallel (Smith, 1991; Gerard and Scarborough, 1989). However, a range of interference effects has been found between the bilingual’s two sets of lexical knowledge (Thomas, 1997a). Connectionist models have been put forward which suggest that a single representational resource may deal with these data, so long as words are coded according to language membership (Thomas, 1997a, 1997b; Dijkstra and van Heuven, 1998). This paper discusses the criteria which might be used to differentiate single route and dual route models. An empirical study is introduced to address one of these criteria, parallel access, with regard to bilingual word recognition. The study fails to find support for the dual route model.