
    The Unsupervised Acquisition of a Lexicon from Continuous Speech

    We present an unsupervised learning algorithm that acquires a natural-language lexicon from raw speech. The algorithm is based on the optimal encoding of symbol sequences in an MDL framework, and uses a hierarchical representation of language that overcomes many of the problems that have stymied previous grammar-induction procedures. The forward mapping from symbol sequences to the speech stream is modeled using features based on articulatory gestures. We present results on the acquisition of lexicons and language models from raw speech, text, and phonetic transcripts, and demonstrate that our algorithm compares very favorably to other reported results with respect to segmentation performance and statistical efficiency. Comment: 27-page technical report.

    Stabilizing knowledge through standards - A perspective for the humanities

    Standards tend to generate mixed feelings among scientists. They are often seen as not really reflecting the state of the art in a given domain and as a hindrance to scientific creativity. Still, scientists should in theory be best placed to bring their expertise into standards development, being even more neutral on issues that may typically be related to competing industrial interests. Although developing standards in the humanities may seem even more complex, we will show how this can be made feasible through the experience gained both within the Text Encoding Initiative consortium and the International Organisation for Standardisation. Taking the specific case of lexical resources, we will try to show how this brings about new ideas for designing future research infrastructures in the human and social sciences.

    Translating near-synonyms: Possibilities and preferences in the interlingua

    This paper argues that an interlingual representation must explicitly represent some parts of the meaning of a situation as possibilities (or preferences), not as necessary or definite components of meaning (or constraints). Possibilities enable the analysis and generation of nuance, something required for faithful translation. Furthermore, the representation of the meaning of words, especially of near-synonyms, is crucial, because it specifies which nuances words can convey in which contexts. Comment: 8 pages, LaTeX2e, 1 eps figure, uses colacl.sty, epsfig.sty, avm.sty, times.st

    Bilingualism and the single route/dual route debate

    The debate between single and dual route accounts of cognitive processes has been generated predominantly by the application of connectionist modeling techniques to two areas of psycholinguistics. This paper draws an analogy between this debate and bilingual language processing. A prominent question within bilingual word recognition is whether the bilingual has functionally separate lexicons for each language, or a single system able to recognize the words in both languages. Empirical evidence has been taken to support a model which includes two separate lexicons working in parallel (Smith, 1991; Gerard and Scarborough, 1989). However, a range of interference effects has been found between the bilingual’s two sets of lexical knowledge (Thomas, 1997a). Connectionist models have been put forward which suggest that a single representational resource may deal with these data, so long as words are coded according to language membership (Thomas, 1997a, 1997b; Dijkstra and van Heuven, 1998). This paper discusses the criteria which might be used to differentiate single route and dual route models. An empirical study is introduced to address one of these criteria, parallel access, with regard to bilingual word recognition. The study fails to find support for the dual route model.