
    The organization of alphabets of nucleic acids and proteins as a structural prototype of human language alphabets

    [Abstract] An alphabet analysis of English, Spanish, Russian, and Vietnamese texts, which belong to different language groups, shows that among vowels there is a predominant group of four letters, equal to the number of nucleotides of NA (nucleic acids) in the biological language alphabets. The number of consonants in these texts is equal to 20 (in English and Russian) or close to it (in Spanish), which is the number of triplet groups of mRNA and also the number of protein amino acids. As is well known, DNA and RNA nucleotides combine the function of energy carrier with the function of information carrier. A functional analysis of vowels established that they perform the same dual functions, whereas consonants carry only information. Therefore consonants are similar not to nucleotides but to mRNA triplets and to protein amino acids. We believe that the optimal number of letters in the alphabet of biogenetic languages was brought about by bio-evolution through natural selection. The analysis of the alphabets of human languages leads us to believe that this achievement of bio-evolution was realized naturally in the evolution of human language alphabets as well. We therefore suggest that developing this approach could broaden the range of methods for the linguistic analysis of human languages.
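The vowel and consonant counts described in this abstract can be illustrated with a simple letter-frequency tally. This is a minimal sketch, not the authors' procedure: it assumes an English text, treats only a, e, i, o, u as vowels (so y counts as a consonant), and the names `vowel_profile` and `consonant_inventory` are hypothetical. Other alphabets (Spanish, Russian, Vietnamese) would need their own vowel sets.

```python
from collections import Counter
from typing import FrozenSet, List, Tuple

# Assumption: English vowels only; 'y' is treated as a consonant.
ENGLISH_VOWELS: FrozenSet[str] = frozenset("aeiou")


def vowel_profile(
    text: str, vowels: FrozenSet[str] = ENGLISH_VOWELS, top: int = 4
) -> Tuple[List[Tuple[str, int]], float]:
    """Rank vowels by frequency; return the `top` most frequent and their share of all vowel tokens."""
    letters = Counter(ch for ch in text.lower() if ch.isalpha())
    vowel_counts = Counter({ch: n for ch, n in letters.items() if ch in vowels})
    total = sum(vowel_counts.values())
    leading = vowel_counts.most_common(top)
    share = sum(n for _, n in leading) / total if total else 0.0
    return leading, share


def consonant_inventory(text: str, vowels: FrozenSet[str] = ENGLISH_VOWELS) -> int:
    """Count the distinct non-vowel letters that occur in the text."""
    return len({ch for ch in text.lower() if ch.isalpha() and ch not in vowels})


sample = "the cat sat on the mat and ate"
leading, share = vowel_profile(sample)
print(leading, share)
print(consonant_inventory(sample))
```

On a realistically long text, `consonant_inventory` approaches the size of the consonant alphabet, and `share` shows how strongly the leading vowel group dominates.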

    Tone and intonation: introductory notes and practical recommendations

    The present article aims to propose a simple introduction to the topics of (i) lexical tone, (ii) intonation, and (iii) tone-intonation interactions, with practical recommendations for students. It builds on the authors' observations on various languages, tonal and non-tonal; much of the evidence reviewed concerns tonal languages of Asia. With a view to providing beginners with an adequate methodological apparatus for studying tone and intonation, the present notes emphasize two salient dimensions of linguistic diversity. The first is the nature of the lexical tones: we review the classical distinction between (i) contour tones that can be analyzed into sequences of level tones, and (ii) contour tones that are non-decomposable (phonetically complex). The second dimension of diversity is the presence or absence of intonational tones: tones of intonational origin that are formally identical with lexical (and morphological) tones.

    Tone and phonation in Southeast Asian languages


    Mimicking Word Embeddings using Subword RNNs

    Word embeddings improve generalization over lexical features by placing each word in a lower-dimensional space, using distributional information obtained from unlabeled data. However, the effectiveness of word embeddings for downstream NLP tasks is limited by out-of-vocabulary (OOV) words, for which embeddings do not exist. In this paper, we present MIMICK, an approach to generating OOV word embeddings compositionally, by learning a function from spellings to distributional embeddings. Unlike prior work, MIMICK does not require re-training on the original word embedding corpus; instead, learning is performed at the type level. Intrinsic and extrinsic evaluations demonstrate the power of this simple approach. On 23 languages, MIMICK improves performance over a word-based baseline for tagging part-of-speech and morphosyntactic attributes. It is competitive with (and complementary to) a supervised character-based model in low-resource settings.
    Comment: EMNLP 201
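The central idea of MIMICK, learning at the type level a function from a word's spelling to its pretrained embedding and then applying that function to OOV words, can be sketched briefly. This is a simplified stand-in, not the paper's model: MIMICK uses a character-level RNN, whereas this sketch substitutes a linear model over averaged character n-gram vectors, trained by gradient descent on squared Euclidean distance so that it runs with the standard library only. The toy embedding table and the names `char_ngrams` and `SpellingToEmbedding` are hypothetical.

```python
from typing import Dict, List


def char_ngrams(word: str, sizes=(2, 3)) -> List[str]:
    """Character n-grams of the boundary-padded spelling, e.g. '<cat>' -> '<c', 'ca', ..."""
    padded = f"<{word}>"
    return [padded[i:i + n] for n in sizes for i in range(len(padded) - n + 1)]


class SpellingToEmbedding:
    """Hypothetical linear map from a bag of character n-grams to embedding space.

    Stands in for MIMICK's character RNN; a word's prediction is the mean of
    its n-gram vectors.
    """

    def __init__(self, dim: int, lr: float = 0.5) -> None:
        self.dim = dim
        self.lr = lr
        self.weights: Dict[str, List[float]] = {}  # n-gram -> vector, created during training

    def predict(self, word: str) -> List[float]:
        grams = char_ngrams(word)
        vec = [0.0] * self.dim
        for g in grams:
            w = self.weights.get(g)
            if w is not None:  # n-grams never seen in training contribute nothing
                for k in range(self.dim):
                    vec[k] += w[k]
        return [v / len(grams) for v in vec]

    def loss(self, table: Dict[str, List[float]]) -> float:
        """Mean squared Euclidean distance between predicted and pretrained vectors."""
        total = 0.0
        for word, target in table.items():
            pred = self.predict(word)
            total += sum((p - t) ** 2 for p, t in zip(pred, target))
        return total / len(table)

    def fit(self, table: Dict[str, List[float]], epochs: int = 100) -> None:
        """Type-level training: one update per vocabulary word; no corpus is needed."""
        for _ in range(epochs):
            for word, target in table.items():
                grams = char_ngrams(word)
                err = [p - t for p, t in zip(self.predict(word), target)]
                for g in grams:
                    w = self.weights.setdefault(g, [0.0] * self.dim)
                    for k in range(self.dim):
                        # Gradient of the squared distance w.r.t. this n-gram's vector.
                        w[k] -= self.lr * 2.0 * err[k] / len(grams)


# Hypothetical 3-dimensional "pretrained" embeddings for a tiny vocabulary.
toy_table = {
    "cat": [1.0, 0.0, 0.0], "cats": [0.9, 0.1, 0.0],
    "dog": [0.0, 1.0, 0.0], "dogs": [0.1, 0.9, 0.0],
    "run": [0.0, 0.0, 1.0], "running": [0.0, 0.1, 0.9],
}
model = SpellingToEmbedding(dim=3)
before = model.loss(toy_table)
model.fit(toy_table)
after = model.loss(toy_table)
print(before, after, model.predict("runs"))  # "runs" is OOV
```

Because training iterates over vocabulary types rather than corpus tokens, the original embedding corpus is never consulted. An OOV spelling such as "runs" receives a vector assembled from n-grams it shares with known words, while a spelling with no known n-grams falls back to the zero vector.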