
    Contribution of Morphological Awareness to Word Segmentation among Adult L2 Chinese Speakers

    The current study investigates the causal relationship between Chinese morphological awareness and word segmentation among intermediate adult speakers who learn Chinese as a second language (L2). In particular, we intend to determine the role of a potential mediator, vocabulary knowledge, in this relationship. A total of 45 intermediate adult L2 Chinese speakers participated in the experiment and completed three separate tasks on Chinese morphological awareness, Chinese word segmentation, and vocabulary size. A logistic regression on the results of the Chinese morphological awareness task failed to show that the L2 Chinese speakers are sensitive to the degree of compositionality of Chinese compounds. Multiple linear regressions were conducted to test the mediation effects, and the results demonstrate that: (1) Chinese morphological awareness did not directly predict participants’ performance in word segmentation; (2) Chinese morphological awareness did not indirectly exert a strong effect on word segmentation via vocabulary knowledge. Although the current study found no evidence for a relationship between morphological awareness and word segmentation, nor for the mediation effect of vocabulary knowledge, it establishes a foundation for future research design and implementation.
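    The mediation analysis described above rests on a standard identity: with ordinary least squares and intercepts, the total effect of the predictor decomposes exactly into a direct effect plus the indirect (mediated) path a·b. A minimal sketch on simulated data follows; the variable names and effect sizes are invented for illustration and are not taken from the study:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 45                                       # same sample size as the study
morph = rng.normal(size=n)                   # morphological awareness (simulated)
vocab = 0.3 * morph + rng.normal(size=n)     # vocabulary knowledge (mediator)
seg = 0.2 * vocab + rng.normal(size=n)       # word segmentation score

def ols(predictors, y):
    """Least-squares coefficients, intercept column prepended."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    return np.linalg.lstsq(X, y, rcond=None)[0]

c = ols([morph], seg)[1]                     # total effect: seg ~ morph
a = ols([morph], vocab)[1]                   # path a: vocab ~ morph
_, c_prime, b = ols([morph, vocab], seg)     # direct effect c' and path b
indirect = a * b                             # mediated (indirect) effect
# For OLS with intercepts, c = c' + a*b holds exactly on any sample.
```

    The study's null result corresponds to both `c_prime` and `indirect` being statistically indistinguishable from zero; the decomposition itself holds regardless.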

    Non-adjacent dependency learning in infancy, and its link to language development

    To acquire language, infants must learn how to identify words and linguistic structure in speech. Statistical learning has been suggested to assist both of these tasks. However, infants’ capacity to use statistics to discover words and structure together remains unclear. Further, it is not yet known how infants’ statistical learning ability relates to their language development. We trained 17-month-old infants on an artificial language comprising non-adjacent dependencies, and examined their looking times on tasks assessing sensitivity to words and structure using an eye-tracked head-turn-preference paradigm. We measured infants’ vocabulary size using a Communicative Development Inventory (CDI) concurrently and at 19, 21, 24, 25, 27, and 30 months to relate performance to language development. Infants could segment the words from speech, demonstrated by a significant difference in looking times to words versus part-words. Infants’ segmentation performance was significantly related to their vocabulary size (receptive and expressive) both concurrently and over time (receptive until 24 months, expressive until 30 months), but was not related to the rate of vocabulary growth. The data also suggest infants may have developed sensitivity to generalised structure, indicating that similar statistical learning mechanisms may contribute to the discovery of words and structure in speech, but this was not related to vocabulary size.
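    A classic statistical-learning account of the word-segmentation result above is that learners track transitional probabilities between syllables, which are high within words and dip at word boundaries. A toy sketch of that computation follows; the syllable inventory and word set are invented (loosely modelled on artificial-language studies), not the stimuli used in this experiment:

```python
import random
from collections import Counter

random.seed(1)
words = ["tupiro", "golabu", "bidaku"]        # hypothetical trisyllabic words
stream = []                                   # unbroken syllable stream
for _ in range(300):
    w = random.choice(words)
    stream += [w[i:i + 2] for i in range(0, 6, 2)]

pairs = Counter(zip(stream, stream[1:]))
firsts = Counter(stream[:-1])

def tp(x, y):
    """Transitional probability P(y | x) estimated from the stream."""
    return pairs[(x, y)] / firsts[x]

# Within-word transitions are deterministic (TP = 1.0), while
# transitions across word boundaries hover around 1/3 here,
# giving a statistical cue to where words begin and end.
within = tp("tu", "pi")
across = tp("ro", "go")
```

    A segmentation strategy then posits word boundaries wherever the transitional probability drops below its neighbours.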

    Linguistically Motivated Vocabulary Reduction for Neural Machine Translation from Turkish to English

    The necessity of using a fixed-size word vocabulary to control model complexity in state-of-the-art neural machine translation (NMT) systems is an important bottleneck for performance, especially for morphologically rich languages. Conventional methods that aim to overcome this problem with sub-word or character-level representations rely solely on statistics and disregard the linguistic properties of words, which interrupts the word structure and causes semantic and syntactic losses. In this paper, we propose a new vocabulary reduction method for NMT, which can reduce the vocabulary of a given input corpus at any rate while also considering the morphological properties of the language. Our method is based on unsupervised morphology learning and can, in principle, be used to pre-process any language pair. We also present an alternative word segmentation method based on supervised morphological analysis, which aids us in measuring the accuracy of our model. We evaluate our method on a Turkish-to-English NMT task, where the input language is morphologically rich and agglutinative. We analyze different representation methods in terms of translation accuracy as well as the semantic and syntactic properties of the generated output. Our method obtains a significant improvement of 2.3 BLEU points over the conventional vocabulary reduction technique, showing that it can provide better accuracy in open-vocabulary translation of morphologically rich languages.
    Comment: The 20th Annual Conference of the European Association for Machine Translation (EAMT), Research Paper, 12 pages
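    The frequency-only sub-word baseline this paper argues against can be sketched as a tiny BPE-style merge loop: it repeatedly merges the most frequent adjacent symbol pair, with no regard for morpheme boundaries. This is a minimal sketch, not the paper's morphology-aware method, and the Turkish-looking toy words are invented:

```python
import re
from collections import Counter

def bpe_merges(words, num_merges):
    """Greedy pair merges driven purely by frequency (no morphology)."""
    vocab = Counter(" ".join(w) for w in words)  # words as space-separated symbols
    merges = []
    for _ in range(num_merges):
        pair_counts = Counter()
        for word, freq in vocab.items():
            syms = word.split()
            for a, b in zip(syms, syms[1:]):
                pair_counts[(a, b)] += freq
        if not pair_counts:
            break
        (a, b), _ = pair_counts.most_common(1)[0]
        merges.append((a, b))
        pat = re.compile(r"(?<!\S)" + re.escape(f"{a} {b}") + r"(?!\S)")
        vocab = Counter({pat.sub(a + b, w): f for w, f in vocab.items()})
    return merges

# "ev" (house) recurs, so frequency alone happens to find it first here,
# but nothing prevents later merges from straddling morpheme boundaries.
merges = bpe_merges(["evlerimizde", "evler", "evde", "kitap"], 3)
```

    A morphology-aware reducer, by contrast, would constrain such merges to respect segment boundaries proposed by a morphological analyser.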

    Segmenting DNA sequence into words based on statistical language model

    This paper presents a novel method to segment/decode DNA sequences based on an n-gram statistical language model. First, by analyzing the genomes of 12 model species, we find that most DNA “words” are 12 to 15 bp long. The bound on the language entropy of DNA sequences is about 1.5674 bits. After building an n-gram biological language model, we design an unsupervised “probability approach to word segmentation” method to segment the DNA sequences. A benchmark for the segmentation method is also proposed. In cross-segmentation tests, we find that different genomes may use a similar language while belonging to different branches, much as English relates to French/Latin. Finally, we present some possible applications of this method.
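    A probability approach to word segmentation of this kind is typically a Viterbi-style dynamic program that picks the segmentation maximising the product of word probabilities. The sketch below uses a unigram stand-in for the paper's n-gram model and an invented toy lexicon, with single bases as a fallback so every sequence is segmentable:

```python
import math

def segment(seq, word_probs, max_len=15):
    """Segmentation of seq maximising the product of word probabilities."""
    n = len(seq)
    best = [(-math.inf, 0)] * (n + 1)        # (best log-prob, backpointer)
    best[0] = (0.0, 0)
    for i in range(1, n + 1):
        for j in range(max(0, i - max_len), i):
            p = word_probs.get(seq[j:i])
            if p is None or best[j][0] == -math.inf:
                continue
            score = best[j][0] + math.log(p)
            if score > best[i][0]:
                best[i] = (score, j)
    words, i = [], n
    while i > 0:                             # backtrace
        j = best[i][1]
        words.append(seq[j:i])
        i = j
    return words[::-1]

# Toy lexicon: multi-base "words" plus single bases as fallback.
probs = {"ATG": 0.4, "TTT": 0.3, "GC": 0.2,
         "A": 0.02, "T": 0.02, "G": 0.02, "C": 0.02}
result = segment("ATGTTTGC", probs)
```

    The `max_len=15` cap mirrors the paper's observation that most DNA words are 12 to 15 bp; extending the scorer from unigram to n-gram probabilities changes only the scoring term, not the dynamic program.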

    Joint morphological-lexical language modeling for processing morphologically rich languages with application to dialectal Arabic

    Language modeling for an inflected language such as Arabic poses new challenges for speech recognition and machine translation due to its rich morphology. Rich morphology results in large increases in the out-of-vocabulary (OOV) rate and in poor language model parameter estimation in the absence of large quantities of data. In this study, we present a joint morphological-lexical language model (JMLLM) that takes advantage of Arabic morphology. JMLLM combines morphological segments with the underlying lexical items, along with additional available information sources with regard to morphological segments and lexical items, in a single joint model. Joint representation and modeling of morphological and lexical items reduces the OOV rate and provides smooth probability estimates while keeping the predictive power of whole words. Speech recognition and machine translation experiments in dialectal Arabic show improvements over word- and morpheme-based trigram language models. We also show that as the tightness of integration between the different information sources increases, both speech recognition and machine translation performance improves.
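    The OOV-reduction argument can be made concrete with a toy count: morphological segmentation shrinks the effective unit inventory, so an unseen word often decomposes into seen morphemes. The words and analyses below are invented for illustration (not real Arabic morphology, and not the JMLLM itself):

```python
def oov_rate(train_units, test_units):
    """Fraction of test units absent from the training vocabulary."""
    vocab = set(train_units)
    return sum(u not in vocab for u in test_units) / len(test_units)

train_words = ["kitab", "kitabi", "maktab"]
test_words = ["kitabuhu", "maktabi"]

analyses = {                                  # hypothetical morpheme segmentations
    "kitab": ["kitab"], "kitabi": ["kitab", "i"], "maktab": ["maktab"],
    "kitabuhu": ["kitab", "uhu"], "maktabi": ["maktab", "i"],
}

word_oov = oov_rate(train_words, test_words)          # 2/2 = 1.0 at word level
train_m = [m for w in train_words for m in analyses[w]]
test_m = [m for w in test_words for m in analyses[w]]
morph_oov = oov_rate(train_m, test_m)                 # 1/4 at morpheme level
```

    The joint model goes further than this sketch by conditioning on both morpheme and word context at once, rather than choosing one unit type.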

    Does Vocabulary Knowledge Affect Lexical Segmentation in Adverse Conditions?

    There is significant variability in the ability of listeners to perceive degraded speech. Existing research has suggested that vocabulary knowledge is one factor that differentiates better listeners from poorer ones, though the reason for such a relationship is unclear. This study aimed to investigate whether a relationship exists between vocabulary knowledge and the type of lexical segmentation strategy listeners use in adverse conditions. This study conducted error-pattern analysis using an existing dataset of 34 normal-hearing listeners (11 males, 23 females, aged 18 to 35) who participated in a speech-recognition-in-noise task. Listeners were divided into a higher-vocabulary (HV) and a lower-vocabulary (LV) group based on their receptive vocabulary score on the Peabody Picture Vocabulary Test (PPVT). Lexical boundary errors (LBEs) were analysed to examine whether the groups showed differential use of syllabic strength cues for lexical segmentation. Word substitution errors (WSEs) were also analysed to examine patterns in phoneme identification. The type and number of errors were compared between the HV and LV groups. Simple linear regression showed a significant relationship between vocabulary and performance on the speech recognition task. Independent-samples t-tests showed no significant differences between the HV and LV groups in Metrical Segmentation Strategy (MSS) ratio or number of LBEs. Further independent-samples t-tests showed no significant differences between the WSEs produced by the HV and LV groups in the degree of phonemic resemblance to the target. There was no significant difference in the proportion of target phrases to which HV and LV listeners responded. The results of this study suggest that vocabulary knowledge does not affect lexical segmentation strategy in adverse conditions. Further research is required to investigate why higher-vocabulary listeners appear to perform better on speech recognition tasks.