    Real-Time Statistical Speech Translation

    This research investigates Statistical Machine Translation approaches for automatically translating speech in real time. Such systems can be used in a pipeline with speech recognition and speech synthesis software to build a real-time voice communication system for people who do not share a language. We obtained three main data sets from spoken proceedings that represent three different types of human speech. The TED, Europarl, and OPUS parallel text corpora were used as the basis for training language models and for development tuning and testing of the translation system. We also conducted experiments involving part-of-speech tagging, compound splitting, linear language model interpolation, truecasing and morphosyntactic analysis. We evaluated the effects of a variety of data preparations on the translation results using the BLEU, NIST, METEOR and TER metrics, and attempted to determine which metric is most suitable for the PL-EN language pair. Comment: machine translation, Polish English
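Linear language model interpolation, one of the techniques listed above, mixes several models as P(w) = Σ λ_i P_i(w) with weights λ_i that sum to 1. The Python sketch below illustrates the idea under simplifying assumptions that are not taken from the paper: the component models are toy unigram probability tables (real SMT systems interpolate n-gram models), unseen words receive a small floor probability, and the single weight is picked by a coarse grid search for the lowest perplexity on a development sample. The model contents and function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Model = Dict[str, float]  # word -> probability (toy unigram model, an assumption)


def interpolated_prob(models: Sequence[Model], weights: Sequence[float],
                      word: str, floor: float = 1e-9) -> float:
    """Linear interpolation: P(w) = sum_i lambda_i * P_i(w)."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("interpolation weights must sum to 1")
    p = sum(lam * m.get(word, 0.0) for m, lam in zip(models, weights))
    # The floor keeps log() defined for words that no component model has seen.
    return max(p, floor)


def perplexity(models: Sequence[Model], weights: Sequence[float],
               tokens: List[str]) -> float:
    """Perplexity of the interpolated model on a token sequence."""
    log_sum = sum(math.log(interpolated_prob(models, weights, t)) for t in tokens)
    return math.exp(-log_sum / len(tokens))


def best_weight(m1: Model, m2: Model, dev_tokens: List[str], steps: int = 10) -> float:
    """Grid-search lambda for P = lambda*P1 + (1-lambda)*P2 on development data."""
    candidates = [i / steps for i in range(steps + 1)]
    return min(candidates,
               key=lambda lam: perplexity([m1, m2], [lam, 1.0 - lam], dev_tokens))


if __name__ == "__main__":
    in_domain = {"hello": 0.5, "world": 0.5}       # hypothetical model
    out_of_domain = {"hello": 0.1, "council": 0.9}  # hypothetical model
    dev = ["hello", "world", "hello"]
    lam = best_weight(in_domain, out_of_domain, dev)
    print(lam, perplexity([in_domain, out_of_domain], [lam, 1.0 - lam], dev))
```

A grid search keeps the example short; with more than two models, the weights are usually estimated on held-out data with an iterative procedure instead.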

    An Unsupervised Knowledge Free Algorithm for the Learning of Morphology in Natural Languages - Master's Thesis, May 2002

    This thesis describes an unsupervised system that learns natural language morphology, specifically suffix identification, from unannotated text. The system is language independent, so it can learn the morphology of any human language. For English this means identifying "-s", "-ing", "-ed", "-tion" and many other suffixes, and learning which stems they attach to. The system uses no prior knowledge, such as part-of-speech tags, and learns the morphology simply by reading a body of unannotated text. It consists of a generative probabilistic model, which is used to evaluate hypotheses, and a directed search and a hill-climbing search, which are used together to find a highly probable hypothesis. Experiments applying the system to English and Polish are described.
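The thesis's generative probabilistic model and its directed and hill-climbing searches are not reproduced here. As a much simpler illustration of the raw signal such a system can exploit, the Python sketch below counts, for each candidate suffix, the distinct stems it attaches to, keeping a split only when the bare stem also occurs as a word. The function name, the maximum suffix length, the minimum stem length and the "bare stem must be attested" rule are all hypothetical choices for illustration, not the thesis's method.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def suffix_stem_counts(words: Iterable[str], max_suffix_len: int = 4,
                       min_stem_len: int = 3) -> Dict[str, Set[str]]:
    """Map each candidate suffix to the set of attested stems it follows.

    A split word = stem + suffix is kept only if the stem itself occurs as a
    word in the text: a crude, knowledge-free heuristic, not a probabilistic model.
    """
    vocab = set(words)
    stems_by_suffix: Dict[str, Set[str]] = defaultdict(set)
    for w in vocab:
        for k in range(1, max_suffix_len + 1):
            stem, suffix = w[:-k], w[-k:]
            if len(stem) >= min_stem_len and stem in vocab:
                stems_by_suffix[suffix].add(stem)
    return dict(stems_by_suffix)


if __name__ == "__main__":
    text = "walk walks walked walking talk talks talked jump jumps"
    counts = suffix_stem_counts(text.split())
    # Suffixes attaching to many stems are the strongest suffix candidates.
    for suffix, stems in sorted(counts.items(), key=lambda kv: -len(kv[1])):
        print(f"-{suffix}: {len(stems)} stems {sorted(stems)}")
```

A probabilistic system would instead score whole segmentation hypotheses, which lets it handle stems that never appear on their own.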

    The Influence of Attention to Language Form on the Production of Weak Forms by Polish Learners of English

    The paper discusses a study whose aim was to examine the impact of attention to language form and of task type on the realisation of English function words by Polish learners of English. An additional goal was to investigate whether style-induced pronunciation shifts depend on the degree of foreign accent. A large part of the paper concentrates on defining 'weakness' in English weak forms and considers priorities in English pronunciation teaching with regard to the realisation of function words. The participants were 12 advanced Polish learners of English, divided into two groups: 6 judged to speak with a slight foreign accent and 6 judged to speak with a strong foreign accent. The subjects' pronunciation was analysed in three situations in which, we assume, increasing attention was paid to speech form (spontaneous speech, prepared speech, reading). The results suggest that increased attention to language form led the participants to realise more function words as unstressed, although the effect was small. It was also found that one of the characteristics of English weak forms, the lack of stress, was realised correctly by the participants in the majority of cases. Finally, the results imply that, in the case under investigation, the effect of attention to language form is only weakly related, or unrelated, to the degree of foreign accent.

    Linguistic complexity: English vs. Polish, text vs. corpus

    We analyze the rank-frequency distributions of words in selected English and Polish texts. We show that for lemmatized (basic) word forms the scale-invariant regime breaks down after about two decades of ranks, while for inflected word forms it may hold over the whole range of ranks. We also find that for a corpus of texts written by different authors the basic scale-invariant regime is broken more strongly than for a comparable corpus of texts written by the same author. Similarly, for a corpus of texts translated into Polish from other languages, the scale-invariant regime is broken more strongly than for a comparable corpus of native Polish texts. Moreover, when words are tagged with their part of speech, only verbs show a rank-frequency distribution that is almost scale-invariant.
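A scale-invariant (power-law) regime appears as a straight line when log frequency is plotted against log rank, so one way to probe where it breaks is to fit that line over different rank ranges. The Python sketch below builds a rank-frequency list and fits the log-log slope by ordinary least squares; the fitting method, the function names and the toy corpus (constructed so that frequency = 60 / rank exactly) are assumptions for illustration, not the authors' procedure.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def rank_frequency(tokens: Sequence[str]) -> List[Tuple[int, int]]:
    """Return (rank, frequency) pairs, rank 1 being the most frequent word."""
    freqs = sorted(Counter(tokens).values(), reverse=True)
    return [(r, f) for r, f in enumerate(freqs, start=1)]


def loglog_slope(pairs: Sequence[Tuple[int, int]]) -> float:
    """Least-squares slope of log(frequency) against log(rank).

    A roughly constant slope over a range of ranks (near -1 for Zipf's law)
    indicates a scale-invariant regime over that range.
    """
    xs = [math.log(r) for r, _ in pairs]
    ys = [math.log(f) for _, f in pairs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


if __name__ == "__main__":
    # Toy corpus in which frequency = 60 / rank exactly.
    counts = {"a": 60, "b": 30, "c": 20, "d": 15, "e": 12, "f": 10}
    tokens = [w for w, n in counts.items() for _ in range(n)]
    print(loglog_slope(rank_frequency(tokens)))  # close to -1.0
```

Fitting separate slopes to, say, ranks 1 to 100 and ranks above 100 of a lemmatized and an inflected version of the same text would show whether the regime breaks, in the spirit of the comparison described above.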

    Measuring vowel duration variability in native English speakers and Polish learners

    This paper presents a set of simple statistical measures that illustrate how native English speakers and Polish learners of English differ in varying the length of vocalic segments in read speech. Relative vowel duration and vowel length variation are widely used as basic criteria for establishing rhythmic differences between languages and between dialects of a language. Vocalic duration is the parameter behind popular measures such as ΔV (Ramus et al. 1999), VarcoV (Dellwo 2006, White and Mattys 2007), and PVI (Low et al. 2000, Grabe and Low 2002). Apart from rhythm studies, vowel duration data can be used to establish the discrepancy between native speech and learner speech in other temporal aspects of FL pronunciation, such as the tense-lax vowel distinction, accentual lengthening or the degree of unstressed vowel reduction, which are often cited as serious problems for Polish learners acquiring English pronunciation. Using descriptive statistics (the relation between each speaker's mean vowel duration and its standard deviation), the author calculates several indices that place the scores of individual learners (13 subjects) relative to the score ranges of the native speakers (12 subjects). In some tested aspects, the results of the two groups are almost completely separated, which suggests not only that specific didactic problems exist but also indicates their actual scale.
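The duration-based measures cited above can be computed from a list of vocalic interval durations. The Python sketch below uses the standard definitions: ΔV as the standard deviation of vocalic intervals, VarcoV as 100 × ΔV divided by the mean, and the normalised PVI as 100 times the mean of |d_k − d_{k+1}| / ((d_k + d_{k+1}) / 2) over successive intervals. Using the population rather than the sample standard deviation is an assumption, and the durations are hypothetical; this is not the author's own index set.

```python
import statistics
from typing import Sequence


def delta_v(durations: Sequence[float]) -> float:
    """ΔV: standard deviation of vocalic interval durations (population SD assumed)."""
    return statistics.pstdev(durations)


def varco_v(durations: Sequence[float]) -> float:
    """VarcoV: ΔV normalised by the mean vocalic duration, times 100."""
    return 100.0 * delta_v(durations) / statistics.mean(durations)


def npvi(durations: Sequence[float]) -> float:
    """Normalised Pairwise Variability Index over successive intervals."""
    if len(durations) < 2:
        raise ValueError("nPVI needs at least two intervals")
    terms = [abs(a - b) / ((a + b) / 2) for a, b in zip(durations, durations[1:])]
    return 100.0 * sum(terms) / len(terms)


if __name__ == "__main__":
    vowels_ms = [50.0, 150.0]  # hypothetical vocalic interval durations in ms
    print(delta_v(vowels_ms), varco_v(vowels_ms), npvi(vowels_ms))  # 50.0 50.0 100.0
```

ΔV depends on speaking rate, whereas VarcoV and nPVI divide by local or global means, which is why the normalised measures are often preferred when comparing speakers who talk at different tempos.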

    The Relationship Between English and Polish Rhythm Measures in Polish Learners of English

    This paper investigates native and non-native speech rhythm in the speech of Polish learners of English at an intermediate/upper-intermediate level. More specifically, it explores the relationship between rhythm measure scores in L1 Polish and L2 English within individual speakers. Phonological vowel reduction in terms of duration is present in English and is crucial for the perception and acoustic measurement of linguistic rhythm. Polish, on the other hand, has no phonological reduction of this kind. The acquisition of L2 vowel reduction is strongly determined by the level of language proficiency and influences non-native rhythmic patterns. The study tests six speech rhythm measures, %V, ΔV, ΔC, VarcoV, VarcoC and nPVI-V, in two tempos: normal and fast. The results show that most of these measures are positively and significantly correlated between L1 Polish and L2 English across subjects and in both tempos, although to differing degrees. Highly significant correlations were found for %V and ΔC in the fast tempo. Moderate significant correlations between the two languages were observed for ΔV and ΔC in the normal tempo and for VarcoV and nPVI-V in the fast tempo.
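The analysis above pairs each speaker's score on a rhythm measure in L1 Polish with the same speaker's score in L2 English and correlates the two across speakers. The Python sketch below shows one measure, %V (the share of utterance duration that is vocalic), and a plain Pearson correlation over paired scores. Pearson's r is an assumption about the correlation statistic used, the significance test is omitted, and the per-speaker values are hypothetical.

```python
import math
from typing import Sequence


def percent_v(vocalic: Sequence[float], consonantal: Sequence[float]) -> float:
    """%V: percentage of total duration taken up by vocalic intervals."""
    v, c = sum(vocalic), sum(consonantal)
    return 100.0 * v / (v + c)


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between paired per-speaker scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical %V scores for the same four speakers in each language.
    l1_polish = [42.0, 44.5, 40.1, 46.3]
    l2_english = [44.2, 46.0, 41.8, 47.9]
    print(round(pearson_r(l1_polish, l2_english), 3))
```

Running the same correlation separately for each measure and each tempo yields a table of coefficients comparable in shape to the results summarised above.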

    Comparison and Adaptation of Automatic Evaluation Metrics for Quality Assessment of Re-Speaking

    Re-speaking is a mechanism for obtaining high-quality subtitles for live broadcasts and other public events. Because it relies on humans performing the actual re-speaking, estimating the quality of the results is non-trivial. Most organisations rely on humans to perform the quality assessment, but purely automatic methods have been developed for similar problems, such as Machine Translation. This paper compares several of these methods: BLEU, EBLEU, NIST, METEOR, METEOR-PL, TER and RIBES. Their scores are then matched against the human-derived NER metric commonly used in re-speaking. Comment: arXiv admin note: text overlap with arXiv:1509.0908
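BLEU, the first metric in the list above, scores a hypothesis (here, the re-spoken subtitle) against a reference (for example, a verbatim transcript) as the geometric mean of clipped n-gram precisions multiplied by a brevity penalty. The Python sketch below is a minimal, unsmoothed, single-reference, sentence-level version with whitespace tokenisation; it is not the paper's evaluation setup, and EBLEU, METEOR-PL and the other variants differ from it. Established implementations such as sacreBLEU should be preferred for reportable scores.

```python
import math
from collections import Counter
from typing import List, Sequence


def ngrams(tokens: Sequence[str], n: int) -> Counter:
    """Count all n-grams of length n in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(hypothesis: List[str], reference: List[str], max_n: int = 4) -> float:
    """Unsmoothed BLEU for one hypothesis against one reference.

    Geometric mean of clipped n-gram precisions (n = 1..max_n, uniform
    weights) multiplied by the brevity penalty.
    """
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        hyp_counts, ref_counts = ngrams(hypothesis, n), ngrams(reference, n)
        # Clip each hypothesis n-gram count by its count in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        total = sum(hyp_counts.values())
        if overlap == 0 or total == 0:
            return 0.0  # without smoothing, any zero precision gives BLEU = 0
        log_precision_sum += math.log(overlap / total)
    c, r = len(hypothesis), len(reference)
    # Penalise hypotheses shorter than the reference.
    brevity_penalty = 1.0 if c > r else math.exp(1.0 - r / c)
    return brevity_penalty * math.exp(log_precision_sum / max_n)


if __name__ == "__main__":
    reference = "the cat sat on the mat".split()
    respoken = "the cat sat on".split()  # hypothetical re-spoken subtitle
    print(sentence_bleu(respoken, reference))
```

Because short subtitle segments often have no matching 4-grams, unsmoothed sentence-level BLEU is frequently zero; corpus-level aggregation or smoothing is the usual remedy.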