Real-Time Statistical Speech Translation
This research investigates Statistical Machine Translation approaches for
translating speech automatically in real time. Such systems can be used in a
pipeline with speech recognition and synthesis software to produce a
real-time voice communication system between speakers of different languages. We obtained three main
data sets from spoken proceedings that represent three different types of human
speech. TED, Europarl, and OPUS parallel text corpora were used as the basis
for training of language models, for developmental tuning and testing of the
translation system. We also conducted experiments involving part of speech
tagging, compound splitting, linear language model interpolation, TrueCasing
and morphosyntactic analysis. We evaluated the effects of a variety of data
preparations on the translation results using the BLEU, NIST, METEOR and TER
metrics and tried to determine which metric is most suitable for the PL-EN
language pair. Comment: machine translation, polish englis
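The linear language model interpolation mentioned in this abstract can be illustrated with a minimal sketch. It assumes the usual formulation, in which the interpolated probability is a weighted sum of the component model probabilities with weights that sum to 1. The toy unigram models, the weights, and the function name `interpolate` are hypothetical and are not taken from the paper.

```python
def interpolate(models, weights, word):
    """Return the linearly interpolated probability of `word`.

    models  : list of dicts mapping word -> probability (toy unigram models)
    weights : interpolation weights, one per model, summing to 1
    """
    if len(models) != len(weights):
        raise ValueError("need one weight per model")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    # Weighted sum of the component probabilities; unseen words count as 0.
    return sum(w * m.get(word, 0.0) for m, w in zip(models, weights))


# Hypothetical toy models, not data from the paper.
in_domain = {"speech": 0.6, "talk": 0.4}
general = {"speech": 0.2, "talk": 0.8}
p = interpolate([in_domain, general], [0.7, 0.3], "speech")
```

In practice the weights are usually tuned on held-out data rather than fixed by hand, which this sketch does not attempt.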
An Unsupervised Knowledge Free Algorithm for the Learning of Morphology in Natural Languages - Master's Thesis, May 2002
This thesis describes an unsupervised system to learn natural language morphology, specifically suffix identification from unannotated text. The system is language independent, so that it can learn the morphology of any human language. For English this means identifying “-s”, “-ing”, “-ed”, “-tion” and many other suffixes, in addition to learning which stems they attach to. The system uses no prior knowledge, such as part of speech tags, and learns the morphology by simply reading in a body of unannotated text. The system consists of a generative probabilistic model which is used to evaluate hypotheses, and a directed search and a hill-climbing search which are used in conjunction to find a highly probable hypothesis. Experiments applying the system to English and Polish are described
The Influence of Attention to Language Form on the Production of Weak Forms by Polish Learners of English
The paper discusses a study whose aim was to examine the impact of attention to language form and task type on the realisation of English function words by Polish learners of English. An additional goal was to investigate whether style-induced pronunciation shifts may depend on the degree of foreign accent. A large part of the paper concentrates on the issue of defining ‘weakness’ in English weak forms and considers priorities in English pronunciation teaching as far as the realisation of function words is concerned. The participants in the study were 12 advanced Polish learners of English, who were divided into two groups: 6 who were judged to speak with a slight degree of foreign accent and 6 who were judged to speak with a high degree of foreign accent. The subjects’ pronunciation was analysed in three situations in which we assume their attention was increasingly paid to speech form (spontaneous speech, prepared speech, reading). The results of the study suggest that increased attention to language form caused the participants to realise more function words as unstressed, although the effect was small. It was also found that one of the characteristics of English weak forms, the lack of stress, was realised correctly by the participants in the majority of cases. Finally, the results of the study imply that, in the case under investigation, the effect of attention to language form is weakly or not at all related to the degree of foreign accent
Linguistic complexity: English vs. Polish, text vs. corpus
We analyze the rank-frequency distributions of words in selected English and
Polish texts. We show that for the lemmatized (basic) word forms the
scale-invariant regime breaks after about two decades, while it might be
consistent for the whole range of ranks for the inflected word forms. We also
find that for a corpus consisting of texts written by different authors the
basic scale-invariant regime is broken more strongly than in the case of
a comparable corpus consisting of texts written by the same author. Similarly,
for a corpus consisting of texts translated into Polish from other languages
the scale-invariant regime is broken more strongly than for a comparable corpus
of native Polish texts. Moreover, we find that if the words are tagged with
their proper part of speech, only verbs show a rank-frequency distribution that
is almost scale-invariant
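A rank-frequency analysis of this kind can be sketched as follows. The sketch counts word frequencies, sorts them by rank, and estimates the slope of the distribution on a log-log scale with an ordinary least-squares fit. A scale-invariant (power-law) regime appears as a straight line in that plot. The function names and the fitting procedure are assumptions made for illustration and are not the authors' method.

```python
from collections import Counter
import math


def rank_frequency(tokens):
    """Return (rank, frequency) pairs, with rank 1 for the most frequent word."""
    freqs = sorted(Counter(tokens).values(), reverse=True)
    return list(enumerate(freqs, start=1))


def loglog_slope(rank_freq):
    """Least-squares slope of log10(frequency) against log10(rank).

    Needs at least two distinct ranks, otherwise the slope is undefined.
    """
    xs = [math.log10(r) for r, _ in rank_freq]
    ys = [math.log10(f) for _, f in rank_freq]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


# Hypothetical toy text whose frequencies follow 12 / rank exactly.
tokens = ["a"] * 12 + ["b"] * 6 + ["c"] * 4 + ["d"] * 3
pairs = rank_frequency(tokens)
slope = loglog_slope(pairs)
```

Fitting the slope separately over successive ranges of ranks is one simple way to see where a distribution departs from a single power law.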
Measuring vowel duration variability in native English speakers and Polish learners
This paper presents a set of simple statistical measures that illustrate the difference between native English speakers and Polish learners of English in varying the length of vocalic segments in read speech. Relative vowel duration and vowel length variation are widely used as basic criteria for establishing rhythmic differences between languages and dialects of a language. The parameter of vocalic duration is employed in popular measures such as ΔV (Ramus et al. 1999), VarcoV (Dellwo 2006, White and Mattys 2007), and PVI (Low et al. 2000, Grabe and Low 2002). Apart from rhythm studies, the processing of data concerning vowel duration can be used to establish the level of discrepancy between native speech and learner speech in investigating other temporal aspects of FL pronunciation, such as tense-lax vowel distinction, accentual lengthening or the degree of unstressed vowel reduction, which are often pointed out as serious problems in the acquisition of English pronunciation by Polish learners. Using descriptive statistics (relations between personal mean vowel duration and standard deviation), the author calculates several indices that demonstrate individual learners' (13 subjects) scores in relation to the native speakers' (12 subjects) score ranges. In some tested aspects, the results of the two groups of speakers are almost cleanly separated, which suggests not only the existence of specific didactic problems but also their actual scale
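The three duration-based measures named in this abstract can be sketched as follows, using their commonly cited definitions: ΔV as the standard deviation of vocalic interval durations, VarcoV as ΔV divided by the mean duration and multiplied by 100, and the normalised PVI as 100 times the mean of the absolute differences between successive durations, each divided by the mean of the pair. The use of the population standard deviation and the function names are assumptions, and the indices the author actually calculates may differ.

```python
from statistics import mean, pstdev


def delta_v(durations):
    """Delta-V: standard deviation of vocalic interval durations.

    Assumption: population standard deviation is used here.
    """
    return pstdev(durations)


def varco_v(durations):
    """VarcoV: Delta-V normalised by the mean duration, times 100."""
    return 100.0 * pstdev(durations) / mean(durations)


def npvi(durations):
    """Normalised Pairwise Variability Index over successive intervals.

    Needs at least two durations.
    """
    terms = [
        abs(a - b) / ((a + b) / 2.0)
        for a, b in zip(durations, durations[1:])
    ]
    return 100.0 * mean(terms)


# Hypothetical vowel durations in milliseconds, not data from the paper.
example = [100, 200]
scores = (delta_v(example), varco_v(example), npvi(example))
```

Because VarcoV and nPVI are normalised by the mean duration, they are less sensitive to overall speech rate than the raw ΔV.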
The Relationship Between English and Polish Rhythm Measures in Polish Learners of English
This paper investigates native and non-native speech rhythm in the speech of Polish learners of English at an intermediate/upper-intermediate level. More specifically, it attempts to explore the relationship between rhythm measure scores in L1 Polish and L2 English within individual speakers. Phonological vowel reduction in terms of duration is present in English and crucial for the perception and acoustic measurements of linguistic rhythm. Polish, on the other hand, has no phonological reduction of that kind. The acquisition of L2 vowel reduction is highly determined by the level of language proficiency and influences non-native rhythmic patterns. The study tests six speech rhythm measures: %V, ΔV, ΔC, VarcoV, VarcoC and nPVI-V in two tempos: normal and fast. The results show that most of these measures are positively and significantly correlated with each other between L1 Polish and L2 English across the subjects and for two tempos, although to a different degree. A highly significant correlation has been noted for %V and ΔC in fast tempo. Moderate significant correlations between the two languages are observed for ΔV, ΔC (normal tempo), VarcoV and nPVI in fast tempo
Comparison and Adaptation of Automatic Evaluation Metrics for Quality Assessment of Re-Speaking
Re-speaking is a mechanism for obtaining high quality subtitles for use in
live broadcast and other public events. Because it relies on humans performing
the actual re-speaking, the task of estimating the quality of the results is
non-trivial. Most organisations rely on humans to perform the actual quality
assessment, but purely automatic methods have been developed for other similar
problems, like Machine Translation. This paper will try to compare several of
these methods: BLEU, EBLEU, NIST, METEOR, METEOR-PL, TER and RIBES. These will
then be matched to the human-derived NER metric, commonly used in re-speaking. Comment: Comparison and Adaptation of Automatic Evaluation Metrics for Quality
Assessment of Re-Speaking. arXiv admin note: text overlap with
arXiv:1509.0908