
    Rule Based Transliteration Scheme for English to Punjabi

    Machine transliteration has emerged as an important research area in the field of machine translation. Transliteration aims to preserve the phonological structure of words, and correct transliteration of named entities plays a significant role in improving the quality of machine translation. In this paper we perform machine transliteration for the English-Punjabi language pair using a rule-based approach. We have constructed a set of rules for syllabification, the process of separating a word into its syllables. We then calculate probabilities for named entities (proper names and locations). For words that are not named entities, separate probabilities are calculated by relative frequency using the statistical machine translation toolkit MOSES. Using these probabilities, we transliterate the input text from English to Punjabi.
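
The abstract does not list the syllabification rules or the exact probability model, so the following is only a minimal sketch under stated assumptions: a hypothetical vowel-nucleus syllabifier (not the authors' rules) and a handful of invented English-syllable/Gurmukhi alignments. It shows what relative-frequency estimation, P(target | source) = count(source, target) / count(source), looks like when applied to syllable pairs, and how the most probable target can be picked per syllable.

```python
import re
from collections import Counter, defaultdict

# Hypothetical syllabifier: each chunk is a consonant run plus a vowel run;
# a word-final consonant cluster is attached to the last chunk.
SYLLABLE = re.compile(r"[^aeiou]*[aeiou]+(?:[^aeiou]+$)?")

def syllabify(word):
    """Split a lowercase English word into rough syllable-like chunks."""
    return SYLLABLE.findall(word.lower())

def relative_frequencies(pairs):
    """Estimate P(target | source) = count(source, target) / count(source)."""
    pair_counts = Counter(pairs)
    source_counts = Counter(src for src, _ in pairs)
    table = defaultdict(dict)
    for (src, tgt), n in pair_counts.items():
        table[src][tgt] = n / source_counts[src]
    return dict(table)

def best_transliteration(word, table):
    """Pick the most probable target per syllable; unknown syllables are kept as-is."""
    out = []
    for syl in syllabify(word):
        options = table.get(syl)
        out.append(max(options, key=options.get) if options else syl)
    return "".join(out)

# Toy aligned syllable pairs (hypothetical, for illustration only).
pairs = [("ka", "ਕ"), ("ka", "ਕ"), ("ka", "ਕਾ"), ("ran", "ਰਨ")]
table = relative_frequencies(pairs)
print(syllabify("amritsar"))                 # ['a', 'mri', 'tsar']
print(table["ka"])
print(best_transliteration("karan", table))  # ਕਰਨ
```

In a real system the syllable alignments and their counts would come from a parallel name list (for example via the phrase table that MOSES builds); the toy list here only stands in for that.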

    The interaction of syllabification and voicing perception in American English

    The current paper explores these two sorts of phonetic explanations of the relationship between syllabic position and the voicing contrast in American English. It has long been observed that the contrast between, for example, /p/ and /b/ is expressed differently depending on the position of the stop relative to the vowel. Preceding a vowel within a syllable, the contrast is largely one of aspiration: /p/ is aspirated, while /b/ is voiceless (or, in some dialects, voiced or even implosive). Following a vowel within a syllable, both /p/ and /b/ tend to lack voicing in the closure, and the contrast is expressed largely by dynamic differences in the transition between the preceding vowel and the stop. Here, vowel duration and closure duration are negatively correlated: /p/ has a shorter preceding vowel and a longer closure. This difference is often enhanced by the addition of glottalization to /p/. There are further differences connected to higher-level prosodic organization involving stress and foot edges. To make the current discussion more tractable, we restrict ourselves to the two conditions (CV and VC) laid out above.

    ‘Pitch accent’ and prosodic structure in Scottish Gaelic: Reassessing the role of contact

    This paper considers the origin of ‘pitch accents’ in Scottish Gaelic with a view to evaluating the hypothesis that this feature was borrowed from North Germanic varieties spoken by Norse settlers in medieval Scotland. It is shown that the ‘pitch accent’ system in Gaelic is tightly bound up with metrical structure (more precisely, syllable count), certainly diachronically and probably (at least in some varieties) synchronically. Gaelic ‘pitch accent’ is argued to be a plausible internal development, parallel to similar phenomena in other branches of Celtic (specifically Breton) as well as in Germanic. This conclusion may appear to undermine the contact hypothesis, especially in the absence of reliable written sources; nevertheless, a certain role for Norse-Gaelic contact in the appearance of the pitch accent system cannot be completely excluded.

    Towards a corpus-based, statistical approach of translation quality : measuring and visualizing linguistic deviance in student translations

    In this article we present a corpus-based statistical approach to measuring translation quality, more particularly translation acceptability, by comparing the features of translated and original texts. We discuss initial findings that aim to support and objectify formative quality assessment. To that end, we extract a multitude of linguistic and textual features from both student and professional translation corpora, which consist of many different translations by several translators in two genres (fiction, news) and in two translation directions (English to French and French to Dutch). The numerical information gathered from these corpora is analysed exploratively with Principal Component Analysis, which enables us to identify stable, language-independent linguistic and textual indicators of student translations compared to translations produced by professionals. The differences between these types of translation are subsequently tested by means of ANOVA. The results clearly indicate that the proposed methodology is capable of distinguishing between student and professional translations. We argue that this deviant behaviour indicates an overall lower translation quality in student translations: student translations tend to score lower on acceptability, that is, they deviate significantly from target-language norms and conventions. In addition, the proposed methodology can assess the acceptability of an individual student’s translation: a smaller linguistic distance between a given student translation and the norm set by the professional translations correlates with higher quality. The methodology can also provide objective and concrete feedback on the linguistic dimensions in which students' texts diverge.
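
The abstract does not give the feature set or the exact distance measure, so the following is a minimal sketch under assumptions: three invented per-text features, z-score standardization, PCA computed with NumPy's SVD, and Euclidean distance to the centroid of the professional translations in component space as the "linguistic distance" to the norm. It illustrates the idea of scoring a student text by its distance from professional practice, not the authors' actual pipeline.

```python
import numpy as np

def pca_scores(X, k=2):
    """Standardize features, then project onto the top-k principal components via SVD."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # leave constant features unscaled instead of dividing by zero
    Z = (X - mean) / std
    _, _, vt = np.linalg.svd(Z, full_matrices=False)
    return Z @ vt[:k].T

def distance_to_norm(scores, is_reference):
    """Euclidean distance of each text to the centroid of the reference (professional) texts."""
    centroid = scores[is_reference].mean(axis=0)
    return np.linalg.norm(scores - centroid, axis=1)

# Hypothetical per-text features, e.g. mean sentence length, type/token ratio, pronoun rate.
professional = [[18.0, 0.52, 0.11], [19.5, 0.50, 0.12], [17.8, 0.55, 0.10]]
student = [[25.0, 0.40, 0.20]]
X = professional + student
is_prof = np.array([True] * len(professional) + [False] * len(student))

dist = distance_to_norm(pca_scores(X, k=2), is_prof)
print(dist.round(3))
```

Because the sign of each principal axis from the SVD is arbitrary, only distances (not raw component scores) are compared here. Testing group differences per component, as the abstract describes with ANOVA, could then be done on the component scores, for example with `scipy.stats.f_oneway`.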

    Temiar Reduplication in One-Level Prosodic Morphology

    Temiar reduplication is a difficult piece of prosodic morphology. This paper presents the first computational analysis of Temiar reduplication, using the novel finite-state approach of One-Level Prosodic Morphology originally developed by Walther (1999b, 2000). After a review of both the data and the basic tenets of One-Level Prosodic Morphology, the analysis is laid out in some detail, using the notation of the FSA Utilities finite-state toolkit (van Noord 1997). One important discovery is that in this approach one can easily define a regular expression operator which ambiguously scans a string in the left- or rightward direction for a certain prosodic property. This yields an elegant account of the base-length-dependent triggering of reduplication found in Temiar.
    Comment: 9 pages, 2 figures. Finite-State Phonology: SIGPHON-2000, Proceedings of the Fifth Workshop of the ACL Special Interest Group in Computational Phonology, pp. 13-21. Aug. 6, 2000. Luxembourg.

    Neutralization in Aztec Phonology – the Case of Classical Nahuatl Nasals

    This article investigates nasal assimilation in Classical Nahuatl. The distribution of nasal consonants is shown to be the result of coda neutralization. It is argued that the generalizations holding at the root level and at the word level do not match and cannot be explained by means of rule-based phonology. It is shown that the process responsible for the distribution of nasals can only be accounted for by introducing derivational levels into Optimality Theory.

    Morphological Analysis as Classification: an Inductive-Learning Approach

    Morphological analysis is an important subtask in text-to-speech conversion, hyphenation, and other language engineering tasks. The traditional approach to morphological analysis combines a morpheme lexicon, sets of (linguistic) rules, and heuristics to find the most probable analysis. In contrast, we present an inductive learning approach in which morphological analysis is reformulated as a segmentation task. We report on a number of experiments in which five inductive learning algorithms are applied to three variations of the task of morphological analysis. Results show (i) that the generalisation performance of the algorithms is good, and (ii) that the lazy learning algorithm IB1-IG performs best on all three tasks. We conclude that lazy learning of morphological analysis as a classification task is indeed a viable approach; moreover, it has strong advantages over the traditional approach: it avoids the knowledge-acquisition bottleneck, it is fast and deterministic in learning and processing, and it is language-independent.
    Comment: 11 pages, 5 encapsulated postscript figures, uses non-standard NeMLaP proceedings style nemlap.sty; inputs ipamacs (international phonetic alphabet) and epsf macros.
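
The abstract does not describe how words are turned into classification instances, so the encoding below is an assumption for illustration: each letter becomes a fixed-width window of surrounding letters, labelled 1 if a morpheme boundary precedes it. On top of that, this minimal sketch implements IB1-IG in its usual formulation: 1-nearest-neighbour classification with an overlap distance in which each feature is weighted by its information gain over the training set. The training words and the helper names (`windows`, `instances`, `segment`) are hypothetical, not taken from the paper.

```python
import math
from collections import Counter

def windows(word, width=2, pad="_"):
    """One fixed-width letter window per position, centred on that letter."""
    padded = pad * width + word + pad * width
    return [tuple(padded[i:i + 2 * width + 1]) for i in range(len(word))]

def instances(segmented, width=2):
    """Turn e.g. 'walk+ing' into (window, label) pairs; label 1 = boundary before the letter."""
    boundaries, pos = set(), 0
    for ch in segmented:
        if ch == "+":
            boundaries.add(pos)
        else:
            pos += 1
    word = segmented.replace("+", "")
    return [(w, int(i in boundaries)) for i, w in enumerate(windows(word, width))]

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def info_gain(data):
    """Information gain of each feature position with respect to the class labels."""
    base = entropy([y for _, y in data])
    weights = []
    for f in range(len(data[0][0])):
        by_value = {}
        for x, y in data:
            by_value.setdefault(x[f], []).append(y)
        remainder = sum(len(ys) / len(data) * entropy(ys) for ys in by_value.values())
        weights.append(base - remainder)
    return weights

def classify(x, data, weights):
    """IB1-IG: label of the nearest training instance under IG-weighted overlap distance.
    Ties are broken by training order here, a simplification of this sketch."""
    def dist(a, b):
        return sum(w for w, ai, bi in zip(weights, a, b) if ai != bi)
    return min(data, key=lambda inst: dist(x, inst[0]))[1]

def segment(word, data, weights, width=2):
    """Insert '+' before every letter that the classifier labels as a boundary."""
    out = []
    for i, w in enumerate(windows(word, width)):
        if i > 0 and classify(w, data, weights):
            out.append("+")
        out.append(word[i])
    return "".join(out)

train = [inst for w in ["walk+ing", "talk+ed", "jump+s"] for inst in instances(w)]
weights = info_gain(train)
print(segment("jumping", train, weights))
```

Lazy learning shows up directly in this sketch: "training" only stores the instances and computes one weight per feature, and all the work happens at classification time by comparing against stored examples.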