
    Using Global Constraints and Reranking to Improve Cognates Detection

    Global constraints and reranking have not been used in cognates detection research to date. We propose methods for using global constraints by rescoring the score matrices produced by state-of-the-art cognates detection systems. Rescoring with global constraints is complementary to state-of-the-art cognates detection methods and yields significant performance improvements beyond the current state of the art on publicly available datasets with different language pairs and under various conditions, such as different levels of baseline performance and different data sizes, including larger, more realistic data sizes than have been evaluated in the past.
    Comment: 10 pages, 6 figures, 6 tables; published in the Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, pages 1983-1992, Vancouver, Canada, July 2017.
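The abstract does not spell out how the rescoring is performed. As a hedged illustration only, one natural global constraint in cognate detection is that each source word pairs with at most one target word. The sketch below (all words and scores are invented) enforces that constraint greedily over a score matrix, in the style of competitive linking:

```python
def competitive_linking(scores):
    """Greedy one-to-one linking over a score matrix.

    scores: dict mapping (src, tgt) -> similarity score.
    Repeatedly takes the highest-scoring remaining pair and
    removes its row and column, so each word links at most once.
    """
    pairs = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    used_src, used_tgt, links = set(), set(), []
    for (src, tgt), s in pairs:
        if src not in used_src and tgt not in used_tgt:
            links.append((src, tgt, s))
            used_src.add(src)
            used_tgt.add(tgt)
    return links

# Toy matrix: German-English candidate pairs with invented scores.
matrix = {
    ("nacht", "night"): 0.9, ("nacht", "knight"): 0.8,
    ("licht", "light"): 0.85, ("licht", "night"): 0.6,
}
print(competitive_linking(matrix))
# → [('nacht', 'night', 0.9), ('licht', 'light', 0.85)]
```

In practice a rescoring approach would adjust the matrix entries themselves rather than output hard links; this greedy linking is only a stand-in for whichever global constraint the paper actually applies.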

    Graphonological Levenshtein Edit Distance: Application for Automated Cognate Identification

    This paper presents a methodology for calculating a modified Levenshtein edit distance between character strings, and applies it to the task of automated cognate identification from non-parallel (comparable) corpora. This task is an important stage in developing MT systems and bilingual dictionaries beyond the coverage of traditionally used aligned parallel corpora, which can be used for finding translation equivalents for the ‘long tail’ of the Zipfian distribution: low-frequency and usually unambiguous lexical items in closely-related languages (many of them often under-resourced). Graphonological Levenshtein edit distance relies on editing hierarchical representations of phonological features for graphemes (graphonological representations) and improves on the phonological edit distance proposed for measuring dialectological variation. Graphonological edit distance works directly with character strings and does not require an intermediate stage of phonological transcription, exploiting the advantages of the historical and morphological principles of orthography, which are obscured if only the phonetic principle is applied. Difficulties associated with plain feature representations (unstructured feature sets or vectors) are addressed by using a linguistically-motivated feature hierarchy that restricts matching of lower-level graphonological features when higher-level features are not matched. The paper presents an evaluation of the graphonological edit distance in comparison with the traditional Levenshtein edit distance from the perspective of its usefulness for the task of automated cognate identification. It discusses the advantages of the proposed method, which can be used for morphology induction, for robust transliteration across different alphabets (Latin, Cyrillic, Arabic, etc.) and for robust identification of words with non-standard or distorted spelling, e.g. in user-generated content on the web such as posts on social media, blogs and comments.
Software for calculating the modified feature-based Levenshtein distance, and the corresponding graphonological feature representations (vectors and the hierarchies of graphemes’ features), are released on the author’s webpage: http://corpus.leeds.ac.uk/bogdan/phonologylevenshtein/. Features are currently available for the Latin and Cyrillic alphabets and will be extended to other alphabets and languages.
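The core idea can be sketched as follows. This is a toy illustration, not the released Leeds resource: the feature hierarchy below is invented, and the cost weighting is an assumption. Substitution cost decreases with the number of shared phonological features, and lower-level features are compared only when the top-level class (consonant vs. vowel) matches:

```python
# Toy grapheme -> (top-level class, lower-level feature set) mapping.
# The real resource uses full hierarchies per alphabet; this is illustrative.
FEATURES = {
    "b": ("consonant", {"labial", "stop", "voiced"}),
    "p": ("consonant", {"labial", "stop"}),
    "s": ("consonant", {"alveolar", "fricative"}),
    "a": ("vowel", {"open"}),
    "o": ("vowel", {"rounded"}),
}

def sub_cost(a, b):
    """Feature-sensitive substitution cost in [0, 1]."""
    if a == b:
        return 0.0
    ca, fa = FEATURES.get(a, ("other", set()))
    cb, fb = FEATURES.get(b, ("other", set()))
    if ca != cb or ca == "other":
        return 1.0  # top-level mismatch: lower features do not count
    shared = len(fa & fb) / max(len(fa | fb), 1)
    return 1.0 - 0.5 * shared  # partial credit for shared features

def graphono_distance(s, t):
    """Standard Levenshtein dynamic program with feature-based sub costs."""
    m, n = len(s), len(t)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = float(i)
    for j in range(1, n + 1):
        d[0][j] = float(j)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub_cost(s[i - 1], t[j - 1]))
    return d[m][n]

# "b" and "p" share {labial, stop}, so b/p substitution is cheaper than b/s:
print(graphono_distance("ba", "pa") < graphono_distance("ba", "sa"))
# → True
```

The point of the hierarchy is visible in `sub_cost`: a consonant never gets partial credit against a vowel, however many lower-level features coincide.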

    Translating Linguistic Metaphors in Both Directions: A Process-Oriented Study on English-Chinese Translation

    Distinguished from conceptual metaphor, linguistic metaphor refers to metaphor in a fixed linguistic form of expression (words, phrases or sentences) (Lakoff 1993, pp. 202-203). With the development of modern technology, researchers have started to investigate the translation process of linguistic metaphor through empirical approaches (e.g. Sjørup, 2013; Zheng and Xiang, 2011). However, one critical issue remains unexplored: the relationship between translation directionality and the process of linguistic metaphor translation. To fill this gap for the language pair Chinese and English, this study investigates the impact of linguistic metaphor on cognitive effort, and whether this impact is affected by directionality. Thirty-eight novice translators performed a series of translation tasks (first language (L1): Chinese; second language (L2): English), and their performances were recorded by eye tracking, key logging and cue-based retrospective think-aloud devices. For objective description, four eye-key combination indicators are calculated in Generalised Linear Models to demonstrate translators’ allocation of cognitive resources, namely Total Attentional Duration (TA duration), AU count, AU duration and pupil dilation. The findings suggest that, for the sequential and parallel coordination of Source Text (ST) processing and Target Text (TT) processing, TT processing receives significantly more cognitive effort than ST processing and parallel processing, which partially confirms that Carl and Dragsted’s (2012) and Hvelplund’s (2011) views on translators’ allocation of cognitive resources are valid for the language pair English and Chinese. Furthermore, the qualitative data from the subjective reflections diverge from the quantitative results in this study.
Regarding metaphor’s impact on cognitive effort, expression type (linguistic metaphor) can significantly affect participants’ allocation of cognitive resources in both translation directions (Sjørup, 2013; Dagut, 1987; Newmark, 1988), but the results of different indicators are not consistent. There is also a significant difference between the eye-key data and participants’ subjective self-reflections. Regarding translation directionality, the results partially confirm that “translation asymmetry” (Chang, 2011) holds for metaphor-related processing: in some respects, translation directionality can significantly affect the relationship between metaphor-related expression types and the attention-distribution pattern of the translation process.

    Translationese and post-editese : how comparable is comparable quality?

    Whereas post-edited texts have been shown to be either of comparable quality to human translations or better, one study shows that people still seem to prefer human-translated texts. The idea of texts being inherently different despite being of high quality is not new. Translated texts, for example, are also different from original texts, a phenomenon referred to as ‘Translationese’. Research into Translationese has shown that, whereas humans cannot distinguish between translated and original text, computers have been trained to detect Translationese successfully. It remains to be seen whether the same can be done for what we call Post-editese. We first establish whether humans are capable of distinguishing post-edited texts from human translations, and then establish whether it is possible to build a supervised machine-learning model that can distinguish between translated and post-edited text.
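The abstract does not specify the features or the learner. As a hedged sketch of the kind of supervised model it describes, Translationese research often uses function-word frequencies as features; the toy nearest-centroid classifier below uses those. All training sentences and labels are invented for illustration:

```python
from collections import Counter

# A tiny, illustrative function-word inventory (real studies use hundreds).
FUNCTION_WORDS = ["the", "of", "that", "which", "to", "and"]

def featurize(text):
    """Relative frequency of each function word in the text."""
    counts = Counter(text.lower().split())
    total = max(sum(counts.values()), 1)
    return [counts[w] / total for w in FUNCTION_WORDS]

def centroid(vectors):
    return [sum(col) / len(col) for col in zip(*vectors)]

def train(labelled):
    """labelled: list of (text, label); returns one centroid per label."""
    by_label = {}
    for text, label in labelled:
        by_label.setdefault(label, []).append(featurize(text))
    return {label: centroid(vs) for label, vs in by_label.items()}

def predict(model, text):
    """Assign the label whose centroid is nearest in squared distance."""
    v = featurize(text)
    return min(model,
               key=lambda lab: sum((a - b) ** 2
                                   for a, b in zip(v, model[lab])))

# Invented examples: "HT" = human translation, "PE" = post-edited text.
model = train([
    ("the report of the committee that met", "HT"),
    ("report committee met and decided to act", "PE"),
])
print(predict(model, "the chair of the board"))
# → HT
```

A real experiment would use a proper learner, cross-validation and far richer features; the sketch only shows the shape of the classification task the abstract sets up.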

    Empirical modelling of translation and interpreting

    Empirical research is carried out in a cyclic way: approaching a research area bottom-up, data lead to interpretations and ideally to the abstraction of laws, on the basis of which a theory can be derived. Deductive research is based on a theory, on the basis of which hypotheses can be formulated and tested against empirical data. Looking at the state of the art in translation studies, either theories and models are designed, or empirical data are collected and interpreted. However, the final step is still lacking: so far, empirical data have not led to the formulation of theories or models, whereas existing theories and models have not yet been comprehensively tested with empirical methods. This publication addresses these issues from several perspectives: multi-method product- as well as process-based research may gain insights into translation as well as interpreting phenomena. These phenomena may include cognitive and organizational processes, procedures and strategies, competence and performance, translation properties and universals, etc. Empirical findings about the deeper structures of translation and interpreting will reduce the gap between translation and interpreting practice and model and theory building. Furthermore, the availability of more large-scale empirical testing triggers the development of models and theories concerning translation and interpreting phenomena and behavior based on quantifiable, replicable and transparent data.

    Computational Investigations on Polymerase Actions in Gene Transcription and Replication Combining Physical Modeling and Atomistic Simulations

    Polymerases are protein enzymes that move along nucleic acid chains and catalyze template-based polymerization reactions during gene transcription and replication. Polymerases also substantially improve transcription or replication fidelity through non-equilibrium enzymatic cycles. We briefly review computational efforts toward understanding the mechano-chemical coupling and fidelity-control mechanisms of polymerase elongation. The polymerases are regarded as molecular information motors during the elongation process, and understanding the full polymerase functional cycle requires a full spectrum of computational approaches across multiple time and length scales. We leave aside quantum-mechanics-based approaches to polymerase catalysis, which have already been surveyed extensively, and address only the statistical-physics modeling approach and the all-atom molecular dynamics simulation approach. We organize this review around our own modeling and simulation work on the single-subunit T7 RNA polymerase, and summarize commensurate studies on structurally similar DNA polymerases. For multi-subunit RNA polymerases, which have been intensively studied in recent years, we leave detailed discussion of the simulation achievements to other computational chemistry surveys and introduce only very recently published representative studies, including our own preliminary work on structure-based modeling of yeast RNA polymerase II. Finally, we briefly review kinetic modeling of elongation pauses and backtracking activities. We emphasize the fluctuation and control mechanisms of polymerase actions, highlight the non-equilibrium physical nature of the system, and offer perspectives toward understanding replication and transcription regulation from single-molecule details to the genome-wide scale.
