Using Global Constraints and Reranking to Improve Cognates Detection
Global constraints and reranking have not previously been used in cognates detection research. We propose methods that apply global constraints by rescoring the score matrices produced by state-of-the-art cognates detection systems. Rescoring with global constraints is complementary to existing detection methods and yields significant improvements over current state-of-the-art performance on publicly available datasets across different language pairs and conditions, including different levels of baseline performance and different data sizes, among them more realistic large-data conditions than have been evaluated in the past.
Comment: 10 pages, 6 figures, 6 tables; published in the Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, pages 1983-1992, Vancouver, Canada, July 2017
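The rescoring idea can be illustrated with a small sketch: a global one-to-one constraint is imposed on a cognate score matrix by greedy competitive linking. This is only an illustration with toy scores; the function name and the linking scheme are assumptions, not the paper's actual rescoring method.

```python
# Sketch: imposing a one-to-one global constraint on a cognate score
# matrix via greedy competitive linking. Toy data; illustrative only.

def competitive_linking(scores):
    """Greedily pick the highest-scoring (source, target) pairs,
    allowing each word to participate in at most one link."""
    pairs = sorted(
        ((s, i, j) for i, row in enumerate(scores) for j, s in enumerate(row)),
        reverse=True,
    )
    used_src, used_tgt, links = set(), set(), []
    for score, i, j in pairs:
        if i not in used_src and j not in used_tgt:
            links.append((i, j, score))
            used_src.add(i)
            used_tgt.add(j)
    return links

# Toy score matrix: rows = source words, cols = candidate targets.
scores = [
    [0.9, 0.8, 0.1],   # source word 0
    [0.85, 0.7, 0.2],  # source word 1
    [0.1, 0.2, 0.95],  # source word 2
]
links = competitive_linking(scores)
# Word 0 claims target 0 (0.9), so word 1 falls back to target 1 (0.7):
# a locally worse but globally consistent one-to-one assignment.
```

An exact alternative under the same constraint would be optimal assignment (the Hungarian algorithm); the greedy version above is just the shortest way to show how a global constraint reshapes the matrix's links.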
Graphonological Levenshtein Edit Distance: Application for Automated Cognate Identification
This paper presents a methodology for calculating a modified Levenshtein edit distance between character strings, and applies it to the task of automated cognate identification from non-parallel (comparable) corpora. This task is an important stage in developing MT systems and bilingual dictionaries beyond the coverage of traditionally used aligned parallel corpora, and can be used for finding translation equivalents for the "long tail" of the Zipfian distribution: low-frequency and usually unambiguous lexical items in closely-related languages (many of them under-resourced). Graphonological Levenshtein edit distance relies on editing hierarchical representations of phonological features for graphemes (graphonological representations) and improves on the phonological edit distance proposed for measuring dialectological variation. Graphonological edit distance works directly with character strings and does not require an intermediate stage of phonological transcription, exploiting the advantages of the historical and morphological principles of orthography, which are obscured if only the phonetic principle is applied. Difficulties associated with plain feature representations (unstructured feature sets or vectors) are addressed by using a linguistically motivated feature hierarchy that restricts matching of lower-level graphonological features when higher-level features are not matched. The paper presents an evaluation of the graphonological edit distance in comparison with the traditional Levenshtein edit distance from the perspective of its usefulness for the task of automated cognate identification. It discusses the advantages of the proposed method, which can be used for morphology induction, for robust transliteration across different alphabets (Latin, Cyrillic, Arabic, etc.) and for robust identification of words with non-standard or distorted spelling, e.g., in user-generated content on the web such as posts on social media, blogs and comments.
Software for calculating the modified feature-based Levenshtein distance, and the corresponding graphonological feature representations (vectors and the hierarchies of graphemes' features), are released on the author's webpage: http://corpus.leeds.ac.uk/bogdan/phonologylevenshtein/. Features are currently available for the Latin and Cyrillic alphabets and will be extended to other alphabets and languages.
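A minimal sketch of the underlying idea, assuming toy feature sets rather than the released Latin/Cyrillic feature tables: the substitution cost in the dynamic program decreases when two graphemes share phonological features, so feature-similar substitutions (b/p) cost less than arbitrary ones (b/o).

```python
# Sketch of a feature-based Levenshtein distance in the spirit of the
# graphonological approach. The grapheme-feature table below is a toy
# assumption, not the released feature representations.

FEATURES = {  # hypothetical grapheme -> phonological feature set
    "b": {"consonant", "labial", "stop", "voiced"},
    "p": {"consonant", "labial", "stop"},
    "d": {"consonant", "dental", "stop", "voiced"},
    "a": {"vowel", "open"},
    "o": {"vowel", "rounded"},
}

def sub_cost(x, y):
    """Substitution cost = 1 - Jaccard overlap of feature sets."""
    if x == y:
        return 0.0
    fx, fy = FEATURES.get(x, {x}), FEATURES.get(y, {y})
    return 1.0 - len(fx & fy) / len(fx | fy)

def feature_levenshtein(s, t):
    """Standard edit-distance DP, but with graded substitution costs."""
    m, n = len(s), len(t)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = float(i)
    for j in range(1, n + 1):
        d[0][j] = float(j)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(
                d[i - 1][j] + 1.0,  # deletion
                d[i][j - 1] + 1.0,  # insertion
                d[i - 1][j - 1] + sub_cost(s[i - 1], t[j - 1]),
            )
    return d[m][n]

# "bad" vs "pad" (b/p share 3 of 4 features) scores far closer than
# "bad" vs "oad" (b/o share nothing), which plain Levenshtein treats
# identically (distance 1 in both cases).
```

The paper's hierarchical restriction (lower-level features only compared when higher-level features match) would replace the flat Jaccard overlap in `sub_cost`; the DP skeleton stays the same.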
Translating Linguistic Metaphors in Both Directions: A Process-Oriented Study on English-Chinese Translation
Distinguished from conceptual metaphor, linguistic metaphor refers to metaphor in a fixed linguistic form of expression (words, phrases or sentences) (Lakoff 1993, pp. 202-203). With the development of modern technology, researchers have started to investigate the translation process of linguistic metaphor using empirical approaches (e.g. Sjørup, 2013; Zheng and Xiang, 2011). However, one critical issue remains unexplored: the relationship between translation directionality and the process of linguistic metaphor translation.
To fill this gap for the language pair Chinese-English, this study investigates the impact of linguistic metaphor on cognitive effort, and whether this impact is affected by directionality. Thirty-eight novice translators performed a series of translation tasks (first language (L1): Chinese; second language (L2): English), and their performances were recorded by eye tracking, key logging and cue-based Retrospective Think Aloud devices. For objective description, four eye-key combination indicators are calculated in Generalised Linear Models to demonstrate translators' allocation of cognitive resources, namely Total Attentional Duration (TA duration), AU count, AU duration and pupil dilation.
The findings suggest that, for the sequential and parallel coordination of Source Text (ST) processing and Target Text (TT) processing, TT processing receives significantly more cognitive effort than ST processing and parallel processing, which partially confirms that the views of Carl and Dragsted (2012) and Hvelplund (2011) on translators' allocation of cognitive resources are valid for the language pair English-Chinese. Furthermore, the qualitative data from the subjective reflections diverge from the quantitative results in this study. Regarding metaphor's impact on cognitive effort, expression type (linguistic metaphor) significantly affects participants' allocation of cognitive resources in both translation directions (Sjørup, 2013; Dagut, 1987; Newmark, 1988), although the results of different indicators are not consistent, and there is also a significant difference between the eye-key data and participants' subjective self-reflections. Regarding translation directionality, the results partially confirm that the "translation asymmetry" (Chang, 2011) holds for metaphor-related processing: in some respects, translation directionality can significantly affect the relationship between metaphor-related expression types and the attention-distribution pattern of the translation process.
Translationese and post-editese : how comparable is comparable quality?
Whereas post-edited texts have been shown to be either of comparable quality to human translations or better, one study shows that people still seem to prefer human-translated texts. The idea of texts being inherently different despite being of high quality is not new. Translated texts, for example, are also different from original texts, a phenomenon referred to as "Translationese". Research into Translationese has shown that, whereas humans cannot distinguish between translated and original text, computers have been trained to detect Translationese successfully. It remains to be seen whether the same can be done for what we call Post-editese. We first establish whether humans are capable of distinguishing post-edited texts from human translations, and then establish whether it is possible to build a supervised machine-learning model that can distinguish between translated and post-edited text.
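As a rough sketch of what such a supervised model could look like (the study's actual features and learner are not described in the abstract), the toy classifier below builds a character-trigram profile per class from labeled training texts and assigns a new text to the most similar profile. All names and data are illustrative assumptions.

```python
# Toy nearest-profile classifier over character trigrams, standing in
# for a supervised translated-vs-post-edited model. Illustrative only.

from collections import Counter

def trigrams(text):
    """Character-trigram counts of a lowercased text."""
    text = text.lower()
    return Counter(text[i:i + 3] for i in range(len(text) - 2))

def centroid(texts):
    """Normalized trigram frequency profile of a class."""
    total = Counter()
    for t in texts:
        total += trigrams(t)
    norm = sum(total.values()) or 1
    return {g: c / norm for g, c in total.items()}

def similarity(profile, text):
    """Dot product of the class profile and the text's trigram freqs."""
    counts = trigrams(text)
    norm = sum(counts.values()) or 1
    return sum(profile.get(g, 0.0) * c / norm for g, c in counts.items())

def classify(text, class_profiles):
    return max(class_profiles, key=lambda lbl: similarity(class_profiles[lbl], text))

# Toy "training data" -- real experiments would use actual translated
# and post-edited corpora with matched sources.
profiles = {
    "translated": centroid(["the the cat sat on the mat", "the dog and the cat ran"]),
    "post-edited": centroid(["zip zap zop zip", "zap zop zip zap"]),
}
label = classify("the cat and the dog", profiles)  # -> "translated"
```

Real Translationese detection work typically uses richer features (function-word frequencies, POS n-grams) and a discriminative learner, but the pipeline shape (featurize, fit per class, predict) is the same.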
Empirical modelling of translation and interpreting
Empirical research is carried out in a cyclic way: approaching a research area bottom-up, data lead to interpretations and ideally to the abstraction of laws, on the basis of which a theory can be derived. Deductive research is based on a theory, from which hypotheses can be formulated and tested against empirical data. Looking at the state of the art in translation studies, either theories and models are designed, or empirical data are collected and interpreted. However, the final step is still lacking: so far, empirical data have not led to the formulation of theories or models, and existing theories and models have not yet been comprehensively tested with empirical methods. This publication addresses these issues from several perspectives: multi-method product- as well as process-based research may yield insights into translation as well as interpreting phenomena. These phenomena may include cognitive and organizational processes, procedures and strategies, competence and performance, translation properties and universals, etc. Empirical findings about the deeper structures of translation and interpreting will reduce the gap between translation and interpreting practice and model and theory building. Furthermore, the availability of more large-scale empirical testing triggers the development of models and theories concerning translation and interpreting phenomena and behavior based on quantifiable, replicable and transparent data.
Computational Investigations on Polymerase Actions in Gene Transcription and Replication Combining Physical Modeling and Atomistic Simulations
Polymerases are protein enzymes that move along nucleic acid chains and
catalyze template-based polymerization reactions during gene transcription and
replication. The polymerases also substantially improve transcription or
replication fidelity through the non-equilibrium enzymatic cycles. We briefly
review computational efforts that have been made toward understanding
mechano-chemical coupling and fidelity control mechanisms of the polymerase
elongation. The polymerases are regarded as molecular information motors during
the elongation process. It requires a full spectrum of computational approaches
from multiple time and length scales to understand the full polymerase
functional cycle. We keep away from quantum mechanics based approaches to the
polymerase catalysis due to abundant former surveys, while address only
statistical physics modeling approach and all-atom molecular dynamics
simulation approach. We organize this review around our own modeling and
simulation practices on a single-subunit T7 RNA polymerase, and summarize
commensurate studies on structurally similar DNA polymerases. For multi-subunit
RNA polymerases that have been intensively studied in recent years, we leave
detailed discussions on the simulation achievements to other computational
chemical surveys, while only introduce very recently published representative
studies, including our own preliminary work on structure-based modeling on
yeast RNA polymerase II. In the end, we quickly go through kinetic modeling on
elongation pauses and backtracking activities. We emphasize the fluctuation and
control mechanisms of the polymerase actions, highlight the non-equilibrium
physical nature of the system, and try to bring some perspectives toward
understanding replication and transcription regulation from single molecular
details to a genome-wide scale
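The kind of kinetic modeling mentioned for elongation pauses can be sketched, under assumed rate constants, as a toy stochastic simulation in which the enzyme either translocates forward or enters and later escapes a pause state:

```python
# Toy Gillespie-style simulation of polymerase elongation with a pause
# state. All rate constants are illustrative assumptions, not values
# fitted to any polymerase discussed in the review.

import random

def simulate(steps=1000, k_step=10.0, k_pause=0.5, k_recover=1.0, seed=0):
    """Stochastic trajectory: from the active state the enzyme either
    steps forward (rate k_step) or pauses (rate k_pause); from the
    paused state it recovers (rate k_recover). Returns (position, time)."""
    rng = random.Random(seed)
    pos, t, paused = 0, 0.0, False
    for _ in range(steps):
        if paused:
            t += rng.expovariate(k_recover)   # dwell until pause escape
            paused = False
        else:
            k_total = k_step + k_pause
            t += rng.expovariate(k_total)     # dwell in active state
            if rng.random() < k_pause / k_total:
                paused = True                 # enter pause, no motion
            else:
                pos += 1                      # forward translocation
    return pos, t
```

Averaging many such trajectories gives an effective elongation velocity and a pause-broadened dwell-time distribution, which is the level of description kinetic models of pausing and backtracking operate at; backtracking would add backward steps to the paused state.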