
    Limitations and possibilities of machine translation: a case study

    Master's dissertation, Universidade Federal de Santa Catarina, Centro de Comunicação e Expressão, Programa de Pós-Graduação em Letras/Inglês e Literatura Correspondente. This work presents the results of a case study on the translation of the English pronoun it into Portuguese. It also gives a brief overview of the development of machine translation from its beginnings to the present day. A parallel corpus of approximately forty-five thousand words in the source and target languages was collected. An annotation scheme developed specifically for this study was used to classify the 305 occurrences of the pronoun it. The elements of the annotation are syntactic function, antecedent type, and processing strategy, all of which are discussed in this dissertation. The results are compared with translations produced by commercial machine translation systems, using the solutions of the human translators in the corpus as the benchmark. Suggestions are made for possible corpus-based improvements to existing systems. Some aspects of the corpus approach are compared with the principles of current machine translation approaches, in an attempt to enrich the discussion of present trends in the field.

    Anaphora resolution for Arabic machine translation: a case study of nafs

    PhD Thesis. In the age of the internet, email, and social media, there is an increasing need to process online information, for example to support education and business. This has led to the rapid development of natural language processing technologies in areas such as computational linguistics, information retrieval, and data mining. As a branch of computational linguistics, anaphora resolution has attracted much interest, as reflected in the large number of papers on the topic published in journals such as Computational Linguistics. Mitkov (2002) and Ji et al. (2005) have argued that, despite practical advances in the area, the overall quality of anaphora resolution systems remains low, and that the major challenges include handling real-world knowledge and parsing accurately. This thesis investigates the following research question: can an algorithm be found for resolving the anaphor nafs in Arabic text that is at least 90% accurate, scales linearly with text size, and requires a minimum of knowledge resources? A resolution algorithm intended to satisfy these criteria is proposed, and testing on a corpus of contemporary Arabic shows that it does indeed satisfy them. Sponsor: Egyptian Government.

    Abstract pronominal anaphors and label nouns in German and English: Selected case studies and quantitative investigations

    Abstract anaphors refer to abstract referents, such as facts or events. This paper presents a corpus-based comparative study of German and English abstract anaphors. Parallel bi-directional texts from the Europarl Corpus were annotated with functional and morpho-syntactic information, focusing on the pronouns ‘it’, ‘this’, and ‘that’, as well as demonstrative noun phrases headed by “label nouns”, such as ‘this event’, ‘that issue’, etc., and their German counterparts. We induce information about the cross-linguistic realization of abstract anaphors from the parallel texts. The contrastive findings are then controlled for translation-specific characteristics by examining the differences between the original text and the translated text in each of the languages. In selected case studies, we investigate in detail “translation mismatches”, including changes in grammatical category (from pronouns to full noun phrases, and vice versa), grammatical function, or clausal position, addition or omission of modifying adjectives, changes in the lexical realization of head nouns, and transpositions of the demonstrative determiner. In some of these cases, the specificity of the abstract noun phrase is altered by the translation process.

    Towards the creation of a CNL adapted to requirements writing by combining writing recommendations and spontaneous regularities: example in a Space Project

    International audience. The Quality Department of the French National Space Agency (CNES, Centre National d’Études Spatiales) wishes to design a writing guide based on the way requirements are actually and habitually written. As a first step in this project, this article presents a linguistic analysis of requirements written in French by CNES engineers. One of our goals is to determine to what extent these requirements conform to several rules laid down in two existing Controlled Natural Languages (CNLs): the Simplified Technical English developed by the AeroSpace and Defense Industries Association of Europe, and the Guide for Writing Requirements proposed by the International Council on Systems Engineering. Although CNES engineers are not obliged to follow any controlled language when writing requirements, we believe that linguistic regularities are likely to emerge from this task, mainly because of the writers’ experience. We seek to identify these regularities so that they can serve as the basis for a new CNL for requirements writing. We approach the issue by using natural language processing tools to identify sentences that do not comply with the rules or that contain specific linguistic phenomena. We then review these sentences to understand why the recommendations cannot (or should not) always be applied when specifying large-scale projects.

    'A man who revels in his own ignorance, racism and misogyny': Identifiable referents trump indefinite grammar

    Typically, a noun phrase beginning with the indefinite article introduces a referent assumed to be unknown to the addressee. But in newspaper opinion journalism, this is not always the case. In ‘instead of hailing its first female president, it [the US] seems poised to hand the awesome power of its highest office to a man who revels in his own ignorance, racism and misogyny’ (The Guardian, 9/11/16), ‘a man who…’ can be understood as introducing a new referent or type. But once it is seen in context, where the identity of the man is known, it becomes clear that the expression is signalling something different. This paper examines how this sort of reference works by challenging existing accounts of ‘late’ indefinites and of the meaning relation of co-extension. It is shown that lexical cohesive ties between the expression and the preceding text and context create a shared space that allows these expressions to function ‘definitely’.

    Resolving pronominal anaphora using commonsense knowledge

    Get PDF
    Coreference resolution is the task of resolving all expressions in a text that refer to the same entity. Such expressions are often used in writing and speech as shortcuts to avoid repetition. The most frequent form of coreference is the anaphor. Resolving anaphora requires not only grammatical and syntactic strategies but also semantic approaches. This dissertation presents a framework for automatically resolving pronominal anaphora by integrating recent findings from the field of linguistics with new semantic features. Commonsense knowledge is the routine knowledge people have of the everyday world. Because such knowledge is so widely used, it is frequently omitted from communication such as written texts, and without it computers have difficulty making sense of textual information. In this dissertation, a new set of computational and linguistic features is used in a supervised learning approach to resolve pronominal anaphora in documents. Commonsense knowledge sources such as ConceptNet and WordNet are used, and similarity measures are extracted from them to uncover the elaborative information embedded in the words that can help in the process of anaphora resolution. The anaphoric system is tested on 350 Wall Street Journal articles from the BBN corpus. Compared with other available systems, such as BART (Versley et al. 2008) and the system of Charniak and Elsner (2009), our system performed better and also resolved a much wider range of anaphora. We achieved a 92% F-measure on the BBN corpus and an average F-measure of 85% on other genres of documents, such as children's stories and short stories selected from the web.

    The Generation of Compound Nominals to Represent the Essence of Text: The COMMIX System

    Get PDF
    This thesis concerns the COMMIX system, which automatically extracts information on what a text is about and generates that information in the highly compacted form of compound nominal expressions. The expressions generated are complex and may include novel terms that do not themselves appear in the input text. From a practical point of view, the work is driven by the need for better representations of content: representations that are shorter and more concise than an abstract, yet more informative and more representative of the actual aboutness than typical indexing expressions and key terms. This additional layer of representation is referred to in this work as pertaining to the essence of a particular text. From a theoretical standpoint, the thesis shows how the compound nominal as a construct can be successfully employed in these highly informative representations. It explores the claim that the standard dictionary glosses for individual words contain sufficient semantic information to enable the construction of useful and highly representative novel compound nominal expressions, without recourse to standard syntactic and statistical methods. It shows how a shallow semantic approach to content identification based on lexical overlap can produce some very encouraging results. The methodology employed, and described herein, is domain-independent and does not require the specification of templates with which the input text must comply. In these two respects, it avoids two of the most common problems associated with information extraction. As regards the evaluation of this type of work, the thesis introduces and utilises the notion of percentage attainment value, which is used in conjunction with subjects' opinions about the degree to which the aboutness terms succeed in indicating the subject matter of the texts for which they were generated.
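The general idea of scoring content by lexical overlap with dictionary glosses can be illustrated with a toy sketch. This is not the COMMIX method: it assumes a hypothetical three-entry gloss dictionary (TOY_GLOSSES), a small hand-picked stopword list, and a plain count of shared content words as the overlap score, and it only ranks dictionary headwords by how much of their gloss appears in an input text.

```python
import re

# Hypothetical stopword list, hand-picked for this illustration only.
STOPWORDS = {"a", "an", "the", "of", "to", "in", "into", "from", "on", "and", "or",
             "is", "for", "that", "this", "which", "by", "with", "one", "another",
             "whose", "around"}

# Hypothetical toy glosses, invented for illustration (not from any real dictionary).
TOY_GLOSSES = {
    "translation": "the rendering of text from one language into another language",
    "anaphora": "the use of an expression whose interpretation depends on an earlier expression",
    "satellite": "an object launched into orbit around the earth",
}


def content_words(text: str) -> set[str]:
    """Lowercase the text, split it into alphabetic tokens, and drop stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def rank_by_gloss_overlap(text: str, glosses: dict[str, str]) -> list[tuple[str, int]]:
    """Score each headword by the number of content words its gloss shares with the text."""
    text_words = content_words(text)
    scores = {head: len(content_words(gloss) & text_words) for head, gloss in glosses.items()}
    # Highest overlap first; ties are broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    sample = "A corpus study of machine translation of text from English into another language."
    print(rank_by_gloss_overlap(sample, TOY_GLOSSES))
    # prints [('translation', 2), ('anaphora', 0), ('satellite', 0)]
```

A set intersection keeps the score insensitive to repeated words; a weighted or normalised overlap would be an equally plausible choice under different assumptions.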