1,327 research outputs found

    Coreference chains in Czech, English and Russian: Preliminary findings

    This article is a pilot comparative study of coreference chains in Czech, English and Russian. We analysed 16 comparable texts in the three languages. Our aim was to determine the linguistic structure of coreference chains in these languages and to identify the factors that influence this structure.

    Corpora for Computational Linguistics

    Since the mid-1990s, corpora have become very important for computational linguistics. This paper surveys how they are currently used in different fields of the discipline, with particular emphasis on anaphora and coreference resolution, automatic summarisation and term extraction. Their influence on other fields is also briefly discussed.

    An Empirical Approach to Temporal Reference Resolution

    This paper presents the results of an empirical investigation of temporal reference resolution in scheduling dialogs. The algorithm adopted is primarily a linear-recency-based approach that does not include a model of global focus. A fully automatic system has been developed and evaluated on unseen test data with good results. This paper presents the results of an intercoder reliability study, a model of temporal reference resolution that supports linear recency and has very good coverage, the results of the system evaluated on unseen test data, and a detailed analysis of the dialogs assessing the viability of the approach.
    Comment: 13 pages, latex using aclap.st

    Limitations and possibilities of machine translation: a case study

    Dissertation (master's) - Universidade Federal de Santa Catarina, Centro de Comunicação e Expressão. Programa de Pós-Graduação em Letras/Inglês e Literatura Correspondente.
    This work presents the results of a case study on the translation of the English pronoun "it" into Portuguese. It also gives a brief overview of the development of machine translation from its beginnings to the present. A parallel corpus of approximately forty-five thousand words in the source and target languages was collected. An annotation scheme developed specifically for this study was used to classify the 305 occurrences of the pronoun "it". The elements of the annotation are syntactic function, type of antecedent and processing strategy, all of which are discussed in the dissertation. The results are compared with translations produced by commercial machine translation systems, taking as a benchmark the solutions offered by human translators in the corpus. Suggestions are made for possible corpus-based improvements to existing systems. Some aspects of the corpus approach are compared with the principles of current machine translation approaches, in an attempt to enrich the discussion of current trends in this area.

    Building a Diverse Document Leads Corpus Annotated with Semantic Relations


    Resolving pronominal anaphora using commonsense knowledge

    Coreference resolution is the task of resolving all expressions in a text that refer to the same entity. Such expressions are often used in writing and speech as shortcuts to avoid repetition. The most frequent form of coreference is the anaphor. Resolving anaphora requires not only grammatical and syntactic strategies but also semantic approaches. This dissertation presents a framework for automatically resolving pronominal anaphora by integrating recent findings from the field of linguistics with new semantic features. Commonsense knowledge is the routine knowledge people have of the everyday world. Because such knowledge is so widely shared, it is frequently omitted from communications such as texts, and without it computers have difficulty making sense of textual information. In this dissertation a new set of computational and linguistic features is used in a supervised learning approach to resolve pronominal anaphora in documents. Commonsense knowledge sources such as ConceptNet and WordNet are used, and similarity measures are extracted to uncover the elaborative information embedded in the words that can help in the process of anaphora resolution. The system is tested on 350 Wall Street Journal articles from the BBN corpus. When compared with other available systems, such as BART (Versley et al. 2008) and Charniak and Elsner (2009), our system performed better and also resolved a much wider range of anaphora. We achieved a 92% F-measure on the BBN corpus and an average 85% F-measure on other genres of documents, such as children's stories and short stories selected from the web.