1,327 research outputs found
Coreference chains in Czech, English and Russian: Preliminary findings
This article is a pilot comparative study of coreference chains in Czech, English and Russian. We analysed 16 comparable texts in the three languages. Our motivation was to determine the linguistic structure of coreference chains in these languages and to identify which factors influence that structure.
Corpora for Computational Linguistics
Since the mid-1990s, corpora have become very important for computational linguistics. This paper offers a survey of how they are currently used in different fields of the discipline, with particular emphasis on anaphora and coreference resolution, automatic summarisation and term extraction.
Their influence on other fields is also briefly discussed.
An Empirical Approach to Temporal Reference Resolution
This paper presents the results of an empirical investigation of temporal
reference resolution in scheduling dialogs. The algorithm adopted is primarily
a linear-recency based approach that does not include a model of global focus.
A fully automatic system has been developed and evaluated on unseen test data
with good results. This paper presents the results of an intercoder reliability
study, a model of temporal reference resolution that supports linear recency
and has very good coverage, the results of the system evaluated on unseen test
data, and a detailed analysis of the dialogs assessing the viability of the
approach.
Comment: 13 pages, latex using aclap.st
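The linear-recency idea mentioned in this abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: the `TemporalUnit` type, its three fields, the `compatible` check and the `resolve` function are all hypothetical names introduced here, and the sketch assumes that "linear recency" means scanning earlier temporal references from most recent to oldest and taking the first one that does not conflict with the new, underspecified reference.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class TemporalUnit:
    """Hypothetical, simplified temporal reference; None means 'not stated'."""
    month: Optional[int] = None
    day: Optional[int] = None
    hour: Optional[int] = None


FIELDS = ("month", "day", "hour")


def compatible(ref: TemporalUnit, ante: TemporalUnit) -> bool:
    """True if no field stated by both units has conflicting values."""
    for name in FIELDS:
        a, b = getattr(ref, name), getattr(ante, name)
        if a is not None and b is not None and a != b:
            return False
    return True


def resolve(ref: TemporalUnit, history: List[TemporalUnit]) -> TemporalUnit:
    """Fill the missing fields of ref from the most recent compatible antecedent.

    history is ordered oldest first, so it is scanned in reverse.
    Copying every missing field is a simplification made for this sketch.
    """
    for ante in reversed(history):
        if compatible(ref, ante):
            merged = {
                name: getattr(ref, name)
                if getattr(ref, name) is not None
                else getattr(ante, name)
                for name in FIELDS
            }
            return TemporalUnit(**merged)
    return ref  # nothing compatible: leave the reference underspecified


# Invented dialog history: "March 14th at 10", then "the 15th".
history = [TemporalUnit(month=3, day=14, hour=10), TemporalUnit(month=3, day=15)]
resolved = resolve(TemporalUnit(hour=14), history)  # "how about 2 pm?"
```

Here the bare "2 pm" picks up the month and day of the most recent mention, without any model of global focus; an earlier mention is used only when the most recent one conflicts.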
Limitations and possibilities of machine translation: a case study
Dissertation (master's) - Universidade Federal de Santa Catarina, Centro de Comunicação e Expressão, Programa de Pós-Graduação em Letras/Inglês e Literatura Correspondente. This work presents the results of a case study on the translation of the English pronoun it into Portuguese. It also gives a brief overview of the development of machine translation from its beginnings to the present. A parallel corpus of approximately forty-five thousand words in the source and target languages was collected. An annotation scheme developed specifically for this study was used to classify the 305 occurrences of the pronoun it. The annotation comprises syntactic function, antecedent type and processing strategy, each of which is discussed in this dissertation. The results are compared with translations produced by commercial machine translation systems, taking as a benchmark the solutions offered by human translators in the corpus. Suggestions are made for possible corpus-based improvements to existing systems. Some aspects of the corpus approach are compared with the principles of current machine translation approaches, in an attempt to enrich the discussion of current trends in this area.
Resolving pronominal anaphora using commonsense knowledge
Coreference resolution is the task of resolving all expressions in a text that refer to the same entity. Such expressions are often used in writing and speech as shortcuts to avoid repetition. The most frequent form of coreference is the anaphor. Resolving anaphora requires not only grammatical and syntactic strategies but also semantic approaches. This dissertation presents a framework for automatically resolving pronominal anaphora by integrating recent findings from linguistics with new semantic features. Commonsense knowledge is the routine knowledge people have of the everyday world. Because such knowledge is so widely shared, it is frequently omitted from social communications such as texts, and without it computers have difficulty making sense of textual information. In this dissertation a new set of computational and linguistic features is used in a supervised learning approach to resolve pronominal anaphora in a document. Commonsense knowledge sources such as ConceptNet and WordNet are used, and similarity measures are extracted to uncover the elaborative information embedded in words that can help in anaphora resolution. The system is tested on 350 Wall Street Journal articles from the BBN corpus. Compared with other available systems such as BART (Versley et al. 2008) and Charniak and Elsner (2009), our system performed better and also resolved a much wider range of anaphora. We achieved a 92% F-measure on the BBN corpus and an average 85% F-measure on other genres of documents, such as children's stories and short stories selected from the web.
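For readers unfamiliar with the F-measure quoted in this abstract, the sketch below shows how such a score is commonly computed from counts of correct and incorrect decisions. The abstract does not say which variant was used or what the underlying counts were, so the balanced F1 default, the function name `f_measure` and the example counts are assumptions made purely for illustration.

```python
def f_measure(true_pos: int, false_pos: int, false_neg: int, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    predicted = true_pos + false_pos   # links the system proposed
    expected = true_pos + false_neg    # links in the gold annotation
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / expected if expected else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Invented counts, not taken from the dissertation: 8 correct links,
# 2 spurious links and 2 missed links give precision = recall = 0.8.
score = f_measure(true_pos=8, false_pos=2, false_neg=2)
```

Because the score combines precision and recall, a system cannot reach a high value simply by resolving only the easiest pronouns or by proposing an antecedent for everything.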