    Pronominal Anaphora in Basque: computational point of view and the development of a corpus

    This paper describes the process of annotating pronominal anaphora in a 54,000-word corpus of Basque. Our aim is to use this annotation as a basis for later computational processing. The linguistic study carried out and the criteria defined for the tagging process are also presented in the paper.

    Corpora for Computational Linguistics

    Since the mid-1990s, corpora have become very important for computational linguistics. This paper offers a survey of how they are currently used in different fields of the discipline, with particular emphasis on anaphora and coreference resolution, automatic summarisation and term extraction. Their influence on other fields is also briefly discussed.

    Linguistics parameters for zero anaphora resolution

    Master's dissertation, Natural Language Processing and Human Language Technology, Univ. do Algarve, 2009. This dissertation describes and proposes a set of linguistically motivated rules for zero anaphora resolution in the context of a natural language processing chain developed for Portuguese. Some languages, like Portuguese, allow noun phrase (NP) deletion (or zeroing) in several syntactic contexts in order to avoid the redundancy that would result from repeating previously mentioned words. The co-reference relation between the zeroed element and its antecedent (or previous mention) in the discourse is here called zero anaphora (Mitkov, 2002). In Computational Linguistics, zero anaphora resolution may be viewed as a subtask of anaphora resolution, and it plays an essential role in various Natural Language Processing applications such as information extraction, automatic abstracting, dialog systems, machine translation and question answering. The main goal of this dissertation is to describe the grammatical rules imposing subject NP deletion and referential constraints in Brazilian Portuguese, in order to allow a correct identification of the antecedent of the deleted subject NP. Some of these rules were then formalized in the Xerox Incremental Parser, or XIP (Ait-Mokhtar et al., 2002: 121-144), in order to constitute a module of the Portuguese grammar (Mamede et al., 2010) developed at the Spoken Language Laboratory (L2F). Using this rule-based approach, we expected to improve the performance of the Portuguese grammar, in particular by producing better dependency structures with (reconstructed) zeroed NPs for the syntactic-semantic interface. Because of the complexity of the task, the scope of this dissertation had to be limited to: (a) subject NP deletion; (b) within sentence boundaries; and (c) with an explicit antecedent. In addition, (d) rules were formalized based solely on the results of the shallow parser (or chunks), that is, with minimal syntactic (and no semantic) knowledge. A corpus of different text genres was manually annotated for zero anaphors and other zero-shaped, usually indefinite, subjects. The rule-based approach is evaluated, and the results are presented and discussed.

    Optimization issues in machine learning of coreference resolution

    Gramatika jaietan Patxi Goenagaren omenez

    Aurkibidea / Índice / Index: Hitzaurrea.- Curriculum vitae Patxi Goenaga Mendizabal.- Axun Aierbe Mendizabal: Euskal estilo-liburuetako gramatika-arloko itzulpen-gomendioez.- Gontzal Aldai: Patxi Goenagari 30 mila esker.- Izaskun Aldezabal Roteta: Aditz-azpikategorizazioa.- Iñaki Amundarain: Behar izan + partizipioa: geroaldiko balioaz.- M. J. Aranzabe, J. M. Arriola and Arantza Diaz de Ilarraza: Theoretical and methodological issues of tagging noun phrases structures following dependency grammar formalism.- Xabier Artiagoitia: Some arguments for complement-head order in Basque DPs.- Miren Azkarate Villar: Gertaera- eta emaitza-izenak.- Andoni Barreña, Marijose Ezeizabarrena eta Iñaki García: Entzundako hizkuntzaren eragina haur euskaldun txikien gramatika-garapenean.- Gidor Bilbao: Claude Maugerren eskuliburua Urteren eredu.- Klara Ceberio, Itziar Aduriz, Arantza Diaz de Ilarraza eta Inés M. Garcia Azkoaga: Erreferentziakidetasunaren azterketa eta anotazioa euskarazko corpus batean.- Karlos Cid Abasolo: Gramatika Atxagaren literatur bideetan (I).- Maia Duguine eta Aritz Irurtzun: Ohar batzuk nafar-lapurterazko galdera eta galdegai indartuez.- Luis Eguren: Clíticos léxicos y elipsis nominal.- José Luis Erdozia: Burundako hizkera, Arabako ekialdekoaren hondar euskalkia.- Maitena Etxebarria Arostegui: Análisis y evaluación de la vitalidad sociolingüística del euskera en la C.A.V.- Urtzi Etxeberria eta Ricardo Etxepare: Izen eta gertakarien gaineko kuantifikazioa.- Ricardo Etxepare and Myriam Uribe-Etxebarria: On negation and focus in Spanish and Basque.- Juan Garzia: Bada arazorik etik arazoak daude raino: existentzia-predikazioa eta inespezifikotasuna.- Ricardo Gómez: Euskal gramatikagintza zaharraren historia laburra: xvii-xviii. mendeak.- Lluïsa Gràcia y Berta Crous: Sobre algunos predicados con fer y tenir en catalán: fer un infart vs. tenir un infart.- Bill Haddican and Paul Foulkes: Mid Vowel Raising and Second Vowel Deletion in Oiartzun Basque.- José Ignacio Hualde eta Oihana Lujanbio: Goizuetako azentuera.- Orreaga Ibarra Murillo: Sobre estrategias discursivas del lenguaje de los jóvenes vascoparlantes: aspectos pragmáticos y discursivos (conectores, marcadores).- Itziar Idiazabal: Gramatika eta hizkuntzaren didaktika.- Itziar Laka: Senezkotasuna hizkuntzan: Gramatika Unibertsalaren inguruko hausnarketa.- Joseba A. Lakarra: Aitzineuskararen gramatikarantz (malkar eta osinetan zehar).- Mikel Lersundi, Igone Zabala eta Agurtzane Elordui: Aditzetiko izenen emankortasunaren azterketa morfopragmatikoa euskarazko corpus orokor eta berezituetan.- Ángel López García: Sobre una propiedad superestructural de la lengua vasca.- Juan Karlos López-Mugartza Iriarte: Erronkaribarko oikonimia, mitoak eta elezaharra.- Jesus Mari Makazaga Eizagirre: Ahozko jarduna komunikazioaren lagungarri: ekarpen bat ahozkoaren estrategia komunikatiboez.- Roger Martin and Juan Uriagereka: Competence for preferences.- Juan Carlos Moreno Cabrera: Alokutibotasunari buruzko zenbait hausnarketa hizkuntzalaritza orokorraren ikuspegitik.- Céline Mounole: Sintaxi diakronikoa eta aditz multzoaren garapena: Inperfektibozko perifrasiaren sorreraz.- José Antonio Mujika: Adlatiboaren berbalizazioaz.- Juan Carlos Odriozola Pereira: Quantifying compounds.- Miren Lourdes Oñederra: Izan edo ez izan: Fonologiak fonetikari ordaintzen diona afrikatuekin.- Javier Ormazabal: Kausatibo aldizkatzeak euskaraz eta inguruko hizkuntzetan.- B. Oyharçabal: Naturalist conceptions about agglutinative languages: Vinson’s ideas about Basque and linguistic Darwinism.- Georges Rebuschi: On older Northern Basque exclamatives in ala.- Milan Rezac: The forms of dative displacement: From Basauri to Itelmen.- Patxi Salaberri: Satznamen direlakoen inguruan. Erlatibozko perpausetan jatorri duten toponimoak aztergai.- Pello Salaburu: Hiztegi kontuak Baztan aldean.- Itziar San Martín: Defective domains in Basque nominalized dependants.- Ibon Sarasola: Iparraldeko hiztegigintza Larramendiren paradigmaren garaian.- Esther Torrego: Revisiting Romance SE.- Itziar Túrrez: Ideas acerca de la lengua de Tomás Tamayo de Vargas: una lectura de sus Anotaciones a Garcilaso.- Blanca Urgell: Berriemaileen gaitasuna eta eredu lexikografikoaren eragina Landucciren hiztegian.- Vidal Valmala: Topic, focus and quantifier float.- Koldo Zuazo: Euskara (batu)aren historiarako.- Juan Joxe Zubiri eta Patxi Salaberri: Zenbait irain-hitzen erabilera. Deklinabide-kasu hautsiak