    Learning discourse-new references in Portuguese texts

    This work presents the evaluation of a discourse status classifier for the Portuguese language. It considers two distinct classes of discourse novelty: Brand-new and New references. An evaluation of the relevant features according to different linguistic levels is presented in detail. IFIP International Conference on Artificial Intelligence in Theory and Practice - Speech and Natural Language. Red de Universidades con Carreras en Informática (RedUNCI)

    Analysis of the paraphrase identification capability of coreference resolution tools

    The linguistic phenomena of coreference and paraphrase share certain aspects. For example, it is common to refer to the same entity in different ways within a single context, so coreference resolution can support the process of paraphrase identification. This paper presents an analysis of the capabilities of CORP, a coreference resolution tool for Portuguese, in the context of paraphrase identification at the sentence and phrase levels.

    Relation extraction in the Organization domain for Portuguese

    The task of Relation Extraction from texts is one of the main challenges in the area of Information Extraction, given the linguistic knowledge it requires and the sophistication of the language processing techniques employed. The task aims at identifying and classifying semantic relations that occur between entities recognized in a given text. For example, the sentence "Next Saturday, Ronaldo Lemos, director of Creative Commons, will participate in a debate [...]" expresses an "institutional bond" relation between the named entities "Ronaldo Lemos" and "Creative Commons". This thesis proposes a process for extracting relation descriptors, which describe the explicit relations between named entities in the Organization domain (Person, Organization and Location), by applying Conditional Random Fields (CRF) to texts in Portuguese. CRF is a probabilistic model that has been used efficiently in various sequential text processing tasks, including, recently, Relation Extraction. To apply the proposed process, a reference corpus for relation extraction, necessary for learning, was manually annotated on top of a reference corpus for named entities (HAREM). Features of different types were defined based on an extensive literature review of automatic relation extraction. An experimental evaluation assessed the learned model under different input feature configurations for the CRF. Among them, the inclusion of the semantic feature based on the named entity category stood out, since this feature better expressed the kind of relation to be identified between the pair of named entities. Finally, the best results correspond to the extraction of relations between named entities of the Organization and Person categories, for which the F-measure was 57% and 63%, considering correct and partially correct extractions, respectively. Previous issue date: 2014-01-16.
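
    The extraction of relation descriptors described above can be framed as sequence labeling over the tokens around a pair of named entities. The following is a minimal sketch, assuming a BIO label encoding and a token representation of (word, part-of-speech) pairs. The feature names, the label names, and both helper functions are hypothetical illustrations and are not the thesis's actual feature set. The resulting dictionaries are the kind of per-token input that CRF toolkits commonly accept.

```python
from typing import Dict, List, Tuple, Union

# Hypothetical token representation: (word, part-of-speech tag).
Token = Tuple[str, str]
Features = Dict[str, Union[str, bool]]


def token_features(tokens: List[Token], i: int,
                   e1_category: str, e2_category: str) -> Features:
    """Build an illustrative feature dict for the token at position i.

    e1_category / e2_category are the categories of the two named
    entities (e.g. "PERSON", "ORGANIZATION"); they stand in for the
    semantic feature based on the named entity category.
    """
    word, pos = tokens[i]
    feats: Features = {
        "word.lower": word.lower(),
        "pos": pos,
        "e1_category": e1_category,
        "e2_category": e2_category,
        "category_pair": f"{e1_category}-{e2_category}",
    }
    # Context features from the neighbouring tokens, when they exist.
    if i > 0:
        feats["prev.pos"] = tokens[i - 1][1]
    else:
        feats["BOS"] = True  # beginning of the segment
    if i < len(tokens) - 1:
        feats["next.pos"] = tokens[i + 1][1]
    else:
        feats["EOS"] = True  # end of the segment
    return feats


def bio_labels(n_tokens: int, start: int, end: int) -> List[str]:
    """Label the descriptor span [start, end) as B-REL/I-REL, the rest as O."""
    labels = ["O"] * n_tokens
    for i in range(start, end):
        labels[i] = "B-REL" if i == start else "I-REL"
    return labels
```

    For the tokens between "Ronaldo Lemos" and "Creative Commons" in the example sentence (",", "director", "of"), the descriptor "director of" would cover positions 1 and 2 under this encoding.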

    Corref-PT: A Semi-Automatic Annotated Portuguese Coreference Corpus

    This paper describes the Portuguese coreference corpus Corref-PT, annotated semi-automatically using the coreference annotation tool CORP and manually revised with the editing tool CorrefVisual. It includes a total of 182 texts, mostly news (corpus CSTNews, corpus LE-PAROLE, FAPESP magazine) but also articles from Wikipedia. The result is a corpus that includes a total of 3898 reference chains. We present the coreference annotation tool CORP, which was built on the basis of deterministic rules, and the editor CorrefVisual used for manual revision. We report on the annotation agreement and on the feedback provided by the annotators regarding the editor and the complexity of the task. Examples of technical and linguistic issues encountered during the annotation are given, and the pros and cons of such an approach to corpus construction are discussed. Our motivation was to use a semi-automatic approach to increase the set of available resources for coreference resolution applications for Portuguese.
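
    To illustrate what a reference chain is, the following is a minimal sketch, assuming that pairwise coreference links between mentions have already been produced by some rule. The function name `build_chains`, the input format, and the example links are hypothetical. The sketch does not reproduce CORP's actual rules or output format. It only groups linked mentions into chains with a union-find structure.

```python
from collections import defaultdict
from typing import List, Tuple


def build_chains(mentions: List[str],
                 links: List[Tuple[int, int]]) -> List[List[str]]:
    """Group mentions into chains given pairwise links (index pairs).

    Mentions that are not linked to any other mention are treated as
    singletons and left out of the result.
    """
    parent = list(range(len(mentions)))

    def find(i: int) -> int:
        # Follow parent pointers to the representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in links:
        ra, rb = find(a), find(b)
        if ra != rb:
            # Keep the earliest mention as the chain representative.
            parent[max(ra, rb)] = min(ra, rb)

    groups = defaultdict(list)
    for i, mention in enumerate(mentions):
        groups[find(i)].append(mention)
    return [chain for chain in groups.values() if len(chain) > 1]
```

    With the mentions "Ronaldo Lemos", "the director", "Creative Commons", and "he", and links between the first and second and between the second and fourth, the function returns a single chain holding the three mentions that refer to the person.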

    Relation Extraction for Competitive Intelligence

    Competitive intelligence (CI) has become one of the major subjects for strategic processes in organizations in recent years. CI supports the strategic business area and works as a sensor, showing managers how to position their organization competitively in the market. In this paper, we show how Relation Extraction supports CI in collecting and organizing external information from unstructured data gathered from newspapers, blogs, magazines, and informational portals. CEECIND/01997/2017, UIDB/00057/202

    Cross-Framework Evaluation for Portuguese POS Taggers and Parsers

    This work compares POS tagging and parsing systems for the Portuguese language. We analyse available features and tagsets, and compare the results of POS tagging and syntactic structure identification by means of both intrinsic and extrinsic evaluation methods. For intrinsic evaluation, we use well-known parser evaluation metrics such as bracket cross and leaf ancestor. For extrinsic evaluation, we apply the parsers to the task of noun phrase identification. The proposed comparison takes into account the different linguistic theories and frameworks each parser subscribes to, but it does not depend on any particular one.
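
    As an illustration of the bracket-crossing idea mentioned above, the following is a minimal sketch, assuming constituents are represented as half-open token spans (start, end). It follows a common formulation, in which a predicted bracket crosses a gold bracket when the two overlap without one containing the other. The exact definition used in the paper may differ, and the function names are hypothetical.

```python
from typing import Iterable, Tuple

Span = Tuple[int, int]  # half-open token span [start, end)


def crosses(a: Span, b: Span) -> bool:
    """True when the spans overlap but neither contains the other."""
    return a[0] < b[0] < a[1] < b[1] or b[0] < a[0] < b[1] < a[1]


def crossing_brackets(predicted: Iterable[Span], gold: Iterable[Span]) -> int:
    """Count predicted brackets that cross at least one gold bracket."""
    gold_set = set(gold)
    return sum(
        1 for p in set(predicted)
        if any(crosses(p, g) for g in gold_set)
    )
```

    For a five-token sentence with gold brackets (0, 5), (0, 2), (2, 5) and predicted brackets (0, 5), (1, 3), (3, 5), only (1, 3) crosses a gold bracket, namely (0, 2), so the count is 1. Because the measure works on spans alone, it can be applied to parsers from different frameworks once their output is converted to bracketings.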