121 research outputs found

    Verso la costruzione di una biblioteca digitale.

    A database of the "Antonio Zampolli Fund" has been created and the corresponding catalogue has been published. The analysis and selection of texts for cataloguing helped in creating this bibliography, which is built in large part on references extracted from books and journals. Very old bibliographical references have also been retrieved from curricula prepared by Professor Zampolli for various projects and commissions.

    A preliminary study in zero anaphora coreference resolution for Polish

    Zero anaphora is an element of the coreference resolution task that has not yet been directly addressed in Polish and, in most studies, it has been left as the most challenging aspect for further investigation. This article presents an initial study of this problem. The preparation of a machine learning approach, alongside engineering features based on a linguistic study of the KPWr corpus, is discussed. This study utilizes existing tools for Polish coreference resolution as sources of partial coreferential clusters containing pronoun, noun and named entity mentions. They are also used as baseline zero coreference resolution systems for comparison with our system. The evaluation process focuses not only on clustering correctness, measured with the standard CoNLL-2012 metrics without taking mention types into account, but also on the informativeness of the resulting relations. According to the annotation approach used for coreference in the KPWr corpus, only named entities are treated as mentions that are informative enough to constitute a link to real-world objects. Consequently, we provide an evaluation of informativeness based on the links found between zero anaphoras and named entities. For the same reason, we restrict coreference resolution in this study to mention clusters built around named entities.

    Foundation, Implementation and Evaluation of the MorphoSaurus System: Subword Indexing, Lexical Learning and Word Sense Disambiguation for Medical Cross-Language Information Retrieval

    In everyday medical practice, which involves a great deal of documentation and search work, the majority of textually encoded information is now available electronically. The development of powerful methods for efficient retrieval is therefore of primary importance. Judged from the perspective of medical language, common text retrieval systems lack morphological functionality (inflection, derivation and compounding), lexical-semantic functionality, and the ability to analyse large document collections across languages. This doctoral thesis covers the theoretical foundations of the MorphoSaurus system (an acronym for morpheme thesaurus). Its methodological core is a thesaurus organised around morphemes of medical expert and lay language, whose entries are linked across languages by semantic relations. Building on this, a procedure is presented that segments (complex) words into morphemes, which are then replaced by language-independent, concept-class-like symbols. The resulting representation is the basis for cross-language, morpheme-oriented text retrieval. In addition to the core technology, a method for the automatic acquisition of lexicon entries is presented, by which existing morpheme lexicons are extended to further languages. Taking cross-language phenomena into account then leads to a novel procedure for resolving semantic ambiguities. The performance of morpheme-oriented text retrieval is tested empirically in extensive, standardised evaluations and compared with common approaches.
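
The segmentation step described in this abstract (splitting complex words into morphemes and replacing each morpheme with a language-independent symbol) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy lexicon, the symbol names such as `#STOMACH`, the function names `segment` and `index_terms`, and the greedy longest-match strategy. It is not the MorphoSaurus lexicon or algorithm, only one simple way such a mapping could work.

```python
# Toy subword lexicon. Every entry and every symbol name is hypothetical;
# the real system's lexicon and identifiers are not given in the abstract.
LEXICON = {
    "gastr": "#STOMACH",
    "magen": "#STOMACH",
    "stomach": "#STOMACH",
    "itis": "#INFLAM",
    "entzuendung": "#INFLAM",
    "inflammation": "#INFLAM",
    "hepat": "#LIVER",
    "leber": "#LIVER",
    "liver": "#LIVER",
}


def segment(word, lexicon=LEXICON):
    """Map a word to concept symbols by greedy longest-match segmentation.

    Characters that do not start any known subword are skipped.
    """
    word = word.lower()
    symbols = []
    i = 0
    while i < len(word):
        # Try the longest candidate substring starting at position i first.
        for j in range(len(word), i, -1):
            sub = word[i:j]
            if sub in lexicon:
                symbols.append(lexicon[sub])
                i = j
                break
        else:
            # No lexicon entry starts here: skip one character.
            i += 1
    return symbols


def index_terms(text, lexicon=LEXICON):
    """Turn whitespace-separated text into a flat list of concept symbols."""
    terms = []
    for token in text.split():
        terms.extend(segment(token, lexicon))
    return terms


print(index_terms("Gastritis"))         # German/English medical term
print(index_terms("Magenentzuendung"))  # German lay compound (ASCII spelling)
```

Under these assumptions, the medical term and the lay compound yield the same symbol sequence, which is what would let a query in one language or register match documents written in another.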

    Proceedings of the Fifth Italian Conference on Computational Linguistics CLiC-it 2018 : 10-12 December 2018, Torino

    On behalf of the Program Committee, a very warm welcome to the Fifth Italian Conference on Computational Linguistics (CLiC-it 2018). This edition of the conference is held in Torino. The conference is locally organised by the University of Torino and hosted in its prestigious main lecture hall “Cavallerizza Reale”. The CLiC-it conference series is an initiative of the Italian Association for Computational Linguistics (AILC) which, after five years of activity, has clearly established itself as the premier national forum for research and development in the fields of Computational Linguistics and Natural Language Processing, where leading researchers and practitioners from academia and industry meet to share their research results, experiences, and challenges.

    Proceedings of the Seventh Italian Conference on Computational Linguistics CLiC-it 2020

    On behalf of the Program Committee, a very warm welcome to the Seventh Italian Conference on Computational Linguistics (CLiC-it 2020). This edition of the conference is held in Bologna and organised by the University of Bologna. The CLiC-it conference series is an initiative of the Italian Association for Computational Linguistics (AILC) which, after six years of activity, has clearly established itself as the premier national forum for research and development in the fields of Computational Linguistics and Natural Language Processing, where leading researchers and practitioners from academia and industry meet to share their research results, experiences, and challenges

    Open-domain web-based multiple document : question answering for list questions with support for temporal restrictors

    Doctoral thesis, Informatics (Computer Science), Universidade de Lisboa, Faculdade de Ciências, 2015.
    With the growth of the Internet, more people are searching for information on the Web. The combination of web growth and improvements in Information Technology has reignited interest in Question Answering (QA) systems. QA is a type of information retrieval combined with natural language processing techniques that aims at finding answers to natural language questions. List questions have been widely studied in the QA field. These are questions that require a list of correct answers, which makes answering them correctly more complex. For List questions, the answers may lie in the same document or be spread over multiple documents. In the latter case, a QA system able to answer List questions has to deal with the fusion of partial answers. The current QA state of the art does not yet provide a good way to tackle this complex problem of collecting the exact answers from multiple documents. Our goal is to provide better QA solutions to users who want direct answers, using approaches that deal with the complex problem of extracting answers spread over several documents. The present dissertation addresses the problem of answering Open-domain List questions by exploring redundancy and combining it with heuristics to improve QA accuracy. Our approach uses the Web as its information source, since it is several orders of magnitude larger than other document collections. Besides handling List questions, we develop an approach with a special focus on questions that include temporal information. In this regard, the current work addresses a topic that was lacking specific research. An additional purpose of this dissertation is to report on important results of the research combining Web-based QA, List QA and Temporal QA. Besides evaluating our approach itself, we compare our system with other QA systems in order to assess its performance relative to the state of the art. Finally, our approaches to answering List questions and List questions with temporal information are implemented in a fully-fledged Open-domain Web-based Question Answering System that provides answers retrieved from multiple documents.
    Funding: Fundação para a Ciência e a Tecnologia (FCT), SFRH/BD/65647/2009; European Commission, QTLeap project (Quality Translation by Deep Language Engineering Approache
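
The idea in this abstract of fusing partial answers from several documents by exploiting redundancy can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the dissertation's method: the function name `fuse_answers`, the `min_support` threshold, the `year_range` filter standing in for a temporal restrictor, and the toy candidate data are all hypothetical, and candidate extraction is assumed to have happened already.

```python
from collections import defaultdict


def fuse_answers(candidates_per_doc, min_support=2, year_range=None):
    """Merge candidate answers from several documents into one list.

    candidates_per_doc: one list per document of (answer, year) pairs,
        where year may be None if no date was found (assumed input format).
    min_support: minimum number of distinct documents that must contain
        an answer for it to be kept (the redundancy heuristic).
    year_range: optional (first, last) pair acting as a temporal restrictor;
        candidates without a year or outside the range are dropped.
    """
    support = defaultdict(set)  # normalised answer -> ids of supporting docs
    surface = {}                # normalised answer -> first surface form seen
    for doc_id, candidates in enumerate(candidates_per_doc):
        for answer, year in candidates:
            if year_range is not None:
                if year is None or not (year_range[0] <= year <= year_range[1]):
                    continue
            key = " ".join(answer.lower().split())
            support[key].add(doc_id)
            surface.setdefault(key, answer.strip())
    # Rank by number of supporting documents, then alphabetically.
    ranked = sorted(support, key=lambda k: (-len(support[k]), k))
    return [surface[k] for k in ranked if len(support[k]) >= min_support]


# Toy candidates, as if extracted from three web pages for a list question.
docs = [
    [("Brazil", 2002), ("Italy", 2006)],
    [("brazil", 2002), ("Spain", 2010)],
    [("Spain", 2010), ("Italy", 2006), ("Brazil ", 2002)],
]
print(fuse_answers(docs))
print(fuse_answers(docs, year_range=(2005, 2010)))
```

Counting distinct supporting documents rather than raw mentions is a deliberate choice in this sketch: it keeps one page that repeats an answer many times from outweighing agreement across independent pages.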