Rapport: a fact-based question answering system for Portuguese
Question answering is one of the longest-standing problems in natural language processing. Although natural language interfaces for computer systems have become more common these days, the same cannot yet be said of access to specific textual information. Any full-text search engine can easily retrieve documents containing user-specified or closely related terms; however, it is typically unable to answer user questions with small passages or short answers.
The problem with question answering is that text is hard to process, due to its syntactic structure and, to a higher degree, to its semantic contents. At the sentence level, although the syntactic aspects of natural language have well-known rules, the size and complexity of a sentence may make it difficult to analyze its structure. Furthermore, semantic aspects are still arduous to address, with text ambiguity being one of the hardest problems to handle. There is also the need to correctly process the question in order to define its target, and then select and process the answers found in a text. Additionally, the selected text that may yield the answer to a given question must be further processed in order to present just a passage instead of the full text. These issues also take longer to address in languages other than English, such as Portuguese, which have far fewer people working on them.
This work focuses on question answering for Portuguese. In other words, our field of interest is the presentation of short answers, passages, and possibly full sentences, but not whole documents, in response to questions formulated in natural language. For that purpose, we have developed a system, RAPPORT, built upon open information extraction techniques for extracting triples, so-called facts, that characterize the information in text files, and then storing and using them to answer user queries posed in natural language. These facts, in the form of subject, predicate and object, alongside other metadata, constitute the basis of the answers presented by the system. Facts work both by storing short and direct information found in a text, typically entity-related information, and by containing in themselves the answers to the questions, already in the form of small passages. As for the results, although there is room for improvement, they are tangible proof of the adequacy of our approach and its different modules for storing information and retrieving answers in question answering systems.
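The idea of storing subject, predicate and object facts and answering from them can be illustrated with a minimal sketch. This is not RAPPORT's implementation: the `Fact` and `FactStore` names, the `source` metadata field, the keyword-overlap ranking, and the sample sentences are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    """A triple plus metadata (the `source` field is a hypothetical example)."""
    subject: str
    predicate: str
    obj: str
    source: str  # sentence the fact was extracted from


class FactStore:
    """Toy in-memory store; a real system would use a proper index."""

    def __init__(self):
        self._facts = []

    def add(self, fact):
        self._facts.append(fact)

    def search(self, keywords):
        """Return facts ranked by how many query keywords they contain."""
        wanted = {k.lower() for k in keywords}
        scored = []
        for fact in self._facts:
            text = f"{fact.subject} {fact.predicate} {fact.obj}".lower()
            score = len(wanted & set(text.split()))
            if score:
                scored.append((score, fact))
        # Stable sort: ties keep insertion order.
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [fact for _, fact in scored]


store = FactStore()
store.add(Fact("Lisboa", "é a capital de", "Portugal",
               "Lisboa é a capital de Portugal."))
store.add(Fact("Coimbra", "fica em", "Portugal",
               "Coimbra fica em Portugal."))

# Keywords a question such as "Qual é a capital de Portugal?" might yield.
best = store.search(["capital", "Portugal"])[0]
print(best.subject, "|", best.source)
```

Because each fact carries its source sentence, the best-matching fact can be shown either as a short answer (the subject) or as a small passage (the sentence).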
In the process, in addition to contributing a new approach to question answering for Portuguese, and validating the application of open information extraction to question answering, we have developed a set of tools that has been used in other natural language processing works, such as a lemmatizer, LEMPORT, which was built from scratch and has high accuracy. Many of these tools result from improving those found in the Apache OpenNLP toolkit, by pre-processing their input, post-processing their output, or both, and from training models for use in those tools or in others, such as MaltParser. Other tools include the creation of interfaces for other resources containing, for example, synonyms, hypernyms, and hyponyms, or the creation of lists of, for instance, relations between verbs and agents, using rules
Open-domain web-based multiple document: question answering for list questions with support for temporal restrictors
Doctoral thesis, Informatics (Computer Science), Universidade de Lisboa, Faculdade de Ciências, 2015
With the growth of the Internet, more people are searching for information on the Web. The combination of web growth and improvements in Information Technology has reignited interest in Question Answering (QA) systems. QA is a type of information retrieval combined with natural language processing techniques that aims at finding answers to natural language questions. List questions have been widely studied in the QA field. These are questions that require a list of correct answers, which makes answering them correctly more complex. In List questions, the answers may lie in the same document or be spread over multiple documents. In the latter case, a QA system able to answer List questions has to deal with the fusion of partial answers. The current state of the art in Question Answering does not yet provide a good way to tackle this complex problem of collecting the exact answers from multiple documents. Our goal is to provide better QA solutions to users who want direct answers, using approaches that deal with the complex problem of extracting answers spread over several documents. The present dissertation addresses the problem of answering Open-domain List questions by exploring redundancy and combining it with heuristics to improve QA accuracy. Our approach uses the Web as its information source, since it is several orders of magnitude larger than other document collections. Besides handling List questions, we develop an approach with a special focus on questions that include temporal information. In this regard, the current work addresses a topic that was lacking specific research. An additional purpose of this dissertation is to report on important results of the research combining Web-based QA, List QA and Temporal QA.
Besides the evaluation of our approach itself, we compare our system with other QA systems in order to assess its performance relative to the state of the art. Finally, our approaches to answering List questions and List questions with temporal information are implemented in a fully fledged Open-domain Web-based Question Answering System that provides answers retrieved from multiple documents.
Fundação para a Ciência e a Tecnologia (FCT), SFRH/BD/65647/2009; European Commission, project QTLeap (Quality Translation by Deep Language Engineering Approache
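The abstract above mentions exploring redundancy to fuse partial answers found in multiple documents. A minimal sketch of one way this could work is counting in how many documents each candidate answer appears and keeping the well-supported ones. The function name, the `min_support` threshold, and the sample candidates are illustrative assumptions, not the thesis's actual heuristics.

```python
from collections import Counter


def fuse_list_answers(candidates_per_document, min_support=2):
    """Merge candidate answers from several documents by redundancy.

    A candidate is kept when it appears in at least `min_support`
    documents (an assumed threshold). Results are ordered by support,
    then alphabetically.
    """
    support = Counter()
    for candidates in candidates_per_document:
        # Normalize and count each candidate at most once per document.
        support.update({c.strip().lower() for c in candidates})
    ranked = sorted(support.items(), key=lambda item: (-item[1], item[0]))
    return [answer for answer, count in ranked if count >= min_support]


# Hypothetical candidates extracted from three retrieved documents.
docs = [
    ["Porto", "Lisboa", "Braga"],
    ["lisboa", "Porto"],
    ["Porto", "Faro"],
]
answers = fuse_list_answers(docs)
print(answers)
```

Candidates seen in only one document are dropped here. A fuller system would combine such counts with further heuristics, for instance checks on temporal restrictors.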
Uma Abordagem ao Págico baseada no Processamento e Análise de Sintagmas dos Tópicos [An Approach to Págico Based on the Processing and Analysis of Topic Phrases]
This article describes the approach to Págico followed by the Rapportágico system. The approach centers on indexing Wikipedia articles, identifying phrases in the sentences of the given topics, and then processing and analyzing those phrases to ease the matching between topics and the articles that may serve as answers to them. Phrases make it easier to identify small structures with different roles within the sentence. Before being used for querying, some phrases undergo manipulations, such as expanding their constituent words into words of similar meaning (synonyms). Although there is still a long way to go, the success of the approach translated, in terms of results, into a score that stood out somewhat among all participations in Págico, especially among the automatic ones.
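The synonym expansion step described above can be sketched as follows. This is only an illustration: the `SYNONYMS` table, the function names, and the boolean query syntax are assumptions and do not reflect how Rapportágico stores synonyms or queries its index.

```python
# Hypothetical synonym table; a real system would consult a lexical resource.
SYNONYMS = {
    "carro": ["automóvel", "viatura"],
    "rápido": ["veloz"],
}


def expand_phrase(phrase, synonyms=SYNONYMS):
    """Map each word of a phrase to itself plus any known synonyms."""
    return [[word] + synonyms.get(word, [])
            for word in phrase.lower().split()]


def to_query(expanded):
    """Build a boolean query: alternatives are OR-ed, words are AND-ed."""
    groups = []
    for alternatives in expanded:
        if len(alternatives) == 1:
            groups.append(alternatives[0])
        else:
            groups.append("(" + " OR ".join(alternatives) + ")")
    return " AND ".join(groups)


query = to_query(expand_phrase("carro rápido"))
print(query)  # (carro OR automóvel OR viatura) AND (rápido OR veloz)
```

Expanding words before querying trades precision for recall: more articles match a topic, so later ranking or filtering matters more.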