
    Is question answering fit for the Semantic Web? A survey

    With the recent rapid growth of the Semantic Web (SW), the processes of searching and querying content that is both massive in scale and heterogeneous have become increasingly challenging. User-friendly interfaces, which can support end users in querying and exploring this novel and diverse, structured information space, are needed to make the vision of the SW a reality. We present a survey on ontology-based Question Answering (QA), which has emerged in recent years to exploit the opportunities offered by structured semantic information on the Web. First, we provide a comprehensive perspective by analyzing the general background and history of the QA research field, from influential works from the artificial intelligence and database communities developed in the 1970s and later decades, through open-domain QA stimulated by the QA track at TREC since 1999, to the latest commercial semantic QA solutions, before tackling the current state of the art in open, user-friendly interfaces for the SW. Second, we examine the potential of this technology to go beyond the current state of the art to support end users in reusing and querying SW content. We conclude our review with an outlook for this novel research area, focusing in particular on the R&D directions that need to be pursued to realize the goal of efficient and competent retrieval and integration of answers from large-scale, heterogeneous, and continuously evolving semantic sources.
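As a rough illustration of the core step such ontology-based QA interfaces perform, the Python sketch below maps a natural-language question onto a SPARQL query; the templates and vocabulary are hypothetical and are not taken from any system in the survey.

    # A purely illustrative sketch of the core step an ontology-based QA interface
    # performs: mapping a natural-language question onto a structured (SPARQL) query.
    # The templates and vocabulary below are invented for this example.
    import re

    # Question pattern -> SPARQL template (hypothetical vocabulary).
    TEMPLATES = [
        (re.compile(r"who wrote (?P<work>.+)\?", re.I),
         'SELECT ?author WHERE {{ ?w rdfs:label "{work}"@en . ?w dbo:author ?author . }}'),
        (re.compile(r"where is (?P<place>.+) located\?", re.I),
         'SELECT ?country WHERE {{ ?p rdfs:label "{place}"@en . ?p dbo:country ?country . }}'),
    ]

    def question_to_sparql(question):
        """Return a SPARQL query for the first matching template, or None."""
        for pattern, template in TEMPLATES:
            match = pattern.match(question.strip())
            if match:
                return template.format(**{k: v.strip() for k, v in match.groupdict().items()})
        return None

    if __name__ == "__main__":
        print(question_to_sparql("Who wrote The Hobbit?"))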

    Impliance: A Next Generation Information Management Appliance

    ably successful in building a large market and adapting to the changes of the last three decades, its impact on the broader market of information management is surprisingly limited. If we were to design an information management system from scratch, based upon today's requirements and hardware capabilities, would it look anything like today's database systems?" In this paper, we introduce Impliance, a next-generation information management system consisting of hardware and software components integrated to form an easy-to-administer appliance that can store, retrieve, and analyze all types of structured, semi-structured, and unstructured information. We first summarize the trends that will shape information management for the foreseeable future. Those trends imply three major requirements for Impliance: (1) to be able to store, manage, and uniformly query all data, not just structured records; (2) to be able to scale out as the volume of this data grows; and (3) to be simple and robust in operation. We then describe four key ideas that are uniquely combined in Impliance to address these requirements: (a) integrating software and off-the-shelf hardware into a generic information appliance; (b) automatically discovering, organizing, and managing all data - unstructured as well as structured - in a uniform way; (c) achieving scale-out by exploiting simple, massively parallel processing; and (d) virtualizing compute and storage resources to unify, simplify, and streamline the management of Impliance. Impliance is an ambitious, long-term effort to define simpler, more robust, and more scalable information systems for tomorrow's enterprises. Comment: This article is published under a Creative Commons License Agreement (http://creativecommons.org/licenses/by/2.5/). You may copy, distribute, display, and perform the work, make derivative works and make commercial use of the work, but you must attribute the work to the author and CIDR 2007. 3rd Biennial Conference on Innovative Data Systems Research (CIDR), January 7-10, 2007, Asilomar, California, USA.

    Advanced Knowledge Technologies at the Midterm: Tools and Methods for the Semantic Web

    The University of Edinburgh and research sponsors are authorised to reproduce and distribute reprints and on-line copies for their purposes notwithstanding any copyright annotation hereon. The views and conclusions contained herein are the authors' and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of other parties.

In a celebrated essay on the new electronic media, Marshall McLuhan wrote in 1962:

Our private senses are not closed systems but are endlessly translated into each other in that experience which we call consciousness. Our extended senses, tools, technologies, through the ages, have been closed systems incapable of interplay or collective awareness. Now, in the electric age, the very instantaneous nature of co-existence among our technological instruments has created a crisis quite new in human history. Our extended faculties and senses now constitute a single field of experience which demands that they become collectively conscious. Our technologies, like our private senses, now demand an interplay and ratio that makes rational co-existence possible. As long as our technologies were as slow as the wheel or the alphabet or money, the fact that they were separate, closed systems was socially and psychically supportable. This is not true now when sight and sound and movement are simultaneous and global in extent. (McLuhan 1962, p.5, emphasis in original)

Over forty years later, the seamless interplay that McLuhan demanded between our technologies is still barely visible. McLuhan's predictions of the spread, and increased importance, of electronic media have of course been borne out, and the worlds of business, science and knowledge storage and transfer have been revolutionised. Yet the integration of electronic systems as open systems remains in its infancy.

Advanced Knowledge Technologies (AKT) aims to address this problem, to create a view of knowledge and its management across its lifecycle, and to research and create the services and technologies that such unification will require. Half way through its six-year span, the results are beginning to come through, and this paper will explore some of the services, technologies and methodologies that have been developed. We hope to give a sense in this paper of the potential for the next three years, to discuss the insights and lessons learnt in the first phase of the project, and to articulate the challenges and issues that remain.

The WWW provided the original context that made the AKT approach to knowledge management (KM) possible. When AKT was initially proposed in 1999, it brought together an interdisciplinary consortium with the technological breadth and complementarity to create the conditions for a unified approach to knowledge across its lifecycle. The combination of this expertise, and the time and space afforded the consortium by the IRC structure, suggested the opportunity for a concerted effort to develop an approach to advanced knowledge technologies, based on the WWW as a basic infrastructure. The technological context of AKT altered for the better in the short period between the development of the proposal and the beginning of the project itself, with the development of the semantic web (SW), which foresaw much more intelligent manipulation and querying of knowledge.
The opportunities that the SW provided, for example for more intelligent retrieval, put AKT at the centre of information technology innovation and knowledge management services; the AKT skill set would clearly be central to the exploitation of those opportunities. The SW, as an extension of the WWW, provides an interesting set of constraints on the knowledge management services AKT tries to provide. As a medium for the semantically-informed coordination of information, it has suggested a number of ways in which the objectives of AKT can be achieved, most obviously through the provision of knowledge management services delivered over the web, as opposed to the creation and provision of technologies to manage knowledge.

AKT is working on the assumption that many web services will be developed and provided for users. The KM problem in the near future will be one of deciding which services are needed and of coordinating them. Many of these services will be largely or entirely legacies of the WWW, and so the capabilities of the services will vary. As well as providing useful KM services in their own right, AKT will be aiming to exploit this opportunity by reasoning over services, brokering between them, and providing essential meta-services for SW knowledge service management.

Ontologies will be a crucial tool for the SW. The AKT consortium brings a lot of expertise on ontologies together, and ontologies were always going to be a key part of the strategy. All kinds of knowledge sharing and transfer activities will be mediated by ontologies, and ontology management will be an important enabling task. Different applications will need to cope with inconsistent ontologies, or with the problems that will follow the automatic creation of ontologies (e.g. merging of pre-existing ontologies to create a third). Ontology mapping, and the elimination of conflicts of reference, will be important tasks. All of these issues are discussed along with our proposed technologies.

Similarly, specifications of tasks will be used for the deployment of knowledge services over the SW, but in general it cannot be expected that in the medium term there will be standards for task (or service) specifications. The brokering meta-services that are envisaged will have to deal with this heterogeneity.

The emerging picture of the SW is one of great opportunity, but it will not be a well-ordered, certain or consistent environment. It will comprise many repositories of legacy data, outdated and inconsistent stores, and requirements for common understandings across divergent formalisms. There is clearly a role for standards to play in bringing much of this context together, and AKT is playing a significant role in these efforts. But standards take time to emerge, they take political power to enforce, and they have been known to stifle innovation (in the short term). AKT is keen to understand the balance between principled inference and statistical processing of web content. Logical inference on the Web is tough: complex queries using traditional AI inference methods bring most distributed computer systems to their knees. Do we set up semantically well-behaved areas of the Web? Is any part of the Web in which semantic hygiene prevails interesting enough to reason in? These and many other questions need to be addressed if we are to provide effective knowledge technologies for our content on the web.

    Universal Workload-based Graph Partitioning and Storage Adaption for Distributed RDF Stores

    The publication of machine-readable information has been increasing significantly, both in magnitude and in the complexity of the embedded relations. The Resource Description Framework (RDF) plays a big role in modeling and linking web data and their relations. In line with that important role, dedicated systems were designed to store and query RDF data using a special querying language called SPARQL, similar to classic SQL. However, due to the large size of the data, several federated working nodes are used to host a distributed RDF store. The data needs to be partitioned, assigned, and stored on each working node. After partitioning, some of the data needs to be replicated in order to avoid communication costs and balance the load for better system throughput. Since replication requires more storage space, the two important questions are: which data to replicate, and how much? The answer to the second question is related to the other storage-space requirements at each working node, such as indexes and cache. In order to answer SPARQL queries efficiently, each working node needs to put its share of the data into multiple indexes. Those indexes span the whole data and consume a considerable amount of storage space. In this context, the same two questions raised about replication also arise for indexes. The third storage-consuming structure is the join cache: a special index in which frequent join results are cached, saving a considerable amount of running time at the cost of high storage-space consumption. Again, the same two questions raised for replication and indexes apply to the join cache. In this thesis, we present a universal adaption approach to the storage of a distributed RDF store. The system aims to find optimal data assignments to the different indexes, replications, and the join cache within the limited storage space. To achieve this, we present a cost model based on the workload, which often contains frequent patterns. The workload is dynamically analyzed to evaluate predefined rules. Those rules tell the system the benefits and costs of assigning particular data to each structure, with the objective of better query execution time. Besides the storage adaption, the system adapts its processing resources to the queries' arrival rate, aiming for better parallelization per query while still providing high system throughput.
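As a minimal sketch of the workload-driven decision described above (assuming a greedy heuristic; the names, numbers, and the cost model itself are illustrative, not the thesis's actual approach), candidate replicas, indexes, and join-cache entries can be chosen under a fixed per-node storage budget as follows:

    # A minimal sketch of a workload-driven storage decision, with invented names
    # and numbers; this is not the thesis's actual cost model. Candidate replicas,
    # indexes, and join-cache entries compete for a fixed per-node storage budget.
    from dataclasses import dataclass

    @dataclass
    class Candidate:
        name: str        # e.g. "replicate partition P3", "extra OSP index", "cache join J7"
        size_mb: float   # storage the structure would consume
        benefit: float   # workload-weighted estimate of query time saved

    def choose_structures(candidates, budget_mb):
        """Greedy heuristic: pick the best benefit-per-MB candidates until the budget is spent."""
        chosen, used = [], 0.0
        for cand in sorted(candidates, key=lambda c: c.benefit / c.size_mb, reverse=True):
            if used + cand.size_mb <= budget_mb:
                chosen.append(cand)
                used += cand.size_mb
        return chosen

    if __name__ == "__main__":
        workload_candidates = [
            Candidate("replicate partition P3", 400, 90.0),
            Candidate("join cache for frequent pattern J7", 150, 70.0),
            Candidate("extra OSP index", 600, 60.0),
        ]
        for c in choose_structures(workload_candidates, budget_mb=700):
            print("materialise:", c.name)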

    Enabling Complex Semantic Queries to Bioinformatics Databases through Intuitive Search Over Data

    Data integration promises to be one of the main catalysts in enabling new insights to be drawn from the wealth of biological data already publicly available. However, the heterogeneity of the existing data sources still poses significant challenges for achieving interoperability among biological databases. Furthermore, merely solving the technical challenges of data integration, for example through the use of common data representation formats, leaves open a larger problem: the steep learning curve required to understand the data models of each public source, as well as the technical language through which the sources can be queried and joined. As a consequence, most of the available biological data remain practically unexplored today. In this thesis, we address these problems jointly, by first introducing an ontology-based data integration solution in order to mitigate the data source heterogeneity problem. We illustrate, through the concrete example of Bgee, a gene expression data source, how relational databases can be exposed as virtual Resource Description Framework (RDF) graphs through relational-to-RDF mappings. This has the important advantage that the original data source can remain unmodified, while still becoming interoperable with external RDF sources. We complement our methods with applied case studies designed to guide domain experts in formulating expressive federated queries targeting the integrated data across the domains of evolutionary relationships and gene expression. More precisely, we introduce two comparative analyses, first within the same domain (using orthology data from multiple, interoperable data sources) and second across domains, in order to study the relation between expression change and evolution rate following a duplication event. Finally, in order to bridge the semantic gap between users and data, we design and implement Bio-SODA, a question answering system over domain knowledge graphs that does not require training data for translating user questions to SPARQL. Bio-SODA uses a novel ranking approach that combines syntactic and semantic similarity, while also incorporating node centrality metrics, to rank candidate matches for a given user question. Our results from testing Bio-SODA across several real-world databases spanning multiple domains (both within and outside bioinformatics) show that it can answer complex, multi-fact queries beyond the current state of the art in the more well-studied open-domain question answering.
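As a rough sketch of the relational-to-RDF idea mentioned above, rows of a relational table can be exposed as virtual RDF triples without modifying the source database; the table, columns, and vocabulary here are invented, and real deployments would typically use declarative mappings such as R2RML rather than code like this.

    # A rough sketch of the relational-to-RDF idea: rows of a relational table are
    # exposed as (virtual) RDF triples without modifying the source database. The
    # table, columns, and vocabulary are invented here; real deployments typically
    # use declarative mappings (e.g. R2RML) rather than hand-written code.
    EX = "http://example.org/"

    def rows_to_triples(rows):
        """Map rows of a hypothetical gene_expression(gene_id, anat_entity, score) table."""
        for gene_id, anat_entity, score in rows:
            subject = f"{EX}expression/{gene_id}/{anat_entity}"
            yield (subject, f"{EX}gene", f"{EX}gene/{gene_id}")
            yield (subject, f"{EX}anatEntity", f"{EX}anat/{anat_entity}")
            yield (subject, f"{EX}score", str(score))

    if __name__ == "__main__":
        sample_rows = [("ENSG00000139618", "UBERON_0000310", 99.2)]
        for triple in rows_to_triples(sample_rows):
            print(triple)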
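The ranking approach described in the abstract can be illustrated with a simplified sketch that combines syntactic similarity, semantic similarity, and node centrality into a single candidate score; this is not Bio-SODA's implementation, and the weights and similarity functions are placeholders.

    # A simplified sketch of ranking candidate knowledge-graph matches for a user
    # question by combining syntactic similarity, semantic similarity, and node
    # centrality, as the abstract describes. This is not Bio-SODA's implementation;
    # the weights and the similarity functions are placeholders.
    from difflib import SequenceMatcher

    def syntactic_sim(a, b):
        """String-level similarity; a real system would use its own measure."""
        return SequenceMatcher(None, a.lower(), b.lower()).ratio()

    def rank_candidates(question_term, candidates, semantic_sim, centrality,
                        weights=(0.4, 0.4, 0.2)):
        """Score = w1*syntactic + w2*semantic + w3*centrality, highest first."""
        w_syn, w_sem, w_cen = weights
        scored = []
        for node in candidates:
            score = (w_syn * syntactic_sim(question_term, node)
                     + w_sem * semantic_sim(question_term, node)
                     + w_cen * centrality.get(node, 0.0))
            scored.append((score, node))
        return sorted(scored, reverse=True)

    if __name__ == "__main__":
        centrality = {"Gene": 0.9, "GeneName": 0.3}                     # e.g. PageRank-style scores
        semantic = lambda q, n: 1.0 if q.lower() in n.lower() else 0.5  # placeholder similarity
        print(rank_candidates("gene", ["Gene", "GeneName"], semantic, centrality))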

    Biomedical semantic question and answering system

    Master's thesis in Informatics, Universidade de Lisboa, Faculdade de CiĂȘncias, 2017. Question Answering (QA) systems are excellent tools for obtaining simple answers, in several formats, in an equally simple way, being of great use in the area of Information Retrieval, for answering questions from the online community, and also for research or information-mining purposes. The health domain has benefited greatly from these advances, aided by the progress of technology and of the tools it provides, which can be used in this area, resulting in its ongoing computerisation. These systems have great potential, since they access large structured and unstructured datasets, for example the Web or large information repositories derived from it, in order to obtain their answers, and, in the case of community question answering, online question-and-answer forums organised in threads by topic. Unstructured data pose a greater challenge, although structured data to some extent limit the range of transformations that can be applied to them. Making such datasets publicly available in digital form offers greater freedom to the public, and more specifically to researchers in the areas involved with these data, allowing them to be easily shared among the various interested parties. In general, such systems are not available for public reuse, because they are limited to the research field, to proving concepts of specific algorithms; they are difficult for a wider audience to reuse; or they are difficult to maintain, since they can quickly become outdated, especially with respect to the technologies used, which may cease to be supported. The goal of this thesis is to develop a system that addresses some of these shortcomings, promoting modularity between modules, a balance between implementation effort and ease of use, and sub-module performance, with as few prerequisites as possible, resulting in a baseline QA system adapted to a knowledge domain. Such a system is built from individually proven subsystems. In this thesis, several types of systems are described, such as information-retrieval-based and knowledge-based ones, with a focus on two specific systems in this area, YodaQA and OAQA. Several useful tools used in many of these systems are also presented, relying on Text Classification techniques that range from natural language processing, tokenization and part-of-speech tagging, to machine learning with supervised and unsupervised algorithms, textual similarity (pattern matching) and semantic similarity. In general, these techniques make it possible to obtain additional information about supplied text fragments. Several tools that use the described techniques are also covered: some for annotation, others for semantic similarity, and yet others for organising, ranking and searching large amounts of information in a scalable way, all of which are useful and used in this kind of application. Some of the main datasets are also described and discussed.
The framework developed resulted in two systems with a modular pipeline architecture, composed of distinct modules according to the task performed. These modules have well-defined input parameters and outputs. The first system took as input a set of question-and-answer threads, with answers given as comments, and returned each set of ten comments to a question ranked, with a score reflecting how useful each comment was as an answer. This system is called MoRS and was the modular proof of concept for the final system to be developed. The second system takes as input a variety of questions from the biomedical domain, restricted to four question types, and returns the corresponding answers, accompanied by the metadata used in the analysis of each question. Several variations of this system were built, in order to assess whether the development choices were correct, always using the same framework (MoQA) and culminating in the system named MoQABio. The main modules that make up these systems include, in order of use, a module for entity recognition (including biomedical entities), using one of the tools surveyed in the related work chapter. There is also a module called Combiner, in which each document retrieved from the output of the previous module is assigned the results of several metrics; these are used in the next module to train machine learning algorithms and generate a recognition model based on these cases. Once this model is trained, it can be used as a classifier of good and bad articles. The models were mostly generated with Support Vector Machines, with Multi-layer Perceptron also available as an option. Metadata is then extracted from the approved articles in order to build the rest of the answer, which includes the concepts, the document references, and the main sentences of those documents. In the final system's Combiner module, the evaluations range from the pattern matching already mentioned, with measures such as the number of entities shared between the question and the article, to semantic similarity using metrics provided by the authors of the Sematch library, including similarity between DBpedia concepts and entities and other standard semantic similarity measures such as Resnik or Wu-Palmer. Other metrics include the article length, a similarity measure between two sentences, and the article's time in milliseconds. Although two systems were developed, it is the variations built from MoQA that have as prerequisites datasets from several sources, among them the question training and test files and the PubMed repository, which contains numerous scientific articles in the biomedical area and from which all the information used in the answers is drawn. Besides these local sources, there is OPENphacts, an external source, which provides information about the biomedical expressions detected in the first module. Once the systems descended from MoQA are ready, users can interact with them through a web application: after entering the type of answer they want and the question they want answered, the question is processed by the system and the answer, together with its metadata, is returned to the web application.
By inspecting the metadata, the original information can be accessed. WS4A, developed by the ULisboa team, participated in BioASQ 2016; MoRS, developed by the author, participated in SemEval 2017 Task 3; and finally MoQA, by the same author, had its performance evaluated with the same data and metrics as WS4A. While BioASQ addressed the performance of a Question Answering system in the biomedical area, SemEval addressed a system for ranking comments with respect to a given question, with the submitted systems officially evaluated using measures such as precision, recall and F-measure. To compare the impact of the features and tools used in each of the machine learning models built, the models were compared with one another, as was the percentage improvement between the systems developed over time. Besides the official evaluations, there were also local evaluations that allowed the progression of the systems over time to be explored further, including the three systems developed from MoQA. This work presents a system that, despite using state-of-the-art techniques with some adaptations, achieved a relevant performance improvement over its predecessor and results comparable to the best of that year's competition whose data it used, and thus has great potential to achieve better results. Some of its contributions date back to February 2016, with WS4A [86], which participated in BioASQ 2016; the next step was MoRS [85], which in turn participated in SemEval 2017; and finally MoQA, with major improvements and publicly available at https://github.com/lasigeBioTM/MoQA. As future work, suggestions are proposed, starting with improving the robustness of the system, further exploiting the metadata to better direct the search for answers, adding and exploring new features in the model, and constantly renewing the tools used. The incorporation of new metrics provided by Sematch and the improvement of the formulation of the queries made to the system are also measures to keep in mind, since performance and the response time to a question need to be weighed against each other.

Question Answering systems have been of great use and interest in our times. They are great tools for acquiring simple answers in a simple way, being of great utility in the area of information retrieval, and also for community question answering. Such systems have great potential, since they access large sets of data, for example from the Web, to acquire their answers, and, in the case of community question answering, forums. Such systems are not available for public reuse because they are limited to research purposes or are proof-of-concept systems for specific algorithms, with researchers re-implementing the same or very similar modules over and over again, thus not providing the wider public with a tool that could serve their purposes. When such systems are made available, they are cumbersome to install or configure, which includes reading the documentation and depends on the researchers' programming ability. In this thesis, the two best systems available in these situations, YodaQA and OAQA, are described. A description of the main modules is given, with some sub-problems and hypothetical solutions also described. Many systems and algorithms (i.e. learning, ranking) are also described.
This work presents a modular system, MoQA (available at https://github.com/lasigeBioTM/MoQA), that solves some of these problems by providing a framework that comes with a baseline QA system for general-purpose local inquiry, but which is highly modular, built from individually proven subsystems and using known tools such as Sematch, machine learning algorithms, and Stanford Named Entity Recognition. It is a descendant of WS4A [86] and MoRS [85], which took part in BioASQ 2016 (with recognition) and SemEval 2017, respectively. Its purpose is to achieve performance as high as possible while keeping the prerequisites minimal and preserving the ability to edit and change modules according to the users' wishes and research purposes, while providing an easy platform through which the final user may use the framework. MoQA had three variants, which were compared with each other, with MoQABio obtaining the best results among them by using different tools than the other systems and focusing on biomedical domain knowledge.
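As an illustration of the Combiner-style step described in the abstract (a hypothetical sketch, not MoQA's code; the feature set and the toy data are invented, and scikit-learn is assumed), a question/article pair can be reduced to a small feature vector and fed to an SVM classifier:

    # An illustrative sketch of a Combiner-style step, not MoQA's actual code:
    # a (question, article) pair becomes a small feature vector (shared entities,
    # article length, a similarity score) used to train an SVM that separates
    # useful from non-useful articles. The toy data is fabricated; scikit-learn
    # is assumed to be installed.
    from sklearn.svm import SVC

    def features(question_entities, article_entities, article_length, similarity):
        """Small feature vector for one question/article pair."""
        shared = len(question_entities & article_entities)
        return [shared, article_length, similarity]

    # Toy training data: feature vectors and labels (1 = useful article, 0 = not).
    X = [
        features({"BRCA2", "cancer"}, {"BRCA2", "cancer", "repair"}, 250, 0.8),
        features({"BRCA2", "cancer"}, {"influenza"}, 900, 0.1),
    ]
    y = [1, 0]

    model = SVC(kernel="linear")
    model.fit(X, y)
    print(model.predict([features({"BRCA2"}, {"BRCA2", "tumour"}, 300, 0.7)]))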

    AH 2003 : workshop on adaptive hypermedia and adaptive web-based systems


