
    Towards Dynamic Composition of Question Answering Pipelines

    Question answering (QA) over knowledge graphs has gained significant momentum over the past five years due to the increasing availability of large knowledge graphs and the rising importance of question answering for user interaction. DBpedia has been the most prominently used knowledge graph in this setting. QA systems implement a pipeline connecting a sequence of QA components that translates an input question into a corresponding formal query (e.g. SPARQL); this query is then executed over a knowledge graph to produce the answer to the question. Recent empirical studies have revealed that, albeit effective overall, the performance of QA systems and QA components depends heavily on the features of the input question, and that not even the combination of the best-performing QA systems or individual QA components retrieves complete and correct answers. Furthermore, these QA systems cannot be easily reused or extended, and their results cannot be easily reproduced, since the systems are mostly implemented in a monolithic fashion, lack standardised interfaces, and are often neither open source nor available as Web services. All these drawbacks of the state of the art prevent many of these approaches from being employed in real-world applications. In this thesis, we tackle the problem of QA over knowledge graphs and propose a generic approach to promote reusability and build question answering systems in a collaborative effort. Firstly, we define the qa vocabulary and the Qanary methodology to provide an abstraction level over existing QA systems and components. Qanary relies on the qa vocabulary to establish guidelines for semantically describing the knowledge exchanged between the components of a QA system. We implement a component-based modular framework called the "Qanary Ecosystem", which applies the Qanary methodology to integrate several heterogeneous QA components in a single platform. We further present the Qaestro framework, which semantically describes question answering components and effectively enumerates QA pipelines based on a QA developer's requirements. Qaestro provides all valid combinations of the available QA components, respecting the input-output requirements of each component, to build QA pipelines. Finally, we address the scalability of QA components within a framework and propose a novel approach that chooses the best component per task to automatically build a QA pipeline for each input question. We implement this model within FRANKENSTEIN, a framework able to select QA components and compose pipelines. FRANKENSTEIN extends the Qanary Ecosystem and utilises the qa vocabulary for data exchange. It includes 29 independent QA components implementing five QA tasks, resulting in 360 unique QA pipelines. Each approach proposed in this thesis (the Qanary methodology, Qaestro, and FRANKENSTEIN) is supported by an extensive evaluation demonstrating its effectiveness. Our contributions target the broader research agenda of offering the QA community an efficient way of applying their research to a field that draws on many different disciplines and consequently requires a collaborative approach to achieve significant progress in the domain of question answering.
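    The pipeline-enumeration idea described here can be illustrated with a minimal sketch: components are annotated with the QA task they implement and with input/output types, and every ordered combination whose interfaces line up is a valid pipeline. The component names, task labels, and type strings below are hypothetical placeholders, not the actual Qaestro vocabulary.

```python
from itertools import product

# Hypothetical component registry: task -> [(name, input_type, output_type)].
# Qaestro describes components semantically; this sketch keeps only the
# input/output requirement that makes a pipeline valid.
COMPONENTS = {
    "NER":      [("TagMe", "text", "entities"), ("DBpediaSpotlight", "text", "entities")],
    "RelEx":    [("RelMatch", "entities", "relations")],
    "QueryGen": [("SPARQLGen", "relations", "sparql")],
}

TASK_ORDER = ["NER", "RelEx", "QueryGen"]

def valid_pipelines():
    """Yield every component combination whose output type feeds the next input."""
    for combo in product(*(COMPONENTS[t] for t in TASK_ORDER)):
        if all(combo[i][2] == combo[i + 1][1] for i in range(len(combo) - 1)):
            yield [name for name, _, _ in combo]

for pipeline in valid_pipelines():
    print(" -> ".join(pipeline))
```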

    A semantic and agent-based approach to support information retrieval, interoperability and multi-lateral viewpoints for heterogeneous environmental databases

    Data stored in individual autonomous databases often needs to be combined and interrelated. For example, in the Inland Water (IW) environment monitoring domain, the spatial and temporal variation of measurements of different water quality indicators stored in different databases is of interest. Data from multiple data sources is more complex to combine when there is a lack of metadata in a computational form and when the syntax and semantics of the stored data models are heterogeneous. The main types of information retrieval (IR) requirements are query transparency and data harmonisation for data interoperability, and support for multiple user views. A combined Semantic Web based and agent based distributed system framework has been developed to support the above IR requirements. It has been implemented using the Jena ontology and JADE agent toolkits. The semantic part supports the interoperability of autonomous data sources by merging their intensional data, using a Global-As-View (GAV) approach, into a global semantic model represented in DAML+OIL and in OWL. This is used to mediate between different local database views. The agent part provides the semantic services to import, align and parse semantic metadata instances, to support data mediation and to reason about data mappings during alignment. The framework has been applied to support information retrieval, interoperability and multi-lateral viewpoints for four European environmental agency databases. An extended GAV approach has been developed and applied to handle queries that can be reformulated over multiple user views of the stored data. This allows users to retrieve data in a conceptualisation that is better suited to them, rather than having to understand the entire detailed global view conceptualisation. User viewpoints are derived from the global ontology or from existing viewpoints of it. This has the advantage of reducing the number of potential conceptualisations and their associated mappings to a more computationally manageable set. Whereas an ad hoc framework based upon a conventional distributed programming language and a rule framework could be used to support user views and adaptation to them, a more formal framework has the benefit that it can support reasoning about consistency, equivalence, containment and conflict resolution when traversing data models. A preliminary formulation of the formal model has been undertaken; it is based upon extending a Datalog-type algebra with hierarchical, attribute and instance value operators. These operators can be applied to support compositional mapping and consistency checking of data views. The multiple viewpoint system was implemented as a Java-based application consisting of two sub-systems, one for viewpoint adaptation and management, the other for query processing and query result adjustment.
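    The Global-As-View idea at the core of this framework can be pictured with a short sketch: each global concept is defined as a view over the local sources, so a query over the global model unfolds into a union of source queries. All table, column, and concept names below are invented for illustration.

```python
# Minimal Global-As-View (GAV) sketch: every global concept is defined as a
# view (here, a SQL fragment) over each local database. Names are invented.
GAV_MAPPINGS = {
    "WaterQualityMeasurement": {
        "agency_a": "SELECT site, date, value FROM wq_samples",
        "agency_b": "SELECT station AS site, sampled_on AS date, reading AS value FROM measurements",
    },
}

def unfold(global_concept: str) -> str:
    """Rewrite a query over a global concept as a UNION over the local views."""
    views = GAV_MAPPINGS[global_concept]
    return "\nUNION ALL\n".join(views.values())

print(unfold("WaterQualityMeasurement"))
```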

    Why reinvent the wheel: Let's build question answering systems together

    Modern question answering (QA) systems need to flexibly integrate a number of components, each specialised to fulfil a specific task in a QA pipeline. Key QA tasks include Named Entity Recognition and Disambiguation, Relation Extraction, and Query Building. Since a number of different software components exist that implement different strategies for each of these tasks, it is a major challenge to select and combine the most suitable components into a QA system, given the characteristics of a question. We study this optimisation problem and train classifiers which take the features of a question as input and aim to optimise the selection of QA components based on those features. We then devise a greedy algorithm to identify the pipelines that include the most suitable components and can effectively answer the given question. We implement this model within Frankenstein, a QA framework able to select QA components and compose QA pipelines. We evaluate the effectiveness of the pipelines generated by Frankenstein using the QALD and LC-QuAD benchmarks. The results suggest not only that Frankenstein precisely solves the QA optimisation problem, but also that it enables the automatic composition of optimised QA pipelines, which outperform the static baseline QA pipeline. Thanks to this flexible and fully automated pipeline generation process, new QA components can easily be included in Frankenstein, thus improving the performance of the generated pipelines.
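    The selection step described here can be sketched as follows: a per-component classifier scores each candidate on the features of the incoming question, and a greedy pass picks the top-scoring component for every task. The scoring function, feature, and component names below are illustrative stand-ins, not Frankenstein's actual trained classifiers.

```python
# Greedy pipeline composition sketch: per QA task, pick the component whose
# (hypothetical) classifier predicts the best performance for this question.

def predict_score(component: str, question_features: dict) -> float:
    # Stand-in for a trained classifier; e.g. favour TagMe on short questions.
    toy_scores = {
        ("TagMe", True): 0.9, ("TagMe", False): 0.6,
        ("DBpediaSpotlight", True): 0.7, ("DBpediaSpotlight", False): 0.8,
    }
    return toy_scores.get((component, question_features["is_short"]), 0.5)

TASKS = {
    "NER": ["TagMe", "DBpediaSpotlight"],
    "RelEx": ["RelMatch", "RelLinker"],
}

def greedy_pipeline(question_features: dict) -> dict:
    """Select the best-scoring component for each task independently."""
    return {
        task: max(components, key=lambda c: predict_score(c, question_features))
        for task, components in TASKS.items()
    }

print(greedy_pipeline({"is_short": True}))
```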

    Coordination of DWH Long-Term Data Management: The Path Forward Workshop Report

    Following the 2010 Deepwater Horizon (DWH) oil spill, a vast amount of environmental data was collected (e.g., 100,000+ environmental samples, 15 million+ publicly available records). The volume of data collected introduced a number of challenges, including data quality assurance, data storage, data integration, and long-term preservation and availability of the data. An effort to tackle these challenges began in June 2014, at a workshop focused on environmental disaster data management (EDDM) with respect to response and subsequent restoration. The earlier EDDM collaboration improved communication and collaboration among a range of government, industry and NGO entities involved in disaster management. In June 2017, the first DWH Long-Term Data Management (LTDM) workshop focused on reviewing existing data management systems, opportunities to advance the integration of these systems, the availability of data for restoration planning, project implementation and restoration monitoring efforts, and providing a platform for increased communication among the various Gulf of Mexico (GOM) data entities. The June 2017 workshop resulted in the formation of three working groups: Data Management Standards, Interoperability, and Discovery/Searchability. These working groups spent 2018 coordinating and addressing various complex topics related to DWH LTDM. On December 4th and 5th, 2018, the Coastal Response Research Center (CRRC), the NOAA Office of Response and Restoration (ORR) and the NOAA National Marine Fisheries Service (NMFS) Restoration Center (RC) co-sponsored a workshop entitled "Deepwater Horizon Oil Spill (DWH) Long-Term Data Management (LTDM): The Path Forward" at the NOAA GOM Disaster Response Center (DRC) in Mobile, AL.

    Implementing OBDA for an end-user query answering service on an educational ontology

    In an age where the productivity of society is no longer defined by the amount of information generated, but by the quality and assertiveness that a set of data may hold, asking the right questions depends on the semantic awareness that an information system can develop. To address this challenge, exhaustive research has been carried out over the last decade on the Ontology Based Data Access (OBDA) paradigm. A conspectus of the most promising technologies with data integration capabilities, and the foundations on which they rely, is documented in this report as a point of reference for choosing tools that support the incorporation of a conceptual model under an OBDA method. The present study provides a practical approach for implementing an ontology based data access service for users of a Learning Analytics initiative in an educational context, allowing them to formulate intuitive enquiries in familiar domain terminology on top of a Learning Management System. The ontology used was completely transformed to semantic linked data standards, and some data mappings for testing were included. The Semantic Linked Data technologies presented in this document may bring modernisation to environments in which object oriented and relational paradigms propagate heterogeneous and contradictory requirements. Finally, to validate the implementation, a set of queries was constructed emulating the most relevant dynamics of the model with respect to the nature of the dataset.
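    The kind of end-user enquiry such a service supports can be pictured as a SPARQL query over the educational ontology; in a real OBDA setup the triples would be virtual, produced by mappings over the Learning Management System database rather than materialised. The vocabulary below (ex:Student, ex:enrolledIn, ex:courseName) is invented for illustration, using the rdflib library.

```python
from rdflib import Graph, Literal, Namespace, RDF

# Invented educational vocabulary; a tiny in-memory graph stands in for the
# data an OBDA layer would expose virtually over the LMS database.
EX = Namespace("http://example.org/edu#")
g = Graph()
g.add((EX.alice, RDF.type, EX.Student))
g.add((EX.alice, EX.enrolledIn, EX.semantics101))
g.add((EX.semantics101, EX.courseName, Literal("Introduction to Semantics")))

# The intuitive, domain-terminology enquiry: which courses is each student in?
query = """
PREFIX ex: <http://example.org/edu#>
SELECT ?student ?name WHERE {
    ?student a ex:Student ;
             ex:enrolledIn ?course .
    ?course ex:courseName ?name .
}
"""
for student, name in g.query(query):
    print(student, name)
```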

    Ontologies for the Interoperability of Heterogeneous Multi-Agent Systems in the scope of Energy and Power Systems

    The electricity sector, traditionally run by monopolies and powerful utilities, has undergone significant changes in recent decades. The most notable developments are the increased penetration of renewable energy sources (RES) and of distributed generation, which have led to the adoption of the smart grid (SG) paradigm and to the introduction of competitive approaches in wholesale, and some retail, electricity markets (EMs). SGs rapidly evolved from a widely accepted concept into reality. The intermittency of renewable energy sources and their large-scale integration pose new constraints and challenges that strongly affect the operation of EMs. The challenging environment of power and energy systems (PES) reinforces the need to study, experiment with and validate competitive, dynamic and complex operations and interactions. In this context, simulation, decision support and intelligent management tools become essential for studying the different market mechanisms and the relationships between the actors involved. To this end, the new generation of tools must be able to cope with the rapid evolution of PES, providing participants with adequate means to adapt, and addressing new models and constraints and their complex relationship with technological and business developments. Multi-agent platforms are particularly well suited to analysing complex interactions in dynamic systems such as PES, owing to their distributed and independent nature. The decomposition of complex tasks into simple assignments, and the easy inclusion of new data and business models, constraints, types of actors and operators, and their interactions, are some of the main advantages of agent-based approaches. In this domain, several modelling tools have emerged to simulate, study and solve problems in specific PES subdomains. However, there is a widespread limitation: a significant lack of interoperability between heterogeneous systems, which prevents the problem from being addressed globally, considering all the relevant interrelationships. This is essential if players are to take full advantage of evolving opportunities. Therefore, achieving such a comprehensive framework by leveraging existing tools that allow the study of specific parts of the global problem requires interoperability between these systems. Ontologies facilitate interoperability between heterogeneous systems by giving semantic meaning to the information exchanged between the various parties. The advantage lies in the fact that everyone involved in a particular domain knows, understands and agrees with the conceptualisation defined there. Several proposals exist in the literature for the use of ontologies within PES, encouraging their reuse and extension. However, most of these ontologies focus on a specific application scenario or on a high-level abstraction of a PES subdomain. Moreover, there is considerable heterogeneity among these models, which complicates their integration and adoption.
    It is essential to develop ontologies that represent distinct sources of knowledge in order to facilitate interactions between entities of different natures, promoting interoperability between heterogeneous agent-based systems that solve specific PES problems. These gaps motivate the research work of this PhD, which sets out to provide a solution to the interoperability of heterogeneous systems within PES. The various contributions of this work result in a society of multi-agent systems (MAS) for the simulation, study, decision support, operation and intelligent management of PES. This society of MAS addresses PES from the wholesale EM down to the SG and consumer energy efficiency, leveraging existing simulation and decision-support tools, complemented by newly developed ones, and ensuring interoperability between them. It uses ontologies to represent knowledge in a common vocabulary, which facilitates interoperability between the various systems. Furthermore, the use of ontologies and semantic web technologies enables the development of model-agnostic tools that adapt flexibly to new rules and constraints, promoting semantic reasoning for context-aware systems.
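    How a shared ontology lets two heterogeneous tools exchange information can be sketched as a serialise/parse round trip: both sides commit to the same vocabulary, so the receiver interprets the sender's data without knowing its internal model. The namespace and terms below are invented for illustration, using the rdflib library; they are not the thesis's actual ontologies.

```python
from rdflib import Graph, Literal, Namespace, RDF
from rdflib.namespace import XSD

# Invented common vocabulary both systems agree on (the role the thesis's
# ontologies play); any tool that knows these terms can read the message.
PES = Namespace("http://example.org/pes#")

# Sender (e.g. a market-simulation agent) describes a bid in the shared terms.
out = Graph()
out.add((PES.bid42, RDF.type, PES.MarketBid))
out.add((PES.bid42, PES.priceEURPerMWh, Literal(35.5, datatype=XSD.decimal)))
message = out.serialize(format="turtle")

# Receiver (e.g. a grid-management agent) parses it with no shared code,
# only the shared vocabulary.
incoming = Graph()
incoming.parse(data=message, format="turtle")
for bid in incoming.subjects(RDF.type, PES.MarketBid):
    print(bid, incoming.value(bid, PES.priceEURPerMWh))
```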