
    Towards Dynamic Composition of Question Answering Pipelines

    Question answering (QA) over knowledge graphs has gained significant momentum over the past five years due to the increasing availability of large knowledge graphs and the rising importance of question answering for user interaction. DBpedia has been the most prominently used knowledge graph in this setting. QA systems implement a pipeline connecting a sequence of QA components that translates an input question into a corresponding formal query (e.g. SPARQL); this query is executed over a knowledge graph to produce the answer to the question. Recent empirical studies have revealed that, albeit effective overall, the performance of QA systems and QA components depends heavily on the features of input questions, and not even the combination of the best-performing QA systems or individual QA components retrieves complete and correct answers. Furthermore, these QA systems cannot easily be reused or extended, and their results cannot easily be reproduced, since the systems are mostly implemented in a monolithic fashion, lack standardised interfaces, and are often not open source or available as Web services. These drawbacks of the state of the art prevent many of these approaches from being employed in real-world applications. In this thesis, we tackle the problem of QA over knowledge graphs and propose a generic approach to promote reusability and to build question answering systems in a collaborative effort. Firstly, we define the qa vocabulary and the Qanary methodology to introduce an abstraction level over existing QA systems and components. Qanary relies on the qa vocabulary to establish guidelines for semantically describing the knowledge exchanged between the components of a QA system. We implement a component-based modular framework called "Qanary Ecosystem" that utilises the Qanary methodology to integrate several heterogeneous QA components in a single platform.
We further present the Qaestro framework, which semantically describes question answering components and effectively enumerates QA pipelines based on a QA developer's requirements. Qaestro provides all valid combinations of available QA components, respecting the input and output requirements of each component, to build QA pipelines. Finally, we address the scalability of QA components within a framework and propose a novel approach that chooses the best component per task to automatically build a QA pipeline for each input question. We implement this model within FRANKENSTEIN, a framework able to select QA components and compose pipelines. FRANKENSTEIN extends the Qanary Ecosystem and utilises the qa vocabulary for data exchange. It comprises 29 independent QA components implementing five QA tasks, resulting in 360 unique QA pipelines. Each approach proposed in this thesis (the Qanary methodology, Qaestro, and FRANKENSTEIN) is supported by extensive evaluations demonstrating its effectiveness. Our contributions target the broader research agenda of offering the QA community an efficient way of applying their research to a field that draws on many different disciplines and consequently requires a collaborative approach to achieve significant progress in question answering.
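
The enumeration of valid pipelines from input/output requirements can be illustrated with a small sketch: each component declares the annotations it consumes and produces, and an ordering is valid only if every component's inputs are already available when it runs. The component names and their specifications below are illustrative, not Qaestro's actual catalog.

```python
from itertools import permutations

# Each component declares the annotations it consumes ("in") and
# produces ("out"). These components and specs are illustrative only.
COMPONENTS = {
    "NER":            {"in": {"question"}, "out": {"entities"}},
    "RelationLinker": {"in": {"question"}, "out": {"relations"}},
    "QueryBuilder":   {"in": {"entities", "relations"}, "out": {"sparql"}},
}

def valid_pipelines(components, start=("question",), goal="sparql"):
    """Enumerate component orderings whose input requirements are
    satisfied step by step and that end up producing the goal."""
    pipelines = []
    for order in permutations(components):
        available = set(start)
        feasible = True
        for name in order:
            spec = components[name]
            if not spec["in"] <= available:   # some required input missing
                feasible = False
                break
            available |= spec["out"]
        if feasible and goal in available:
            pipelines.append(order)
    return pipelines
```

With the toy catalog above, only the orderings that place the query builder after both recognisers survive, which is exactly the kind of pruning the abstract describes.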

    Why reinvent the wheel: Let's build question answering systems together

    Modern question answering (QA) systems need to flexibly integrate a number of components specialised to fulfil specific tasks in a QA pipeline. Key QA tasks include Named Entity Recognition and Disambiguation, Relation Extraction, and Query Building. Since a number of different software components exist that implement different strategies for each of these tasks, it is a major challenge to select and combine the most suitable components into a QA system, given the characteristics of a question. We study this optimisation problem and train classifiers, which take features of a question as input and aim to optimise the selection of QA components based on those features. We then devise a greedy algorithm to identify pipelines that include the most suitable components and can effectively answer the given question. We implement this model within Frankenstein, a QA framework able to select QA components and compose QA pipelines. We evaluate the effectiveness of the pipelines generated by Frankenstein using the QALD and LC-QuAD benchmarks. The results suggest not only that Frankenstein precisely solves the QA optimisation problem but also that it enables the automatic composition of optimised QA pipelines, which outperform the static baseline QA pipeline. Thanks to this flexible and fully automated pipeline generation process, new QA components can easily be included in Frankenstein, thus improving the performance of the generated pipelines.
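
A minimal sketch of the greedy selection idea: assuming a classifier has already predicted a per-question performance score for every component, the pipeline simply takes the top-scoring component for each task. All task names, component names, and scores below are made-up placeholders, not Frankenstein's trained predictions.

```python
# Hypothetical per-question scores as trained classifiers might predict
# them; task names, component names, and values are illustrative.
scores = {
    "NED": {"DBpediaSpotlight": 0.71, "TagMe": 0.64},
    "RE":  {"ReMatch": 0.66, "RelMatch": 0.58},
    "QB":  {"NLIWOD-QB": 0.61, "SINA": 0.49},
}

def greedy_pipeline(task_order, predicted_scores):
    """For each QA task, greedily pick the component with the highest
    predicted performance for the current question."""
    return [
        (task, max(predicted_scores[task], key=predicted_scores[task].get))
        for task in task_order
    ]
```

Because the scores are recomputed per question, two questions with different features can yield entirely different pipelines, which is what distinguishes this from a static baseline pipeline.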

    Improvements to GeoQA, a Question Answering system for Geospatial Questions

    We study the recently proposed question answering system GeoQA, the first template-based question answering system for linked geospatial data. We improve this system by exploiting the schema information of the knowledge bases it uses, adding templates for more complex questions, and improving the natural language processing module so that it recognises the templates' patterns. This work is also an attempt to collect, study, and compare other question answering systems such as QUINT, TEMPO, and NEQA, as well as the Qanary methodology and the Frankenstein framework for question answering systems.
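
Template-based interpretation of a geospatial question can be sketched as follows. The regular expression, namespaces, and SPARQL template are invented for illustration and do not reproduce GeoQA's actual templates; the GeoSPARQL function `geof:sfWithin` is the standard topological "within" test.

```python
import re

# One illustrative template: a question pattern paired with a SPARQL
# skeleton. Real systems of this kind maintain a catalog of such pairs.
TEMPLATE = (
    r"which (?P<type>\w+) are (within|in) (?P<place>[\w ]+)\??",
    """SELECT ?f WHERE {{
  ?f rdf:type :{type} ;
     geo:hasGeometry/geo:asWKT ?w .
  :{place} geo:hasGeometry/geo:asWKT ?pw .
  FILTER(geof:sfWithin(?w, ?pw))
}}""",
)

def question_to_sparql(question):
    """Match the question against the template's pattern and, on
    success, instantiate the SPARQL skeleton with the captured slots."""
    pattern, skeleton = TEMPLATE
    m = re.match(pattern, question.lower())
    if m is None:
        return None
    return skeleton.format(type=m.group("type"),
                           place=m.group("place").replace(" ", "_"))
```

Exploiting the knowledge base's schema, as the abstract describes, would additionally validate that the captured `type` actually exists as a class before emitting the query.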

    Survey on Challenges of Question Answering in the Semantic Web

    Höffner K, Walter S, Marx E, Usbeck R, Lehmann J, Ngomo A-CN. Survey on Challenges of Question Answering in the Semantic Web. Semantic Web Journal. 2017;8(6):895-920

    Spatial and Temporal information in the Semantic Web

    The main purpose of this thesis is the enhancement of the Semantic Web with geospatial and temporal information by extending the YAGO knowledge graph with such information. It consists of three parts. The first part covers the conversion of OpenStreetMap data into RDF triples. OpenStreetMap is a collaborative project building a freely licensed, editable map of the whole world; its data are useful and necessary for many applications, so providing them in RDF form is of significant importance. The second part concerns the conversion of big geospatial data into RDF triples: an ETL utility is extended to work on top of Spark, which parallelises the conversion and significantly reduces the execution time. The third part extends the YAGO knowledge base with temporal and geospatial information about the former administrative divisions of Greece.
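
The first part's conversion can be pictured with a toy mapping from an OpenStreetMap-style tagged node to RDF triples. The namespace and predicate names below are placeholders, not the schema the thesis actually uses.

```python
# Illustrative namespace; a real conversion would use an established
# OSM-to-RDF vocabulary rather than this placeholder.
OSM = "http://example.org/osm/"

def node_to_triples(node):
    """Map an OSM-style node (dict with 'id', 'lat', 'lon', 'tags')
    to a list of (subject, predicate, object) triple strings."""
    s = f"<{OSM}node/{node['id']}>"
    triples = [
        (s, f"<{OSM}lat>", f'"{node["lat"]}"'),
        (s, f"<{OSM}lon>", f'"{node["lon"]}"'),
    ]
    # Every key=value tag on the node becomes one more triple.
    for key, value in node["tags"].items():
        triples.append((s, f"<{OSM}{key}>", f'"{value}"'))
    return triples
```

Because each node is converted independently, the mapping is trivially data-parallel, which is why distributing it over Spark partitions, as the second part describes, cuts the execution time.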

    Knowledge extraction from unstructured data

    Data availability is becoming more essential, considering the current growth of web-based data. The data available on the web are represented as unstructured, semi-structured, or structured data. In order to make web-based data available for Natural Language Processing and Data Mining tasks, the data need to be presented in a machine-readable, structured format. Thus, techniques are needed for capturing knowledge from unstructured data sources. Research communities address this problem with knowledge extraction methods: methods that are able to capture the knowledge in a natural language text and map the extracted knowledge to existing knowledge in knowledge graphs (KGs). These knowledge extraction methods include Named Entity Recognition, Named Entity Disambiguation, Relation Recognition, and Relation Linking. This thesis addresses the problem of extracting knowledge from unstructured data and discovering patterns in the extracted knowledge. We devise a rule-based approach for entity and relation recognition and linking. The defined approach effectively maps entities and relations within a text to their resources in a target KG. Additionally, it overcomes the challenges of recognizing and linking entities and relations to a specific KG by employing devised catalogs of linguistic and domain-specific rules that state the criteria for recognizing entities in a sentence of a particular language, together with a deductive database that encodes knowledge in community-maintained KGs. Moreover, we define a neuro-symbolic approach for knowledge extraction in encyclopedic and domain-specific domains; it combines symbolic and sub-symbolic components to overcome the challenges of entity recognition and linking and the limited availability of training data, while maintaining the accuracy of recognizing and linking entities.
    Additionally, we present a context-aware framework for unveiling semantically related posts in a corpus; it is a knowledge-driven framework that retrieves associated posts effectively. We cast the problem of unveiling semantically related posts in a corpus as the Vertex Coloring Problem. We evaluate the performance of our techniques on several benchmarks related to various domains for knowledge extraction tasks. Furthermore, we apply these methods in real-world scenarios from national and international projects. The outcomes show that our techniques are able to effectively extract knowledge encoded in unstructured data and discover patterns over the extracted knowledge presented as machine-readable data. More importantly, the evaluation results provide evidence of the effectiveness of combining the reasoning capacity of symbolic frameworks with the pattern recognition and classification power of sub-symbolic models.
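
The reduction to vertex coloring can be sketched with a greedy colorer: if an edge connects two posts judged semantically unrelated, then posts that end up sharing a color form a mutually compatible (related) group. This graph construction and the greedy strategy are an illustrative reading of the abstract, not the framework's exact algorithm.

```python
def greedy_coloring(vertices, edges):
    """Greedy vertex coloring: assign each vertex the smallest color
    not already used by one of its neighbours. Adjacent vertices
    (here: semantically unrelated posts) never share a color."""
    adjacent = {v: set() for v in vertices}
    for u, v in edges:
        adjacent[u].add(v)
        adjacent[v].add(u)
    color = {}
    for v in vertices:
        used = {color[n] for n in adjacent[v] if n in color}
        c = 0
        while c in used:
            c += 1
        color[v] = c
    return color
```

Greedy coloring is not optimal in general, but it runs in time linear in the number of edges, which matters when the corpus of posts is large.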

    Qanary - the fast track to creating a question answering system with linked data technology

    Question answering (QA) systems focus on making sense of data via an easy-to-use interface. However, these systems are very complex and tightly integrate many technologies. Previously presented QA systems are mostly singular and monolithic implementations, so their reusability is limited. In contrast, we follow the research agenda of establishing an ecosystem for components of QA systems, which will enable the QA community to improve the reusability of such components and to intensify their research activities. In this paper, we present a reference implementation of the Qanary methodology for creating QA systems. Qanary relies on linked data vocabularies and provides a fast track to integrating QA components into a lightweight, message-driven, component-oriented architecture.
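
The message-driven, component-oriented idea can be sketched as follows. In the actual Qanary architecture, components exchange RDF annotations through a shared triplestore using the qa vocabulary; in this toy a plain dictionary stands in for that shared message, and the recognizer logic is invented.

```python
class NERComponent:
    """A Qanary-style component: it reads the shared message, adds its
    own annotations, and passes the message on unchanged otherwise."""

    def annotate(self, message):
        question = message["question"]
        # Toy recognizer: treat capitalized words as entity candidates.
        entities = [w for w in question.split() if w[:1].isupper()]
        message.setdefault("annotations", []).append(
            {"component": "NERComponent", "entities": entities}
        )
        return message

def run_pipeline(components, question):
    """Thread one shared message through a sequence of components."""
    message = {"question": question}
    for component in components:
        message = component.annotate(message)
    return message
```

Because every component only reads and appends annotations on the shared message, components can be swapped or reordered without changing each other's code, which is the reusability property the ecosystem is after.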