9 research outputs found

    QAKiS @ QALD-2

    We present QAKiS, a system for question answering over linked data (in particular, DBpedia). Question interpretation is addressed as the automatic identification of the set of relevant relations between entities in the natural language input question, matched against a repository of automatically collected relational patterns (i.e. the WikiFramework repository). Such patterns represent possible lexicalizations of ontological relations and are associated with a SPARQL query derived from the linked data relational patterns. Wikipedia is used as the source of free text for the automatic extraction of the relational patterns, and DBpedia as the linked data resource that provides relational patterns and is queried through a natural language interface.
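
The pattern-matching idea in this abstract can be illustrated with a minimal sketch. It is not the QAKiS implementation: the two regular-expression patterns, the function name `question_to_sparql`, and the naive entity linking are assumptions made for illustration, and the prefix declarations for `dbr:` and `dbo:` are omitted.

```python
import re

# Hypothetical pattern repository: each lexicalization maps to a DBpedia property.
# These entries are illustrative and are not taken from WikiFramework.
PATTERNS = {
    r"who (?:is|was) the (?:wife|husband|spouse) of (?P<entity>.+?)\??$": "dbo:spouse",
    r"where was (?P<entity>.+?) born\??$": "dbo:birthPlace",
}


def question_to_sparql(question):
    """Return a SPARQL query string for the first matching pattern, else None."""
    text = question.strip().lower()
    for pattern, prop in PATTERNS.items():
        match = re.match(pattern, text)
        if match:
            # Naive entity linking (an assumption): capitalize each word of the
            # matched span and join with underscores to form a resource name.
            words = match.group("entity").split()
            resource = "_".join(word.capitalize() for word in words)
            return f"SELECT ?answer WHERE {{ dbr:{resource} {prop} ?answer }}"
    return None
```

A real system would replace the hand-written patterns with ones collected automatically from text and would resolve entities against the knowledge base rather than by string manipulation.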

    Evaluating question answering over linked data

    Lopez V, Unger C, Cimiano P, Motta E. Evaluating question answering over linked data. Web Semantics: Science, Services and Agents on the World Wide Web. 2013;21:3-13. The availability of large amounts of open, distributed, and structured semantic data on the web has no precedent in the history of computer science. In recent years, there have been important advances in semantic search and question answering over RDF data. In particular, natural language interfaces to online semantic data have the advantage that they can exploit the expressive power of Semantic Web data models and query languages, while at the same time hiding their complexity from the user. However, despite the increasing interest in this area, there have so far been no systematic evaluations of systems of this kind, in contrast to traditional question answering and search interfaces to document spaces. To address this gap, we have set up a series of evaluation challenges for question answering over linked data. The main goal of the challenges was to gain insight into the strengths, capabilities, and current shortcomings of question answering systems as interfaces for querying linked data sources, as well as to benchmark how these interaction paradigms can deal with the fact that the amount of RDF data available on the web is very large and heterogeneous with respect to the vocabularies and schemas used. Here, we report on the results of the first and second of these evaluation campaigns. We also discuss how the second evaluation addressed some of the issues and limitations that arose from the first one, as well as the open issues to be addressed in future competitions. (C) 2013 Elsevier B.V. All rights reserved.

    Survey on Challenges of Question Answering in the Semantic Web

    Höffner K, Walter S, Marx E, Usbeck R, Lehmann J, Ngomo A-CN. Survey on Challenges of Question Answering in the Semantic Web. Semantic Web Journal. 2017;8(6):895-920

    Multilingual SPARQL Query Generation Using Lexico-Syntactic Patterns

    The continuous work on the Semantic Web and its related technologies over the past few decades has led to large amounts of publicly available data in the form of knowledge bases. However, accessing this data requires the SPARQL query language, which not all users have mastered. To bridge the gap between human users and large knowledge bases designed for machines, such as DBpedia, various question answering (QA) systems have been developed. These systems aim to answer users' questions as accurately as possible with as little effort as possible from the user. However, many of these systems do not allow full natural language questions and impose specific restrictions on the format of the user's input. In addition, monolingual systems, most often for English, are much more prevalent than multilingual systems, which perform less well. The objective of this work is to propose a multilingual QA system that takes full natural language questions and retrieves the answer from a knowledge base. This is done by automatically transforming the user's question into a SPARQL query that is sent to DBpedia. The query generation relies, among other aspects, on a set of lexico-syntactic patterns that exploit the language-specific syntax of each language to generate more accurate queries.

    "Who Wants to Be a Millionaire?" à la Greek

    This work describes in detail the techniques used to build a virtual player for the popular TV game "Who Wants to Be a Millionaire?" and is based on the corresponding article [1], in which the virtual player was implemented for the English and Italian versions of the game. This work also attempts to apply the various techniques described in that article. The virtual player for the Greek version of the game was implemented in the Java programming language and is presented in detail. The virtual player must answer a series of multiple-choice questions posed in natural language by selecting the correct answer among four different choices. If it is not sure about an answer, it can use the lifelines or quit the game. The architecture of the virtual player consists of 1) a Question Answering (QA) module, which uses the Google search engine to retrieve the passages of text most relevant to identifying the correct answer to a question, 2) an Answer Scoring (AS) module, which assigns a score to each candidate answer according to different criteria based on the passages of text retrieved by the QA module, and 3) a Decision Making (DM) module, which chooses the strategy for playing the game according to specific rules as well as the scores assigned to the candidate answers. Finally, this work evaluates both the accuracy of the virtual player in answering the questions of the game and its ability to play real games in order to earn money. The experiments were conducted with questions taken from the Greek version of the board game. In general, the average accuracy of the virtual player is significantly better than the performance of human players. Regarding the ability to play real games, which involves defining a proper strategy for using lifelines and for deciding whether to answer a question even under uncertainty or to retire from the game with the money earned so far, the virtual player wins on average more money than the average amount earned by human players.
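
The interplay between the Answer Scoring and Decision Making modules described in this abstract can be sketched as follows. This is an illustrative assumption, not the thesis implementation: the passages are taken as given (the retrieval step is omitted), the scoring criterion is a simple mention frequency, and the function names and threshold values are hypothetical.

```python
def score_candidates(candidates, passages):
    """Score each candidate by the fraction of passages that mention it."""
    if not passages:
        return {candidate: 0.0 for candidate in candidates}
    lowered = [passage.lower() for passage in passages]
    return {
        candidate: sum(candidate.lower() in passage for passage in lowered) / len(lowered)
        for candidate in candidates
    }


def decide(scores, lifelines_left, answer_threshold=0.5, margin=0.2):
    """Choose to answer, use a lifeline, or quit (thresholds are assumptions)."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    best, best_score = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else 0.0
    # Answer only when the top candidate is both strong and clearly ahead.
    if best_score >= answer_threshold and best_score - runner_up >= margin:
        return ("answer", best)
    if lifelines_left > 0:
        return ("lifeline", None)
    return ("quit", None)
```

The rule-based decision keeps the two concerns separate: scoring can be swapped for richer criteria without changing how the strategy is chosen.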

    Semantic Interpretation of User Queries for Question Answering on Interlinked Data

    The Web of Data contains a wealth of knowledge belonging to a large number of domains, but retrieving data from these interlinked knowledge bases remains difficult. By taking the structure of the data into account, the upcoming generation of search engines is expected to move toward question answering systems, which answer user questions directly. Developing a question answering system over these interlinked data sources is still challenging because of two inherent characteristics. First, different datasets employ heterogeneous schemas, and each one may contain only a part of the answer to a given question. Second, constructing a federated formal query across different datasets requires exploiting links between these datasets on both the schema and instance levels. In this respect, several challenges arise, such as resource disambiguation, vocabulary mismatch, inference, and link traversal. In this dissertation, we address these challenges in order to build a question answering system for Linked Data. We present our question answering system Sina, which transforms user-supplied queries (i.e. either natural language queries or keyword queries) into conjunctive SPARQL queries over a set of interlinked data sources. The contributions of this work are as follows: 1. A novel approach for determining the most suitable resources for a user-supplied query from different datasets (disambiguation approach). We employed a Hidden Markov Model, whose parameters were bootstrapped with different distribution functions. 2. A novel method for constructing federated formal queries using the disambiguated resources and leveraging the linking structure of the underlying datasets. This approach essentially relies on a combination of domain and range inference as well as a link traversal method for constructing a connected graph, which ultimately renders a corresponding SPARQL query. 3. Regarding the problem of vocabulary mismatch, our contribution is divided into two parts. First, we introduce a number of new query expansion features based on semantic and linguistic inferencing over Linked Data. We evaluate the effectiveness of each feature individually as well as in combination, employing Support Vector Machines and Decision Trees. Second, we propose a novel method for automatic query expansion, which employs a Hidden Markov Model to obtain the optimal tuples of derived words. 4. We provide two benchmarks for two different tasks to the community of question answering systems. The first one is used for the task of question answering on interlinked datasets (i.e. federated queries over Linked Data). The second one is used for the vocabulary mismatch task. We evaluate the accuracy of our approach using measures such as mean reciprocal rank, precision, recall, and F-measure on three interlinked life-science datasets as well as DBpedia. The results of our accuracy evaluation demonstrate the effectiveness of our approach. Moreover, we study the runtime of our approach in its sequential as well as parallel implementations and draw conclusions on the scalability of our approach on Linked Data.
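
The evaluation measures named in this abstract have standard definitions, sketched below for reference. The function names and the choice to return 0.0 for empty inputs are assumptions of this sketch; the dissertation's own evaluation code is not shown in the abstract.

```python
def precision_recall_f1(predicted, gold):
    """Set-based precision, recall, and balanced F-measure for one question."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def mean_reciprocal_rank(ranked_lists, gold_sets):
    """Average of 1/rank of the first correct item per query (0 if none is correct)."""
    if not ranked_lists:
        return 0.0
    total = 0.0
    for ranked, gold in zip(ranked_lists, gold_sets):
        for rank, item in enumerate(ranked, start=1):
            if item in gold:
                total += 1.0 / rank
                break
    return total / len(ranked_lists)
```

Precision, recall, and F-measure apply to unordered answer sets, while mean reciprocal rank applies when the system returns a ranked list, as in resource disambiguation.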

    Multimodal Legal Information Retrieval

    The goal of this thesis is to present a multifaceted way of inducing semantic representations from legal documents as well as accessing information in a precise and timely manner. The thesis explores approaches for semantic information retrieval (IR) in the legal context with a technique that maps specific parts of a text to the relevant concept. This technique relies on text segments: it uses Latent Dirichlet Allocation (LDA), a topic modeling algorithm, to perform text segmentation, expands the concept using Natural Language Processing techniques, and then associates the text segments with the concepts using a semi-supervised text similarity technique. This solves two problems, namely user specificity in formulating queries and information overload, because querying a large document collection with a set of concepts is more fine-grained: specific information, rather than full documents, is retrieved. The second part of the thesis describes our Neural Network Relevance Model for E-Discovery Information Retrieval. Our algorithm is essentially a feature-rich ensemble system in which different component neural networks extract different relevance signals. This model has been trained and evaluated on the TREC Legal track 2010 data. The performance of our models across the board shows that they capture the semantics of, and the relatedness between, query and document, which is important in the legal information retrieval domain.