23 research outputs found
QAnswer - Enhanced Entity Matching for Question Answering over Linked Data
Abstract. QAnswer is a question answering system that uses DBpedia as its knowledge base and converts natural language questions into SPARQL queries. To improve the matching of entities and relations against natural language text, we use Wikipedia to extract lexicalizations of DBpedia entities and then match them with the question. These entities are validated against the ontology, while missing ones can be inferred. The proposed system was tested in the QALD-5 challenge and obtained an F1 score of 0.30, which placed QAnswer second in the challenge, even though the system used only a small subset of the properties in DBpedia because of the long extraction process.
Answering Count Questions with Structured Answers from Text
In this work we address the challenging case of answering count queries in web search, such as "number of songs by John Lennon". Prior methods merely answer these with a single, and sometimes puzzling, number, or return a ranked list of text snippets with different numbers. This paper proposes a methodology for answering count queries with inference, contextualization and explanatory evidence. Unlike previous systems, our method infers final answers from multiple observations, supports semantic qualifiers for the counts, and provides evidence by enumerating representative instances. Experiments with a wide variety of queries, including existing benchmarks, show the benefits of our method and the influence of specific parameter settings. Our code, data and an interactive system demonstration are publicly available at https://github.com/ghoshs/CoQEx and https://nlcounqer.mpi-inf.mpg.de/
Systematic review of question answering over knowledge bases
Over the years, a growing number of semantic data repositories have been made available on the web. However, this has created new challenges in exploiting these resources efficiently. Querying services require knowledge beyond the typical user's expertise, which is a critical issue in adopting semantic information solutions. Several proposals to overcome this difficulty have suggested using question answering (QA) systems to provide user-friendly interfaces and allow natural language use. Because question answering over knowledge bases (KBQA) is a very active research topic, a comprehensive view of the field is essential. The purpose of this study was to conduct a systematic review of KBQA methods and systems to identify their main advantages and limitations. The inclusion criteria were English full-text articles published since 2015 on methods and systems for KBQA.
Easing the questioning of semantic biomedical data
Researchers have been using semantic technologies as essential tools to structure knowledge. This is particularly relevant in the biomedical domain, where large datasets are continuously generated. Semantic technologies offer the ability to describe data and to map and link distributed repositories, creating a network where the search interface is a single entry point. However, the increasing number of publicly available semantic data repositories is creating new challenges for their exploration. Despite being both human- and machine-readable, these technologies remain challenging for end users. Querying services usually require mastering formal languages, and that knowledge is beyond the typical user's expertise, which is a critical issue in adopting semantic web information systems. In particular, the questioning of biomedical data presents specific challenges for which there are still no mature proposals for production environments. This paper presents a solution to query biomedical semantic databases using natural language. The system is at the intersection between semantic parsing and the use of templates. It makes it possible to extract information in a friendly way for users who are not experts in semantic queries.
FCT - Portuguese Foundation for Science and Technology supports Arnaldo Pereira (Ph.D. Grant PD/BD/142877/2018).
Entities with quantities: extraction, search, and ranking
Quantities are more than numeric values. They denote measures of the world's entities such as heights of buildings, running times of athletes, energy efficiency of car models or energy production of power plants, all expressed in numbers with associated units. Entity-centric search and question answering (QA) are well supported by modern search engines. However, they do not work well when the queries involve quantity filters, such as searching for athletes who ran 200m under 20 seconds or companies with quarterly revenue above $2 billion. State-of-the-art systems fail to understand the quantities, including the condition (less than, above, etc.), the unit of interest (seconds, dollar, etc.), and the context of the quantity (200m race, quarterly revenue, etc.). QA systems based on structured knowledge bases (KBs) also fail, as quantities are poorly covered by state-of-the-art KBs. In this dissertation, we developed new methods to advance the state of the art on quantity knowledge extraction and search. Our main contributions are the following:
• First, we present Qsearch [Ho et al., 2019, Ho et al., 2020], a system that can handle advanced queries with quantity filters by using cues present in both the query and the text sources. Qsearch comprises two main contributions. The first is a deep neural network model designed for extracting quantity-centric tuples from text sources. The second is a novel query-matching model for finding and ranking matching tuples.
• Second, to incorporate heterogeneous tables into the process, we introduce QuTE [Ho et al., 2021a, Ho et al., 2021b], a system for extracting quantity information from web sources, in particular ad-hoc web tables in HTML pages. The contribution of QuTE includes a method for linking quantity and entity columns that draws on external text sources. For question answering, we contextualize the extracted entity-quantity pairs with informative cues from the table and present a new method for consolidating and better ranking answer candidates through inter-fact consistency.
• Third, we present QL [Ho et al., 2022], a recall-oriented method for enriching knowledge bases (KBs) with quantitative facts. Modern KBs such as Wikidata or YAGO cover many entities and their relevant information, but often miss important quantitative properties. QL is query-driven and based on iterative learning, with two main contributions to improve KB coverage. The first is a method for query expansion that captures a larger pool of fact candidates. The second is a technique for self-consistency that considers the value distributions of quantities.
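The three parts of a quantity filter named in this abstract (condition, value, unit) can be illustrated with a toy sketch. This is not the dissertation's method, which relies on neural extraction and learned matching; it is a minimal regex stand-in. The names `QuantityFilter`, `parse_filter` and `apply_filter`, the comparator vocabulary, and the sample facts are all hypothetical, and the context of the quantity (for example, the 200m race) is ignored.

```python
import operator
import re
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

@dataclass
class QuantityFilter:
    compare: Callable[[float, float], bool]  # the condition, e.g. "under" -> <
    value: float                             # the numeric threshold
    unit: str                                # the unit of interest

# Hypothetical, deliberately small comparator vocabulary.
COMPARATORS = {
    "under": operator.lt,
    "less than": operator.lt,
    "above": operator.gt,
    "over": operator.gt,
    "more than": operator.gt,
}

FILTER_RE = re.compile(
    r"(?P<cmp>under|less than|above|over|more than)\s+"
    r"(?P<value>\d+(?:\.\d+)?)\s*(?P<unit>[a-z]+)",
    re.IGNORECASE,
)

def parse_filter(query: str) -> Optional[QuantityFilter]:
    """Extract the first comparator/value/unit triple from a query, if any."""
    m = FILTER_RE.search(query)
    if m is None:
        return None
    return QuantityFilter(
        compare=COMPARATORS[m.group("cmp").lower()],
        value=float(m.group("value")),
        unit=m.group("unit").lower(),
    )

def apply_filter(facts: List[Tuple[str, float, str]], flt: QuantityFilter) -> List[str]:
    """Keep entities whose fact has the same unit and satisfies the condition."""
    return [
        entity
        for entity, value, unit in facts
        if unit == flt.unit and flt.compare(value, flt.value)
    ]

# Made-up (entity, value, unit) facts, for illustration only.
facts = [
    ("athlete_a", 19.8, "seconds"),
    ("athlete_b", 20.4, "seconds"),
    ("athlete_c", 19.5, "minutes"),
]
flt = parse_filter("athletes who ran 200m under 20 seconds")
matches = apply_filter(facts, flt)
```

Exact unit matching is the weakest point of this sketch: it performs no unit normalization or conversion, so "seconds" and "s" would be treated as different units.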
Multilingual SPARQL Query Generation Using Lexico-Syntactic Patterns
ABSTRACT: The continuous work on the Semantic Web and its related technologies over the past few decades has led to large amounts of publicly available data and better ways to access it. To bridge the gap between human users and large knowledge bases designed for machines, such as DBpedia, various QA systems have been developed. These systems aim to answer users' questions as accurately as possible with as little effort as possible from the user. However, not all systems allow full natural language questions, and many impose additional restrictions on the user's input. In addition, monolingual systems are much more prevalent in the field, with English being widely used while other languages lag behind. The objective of this work is to propose a multilingual QA system able to take full natural language questions and retrieve information from a knowledge base. This is done by automatically transforming the user's question into a SPARQL query that is sent to DBpedia. This work relies, among other aspects, on a set of lexico-syntactic patterns that exploit language-specific syntax to generate more accurate queries.
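The idea of turning a question into a SPARQL query through language-specific patterns can be sketched as follows. This is a simplified illustration, not the system described in the abstract: real lexico-syntactic patterns typically operate on part-of-speech tags or parse trees, whereas this sketch uses plain regular expressions over the surface string. The pattern set, the relation lexicon, the choice of `dbo:mayor` and `dbo:author`, and the function name `question_to_sparql` are all assumptions made for the example.

```python
import re
from typing import Optional

# Hypothetical surface patterns, one per language. Each captures a relation
# word and an entity label.
PATTERNS = {
    "en": re.compile(r"^who is the (?P<rel>\w+) of (?P<ent>.+?)\s*\??$", re.IGNORECASE),
    "fr": re.compile(r"^qui est le (?P<rel>\w+) de (?P<ent>.+?)\s*\??$", re.IGNORECASE),
}

# Hypothetical lexicon mapping relation words to assumed ontology properties.
RELATIONS = {
    "mayor": "dbo:mayor",
    "maire": "dbo:mayor",
    "author": "dbo:author",
    "auteur": "dbo:author",
}

TEMPLATE = (
    "PREFIX dbo: <http://dbpedia.org/ontology/>\n"
    "PREFIX dbr: <http://dbpedia.org/resource/>\n"
    "SELECT ?answer WHERE {{ dbr:{entity} {prop} ?answer . }}"
)

def question_to_sparql(question: str, lang: str) -> Optional[str]:
    """Return a SPARQL query string, or None if no pattern or relation matches."""
    match = PATTERNS[lang].match(question.strip())
    if match is None:
        return None
    prop = RELATIONS.get(match.group("rel").lower())
    if prop is None:
        return None
    # Naive entity linking: treat the label as a resource name.
    entity = match.group("ent").strip().replace(" ", "_")
    return TEMPLATE.format(entity=entity, prop=prop)

query_en = question_to_sparql("Who is the mayor of Berlin?", "en")
query_fr = question_to_sparql("Qui est le maire de Paris ?", "fr")
```

Both questions map to the same triple shape, which is the point of keeping one query template and varying only the per-language patterns and lexicon. The sketch only builds the query string; it does not send it to an endpoint.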