321 research outputs found
Semantic Interpretation of User Queries for Question Answering on Interlinked Data
The Web of Data contains a wealth of knowledge belonging to a large number of domains. Retrieving data from these interlinked knowledge bases remains difficult. By taking the structure of data into account, the upcoming generation of search engines is expected to move towards question answering systems, which directly answer user questions. But developing a question answering system over these interlinked data sources is still challenging because of two inherent characteristics. First, different datasets employ heterogeneous schemas, and each one may contain only a part of the answer to a given question. Second, constructing a federated formal query across different datasets requires exploiting links between these datasets on both the schema and instance levels. In this respect, several challenges arise, such as resource disambiguation, vocabulary mismatch, inference, and link traversal. In this dissertation, we address these challenges in order to build a question answering system for Linked Data. We present our question answering system Sina, which transforms user-supplied queries (i.e. either natural language queries or keyword queries) into conjunctive SPARQL queries over a set of interlinked data sources. The contributions of this work are as follows: 1. A novel approach for determining the most suitable resources for a user-supplied query from different datasets (disambiguation approach). We employed a Hidden Markov Model, whose parameters were bootstrapped with different distribution functions. 2. A novel method for constructing federated formal queries using the disambiguated resources and leveraging the linking structure of the underlying datasets. This approach essentially relies on a combination of domain and range inference as well as a link traversal method for constructing a connected graph, which ultimately renders a corresponding SPARQL query. 3. Regarding the problem of vocabulary mismatch, our contribution is divided into two parts. First, we introduce a number of new query expansion features based on semantic and linguistic inferencing over Linked Data. We evaluate the effectiveness of each feature individually as well as their combinations, employing Support Vector Machines and Decision Trees. Second, we propose a novel method for automatic query expansion, which employs a Hidden Markov Model to obtain the optimal tuples of derived words. 4. We provide two benchmarks for two different tasks to the community of question answering systems. The first one is used for the task of question answering on interlinked datasets (i.e. federated queries over Linked Data). The second one is used for the vocabulary mismatch task. We evaluate the accuracy of our approach using measures such as mean reciprocal rank, precision, recall, and F-measure on three interlinked life-science datasets as well as DBpedia. The results of our accuracy evaluation demonstrate the effectiveness of our approach. Moreover, we study the runtime of our approach in its sequential as well as parallel implementations and draw conclusions on the scalability of our approach on Linked Data.
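The evaluation measures named in the abstract above (mean reciprocal rank, precision, recall, F-measure) can be sketched as follows. This is a minimal illustration using the standard textbook definitions, not the dissertation's evaluation code; the function names and the input shapes (a ranked list plus a set of relevant answers per query) are assumptions made for the example.

```python
def reciprocal_rank(ranked, relevant):
    """Return 1/rank of the first relevant item, or 0.0 if none is found."""
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs):
    """Average the reciprocal rank over (ranked_list, relevant_set) pairs."""
    if not runs:
        return 0.0
    return sum(reciprocal_rank(r, rel) for r, rel in runs) / len(runs)


def precision_recall_f1(returned, gold):
    """Set-based precision, recall, and balanced F-measure (F1)."""
    returned, gold = set(returned), set(gold)
    true_positives = len(returned & gold)
    precision = true_positives / len(returned) if returned else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

Reciprocal rank rewards placing the first correct resource near the top of a ranking, which suits a disambiguation step, while precision and recall compare the full answer set returned by a query with a gold standard.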
Energy-Based Sliced Wasserstein Distance
The sliced Wasserstein (SW) distance has been widely recognized as a
statistically effective and computationally efficient metric between two
probability measures. A key component of the SW distance is the slicing
distribution. There are two existing approaches for choosing this distribution.
The first approach is using a fixed prior distribution. The second approach is
optimizing for the best distribution which belongs to a parametric family of
distributions and can maximize the expected distance. However, both approaches
have their limitations. A fixed prior distribution is non-informative in terms
of highlighting projecting directions that can discriminate two general
probability measures. Optimizing for the best distribution is often
expensive and unstable. Moreover, the parametric family of the
candidate distribution can easily be misspecified. To address the issues, we
propose to design the slicing distribution as an energy-based distribution that
is parameter-free and has the density proportional to an energy function of the
projected one-dimensional Wasserstein distance. We then derive a novel sliced
Wasserstein metric, the energy-based sliced Wasserstein (EBSW) distance, and
investigate its topological, statistical, and computational properties via
importance sampling, sampling importance resampling, and Markov Chain methods.
Finally, we conduct experiments on point-cloud gradient flow, color transfer,
and point-cloud reconstruction to show the favorable performance of the EBSW.
Comment: 36 pages, 7 figures, 6 table
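The idea of a slicing distribution whose density is proportional to an energy function of the projected one-dimensional Wasserstein distance can be sketched with a self-normalized importance-sampling estimate. This is an illustrative sketch, not the authors' implementation: it assumes two point clouds of equal size with uniform weights, uniform random directions as the proposal, and the exponential as the energy function. The function names `sliced_distances`, `sw_uniform`, and `ebsw_importance_sampling` are hypothetical.

```python
import numpy as np


def sliced_distances(X, Y, n_projections=64, p=2, seed=0):
    """Projected 1-D W_p^p for random unit directions.

    X and Y are (n, d) arrays with the same n and uniform weights, so each
    1-D distance reduces to comparing sorted projections.
    """
    rng = np.random.default_rng(seed)
    theta = rng.normal(size=(n_projections, X.shape[1]))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)  # uniform on the sphere
    x_proj = np.sort(X @ theta.T, axis=0)  # shape (n, n_projections)
    y_proj = np.sort(Y @ theta.T, axis=0)
    return np.mean(np.abs(x_proj - y_proj) ** p, axis=0)


def sw_uniform(X, Y, n_projections=64, p=2, seed=0):
    """Sliced Wasserstein estimate with a fixed (uniform) slicing distribution."""
    w = sliced_distances(X, Y, n_projections, p, seed)
    return np.mean(w) ** (1.0 / p)


def ebsw_importance_sampling(X, Y, n_projections=64, p=2, seed=0):
    """Energy-weighted estimate: directions are weighted by exp(W_p^p)."""
    w = sliced_distances(X, Y, n_projections, p, seed)
    weights = np.exp(w - w.max())  # assumed energy function, shifted for stability
    weights /= weights.sum()       # self-normalized importance weights
    return np.sum(weights * w) ** (1.0 / p)
```

With the same set of directions, the energy-weighted estimate is never smaller than the uniform one, because the weights increase with the projected distance and so emphasize the more discriminative directions.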
Towards Analytics Aware Ontology Based Access to Static and Streaming Data (Extended Version)
Real-time analytics that requires integration and aggregation of
heterogeneous and distributed streaming and static data is a typical task in
many industrial scenarios such as diagnostics of turbines at Siemens. The
ontology-based data access (OBDA) approach has great potential to facilitate such tasks; however, it has a
number of limitations in dealing with analytics that restrict its use in
important industrial applications. Based on our experience with Siemens, we
argue that in order to overcome those limitations OBDA should be extended and
become analytics, source, and cost aware. In this work we propose such an
extension. In particular, we propose an ontology, mapping, and query language
for OBDA, where aggregate and other analytical functions are first class
citizens. Moreover, we develop query optimisation techniques that allow
efficient processing of analytical tasks over static and streaming data. We
implement our approach in a system and evaluate our system with Siemens turbine
data
Left Bit Right: For SPARQL Join Queries with OPTIONAL Patterns (Left-outer-joins)
SPARQL basic graph pattern (BGP) (a.k.a. SQL inner-join) query optimization
is a well researched area. However, optimization of OPTIONAL pattern queries
(a.k.a. SQL left-outer-joins) poses additional challenges, due to the
restrictions on the reordering of left-outer-joins. Such queries tend to
make up as much as 50% of all queries (e.g., in DBpedia
query logs).
In this paper, we present Left Bit Right (LBR), a technique for
well-designed nested BGP and OPTIONAL pattern queries. Through LBR, we
propose a novel method to represent such queries using a graph of
supernodes, which is used to aggressively prune the RDF triples with
the help of compressed indexes. We also propose novel optimization strategies
(the first of their kind, to the best of our knowledge) that combine the
characteristics of acyclicity of queries, minimality, and the
nullification and best-match operators. In this paper, we focus
on OPTIONAL patterns without UNIONs or FILTERs, but we also show how UNIONs and
FILTERs can be handled with our technique using a query rewrite. Our
evaluation on RDF graphs of up to and over one billion triples, on a commodity
laptop with 8 GB memory, shows that LBR can process well-designed
low-selectivity complex queries up to 11 times faster than
state-of-the-art RDF column-stores such as Virtuoso and MonetDB, and for highly
selective queries, LBR is on par with them.
Comment: SIGMOD 201
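The correspondence between an OPTIONAL pattern and a left-outer-join can be illustrated on sets of variable bindings. The sketch below is a naive nested-loop join that shows only the semantics; it is not the LBR technique and uses no supernode graph or compressed indexes. The function names and the example bindings are hypothetical.

```python
def compatible(a, b):
    """Two solution mappings are compatible if they agree on shared variables."""
    return all(b[k] == v for k, v in a.items() if k in b)


def left_outer_join(left, right):
    """Keep every left mapping; extend it with each compatible right mapping.

    A left mapping with no compatible partner is kept unextended, which is
    how OPTIONAL leaves variables unbound.
    """
    results = []
    for l_map in left:
        matches = [{**l_map, **r_map} for r_map in right if compatible(l_map, r_map)]
        results.extend(matches if matches else [dict(l_map)])
    return results


# Bindings for a required pattern (?person) and an optional one (?person, ?email).
people = [{"person": "alice"}, {"person": "bob"}]
emails = [{"person": "alice", "email": "alice@example.org"}]
joined = left_outer_join(people, emails)
```

Here `joined` keeps `bob` without an `email` binding. Swapping the arguments would drop `bob` entirely, a small reminder that left-outer-joins cannot be reordered as freely as inner joins.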
Torpedo: Improving the State-of-the-Art RDF Dataset Slicing
Over the last few years, the amount of data published as Linked Data on the Web has grown enormously. In spite of the high availability of Linked Data, organizations still encounter an accessibility challenge while consuming it. This is mostly due to the large size of some of the datasets published as Linked Data. The core observation behind this work is that a subset of these datasets suffices to address the needs of most organizations. In this paper, we introduce Torpedo, an approach for efficiently selecting and extracting relevant subsets from RDF datasets. In particular, Torpedo adds optimization techniques to reduce the cost of seek operations, as well as support for multi-join graph patterns and SPARQL FILTERs, which enable more granular data selection. We compare the performance of our approach with existing solutions on nine different queries against four datasets. Our results show that our approach is highly scalable and is up to 26% faster than the current state-of-the-art RDF dataset slicing approach.
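Selecting a relevant subset of an RDF dataset with a triple pattern and a FILTER-like condition can be sketched over an in-memory list of triples. This is a minimal illustration of what slicing means, not Torpedo's implementation: it performs a plain linear scan with none of the seek optimizations or multi-join support described above. The function name `slice_triples`, the `None` wildcard convention, and the `ex:` data are assumptions made for the example.

```python
def slice_triples(triples, pattern, keep=None):
    """Yield triples matching a (subject, predicate, object) pattern.

    None in the pattern acts as a wildcard; `keep` is an optional predicate
    that plays the role of a FILTER on the matched triple.
    """
    s, p, o = pattern
    for triple in triples:
        if s is not None and triple[0] != s:
            continue
        if p is not None and triple[1] != p:
            continue
        if o is not None and triple[2] != o:
            continue
        if keep is None or keep(triple):
            yield triple


dataset = [
    ("ex:alice", "ex:age", 34),
    ("ex:bob", "ex:age", 27),
    ("ex:alice", "ex:knows", "ex:bob"),
]

# Keep only age triples whose value is at least 30.
adults = list(slice_triples(dataset, (None, "ex:age", None), keep=lambda t: t[2] >= 30))
```

The filter is applied only after the pattern matches, so the predicate never sees triples of the wrong shape, such as the `ex:knows` triple with a string object.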
Deep Learning-Powered Computational Intelligence for Cyber-Attacks Detection and Mitigation in 5G-Enabled Electric Vehicle Charging Station
An electric vehicle charging station (EVCS) infrastructure is the backbone of transportation electrification. However, the EVCS has various cyber-attack vulnerabilities in software, hardware, supply chain, and incumbent legacy technologies such as network, communication, and control. Therefore, proactively monitoring, detecting, and defending against these attacks is very important. The state-of-the-art approaches are not agile and intelligent enough to detect, mitigate, and defend against various cyber-physical attacks in the EVCS system. To overcome these limitations, this dissertation primarily designs, develops, implements, and tests data-driven deep learning-powered computational intelligence to detect and mitigate cyber-physical attacks at the network and physical layers of 5G-enabled EVCS infrastructure. Also, the 5G slicing application to ensure the security and service level agreement (SLA) in the EVCS ecosystem has been studied. Various cyber-attacks such as distributed denial of service (DDoS), false data injection (FDI), advanced persistent threats (APT), and ransomware attacks on the network in a standalone 5G-enabled EVCS environment have been considered. Mathematical models for the mentioned cyber-attacks have been developed. The impact of cyber-attacks on the EVCS operation has been analyzed. Various deep learning-powered intrusion detection systems have been proposed to detect attacks using local electrical and network fingerprints. Furthermore, a novel detection framework has been designed and developed to deal with ransomware threats in high-speed, high-dimensional, multimodal data and assets from eccentric stakeholders of the connected automated vehicle (CAV) ecosystem. To mitigate the adverse effects of cyber-attacks on EVCS controllers, novel data-driven digital clones based on Twin Delayed Deep Deterministic Policy Gradient (TD3) Deep Reinforcement Learning (DRL) have been developed.
Also, various brute-force and controller-clone-based methods have been devised and tested to aid the defense against, and mitigation of, the impact of attacks on the EVCS operation. The performance of the proposed mitigation method has been compared with that of a benchmark Deep Deterministic Policy Gradient (DDPG)-based digital clones approach. Simulation results obtained from the Python, MATLAB/Simulink, and NetSim software demonstrate that the cyber-attacks are disruptive and detrimental to the operation of EVCS. The proposed detection and mitigation methods are effective and perform better than the conventional and benchmark techniques for the 5G-enabled EVCS.
Reachability-based Trajectory Design with Neural Implicit Safety Constraints
Generating safe motion plans in real-time is a key requirement for deploying
robot manipulators to assist humans in collaborative settings. In particular,
robots must satisfy strict safety requirements to avoid self-damage or harming
nearby humans. Satisfying these requirements is particularly challenging if the
robot must also operate in real time to adjust to changes in its
environment. This paper addresses these challenges by proposing
Reachability-based Signed Distance Functions (RDFs) as a neural implicit
representation for robot safety. RDF, which can be constructed using supervised
learning in a tractable fashion, accurately predicts the distance between the
swept volume of a robot arm and an obstacle. RDF's inference and gradient
computations are fast and scale linearly with the dimension of the system;
these features enable its use within a novel real-time trajectory planning
framework as a continuous-time collision-avoidance constraint. The planning
method using RDF is compared to a variety of state-of-the-art techniques and is
demonstrated to successfully solve challenging motion planning tasks for
high-dimensional systems faster and more reliably than all tested methods
Facilitating the Exploitation of Linked Open Statistical Data: JSON-QB API Requirements and Design Criteria
Recently, many organizations have opened up their data for others to reuse. A major part of these data concerns statistics such as demographic and social indicators. Linked Data is a promising paradigm for opening data because it facilitates data integration on the Web. Recently, a growing number of organizations have adopted the Linked Data paradigm and provided Linked Open Statistical Data (LOSD). These data can be exploited to create added-value services and applications that require integrated data from multiple sources. In this paper, we suggest that in order to unleash the full potential of LOSD we need to facilitate the interaction with LOSD and hide most of the complexity. Moreover, we describe the requirements and design criteria of a JSON-QB API that (i) facilitates the development of LOSD tools through a style of interaction familiar to web developers and (ii) offers a uniform way to access LOSD. A proof-of-concept implementation of the JSON-QB API demonstrates part of the proposed functionality.