11 research outputs found

    pSPARQL: A Querying Language for Probabilistic RDF (Extended Abstract)

    In this paper, we present pSPARQL, a querying language for probabilistic RDF databases in which each triple has a probability. pSPARQL is built on SPARQL, the querying language recommended by W3C for RDF databases. First, we present the syntax and semantics of pSPARQL. Second, we define the query problem of pSPARQL corresponding to probabilities of solutions. Finally, we show that the query evaluation of general pSPARQL patterns is PSPACE-complete.

    A Novel Trip Planner Using Effective Indexing Structure

    The management of transportation systems has become increasingly important in many real applications such as location-based services, supply chain management, traffic control, and so on. These applications usually involve queries over spatial road networks with dynamically changing and complicated traffic conditions. In this paper, we model such a network by a probabilistic time-dependent graph (PT-Graph), whose edges are associated with uncertain delay functions. We propose a useful query in the PT-Graph, namely a trip planner query (TPQ), which retrieves trip plans that traverse a set of query points in the PT-Graph and have the minimum traveling time with high confidence. To tackle the efficiency issue, we present two pruning methods, time interval pruning and probabilistic pruning, to effectively rule out false alarms of trip plans. Furthermore, we design a pre-computation technique based on the cost model and construct an index structure over the pre-computed data to enable pruning via the index. We integrate our proposed pruning methods into an efficient query procedure to answer TPQs. Through extensive experiments, we demonstrate the efficiency and effectiveness of our TPQ query answering approach.

    Efficient Subgraph Similarity Search on Large Probabilistic Graph Databases

    Many studies have been conducted on seeking efficient solutions for subgraph similarity search over certain (deterministic) graphs due to its wide application in many fields, including bioinformatics, social network analysis, and Resource Description Framework (RDF) data management. All these works assume that the underlying data are certain. However, in reality, graphs are often noisy and uncertain due to various factors, such as errors in data extraction, inconsistencies in data integration, and privacy-preserving purposes. Therefore, in this paper, we study subgraph similarity search on large probabilistic graph databases. Different from previous works assuming that edges in an uncertain graph are independent of each other, we study uncertain graphs where edges' occurrences are correlated. We formally prove that subgraph similarity search over probabilistic graphs is #P-complete; thus, we employ a filter-and-verify framework to speed up the search. In the filtering phase, we develop tight lower and upper bounds of subgraph similarity probability based on a probabilistic matrix index, PMI. PMI is composed of discriminative subgraph features associated with tight lower and upper bounds of subgraph isomorphism probability. Based on PMI, we can sort out a large number of probabilistic graphs and maximize the pruning capability. During the verification phase, we develop an efficient sampling algorithm to validate the remaining candidates. The efficiency of our proposed solutions has been verified through extensive experiments. Comment: VLDB201

    Subgraph Pattern Matching over Uncertain Graphs with Identity Linkage Uncertainty

    There is a growing need for methods which can capture uncertainties and answer queries over graph-structured data. Two common types of uncertainty are uncertainty over the attribute values of nodes and uncertainty over the existence of edges. In this paper, we combine those with identity uncertainty. Identity uncertainty represents uncertainty over the mapping from objects mentioned in the data, or references, to the underlying real-world entities. We propose the notion of a probabilistic entity graph (PEG), a probabilistic graph model that defines a distribution over possible graphs at the entity level. The model takes into account node attribute uncertainty, edge existence uncertainty, and identity uncertainty, and thus enables us to systematically reason about all three types of uncertainties in a uniform manner. We introduce a general framework for constructing a PEG given uncertain data at the reference level and develop highly efficient algorithms to answer subgraph pattern matching queries in this setting. Our algorithms are based on two novel ideas: context-aware path indexing and reduction by join-candidates, which drastically reduce the query search space. A comprehensive experimental evaluation shows that our approach outperforms baseline implementations by orders of magnitude

    Efficient query processing over uncertain road networks

    One of the fundamental problems on spatial road networks has been the shortest traveling time query, with applications such as location-based services (LBS) and trip planning. Algorithms have been developed for shortest time queries in deterministic road networks, in which vertices and edges are known with certainty. Emerging technologies now make it easier to acquire information about traffic. In this paper, we consider uncertain road networks, in which the speeds of vehicles are imprecise and probabilistic. We focus on one important query type, the continuous probabilistic shortest traveling time query (CPSTTQ), which retrieves sets of objects that have the smallest traveling time to a moving query point q from point s to point e on road networks with high confidence. We propose effective pruning methods to prune the search space of the CPSTTQ query, and design an efficient query procedure to answer CPSTTQ via an index structure.

    Probabilistic Shortest Time Queries Over Uncertain Road Networks

    In many real applications such as location-based services (LBS), map utilities, trip planning, and transportation systems, it is very useful and important to provide query services over spatial road networks. Nowadays we can easily obtain rich traffic information such as the speeds of vehicles on roads. However, due to the inaccuracy of devices or integration inconsistencies, the traffic data (i.e., speeds) are often imprecise and uncertain. In this paper, we model road networks by uncertain graphs, which contain edges that are associated with probabilistic velocities. We formalize the problem of the probabilistic shortest time query (PSTQ), and we propose time bound pruning and probabilistic bound pruning to filter out false alarms. Moreover, we design offline pre-computation to facilitate PSTQ processing.

    Knowledge Graph Completion via Complex Tensor Factorization

    In statistical relational learning, knowledge graph completion deals with automatically understanding the structure of large knowledge graphs (labeled directed graphs) and predicting missing relationships (labeled edges). State-of-the-art embedding models propose different trade-offs between modeling expressiveness, and time and space complexity. We reconcile both expressiveness and complexity through the use of complex-valued embeddings and explore the link between such complex-valued embeddings and unitary diagonalization. We corroborate our approach theoretically and show that all real square matrices, and thus all possible relation/adjacency matrices, are the real part of some unitarily diagonalizable matrix. This result opens the door to many other applications of square matrix factorization. Our approach based on complex embeddings is arguably simple, as it only involves a Hermitian dot product, the complex counterpart of the standard dot product between real vectors, whereas other methods resort to more and more complicated composition functions to increase their expressiveness. The proposed complex embeddings are scalable to large data sets, as they remain linear in both space and time, while consistently outperforming alternative approaches on standard link prediction benchmarks.

    The Fourth International VLDB Workshop on Management of Uncertain Data

    Methods and tools for temporal knowledge harvesting

    To extend the traditional knowledge base with a temporal dimension, this thesis offers methods and tools for harvesting temporal facts from both semi-structured and textual sources. Our contributions are briefly summarized as follows. 1. Timely YAGO: A temporal knowledge base called Timely YAGO (T-YAGO), which extends YAGO with temporal attributes, is built. We define a simple RDF-style data model to support temporal knowledge. 2. PRAVDA: To be able to harvest as many temporal facts from free text as possible, we develop a system called PRAVDA. It utilizes a graph-based semi-supervised learning algorithm to extract fact observations, which are further cleaned up by a constraint solver based on an Integer Linear Program. We also attempt to harvest spatio-temporal facts to track a person's trajectory. 3. PRAVDA-live: A user-centric interactive knowledge harvesting system, called PRAVDA-live, is developed for extracting facts from natural language free text. It is built on the framework of PRAVDA. It supports fact extraction of user-defined relations from ad-hoc selected text documents and ready-to-use RDF exports. 4. T-URDF: We present a simple and efficient representation model for time-dependent uncertainty in combination with first-order inference rules and recursive queries over RDF-like knowledge bases. We adopt the common possible-worlds semantics known from probabilistic databases and extend it towards histogram-like confidence distributions that capture the validity of facts across time. All of these components are fully implemented systems, which together form an integrative architecture. PRAVDA and PRAVDA-live aim at gathering new facts (particularly temporal facts), and then T-URDF reconciles them. Finally, these facts are stored in a (temporal) knowledge base called T-YAGO. A SPARQL-like time-aware querying language, together with a visualization tool, is designed for T-YAGO. Temporal knowledge can also be applied to document summarization.