13 research outputs found

Visually querying object-oriented databases

Author: Chavda Manoj
Publication venue: Department of Computer Science
Publication date: 01/01/1997

Bibliography: pages 141-145. As database requirements increase, the ability to construct database queries efficiently becomes more important. The traditional means of querying a database is to write a textual query, such as an SQL query against a relational database. Visual query languages are an alternative means of querying a database; a visual query language can embody powerful query abstraction and user feedback techniques, which can make it easier to use. In this thesis, we develop a visual query system for ODMG-compliant object-oriented databases, called QUIVER. QUIVER has comprehensive expressive power; apart from supporting data types such as sets, bags, arrays, lists, tuples, objects and relationships, it supports aggregate functions, methods and sub-queries. The language is also consistent, as constructs with similar functionality have similar visual representations. QUIVER uses the DOT layout engine to lay out a query automatically; QUIVER queries are easily constructed, as the system does not constrain the spatial arrangement of query items. QUIVER also supports a query library, allowing queries to be saved, retrieved and shared among users. A substantial part of the design has been implemented using the ODMG-compliant database system O₂, and the usability of the interface as well as of the query language itself is presented. Visual queries are translated to OQL, the standard query language proposed by the ODMG, and query answers are presented using O₂ Look. During the course of our investigation, we conducted a user evaluation to compare QUIVER and OQL. The results were extremely encouraging in favour of QUIVER.

Cape Town University OpenUCT

High Level Efficiency in Database Languages

Author: Larsen Kim Skak
Publication venue: Aarhus University Library
Publication date: 01/01/1993

The subject of this Ph.D. thesis is the design and implementation of database languages. The thesis consists of this survey paper and the following five articles:
[1] Joan F. Boyar and Kim S. Larsen. Efficient Rebalancing of Chromatic Search Trees. In O. Nurmi and E. Ukkonen, eds., LNCS 621: Algorithm Theory -- SWAT'92, pp. 151-164. Springer-Verlag, 1992.
[2] Kim S. Larsen. On Aggregation and Computation on Domain Values. PB-414, Computer Science Department, Aarhus University, 1992.
[3] Kim S. Larsen. Strategies for Expression Evaluation Using Sort-Merge Algorithms. PB-415, Computer Science Department, Aarhus University, 1992.
[4] Kim S. Larsen and Michael I. Schwartzbach. Injectivity of Unary Queries With Computation on Domain Values. Computer Science Department, Aarhus University, 1992. Revised version of PB-311.
[5] Kim S. Larsen, Michael I. Schwartzbach and Erik M. Schmidt. A New Formalism for Relational Algebra. IPL, 41(3):163-168, 1992.
In [5], a new query language design is proposed. The expressive power of the language is determined in [2] and all reasonable extensions are considered. In [3, 4], we focus on the optimization issue of avoiding unnecessary sorting of relations. The results in these papers are directly applicable to any algebra-based query language. In addition to the query language part, a database system also has to offer update facilities. The theory of standard tuple-based updates is quite well developed in the sequential case. In [1], we discuss a new concurrent implementation of balanced search trees for that purpose. This survey paper describes the results of the papers which form the thesis, and relates these results to each other and to the area in a broader sense than is customary in the introductions of individual papers. The paper is intended to be read in combination with the papers on which it is based.

CiteSeerX

Tidsskrift.dk (Det Kongelige Bibliotek)

Provenance in Collaborative Data Sharing

Author: Karvounarakis Grigoris
Publication venue: ScholarlyCommons
Publication date: 01/07/2009

This dissertation focuses on recording, maintaining and exploiting provenance information in Collaborative Data Sharing Systems (CDSS). These are systems that support data sharing across loosely-coupled, heterogeneous collections of relational databases related by declarative schema mappings. A fundamental challenge in a CDSS is to support the capability of update exchange, which publishes a participant's updates and then translates others' updates to the participant's local schema and imports them, while tolerating disagreement between them and recording the provenance of exchanged data, i.e., information about the sources and mappings involved in their propagation. This provenance information can be useful during update exchange, e.g., to evaluate provenance-based trust policies. It can also be exploited after update exchange to answer a variety of user queries about the quality, uncertainty or authority of the data, for applications such as trust assessment, ranking for keyword search over databases, or query answering in probabilistic databases. To address these challenges, in this dissertation we develop a novel model of provenance graphs that is informative enough to satisfy the needs of CDSS users and captures the semantics of query answering on various forms of annotated relations. We extend techniques from data integration, data exchange, incremental view maintenance and view update to define the formal semantics of unidirectional and bidirectional update exchange. We develop algorithms to perform update exchange incrementally while maintaining provenance information. We present strategies for implementing our techniques over an RDBMS and experimentally demonstrate their viability in the Orchestra prototype system.
We define ProQL, a query language for provenance graphs that can be used by CDSS users to combine data querying with provenance testing as well as to compute annotations for their data, based on their provenance, that are useful for a variety of applications. Finally, we develop a prototype implementation of ProQL over an RDBMS, together with indexing techniques to speed up provenance querying, and experimentally evaluate the performance of provenance querying and the benefits of our indexing techniques.

ScholarlyCommons@Penn

Joining Extractions of Regular Expressions

Authors: Freydenberger Dominik D.; Kimelfeld Benny; Peterfreund Liat
Publication date: 30/03/2017

Regular expressions with capture variables, also known as "regex formulas," extract relations of spans (interval positions) from text. These relations can be further manipulated via Relational Algebra as studied in the context of document spanners, Fagin et al.'s formal framework for information extraction. We investigate the complexity of querying text by Conjunctive Queries (CQs) and Unions of CQs (UCQs) on top of regex formulas. We show that the lower bounds (NP-completeness and W[1]-hardness) from the relational world also hold in our setting; in particular, hardness already holds for single-character text! Yet, the upper bounds from the relational world do not carry over. Unlike the relational world, acyclic CQs, and even gamma-acyclic CQs, are hard to compute. The source of hardness is that it may be intractable to instantiate the relation defined by a regex formula, simply because it has an exponential number of tuples. Yet, we are able to establish general upper bounds. In particular, UCQs can be evaluated with polynomial delay, provided that every CQ has a bounded number of atoms (while unions and projection can be arbitrary). Furthermore, UCQ evaluation is solvable with FPT (Fixed-Parameter Tractable) delay when the parameter is the size of the UCQ.
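
As a rough illustration of what "extracting a relation of spans" and then joining such relations can mean, the sketch below brute-forces every span of a short text whose substring matches a pattern, then joins two such relations on adjacency. It is not the paper's algorithm: the function names `spans_matching` and `join_adjacent` are hypothetical, spans are written as half-open Python index pairs, and the exhaustive enumeration is chosen only to make the growth in the number of tuples visible.

```python
import re


def spans_matching(pattern: str, text: str) -> list[tuple[int, int]]:
    """Return every span (i, j) such that text[i:j] fully matches pattern.

    This materializes the whole one-variable span relation by brute force,
    which is exactly the step that can become too expensive in general.
    """
    compiled = re.compile(pattern)
    n = len(text)
    return [
        (i, j)
        for i in range(n + 1)
        for j in range(i, n + 1)
        if compiled.fullmatch(text, i, j)
    ]


def join_adjacent(left, right):
    """Join two span relations, keeping pairs where left ends where right starts."""
    return [(x, y) for x in left for y in right if x[1] == y[0]]


text = "aab"
a_runs = spans_matching(r"a+", text)   # spans covering runs of 'a'
b_runs = spans_matching(r"b+", text)   # spans covering runs of 'b'
pairs = join_adjacent(a_runs, b_runs)  # a-run immediately followed by a b-run
```

Even a single variable can yield a number of spans quadratic in the text length, and each further variable multiplies the possibilities, which is one way to see why instantiating the full relation may be intractable.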

arXiv.org e-Print Archive

Crossref

Loughborough University Institutional Repository

Knowledge discovery in multi-relational graphs

Author: Almagro Blanco Pedro
Publication date: 30/06/2017

Given the limited range of methodologies for carrying out relational machine learning tasks, the main objective of this thesis is to analyze the existing methods, modifying or optimizing some of them as far as possible, and to contribute new methods that open new ways of approaching this difficult task. To this end, and leaving aside objectives related to literature reviews or comparisons between models and implementations, a series of specific objectives is set out:
1. Define flexible and powerful structures that allow phenomena to be modeled in terms of their constituent elements and the relationships established among them. These structures must be able to express naturally the complex properties of the elements (continuous or categorical values, vectors, matrices, dictionaries, graphs, ...), as well as heterogeneous relationships among them, which may in turn carry properties of the same complexity. In addition, these structures must allow the modeling of phenomena in which relationships among elements are not always binary (involving only two elements) but may involve any number of them.
2. Define tools for building, manipulating and measuring these structures. However powerful and flexible a structure may be, it is of little use without adequate tools for manipulating and studying it. These tools must be efficient in their implementation and cover construction and querying tasks.
3. Develop new black-box relational machine learning algorithms. In tasks where the goal is not to obtain explanatory models, we can afford to use black-box models, sacrificing interpretability in favor of greater computational efficiency.
4. Develop new white-box relational machine learning algorithms. When we are interested in an explanation of how the systems under analysis work, we look for white-box machine learning models.
5. Improve query, analysis and repair tools for databases. Some long-distance queries in databases have too high a computational cost, which prevents adequate analysis in some information systems. Moreover, graph databases lack methods for normalizing or repairing data automatically or under human supervision. It is worthwhile to work toward tools that carry out such tasks, increasing efficiency and offering a new query and normalization layer that allows the data to be curated for better storage and retrieval.
All of these objectives are developed on a solid formal foundation based on Information Theory, Learning Theory, Artificial Neural Network Theory and Graph Theory. This foundation makes the results formal enough that the contributions can be easily evaluated. In addition, the abstract models developed are easily implementable on real machines, so that their behavior can be verified experimentally and useful solutions can be offered to the scientific community in a short time.

idUS. Depósito de Investigación Universidad de Sevilla

Joining extractions of regular expressions

Authors: Benny Kimelfeld (7169462); Dominik Freydenberger (3718891); Liat Peterfreund (7169465)
Publication date: 01/01/2018

Regular expressions with capture variables, also known as “regex formulas,” extract relations of spans (interval positions) from text. These relations can be further manipulated via relational algebra as studied in the context of “document spanners,” Fagin et al.’s formal framework for information extraction. We investigate the complexity of querying text by Conjunctive Queries (CQs) and Unions of CQs (UCQs) on top of regex formulas. Such queries have been investigated in prior work on document spanners, but little is known about the (combined) complexity of their evaluation. We show that the lower bounds (NP-completeness and W[1]-hardness) from the relational world also hold in our setting; in particular, hardness already holds for single-character text. Yet, the upper bounds from the relational world do not carry over. Unlike the relational world, acyclic CQs, and even gamma-acyclic CQs, are hard to compute. The source of hardness is that it may be intractable to instantiate the relation defined by a regex formula, simply because it has an exponential number of tuples. Yet, we are able to establish general upper bounds. In particular, UCQs can be evaluated with polynomial delay, provided that every CQ has a bounded number of atoms (while unions and projection can be arbitrary). Furthermore, UCQ evaluation is solvable with FPT (Fixed-Parameter Tractable) delay when the parameter is the size of the UCQ.

Loughborough University Institutional Repository

A new Nested Graph Model for Data Integration

Author: Bergami Giacomo <1990>
Publication venue: Alma Mater Studiorum - Università di Bologna
Publication date: 20/04/2018

Although graph data has gained increasing interest in several fields, no data model suitable for both querying and integrating differently structured graph and (semi)structured data has yet been conceived. Possible reasons are the lack of operators for combining (multiple) graphs in current graph query languages (graph joins), and graph data structures that allow neither data integration nor nested multidimensional representations (graph nesting). In order to make such data integration possible, this thesis proposes a novel model (General Semistructured data Model) allowing the representation of both graphs and arbitrarily nested contents (e.g., one node can be contained by more than just one parent node), thus allowing the definition of a nested graph model, where both vertices and edges may include (overlapping) graphs. We provide two graph join algorithms (Graph Conjunctive Equijoin Algorithm and Graph Conjunctive Less-equal Algorithm) and one graph nesting algorithm (Two HOp Separated Patterns). Their evaluation on top of our secondary memory representation showed the inefficiency of the query plans of existing query languages on top of their respective data models (relational, graph and document-oriented). In all three algorithms, the enhancement was possible by using an adjacency list graph representation, thus reducing the cost of joining the vertices with their respective outgoing (or ingoing) edges, and by associating hash values with both vertices and edges. As a secondary outcome of this thesis, a general data integration scenario is provided where both graph data and other semistructured and structured data can be represented and integrated into the General Semistructured data Model. A new query language (General Semistructured Query Language) over this data model outlines the feasibility of the approach, and also allows both graph joins and graph nestings to be expressed. This language is also capable of representing both traversal and data manipulation operators.

AMS Tesi di Dottorato

Increasing productivity in High Energy Physics data mining with a Domain Specific Visual Query Language

Author: Amaral Vasco Miguel Moreira do
Publication venue: Universität Mannheim
Publication date: 01/01/2004

This thesis develops the first domain-specific visual query language for high energy physics. In the current state of the art, the analysis of experimental results in high energy physics is a very laborious process. The use of general-purpose high-level programming languages and complex libraries for creating and maintaining the analysis software distracts scientists from the core questions of their field. Our approach introduces a new level of abstraction in the form of a visual programming language in which physicists can formulate the desired results in a notation close to their application domain. The hypothesis was validated by developing a language and a software prototype. In addition to a formal syntax, the language is defined by a translational semantics. The semantics is specified by a translation into an NF2 algebra extended with special grouping operators. The visual queries created by the user are translated by a compiler into code for a target platform. The usability of the language was validated by a user study, whose qualitative and quantitative results are presented.

MAnnheim DOCument Server