171 research outputs found

    Indexing techniques for object-oriented databases.

    by Frank Hing-Wah Luk. Thesis (M.Phil.)--Chinese University of Hong Kong, 1996. Includes bibliographical references (leaves 92-95).
    Contents:
    Abstract -- p.ii; Acknowledgement -- p.iii
    Chapter 1: Introduction -- p.1 (1.1 Motivation, p.1; 1.2 The Problem in Object-Oriented Database Indexing, p.2; 1.3 Contributions, p.3; 1.4 Thesis Organization, p.4)
    Chapter 2: Object-oriented Data Model -- p.5 (2.1 Object-oriented Data Model, p.5; 2.2 Object and Object Identifiers, p.6; 2.3 Complex Attributes and Methods, p.6; 2.4 Class, p.8; 2.4.1 Inheritance Hierarchy, p.8; 2.4.2 Aggregation Hierarchy, p.8; 2.5 Sample Object-Oriented Database Schema, p.9)
    Chapter 3: Indexing in Object-Oriented Databases -- p.10 (3.1 Introduction, p.10; 3.2 Indexing on Inheritance Hierarchy, p.10; 3.3 Indexing on Aggregation Hierarchy, p.13; 3.4 Indexing on Integrated Support, p.16; 3.5 Indexing on Method Invocation, p.18; 3.6 Indexing on Overlapping Path Expressions, p.19)
    Chapter 4: Triple Node Hierarchy -- p.23 (4.1 Introduction, p.23; 4.2 Triple Node, p.25; 4.3 Triple Node Hierarchy, p.26; 4.3.1 Construction of the Triple Node Hierarchy, p.26; 4.3.2 Updates in the Triple Node Hierarchy, p.31; 4.4 Cost Model, p.33; 4.4.1 Storage, p.33; 4.4.2 Query Cost, p.35; 4.4.3 Update Cost, p.35; 4.5 Evaluation, p.37; 4.6 Summary, p.42)
    Chapter 5: Triple Node Hierarchy in Both Aggregation and Inheritance Hierarchies -- p.43 (5.1 Introduction, p.43; 5.2 Preliminaries, p.44; 5.3 Class-Hierarchy Tree, p.45; 5.4 The Nested CH-tree, p.47; 5.4.1 Construction, p.47; 5.4.2 Retrieval, p.48; 5.4.3 Update, p.48; 5.5 Cost Model, p.49; 5.5.1 Assumptions, p.51; 5.5.2 Storage, p.52; 5.5.3 Query Cost, p.52; 5.5.4 Update Cost, p.53; 5.6 Evaluation, p.55; 5.6.1 Storage Cost, p.55; 5.6.2 Query Cost, p.57; 5.6.3 Update Cost, p.62; 5.7 Summary, p.63)
    Chapter 6: Decomposition of Path Expressions -- p.65 (6.1 Introduction, p.65; 6.2 Configuration on Path Expressions, p.67; 6.2.1 Single Path Expression, p.67; 6.2.2 Overlapping Path Expressions, p.68; 6.3 New Algorithm, p.70; 6.3.1 Example, p.72; 6.4 Evaluation, p.75; 6.5 Summary, p.76)
    Chapter 7: Conclusion and Future Research -- p.77 (7.1 Conclusion, p.77; 7.2 Future Research, p.78)
    Appendix A: Evaluation of some Parameters in Chapter 5 -- p.79
    Appendix B: Cost Model for Nested-Inherited Index -- p.82 (B.1 Storage, p.82; B.2 Query Cost, p.84; B.3 Update, p.84)
    Appendix C: Algorithm constructing a minimum auxiliary set of JIs -- p.87
    Appendix D: Estimation on the number of possible combinations -- p.89
    Bibliography -- p.9

    Concept Trees: Building Dynamic Concepts from Semi-Structured Data using Nature-Inspired Methods

    This paper describes a method for creating structure from heterogeneous sources, as part of an information database, or more specifically, a 'concept base'. Structures called 'concept trees' can grow from the semi-structured sources when consistent sequences of concepts are presented. They might be considered dynamic databases, possibly a variation on the distributed Agent-Based or Cellular Automata models, or even related to Markov models. Semantic comparison of text is required, but the trees can be built more from automatic knowledge and statistical feedback. This reduced model might also be attractive for security or privacy reasons, as not all of the potential data gets saved. The construction process maintains the key requirement of generality, allowing it to be used as part of a generic framework. The nature of the method also means that some level of optimisation or normalisation of the information will occur. This invites comparisons with databases or knowledge bases, but a database system would first model its environment or datasets and then populate the database with instance values. The concept base deals with a more uncertain environment and therefore cannot fully model it beforehand; the model itself evolves over time. Similar to databases, it also needs a good indexing system, where the construction process provides memory and indexing structures. These allow for more complex concepts to be automatically created, stored and retrieved, possibly as part of a more cognitive model. There are also some arguments, or more abstract ideas, for merging physical-world laws into these automatic processes. Comment: Pre-print.
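    As a rough, hedged illustration of the idea (not the paper's actual implementation), the Python sketch below grows a small concept tree from repeated sequences of concepts and uses a simple count as the statistical feedback that decides which branches survive. The class and method names (ConceptNode, add_sequence, prune) and the pruning threshold are assumptions made for this example only.

        # Hypothetical sketch of a concept tree: repeated, consistent sequences of
        # concepts reinforce a branch; weakly supported branches are pruned, so not
        # all of the potential data is kept. Names are illustrative assumptions.

        class ConceptNode:
            def __init__(self, concept):
                self.concept = concept
                self.count = 0          # statistical feedback: times this node was reached
                self.children = {}      # concept label -> ConceptNode

            def add_sequence(self, concepts):
                """Reinforce (or create) the branch matching this sequence of concepts."""
                node = self
                for c in concepts:
                    node = node.children.setdefault(c, ConceptNode(c))
                    node.count += 1

            def prune(self, min_count):
                """Drop branches seen fewer than min_count times."""
                self.children = {c: n for c, n in self.children.items() if n.count >= min_count}
                for child in self.children.values():
                    child.prune(min_count)

        root = ConceptNode("root")
        root.add_sequence(["animal", "dog", "barks"])
        root.add_sequence(["animal", "dog", "barks"])
        root.add_sequence(["animal", "cat", "meows"])
        root.prune(min_count=2)     # only the repeated "animal -> dog -> barks" branch survives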

    Incremental characterization of RDF Triple Stores

    Many semantic web applications integrate data from distributed triple stores; to be efficient, they need to know what kind of content each triple store holds in order to assess whether it can contribute to their queries. We present an algorithm to build indexes summarizing the content of triple stores. We extend Depth-First Search (DFS) coding to provide a canonical representation of RDF graphs, and we introduce a new join operator between two graph codes to optimize the generation of an index. We provide an incremental update algorithm and conclude with tests on real datasets.
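    To make the kind of graph code the abstract refers to more concrete, here is a hedged, much-simplified Python sketch that serialises a tiny RDF graph as an ordered list of labelled edges produced by a depth-first traversal. A real canonical DFS code (as in the paper, or in gSpan-style mining) is chosen as the minimum code over all possible traversals; this sketch fixes an arbitrary start node, and the example triples are invented.

        # Simplified, non-canonical illustration of DFS-coding an RDF graph.
        from collections import defaultdict

        triples = [
            ("ex:alice", "foaf:knows", "ex:bob"),
            ("ex:alice", "rdf:type",   "foaf:Person"),
            ("ex:bob",   "rdf:type",   "foaf:Person"),
        ]

        adj = defaultdict(list)
        for s, p, o in triples:
            adj[s].append((p, o))

        def dfs_code(start):
            """Emit (depth, subject, predicate, object) tuples in DFS order."""
            code, seen = [], set()
            def visit(node, depth):
                seen.add(node)
                for p, o in sorted(adj[node]):
                    code.append((depth, node, p, o))
                    if o in adj and o not in seen:
                        visit(o, depth + 1)
            visit(start, 0)
            return code

        for entry in dfs_code("ex:alice"):
            print(entry)
        # (0, 'ex:alice', 'foaf:knows', 'ex:bob')
        # (1, 'ex:bob', 'rdf:type', 'foaf:Person')
        # (0, 'ex:alice', 'rdf:type', 'foaf:Person')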

    Persistent Data Structures for Incremental Join Indices

    Join indices are used in relational databases to make join operations faster. Join indices essentially materialise the results of join operations and so accrue maintenance cost, which makes them more suitable for use cases where modifications are rare and joins are performed frequently. To lower the maintenance cost, incrementally updating existing indices is preferable. This thesis explores the use of persistent data structures for join indices. The motivation is the ability of persistent data structures to construct multiple, partially different versions of the same data structure memory-efficiently. This is useful because different versions of a join index can exist simultaneously due to the use of multi-version concurrency control (MVCC) in a database. The techniques used in the Relaxed Radix Balanced Tree (RRB-Tree) persistent data structure were found promising, but none of the popular implementations were found directly suitable for the use case. The exploration was done in the context of FastormDB, a proprietary embedded in-memory columnar multidimensional database developed by RELEX Solutions. Because FastormDB is implemented in Java, the research focused on Java Virtual Machine (JVM) based data structures. Several persistent data structures implemented for the thesis, together with ones from Scala, Clojure and Paguro, were evaluated with Java Microbenchmark Harness (JMH) and Java Object Layout (JOL) based benchmarks, and their results were analysed via visualisations.
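    The thesis's data structures are JVM-based, but the motivation for persistence can be sketched in a few lines of Python: each update to the join index returns a new version that shares its unchanged buckets with the previous one, so several MVCC snapshots can coexist without full copies. This is only an illustrative stand-in (copy-on-write over a plain dict), not an RRB-Tree and not code from the thesis; all names are assumptions.

        # Illustrative versioned join index with copy-on-write buckets (not RRB-Tree based).
        class JoinIndexVersion:
            def __init__(self, buckets=None):
                # buckets: join key -> tuple of (left_row_id, right_row_id) pairs
                self.buckets = buckets if buckets is not None else {}

            def insert(self, key, pair):
                """Return a NEW version; only the touched bucket is replaced."""
                new_buckets = dict(self.buckets)    # shallow copy: bucket tuples are shared
                new_buckets[key] = self.buckets.get(key, ()) + (pair,)
                return JoinIndexVersion(new_buckets)

            def lookup(self, key):
                return self.buckets.get(key, ())

        v1 = JoinIndexVersion().insert("k1", (1, 10)).insert("k2", (2, 20))
        v2 = v1.insert("k1", (3, 30))                # a newer transaction's snapshot

        print(v1.lookup("k1"))                       # ((1, 10),)          older snapshot unchanged
        print(v2.lookup("k1"))                       # ((1, 10), (3, 30))
        print(v1.buckets["k2"] is v2.buckets["k2"])  # True: untouched bucket is shared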

    DFS-based frequent graph pattern extraction to characterize the content of RDF Triple Stores

    Semantic web applications often access distributed triple stores that rely on different ontologies and maintain bases of RDF annotations about different domains. Use cases often involve queries whose results combine pieces of annotations distributed over several bases maintained on different servers. In this context, one key issue is to characterize the content of RDF bases in order to identify their potential contributions to the processing of a query. In this paper we propose an algorithm to extract a compact representation of the content of an RDF repository. We first improve the canonical representation of RDF graphs based on the DFS code proposed in the literature. We then provide a join operator to significantly reduce the number of frequent graph patterns generated from the analysis of the content of the base, and we reduce the index size by keeping only the graph patterns with maximal coverage. Our algorithm has been tested on different data sets, as discussed in the conclusion.
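    As a hedged illustration of the final filtering step mentioned above, the sketch below keeps only the graph patterns whose coverage (here, the set of triple identifiers a pattern matches) is not strictly contained in the coverage of another pattern. The pattern identifiers and coverage sets are invented for the example.

        # Keep only patterns with maximal coverage: drop a pattern when its covered
        # triples are a proper subset of another pattern's coverage. Toy data only.
        patterns = {
            "P1": {1, 2, 3, 4},   # pattern id -> ids of the triples it covers
            "P2": {2, 3},         # strictly contained in P1's coverage -> dropped
            "P3": {4, 5},
        }

        def maximal_coverage(pats):
            keep = {}
            for pid, cov in pats.items():
                subsumed = any(cov < other for oid, other in pats.items() if oid != pid)
                if not subsumed:
                    keep[pid] = cov
            return keep

        print(sorted(maximal_coverage(patterns)))   # ['P1', 'P3']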

    Interaction-aware development environments: recording, mining, and leveraging IDE interactions to analyze and support the development flow

    Nowadays, software development is largely carried out using Integrated Development Environments, or IDEs. An IDE is a collection of tools and facilities to support the most diverse software engineering activities, such as writing code, debugging, and program understanding. The fact that they are integrated enables developers to find all the tools needed for development in the same place. Each activity is composed of many basic events, such as clicking on a menu item in the IDE, opening a new user interface to browse the source code of a method, or adding a new statement in the body of a method. While working, developers generate thousands of these interactions, which we call fine-grained IDE interaction data. We believe this data is a valuable source of information that can be leveraged to enable better analyses and to offer novel support to developers. However, this data is largely neglected by modern IDEs. In this dissertation we propose the concept of "Interaction-Aware Development Environments": IDEs that collect, mine, and leverage the interactions of developers to support and simplify their workflow. We formulate our thesis as follows: Interaction-Aware Development Environments enable novel and in-depth analyses of the behavior of software developers and set the ground to provide developers with effective and actionable support for their activities inside the IDE. For example, by monitoring how developers navigate source code, the IDE could suggest the program entities that are potentially relevant for a particular task. Our research focuses on three main directions:
    1. Modeling and Persisting Interaction Data. The first step to make IDEs aware of interaction data is to overcome its ephemeral nature. To do so we have to model this new source of data and persist it, making it available for further use.
    2. Interpreting Interaction Data. One of the biggest challenges of our research is making sense of the millions of interactions generated by developers. We propose several models to interpret this data, for example by reconstructing high-level development activities from interaction histories or by measuring the navigation efficiency of developers.
    3. Supporting Developers with Interaction Data. Novel IDEs can use the potential of interaction data to support software development. For example, they can identify UI components that are potentially unnecessary in the future and suggest that developers close them, reducing the visual clutter of the IDE.
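    As a small, hypothetical illustration of the first two directions (the dissertation's actual models are richer), the sketch below records fine-grained interaction events and reconstructs coarser activities by splitting the event stream at long pauses. The event fields and the 60-second gap threshold are assumptions made for the example.

        # Illustrative only: record IDE interaction events and group them into
        # "activities" separated by pauses longer than max_gap seconds.
        from dataclasses import dataclass

        @dataclass
        class InteractionEvent:
            timestamp: float    # seconds since the session started
            kind: str           # e.g. "open_method", "edit", "click_menu"
            target: str         # program entity or UI element involved

        def split_into_activities(events, max_gap=60.0):
            activities, current = [], []
            for ev in sorted(events, key=lambda e: e.timestamp):
                if current and ev.timestamp - current[-1].timestamp > max_gap:
                    activities.append(current)
                    current = []
                current.append(ev)
            if current:
                activities.append(current)
            return activities

        log = [
            InteractionEvent(0.0,   "open_method", "Parser.parse"),
            InteractionEvent(12.5,  "edit",        "Parser.parse"),
            InteractionEvent(300.0, "click_menu",  "Debug > Run"),
        ]
        print([len(a) for a in split_into_activities(log)])   # [2, 1]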

    Business Intelligence on Non-Conventional Data

    The revolution in digital communications witnessed over the last decade has had a significant impact on the world of Business Intelligence (BI). In the big data era, the amount and diversity of data that can be collected and analyzed for the decision-making process transcends the restricted and structured set of internal data that BI systems are conventionally limited to. This thesis investigates the unique challenges imposed by three specific categories of non-conventional data: social data, linked data and schemaless data. Social data comprises the user-generated content published through websites and social media, which can provide a fresh and timely perception of people's tastes and opinions. In Social BI (SBI), the analysis focuses on topics, meant as specific concepts of interest within the subject area. In this context, this thesis proposes the meta-star, an alternative to the traditional star schema for modeling hierarchies of topics to enable OLAP analyses. The thesis also presents an architectural framework for a real SBI project and a cross-disciplinary benchmark for SBI. Linked data employ the Resource Description Framework (RDF) to provide a public network of interlinked, structured, cross-domain knowledge. In this context, this thesis proposes an interactive and collaborative approach to build aggregation hierarchies from linked data. Schemaless data refers to the storage of data in NoSQL databases that do not force a predefined schema but let database instances embed their own local schemata. In this context, this thesis proposes an approach to determine the schema profile of a document-based database; the goal is to support users in a schema-on-read analysis process by making explicit the rules that drove the usage of the different schemata. A final and complementary contribution of this thesis is an innovative technique, in the field of recommendation systems, to overcome user disorientation in the analysis of a large and heterogeneous wealth of data.
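    For the schemaless-data contribution, one much-simplified way to picture a schema profile is sketched below: each document is reduced to the sorted set of its field paths, and documents are counted per distinct signature. This is only an assumption-laden stand-in for the approach the abstract describes; the collection and field names are invented.

        # Toy schema profiling of a document store: count how many documents share
        # each set of (possibly nested) field paths. Example documents are invented.
        from collections import Counter

        def field_paths(doc, prefix=""):
            paths = []
            for key, value in doc.items():
                path = f"{prefix}.{key}" if prefix else key
                if isinstance(value, dict):
                    paths.extend(field_paths(value, path))
                else:
                    paths.append(path)
            return paths

        collection = [
            {"id": 1, "user": {"name": "a", "age": 30}},
            {"id": 2, "user": {"name": "b"}},          # "age" missing: different schema
            {"id": 3, "user": {"name": "c", "age": 25}},
        ]

        profile = Counter(tuple(sorted(field_paths(d))) for d in collection)
        for signature, count in profile.most_common():
            print(count, signature)
        # 2 ('id', 'user.age', 'user.name')
        # 1 ('id', 'user.name')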

    DescribeX: A Framework for Exploring and Querying XML Web Collections

    This thesis introduces DescribeX, a powerful framework capable of describing arbitrarily complex XML summaries of web collections, providing support for more efficient evaluation of XPath workloads. DescribeX permits the declarative description of document structure using all axes and language constructs in XPath, and generalizes many of the XML indexing and summarization approaches in the literature. DescribeX supports the construction of heterogeneous summaries in which different document elements sharing a common structure can be declaratively defined and refined by means of path regular expressions on axes, or axis path regular expressions (AxPREs). DescribeX can significantly help in understanding both the structure of complex, heterogeneous XML collections and the behaviour of XPath queries evaluated on them. Experimental results demonstrate the scalability of DescribeX summary refinements and stabilizations (the key enablers for tailoring summaries) on multi-gigabyte web collections. A comparative study suggests that using a DescribeX summary created from a given workload can produce query evaluation times orders of magnitude better than using existing summaries. DescribeX's lightweight approach of combining summaries with a file-at-a-time XPath processor can be a very competitive alternative, in terms of performance, to conventional fully-fledged XML query engines that provide DB-like functionality such as security, transaction processing, and native storage. Comment: PhD thesis, University of Toronto, 2008, 163 pages.
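    To give a feel for what a structural summary is, the hedged sketch below partitions the elements of a small XML document by their root-to-element label path, i.e. a plain child-axis path summary. DescribeX generalizes this kind of summary to arbitrary XPath axes via AxPREs; the sample document and function names here are invented for illustration.

        # Plain path-summary sketch (child axis only); DescribeX's AxPRE-defined
        # summaries are far more general. Sample XML is invented.
        import xml.etree.ElementTree as ET
        from collections import defaultdict

        xml_doc = """
        <library>
          <book><title>A</title><author>X</author></book>
          <book><title>B</title></book>
        </library>
        """

        def path_summary(root):
            summary = defaultdict(int)      # label path -> number of elements in that partition
            def walk(elem, path):
                summary[path] += 1
                for child in elem:
                    walk(child, path + "/" + child.tag)
            walk(root, "/" + root.tag)
            return summary

        root = ET.fromstring(xml_doc)
        for path, count in sorted(path_summary(root).items()):
            print(count, path)
        # 1 /library
        # 2 /library/book
        # 1 /library/book/author
        # 2 /library/book/title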