92 research outputs found

    The representation and management of evolving features in geospatial databases

    Geographic features change over time, and such change is the result of some kind of event or occurrence. It has been a research challenge to represent these data in a manner that reflects human perception. Most database systems used in geographic information systems (GIS) are relational, and change is either captured by exhaustively storing all versions of the data, or updates simply replace previous versions. This stems from the inherent difficulty of modelling geographic objects in relational tables, a difficulty that is compounded when the time dimension needed to model how those objects evolve is introduced. There is little doubt that the object-oriented (OO) paradigm holds significant advantages over the relational model when it comes to modelling real-world entities and spatial data, and it is argued that this contention is particularly true for spatio-temporal data. This thesis describes an object-oriented approach to the design of a conceptual model for representing spatio-temporal geographic data, called the Feature Evolution Model (FEM), based on states and events. The model was used to implement a spatio-temporal database management system in Oracle Spatial, and an interface prototype is described that was used to evaluate the system by enabling querying and visualisation.
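
    The abstract summarises FEM only at the level of states and events; as a minimal sketch of what a state/event feature model might look like (all class and method names here are hypothetical illustrations, not FEM's actual design):

```python
# Hypothetical sketch of a state/event model for evolving features;
# not the thesis's actual FEM classes.
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional

@dataclass
class State:
    """One version of a feature: geometry and attributes, valid over [valid_from, valid_to)."""
    geometry: object            # e.g. a spatial geometry object
    attributes: dict
    valid_from: datetime
    valid_to: Optional[datetime] = None   # None = current state

@dataclass
class Event:
    """The occurrence that ended one state and produced the next."""
    kind: str                   # e.g. "split", "merge", "boundary-change"
    at: datetime
    description: str = ""

@dataclass
class Feature:
    """A geographic feature as a chain of states linked by events."""
    feature_id: str
    states: List[State] = field(default_factory=list)
    events: List[Event] = field(default_factory=list)

    def evolve(self, event: Event, new_state: State) -> None:
        """Close the current state, then record the event and its resulting state."""
        if self.states:
            self.states[-1].valid_to = event.at
        self.events.append(event)
        self.states.append(new_state)

    def state_at(self, t: datetime) -> Optional[State]:
        """Return the state valid at time t, if any."""
        for s in self.states:
            if s.valid_from <= t and (s.valid_to is None or t < s.valid_to):
                return s
        return None
```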

    Efficient Management of Short-Lived Data

    Motivated by the increasing prominence of loosely coupled systems, such as mobile and sensor networks, which are characterised by intermittent connectivity and volatile data, we study the tagging of data with so-called expiration times. More specifically, when data are inserted into a database, they may be tagged with time values indicating when they expire, i.e., when they are regarded as stale or invalid and thus are no longer considered part of the database. In a number of applications, expiration times are known and can be assigned at insertion time. We present data structures and algorithms for online management of data tagged with expiration times. The algorithms are based on fully functional, persistent treaps, which are a combination of binary search trees with respect to a primary attribute and heaps with respect to a secondary attribute. The primary attribute implements primary keys, and the secondary attribute stores expiration times in a minimum heap, thus keeping a priority queue of tuples to expire. A detailed and comprehensive experimental study demonstrates the well-behavedness and scalability of the approach as well as its efficiency with respect to a number of competitors.
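
    The underlying structure can be illustrated with a small, ephemeral (non-persistent) treap in which the search key is the primary attribute and the min-heap priority is the expiration time; the paper's treaps are fully functional and persistent, so this is only a sketch of the idea, with illustrative names:

```python
# Ephemeral sketch of the paper's idea; the real structure is a
# fully functional (persistent) treap.

class Node:
    __slots__ = ("key", "value", "exp", "left", "right")

    def __init__(self, key, value, exp):
        self.key, self.value, self.exp = key, value, exp
        self.left = self.right = None

def _rotate_right(n):
    l = n.left
    n.left, l.right = l.right, n
    return l

def _rotate_left(n):
    r = n.right
    n.right, r.left = r.left, n
    return r

def insert(root, key, value, exp):
    """BST on key; min-heap on expiration time exp."""
    if root is None:
        return Node(key, value, exp)
    if key < root.key:
        root.left = insert(root.left, key, value, exp)
        if root.left.exp < root.exp:
            root = _rotate_right(root)
    else:
        root.right = insert(root.right, key, value, exp)
        if root.right.exp < root.exp:
            root = _rotate_left(root)
    return root

def _delete_root(root):
    """Standard treap root deletion: rotate the smaller-priority child up."""
    if root.left is None:
        return root.right
    if root.right is None:
        return root.left
    if root.left.exp < root.right.exp:
        root = _rotate_right(root)
        root.right = _delete_root(root.right)
    else:
        root = _rotate_left(root)
        root.left = _delete_root(root.left)
    return root

def expire(root, now):
    """The root always holds the minimum expiration time, so all expired
    tuples can be removed by repeatedly deleting the root."""
    while root is not None and root.exp <= now:
        root = _delete_root(root)
    return root

if __name__ == "__main__":
    root = None
    for key, exp in [("a", 5.0), ("b", 2.0), ("c", 9.0)]:
        root = insert(root, key, "value-" + key, exp)
    root = expire(root, 3.0)   # drops "b", which expired at t = 2
```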

    Attribute-Level Versioning: A Relational Mechanism for Version Storage and Retrieval

    Data analysts today have at their disposal a seemingly endless supply of data repositories and, hence, datasets from which to draw. New datasets become available daily, making the choice of which dataset to use difficult. Furthermore, traditional data analysis has been conducted using structured data repositories such as relational database management systems (RDBMS). These systems, by their nature and design, prohibit duplication within indexed collections, forcing analysts to choose one value for each of the available attributes for an item in the collection. Analysts often discover two or more datasets with information about the same entity. When combining these data and transforming them into a form usable in an RDBMS, analysts are forced to deconflict the collisions and choose a single value for each duplicated attribute containing differing values. In the absence of professional intuition, this deconfliction is the source of a considerable amount of guesswork and speculation on the part of the analyst. One must consider what is lost by discarding those alternative values. Are there relationships between the conflicting datasets that have meaning? Is each dataset presenting a different and valid view of the entity, or are the alternate values erroneous? If so, which values are erroneous? Is there historical significance in the variances? The analysis of modern datasets requires specialized algorithms and storage and retrieval mechanisms to identify, deconflict, and assimilate variances of attributes for each entity encountered. These variances, or versions of attribute values, contribute meaning to the evolution and analysis of the entity and its relationship to other entities. A new, distinct storage and retrieval mechanism will enable analysts to efficiently store, analyze, and retrieve attribute versions without unnecessary complexity or additional alterations of the original or derived dataset schemas. This paper presents technologies and innovations that assist data analysts in discovering meaning within their data while preserving all of the original data for every entity in the RDBMS.
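
    The abstract does not spell out the storage schema; one plausible relational sketch of attribute-level versioning keeps a narrow version table alongside the entity table, so that conflicting values survive ingestion instead of being collapsed (table and column names here are hypothetical):

```python
# Hypothetical sketch of an attribute-version schema; not the paper's actual design.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE entity (
    entity_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL
);
-- Every (entity, attribute) pair may hold many versions, one per
-- source/observation, instead of a single deconflicted value.
CREATE TABLE attribute_version (
    entity_id  INTEGER NOT NULL REFERENCES entity(entity_id),
    attribute  TEXT    NOT NULL,
    value      TEXT    NOT NULL,
    source     TEXT    NOT NULL,
    observed   TEXT    NOT NULL,          -- ISO-8601 timestamp
    PRIMARY KEY (entity_id, attribute, source, observed)
);
""")

conn.execute("INSERT INTO entity VALUES (1, 'Acme Corp')")
# Two datasets disagree about the same attribute; both versions are kept.
rows = [
    (1, "headquarters", "Boston",  "dataset_A", "2021-03-01"),
    (1, "headquarters", "Chicago", "dataset_B", "2022-07-15"),
]
conn.executemany("INSERT INTO attribute_version VALUES (?,?,?,?,?)", rows)

# Retrieve every recorded version rather than one collapsed value.
for row in conn.execute(
        """SELECT value, source, observed FROM attribute_version
           WHERE entity_id = 1 AND attribute = 'headquarters'
           ORDER BY observed"""):
    print(row)
```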

    Accessing multiversion data in database transactions

    Many important database applications need to access previous versions of the data set, thus requiring that the data are stored in a multiversion database and indexed with a multiversion index, such as the multiversion B+-tree (MVBT) of Becker et al. The MVBT is optimal, so that any version of the database can be accessed as efficiently as with a single-version B+-tree that indexes only the data items of that version, but it cannot be used in a full-fledged database system because it follows a single-update model in which updates cannot be rolled back. We have redesigned the MVBT index so that a single multi-action updating transaction can operate on the index structure concurrently with multiple read-only transactions. Data items created by the transaction become part of the same version, and the transaction can roll back. We call this structure the transactional MVBT (TMVBT). The TMVBT index remains optimal even in the presence of logical key deletions. Even though deletions in a multiversion index must not physically delete the history of the data items, queries and range scans can become more efficient if the leaf pages of the index structure are merged to retain optimality. For the general transactional setting with multiple updating transactions, we propose a multiversion database structure called the concurrent MVBT (CMVBT), which stores the updates of active transactions in a separate main-memory-resident versioned B+-tree index. A system maintenance transaction is periodically run to apply the updates of committed transactions into the TMVBT index. We show how multiple updating transactions can operate on the CMVBT index concurrently, and our recovery algorithm is based on the standard ARIES recovery algorithm. We prove that the TMVBT index is asymptotically optimal, and show that the performance of the CMVBT index in general transaction processing is on par with the performance of the time-split B+-tree (TSB-tree) of Lomet and Salzberg. The TSB-tree does not merge leaf pages and is therefore not optimal if logical data-item deletions are allowed. Our experiments show that the CMVBT outperforms the TSB-tree with range queries in the presence of deletions.
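
    The MVBT algorithms themselves are beyond an abstract-sized sketch, but the CMVBT's two-level organisation can be caricatured with dictionaries standing in for the B+-trees: active transactions write to a small in-memory buffer, and a maintenance step moves committed updates into the main multiversion store. All names below are hypothetical stand-ins, not the actual index structures:

```python
# Toy illustration of the two-level CMVBT organisation; dicts stand in
# for the TMVBT and the main-memory versioned B+-tree.
from dataclasses import dataclass, field

INF = float("inf")

@dataclass
class MultiversionStore:
    """Stand-in for the TMVBT: entries live over a version range [start, end)."""
    entries: dict = field(default_factory=dict)   # key -> [[start, end, value], ...]

    def insert(self, key, value, version):
        self.entries.setdefault(key, []).append([version, INF, value])

    def delete(self, key, version):
        """Logical deletion: close the open span, keeping the history intact."""
        for span in self.entries.get(key, []):
            if span[1] == INF:
                span[1] = version
                break

    def query(self, key, version):
        for start, end, value in self.entries.get(key, []):
            if start <= version < end:
                return value
        return None

@dataclass
class VersionedBuffer:
    """Stand-in for the main-memory index holding active transactions' updates."""
    updates: dict = field(default_factory=dict)   # txn id -> [(op, key, value)]

    def write(self, txn, op, key, value=None):
        self.updates.setdefault(txn, []).append((op, key, value))

    def apply_committed(self, txn, store, commit_version):
        """The maintenance step: move a committed txn's updates into the store,
        all stamped with the same commit version."""
        for op, key, value in self.updates.pop(txn, []):
            if op == "insert":
                store.insert(key, value, commit_version)
            else:
                store.delete(key, commit_version)
```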

    Towards a big data reference architecture


    Rover-II: A Context-Aware Middleware for Pervasive Computing Environment

    It is well recognized that context plays a significant role in all human endeavors. All decisions are based on information that has to be interpreted in context. By making information systems context-aware, we can have systems that significantly enhance human capabilities to make critical decisions. A major challenge of context-aware systems is to balance usability with generality and extensibility. The relevant context changes depending on the particular application, so the model used to represent context and its relationship to entities must be general enough to allow the addition of context categories without redesign while remaining usable across many applications. Also, while application designers and developers put effort into making applications context-aware, these efforts are customized to the specific needs of the target application, and only certain common contexts, such as location and time, are taken into account. Therefore, a general framework is called for that can (i) efficiently maintain, represent, and integrate contextual information, (ii) act as an integration platform where different applications can share contexts, and (iii) provide relevant services to make efficient use of the contextual information. This dissertation presents:
    * a generic and effective context model, the Rover Context Model (RoCoM), structured around four primitives: entities, events, relationships, and activities; and made practically usable through the concept of templates;
    * a flexible, extensible, and generic ontology, the Rover Context Model Ontology (RoCoMO), supporting the model and addressing the shortcomings of existing ontologies;
    * an effective mechanism for modeling the context of a situation, through the concept of relevant context and with the help of a situation graph, that efficiently handles and makes the best use of context information;
    * a context middleware, Rover-II, which serves as a framework for contextual information integration and can be used not just to store and compile contextual information but also to integrate relevant services that enhance it and, more importantly, to enable sharing of context among the applications subscribed to it;
    * the initial design and implementation of a distributed architecture for Rover-II, following a P2P arrangement inspired by Tapestry.
    The above concepts are illustrated through M-Urgency, a context-aware public safety system that has been deployed at the University of Maryland Police Department.
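
    As a rough illustration of the four RoCoM primitives and a situation graph over them (a hypothetical sketch; the dissertation's actual model and ontology are richer):

```python
# Hypothetical sketch of the four RoCoM primitives and a situation graph;
# names and fields are illustrative, not the dissertation's actual API.
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Entity:
    entity_id: str
    attributes: Dict[str, str] = field(default_factory=dict)

@dataclass
class Event:
    event_id: str
    timestamp: float

@dataclass
class Relationship:
    kind: str                 # e.g. "located-at", "reported-by"
    source: str               # id of one primitive
    target: str               # id of another

@dataclass
class Activity:
    activity_id: str
    participants: List[str] = field(default_factory=list)

@dataclass
class SituationGraph:
    """Relevant context for a situation: primitives as nodes, relationships as edges."""
    nodes: Dict[str, object] = field(default_factory=dict)
    edges: List[Relationship] = field(default_factory=list)

    def add(self, node_id: str, primitive: object) -> None:
        self.nodes[node_id] = primitive

    def relate(self, kind: str, source: str, target: str) -> None:
        self.edges.append(Relationship(kind, source, target))

    def neighbours(self, node_id: str) -> List[str]:
        """Ids of primitives directly relevant to node_id in this situation."""
        return [e.target for e in self.edges if e.source == node_id]
```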

    Query processing in temporal object-oriented databases

    This PhD thesis is concerned with historical data management in the context of object-oriented databases. An extensible approach has been explored to processing temporal object queries within a uniform query framework. By a uniform framework, we mean that temporal queries can be processed within the existing object-oriented framework, itself extended from the relational framework, by extending the existing query processing techniques and strategies developed for OODBs and RDBs. The unified model of OODBs and RDBs in UniSQL/X has been adopted as a basis for this purpose. A temporal object data model is thereby defined by incorporating a time dimension into this unified model of OODBs and RDBs, forming temporal relational-like cubes extended with aggregation and inheritance hierarchies. A query algebra that accesses objects through these associations of aggregation, inheritance, and time-reference is then defined as a general query model/language. Owing to the extensive features of our data model and the reducibility of the algebra, a layered query processor is presented that provides a uniform framework for processing temporal object queries. Within this framework, query transformation is carried out based on a set of identified transformation rules that includes the known relational and object rules plus those pertaining to the time dimension. To evaluate a temporal query involving a path with time-reference, a strategy of decomposition is proposed. That is, the evaluation of an enhanced path, defined as a path extended with time-reference, is decomposed by initially dividing the path into two sub-paths: one containing the time-stamped class, which can be optimized by making use of the ordering information of temporal data, and another, ordinary sub-path (without time-stamped classes), which can be further decomposed and evaluated using different algorithms. The intermediate results of traversing the two sub-paths are then joined together to create the query output, as sketched below. Algorithms for processing the decomposed query components, i.e., time-related operation algorithms and four join algorithms (nested-loop forward join, sort-merge forward join, nested-loop reverse join, and sort-merge reverse join) with their modifications, are presented with cost analyses and implemented with stream processing techniques using C++; simulation results are also provided. Both the cost analysis and the simulation show the effect of time on the query processing algorithms: the join time cost increases linearly with the number of time-epochs (the time dimension in the case of a regular TS). It is also shown that heuristics that make use of time information can lead to significant time cost savings. Query processing with incomplete temporal data is also discussed.
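
    As an illustration of joining the intermediate results of the two sub-paths, the following sketches a sort-merge join on object identifier with a time-interval overlap test; it is a simplified stand-in for the thesis's forward-join algorithms, with illustrative names:

```python
def sort_merge_temporal_join(left, right):
    """Simplified sketch of a sort-merge join of two sub-path results;
    not the thesis's actual forward-join algorithm.

    Each input is a list of (oid, (t_start, t_end), payload) tuples sorted
    by oid; matching tuples are joined when their time intervals overlap.
    """
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        loid, (ls, le), lpay = left[i]
        roid, _, _ = right[j]
        if loid < roid:
            i += 1
        elif loid > roid:
            j += 1
        else:
            # Same object: scan the run of matching oids on the right.
            k = j
            while k < len(right) and right[k][0] == loid:
                _, (rs, re), rpay = right[k]
                start, end = max(ls, rs), min(le, re)
                if start < end:                      # intervals overlap
                    out.append((loid, (start, end), lpay, rpay))
                k += 1
            i += 1
    return out

if __name__ == "__main__":
    a = [(1, (0, 10), "x"), (2, (5, 8), "y")]
    b = [(1, (4, 12), "p"), (2, (9, 11), "q")]
    print(sort_merge_temporal_join(a, b))   # joins oid 1 over (4, 10); oid 2 has no overlap
```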