
    bdbms -- A Database Management System for Biological Data

    Biologists are increasingly using databases for storing and managing their data. Biological databases typically consist of a mixture of raw data, metadata, sequences, annotations, and related data obtained from various sources. Current database technology lacks several functionalities that biological databases need. In this paper, we introduce bdbms, an extensible prototype database management system for supporting biological data. bdbms extends the functionalities of current DBMSs to include: (1) annotation and provenance management, including storage, indexing, manipulation, and querying of annotation and provenance as first-class objects in bdbms; (2) local dependency tracking to track the dependencies and derivations among data items; (3) update authorization to support data curation via content-based authorization, in contrast to identity-based authorization; and (4) new access methods and their supporting operators that support pattern matching on various types of compressed biological data. This paper presents the design of bdbms along with the techniques proposed to support these functionalities, including an extension to SQL. We also outline some open issues in building bdbms.
    Comment: This article is published under a Creative Commons License Agreement (http://creativecommons.org/licenses/by/2.5/). You may copy, distribute, display, and perform the work, make derivative works, and make commercial use of the work, but you must attribute the work to the author and CIDR 2007. 3rd Biennial Conference on Innovative Data Systems Research (CIDR), January 7-10, 2007, Asilomar, California, US.
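    Local dependency tracking, item (2) above, can be pictured as a graph from each source item to the items derived from it: when a source changes, everything reachable from it may be stale. The sketch below assumes that graph model only; the class and method names (`DependencyTracker`, `add_dependency`, `update`) are hypothetical and are not bdbms's interface or its SQL extension.

```python
from collections import defaultdict, deque


class DependencyTracker:
    """Hypothetical sketch: record which data items are derived from which."""

    def __init__(self) -> None:
        # source item -> set of items directly derived from it
        self.derived_from: defaultdict = defaultdict(set)
        # items whose (transitive) sources changed after they were derived
        self.outdated: set = set()

    def add_dependency(self, source: str, derived: str) -> None:
        self.derived_from[source].add(derived)

    def update(self, item: str) -> set:
        """Record an update to `item`, mark every transitively derived item
        as outdated, and return the set of items reached by this update."""
        reached = set()
        seen = {item}
        queue = deque([item])
        while queue:  # breadth-first traversal of the dependency graph
            current = queue.popleft()
            for dependent in self.derived_from.get(current, ()):
                if dependent not in seen:
                    seen.add(dependent)
                    reached.add(dependent)
                    queue.append(dependent)
        self.outdated |= reached
        self.outdated.discard(item)  # the updated item itself is now current
        return reached
```

    For example, with `gene_seq -> protein_seq -> function_annotation`, calling `update("gene_seq")` flags both derived items; re-deriving and updating `protein_seq` clears its flag while `function_annotation` stays outdated until it, too, is refreshed.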

    An Efficient Algorithm for Bulk-Loading xBR+-trees

    A major part of the interface to a database is made up of the queries that can be addressed to this database and answered (processed) efficiently, contributing to the quality of the developed software. Efficiently processed spatial queries constitute a fundamental part of the interface to spatial databases, owing to the wide range of applications that may issue such queries, such as geographical information systems (GIS), location-based services, computer visualization, automated mapping, and facilities management. Another important capability of the interface to a spatial database is the creation of efficient index structures to speed up spatial query processing. The xBR+-tree is a balanced, disk-resident, quadtree-based index structure for point data that is very efficient for processing such queries. Bulk-loading refers to creating an index from scratch when the dataset to be indexed is available beforehand, instead of building the index gradually (and more slowly) by inserting the dataset elements one by one. In this paper, we present an algorithm for bulk-loading xBR+-trees for big datasets residing on disk, using a limited amount of main memory. The resulting tree is not only built fast, but also exhibits high performance in processing a broad range of spatial queries involving one or two datasets. To justify these characteristics, using real and artificial datasets of various cardinalities, we first compare this algorithm experimentally against a previous version of the same algorithm and against STR, a popular algorithm for bulk-loading R-trees, with respect to tree creation time and the characteristics of the resulting trees; second, we experimentally compare the query efficiency of bulk-loaded xBR+-trees and bulk-loaded R-trees with respect to I/O and execution time. Thus, this paper contributes to the implementation of spatial database interfaces and to efficient storage organization for big spatial data management.
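    STR (Sort-Tile-Recursive), the R-tree baseline in the comparison above, packs points into leaves by sorting on x, cutting the data into about sqrt(P) vertical slices (P = number of leaves), then sorting each slice on y and cutting it into full leaves. The sketch below shows only the leaf level for 2-D points; upper levels are built by repeating the same procedure on the leaves' bounding rectangles. It illustrates STR, not the xBR+-tree bulk-loading algorithm, whose details the abstract does not give; the function names are illustrative.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def str_pack_leaves(points: List[Point], capacity: int) -> List[List[Point]]:
    """Sort-Tile-Recursive leaf packing for 2-D points (leaf level only)."""
    n = len(points)
    if n == 0:
        return []
    num_leaves = math.ceil(n / capacity)
    num_slices = math.ceil(math.sqrt(num_leaves))
    slice_size = num_slices * capacity  # points per vertical slice
    by_x = sorted(points, key=lambda p: p[0])
    leaves: List[List[Point]] = []
    for start in range(0, n, slice_size):
        # Within each vertical slice, order by y and cut into full leaves.
        vertical = sorted(by_x[start:start + slice_size], key=lambda p: p[1])
        for k in range(0, len(vertical), capacity):
            leaves.append(vertical[k:k + capacity])
    return leaves


def mbr(leaf: List[Point]) -> Rect:
    """Minimum bounding rectangle of a leaf, used as its entry one level up."""
    xs = [p[0] for p in leaf]
    ys = [p[1] for p in leaf]
    return (min(xs), min(ys), max(xs), max(ys))
```

    With 100 points and a capacity of 10, this yields 10 leaves, matching the minimum possible count; the sort-based design is what makes STR fast, since it needs only a few sorting passes rather than one insertion per point.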

    A Framework for Spatio-Temporal Trajectory Data Segmentation and Query

    Trajectory segmentation is a technique for dividing sequential trajectory data into segments. These segments are building blocks for various applications of big trajectory data. Hence, a system framework is essential to support the indexing, storage, and querying of trajectory segments. When the volume of segments exceeds the computing capacity of a single processing node, a distributed solution is proposed. In this thesis, a distributed trajectory segmentation framework that includes a greedy-split segmentation method is created. The framework consists of distributed in-memory processing and a graph-storage cluster. For fast trajectory queries, a distributed spatial R-tree index of trajectory segments is applied. Using the trajectory indexes, the framework builds segment queries over both the in-memory processing and the graph storage. Based on this segmentation framework, two metrics are defined to measure trajectory similarity and the chance of collision. These two metrics are further applied to identify moving groups of trajectories. This study quantitatively evaluates the effects of data partitioning, parallelism, and data size on the system. The study identifies the bottleneck factors at the data partition stage and validates two mitigation solutions. The evaluation demonstrates that the distributed segmentation method and the system framework scale with the growth of the workload and the size of the parallel cluster.
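    The abstract names a greedy-split segmentation method without giving its split criterion. The sketch below assumes a hypothetical one: keep extending the current segment until its bounding box would become wider or taller than `max_extent`, then start a new segment at the last kept sample so consecutive segments join up. Bounded-size segments of this kind are convenient entries for the R-tree index the framework uses; the function name and threshold are illustrative, not the thesis's method.

```python
from typing import List, Tuple

Sample = Tuple[float, float, float]  # (x, y, t)


def greedy_split(trajectory: List[Sample], max_extent: float) -> List[List[Sample]]:
    """Hypothetical greedy split by bounding-box extent.

    A segment always keeps at least two samples, so a single step longer
    than max_extent becomes its own two-sample segment."""
    if not trajectory:
        return []
    segments: List[List[Sample]] = []
    current = [trajectory[0]]
    min_x = max_x = trajectory[0][0]
    min_y = max_y = trajectory[0][1]
    for sample in trajectory[1:]:
        x, y = sample[0], sample[1]
        # Bounding box the current segment would have if it took this sample.
        nx0, nx1 = min(min_x, x), max(max_x, x)
        ny0, ny1 = min(min_y, y), max(max_y, y)
        if len(current) >= 2 and max(nx1 - nx0, ny1 - ny0) > max_extent:
            segments.append(current)
            last = current[-1]
            current = [last, sample]  # new segment starts at the split point
            min_x, max_x = min(last[0], x), max(last[0], x)
            min_y, max_y = min(last[1], y), max(last[1], y)
        else:
            current.append(sample)
            min_x, max_x, min_y, max_y = nx0, nx1, ny0, ny1
    segments.append(current)
    return segments
```

    For a straight walk from x = 0 to x = 10 with `max_extent = 3.0`, this produces four segments starting at x = 0, 3, 6, and 9.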

    New Efficient Spatial Index Structures, PML-Tree and SMR-Tree, for Spatial Databases

    Computer Science

    Multidimensional access methods


    A Population Analysis for Hierarchical Data Structures

    A new method termed population analysis is presented for approximating the distribution of node occupancies in hierarchical data structures that store a variable number of geometric data items per node. The basic idea is to describe a dynamic data structure as a set of populations that are permitted to transform into one another according to certain rules. The transformation rules are used to obtain a set of equations describing a population distribution that is stable under insertion of additional information into the structure. These equations can then be solved, either analytically or numerically, to obtain the population distribution. Hierarchical data structures are modeled by letting each population represent the nodes of a given occupancy. A detailed analysis of quadtree data structures for storing point data is presented, and the results are compared to experimental data. Two phenomena, referred to as aging and phasing, are defined and shown to account for the differences between the experimental results and those predicted by the model. The population technique is compared with statistical methods of analyzing similar data structures.
    CR Categories and Subject Descriptors: E.1 [Data]: Data Structures, trees; F.2.2 [Theory of Computation]: Analysis of Nonnumerical Algorithms and Problems, geometrical problems and computations; H.3.3 [Information Storage and Retrieval]: Content Analysis and Indexing, indexing methods.
    Key words and phrases: file structures, bucketing methods, multidimensional attributes, hierarchical data structures, quadtrees
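    The population idea can be made concrete with a small numerical sketch for a quadtree-like bucket structure with node capacity m. This is not the paper's derivation; it assumes (1) every leaf is equally likely to receive the next item and (2) a splitting node sends each item to each of its four children with equal probability. Under these assumptions, a leaf with i < m items turns into a leaf with i + 1 items, a full leaf turns into a set of new leaves, and the stable distribution is the dominant eigenvector of that transformation, found here by power iteration. The function names are illustrative.

```python
from math import comb
from typing import List


def split_counts(m: int, fanout: int = 4) -> List[float]:
    """Expected number of leaves holding 0..m items created when a full leaf
    (m items) receives one more item and splits into `fanout` children.
    Assumption: each of the m + 1 items falls into any child with equal
    probability, independently."""
    n = m + 1
    q = 1.0 / fanout
    prob = [comb(n, k) * q**k * (1.0 - q) ** (n - k) for k in range(n + 1)]
    # A child receiving all n items splits again and, in expectation, yields
    # the same counts: c = fanout*prob[:m+1] + fanout*prob[n]*c, solved here.
    scale = fanout / (1.0 - fanout * prob[n])
    return [scale * prob[k] for k in range(m + 1)]


def stable_population(m: int, fanout: int = 4, iterations: int = 5000) -> List[float]:
    """Fraction of leaves holding 0..m items in the distribution that is
    stable under insertion. Assumption: the next item lands in each leaf
    with equal probability."""
    c = split_counts(m, fanout)
    p = [1.0 / (m + 1)] * (m + 1)
    for _ in range(iterations):
        nxt = [0.0] * (m + 1)
        for i in range(m):
            nxt[i + 1] += p[i]  # a non-full leaf with i items gains one
        for j in range(m + 1):
            nxt[j] += p[m] * c[j]  # a full leaf is replaced by its split products
        total = sum(nxt)
        p = [x / total for x in nxt]  # renormalize to population fractions
    return p
```

    For m = 1 this model gives an even split between empty leaves and leaves holding one point, i.e. an average occupancy `sum(i * p[i])` of 0.5. Like the model in the abstract, it predicts a single stable distribution; the aging and phasing effects that explain deviations from experiment lie outside it.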
