682 research outputs found

    Performance comparison of point and spatial access methods

    Get PDF
    In the past few years a large number of multidimensional point access methods, also called multiattribute index structures, has been suggested, all of them claiming good performance. Since no performance comparison of these structures under arbitrary (strongly correlated nonuniform, short "ugly") data distributions and under various types of queries has been performed, database researchers and designers were hesitant to use any of these new point access methods. As shown in a recent paper, such point access methods are not only important in traditional database applications. In new applications such as CAD/CIM and geographic or environmental information systems, access methods for spatial objects are needed. As recently shown such access methods are based on point access methods in terms of functionality and performance. Our performance comparison naturally consists of two parts. In part I we w i l l compare multidimensional point access methods, whereas in part I I spatial access methods for rectangles will be compared. In part I we present a survey and classification of existing point access methods. Then we carefully select the following four methods for implementation and performance comparison under seven different data files (distributions) and various types of queries: the 2-level grid file, the BANG file, the hB-tree and a new scheme, called the BUDDY hash tree. We were surprised to see one method to be the clear winner which was the BUDDY hash tree. It exhibits an at least 20 % better average performance than its competitors and is robust under ugly data and queries. In part I I we compare spatial access methods for rectangles. After presenting a survey and classification of existing spatial access methods we carefully selected the following four methods for implementation and performance comparison under six different data files (distributions) and various types of queries: the R-tree, the BANG file, PLOP hashing and the BUDDY hash tree. The result presented two winners: the BANG file and the BUDDY hash tree. This comparison is a first step towards a standardized testbed or benchmark. We offer our data and query files to each designer of a new point or spatial access method such that he can run his implementation in our testbed

    A Survey on Spatial Indexing

    Get PDF
    Spatial information processing has been a centre of attention of research in the previous decade. In spatial databases, data related with spatial coordinates and extents are retrieved based on spatial proximity. A large number of spatial indexes have been proposed to make ease of efficient indexing of spatial objects in large databases and spatial data retrieval. The goal of this paper is to review the advance techniques of the access methods. This paper tries to classify the existing multidimensional access methods, according to the types of indexing, and their performance over spatial queries. K-d trees out performs quad tress without requiring additional memory usage

    Extending functional databases for use in text-intensive applications

    Get PDF
    This thesis continues research exploring the benefits of using functional databases based around the functional data model for advanced database applications-particularly those supporting investigative systems. This is a growing generic application domain covering areas such as criminal and military intelligence, which are characterised by significant data complexity, large data sets and the need for high performance, interactive use. An experimental functional database language was developed to provide the requisite semantic richness. However, heavy use in a practical context has shown that language extensions and implementation improvements are required-especially in the crucial areas of string matching and graph traversal. In addition, an implementation on multiprocessor, parallel architectures is essential to meet the performance needs arising from existing and projected database sizes in the chosen application area. [Continues.

    Portable High-Performance Indexing for Vector Product Format Spatial Databases

    Get PDF
    Geo-spatial databases have an overall performance problem because of their complexity and large size. For this reason, many researchers seek new ways to improve the overall performance of geo-spatial databases. Typically, these research efforts are focused on complex indexing structures and query processing methods to capture the relationships between the individual features of fully-functional geo-spatial databases. Visualization applications, such as combat simulators and mission planning tools, suffer from the general performance problems associated with geo-spatial databases. This research focuses on building a high-performance geo-spatial database for visualization applications. The main approach is to simplify the complex data model and to index it with high-performance indexing structures. Complex features are reduced to simple primitives, then indexed using a combination of a disk-based array and B(+)-Trees. Test results show that there is a significant performance improvement gained by the new data model and indexing schema for low to medium zoom levels. For high zoom levels, there is a performance drop due to the indexing schema\u27s overhead

    Implementation of a spatial database system

    Get PDF
    Computer Scienc

    Concurrent Image Processing Executive (CIPE)

    Get PDF
    The design and implementation of a Concurrent Image Processing Executive (CIPE), which is intended to become the support system software for a prototype high performance science analysis workstation are discussed. The target machine for this software is a JPL/Caltech Mark IIIfp Hypercube hosted by either a MASSCOMP 5600 or a Sun-3, Sun-4 workstation; however, the design will accommodate other concurrent machines of similar architecture, i.e., local memory, multiple-instruction-multiple-data (MIMD) machines. The CIPE system provides both a multimode user interface and an applications programmer interface, and has been designed around four loosely coupled modules; (1) user interface, (2) host-resident executive, (3) hypercube-resident executive, and (4) application functions. The loose coupling between modules allows modification of a particular module without significantly affecting the other modules in the system. In order to enhance hypercube memory utilization and to allow expansion of image processing capabilities, a specialized program management method, incremental loading, was devised. To minimize data transfer between host and hypercube a data management method which distributes, redistributes, and tracks data set information was implemented

    Stochastic Tree Models for Macroevolution: Development, Validation and Application

    Get PDF
    Phylogenetic trees capture the relationships between species and can be investigated by morphological and/or molecular data. When focusing on macroevolution, one considers the large-scale history of life with evolutionary changes affecting a single species of the entire clade leading to the enormous diversity of species obtained today. One major problem of biology is the explanation of this biodiversity. Therefore, one may ask which kind of macroevolutionary processes have given rise to observable tree shapes or patterns of species distribution which refers to the appearance of branching orders and time periods. Thus, with an increasing number of known species in the context of phylogenetic studies, testing hypotheses about evolution by analyzing the tree shape of the resulting phylogenetic trees became matter of particular interest. The attention of using those reconstructed phylogenies for studying evolutionary processes increased during the last decades. Many paleontologists (Raup et al., 1973; Gould et al., 1977; Gilinsky and Good, 1989; Nee, 2004) tried to describe such patterns of macroevolution by using models for growing trees. Those models describe stochastic processes to generate phylogenetic trees. Yule (1925) was the first who introduced such a model, the Equal Rate Markov (ERM) model, in the context of biological branching based on a continuous-time, uneven branching process. In the last decades, further dynamical models were proposed (Yule, 1925; Aldous, 1996; Nee, 2006; Rosen, 1978; Ford, 2005; Hernández-García et al., 2010) to address the investigation of tree shapes and hence, capture the rules of macroevolutionary forces. A common model, is the Aldous\\\'' Branching (AB) model, which is known for generating trees with a similar structure of \\\"real\\\" trees. To infer those macroevolutionary forces structures, estimated trees are analyzed and compared to simulated trees generated by models. There are a few drawbacks on recent models such as a missing biological motivation or the generated tree shape does not fit well to one observed in empirical trees. The central aim of this thesis is the development and study of new biologically motivated approaches which might help to better understand or even discover biological forces which lead to the huge diversity of organisms. The first approach, called age model, can be defined as a stochastic procedure which describes the growth of binary trees by an iterative stochastic attachment of leaves, similar to the ERM model. At difference with the latter, the branching rate at each clade is no longer constant, but decreasing in time, i.e., with the age. Thus, species involved in recent speciation events have a tendency to speciate again. The second introduced model, is a branching process which mimics the evolution of species driven by innovations. The process involves a separation of time scales. Rare innovation events trigger rapid cascades of diversification where a feature combines with previously existing features. The model is called innovation model. Three data sets of estimated phylogenetic trees are used to analyze and compare the produced tree shape of the new growth models. A tree shape statistic considering a variety of imbalance measurements is performed. Results show that simulated trees of both growth models fit well to the tree shape observed in real trees. In a further study, a likelihood analysis is performed in order to rank models with respect to their ability to explain observed tree shapes. Results show that the likelihoods of the age model and the AB model are clearly correlated under the trees in the databases when considering small and medium-sized trees with up to 19 leaves. For a data set, representing of phylogenetic trees of protein families, the age model outperforms the AB model. But for another data set, representing phylogenetic trees of species, the AB model performs slightly better. To support this observation a further analysis using larger trees is necessary. But an exact computation of likelihoods for large trees implies a huge computational effort. Therefore, an efficient method for likelihood estimation is proposed and compared to the estimation using a naive sampling strategy. Nevertheless, both models describe the tree generation process in a way which is easy to interpret biologically. Another interesting field of research in biology is the coevolution between species. This is the interaction of species across groups such that the evolution of a species from one group can be triggered by a species from another group. Most prominent examples are systems of host species and their associated parasites. One problem is the reconciliation of the common history of both groups of species and to predict the associations between ancestral hosts and their parasites. To solve this problem some algorithmic methods have been developed in recent years. But only a few host parasite systems have been analyzed in sufficient detail which makes an evaluation of these methods complex. Within the scope of coevolution, the proposed age model is applied to the generation of cophylogenies to evaluate such host parasite reconciliation methods. The presented age model as well as the innovation model produce tree shapes which are similar to obtained tree structures of estimated trees. Both models describe an evolutionary dynamics and might provide a further opportunity to infer macroevolutionary processes which lead to the biodiversity which can be obtained today. Furthermore with the application of the age model in the context of coevolution by generating a useful benchmark set of cophylogenies is a first step towards systematic studies on evaluating reconciliation methods

    Machine learning for particle identification & deep generative models towards fast simulations for the Alice Transition Radiation Detector at CERN

    Get PDF
    This Masters thesis outlines the application of machine learning techniques, predominantly deep learning techniques, towards certain aspects of particle physics. Its two main aims: particle identification and high energy physics detector simulations are pertinent to research avenues pursued by physicists working with the ALICE (A Large Ion Collider Experiment) Transition Radiation Detector (TRD), within the Large Hadron Collider (LHC) at CERN (The European Organization for Nuclear Research)

    Geometric Issues in Spatial Indexing

    Get PDF
    We address a number of geometric issues in spatial indexes. One area of interest is spherical data. Two main examples are the locations of stars in the sky and geodesic data. The first part of this dissertation addresses some of the challenges in handling spherical data with a spatial database. We show that a practical approach for integrating spherical data in a conventional spatial database is to use a suitable mapping from the unit sphere to a rectangle. This allows us to easily use conventional two-dimensional spatial data structures on spherical data. We further describe algorithms for handling spherical data. In the second part of the dissertation, we introduce the areal projection, a novel projection which is computationally efficient and has low distortion. We show that the areal projection can be utilized for developing an efficient method for low distortion quantization of unit normal vectors. This is helpful for compact storage of spherical data and has applications in computer graphics. We introduce the QuickArealHex algorithm, a fast algorithm for quantization of surface normal vectors with very low distortion. The third part of the dissertation deals with a CPU time analysis of TGS, an R-tree bulkloading algorithm. And finally, the fourth part of the dissertation analyzes the BV-tree, a data structure for storing multi-dimensional data on secondary storage. Contrary to the popular belief, we show that the BV-tree is only applicable to binary space partitioning of the underlying data space
    • …
    corecore