    Performance comparison of point and spatial access methods

    In the past few years a large number of multidimensional point access methods, also called multiattribute index structures, have been suggested, all of them claiming good performance. Since no performance comparison of these structures under arbitrary (strongly correlated, nonuniform; in short, "ugly") data distributions and under various types of queries has been performed, database researchers and designers were hesitant to use any of these new point access methods. As shown in a recent paper, such point access methods are not only important in traditional database applications. In new applications such as CAD/CIM and geographic or environmental information systems, access methods for spatial objects are needed. As recently shown, such access methods are based on point access methods in terms of functionality and performance. Our performance comparison naturally consists of two parts. In part I we will compare multidimensional point access methods, whereas in part II spatial access methods for rectangles will be compared. In part I we present a survey and classification of existing point access methods. Then we carefully select the following four methods for implementation and performance comparison under seven different data files (distributions) and various types of queries: the 2-level grid file, the BANG file, the hB-tree and a new scheme, called the BUDDY hash tree. We were surprised to see one method emerge as the clear winner: the BUDDY hash tree. Its average performance is at least 20% better than that of its competitors, and it is robust under ugly data and queries. In part II we compare spatial access methods for rectangles. After presenting a survey and classification of existing spatial access methods, we carefully selected the following four methods for implementation and performance comparison under six different data files (distributions) and various types of queries: the R-tree, the BANG file, PLOP hashing and the BUDDY hash tree. The result presented two winners: the BANG file and the BUDDY hash tree. This comparison is a first step towards a standardized testbed or benchmark. We offer our data and query files to each designer of a new point or spatial access method so that they can run their implementation in our testbed.

    Network dependence in multi-indexed data on international trade flows

    Faced with the problem that conventional multidimensional fixed effects models only focus on unobserved heterogeneity, but ignore any potential cross-sectional dependence due to network interactions, we introduce a model of trade flows between countries over time that allows for network dependence in flows, based on sociocultural connectivity structures. We show that conventional multidimensional fixed effects model specifications exhibit cross-sectional dependence between countries that should be modeled to avoid simultaneity bias. Given that the source of network interaction is unknown, we propose a panel gravity model that examines multiple network interaction structures, using Bayesian model probabilities to determine those most consistent with the sample data. This is accomplished with the use of computationally efficient Markov Chain Monte Carlo estimation methods that produce a Monte Carlo integration estimate of the log-marginal likelihood that can be used for model comparison. Application of the model to a panel of trade flows points to network spillover effects, suggesting the presence of network dependence and biased estimates from conventional trade flow specifications. The most important sources of network dependence were found to be membership in trade organizations, historical colonial ties, common currency, and spatial proximity of countries. Series: Working Papers in Regional Scienc

    Mixed Tree and Spatial Representation of Dissimilarity Judgments

    Whereas previous research has shown that either tree or spatial representations of dissimilarity judgments may be appropriate, focussing on the comparative fit at the aggregate level, we investigate whether there is heterogeneity among subjects in the extent to which their dissimilarity judgments are better represented by ultrametric tree or spatial multidimensional scaling models. We develop a mixture model for the analysis of dissimilarity data that is formulated in a stochastic context and entails a representation and a measurement model component. The latter involves distributional assumptions on the measurement error, and enables estimation by maximum likelihood. The representation component allows dissimilarity judgments to be represented either by a tree structure or by a spatial configuration, or a mixture of both. In order to investigate the appropriateness of tree versus spatial representations, the model is applied to twenty empirical data sets. We compare the fit of our model with that of aggregate tree and spatial models, as well as with mixtures of pure trees and mixtures of pure spaces, respectively. We formulate some empirical generalizations on the relative importance of tree versus spatial structures in representing dissimilarity judgments at the individual level. Keywords: multidimensional scaling; tree models; mixture models; dissimilarity judgments

    Data Management and Mining in Astrophysical Databases

    We analyse the issues involved in the management and mining of astrophysical data. The traditional approach to data management in the astrophysical field is not able to keep up with the increasing size of the data gathered by modern detectors. An essential role in astrophysical research will be assumed by automatic tools for information extraction from large datasets, i.e. data mining techniques, such as clustering and classification algorithms. This calls for an approach to data management based on data warehousing, emphasizing the efficiency and simplicity of data access; efficiency is obtained using multidimensional access methods and simplicity is achieved by properly handling metadata. Clustering and classification techniques, on large datasets, pose additional requirements: computational and memory scalability with respect to the data size, and interpretability and objectivity of clustering or classification results. In this study we address some possible solutions. Comment: 10 pages, Late