Performance comparison of point and spatial access methods
In the past few years a large number of multidimensional point access methods, also called
multiattribute index structures, have been suggested, all of them claiming good performance. Since no
performance comparison of these structures under arbitrary (strongly correlated, nonuniform, in short
"ugly") data distributions and under various types of queries has been performed, database
researchers and designers were hesitant to use any of these new point access methods. As shown in
a recent paper, such point access methods are not only important in traditional database applications.
In new applications such as CAD/CIM and geographic or environmental information systems, access
methods for spatial objects are needed. As recently shown, such access methods are based on point
access methods in terms of functionality and performance. Our performance comparison naturally
consists of two parts. In part I we will compare multidimensional point access methods, whereas in
part II spatial access methods for rectangles will be compared. In part I we present a survey and
classification of existing point access methods. Then we carefully select the following four methods
for implementation and performance comparison under seven different data files (distributions) and
various types of queries: the 2-level grid file, the BANG file, the hB-tree and a new scheme, called
the BUDDY hash tree. We were surprised to see one method emerge as the clear winner: the
BUDDY hash tree. Its average performance is at least 20% better than that of its competitors, and it is
robust under ugly data and queries. In part II we compare spatial access methods for rectangles.
After presenting a survey and classification of existing spatial access methods we carefully selected
the following four methods for implementation and performance comparison under six different data
files (distributions) and various types of queries: the R-tree, the BANG file, PLOP hashing and the
BUDDY hash tree. The results showed two winners: the BANG file and the BUDDY hash tree.
This comparison is a first step towards a standardized testbed or benchmark. We offer our data and
query files to each designer of a new point or spatial access method so that they can run their
implementation in our testbed.
Network dependence in multi-indexed data on international trade flows
Faced with the problem that conventional multidimensional fixed effects models only focus on unobserved heterogeneity, but ignore any potential cross-sectional dependence due to network interactions, we introduce a model of trade flows between countries over time that allows for network dependence in flows, based on sociocultural connectivity structures. We show that conventional multidimensional fixed effects model specifications exhibit cross-sectional dependence between countries that should be modeled to avoid simultaneity bias. Given that the source of network interaction is unknown, we propose a panel gravity model that examines multiple network interaction structures, using Bayesian model probabilities to determine those most consistent with the sample data. This is accomplished with the use of computationally efficient Markov Chain Monte Carlo estimation methods that produce a Monte Carlo integration estimate of the log-marginal likelihood that can be used for model comparison. Application of the model to a panel of trade flows points to network spillover effects, suggesting the presence of network dependence and biased estimates from conventional trade flow specifications. The most important sources of network dependence were found to be membership in trade organizations, historical colonial ties, common currency, and spatial proximity of countries. Series: Working Papers in Regional Scienc
Mixed Tree and Spatial Representation of Dissimilarity Judgments
Whereas previous research has shown that either tree or spatial representations of dissimilarity judgments may be appropriate, focussing on the comparative fit at the aggregate level, we investigate whether there is heterogeneity among subjects in the extent to which their dissimilarity judgments are better represented by ultrametric tree or spatial multidimensional scaling models. We develop a mixture model for the analysis of dissimilarity data that is formulated in a stochastic context and entails a representation and a measurement model component. The latter involves distributional assumptions on the measurement error, and enables estimation by maximum likelihood. The representation component allows dissimilarity judgments to be represented either by a tree structure or by a spatial configuration, or a mixture of both. In order to investigate the appropriateness of tree versus spatial representations, the model is applied to twenty empirical data sets. We compare the fit of our model with that of aggregate tree and spatial models, as well as with mixtures of pure trees and mixtures of pure spaces, respectively. We formulate some empirical generalizations on the relative importance of tree versus spatial structures in representing dissimilarity judgments at the individual level. Keywords: multidimensional scaling; tree models; mixture models; dissimilarity judgments
Data Management and Mining in Astrophysical Databases
We analyse the issues involved in the management and mining of astrophysical
data. The traditional approach to data management in the astrophysical field is
not able to keep up with the increasing size of the data gathered by modern
detectors. An essential role in astrophysical research will be assumed by
automatic tools for information extraction from large datasets, i.e. data
mining techniques, such as clustering and classification algorithms. This calls
for an approach to data management based on data warehousing, emphasizing the
efficiency and simplicity of data access; efficiency is obtained using
multidimensional access methods and simplicity is achieved by properly handling
metadata. Clustering and classification techniques, on large datasets, pose
additional requirements: computational and memory scalability with respect to
the data size, interpretability and objectivity of clustering or classification
results. In this study we address some possible solutions. Comment: 10 pages, Late