279 research outputs found
Performance comparison of point and spatial access methods
In the past few years a large number of multidimensional point access methods, also called
multiattribute index structures, has been suggested, all of them claiming good performance. Since no
performance comparison of these structures under arbitrary (strongly correlated, nonuniform, in short
"ugly") data distributions and under various types of queries has been performed, database
researchers and designers were hesitant to use any of these new point access methods. As shown in
a recent paper, such point access methods are not only important in traditional database applications.
In new applications such as CAD/CIM and geographic or environmental information systems, access
methods for spatial objects are needed. As recently shown, such access methods are based on point
access methods in terms of functionality and performance. Our performance comparison naturally
consists of two parts. In part I we will compare multidimensional point access methods, whereas in
part II spatial access methods for rectangles will be compared. In part I we present a survey and
classification of existing point access methods. Then we carefully select the following four methods
for implementation and performance comparison under seven different data files (distributions) and
various types of queries: the 2-level grid file, the BANG file, the hB-tree and a new scheme, called
the BUDDY hash tree. We were surprised to find one clear winner, the BUDDY hash tree.
Its average performance is at least 20% better than that of its competitors, and it is
robust under ugly data and queries. In part II we compare spatial access methods for rectangles.
After presenting a survey and classification of existing spatial access methods we carefully selected
the following four methods for implementation and performance comparison under six different data
files (distributions) and various types of queries: the R-tree, the BANG file, PLOP hashing and the
BUDDY hash tree. The results showed two winners: the BANG file and the BUDDY hash tree.
This comparison is a first step towards a standardized testbed or benchmark. We offer our data and
query files to every designer of a new point or spatial access method so that they can run their
implementation in our testbed.
Multi-Dimensional Joins
We present three novel algorithms for performing multi-dimensional
joins and an in-depth survey and analysis of a low-dimensional
spatial join. The first algorithm, the Iterative Spatial Join,
performs a spatial join on low-dimensional data and is based
on a plane-sweep technique.
As we show analytically and experimentally,
the Iterative Spatial Join performs well when internal memory is
limited, compared to competing methods. This suggests that
the Iterative Spatial Join would be useful for very large data sets
or in situations where internal memory is a shared resource and
is therefore limited, such as with today's database engines which
share internal memory amongst several queries. Furthermore, the
performance of the Iterative Spatial Join is predictable and has
no parameters which need to be tuned, unlike other algorithms.
The second algorithm, the Quickjoin algorithm,
performs a higher-dimensional
similarity join in which pairs of objects that lie within a
certain distance epsilon of each other are reported.
The Quickjoin algorithm overcomes drawbacks of competing methods,
such as requiring embedding methods on the data first or using
multi-dimensional indices, which limit
the ability to discriminate between objects in each
dimension, thereby degrading performance.
A formal analysis is provided of the Quickjoin method, and
experiments show that the Quickjoin method significantly outperforms
competing methods.
The third algorithm adapts
incremental join techniques to improve the
speed of calculating the Hausdorff distance, which
is used in applications such as image matching, image analysis,
and surface approximations.
The nearest neighbor incremental join technique for indices that
are based on hierarchical containment uses a priority queue
of index node pairs and bounds on the distance values between
pairs, both of which need to be modified in order to calculate the
Hausdorff distance. Results of experiments are described that
confirm the performance improvement.
Finally, a survey is provided which,
instead of just summarizing the literature and presenting each
technique in its entirety, describes the distinct components of
the different techniques and decomposes each technique into
an overall framework for performing a spatial join.
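To make the two operations named in this abstract concrete, the following is a minimal brute-force sketch of an epsilon-similarity join and of the Hausdorff distance. It is a naive quadratic baseline for illustration only, not the Quickjoin algorithm or the incremental join technique described above; the function names and the choice of Euclidean distance are assumptions.

```python
import math
from itertools import combinations


def similarity_join(points, eps):
    """Report index pairs (i, j), i < j, whose Euclidean distance is <= eps.

    Naive O(n^2) baseline (assumption: Euclidean metric); NOT Quickjoin.
    """
    return [
        (i, j)
        for (i, p), (j, q) in combinations(enumerate(points), 2)
        if math.dist(p, q) <= eps
    ]


def directed_hausdorff(a, b):
    """Largest distance from any point of a to its nearest point in b."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two non-empty point sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
    print(similarity_join(pts, eps=1.5))
    print(hausdorff([(0.0, 0.0), (1.0, 0.0)], [(0.0, 0.0), (4.0, 0.0)]))
```

Both functions compare every pair of points, which is the cost that index-based or partition-based join methods aim to avoid on large data sets.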
Learning from biophysical heterogeneity: inductive use of case studies for maize cropping systems in Central America
Global society has become conscious that efforts towards securing food production will only be successful if agricultural production increases are obtained through mechanisms that ensure active regeneration of the natural resource base. Production options should be targeted, in the sense that their suitability to improve agricultural production and maintain natural resources is evaluated prior to their introduction. Biophysical targeting evaluates production options as a function of the spatial and temporal variability of climate conditions, in interaction with soil, crop characteristics and agronomic management strategies. This thesis contributes to the development of a system-based methodology for biophysical targeting. Cropping system simulation and weather generator tools are interfaced to geographical information systems. Inductive use of two case studies - a green manure cover crop and reduced tillage with residue management - helped to develop the methodology. Insight is gained into the regional potential for these production options and into the soil and climate conditions under which they may be introduced successfully. The resulting information supports regional stakeholders involved in agriculture in their analysis, discussion, negotiation and decision-making concerning where to implement production systems. This process can improve the supply of appropriate agricultural production practices that enhance production and conserve soil and water resources.
An Evolutionary Approach to Adaptive Image Analysis for Retrieving and Long-term Monitoring Historical Land Use from Spatiotemporally Heterogeneous Map Sources
Land use changes have become a major contributor to anthropogenic global change. The ongoing dispersion and concentration of the human species, unprecedented in their magnitude, have indisputably altered Earth's surface and atmosphere. The effects are so salient and irreversible that a new geological epoch, following the interglacial Holocene, has been announced: the Anthropocene. While some scholars date its onset back to the Neolithic revolution, it is commonly dated to the late 18th century. The rapid development since the industrial revolution and its implications gave rise to an increasing awareness of the extensive anthropogenic land change and led to an urgent need for sustainable strategies for land use and land management. By preserving landscape and settlement patterns at discrete points in time, archival geospatial data sources such as remote sensing imagery and, in particular, historical geotopographic maps could give evidence of the dynamic land use change during this crucial period.
In this context, this thesis set out to explore the potential of retrospective geoinformation for monitoring, communicating, modeling and eventually understanding the complex and gradually evolving processes of land cover and land use change. Currently, large amounts of geospatial data sources such as archival maps are being made accessible online worldwide by libraries and national mapping agencies. Despite their abundance and relevance, the use of historical land use and land cover information in research is still often hindered by laborious visual interpretation, which limits the temporal and spatial coverage of studies. Thus, the core of the thesis is dedicated to the computational acquisition of geoinformation from archival map sources by means of digital image analysis. Based on a comprehensive review of the literature as well as the data and proposed algorithms, two major challenges for long-term retrospective information acquisition and change detection were identified: first, the diversity of geographical entity representations over space and time, and second, the uncertainty inherent in both the data source itself and its utilization for land change detection.
To address the first challenge, image segmentation is treated as a global non-linear optimization problem. The segmentation methods and parameters are adjusted using a metaheuristic, evolutionary approach. To preserve adaptability in high-level image analysis, a hybrid model- and data-driven strategy, combining a knowledge-based and a neural net classifier, is recommended. To address the second challenge, a probabilistic object- and field-based change detection approach is developed for modeling the positional, thematic, and temporal uncertainty inherent in both data and processing. Experimental results indicate the suitability of the methodology in support of land change monitoring. In conclusion, potential applications and directions for further research are given.
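The idea of tuning segmentation parameters with an evolutionary search can be illustrated with a deliberately small sketch. It assumes a single parameter (a grey-value threshold), a toy one-dimensional "image", and a reference mask to score against; the (mu + lambda) selection scheme, the function names, and all default values are hypothetical choices for illustration and are not taken from the thesis.

```python
import random


def segment(pixels, threshold):
    """Binary segmentation: 1 where the grey value reaches the threshold."""
    return [1 if value >= threshold else 0 for value in pixels]


def fitness(threshold, pixels, reference):
    """Fraction of pixels whose label agrees with the reference mask."""
    labels = segment(pixels, threshold)
    return sum(a == b for a, b in zip(labels, reference)) / len(reference)


def evolve_threshold(pixels, reference, generations=50, pop_size=10,
                     sigma=20.0, seed=0):
    """(mu + lambda) evolution strategy over one threshold parameter.

    Each parent yields one Gaussian-mutated offspring; the best pop_size
    candidates of parents plus offspring survive (elitist selection).
    """
    rng = random.Random(seed)
    population = [rng.uniform(0.0, 255.0) for _ in range(pop_size)]
    for _ in range(generations):
        offspring = [
            min(255.0, max(0.0, t + rng.gauss(0.0, sigma)))
            for t in population
        ]
        combined = population + offspring
        combined.sort(key=lambda t: fitness(t, pixels, reference),
                      reverse=True)
        population = combined[:pop_size]
    return population[0]


if __name__ == "__main__":
    pixels = [10, 20, 30, 200, 210, 220]
    reference = [0, 0, 0, 1, 1, 1]
    best = evolve_threshold(pixels, reference)
    print(best, fitness(best, pixels, reference))
```

A realistic setting would search over several segmentation methods and parameters at once and would need a fitness measure that does not rely on a complete reference mask.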
Harnessing the Power of Distributed Computing: Advancements in Scientific Applications, Homomorphic Encryption, and Federated Learning Security
The data explosion poses many challenges to state-of-the-art systems, applications, and methodologies. It has been reported that 181 zettabytes of data are expected to be generated in 2025, which is an increase of over 150% compared to the data expected to be generated in 2023. However, while system manufacturers are consistently developing devices with larger storage spaces and providing alternative storage capacities in the cloud at affordable rates, another key challenge is how to effectively process a fraction of this large-scale stored data in time-critical conventional systems. One transformative paradigm revolutionizing the processing and management of such large data is distributed computing, whose application requires deep understanding. This dissertation focuses on exploring the potential impact of applying efficient distributed computing concepts to long-standing challenges in (i) a widely used data-intensive scientific application, (ii) applying homomorphic encryption to data-intensive workloads found in outsourced databases, and (iii) the security of tokenized incentive mechanisms for federated learning (FL) systems. The first part of the dissertation tackles the microelectrode array (MEA) parameterization problem from an orthogonal viewpoint informed by algebraic topology, which allows us to algebraically parametrize MEAs whose structure and intrinsic parallelism are hard to identify otherwise. We implement a new paradigm, namely Parma, to demonstrate the effectiveness of the proposed approach and report how it outperforms the state of the practice in time, scalability, and memory usage. The second part discusses our work on introducing the concept of parallel caching of secure aggregation to mitigate the performance overhead incurred by the HE module in outsourced databases.
The key idea of this optimization approach is caching selected radix-ciphertexts in parallel without violating the existing security guarantees of the primitive/base HE scheme. A new radix HE algorithm was designed and applied to both batch and incremental HE schemes, and experiments carried out on six workloads show that the proposed caching speeds up state-of-the-art HE schemes by orders of magnitude. In the third part, I discuss our work on leveraging the security benefits of blockchains to protect the fairness and reliability of tokenized incentive mechanisms for FL systems. We designed a blockchain-based auditing protocol to mitigate Gaussian attacks and carried out experiments with multiple FL aggregation algorithms, popular data sets, and a variety of scales to validate its effectiveness.
Advanced Biometrics with Deep Learning
Biometrics, such as fingerprint, iris, face, hand print, hand vein, speech, and gait recognition, have become commonplace as a means of identity management in various applications. Biometric systems follow a typical pipeline composed of separate preprocessing, feature extraction, and classification stages. Deep learning, as a data-driven representation learning approach, has been shown to be a promising alternative to conventional data-agnostic and handcrafted preprocessing and feature extraction for biometric systems. Furthermore, deep learning offers an end-to-end learning paradigm to unify preprocessing, feature extraction, and recognition, based solely on biometric data. This Special Issue has collected 12 high-quality, state-of-the-art research papers that deal with challenging issues in advanced biometric systems based on deep learning. The 12 papers can be divided into 4 categories according to biometric modality, namely face biometrics, medical electronic signals (EEG and ECG), voice print, and others.
Formal extension of the relational model for the management of spatial and spatio-temporal data
[Abstract]
In recent years, a great research effort has been devoted to the manipulation of spatial data and to Geographic Information Systems (GIS).
A clear limitation of the first approaches is the lack of integration between geographic and alphanumeric data.
The area of Spatial Databases emerged to resolve this. The problems that arise in this field are many and complex. A first example is the peculiarities of spatial operations, such as computing the spatial intersection of two surfaces. Another example is choosing the appropriate data structures (relations, layers, etc.) and the appropriate set of operations. The combination with Temporal Databases gives rise to Spatio-temporal Databases, in which the inclusion of the temporal dimension further complicates the previous problems. Despite the large number of approaches proposed, a satisfactory solution has not yet been reached.
This thesis proposes a new solution that resolves all the spatial and spatio-temporal data modeling problems highlighted above.
Part of the work was completed during the project "CHOROCRONOS: A Research Network for Spatiotemporal Database Systems", funded by the European Union.
The model proposed in the thesis defines three data types, point, line and surface, which match human perception closely. The definition of these data types is based on the prior definition of Spatial Quanta.
The data structures used are the non-nested relations of the pure relational model. The set of relational operations achieves almost all of the functionality proposed in other models. All operations have been defined in terms of a small kernel of primitive operations. All data types, spatial, spatio-temporal and conventional, are manipulated uniformly with this set of operations.