A Data-driven, High-performance and Intelligent CyberInfrastructure to Advance Spatial Sciences
abstract: In the field of Geographic Information Science (GIScience), we have witnessed an unprecedented data deluge brought about by the rapid advancement of high-resolution data observing technologies. For example, with the advancement of Earth Observation (EO) technologies, a massive amount of EO data, including remote sensing data and other sensor observations about earthquakes, climate, oceans, hydrology, volcanoes, glaciers, etc., is being collected on a daily basis by a wide range of organizations. In addition to the observation data, human-generated data including microblogs, photos, consumption records, evaluations, unstructured webpages and other Volunteered Geographical Information (VGI) are incessantly generated and shared on the Internet.
Meanwhile, the emerging cyberinfrastructure rapidly increases our capacity for handling such massive data with regard to data collection and management, data integration and interoperability, data transmission and visualization, high-performance computing, etc. Cyberinfrastructure (CI) consists of computing systems, data storage systems, advanced instruments and data repositories, visualization environments, and people, all linked together by software and high-performance networks to improve research productivity and enable breakthroughs that are not otherwise possible.
The Geospatial CI (GCI, or CyberGIS), as the synthesis of CI and GIScience, has inherent advantages in enabling computationally intensive spatial analysis and modeling (SAM) and collaborative geospatial problem solving and decision making.
This dissertation is dedicated to addressing several critical issues and improving the performance of existing methodologies and systems in the field of CyberGIS. My dissertation will include three parts. The first part focuses on developing methodologies to help public researchers efficiently and effectively find appropriate open geo-spatial datasets from the millions of records provided by thousands of organizations scattered around the world; machine learning and semantic search methods will be utilized in this research. The second part develops an interoperable and replicable geoprocessing service by synthesizing the high-performance computing (HPC) environment, the core spatial statistics/analysis algorithms from the widely adopted open-source Python package PySAL (Python Spatial Analysis Library), and the rich datasets acquired in the first part. The third part is dedicated to studying optimization strategies for feature data transmission and visualization; this study is intended to solve the performance issues in transmitting large feature data over the Internet and visualizing it on the client (browser) side.
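As a flavor of the kind of spatial statistic such a PySAL-backed geoprocessing service would expose, here is a minimal numpy sketch of global Moran's I. This is an illustrative reimplementation of the textbook formula, not PySAL's API; the toy weight matrix and values are made up:

```python
import numpy as np

def morans_i(x, w):
    """Global Moran's I for values x and a spatial weight matrix w."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    n = x.size
    z = x - x.mean()                 # deviations from the mean
    s0 = w.sum()                     # total weight
    return (n / s0) * (z @ w @ z) / (z @ z)

# 4 locations on a line with rook-contiguity weights and a smooth trend
w = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
x = np.array([1.0, 2.0, 3.0, 4.0])
print(morans_i(x, w))                # positive: neighbors have similar values
```

A value near +1 indicates spatial clustering, near -1 a checkerboard pattern, and near 0 spatial randomness.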
Taken together, the three parts constitute an endeavor towards the methodological improvement and implementation practice of the data-driven, high-performance and intelligent CI to advance spatial sciences.
Doctoral Dissertation, Geography, 201
Acceleration of Computational Geometry Algorithms for High Performance Computing Based Geo-Spatial Big Data Analysis
Geo-spatial computing and data analysis is the branch of computer science that deals with real-world location-based data. Computational geometry algorithms process geometry and shapes, and they form one of the pillars of geo-spatial computing. Real-world map and location-based data can be huge, and the data structures used to process them can be extremely large, leading to huge computational costs. Furthermore, geo-spatial datasets are growing along all the V's (Volume, Variety, Value, etc.) and are becoming larger and more complex to process, in turn demanding more computational resources. High-performance computing breaks a problem down so that it can run in parallel on big computers with massive processing power, reducing the computing time and delivering the same results much faster.

This dissertation explores different techniques to accelerate computational geometry algorithms and geo-spatial computing, including many-core Graphics Processing Units (GPUs), multi-core Central Processing Units (CPUs), multi-node setups with the Message Passing Interface (MPI), cache optimizations, memory and communication optimizations, load balancing, algorithmic modifications, directive-based parallelization with OpenMP or OpenACC, and vectorization with compiler intrinsics (AVX). At least one of these techniques is applied to each of the following problems: a novel method to parallelize plane-sweep-based geometric intersection on GPUs with directives; parallelization of plane-sweep-based Voronoi construction; parallelization of segment tree construction, segment tree queries, and segment-tree-based operations; and spatial autocorrelation and the computation of Getis-Ord hotspots. Acceleration performance and speedup results are presented in each corresponding chapter.
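One of the listed problems, Getis-Ord hotspot computation, can be sketched compactly. The following is the standard textbook z-score form of the Gi* statistic in plain numpy, not the dissertation's accelerated implementation; the function name and the toy one-dimensional example are illustrative:

```python
import numpy as np

def getis_ord_gstar(x, w):
    """Getis-Ord Gi* z-scores for values x and a weight matrix w
    that includes the self-weight w[i, i] (the '*' variant)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    xbar = x.mean()
    s = np.sqrt((x ** 2).mean() - xbar ** 2)   # population std. dev.
    wx = w @ x                                  # sum_j w_ij * x_j
    wsum = w.sum(axis=1)                        # sum_j w_ij
    wsq = (w ** 2).sum(axis=1)                  # sum_j w_ij^2
    denom = s * np.sqrt((n * wsq - wsum ** 2) / (n - 1))
    return (wx - xbar * wsum) / denom

# 7 sites on a line; binary contiguity plus self-weight; a spike at site 3
w = np.eye(7)
for i in range(6):
    w[i, i + 1] = w[i + 1, i] = 1.0
x = np.array([0.0, 0, 0, 10, 0, 0, 0])
z = getis_ord_gstar(x, w)   # z[3] > 0: a hot spot around the spike
```

Because each row of the computation is independent, this statistic is a natural fit for the GPU and multi-core acceleration techniques the dissertation studies.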
The Second Competition on Spatial Statistics for Large Datasets
In the last few decades, the size of spatial and spatio-temporal datasets in
many research areas has rapidly increased with the development of data
collection technologies. As a result, classical statistical methods in spatial
statistics are facing computational challenges. For example, the kriging
predictor in geostatistics becomes prohibitive on traditional hardware
architectures for large datasets as it requires high computing power and memory
footprint when dealing with large dense matrix operations. Over the years,
various approximation methods have been proposed to address such computational
issues; however, the community lacks a holistic process to assess their
approximation efficiency. To provide a fair assessment, in 2021, we organized
the first competition on spatial statistics for large datasets, generated by
our ExaGeoStat software, and asked participants to report the results of
estimation and prediction. Thanks to its widely acknowledged success and at the
request of many participants, we organized the second competition in 2022
focusing on predictions for more complex spatial and spatio-temporal processes,
including univariate nonstationary spatial processes, univariate stationary
space-time processes, and bivariate stationary spatial processes. In this
paper, we describe in detail the data generation procedure and make the
valuable datasets publicly available for a wider adoption. Then, we review the
submitted methods from fourteen teams worldwide, analyze the competition
outcomes, and assess the performance of each team.
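The kriging bottleneck described above can be made concrete. Assuming a zero-mean field and an exponential covariance (both choices are ours, purely for illustration), simple kriging reduces to a dense n-by-n linear solve, which is the cubic-cost operation that becomes prohibitive for large n:

```python
import numpy as np

def simple_kriging_predict(X, y, x0, length_scale=1.0):
    """Simple kriging sketch with an exponential covariance:
    solve C a = c0, then predict a @ y. The dense n x n solve
    is the O(n^3) bottleneck on traditional hardware."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    C = np.exp(-d / length_scale)          # n x n dense covariance matrix
    c0 = np.exp(-np.linalg.norm(X - x0, axis=1) / length_scale)
    weights = np.linalg.solve(C, c0)       # cubic cost in n
    return weights @ y

X = np.array([[0.0], [1.0], [2.0]])        # observation locations
y = np.array([1.0, 2.0, 3.0])              # observed values
pred = simple_kriging_predict(X, y, np.array([1.0]))
```

Kriging interpolates exactly at observed locations, so predicting at X[1] returns y[1]; the approximation methods reviewed in the competition aim to avoid forming and factorizing the dense C.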
Hybrid Rendering: Enabling Interactivity in a Distributed Post-Processing Environment
The ever increasing compute capacity of high performance computing (HPC)
systems enables scientists to simulate and explore physical phenomena with an
enormous spatial and temporal accuracy. On the other hand, this accuracy leads to datasets of many terabytes, petabytes, and even exabytes, envisioning the upcoming exascale era projected for 2018. To understand the complex physical coherences behind such a simulation, efficient analysis and visualization are essential but also difficult, since the challenges concern all stages of the visualization pipeline. With this presentation we set the focus on distributed and hybrid rendering.
Learning to Zoom and Unzoom
Many perception systems in mobile computing, autonomous navigation, and AR/VR
face strict compute constraints that are particularly challenging for
high-resolution input images. Previous works propose nonuniform downsamplers
that "learn to zoom" on salient image regions, reducing compute while retaining
task-relevant image information. However, for tasks with spatial labels (such
as 2D/3D object detection and semantic segmentation), such distortions may harm
performance. In this work (LZU), we "learn to zoom" in on the input image,
compute spatial features, and then "unzoom" to revert any deformations. To
enable efficient and differentiable unzooming, we approximate the zooming warp
with a piecewise bilinear mapping that is invertible. LZU can be applied to any
task with 2D spatial input and any model with 2D spatial features, and we
demonstrate this versatility by evaluating on a variety of tasks and datasets:
object detection on Argoverse-HD, semantic segmentation on Cityscapes, and
monocular 3D object detection on nuScenes. Interestingly, we observe boosts in
performance even when high-resolution sensor data is unavailable, implying that
LZU can be used to "learn to upsample" as well.
Comment: CVPR 2023. Code and additional visuals available at https://tchittesh.github.io/lzu
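The invertibility claim can be illustrated in one dimension. LZU itself uses a 2D piecewise bilinear warp; the following is only a 1D piecewise-linear analogue with made-up knot values, where monotonic knots are what make the forward "zoom" exactly invertible:

```python
import numpy as np

def make_warp(knots_in, knots_out):
    """A monotonic piecewise-linear warp and its exact inverse.
    Monotonic knots guarantee invertibility, mirroring (in 1D)
    LZU's invertible piecewise bilinear zoom."""
    fwd = lambda t: np.interp(t, knots_in, knots_out)
    inv = lambda t: np.interp(t, knots_out, knots_in)
    return fwd, inv

# devote more output resolution to the middle of [0, 1] (a "salient region")
fwd, inv = make_warp([0.0, 0.4, 0.6, 1.0], [0.0, 0.2, 0.8, 1.0])
t = np.linspace(0.0, 1.0, 11)
assert np.allclose(inv(fwd(t)), t)   # "unzoom" exactly reverts the "zoom"
```

The same closed-form inverse is what lets LZU revert the deformation of spatial features before computing losses against undistorted 2D/3D labels.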
Highly Scalable Bayesian Geostatistical Modeling via Meshed Gaussian Processes on Partitioned Domains
We introduce a class of scalable Bayesian hierarchical models for the
analysis of massive geostatistical datasets. The underlying idea combines ideas from high-dimensional geostatistics: partitioning the spatial domain and
modeling the regions in the partition using a sparsity-inducing directed
acyclic graph (DAG). We extend the model over the DAG to a well-defined spatial
process, which we call the Meshed Gaussian Process (MGP). A major contribution
is the development of MGPs on tessellated domains, accompanied by a Gibbs
sampler for the efficient recovery of spatial random effects. In particular,
the cubic MGP (Q-MGP) can harness high-performance computing resources by
executing all large-scale operations in parallel within the Gibbs sampler,
improving mixing and computing time compared to sequential updating schemes.
Unlike some existing models for large spatial data, a Q-MGP facilitates massive
caching of expensive matrix operations, making it particularly apt in dealing
with spatiotemporal remote-sensing data. We compare Q-MGPs against
state-of-the-art methods on large synthetic and real-world data. We also illustrate using
Normalized Difference Vegetation Index (NDVI) data from the Serengeti park
region to recover latent multivariate spatiotemporal random effects at millions
of locations. The source code is available at https://github.com/mkln/meshgp
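The authors' full implementation lives at the GitHub link above; as a toy illustration of the tessellation step only, one can bin locations into a regular grid of blocks, producing the region memberships that a DAG-based model such as an MGP would then connect (all names here are ours):

```python
import numpy as np

def tile_partition(coords, nx, ny):
    """Assign each 2D location to a cell of a regular nx-by-ny tessellation.
    Returns a flat block id per location. Illustrative partitioner only,
    not the MGP reference code."""
    mins, maxs = coords.min(axis=0), coords.max(axis=0)
    span = np.maximum(maxs - mins, 1e-12)       # avoid division by zero
    ix = np.minimum((nx * (coords[:, 0] - mins[0]) / span[0]).astype(int), nx - 1)
    iy = np.minimum((ny * (coords[:, 1] - mins[1]) / span[1]).astype(int), ny - 1)
    return ix * ny + iy

pts = np.array([[0.0, 0.0], [1.0, 1.0], [0.1, 0.9]])
print(tile_partition(pts, 2, 2))   # [0 3 1]
```

Because the blocks are disjoint, per-block operations (covariance factorizations, random-effect updates) can run in parallel, which is the property the Q-MGP Gibbs sampler exploits.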
MPI-Vector-IO: Parallel I/O and Partitioning for Geospatial Vector Data
In recent times, geospatial datasets are growing in size, complexity, and heterogeneity. High-performance systems are needed to analyze such data efficiently and produce actionable insights. For polygonal (a.k.a. vector) datasets, operations such as I/O, data partitioning, communication, and load balancing become challenging in a cluster environment. In this work, we present MPI-Vector-IO, a parallel I/O library that we have designed using MPI-IO specifically for partitioning and reading irregular vector data formats such as Well Known Text. It makes MPI aware of spatial data and spatial primitives, and it provides support for spatial data types embedded within collective computation and communication using the MPI message-passing library. These abstractions, along with parallel I/O support, are useful for parallel Geographic Information System (GIS) application development on HPC platforms.
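The core difficulty with irregular text formats like Well Known Text is that fixed byte-range partitions cut records mid-line. A pure-Python sketch of the boundary-adjustment idea follows; it is our simplification, not MPI-Vector-IO's actual code, which performs this with MPI-IO collective reads on HPC file systems:

```python
def split_by_lines(data: bytes, nranks: int):
    """Split a byte buffer into nranks chunks, shifting each chunk
    boundary forward to the next newline so every chunk holds only
    whole newline-terminated records (e.g. WKT geometries)."""
    size = len(data)
    cuts = [0]
    for r in range(1, nranks):
        pos = r * size // nranks          # naive even split point
        nl = data.find(b"\n", pos)        # advance to end of record
        cuts.append(size if nl == -1 else nl + 1)
    cuts.append(size)
    return [data[cuts[r]:cuts[r + 1]] for r in range(nranks)]

wkt = b"POINT(0 0)\nPOINT(1 1)\nPOINT(2 2)\n"
chunks = split_by_lines(wkt, 2)   # each rank gets complete records
```

In a real MPI setting each rank would read only its own byte range and apply the same adjustment locally, avoiding a single-reader bottleneck.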