Search CORE

191,214 research outputs found

Scalable big data systems: Architectures and optimizations

Author: Lee Kisung
Publication venue: Georgia Institute of Technology
Publication date: 22/08/2016

Big data analytics has become not just a popular buzzword but also a strategic direction in information technology for many enterprises and government organizations. Even though many new computing and storage systems have been developed for big data analytics, scalable big data processing has become more and more challenging as a result of the huge and rapidly growing size of real-world data. Dedicated to the development of architectures and optimization techniques for scaling big data processing systems, especially in the era of cloud computing, this dissertation makes three unique contributions. First, it introduces a suite of graph partitioning algorithms that can run much faster than existing data distribution methods and inherently scale to the growth of big data. The main idea of these approaches is to partition a big graph by preserving the core computational data structure as much as possible to maximize intra-server computation and minimize inter-server communication. In addition, it proposes a distributed iterative graph computation framework that effectively utilizes secondary storage to maximize access locality and speed up distributed iterative graph computations. The framework not only considerably reduces memory requirements for iterative graph algorithms but also significantly improves the performance of iterative graph computations. Last but not least, it establishes a suite of optimization techniques for scalable spatial data processing along three orthogonal dimensions: (i) scalable processing of spatial alarms for mobile users traveling on road networks, (ii) scalable location tagging for improving the quality of Twitter data analytics and prediction accuracy, and (iii) lightweight spatial indexing for enhancing the performance of big spatial data queries. Ph.D.
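The partitioning goal stated in this abstract (keep computation inside a server, keep communication between servers low) can be illustrated with a toy sketch. This is not the dissertation's algorithm: the modulo placement rule, the function names, and the assumption that vertex IDs are non-negative integers are all hypothetical choices made for illustration.

```python
from collections import defaultdict

def partition_by_source(edges, num_servers):
    """Assign each edge to the server that owns its source vertex.

    Keeping a vertex together with all of its outgoing edges is a toy
    stand-in for "preserving the core computational data structure".
    Assumes vertex IDs are non-negative integers (illustrative only).
    """
    parts = defaultdict(list)
    for src, dst in edges:
        parts[src % num_servers].append((src, dst))
    return parts

def intra_server_fraction(edges, num_servers):
    """Fraction of edges whose two endpoints live on the same server."""
    local = sum(1 for s, d in edges if s % num_servers == d % num_servers)
    return local / len(edges)

edges = [(0, 2), (2, 4), (4, 0), (1, 3), (3, 5), (0, 1)]
parts = partition_by_source(edges, 2)
frac = intra_server_fraction(edges, 2)
```

A higher intra-server fraction means fewer edges have to be followed across the network during a graph computation, which is the quantity a locality-aware partitioner tries to raise.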

Scholarly Materials And Research @ Georgia Tech

Graph Database Solution for Higher Order Spatial Statistics in the Era of Big Data

Author: Hoyle Ben
Kim Juhan
Li Xiao-Dong
Sabiu Cristiano G.
Publication venue: American Astronomical Society
Publication date: 02/01/2019

We present an algorithm for the fast computation of the general $N$-point spatial correlation functions of any discrete point set embedded within a Euclidean space $\mathbb{R}^n$. Utilizing the concepts of kd-trees and graph databases, we describe how to count all possible $N$-tuples in binned configurations within a given length scale, e.g. all pairs of points or all triplets of points with side lengths $< r_{max}$. Through benchmarking we show the computational advantage of our new graph-based algorithm over more traditional methods. We show that all 3-point configurations up to and beyond the Baryon Acoustic Oscillation scale ($\sim 200$ Mpc in physical units) can be performed on current SDSS data in reasonable time. Finally we present the first measurements of the 4-point correlation function of $\sim 0.5$ million SDSS galaxies over the redshift range $0.43 < z < 0.7$. Comment: 9 pages, 8 figures, submitte
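As a rough illustration of the simplest case mentioned in this abstract (pairs of points with separation below a maximum scale, counted in distance bins), the following is a minimal pure-Python sketch using a kd-tree. It covers only the kd-tree part for N = 2; the graph-database stage is not shown. All function names are hypothetical, and the points are assumed to be distinct tuples of floats.

```python
import math

def build_kdtree(points, depth=0):
    """Recursively build a kd-tree as nested tuples (point, axis, left, right)."""
    if not points:
        return None
    axis = depth % len(points[0])
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return (pts[mid], axis,
            build_kdtree(pts[:mid], depth + 1),
            build_kdtree(pts[mid + 1:], depth + 1))

def neighbours_within(node, query, r, out):
    """Append every stored point within distance r of query to out."""
    if node is None:
        return
    point, axis, left, right = node
    if math.dist(point, query) <= r:
        out.append(point)
    diff = query[axis] - point[axis]
    # Always descend into the side containing the query; cross the
    # splitting plane only if the ball of radius r reaches it.
    near, far = (left, right) if diff < 0 else (right, left)
    neighbours_within(near, query, r, out)
    if abs(diff) <= r:
        neighbours_within(far, query, r, out)

def binned_pair_counts(points, r_max, n_bins):
    """Count unordered pairs with separation <= r_max in equal-width bins."""
    tree = build_kdtree(points)
    counts = [0] * n_bins
    width = r_max / n_bins
    for p in points:
        found = []
        neighbours_within(tree, p, r_max, found)
        for q in found:
            if p < q:  # count each unordered pair once and skip p itself
                d = math.dist(p, q)
                counts[min(int(d / width), n_bins - 1)] += 1
    return counts

points = [(0.0, 0.0), (0.5, 0.0), (2.0, 0.0), (10.0, 10.0)]
counts = binned_pair_counts(points, r_max=3.0, n_bins=3)
```

The tree lets each query skip subtrees that lie entirely beyond the search radius, which is what makes neighbour finding cheaper than comparing every point with every other point.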

arXiv.org e-Print Archive

MPG.PuRe

SVS-JOIN : efficient spatial visual similarity join for geo-multimedia

Author: Huang Fang
Yu Hao
Yu Weiren
Zhang Chengyuan
Zhang Zuping
Zhu Lei
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 21/10/2019

In the big data era, massive amounts of geo-tagged multimedia data have been generated and collected by smart devices equipped with mobile communication and position sensor modules. This trend has placed higher demands on large-scale geo-multimedia retrieval. Spatial similarity join is one of the significant problems in the area of spatial databases. Previous works focused on the spatial textual document search problem rather than geo-multimedia retrieval. In this paper, we investigate a novel geo-multimedia retrieval paradigm named spatial visual similarity join (SVS-JOIN for short), which aims to search for similar geo-image pairs in terms of both geo-location and visual content. First, the definition of SVS-JOIN is proposed, and then we present the geographical similarity and visual similarity measurements. Inspired by the approach for textual similarity join, we develop an algorithm named SVS-JOIN B by combining the PPJOIN algorithm and visual similarity. In addition, an extension named SVS-JOIN G is developed, which utilizes a spatial grid strategy to improve the search efficiency. To further speed up the search, a novel approach called SVS-JOIN Q is carefully designed, in which a quadtree and a global inverted index are employed. Comprehensive experiments are conducted on two geo-image datasets, and the results demonstrate that our solution can address the SVS-JOIN problem effectively and efficiently.
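The spatial grid idea mentioned in this abstract can be sketched as follows. This is not the paper's SVS-JOIN G algorithm and it omits PPJOIN-style prefix filtering. It assumes planar coordinates, represents visual content as a set of hypothetical "visual word" IDs compared with Jaccard similarity, and uses made-up function names.

```python
from collections import defaultdict
from itertools import product
import math

def jaccard(a, b):
    """Jaccard similarity between two sets of (hypothetical) visual words."""
    return len(a & b) / len(a | b) if a or b else 1.0

def grid_similarity_join(images, max_dist, min_sim):
    """Return index pairs (i, j), i < j, that are close in space and similar.

    images: list of ((x, y), set_of_visual_words), planar coordinates assumed.
    """
    # Bucket images into square cells whose side equals the distance
    # threshold, so any qualifying partner lies in the same or an adjacent cell.
    cells = defaultdict(list)
    for idx, ((x, y), _) in enumerate(images):
        cells[(math.floor(x / max_dist), math.floor(y / max_dist))].append(idx)

    result = []
    for (cx, cy), members in cells.items():
        for dx, dy in product((-1, 0, 1), repeat=2):
            for i in members:
                for j in cells.get((cx + dx, cy + dy), ()):
                    if i < j:  # report each unordered pair once
                        (pi, wi), (pj, wj) = images[i], images[j]
                        if math.dist(pi, pj) <= max_dist and jaccard(wi, wj) >= min_sim:
                            result.append((i, j))
    return sorted(result)

images = [
    ((0.0, 0.0), {1, 2, 3}),
    ((0.5, 0.5), {2, 3, 4}),
    ((0.4, 0.0), {7, 8}),
    ((5.0, 5.0), {1, 2, 3}),
]
pairs = grid_similarity_join(images, max_dist=1.0, min_sim=0.5)
```

The grid prunes candidate pairs by location before the more expensive visual comparison runs, so distant images such as the fourth one are never compared with the first even though their visual words match.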

Warwick Research Archives Portal Repository

Scalable model selection for spatial additive mixed modeling: application to crime analysis

Author: Kajita Mami
Kajita Seiji
Murakami Daisuke
Publication date: 30/09/2020

A rapid growth in spatial open datasets has led to a huge demand for regression approaches accommodating spatial and non-spatial effects in big data. Regression model selection is particularly important to stably estimate flexible regression models. However, conventional methods can be slow for large samples. Hence, we develop a fast and practical model-selection approach for spatial regression models, focusing on the selection of coefficient types that include constant, spatially varying, and non-spatially varying coefficients. A pre-processing approach, which replaces data matrices with small inner products through dimension reduction, dramatically accelerates the computation speed of model selection. Numerical experiments show that our approach selects the model accurately and computationally efficiently, highlighting the importance of model selection in the spatial regression context. Then, the present approach is applied to open data to investigate local factors affecting crime in Japan. The results suggest that our approach is useful not only for selecting factors influencing crime risk but also for predicting crime events. This scalable model selection will be key to appropriately specifying flexible and large-scale spatial regression models in the era of big data. The developed model selection approach was implemented in the R package spmoran.
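The benefit of replacing data matrices with small inner products can be seen in the ordinary least-squares case, which is much simpler than the spatial mixed models this abstract refers to and is used here only as an assumed illustration. With an $N \times K$ design matrix $X$, response $y$, and a candidate subset $S$ of columns:

```latex
\begin{aligned}
\hat{\beta}_S &= \left(X^\top X\right)_{SS}^{-1} \left(X^\top y\right)_S, \\
\mathrm{RSS}_S &= y^\top y - \left(X^\top y\right)_S^\top \left(X^\top X\right)_{SS}^{-1} \left(X^\top y\right)_S .
\end{aligned}
```

Once $X^\top X$ ($K \times K$), $X^\top y$ ($K \times 1$) and $y^\top y$ have been computed in a single pass over the $N$ observations, the fit of every candidate subset can be evaluated from these small quantities alone, at a cost that no longer depends on $N$.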

arXiv.org e-Print Archive

Multidisciplinary Digital Publishing Institute

Understanding and Comparing Scalable Gaussian Process Regression for Big Data

Author: Cai Jianfei
Liu Haitao
Ong Yew-Soon
Wang Yi
Publication date: 01/01/2018

As a non-parametric Bayesian model which produces an informative predictive distribution, the Gaussian process (GP) has been widely used in various fields, such as regression, classification and optimization. The cubic complexity of the standard GP, however, leads to poor scalability, which poses challenges in the era of big data. Hence, various scalable GPs have been developed in the literature in order to improve the scalability while retaining desirable prediction accuracy. This paper is devoted to investigating the methodological characteristics and performance of representative global and local scalable GPs, including sparse approximations and local aggregations, from four main perspectives: scalability, capability, controllability and robustness. The numerical experiments on two toy examples and five real-world datasets with up to 250K points offer the following findings. In terms of scalability, most of the scalable GPs have a time complexity that is linear in the training size. In terms of capability, the sparse approximations capture the long-term spatial correlations, while the local aggregations capture the local patterns but suffer from over-fitting in some scenarios. In terms of controllability, we could improve the performance of sparse approximations by simply increasing the inducing size, but this is not the case for local aggregations. In terms of robustness, local aggregations are robust to various initializations of hyperparameters due to the local attention mechanism. Finally, we highlight that the proper hybrid of global and local scalable GPs may be a promising way to improve both the model capability and scalability for big data. Comment: 25 pages, 15 figures, preprint submitted to KB
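The cubic complexity mentioned in this abstract comes from the standard GP regression predictive equations, restated here in common textbook notation (the symbols are not taken from the paper). For $n$ training points with kernel matrix $K$, noise variance $\sigma^2$, targets $y$, and the vector $k_*$ of kernel values between a test input $x_*$ and the training inputs:

```latex
\begin{aligned}
\mu_* &= k_*^\top \left(K + \sigma^2 I\right)^{-1} y, \\
\sigma_*^2 &= k(x_*, x_*) - k_*^\top \left(K + \sigma^2 I\right)^{-1} k_* .
\end{aligned}
```

Factorizing the $n \times n$ matrix $K + \sigma^2 I$ costs $O(n^3)$ time. Sparse approximations built on $m \ll n$ inducing points typically reduce this to $O(n m^2)$, which is linear in the training size and consistent with the scalability finding reported in the abstract.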

arXiv.org e-Print Archive

DR-NTU (Digital Repository of NTU)

Studying the first galaxies with ALMA

Author: A. Beelen
A. Omont
A. Venkatesan
A. Wootten
C. Carilli
C. L. Carilli
C. Lintott
C.N. Hao
D. Downes
E. Schinnerer
F. Bertoldi
F. Walter
K. Gebhardt
K. Menten
L. Jiang
L.L. Cowie
M.S. Yun
P. Cox
R. Maiolino
R. Wang
R. White
S. Wyithe
T. Heckman
X. Fan
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2007

We discuss observations of the first galaxies, within cosmic reionization, at centimeter and millimeter wavelengths. We present a summary of current observations of the host galaxies of the most distant QSOs ($z \sim 6$). These observations reveal the gas, dust, and star formation in the host galaxies on kpc-scales. These data imply an enriched ISM in the QSO host galaxies within 1 Gyr of the big bang, and are consistent with models of coeval supermassive black hole and spheroidal galaxy formation in major mergers at high redshift. Current instruments are limited to studying truly pathologic objects at these redshifts, meaning hyper-luminous infrared galaxies ($L_{FIR} \sim 10^{13} L_\odot$). ALMA will provide the one to two orders of magnitude improvement in millimeter astronomy required to study normal star forming galaxies (i.e., Ly-$\alpha$ emitters) at $z \sim 6$. ALMA will reveal, at sub-kpc spatial resolution, the thermal gas and dust -- the fundamental fuel for star formation -- in galaxies into cosmic reionization. Comment: to appear in Science with ALMA: a new era for Astrophysics, ed. R. Bachiller (Springer: Berlin); 5 pages, 7 figure

arXiv.org e-Print Archive

Crossref

Springer - Publisher Connector

Spatial information and the legibility of urban form: Big data in urban morphology

Author: Boeing Geoff
Publication venue: eScholarship, University of California
Publication date: 30/09/2019

Urban planning and morphology have relied on analytical cartography and visual communication tools for centuries to illustrate spatial patterns, propose designs, compare alternatives, and engage the public. Classic urban form visualizations – from Giambattista Nolli’s ichnographic maps of Rome to Allan Jacobs’s figure-ground diagrams of city streets – have compressed physical urban complexity into easily comprehensible information artifacts. Today we can enhance these traditional workflows through the Smart Cities paradigm of understanding cities via user-generated content and harvested data in an information management context. New spatial technology platforms and big data offer new lenses to understand, evaluate, monitor, and manage urban form and evolution. This paper builds on the theoretical framework of visual cultures in urban planning and morphology to introduce and situate computational data science processes for exploring urban fabric patterns and spatial order. It demonstrates these workflows with OSMnx and data from OpenStreetMap, a collaborative spatial information system and mapping platform, to examine street network patterns, orientations, and configurations in different study sites around the world, considering what these reveal about the urban fabric. The age of ubiquitous urban data and computational toolkits opens up a new era of worldwide urban form analysis from integrated quantitative and qualitative perspectives.
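One way to make "street network orientations" concrete is to compute the compass bearing of each street segment and histogram the results. The sketch below does this with the standard initial-bearing formula in plain Python. It does not use or reproduce the OSMnx API; the function names and the segment format (lat1, lon1, lat2, lon2 in degrees) are assumptions made for illustration.

```python
import math
from collections import Counter

def bearing(lat1, lon1, lat2, lon2):
    """Initial compass bearing in degrees (0 = north, 90 = east)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    x = math.sin(dlon) * math.cos(phi2)
    y = (math.cos(phi1) * math.sin(phi2)
         - math.sin(phi1) * math.cos(phi2) * math.cos(dlon))
    return math.degrees(math.atan2(x, y)) % 360

def orientation_histogram(segments, n_bins=4):
    """Count street segments per orientation bin, folding opposite directions."""
    width = 180 / n_bins
    counts = Counter()
    for lat1, lon1, lat2, lon2 in segments:
        b = bearing(lat1, lon1, lat2, lon2) % 180  # a street runs both ways
        counts[int(b // width) % n_bins] += 1
    return [counts[i] for i in range(n_bins)]

segments = [(0, 0, 1, 1), (0, 0, -1, 1), (0, 0, 1, 0)]
hist = orientation_histogram(segments, n_bins=2)
```

A city laid out on a regular grid would concentrate its segments in a few bins, while an organically grown street network would spread them more evenly.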

arXiv.org e-Print Archive

eScholarship - University of California