Multiple Query Optimization For Data Analysis Applications on Clusters of SMPs
This paper is concerned with the efficient execution of multiple query
workloads on a cluster of SMPs. We target applications that access and
manipulate large scientific datasets. Queries in these applications involve
user-defined processing operations on data and distributed data structures
to hold intermediate and final results. Our goal is to implement system
components to leverage previously computed query results and to effectively
utilize processing power and aggregated I/O bandwidth on SMP nodes so that
both single queries and multi-query batches can be efficiently executed.
(Also referenced as UMIACS-TR-2001-78)
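The idea of leveraging previously computed query results can be illustrated with a minimal sketch. It assumes a one-dimensional integer dataset and inclusive range queries; the `ResultCache` class and its fields are hypothetical names chosen for illustration and are not components of the system described in the abstract.

```python
from typing import Dict, List, Tuple

Range = Tuple[int, int]  # inclusive [lo, hi] on a 1-D key (a simplification)


class ResultCache:
    """Hypothetical cache that reuses results of earlier range queries."""

    def __init__(self, dataset: List[int]) -> None:
        self.dataset = dataset
        self.cache: Dict[Range, List[int]] = {}
        self.full_scans = 0  # how many times the raw dataset was read

    def query(self, lo: int, hi: int) -> List[int]:
        # Reuse any cached result whose range covers the requested range.
        for (clo, chi), rows in self.cache.items():
            if clo <= lo and hi <= chi:
                return [x for x in rows if lo <= x <= hi]
        # Otherwise scan the dataset and remember the result.
        self.full_scans += 1
        rows = [x for x in self.dataset if lo <= x <= hi]
        self.cache[(lo, hi)] = rows
        return rows


cache = ResultCache(list(range(100)))
wide = cache.query(10, 50)    # reads the dataset
narrow = cache.query(20, 30)  # answered from the cached wider result
```

In this sketch the second query never touches the dataset, which is the kind of saving a multi-query workload can exploit when queries overlap.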
Exploiting Graphics Processing Units for Massively Parallel Multi-Dimensional Indexing
Department of Computer Engineering
Scientific applications process very large multi-dimensional datasets. To efficiently navigate such datasets, various multi-dimensional indexing structures, such as the R-tree, have been extensively studied for the past couple of decades.
Since the GPU has emerged as a new cost-effective performance accelerator, it is now common to leverage the massive parallelism of the GPU in various applications such as medical image processing, computational chemistry, and particle physics.
However, hierarchical multi-dimensional indexing structures are inherently not well suited for parallel processing because their irregular memory access patterns make it difficult to exploit massive parallelism. Moreover, recursive tree traversal often fails due to the small run-time stack and cache memory in the GPU.
First, we propose the Massively Parallel Three-phase Scanning (MPTS) R-tree traversal algorithm, which avoids irregular memory access patterns and recursive tree traversal so that the GPU can access tree nodes in a sequential manner. The experimental study shows that the MPTS R-tree traversal algorithm consistently outperforms the traditional recursive R-tree search algorithm for multi-dimensional range query processing.
Next, we focus on reducing the query response time by extending an n-ary multi-dimensional indexing structure, the R-tree, so that a large number of GPU threads cooperate to process a single query in parallel. This matters because the number of concurrent queries submitted in scientific data analysis applications is relatively small compared with enterprise database systems and ray tracing in computer graphics. Hence, we propose a novel variant of the R-tree, the Massively Parallel Hilbert R-Tree (MPHR-Tree), which is designed for a novel parallel tree traversal algorithm, Massively Parallel Restart Scanning (MPRS). The MPRS algorithm traverses the MPHR-Tree in mostly contiguous memory access patterns without recursion, which offers more chances to optimize the parallel SIMD algorithm. Our extensive experimental results show that the MPRS algorithm outperforms the other stackless tree traversal algorithms, which were designed for efficient ray tracing in the computer graphics community.
Furthermore, we develop a query co-processing scheme that makes use of both the CPU and the GPU. In this approach, we store the internal and leaf nodes of the upper tree in CPU host memory and GPU device memory, respectively. We let the CPU traverse internal nodes because the conditional branches in hierarchical tree structures often cause a serious warp divergence problem in the GPU. For leaf nodes, the GPU scans a large number of leaf nodes in parallel based on the selection ratio of a given range query. It is well known that the GPU is superior to the CPU for parallel scanning. The experimental results show that our proposed multi-dimensional range query co-processing scheme improves the query response time by up to 12x and query throughput by up to 4x compared to the state-of-the-art GPU tree traversal algorithm.
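The split between pruning on internal nodes and scanning contiguous leaf nodes can be sketched in a few lines. This is not the MPTS or MPRS algorithm, nor the authors' co-processing code: it assumes a single internal level, 2-D boxes, and an ordinary loop standing in for the parallel GPU scan. All function names are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def overlaps(a: Box, b: Box) -> bool:
    """True when two axis-aligned boxes intersect."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def mbr(boxes: List[Box]) -> Box:
    """Minimum bounding rectangle of a group of boxes."""
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))


def build(leaves: List[Box], fanout: int) -> List[Tuple[Box, int, int]]:
    # One internal level only: each entry covers a contiguous slice of leaves.
    return [(mbr(leaves[i:i + fanout]), i, min(i + fanout, len(leaves)))
            for i in range(0, len(leaves), fanout)]


def range_query(internal: List[Tuple[Box, int, int]],
                leaves: List[Box], query: Box) -> List[int]:
    hits = []
    for box, start, end in internal:      # "CPU" step: prune by internal MBR
        if not overlaps(box, query):
            continue
        for i in range(start, end):       # "GPU" step: contiguous leaf scan,
            if overlaps(leaves[i], query):  # simulated here by a plain loop
                hits.append(i)
    return hits


leaves = [(float(i), 0.0, i + 0.5, 1.0) for i in range(8)]
internal = build(leaves, fanout=4)
result = range_query(internal, leaves, (2.2, 0.0, 4.1, 1.0))
```

The point of the layout is that the inner loop touches leaves in index order with no recursion or stack, which is the access pattern the abstract argues suits the GPU.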
GridSim: A Toolkit for the Modeling and Simulation of Distributed Resource Management and Scheduling for Grid Computing
Clusters, grids, and peer-to-peer (P2P) networks have emerged as popular
paradigms for next generation parallel and distributed computing. The
management of resources and scheduling of applications in such large-scale
distributed systems is a complex undertaking. In order to prove the
effectiveness of resource brokers and associated scheduling algorithms, their
performance needs to be evaluated under different scenarios such as varying
number of resources and users with different requirements. In a grid
environment, it is hard and even impossible to perform scheduler performance
evaluation in a repeatable and controllable manner as resources and users are
distributed across multiple organizations with their own policies. To overcome
this limitation, we have developed a Java-based discrete-event grid simulation
toolkit called GridSim. The toolkit supports modeling and simulation of
heterogeneous grid resources (both time- and space-shared), users and
application models. It provides primitives for creation of application tasks,
mapping of tasks to resources, and their management. To demonstrate suitability
of the GridSim toolkit, we have simulated a Nimrod-G like grid resource broker
and evaluated the performance of deadline and budget constrained cost- and
time-minimization scheduling algorithms.
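A deadline- and budget-constrained cost-minimization policy can be sketched as a simple greedy loop. This is an illustrative assumption, not the GridSim API (which is Java) and not the Nimrod-G broker's actual algorithm: the `Resource` fields, the `cost_min_schedule` function, and the numbers are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Resource:
    name: str
    mips: float          # speed in million instructions per second
    cost_per_sec: float  # price charged per second of processing


def cost_min_schedule(job_lengths: List[float], resources: List[Resource],
                      deadline: float, budget: float
                      ) -> Tuple[Dict[str, List[int]], float]:
    """Greedy sketch: try the cheapest resource first for every job."""
    plan: Dict[str, List[int]] = {r.name: [] for r in resources}
    busy = {r.name: 0.0 for r in resources}  # seconds already committed
    spent = 0.0
    # Rank resources by price per million instructions, cheapest first.
    by_price = sorted(resources, key=lambda r: r.cost_per_sec / r.mips)
    for job_id, length in enumerate(job_lengths):
        for r in by_price:
            runtime = length / r.mips
            price = runtime * r.cost_per_sec
            if busy[r.name] + runtime <= deadline and spent + price <= budget:
                plan[r.name].append(job_id)
                busy[r.name] += runtime
                spent += price
                break  # job placed; jobs that fit nowhere stay unscheduled
    return plan, spent


resources = [Resource("cheap", mips=100.0, cost_per_sec=1.0),
             Resource("fast", mips=400.0, cost_per_sec=8.0)]
plan, spent = cost_min_schedule([1000.0] * 4, resources,
                                deadline=25.0, budget=100.0)
```

With these numbers the cheap resource can finish only two jobs before the deadline, so the remaining two spill over to the faster, more expensive one. A time-minimization variant would rank resources by speed instead of price.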
The Virginia Tech Computational Grid: A Research Agenda
An important goal of grid computing is to apply the rapidly expanding power of distributed
computing resources to large-scale multidisciplinary scientific problem solving. Developing a usable computational grid for Virginia Tech is desirable from many perspectives. It leverages distinctive strengths of the university, can help meet the research computing needs of users with the highest demands, and will generate many challenging computer science research questions. By deploying a campus-wide grid and demonstrating its effectiveness for real applications, the Grid Computing Research Group hopes to gain valuable experience and contribute to the grid computing community. This report describes the needs and advantages that characterize the Virginia Tech context with respect to grid computing, and summarizes several current research projects that will meet those needs.
Identifying Crisis Response Communities in Online Social Networks for Compound Disasters: The Case of Hurricane Laura and Covid-19
Online social networks allow different agencies and the public to interact
and share the underlying risks and protective actions during major disasters.
This study revealed such crisis communication patterns during Hurricane Laura
compounded by the COVID-19 pandemic. Laura was one of the strongest (Category
4) hurricanes on record to make landfall in Cameron, Louisiana. Using the
Application Programming Interface (API), this study utilizes large-scale social
media data obtained from Twitter through the recently released academic track
that provides complete and unbiased observations. The data captured publicly
available tweets shared by active Twitter users from the vulnerable areas
threatened by Laura. Online social networks were built from the user influence
feature (mentions or tags), which notifies other users when a tweet is
posted. Using network science theories and advanced community detection
algorithms, the study split these networks into twenty-one components of
various sizes, the largest of which contained eight well-defined communities.
Several natural language processing techniques (i.e., word clouds, bigrams,
topic modeling) were applied to the tweets shared by the users in these
communities to observe their risk-taking or risk-averse behavior during a major
compounding crisis. Social media accounts of local news media, radio,
universities, and popular sports pages were among those that were heavily involved
and interacted closely with local residents. In contrast, emergency management
and planning units in the area engaged less with the public. The findings of
this study provide novel insights into the design of efficient social media
communication guidelines to respond better in future disasters.
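The first steps of such a pipeline (building a mention network, splitting it into connected components, and counting bigrams) can be sketched with the standard library alone. The tweets below are invented for illustration, the function names are hypothetical, and the sketch stops at components: detecting communities inside a component would need a dedicated algorithm that is not shown here.

```python
import re
from collections import Counter, deque
from typing import Dict, List, Set, Tuple

# Invented sample data: (author, tweet text)
tweets = [
    ("alice", "Stay safe @bob, shelter open at the school"),
    ("bob", "@carol shelter open now"),
    ("dave", "@erin roads closed"),
]


def mention_graph(items: List[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Undirected graph linking each author to the users they mention."""
    graph: Dict[str, Set[str]] = {}
    for author, text in items:
        graph.setdefault(author, set())
        for target in re.findall(r"@(\w+)", text):
            graph[author].add(target)
            graph.setdefault(target, set()).add(author)
    return graph


def connected_components(graph: Dict[str, Set[str]]) -> List[Set[str]]:
    """Breadth-first search over the graph, one pass per component."""
    seen: Set[str] = set()
    parts: List[Set[str]] = []
    for start in sorted(graph):
        if start in seen:
            continue
        seen.add(start)
        queue, part = deque([start]), set()
        while queue:
            node = queue.popleft()
            part.add(node)
            for nxt in graph[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        parts.append(part)
    return parts


def bigram_counts(texts: List[str]) -> Counter:
    """Count adjacent word pairs after dropping @mentions."""
    counts: Counter = Counter()
    for text in texts:
        words = re.findall(r"[a-z']+", re.sub(r"@\w+", " ", text.lower()))
        counts.update(zip(words, words[1:]))
    return counts


components = connected_components(mention_graph(tweets))
bigrams = bigram_counts([text for _, text in tweets])
```

On the sample data this yields two components, and the pair ("shelter", "open") is the only bigram that appears twice.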