211 research outputs found

    Multiple Query Optimization For Data Analysis Applications on Clusters of SMPs

    Get PDF
    This paper is concerned with the efficient execution of multiple query workloads on a cluster of SMPs. We target applications that access and manipulate large scientific datasets. Queries in these applications involve user-defined processing operations on data and distributed data structures to hold intermediate and final results. Our goal is to implement system components to leverage previously computed query results and to effectively utilize processing power and aggregated I/O bandwidth on SMP nodes so that both single queries and multi-query batches can be efficiently executed. (Also referenced as UMIACS-TR-2001-78

    Exploiting Graphics Processing Units for Massively Parallel Multi-Dimensional Indexing

    Get PDF
    Department of Computer EngineeringScientific applications process truly large amounts of multi-dimensional datasets. To efficiently navigate such datasets, various multi-dimensional indexing structures, such as the R-tree, have been extensively studied for the past couple of decades. Since the GPU has emerged as a new cost-effective performance accelerator, now it is common to leverage the massive parallelism of the GPU in various applications such as medical image processing, computational chemistry, and particle physics. However, hierarchical multi-dimensional indexing structures are inherently not well suited for parallel processing because their irregular memory access patterns make it difficult to exploit massive parallelism. Moreover, recursive tree traversal often fails due to the small run-time stack and cache memory in the GPU. First, we propose Massively Parallel Three-phase Scanning (MPTS) R-tree traversal algorithm to avoid the irregular memory access patterns and recursive tree traversal so that the GPU can access tree nodes in a sequential manner. The experimental study shows that MPTS R-tree traversal algorithm consistently outperforms traditional recursive R-Tree search algorithm for multi-dimensional range query processing. Next, we focus on reducing the query response time and extending n-ary multi-dimensional indexing structures - R-tree, so that a large number of GPU threads cooperate to process a single query in parallel. Because the number of submitted concurrent queries in scientific data analysis applications is relatively smaller than that of enterprise database systems and ray tracing in computer graphics. Hence, we propose a novel variant of R-trees Massively Parallel Hilbert R-Tree (MPHR-Tree), which is designed for a novel parallel tree traversal algorithm Massively Parallel Restart Scanning (MPRS). The MPRS algorithm traverses the MPHR-Tree in mostly contiguous memory access patterns without recursion, which offers more chances to optimize the parallel SIMD algorithm. Our extensive experimental results show that the MPRS algorithm outperforms the other stackless tree traversal algorithms, which are designed for efficient ray tracing in computer graphics community. Furthermore, we develop query co-processing scheme that makes use of both the CPU and GPU. In this approach, we store the internal and leaf nodes of upper tree in CPU host memory and GPU device memory, respectively. We let the CPU traverse internal nodes because the conditional branches in hierarchical tree structures often cause a serious warp divergence problem in the GPU. For leaf nodes, the GPU scans a large number of leaf nodes in parallel based on the selection ratio of a given range query. It is well known that the GPU is superior to the CPU for parallel scanning. The experimental results show that our proposed multi-dimensional range query co-processing scheme improves the query response time by up to 12x and query throughput by up to 4x compared to the state-of-the-art GPU tree traversal algorithm.ope

    GridSim: A Toolkit for the Modeling and Simulation of Distributed Resource Management and Scheduling for Grid Computing

    Full text link
    Clusters, grids, and peer-to-peer (P2P) networks have emerged as popular paradigms for next generation parallel and distributed computing. The management of resources and scheduling of applications in such large-scale distributed systems is a complex undertaking. In order to prove the effectiveness of resource brokers and associated scheduling algorithms, their performance needs to be evaluated under different scenarios such as varying number of resources and users with different requirements. In a grid environment, it is hard and even impossible to perform scheduler performance evaluation in a repeatable and controllable manner as resources and users are distributed across multiple organizations with their own policies. To overcome this limitation, we have developed a Java-based discrete-event grid simulation toolkit called GridSim. The toolkit supports modeling and simulation of heterogeneous grid resources (both time- and space-shared), users and application models. It provides primitives for creation of application tasks, mapping of tasks to resources, and their management. To demonstrate suitability of the GridSim toolkit, we have simulated a Nimrod-G like grid resource broker and evaluated the performance of deadline and budget constrained cost- and time-minimization scheduling algorithms

    The Virginia Tech Computational Grid: A Research Agenda

    Get PDF
    An important goal of grid computing is to apply the rapidly expanding power of distributed computing resources to large-scale multidisciplinary scientic problem solving. Developing a usable computational grid for Virginia Tech is desirable from many perspectives. It leverages distinctive strengths of the university, can help meet the research computing needs of users with the highest demands, and will generate many challenging computer science research questions. By deploying a campus-wide grid and demonstrating its effectiveness for real applications, the Grid Computing Research Group hopes to gain valuable experience and contribute to the grid computing community. This report describes the needs and advantages which characterize the Virginia Tech context with respect to grid computing, and summarizes several current research projects which will meet those needs

    Identifying Crisis Response Communities in Online Social Networks for Compound Disasters: The Case of Hurricane Laura and Covid-19

    Full text link
    Online social networks allow different agencies and the public to interact and share the underlying risks and protective actions during major disasters. This study revealed such crisis communication patterns during hurricane Laura compounded by the COVID-19 pandemic. Laura was one of the strongest (Category 4) hurricanes on record to make landfall in Cameron, Louisiana. Using the Application Programming Interface (API), this study utilizes large-scale social media data obtained from Twitter through the recently released academic track that provides complete and unbiased observations. The data captured publicly available tweets shared by active Twitter users from the vulnerable areas threatened by Laura. Online social networks were based on user influence feature ( mentions or tags) that allows notifying other users while posting a tweet. Using network science theories and advanced community detection algorithms, the study split these networks into twenty-one components of various sizes, the largest of which contained eight well-defined communities. Several natural language processing techniques (i.e., word clouds, bigrams, topic modeling) were applied to the tweets shared by the users in these communities to observe their risk-taking or risk-averse behavior during a major compounding crisis. Social media accounts of local news media, radio, universities, and popular sports pages were among those who involved heavily and interacted closely with local residents. In contrast, emergency management and planning units in the area engaged less with the public. The findings of this study provide novel insights into the design of efficient social media communication guidelines to respond better in future disasters
    corecore