Efficient Multi-way Theta-Join Processing Using MapReduce

Author: Chen Lei
Wang Min
Zhang Xiaofei
Publication date: 01/01/2012

Multi-way Theta-join queries are powerful in describing complex relations and are therefore widely employed in practice. However, existing solutions from traditional distributed and parallel databases for multi-way Theta-join queries cannot be easily extended to fit a shared-nothing distributed computing paradigm, which has been shown to support OLAP applications over immense data volumes. In this work, we study the problem of efficiently processing multi-way Theta-join queries using MapReduce from a cost-effective perspective. Although some prior work uses the (key, value) pair-based programming model to support join operations, efficient processing of multi-way Theta-join queries has never been fully explored. The main challenge is, given a number of processing units (each of which can run Map or Reduce tasks), to map a multi-way Theta-join query to a number of MapReduce jobs and execute them in a well-scheduled sequence such that the total processing time span is minimized. Our solution has two main parts: 1) cost metrics for both a single MapReduce job and a number of MapReduce jobs executed in a certain order; 2) the efficient execution of a chain-typed Theta-join with only one MapReduce job. Compared with the query evaluation strategy proposed in [23] and the widely adopted Pig Latin and Hive SQL solutions, our method achieves a significant improvement in join processing efficiency. Comment: VLDB201
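To make the idea of evaluating a Theta-join with (key, value) pairs concrete, the following is a minimal single-process sketch of a two-way Theta-join on a grid of reducers. It is a generic cross-product partitioning scheme, not the cost-based method of the paper; the function name `theta_join_mapreduce`, the `rows`/`cols` parameters, and the round-robin assignment are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import product


def theta_join_mapreduce(r, s, theta, rows=2, cols=2):
    """Simulate a two-way theta-join on a rows x cols grid of reducers.

    Hypothetical illustration: every (R tuple, S tuple) pair meets in
    exactly one grid cell, so any predicate `theta` can be evaluated.
    """
    # "Map" phase: key each tuple by the grid cells it must reach.
    buckets = defaultdict(lambda: ([], []))
    for i, t in enumerate(r):
        row = i % rows                      # R tuple owns one row ...
        for col in range(cols):             # ... and is replicated across its columns
            buckets[(row, col)][0].append(t)
    for j, u in enumerate(s):
        col = j % cols                      # S tuple owns one column ...
        for row in range(rows):             # ... and is replicated across its rows
            buckets[(row, col)][1].append(u)

    # "Reduce" phase: each cell joins its local R and S tuples.
    result = []
    for r_part, s_part in buckets.values():
        result.extend((t, u) for t, u in product(r_part, s_part) if theta(t, u))
    return result


pairs = theta_join_mapreduce([1, 5, 9], [2, 6], lambda t, u: t < u)
```

The replication factor (each R tuple is sent to `cols` reducers, each S tuple to `rows` reducers) is the kind of communication cost that a cost model for MapReduce joins has to weigh against reducer load.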

arXiv.org e-Print Archive

University of Memphis Digital Commons

CiteSeerX

Hong Kong University of Science and Technology Institutional Repository

Optimizing energy-efficiency for multi-core packet processing systems in a compiler framework

Author: Huang Jing
Publication venue: Dublin City University. Research Institute for Networks and Communications Engineering (RINCE)
Publication date: 01/11/2012

Network applications are becoming increasingly computation-intensive, and the amount of traffic is soaring at an unprecedented rate. Multi-core and multi-threaded techniques are thus widely employed in packet processing systems to meet the changing requirements. However, the processing power cannot be fully utilized without a suitable programming environment. The compilation procedure is decisive for the quality of the code: it can largely determine overall system performance in terms of packet throughput, individual packet latency, core utilization and energy efficiency. The thesis first investigates compilation issues in the networking domain, particularly energy consumption. As a cornerstone for any compiler optimization, a code analysis module for collecting program dependencies is presented and incorporated into a compiler framework. With that dependency information, a strategy based on graph bi-partitioning and mapping is proposed to search for an optimal configuration in a parallel-pipeline fashion. The energy-aware extension is specifically effective in enhancing the energy efficiency of the whole system. Finally, a generic evaluation framework for simulating the performance and energy consumption of a packet processing system is given. It accepts flexible architectural configurations and is capable of performing arbitrary code mapping. The simulation time is extremely short compared to full-fledged simulators. A set of our optimization results is gathered using the framework.

Irish Universities

DCU Online Research Access Service

A Survey on Array Storage, Query Languages, and Systems

Author: Cheng Yu
Rusu Florin
Publication date: 19/02/2013

Since scientific investigation is one of the most important providers of massive amounts of ordered data, there is a renewed interest in array data processing in the context of Big Data. To the best of our knowledge, a unified resource that summarizes and analyzes array processing research over its long existence is currently missing. In this survey, we provide a guide for past, present, and future research in array processing. The survey is organized along three main topics. Array storage discusses all the aspects related to array partitioning into chunks. The identification of a reduced set of array operators to form the foundation for an array query language is analyzed across multiple such proposals. Lastly, we survey real systems for array processing. The result is a thorough survey on array data storage and processing that should be consulted by anyone interested in this research topic, independent of experience level. The survey is not complete though. We greatly appreciate pointers towards any work we might have forgotten to mention. Comment: 44 pages
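As a small illustration of what partitioning an array into chunks means, the sketch below maps a cell coordinate to a chunk identifier and an offset inside that chunk under regular (fixed-shape) chunking. Regular chunking is only one of several possible schemes, and the function name `locate_cell` is hypothetical rather than taken from the survey or from any array system.

```python
def locate_cell(coord, chunk_shape):
    """Return (chunk_id, offset) for a cell under regular chunking.

    coord and chunk_shape are equal-length tuples of non-negative ints;
    chunk_shape gives the fixed extent of every chunk along each dimension.
    """
    if len(coord) != len(chunk_shape):
        raise ValueError("coord and chunk_shape must have the same rank")
    # Integer division selects the chunk; the remainder is the position inside it.
    chunk_id = tuple(c // s for c, s in zip(coord, chunk_shape))
    offset = tuple(c % s for c, s in zip(coord, chunk_shape))
    return chunk_id, offset


chunk_id, offset = locate_cell((5, 7), (4, 4))
```

With 4 x 4 chunks, cell (5, 7) falls in chunk (1, 1) at in-chunk offset (1, 3), so a query touching that cell only needs to read that one chunk.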

arXiv.org e-Print Archive

CiteSeerX

Gunrock: GPU Graph Analytics

Author: Davidson Andrew
Liu Weitang
Osama Muhammad
Owens John D.
Pan Yuechao
Riffel Andy T.
Wang Leyuan
Wang Yangzihao
Wu Yuduo
Yang Carl
Yuan Chenshan
Publication date: 04/01/2017

For large-scale graph analytics on the GPU, the irregularity of data access and control flow, and the complexity of programming GPUs, have presented two significant challenges to developing a programmable high-performance graph library. "Gunrock", our graph-processing system designed specifically for the GPU, uses a high-level, bulk-synchronous, data-centric abstraction focused on operations on a vertex or edge frontier. Gunrock achieves a balance between performance and expressiveness by coupling high-performance GPU computing primitives and optimization strategies with a high-level programming model that allows programmers to quickly develop new graph primitives with small code size and minimal GPU programming knowledge. We characterize the performance of various optimization strategies and evaluate Gunrock's overall performance on different GPU architectures on a wide range of graph primitives that span from traversal-based algorithms and ranking algorithms, to triangle counting and bipartite-graph-based algorithms. The results show that on a single GPU, Gunrock has on average at least an order of magnitude speedup over Boost and PowerGraph, comparable performance to the fastest GPU hardwired primitives and CPU shared-memory graph libraries such as Ligra and Galois, and better performance than any other GPU high-level graph library. Comment: 52 pages, invited paper to ACM Transactions on Parallel Computing (TOPC), an extended version of PPoPP'16 paper "Gunrock: A High-Performance Graph Processing Library on the GPU"
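The frontier-centric, bulk-synchronous style described in the abstract can be illustrated with a breadth-first search written as repeated operations on a vertex frontier. This is a sequential CPU sketch in Python, not Gunrock's API; the function name `bfs_frontier` and the labels "advance" and "filter" for the two steps are used here for illustration only.

```python
def bfs_frontier(adj, source):
    """Breadth-first search expressed as operations on a vertex frontier.

    adj maps each vertex to an iterable of its neighbors. Each loop
    iteration is one bulk-synchronous step over the whole frontier.
    """
    depth = {source: 0}
    frontier = [source]
    level = 0
    while frontier:
        level += 1
        # Advance: visit every neighbor of every frontier vertex.
        candidates = [v for u in frontier for v in adj.get(u, ())]
        # Filter: keep only vertices not seen before (also removes duplicates).
        next_frontier = []
        for v in candidates:
            if v not in depth:
                depth[v] = level
                next_frontier.append(v)
        frontier = next_frontier
    return depth


depths = bfs_frontier({0: [1, 2], 1: [3], 2: [3], 3: []}, 0)
```

On a GPU, the advance and filter steps would each run as parallel kernels over the frontier, which is where the irregular data access mentioned in the abstract arises.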

arXiv.org e-Print Archive

eScholarship - University of California

FigShare

Raster data structures and topographic data

Author: Adams Timothy A.
Publication date: 01/01/1982

The use of computers to assist in map-making has been growing for two decades; their speed of operation, large data storage capacity and flexibility of usage have been major factors in establishing many development and working computer mapping systems throughout the world. In Britain, the Ordnance Survey has supported a digital solution to the production, storage and display of large-scale topographic maps since 1972. Until now, the Ordnance Survey - and, indeed, most topographic map-makers in Britain who are investigating digital techniques - have adopted a vector-based strategy to digital mapping in which the data are held as a series of coordinate-points describing the lines shown on the map images. Comparatively little work has been undertaken in Britain on the use of raster-based methods of data capture and storage in which map images are resolved into arrays of small cells or picture elements by appropriately tuned scanning devices. This alternative strategy is known - from work carried out in other countries, chiefly the United States - to be suitable for some types of data manipulation, although its suitability for Ordnance Survey mapping applications is unknown. Very little investigation has been made anywhere in the world of the manipulation of raster data structures by the recently developed array processor computers; almost all existing work is restricted to the use of traditional serial machines. This thesis reports on a three-year study carried out at the University of Durham to investigate the applicability of raster data processing for the work of the British national mapping organisation. In particular, it describes the distinction between vector and raster applications with geographic data and the likely characteristics of suitable raster data structures on both serial and parallel computers.
A section is also included which describes the nature of scanning trials carried out on a number of commercial devices; it has thus been possible to assess not only the likely advantages and limitations of handling British large-scale map data in raster form but also its technical feasibility. The work reports on the likely volumes of data to be expected and describes parallel algorithms for operations such as polygon creation (and, indirectly, the creation of node and link vector files).

Durham e-Theses

A mapping strategy for MIMD computers

Author: Bic Lubomir
Nicolau Alexandru
Yang Jiyuan
Publication venue: eScholarship, University of California
Publication date: 01/01/1991

In this paper, a heuristic mapping approach which maps parallel programs, described by precedence graphs, to MIMD architectures, described by system graphs, is presented. The complete execution time of a parallel program is used as a measure, and the concept of critical edges is utilized as the heuristic to guide the search for a better initial assignment and subsequent refinement. An important feature is the use of a termination condition for the refinement process. This is based on deriving a lower bound on the total execution time of the mapped program. When this bound has been reached, no further refinement steps are necessary. The algorithms have been implemented and applied to the mapping of random problem graphs to various system topologies, including hypercubes, meshes, and random graphs. The results show reductions in execution times of the mapped programs of up to 77 percent over random mapping.
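The abstract does not say how its lower bound is derived. One standard lower bound for a program described by a precedence graph is the length of its critical path when communication costs are ignored, and the sketch below computes that quantity as an assumed stand-in. The function name `critical_path_length` and the input layout (`cost` and `preds` dictionaries) are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def critical_path_length(cost, preds):
    """Length of the longest path through a precedence graph.

    cost maps each task to its execution time; preds maps a task to the
    tasks that must finish before it starts. No mapping can complete
    faster than this value, so it can serve as a stopping threshold.
    """
    # Make sure every task appears in the graph, even without predecessors.
    graph = {task: tuple(preds.get(task, ())) for task in cost}
    finish = {}
    for task in TopologicalSorter(graph).static_order():
        # A task starts once its slowest predecessor has finished.
        start = max((finish[p] for p in graph[task]), default=0)
        finish[task] = start + cost[task]
    return max(finish.values(), default=0)


bound = critical_path_length(
    {"a": 2, "b": 3, "c": 1, "d": 4},
    {"c": ["a", "b"], "d": ["c"]},
)
```

A refinement loop could compare the execution time of the current mapping against `bound` and stop as soon as the two are equal, since no further improvement is possible at that point.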

eScholarship - University of California

A Logical Model and Data Placement Strategies for MEMS Storage Devices

Author: Kim Min-Soo
Kim Yi-Reun
Song Il-Yeol
Whang Kyu-Young
Publication venue: Institute of Electronics, Information and Communications Engineers (IEICE)
Publication date: 29/07/2008

MEMS storage devices are new non-volatile secondary storage devices that have outstanding advantages over magnetic disks. MEMS storage devices, however, differ substantially from magnetic disks in structure and access characteristics. They have thousands of heads called probe tips and provide the following two major access facilities: (1) flexibility: freely selecting a set of probe tips for accessing data, (2) parallelism: simultaneously reading and writing data with the set of probe tips selected. Due to these characteristics, it is nontrivial to find data placements that fully utilize the capability of MEMS storage devices. In this paper, we propose a simple logical model called the Region-Sector (RS) model that abstracts major characteristics affecting data retrieval performance, such as flexibility and parallelism, from the physical MEMS storage model. We also suggest heuristic data placement strategies based on the RS model and derive new data placements for relational data and two-dimensional spatial data by using those strategies. Experimental results show that the proposed data placements improve the data retrieval performance by up to 4.0 times for relational data and by up to 4.8 times for two-dimensional spatial data of approximately 320 Mbytes compared with those of existing data placements. Further, these improvements are expected to be more marked as the database size grows. Comment: 37 pages

arXiv.org e-Print Archive

Crossref