Search CORE

5,763 research outputs found

Recent Advances in Graph Partitioning

Author: A Buluç
A Felner
A George
A Lisser
A Pothen
A Trifunović
AB Kahng
AE Feldmann
AH Land
AJ Soper
B Brandfass
B Hendrickson
B Hendrickson
B Hendrickson
B Junker
B Monien
B Peng
BW Kernighan
C Aykanat
C Chevalier
C Chevalier
C Farhat
C Lanczos
C Walshaw
C Walshaw
C Walshaw
C Walshaw
C Walshaw
C Walshaw
CE Bichot
CE Ferreira
D Delling
D Delling
D Delling
D Drake
D Luxen
D Ron
D Ron
D Wagner
DA Papa
DE Drake Vinkemeier
E Jeannot
E Rolland
F Comellas
F Glover
F Glover
F Pellegrini
F Pellegrini
F Pellegrini
F Schulz
FT Leighton
G Even
G Karypis
G Karypis
G Karypis
G Zumbusch
H Li
H Meyerhenke
H Meyerhenke
H Meyerhenke
H Meyerhenke
H Meyerhenke
HD Simon
HD Simon
I Moulitsas
I Safro
I Safro
J Chen
J Cong
J Fietz
J Hromkovič
J Hungershöfer
J Maue
J Maue
J Shalf
JR Gilbert
K Andreev
K Lang
K Schloegel
K Schloegel
K Schloegel
KS Camilus
L Brunetta
L Grady
L Lovász
LA Sanchis
LR Ford
M Armbruster
M Bader
M Birn
M Fiedler
M Jerrum
M Newman
M Sellmann
M Zhou
MR Garey
N Sensen
O Goldschmidt
P Chardaire
P Galinier
P Korosec
P Sanders
P Sanders
R Diekmann
R Diekmann
R Glantz
R Preis
RD Williams
S Arora
S Huang
S Lafon
S Lloyd
S Pettie
SE Karisch
SY Chan
T Bui
T Kieritz
U Benlic
U Benlic
U Feige
V Osipov
WE Donath
WE Donath
WW Hager
WW Hager
X Sui
Y Low
YM Kim
Ü Çatalyürek
Publication venue
Publication date: 03/02/2015
Field of study

We survey recent trends in practical algorithms for balanced graph partitioning together with applications and future research directions

arXiv.org e-Print Archive

Crossref

eScholarship - University of California

Parallelization of irregularly coupled regular meshes

Author: Chase Craig
Crowley Kay
Reeves Anthony
Saltz Joel
Publication venue
Publication date
Field of study

Regular meshes are frequently used for modeling physical phenomena on both serial and parallel computers. One advantage of regular meshes is that efficient discretization schemes can be implemented in a straight forward manner. However, geometrically-complex objects, such as aircraft, cannot be easily described using a single regular mesh. Multiple interacting regular meshes are frequently used to describe complex geometries. Each mesh models a subregion of the physical domain. The meshes, or subdomains, can be processed in parallel, with periodic updates carried out to move information between the coupled meshes. In many cases, there are a relatively small number (one to a few dozen) subdomains, so that each subdomain may also be partitioned among several processors. We outline a composite run-time/compile-time approach for supporting these problems efficiently on distributed-memory machines. These methods are described in the context of a multiblock fluid dynamics problem developed at LaRC

NASA Technical Reports Server

A Survey on Array Storage, Query Languages, and Systems

Author: Cheng Yu
Rusu Florin
Publication venue
Publication date: 19/02/2013
Field of study

Since scientific investigation is one of the most important providers of massive amounts of ordered data, there is a renewed interest in array data processing in the context of Big Data. To the best of our knowledge, a unified resource that summarizes and analyzes array processing research over its long existence is currently missing. In this survey, we provide a guide for past, present, and future research in array processing. The survey is organized along three main topics. Array storage discusses all the aspects related to array partitioning into chunks. The identification of a reduced set of array operators to form the foundation for an array query language is analyzed across multiple such proposals. Lastly, we survey real systems for array processing. The result is a thorough survey on array data storage and processing that should be consulted by anyone interested in this research topic, independent of experience level. The survey is not complete though. We greatly appreciate pointers towards any work we might have forgotten to mention.Comment: 44 page

arXiv.org e-Print Archive

CiteSeerX

High-performance computing for vision

Author: Bhat PB
Präs Anna VK
Wang CL
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/1996
Field of study

Vision is a challenging application for high-performance computing (HPC). Many vision tasks have stringent latency and throughput requirements. Further, the vision process has a heterogeneous computational profile. Low-level vision consists of structured computations, with regular data dependencies. The subsequent, higher level operations consist of symbolic computations with irregular data dependencies. Over the years, many approaches to high-speed vision have been pursued. VLSI hardware solutions such as ASIC's and digital signal processors (DSP's) have provided good processing speeds on structured low-level vision tasks. Special purpose systems for vision have also been designed. Currently, there is growing interest in using general purpose parallel systems for vision problems. These systems offer advantages of higher performance, sofavare programmability, generality, and architectural flexibility over the earlier approaches. The choice of low-cost commercial-off-theshelf (COTS) components as building blocks for these systems leads to easy upgradability and increased system life. The main focus of the paper is on effectively using the COTSbased general purpose parallel computing platforms to realize high-speed implementations of vision tasks. Due to the successful use of the COTS-based systems in a variety of high performance applications, it is attractive to consider their use for vision applications as well. However, the irregular data dependencies in vision tasks lead to large communication overheads in the HPC systems. At the University of Southern California, our research efforts have been directed toward designing scalable parallel algorithms for vision tasks on the HPC systems. In our approach, we use the message passing programming model to develop portable code. Our algorithms are specified using C and MPI. In this paper, we summarize our efforts, and illustrate our approach using several example vision tasks. To facilitate the analysis and development of scalable algorithms, a realistic computational model of the parallel system must be used. Several such models have been proposed in the literature. We use the General-purpose Distributed Memory (GDM) model which is a simple but realistic model of state-of-theart parallel machines. Using the GDM model, generic algorithmic techniques such as data remapping, overlapping of communication with computation, message packing, asynchronous execution, and communication scheduling are developed. Using these techniques, we have developed scalable algorithms for many vision tasks. For instance, a scalable algorithm for linear approximation has been developed using the asynchronous execution technique. Using this algorithm, linear feature extraction can be performed in 0.065 s on a 64 node SP-2 for a 512 × 512 image. A serial implementation takes 3.45 s for the same task. Similarly, the communication scheduling and decomposition techniques lead to a scalable algorithm for the line grouping task. We believe that such an algorithmic approach can result in the development of scalable and portable solutions for vision tasks. © 1996 IEEE Publisher Item Identifier S 0018-9219(96)04992-4.published_or_final_versio

HKU Scholars Hub

Parallel Aerodynamic Simulation on Open Workstation Clusters. Department of Aerospace Engineering Report no. 9830

Author: Badcock K.
Gribben B.J.
Richards B.E.
Publication venue: Department of Aerospace Engineering, University of Glasgow
Publication date: 01/10/1998
Field of study

The parallel execution of an aerodynamic simulation code on a non-dedicated, heterogeneous cluster of workstations is examined. This type of facility is commonly available to CFD developers and users in academia, industry and government laboratories and is attractive in terms of cost for CFD simulations. However, practical considerations appear at present to be discouraging widespread adoption of this technology. The main obstacles to achieving an efficient, robust parallel CFD capability in a demanding multi-user environment are investigated. A static load-balancing method, which takes account of varying processor speeds, is described. A dynamic re-allocation method to account for varying processor loads has been developed. Use of proprietary management software has facilitated the implementation of the method

Enlighten

Survey on Combinatorial Register Allocation and Instruction Scheduling

Author: Lozano Roberto Castañeda
Schulte Christian
Publication venue
Publication date: 01/01/2018
Field of study

Register allocation (mapping variables to processor registers or memory) and instruction scheduling (reordering instructions to increase instruction-level parallelism) are essential tasks for generating efficient assembly code in a compiler. In the last three decades, combinatorial optimization has emerged as an alternative to traditional, heuristic algorithms for these two tasks. Combinatorial optimization approaches can deliver optimal solutions according to a model, can precisely capture trade-offs between conflicting decisions, and are more flexible at the expense of increased compilation time. This paper provides an exhaustive literature review and a classification of combinatorial optimization approaches to register allocation and instruction scheduling, with a focus on the techniques that are most applied in this context: integer programming, constraint programming, partitioned Boolean quadratic programming, and enumeration. Researchers in compilers and combinatorial optimization can benefit from identifying developments, trends, and challenges in the area; compiler practitioners may discern opportunities and grasp the potential benefit of applying combinatorial optimization

arXiv.org e-Print Archive

Publikationer från KTH

Digitala Vetenskapliga Arkivet - Academic Archive On-line