67 research outputs found

Performance Evaluation of Sparse Matrix Multiplication Kernels on Intel Xeon Phi

Author: E-J Im
J Mellor-Crummey
M Krotkiewski
R Nishtala
Publication date: 05/02/2013

Intel Xeon Phi is a recently released high-performance coprocessor that features 61 cores, each supporting 4 hardware threads, with 512-bit-wide SIMD registers, achieving a peak theoretical performance of 1 Tflop/s in double precision. Many scientific applications, such as linear solvers, eigensolvers, and graph mining algorithms, involve operations on large sparse matrices. The core of most of these applications is the multiplication of a large, sparse matrix with a dense vector (SpMV). In this paper, we investigate the performance of the Xeon Phi coprocessor for SpMV. We first provide a comprehensive introduction to this new architecture and analyze its peak performance with a number of microbenchmarks. Although the design of a Xeon Phi core is not much different from those of the cores in modern processors, its large number of cores and hyperthreading capability allow many applications to saturate the available memory bandwidth, which is not the case for many cutting-edge processors. Yet, our performance studies show that it is memory latency, not bandwidth, that creates a bottleneck for SpMV on this architecture. Finally, our experiments show that Xeon Phi's sparse kernel performance is very promising and even better than that of cutting-edge general-purpose processors and GPUs.
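The SpMV kernel at the center of this study computes y = A·x for a sparse matrix A and a dense vector x. A minimal Python sketch, assuming the widely used compressed sparse row (CSR) layout (the abstract does not say which storage formats the paper evaluates, and the function name `csr_spmv` is illustrative only):

```python
from typing import List, Sequence


def csr_spmv(row_ptr: Sequence[int], col_idx: Sequence[int],
             values: Sequence[float], x: Sequence[float]) -> List[float]:
    """Compute y = A @ x, where A is given in CSR form.

    row_ptr has n_rows + 1 entries; the nonzeros of row i are stored in
    positions row_ptr[i] .. row_ptr[i + 1] - 1 of col_idx and values.
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # Indirect load: which element of x is read depends on col_idx,
            # so accesses to x are irregular and hard to prefetch.
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


print(csr_spmv([0, 2, 3, 5], [0, 2, 1, 0, 2],
               [4.0, 1.0, 2.0, 3.0, 5.0], [1.0, 2.0, 3.0]))  # [7.0, 4.0, 18.0]
```

Here the 3×3 matrix with rows (4, 0, 1), (0, 2, 0) and (3, 0, 5) is stored as row_ptr = [0, 2, 3, 5], col_idx = [0, 2, 1, 0, 2], values = [4, 1, 2, 3, 5], and multiplying it by x = [1, 2, 3] gives [7.0, 4.0, 18.0]. The data-dependent loads of x through col_idx are one reason SpMV performance tends to be bound by memory access rather than by arithmetic.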

arXiv.org e-Print Archive

Crossref

Impact of Corporate Governance Practices on Firm Capital Structure and Profitability: A Study of Selected Hotels and Restaurant Companies in Sri Lanka.

Author: Crowl L. A.
Dibble P. C. (1953 - )
Gafter N. M. (1960 - )
LeBlanc T. J.
Mellor-Crummey J. M. (1962 - )
Publication venue: The International Institute for Science, Technology and Education (IISTE)
Publication date: 29/08/2013

Corporate governance issues have been a growing area of management research, especially among large and listed firms. Good corporate governance practices are regarded as important in reducing risk for investors, attracting investment capital and improving the performance of companies. Companies need financial resources and better earnings to promote their objectives. Therefore, factors that may affect the capital structure and profitability of companies should be considered carefully. The purpose of the present study is to investigate whether there is any relationship among some specific characteristics of corporate governance, capital structure and profitability of listed Hotels & Restaurant companies on the Colombo Stock Exchange (CSE). To do so, 18 companies were selected from those listed on the CSE during 2007-2012. ‘Board Composition (BC)’, ‘Board Size (BS)’ and ‘CEO Duality (CEOD)’ were considered as independent variables, whereas ‘Debt Ratio (DR)’, ‘Debt-to-Equity Ratio (DER)’, ‘Return on Equity (ROE)’ and ‘Return on Assets (ROA)’ were considered as dependent variables. The results indicate a positive relationship among BS, BC, CEOD, ROE, ROA and DER, whereas there is a negative relationship among BS, BID and DR. In addition, CEOD has a positive relationship with DR. However, none of the variables has a significant relationship with capital structure and profitability.
Keywords: Corporate Governance; Capital Structure and Profitability

UR Research

International Institute for Science, Technology and Education (IISTE): E-Journals

On the nature of progress

Author: G. Taubenfeld
H. Attiya
J. Aspnes
J. Mellor-Crummey
L. Lamport
M. Herlihy
M.P. Herlihy
N. Lynch
S. Heller
T.L. Harris
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2011

15th International Conference, OPODIS 2011, Toulouse, France, December 13-16, 2011. Proceedings

We identify a simple relationship that unifies seemingly unrelated progress conditions, ranging from the deadlock-free and starvation-free properties common to lock-based systems to non-blocking conditions such as obstruction-freedom, lock-freedom, and wait-freedom. Properties can be classified along two dimensions based on the demands they make on the operating system scheduler. A gap in the classification reveals a new non-blocking progress condition, weaker than obstruction-freedom, which we call clash-freedom. The classification provides an intuitively appealing explanation of why programmers continue to devise data structures that mix both blocking and non-blocking progress conditions. It also explains why the wait-free property is a natural basis for the consensus hierarchy: a theory of shared-memory computation requires an independent progress condition, not one that makes demands of the operating system scheduler.

CiteSeerX

DSpace@MIT

Crossref

Barrier elision for production parallel programs

Author: Chabbi M
De Jong W
Iancu C
Lavrijsen W
Mellor-Crummey J
Sen K
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2015

Large scientific code bases are often composed of several layers of runtime libraries, implemented in multiple programming languages. In such situations, programmers often choose conservative synchronization patterns, leading to suboptimal performance. In this paper, we present context-sensitive dynamic optimizations that elide barriers that are redundant during program execution. In our technique, we perform data race detection alongside the program to identify redundant barriers in their calling contexts; after an initial learning phase, we start eliding all future instances of barriers occurring in the same calling context. We present an automatic on-the-fly optimization and a multi-pass guided optimization. We apply our techniques to NWChem, a 6-million-line computational chemistry code written in C/C++/Fortran that uses several runtime libraries such as Global Arrays, ComEx, DMAPP, and MPI. Our technique elides a surprisingly high fraction of barriers (as many as 63%) in production runs. This redundancy elimination translates to application speedups as high as 14% on 2048 cores. Our techniques also provided valuable insight into the application behavior, later used by NWChem developers. Overall, we demonstrate the value of holistic context-sensitive analyses that consider the domain science in conjunction with the associated runtime software stack.
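The abstract describes the policy at a high level: run race detection around each barrier in its calling context and, after a learning phase, skip barriers that never protected anything. The Python sketch below is a deliberately simplified, hypothetical model of that policy, not the authors' implementation. It assumes a barrier instance is redundant when no location is accessed by two different threads on opposite sides of it with at least one write, and that a calling context (here just a tuple of call-site names) may be elided once `learning` observed instances were all redundant. `barrier_is_redundant`, `BarrierElider` and `learning` are names invented for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Hashable, Iterable, List, Tuple

# One memory access: (thread id, address, is_write).
Access = Tuple[int, Hashable, bool]


def barrier_is_redundant(before: Iterable[Access],
                         after: Iterable[Access]) -> bool:
    """True if no location is touched by two different threads on opposite
    sides of the barrier with at least one of the two accesses a write,
    i.e. removing this barrier instance would not create a data race."""
    seen: Dict[Hashable, List[Tuple[int, bool]]] = defaultdict(list)
    for tid, addr, is_write in before:
        seen[addr].append((tid, is_write))
    for tid, addr, is_write in after:
        for prev_tid, prev_write in seen.get(addr, []):
            if prev_tid != tid and (is_write or prev_write):
                return False
    return True


@dataclass
class ContextStats:
    observed: int = 0      # barrier instances analyzed in this context
    needed: bool = False   # True once any instance separated a conflict


class BarrierElider:
    """Hypothetical per-calling-context policy: learn, then elide."""

    def __init__(self, learning: int = 3) -> None:
        self.learning = learning
        self.stats: Dict[Hashable, ContextStats] = defaultdict(ContextStats)

    def observe(self, context: Hashable, before: Iterable[Access],
                after: Iterable[Access]) -> None:
        # Learning phase: check this barrier instance for conflicts.
        stats = self.stats[context]
        stats.observed += 1
        if not barrier_is_redundant(before, after):
            stats.needed = True

    def should_elide(self, context: Hashable) -> bool:
        # Elide only contexts whose every observed instance was redundant.
        stats = self.stats[context]
        return not stats.needed and stats.observed >= self.learning
```

With learning=2, a context whose two observed instances only separate thread 0 writing a from thread 1 writing b becomes elidable, while a context in which thread 1 reads a after thread 0 wrote it is kept. The sketch compares only the two epochs adjacent to one barrier and ignores how eliding one barrier merges epochs for the next, which a production tool would need to handle.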

Crossref

eScholarship - University of California

Pessimistic Software Lock-Elision

Author: A. Adl-Tabatabai
C. Fetzer
D. Dice
H. Attiya
I. Keidar
J. Mellor-Crummey
M. Kapalka
M. Spear
T. Harris
T. Riegel
T. Shpeisman
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2012

Read-write locks are one of the most prevalent lock forms in concurrent applications because they allow read accesses to locked code to proceed in parallel. However, they do not offer any parallelism between reads and writes. This paper introduces pessimistic lock-elision (PLE), a new approach for non-speculatively replacing read-write locks with pessimistic (i.e., non-aborting) software transactional code that allows read-write concurrency even for contended code and even if the code includes system calls. On systems with hardware transactional support, PLE will allow failed transactions, or ones that contain system calls, to preserve read-write concurrency. Our PLE algorithm is based on a novel encounter-order design of a fully pessimistic STM system that, in a variety of benchmarks ranging from counters to trees, even when up to 40% of calls mutate the locked structure, provides up to 5 times the performance of a state-of-the-art read-write lock.

National Science Foundation (U.S.) (Grant 1217921)

CiteSeerX

DSpace@MIT

Crossref

Efficient Symmetry Reduction and the Use of State Symmetries for Symbolic Model Checking

Author: A. Emerson
A. Miller
A. P. Sistla
A. Pnueli
Allen Emerson
Amir Pnueli
Angelo Montanari
C. N. Ip
Christian Appold
E. A. Emerson
E. M. Clarke
E.A. Emerson
F. Somenzi
G. L. Peterson
I.-H. Moon
J. M. Mellor-Crummey
J. R. Burch
J.-P. Queille
M. Ben-Ari
Margherita Napoli
Mimmo Parente
T. Wahl
V. Gyuris
Publication venue: 'Open Publishing Association'
Publication date: 01/06/2010

One technique to reduce the state-space explosion problem in temporal logic model checking is symmetry reduction. The combination of symmetry reduction and symbolic model checking using BDDs long suffered from the prohibitively large BDD for the orbit relation. Dynamic symmetry reduction calculates representatives of equivalence classes of states dynamically and thus avoids the construction of the orbit relation. In this paper, we present a new efficient model checking algorithm based on dynamic symmetry reduction. Our experiments show that the algorithm is very fast and allows the verification of larger systems. We additionally implemented the use of state symmetries for symbolic symmetry reduction. To our knowledge, we are the first to investigate state symmetries in combination with BDD-based symbolic model checking.
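The core idea of dynamic symmetry reduction, mapping each state to a representative of its equivalence class on the fly instead of building an orbit relation, can be illustrated without BDDs. The explicit-state Python sketch below assumes full symmetry among identical processes, so sorting the tuple of local states yields a unique orbit representative. The toy mutual-exclusion protocol and all function names are hypothetical, and this is not the paper's symbolic algorithm.

```python
from collections import deque
from typing import Callable, Iterable, Set, Tuple

# A global state: the local state of each of N identical processes.
State = Tuple[str, ...]


def canonical(state: State) -> State:
    # Under full symmetry every permutation of process indices maps
    # behaviors to behaviors, so the sorted tuple is a unique
    # representative of the state's orbit.
    return tuple(sorted(state))


def reachable(initial: State,
              successors: Callable[[State], Iterable[State]],
              reduce: bool = True) -> Set[State]:
    """Breadth-first reachability, storing only orbit representatives
    when `reduce` is True (representatives are computed on the fly)."""
    norm = canonical if reduce else (lambda s: s)
    start = norm(initial)
    seen = {start}
    frontier = deque([start])
    while frontier:
        state = frontier.popleft()
        for nxt in successors(state):
            rep = norm(nxt)
            if rep not in seen:
                seen.add(rep)
                frontier.append(rep)
    return seen


NEXT = {"idle": "trying", "trying": "critical", "critical": "idle"}


def mutex_successors(state: State) -> Iterable[State]:
    # Hypothetical toy protocol: one process moves per step, and a process
    # may enter "critical" only if no other process is in it.
    for i, local in enumerate(state):
        target = NEXT[local]
        if target == "critical" and "critical" in state:
            continue
        yield state[:i] + (target,) + state[i + 1:]


print(len(reachable(("idle",) * 3, mutex_successors, reduce=False)))  # 20
print(len(reachable(("idle",) * 3, mutex_successors)))                # 7
```

For three processes the toy protocol has 20 reachable states without reduction but only 7 orbit representatives with it. Exploring from representatives is sound here because the successor function treats all processes alike.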

arXiv.org e-Print Archive

Crossref

Directory of Open Access Journals

Cache-Integrated Network Interfaces: Flexible On-Chip Communication and Synchronization for Large-Scale CMPs

Author: D. Lenoski
D. Wentzlaff
Dimitrios S. Nikolopoulos
H. Shan
I. Schoinas
J. Leverich
J.A. Kahle
J.M. Mellor-Crummey
K. Gharachorloo
M. Wen
M.M.K. Martin
Manolis Katevenis
Michail Zampetakis
P.S. Magnusson
S.L. Scott
S.P. Amarasinghe
S.W. Keckler
Stamatis Kavadias
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2011

Queen's University Belfast Research Portal

Crossref

Springer - Publisher Connector

Fast, contention-free combining tree barriers for shared-memory multiprocessors

Author: D. Hensgen
E. D. Brooks III
G. Graunke
J. M. Mellor-Crummey
John M. Mellor-Crummey
M. Herlihy
Michael L. Scott
P. L. Lehman
P.-C Yew
R. Gupta
T. E. Anderson
Y. Sagiv
Publication venue: 'Springer Science and Business Media LLC'

Crossref

Efficient Data Race Detection for Async-Finish Parallelism

Author: C. Flanagan
C. Sadowski
D. Lea
D. Leijen
E.A. Lee
J. Mellor-Crummey
J.-D. Choi
J.K. Lee
M. Feng
R. Barik
R.D. Blumofe
S. Agarwal
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2010

A major productivity hurdle for parallel programming is the presence of data races. Data races can lead to all kinds of harmful program behaviors, including determinism violations and corrupted memory. However, runtime overheads of current dynamic data race detectors are still prohibitively large (often incurring slowdowns of 10× or more) for use in mainstream software development. In this paper, we present an efficient dynamic race detector algorithm targeting the async-finish task-parallel programming model. The async and finish constructs are at the core of languages such as X10 and Habanero Java (HJ). These constructs generalize the spawn-sync constructs used in Cilk, while still ensuring that all computation graphs are deadlock-free. We have implemented our algorithm in a tool called TASKCHECKER and evaluated it on a suite of 12 benchmarks. To reduce the overhead of the dynamic analysis, we have also implemented various static optimizations in the tool. Our experimental results indicate that our approach performs well in practice, incurring an average slowdown of 3.05× compared to a serial execution in the optimized case.
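One well-known way to detect races during a single depth-first serial execution is the SP-bags technique from Cilk (Feng and Leiserson), generalized here so that finish scopes own the parallel bags. The Python sketch below illustrates that approach under those assumptions; it is not claimed to be TASKCHECKER's exact algorithm, it uses plain sets where real tools use disjoint-set union, and `RaceDetector`, `finish`, `async_`, `read` and `write` are hypothetical names. Each task has an S-bag of tasks that precede it serially, and each open finish has a P-bag of completed asyncs that may run in parallel with the current code, so a prior access by a task currently in a P-bag signals a race.

```python
from contextlib import contextmanager
from typing import Dict, Iterator, List, Optional, Set


class Bag:
    def __init__(self, kind: str) -> None:
        self.kind = kind                    # "S" (serial) or "P" (parallel)
        self.members: Set[int] = set()


class RaceDetector:
    """SP-bags-style race detection for async/finish (illustrative sketch).

    Must be driven by a depth-first serial execution: each async body runs
    to completion before its parent continues, which the context managers
    below enforce naturally.
    """

    def __init__(self) -> None:
        self.bag_of: Dict[int, Bag] = {}    # task -> bag currently holding it
        self.s_bag: Dict[int, Bag] = {}     # task -> its own S-bag
        self.tasks: List[int] = []          # stack of running tasks
        self.finishes: List[Bag] = [Bag("P")]  # implicit root finish
        self.writer: Dict[str, int] = {}
        self.reader: Dict[str, int] = {}
        self.races: List[str] = []
        self._next_id = 0
        self._begin_task()                  # the main task

    def _begin_task(self) -> int:
        task = self._next_id
        self._next_id += 1
        bag = Bag("S")
        bag.members.add(task)
        self.bag_of[task] = bag
        self.s_bag[task] = bag
        self.tasks.append(task)
        return task

    def _merge(self, src: Bag, dst: Bag) -> None:
        for task in src.members:
            self.bag_of[task] = dst
        dst.members |= src.members
        src.members = set()

    def _parallel(self, task: Optional[int]) -> bool:
        # A task sitting in a P-bag may run in parallel with the current one.
        return task is not None and self.bag_of[task].kind == "P"

    @contextmanager
    def finish(self) -> Iterator[None]:
        self.finishes.append(Bag("P"))
        try:
            yield
        finally:
            # End of finish: its asyncs now precede the continuation.
            done = self.finishes.pop()
            self._merge(done, self.s_bag[self.tasks[-1]])

    @contextmanager
    def async_(self) -> Iterator[None]:
        task = self._begin_task()
        try:
            yield
        finally:
            # A completed async stays parallel until its enclosing finish ends.
            self.tasks.pop()
            self._merge(self.s_bag[task], self.finishes[-1])

    def read(self, loc: str) -> None:
        if self._parallel(self.writer.get(loc)):
            self.races.append(loc)
        if not self._parallel(self.reader.get(loc)):
            self.reader[loc] = self.tasks[-1]

    def write(self, loc: str) -> None:
        if (self._parallel(self.writer.get(loc))
                or self._parallel(self.reader.get(loc))):
            self.races.append(loc)
        self.writer[loc] = self.tasks[-1]
```

Writing x inside an async and then reading x later in the same finish is reported as a race, while reading x only after that finish closes is not; two sibling asyncs that both write the same location are also reported.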

CiteSeerX

Crossref

High-throughput sequence alignment using Graphics Processing Units

Author: AL Delcher
Amitabh Varshney
Arthur L Delcher
C Shaffer
Cole Trapnell
D Gusfield
E Ukkonen
EW Myers
I Buck
J Mellor-Crummey
JD Owens
M Brudno
M Charalambous
M Hohl
M Pop
Michael C Schatz
MJ Harris
NK Govindaraju
nVidia
P Weiner
S Kurtz
SF Atschul
W Liu
W Pearson
WJ Dally
Y Juekuan
Publication venue: BioMed Central
Publication date: 01/01/2007

Background: The recent availability of new, less expensive high-throughput DNA sequencing technologies has yielded a dramatic increase in the volume of sequence data that must be analyzed. These data are being generated for several purposes, including genotyping, genome resequencing, metagenomics, and de novo genome assembly projects. Sequence alignment programs such as MUMmer have proven essential for analysis of these data, but researchers will need ever faster, high-throughput alignment tools running on inexpensive hardware to keep up with new sequencing technologies.
Results: This paper describes MUMmerGPU, an open-source high-throughput parallel pairwise local sequence alignment program that runs on commodity Graphics Processing Units (GPUs) in common workstations. MUMmerGPU uses the new Compute Unified Device Architecture (CUDA) from nVidia to align multiple query sequences against a single reference sequence stored as a suffix tree. By processing the queries in parallel on the highly parallel graphics card, MUMmerGPU achieves more than a 10-fold speedup over a serial CPU version of the sequence alignment kernel, and outperforms the exact alignment component of MUMmer on a high-end CPU by 3.5-fold in total application time when aligning reads from recent sequencing projects using Solexa/Illumina, 454, and Sanger sequencing technologies.
Conclusion: MUMmerGPU is a low-cost, ultra-fast sequence alignment program designed to handle the increasing volume of data produced by new, high-throughput sequencing technologies. MUMmerGPU demonstrates that even memory-intensive applications can run significantly faster on the relatively low-cost GPU than on the CPU.
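MUMmerGPU matches many queries against one reference indexed as a suffix tree. As a simpler stand-in (a swapped-in technique, not the paper's data structure), the Python sketch below indexes the reference with a naively built suffix array and finds every exact occurrence of each query by binary search. It reports whole-query exact matches only, whereas MUMmer reports maximal matches; the function names are hypothetical. The per-query searches are independent, which is the kind of work a GPU version can run in parallel.

```python
from typing import Dict, List, Sequence


def build_suffix_array(ref: str) -> List[int]:
    # Naive construction (sorts full suffixes); fine for a sketch,
    # far too slow and memory-hungry for real genomes.
    return sorted(range(len(ref)), key=lambda i: ref[i:])


def find_occurrences(ref: str, sa: Sequence[int], query: str) -> List[int]:
    """Return the sorted start positions where `query` occurs in `ref`."""
    m = len(query)
    # Lower bound: first suffix whose length-m prefix is >= query.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if ref[sa[mid]:sa[mid] + m] < query:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    # Upper bound: first suffix whose length-m prefix is > query.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if ref[sa[mid]:sa[mid] + m] <= query:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])


def align_all(ref: str, queries: Sequence[str]) -> Dict[str, List[int]]:
    sa = build_suffix_array(ref)
    # Each query is searched independently: this is the data-parallel part.
    return {q: find_occurrences(ref, sa, q) for q in queries}


print(align_all("banana", ["ana", "na", "x"]))
# {'ana': [1, 3], 'na': [2, 4], 'x': []}
```

The binary searches work because truncating lexicographically sorted suffixes to their first m characters keeps them in nondecreasing order, so all suffixes that start with the query form one contiguous block of the array.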

CiteSeerX

Crossref

Cold Spring Harbor Laboratory Institutional Repository

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Digital Repository at the University of Maryland