Search CORE

152 research outputs found

Doctor of Philosophy

Author: Sharma Subodh
Publication venue: University of Utah
Publication date: 01/08/2013
Field of study

dissertationMessage passing (MP) has gained a widespread adoption over the years, so much so, that even heterogeneous embedded multicore systems are running programs that are developed using message passing libraries. Such a phenomenon is a shift in computing practices, since, traditionally MP programs have been developed specifically for high performance computing. With growing importance and the complexity of MP programs in today's times, it becomes absolutely imperative to have formal tools and sound methodologies that can help reason about the correctness of the program. It has been demonstrated by many researchers in the area of concurrent program verification that a suitable strategy to verify programs which rely heavily on nondeterminism, is dynamic verification. Dynamic verification integrates the best features of testing and model checking. In the area of MP program verification, however, there have been only a handful of dynamic verifiers. These dynamic verifiers, despite their strengths, suffer from the explosion in execution scenarios. All existing dynamic verifiers, to our knowledge, exhaustively explore the nondeterministic choices in an MP program. It is apparent that an MP program with many nondeterministic constructs will quickly inundate such tools. This dissertation focuses on the problem of containing the exponential space of execution scenarios (or interleavings) while providing a soundness and completeness guarantee over safety properties of MP programs (specifically deadlocks). We present a predictive verification methodology and an associated framework, called MAAPED(Messaging Application Analysis with Predictive Error Discovery), that operates in polynomial time over MP programs to detect deadlocks among other safety property violations. In brief, we collect a single execution trace of an MP program and without re-running other execution schedules, reliably construct the artifacts necessary to predict any mishappening in an unexplored execution schedule with the aforementioned formal guarantee. The main contributions of the thesis are the following: The Functionally Irrelevant Barrier Algorithm to increase program productivity and ease in verification complexity. A sound pragmatic strategy to reduce the interleaving space of existing dynamic verifiers which is complete only for a certain class of MPI programs. A generalized matches-before ordering for MP programs. A predictive polynomial time verification framework as an alternate solution in the dynamic MP verification landscape. A soundness and completeness proof for the predictive framework's deadlock detection strategy for many formally characterized classes of MP programs. In the process of developing solutions that are mentioned above, we also collected important experiences relating to the development of dynamic verification schedulers. We present those experiences as a minor contribution of this thesis

The University of Utah: J. Willard Marriott Digital Library

Addressing Application Bottlenecks: Distributed Memory

Author: Alexander Supalov
Andrey Semin
Christopher Dahnken
Michael Klemm
Publication venue: Apress
Publication date: 01/01/2014
Field of study

Crossref

Springer - Publisher Connector

Application of satellite data in observational and theoretical studies of the evolving structure of baroclinic waves

Author: Saltzman Barry
Publication venue
Publication date
Field of study

A variety of observational and theoretical studies were performed which were designed to clarify the relationship between satellite measurements of cloud and radiation and the evolution of transient and stationary circulation in middle latitudes. Satellite outgoing longwave radiation data are used to: (1) estimate the generation of available potential energy due to infrared radiation, and (2) show the extent to which these data can provide the signature of high and low frequency weather phenomena including blocking. In a significant series of studies the nonlinear, energetical, and predictability properties of these blocking situations, and the ralationship of blocking to the planetary, scale longwave structure are described. These studies form the background for continuing efforts to describe and theoretically account for these low frequency planetary wave phenomena in terms of their bimodal properties

NASA Technical Reports Server

Shortest Path versus Multi-Hub Routing in Networks with Uncertain Demand

Author: Fréchette Alexandre
Shepherd F. Bruce
Thottan Marina K.
Winzer Peter J.
Publication venue
Publication date: 27/02/2013
Field of study

We study a class of robust network design problems motivated by the need to scale core networks to meet increasingly dynamic capacity demands. Past work has focused on designing the network to support all hose matrices (all matrices not exceeding marginal bounds at the nodes). This model may be too conservative if additional information on traffic patterns is available. Another extreme is the fixed demand model, where one designs the network to support peak point-to-point demands. We introduce a capped hose model to explore a broader range of traffic matrices which includes the above two as special cases. It is known that optimal designs for the hose model are always determined by single-hub routing, and for the fixed- demand model are based on shortest-path routing. We shed light on the wider space of capped hose matrices in order to see which traffic models are more shortest path-like as opposed to hub-like. To address the space in between, we use hierarchical multi-hub routing templates, a generalization of hub and tree routing. In particular, we show that by adding peak capacities into the hose model, the single-hub tree-routing template is no longer cost-effective. This initiates the study of a class of robust network design (RND) problems restricted to these templates. Our empirical analysis is based on a heuristic for this new hierarchical RND problem. We also propose that it is possible to define a routing indicator that accounts for the strengths of the marginals and peak demands and use this information to choose the appropriate routing template. We benchmark our approach against other well-known routing templates, using representative carrier networks and a variety of different capped hose traffic demands, parameterized by the relative importance of their marginals as opposed to their point-to-point peak demands

arXiv.org e-Print Archive

Crossref

Switching considerations in storage networks.

Author
Publication venue
Publication date: 01/01/2003
Field of study

by Leung Yiu Tong.Thesis (M.Phil.)--Chinese University of Hong Kong, 2003.Includes bibliographical references (leaves 96-98).Abstracts in English and Chinese.Chapter 1. --- Introduction --- p.1Chapter 1.1 --- Motivation --- p.1Chapter 1.2 --- Thesis Organization --- p.3Chapter 2. --- Storage Network Fundamentals --- p.4Chapter 2.1 --- Storage Network Topology --- p.4Chapter 2.1.1 --- Direct Attached Storage (DAS) --- p.5Chapter 2.1.2 --- Network Attached Storage (NAS) --- p.7Chapter 2.1.3 --- Storage Area Network (SAN) --- p.9Chapter 2.1.3.1 --- SAN and the Fibre Channel Protocol --- p.11Chapter 2.1.4 --- Summary on Storage Network Topology --- p.12Chapter 2.2 --- Storage Protocol --- p.15Chapter 2.2.1 --- Fibre Channel --- p.15Chapter 2.2.1.1 --- Fibre Channel over IP (FCIP) --- p.17Chapter 2.2.1.2 --- Internet Fibre Channel Protocol (iFCP) --- p.19Chapter 2.2.2 --- Internet SCSI (iSCSI) --- p.20Chapter 2.2.3 --- InfiniBand --- p.22Chapter 2.2.4 --- Review on Storage Network Protocol --- p.25Chapter 2.3 --- Standard Organization --- p.27Chapter 2.4 --- Summary --- p.28Chapter 3. --- Switching Design for Storage Networks --- p.30Chapter 3.1. --- Shared Bus Design --- p.32Chapter 3.2. --- Time Division Switch --- p.36Chapter 3.3. --- Share Buffer Memory Switch --- p.37Chapter 3.3.1 --- Parallel Memory Array --- p.40Chapter 3.3.2 --- Distributive Storage --- p.43Chapter 3.4. --- Crossbar Switch --- p.45Chapter 3.4.1 --- Arbitrated Crossbar vs. Buffered Crossbar --- p.46Chapter 3.4.1.1 --- Arbitrated Crossbar Switch --- p.47Chapter 3.4.1.2 --- Buffered Crossbar Switch --- p.48Chapter 3.4.2 --- Switch Scheduling --- p.49Chapter 3.4.2.1 --- Bipartite Matching --- p.50Chapter 3.4.2.2 --- Token-based Distributive Scheduling --- p.53Chapter 3.4.2.3 --- Resource Counting using Semaphore --- p.56Chapter 3.5. --- Algebraic Switches --- p.60Chapter 3.5.1 --- Switching by Conditionally Nonblocking Properties --- p.61Chapter 3.5.2 --- Self-Routing Mechanism with Zero-Bit Buffering --- p.64Chapter 3.5.3 --- Multistage Interconnection of Self-routing Concentrators --- p.69Chapter 3.6. --- Summary --- p.73Chapter 4. --- Investigating Switching Issue in Storage Networks --- p.74Chapter 4.1 --- Choosing a Suitable Switch --- p.74Chapter 4.2 --- Quality of Service (QoS) --- p.76Chapter 4.3 --- Multicasting --- p.77Chapter 4.3.1 --- Crossbar Switch --- p.78Chapter 4.3.2 --- Shared-Buffer Memory Switches --- p.80Chapter 4.3.3 --- Algebraic Switch --- p.82Chapter 4.3.4 --- Application on Multicast Transmission --- p.86Chapter 4.4 --- Load Balancing Mechanism --- p.87Chapter 4.5 --- Optimization on Storage Utilization --- p.91Chapter 4.6 --- Summary --- p.93Chapter 5. --- Conclusion and Summary of Original Contributions --- p.9

CUHK Digital Repository

High-performance computing for vision

Author: Bhat PB
Präs Anna VK
Wang CL
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/1996
Field of study

Vision is a challenging application for high-performance computing (HPC). Many vision tasks have stringent latency and throughput requirements. Further, the vision process has a heterogeneous computational profile. Low-level vision consists of structured computations, with regular data dependencies. The subsequent, higher level operations consist of symbolic computations with irregular data dependencies. Over the years, many approaches to high-speed vision have been pursued. VLSI hardware solutions such as ASIC's and digital signal processors (DSP's) have provided good processing speeds on structured low-level vision tasks. Special purpose systems for vision have also been designed. Currently, there is growing interest in using general purpose parallel systems for vision problems. These systems offer advantages of higher performance, sofavare programmability, generality, and architectural flexibility over the earlier approaches. The choice of low-cost commercial-off-theshelf (COTS) components as building blocks for these systems leads to easy upgradability and increased system life. The main focus of the paper is on effectively using the COTSbased general purpose parallel computing platforms to realize high-speed implementations of vision tasks. Due to the successful use of the COTS-based systems in a variety of high performance applications, it is attractive to consider their use for vision applications as well. However, the irregular data dependencies in vision tasks lead to large communication overheads in the HPC systems. At the University of Southern California, our research efforts have been directed toward designing scalable parallel algorithms for vision tasks on the HPC systems. In our approach, we use the message passing programming model to develop portable code. Our algorithms are specified using C and MPI. In this paper, we summarize our efforts, and illustrate our approach using several example vision tasks. To facilitate the analysis and development of scalable algorithms, a realistic computational model of the parallel system must be used. Several such models have been proposed in the literature. We use the General-purpose Distributed Memory (GDM) model which is a simple but realistic model of state-of-theart parallel machines. Using the GDM model, generic algorithmic techniques such as data remapping, overlapping of communication with computation, message packing, asynchronous execution, and communication scheduling are developed. Using these techniques, we have developed scalable algorithms for many vision tasks. For instance, a scalable algorithm for linear approximation has been developed using the asynchronous execution technique. Using this algorithm, linear feature extraction can be performed in 0.065 s on a 64 node SP-2 for a 512 × 512 image. A serial implementation takes 3.45 s for the same task. Similarly, the communication scheduling and decomposition techniques lead to a scalable algorithm for the line grouping task. We believe that such an algorithmic approach can result in the development of scalable and portable solutions for vision tasks. © 1996 IEEE Publisher Item Identifier S 0018-9219(96)04992-4.published_or_final_versio

HKU Scholars Hub

Self Adjusting Contention Friendly Concurrent Binary Search Tree by Lazy Splaying

Author: Regmee Mahesh Raj
Publication venue: Digital Scholarship@UNLV
Publication date: 01/05/2014
Field of study

We present a partial blocking implementation of concurrent binary search tree data structure that is contention friendly, fast and scales well. It uses a technique, called lazy splaying to move frequently accessed items close to the root without making the root of the tree a sequential bottleneck. Most of the self adjusting binary search trees are constrained to guarantee the height of a tree even in the presence of concurrency. But, this methodology roughly guarantees the height of a tree only in the absence of contention and limits the contention during concurrent accesses. The main idea is to divide the update operation into two operations:an eager abstract modification with lazy splayingthat completes quickly and makes at most one local rotation of the tree on each access as a function of historical access frequencies; anda lazy structural adaptation with long/semi splayingwhich implements top down recursive splaying of the tree that may be postponed to diminish contention and re-balance the tree during less contention. This way, the frequently accessed items perform full splaying but after few accesses only and always appear near the root of the tree. Whereas, the infrequently accessed items will not get enough pushes up the tree and stay in the bottom part of the tree. As in sequential counting based splay tree, the amortized time bound of each operation isO(log N), whereNis the number of items in the tree

University of Nevada, Las Vegas Repository