197 research outputs found
Enabling technology for non-rigid registration during image-guided neurosurgery
In the context of image processing, non-rigid registration is an operation that attempts to align two or more images using spatially varying transformations. Non-rigid registration finds application in medical image processing to account for the deformations in the soft tissues of the imaged organs. During image-guided neurosurgery, non-rigid registration has the potential to assist in locating critical brain structures and improve identification of the tumor boundary. Robust non-rigid registration methods combine estimation of tissue displacement based on image intensities with spatial regularization using biomechanical models of brain deformation. In practice, the use of such registration methods during neurosurgery is complicated by a number of issues: construction of the biomechanical model used in the registration from the image data, high computational demands of the application, and difficulties in assessing the registration results. In this dissertation we develop methods and tools that address some of these challenges, and provide components essential for the intra-operative application of a previously validated physics-based non-rigid registration method.
First, we study the problem of image-to-mesh conversion, which is required for constructing the biomechanical model of the brain used during registration. We develop and analyze a number of methods suitable for solving this problem, and evaluate them using application-specific quantitative metrics. Second, we develop a high-performance implementation of the non-rigid registration algorithm and study the use of geographically distributed Grid resources for speculative registration computations. Using the high-performance implementation running on the remote computing resources, we are able to deliver the results of registration within the time constraints of the neurosurgery. Finally, we present a method that estimates local alignment error between two images of the same subject. We assess the utility of this method using multiple sources of ground truth to evaluate its potential to support speculative computations of non-rigid registration.
A Preemption-Based Meta-Scheduling System for Distributed Computing
This research aims at designing and building a scheduling framework for distributed computing systems with the primary objectives of providing fast response times to users, delivering high system throughput, and accommodating the maximum number of applications in the system. The author claims that these are the most important objectives for scheduling in recent distributed computing systems, especially Grid computing environments.
In order to achieve the objectives of the scheduling framework, the scheduler employs arbitration of application-level schedules and preemption of executing jobs under certain conditions. In application-level scheduling, the user develops a schedule for his application using an execution model that simulates the execution behavior of the application. Since application-level scheduling can seriously impede the performance of the system, the scheduling framework developed in this research arbitrates between different application-level schedules corresponding to different applications to provide fair system usage for all applications and balance the interests of different applications. In this sense, the scheduling framework is not a classical scheduling system, but a meta-scheduling system that interacts with the application-level schedulers.
Due to the large system dynamics involved in Grid computing systems, the ability to preempt executing jobs becomes a necessity. The meta-scheduler described in this dissertation employs well-defined scheduling policies to preempt and migrate executing applications. In order to provide users with the capability to make their applications preemptible, a user-level check-pointing library called SRS (Stop-Restart Software) was also developed in this research. The SRS library differs from many user-level check-pointing libraries in that it allows reconfiguration of applications between migrations. This reconfiguration can be achieved by changing the processor configuration and/or data distribution.
The experimental results provided in this dissertation demonstrate the utility of the metascheduling framework for distributed computing systems. Lastly, the metascheduling framework was put to practical use by building a Grid computing system called GradSolve. GradSolve is a flexible system that allows application library writers to upload applications with different capabilities into the system. GradSolve is also unique in maintaining traces of the execution of applications and using the traces for subsequent executions of the application.
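The reconfiguration between migrations described above can be illustrated with a minimal sketch. It assumes a one-dimensional block data distribution and uses hypothetical helper names (`block_ranges`, `checkpoint`, `restart`); it is not the SRS API, which the abstract does not describe.

```python
def block_ranges(n_items, n_procs):
    """Split n_items into n_procs contiguous blocks whose sizes differ by at most one."""
    base, extra = divmod(n_items, n_procs)
    ranges, start = [], 0
    for rank in range(n_procs):
        size = base + (1 if rank < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def checkpoint(local_blocks):
    """Gather the per-process blocks into one global snapshot (stand-in for stable storage)."""
    return [x for block in local_blocks for x in block]


def restart(snapshot, n_procs):
    """Redistribute the snapshot over a possibly different number of processes."""
    return [snapshot[lo:hi] for lo, hi in block_ranges(len(snapshot), n_procs)]


data = list(range(10))
on_four = restart(data, 4)     # initial run on 4 processes
saved = checkpoint(on_four)    # stop: write the checkpoint
on_three = restart(saved, 3)   # restart on 3 processes after migration
```

Here `on_four` holds blocks of sizes 3, 3, 2, 2 and `on_three` holds blocks of sizes 4, 3, 3, so the same data survives a change in processor count.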
Toward Optimizing Distributed Programs Directed by Configurations
Networks of workstations are now viable environments for running distributed
and parallel applications. Recent advances in software interconnection
technology enable programmers to prepare applications to run in dynamically
changing environments, because module interconnection is regarded as
an essentially distinct intellectual activity, isolated
from that of implementing individual modules. But there remains the question
of how to optimize the performance of those applications for a given execution
environment: how can developers realize performance gains without paying a high
programming cost to specialize their application for the target environment?
Interconnection technology has allowed programmers to tailor and tune their
applications on distributed environments, but the traditional approach to this
process has ignored performance in favor of the graceful, seamless integration
of various software components.
Runtime support for load balancing of parallel adaptive and irregular applications
Applications critical to today's engineering research often must make use of the increased memory and processing power of a parallel machine. While advances in architecture design are leading to more and more powerful parallel systems, the software tools needed to realize their full potential are in a much less advanced state. In particular, efficient, robust, and high-performance runtime support software is critical in the area of dynamic load balancing. While the load balancing of loosely synchronous codes, such as field solvers, has been studied extensively for the past 15 years, there exists a class of problems, known as asynchronous and highly adaptive, for which the dynamic load balancing problem remains open. As we discuss, characteristics of this class of problems render compile-time or static analysis of little benefit, and complicate the dynamic load balancing task immensely.
We make two contributions to this area of research. The first is the design and development of a runtime software toolkit, known as the Parallel Runtime Environment for Multi-computer Applications, or PREMA, which provides interprocessor communication, a global namespace, a framework for the implementation of customized scheduling policies, and several such policies which are prevalent in the load balancing literature. The PREMA system is designed to support coarse-grained domain decompositions with the goals of portability, flexibility, and maintainability in mind, so that developers will quickly feel comfortable incorporating it into existing codes and developing new codes which make use of its functionality. We demonstrate that the programming model and implementation are efficient and lead to the development of robust and high-performance applications.
Our second contribution is in the area of performance modeling. In order to make the most effective use of the PREMA runtime software, certain parameters governing its execution must be set off-line. Optimal values for these parameters may be determined through repeated executions of the target application; however, this is not always possible, particularly in large-scale environments and long-running applications. We present an analytic model that allows the user to quickly and inexpensively predict application performance and fine-tune applications built on the PREMA platform.
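To make the idea of a dynamic load balancing policy concrete, the following is a toy sketch of work stealing, one policy that is common in the load balancing literature. It is a sequential, lock-step simulation with unit-cost tasks; the function names and the steal-half rule are assumptions for illustration and do not reflect PREMA's interface or its actual policies.

```python
from collections import deque


def steal_half(queues, thief):
    """Move half of the most loaded queue's tasks to the idle `thief` queue."""
    victim = max(range(len(queues)), key=lambda i: len(queues[i]))
    n = len(queues[victim]) // 2
    for _ in range(n):
        queues[thief].append(queues[victim].pop())
    return n


def run(task_lists, steal=True):
    """Simulate lock-step execution; return the number of steps needed to drain all queues."""
    queues = [deque(tasks) for tasks in task_lists]
    steps = 0
    while any(queues):
        if steal:
            # Idle processors try to take work before the next step.
            for rank, q in enumerate(queues):
                if not q:
                    steal_half(queues, rank)
        # Every processor with work executes one unit-cost task.
        for q in queues:
            if q:
                q.popleft()
        steps += 1
    return steps


initial = [list(range(8)), [], [], []]   # all work starts on processor 0
steps_balanced = run(initial)
steps_static = run(initial, steal=False)
```

With all eight tasks starting on one of four processors, the simulation finishes in 2 steps with stealing and 8 steps without it.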
GUMSMP: a scalable parallel Haskell implementation
The most widely available high performance platforms today are hierarchical,
with shared memory leaves, e.g. clusters of multi-cores, or NUMA with multiple
regions. The Glasgow Haskell Compiler (GHC) provides a number of parallel
Haskell implementations targeting different parallel architectures. In particular,
GHC-SMP supports shared memory architectures, and GHC-GUM supports
distributed memory machines. Both implementations use different, but related,
runtime system (RTS) mechanisms and achieve good performance. A specialised
RTS for the ubiquitous hierarchical architectures is lacking.
This thesis presents the design, implementation, and evaluation of a new
parallel Haskell RTS, GUMSMP, that combines shared and distributed memory
mechanisms to exploit hierarchical architectures more effectively. The design
evaluates a variety of design choices and aims to efficiently combine scalable
distributed memory parallelism, using a virtual shared heap over a hierarchical
architecture, with low-overhead shared memory parallelism on shared memory
nodes. Key design objectives in realising this system are to prefer local work,
and to exploit mostly passive load distribution with pre-fetching.
Systematic performance evaluation shows that the automatic hierarchical load
distribution policies must be carefully tuned to obtain good performance. We
investigate the impact of several policies including work pre-fetching, favouring
inter-node work distribution, and spark segregation with different export and
select policies. We present the performance results for GUMSMP, demonstrating
good scalability for a set of benchmarks on up to 300 cores. Moreover, our policies
provide performance improvements of up to a factor of 1.5 compared to
GHC-GUM.
The thesis provides a performance evaluation of distributed and shared heap
implementations of parallel Haskell on a state-of-the-art physical shared memory
NUMA machine. The evaluation exposes bottlenecks in memory management,
which limit scalability beyond 25 cores. We demonstrate that GUMSMP, which
combines both distributed and shared heap abstractions, consistently outperforms
the shared memory GHC-SMP on seven benchmarks by a factor of 3.3
on average. Specifically, we show that the best results are obtained when sharing
memory only within a single NUMA region, and using distributed memory
system abstractions across the regions.
Support for flexible and transparent distributed computing
Modern distributed computing developed from the traditional supercomputing community rooted firmly
in the culture of batch management. Therefore, the field has been dominated by queuing-based resource
managers and workflow-based job submission environments where static resource demands needed to be
determined and reserved prior to launching executions. This has made it difficult to support resource
environments (e.g. Grid, Cloud) where the available resources as well as the resource requirements
of applications may be both dynamic and unpredictable. This thesis introduces a flexible execution
model where the compute capacity can be adapted to fit the needs of applications as they change during
execution. Resource provision in this model is based on a fine-grained, self-service approach instead
of the traditional one-time, system-level model. The thesis introduces a middleware-based Application
Agent (AA) that provides a platform for the applications to dynamically interact and negotiate resources
with the underlying resource infrastructure.
We also consider the issue of transparency, i.e., hiding the provision and management of the distributed
environment. This is the key to attracting the public to use the technology. The AA not only replaces
the user-controlled process of preparing and executing an application with a transparent software-controlled
process, it also hides the complexity of selecting the right resources to ensure execution QoS. This service
is provided by an On-line Feedback-based Automatic Resource Configuration (OAC) mechanism cooperating
with the flexible execution model. The AA constantly monitors utility-based feedback from the
application during execution and thus is able to learn its behaviour and resource characteristics. This
allows it to automatically compose the most efficient execution environment on the fly and satisfy any
execution requirements defined by users. Two policies are introduced to supervise the information learning
and resource tuning in the OAC. The Utility Classification policy classifies hosts according to their
historical performance contributions to the application. According to this classification, the AA chooses
high utility hosts and withdraws low utility hosts to configure an optimum environment. The Desired
Processing Power Estimation (DPPE) policy dynamically configures the execution environment according
to the estimated desired total processing power needed to satisfy users' execution requirements.
Through the introduction of flexibility and transparency, a user is able to run a dynamic/normal
distributed application anywhere with optimised execution performance, without managing distributed
resources. Based on the standalone model, the thesis further introduces a federated resource negotiation
framework as a step towards an autonomous multi-user distributed computing world.
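The two OAC policies can be sketched roughly as follows. The abstract does not give the formulas, so the mean-utility threshold, the definition of desired power as remaining work divided by time to deadline, and all function and parameter names are assumptions made for illustration only.

```python
def classify_hosts(history, threshold):
    """Utility Classification (assumed form): split hosts by mean observed utility."""
    high, low = [], []
    for host, samples in sorted(history.items()):
        mean = sum(samples) / len(samples)
        (high if mean >= threshold else low).append(host)
    return high, low


def hosts_needed(remaining_work, deadline, host_power):
    """DPPE (assumed form): add the fastest hosts until combined power meets the desired power."""
    desired = remaining_work / deadline
    chosen, total = [], 0.0
    for host, power in sorted(host_power.items(), key=lambda kv: -kv[1]):
        if total >= desired:
            break
        chosen.append(host)
        total += power
    return desired, chosen


# Hypothetical utility feedback per host, collected during execution.
history = {"a": [0.9, 0.8], "b": [0.2, 0.3], "c": [0.6, 0.7]}
high, low = classify_hosts(history, threshold=0.5)

# Hypothetical processing power per host, in work units per second.
desired, chosen = hosts_needed(1000.0, 100.0, {"a": 5.0, "b": 2.0, "c": 6.0})
```

In this example hosts `a` and `c` are kept as high utility and `b` is withdrawn, while the power estimate asks for 10 work units per second and selects hosts `c` and `a`.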
Proceedings of the First International Workshop on Sustainable Ultrascale Computing Systems (NESUS 2014): Porto, Portugal
Proceedings of: First International Workshop on Sustainable Ultrascale Computing Systems (NESUS 2014). Porto (Portugal), August 27-28, 2014