13 research outputs found
Generalizing Amdahl’s Law for Power and Energy
Extending Amdahl\u27s law to identify optimal power-performance configurations requires considering the interactive effects of power, performance, and parallel overhead
The Limits Of DIS: Final Report
Two reports on the computational capabilities of formal DIS networks in simulating real physical events and the demands placed on DIS in the future as projected by the DIS user community.; Contents: 50514 01 Purpose -- Background -- Phase 1 -- Phase 2 --Appendices -- A On the limitation of DIS by Thomas L. Clarke -- Appendix B Useful communications in distributed interactive simulation by Scott H. Smith --Appendix C Useful communications in distributed interactive simulation (viewgraphs) by Scott H. Smith
Coupling of adaptive refinement with variational multiscale element free Galerkin method for high gradient problems
In this thesis, a new adaptive refinement coupled with variational multiscale element free Galerkin method (EFGM) is developed for solving high gradient problems. The aim of this thesis is to propose a new framework of moving least squares (MLS) approximation with coupling method based on the variational multiscale concept. Additional new nodes will be inserted automatically at high gradient regions by adaptive algorithm based on refinement criteria. An enrichment function is embedded in the MLS approximation for the fine scale part of the problem. Besides, this new technique will be parallelized by using OpenMP which is based on shared memory architecture. The proposed new approach is first applied in two-dimensional large localized gradient problem, transient heat conduction problem as well as Burgers' equation in order to analyze the accuracy of the proposed method and validated with an available analytic solutions. The obtained numerical results show a very good agreement with the analytic solutions and is able to obtain more accurate results than the standard EFGM. It is found that the average relative error of this new method is reduced in the range of 15% to 70%. Besides, this new method is also extended to solve two-dimensional sine-Gordon solitons. The results obtained show good agreement with the published results. Moreover, the parallelization of adaptive variational multiscale EFGM can improve the computational efficiency by reducing the execution time without loss of accuracy. Therefore, the capability and robustness of this new method has the potential to investigate more complicated problems in order to produce higher precision solutions with shorter computational time
Recommended from our members
A graph theoretic approach to transputer network design for computer vision
The work in this thesis is concerned with parallel architectures based on the Inmos transputer-type processors and parallelisation of some computer vision tasks chosen from low to high level.
The transputer is a microprocessor with a micro-programmed scheduler and four serial communication links. It directly supports parallel processing since several transputers can be connected through their links to co-operate on solving a problem. Also several processes can be run on the same transputer. A major issue in parallel processing is the communication overhead introduced by parallelising a given task. This overhead is not present in sequential processing and must be curbed if the implementation of a task on a parallel machine is to be successful. The interconnection network underlying the architecture of a parallel computer is therefore of the utmost importance.
Computer Vision consists of a hierarchy of tasks ranging from low-level operations dealing with large amounts of relatively simple data to high level operations handling increasingly complex structures. In this work a novel edge detector based on adaptive filtering and an edge detector operating on colour images are presented and implemented on a number of transputers. These parallel implementations together with implementations of vector quantisation, Fourier descriptors for shape discrimination, the Hough transform and the Maximum clique algorithm, offer a notable performance increase when compared with sequential implementations. However, every algorithm required the design of a specific network of transputers to take advantage of the parallelism and data dependencies inherent in each.
Consequently, attention is focused on the topology of interconnection networks. In particular, the communication requirements of computer vision algorithms as identified by the various computer vision tasks are analysed. These requirements together with graph theoretical considerations are then used to suggest a topology for large transputer networks. The latter is based on sub-graphs, with proven performance when used to implement interconnection networks, combined to form an architecture with improved performance. This architecture consists of a fixed structure supplemented with a dynamically reconfigured network. After describing this topology, a routing algorithm that conveys messages along shortest paths in the network is given and implemented. And finally, some practical issues in the use of transputers are considered and solutions proposed
Optimal seismic retrofitting of existing RC frames through soft-computing approaches
2016 - 2017Ph.D. Thesis proposes a Soft-Computing approach capable of supporting the engineer judgement in the selection and
design of the cheapest solution for seismic retrofitting of existing RC framed structure. Chapter 1 points out the need for
strengthening the existing buildings as one of the main way of decreasing economic and life losses as direct
consequences of earthquake disasters. Moreover, it proposes a wide, but not-exhaustive, list of the most frequently
observed deficiencies contributing to the vulnerability of concrete buildings. Chapter 2 collects the state of practice on
seismic analysis methods for the assessment the safety of the existing buildings within the framework of a performancebased
design. The most common approaches for modeling the material plasticity in the frame non-linear analysis are
also reviewed. Chapter 3 presents a wide state of practice on the retrofitting strategies, intended as preventive measures
aimed at mitigating the effect of a future earthquake by a) decreasing the seismic hazard demands; b) improving the
dynamic characteristics supplied to the existing building. The chapter presents also a list of retrofitting systems,
intended as technical interventions commonly classified into local intervention (also known “member-level”
techniques) and global intervention (also called “structure-level” techniques) that might be used in synergistic
combination to achieve the adopted strategy. In particular, the available approaches and the common criteria,
respectively for selecting an optimum retrofit strategy and an optimal system are discussed. Chapter 4 highlights the
usefulness of the Soft-Computing methods as efficient tools for providing “objective” answer in reasonable time for
complex situation governed by approximation and imprecision. In particular, Chapter 4 collects the applications found
in the scientific literature for Fuzzy Logic, Artificial Neural Network and Evolutionary Computing in the fields of
structural and earthquake engineering with a taxonomic classification of the problems in modeling, simulation and
optimization. Chapter 5 “translates” the search for the cheapest retrofitting system into a constrained optimization
problem. To this end, the chapter includes a formulation of a novel procedure that assembles a numerical model for
seismic assessment of framed structures within a Soft-Computing-driven optimization algorithm capable to minimize
the objective function defined as the total initial cost of intervention. The main components required to assemble the
procedure are described in the chapter: the optimization algorithm (Genetic Algorithm); the simulation framework
(OpenSees); and the software environment (Matlab). Chapter 6 describes step-by-step the flow-chart of the proposed
procedure and it focuses on the main implementation aspects and working details, ranging from a clever initialization of
the population of candidate solutions up to a proposal of tuning procedure for the genetic parameters. Chapter 7
discusses numerical examples, where the Soft-Computing procedure is applied to the model of multi-storey RC frames
obtained through simulated design. A total of fifteen “scenarios” are studied in order to assess its “robustness” to
changes in input data. Finally, Chapter 8, on the base of the outcomes observed, summarizes the capabilities of the
proposed procedure, yet highlighting its “limitations” at the current state of development. Some possible modifications
are discussed to enhance its efficiency and completeness. [edited by author]XVI n.s
High performance bioinformatics and computational biology on general-purpose graphics processing units
Bioinformatics and Computational Biology (BCB) is a relatively new
multidisciplinary field which brings together many aspects of the fields of
biology, computer science, statistics, and engineering. Bioinformatics extracts
useful information from biological data and makes these more intuitive and
understandable by applying principles of information sciences, while
computational biology harnesses computational approaches and technologies
to answer biological questions conveniently. Recent years have seen an
explosion of the size of biological data at a rate which outpaces the rate of
increases in the computational power of mainstream computer technologies,
namely general purpose processors (GPPs). The aim of this thesis is to explore
the use of off-the-shelf Graphics Processing Unit (GPU) technology in the high
performance and efficient implementation of BCB applications in order to meet
the demands of biological data increases at affordable cost.
The thesis presents detailed design and implementations of GPU solutions for
a number of BCB algorithms in two widely used BCB applications, namely
biological sequence alignment and phylogenetic analysis. Biological sequence
alignment can be used to determine the potential information about a newly
discovered biological sequence from other well-known sequences through
similarity comparison. On the other hand, phylogenetic analysis is concerned
with the investigation of the evolution and relationships among organisms,
and has many uses in the fields of system biology and comparative genomics.
In molecular-based phylogenetic analysis, the relationship between species is
estimated by inferring the common history of their genes and then
phylogenetic trees are constructed to illustrate evolutionary relationships
among genes and organisms. However, both biological sequence alignment
and phylogenetic analysis are computationally expensive applications as their computing and memory requirements grow polynomially or even worse with
the size of sequence databases.
The thesis firstly presents a multi-threaded parallel design of the Smith-
Waterman (SW) algorithm alongside an implementation on NVIDIA GPUs. A
novel technique is put forward to solve the restriction on the length of the
query sequence in previous GPU-based implementations of the SW algorithm.
Based on this implementation, the difference between two main task
parallelization approaches (Inter-task and Intra-task parallelization) is
presented. The resulting GPU implementation matches the speed of existing
GPU implementations while providing more flexibility, i.e. flexible length of
sequences in real world applications. It also outperforms an equivalent GPPbased
implementation by 15x-20x. After this, the thesis presents the first
reported multi-threaded design and GPU implementation of the Gapped
BLAST with Two-Hit method algorithm, which is widely used for aligning
biological sequences heuristically. This achieved up to 3x speed-up
improvements compared to the most optimised GPP implementations.
The thesis then presents a multi-threaded design and GPU implementation of
a Neighbor-Joining (NJ)-based method for phylogenetic tree construction and
multiple sequence alignment (MSA). This achieves 8x-20x speed up compared
to an equivalent GPP implementation based on the widely used ClustalW
software. The NJ method however only gives one possible tree which strongly
depends on the evolutionary model used. A more advanced method uses
maximum likelihood (ML) for scoring phylogenies with Markov Chain Monte
Carlo (MCMC)-based Bayesian inference. The latter was the subject of another
multi-threaded design and GPU implementation presented in this thesis,
which achieved 4x-8x speed up compared to an equivalent GPP
implementation based on the widely used MrBayes software.
Finally, the thesis presents a general evaluation of the designs and
implementations achieved in this work as a step towards the evaluation of
GPU technology in BCB computing, in the context of other computer technologies including GPPs and Field Programmable Gate Arrays (FPGA)
technology
Understanding Quantum Technologies 2022
Understanding Quantum Technologies 2022 is a creative-commons ebook that
provides a unique 360 degrees overview of quantum technologies from science and
technology to geopolitical and societal issues. It covers quantum physics
history, quantum physics 101, gate-based quantum computing, quantum computing
engineering (including quantum error corrections and quantum computing
energetics), quantum computing hardware (all qubit types, including quantum
annealing and quantum simulation paradigms, history, science, research,
implementation and vendors), quantum enabling technologies (cryogenics, control
electronics, photonics, components fabs, raw materials), quantum computing
algorithms, software development tools and use cases, unconventional computing
(potential alternatives to quantum and classical computing), quantum
telecommunications and cryptography, quantum sensing, quantum technologies
around the world, quantum technologies societal impact and even quantum fake
sciences. The main audience are computer science engineers, developers and IT
specialists as well as quantum scientists and students who want to acquire a
global view of how quantum technologies work, and particularly quantum
computing. This version is an extensive update to the 2021 edition published in
October 2021.Comment: 1132 pages, 920 figures, Letter forma
Lifetime reliability of multi-core systems: modeling and applications.
Huang, Lin.Thesis (M.Phil.)--Chinese University of Hong Kong, 2011.Includes bibliographical references (leaves 218-232).Abstracts in English and Chinese.Abstract --- p.iAcknowledgement --- p.ivChapter 1 --- Introduction --- p.1Chapter 1.1 --- Preface --- p.1Chapter 1.2 --- Background --- p.5Chapter 1.3 --- Contributions --- p.6Chapter 1.3.1 --- Lifetime Reliability Modeling --- p.6Chapter 1.3.2 --- Simulation Framework --- p.7Chapter 1.3.3 --- Applications --- p.9Chapter 1.4 --- Thesis Outline --- p.10Chapter I --- Modeling --- p.12Chapter 2 --- Lifetime Reliability Modeling --- p.13Chapter 2.1 --- Notation --- p.13Chapter 2.2 --- Assumption --- p.16Chapter 2.3 --- Introduction --- p.16Chapter 2.4 --- Related Work --- p.19Chapter 2.5 --- System Model --- p.21Chapter 2.5.1 --- Reliability of A Surviving Component --- p.22Chapter 2.5.2 --- Reliability of a Hybrid k-out-of-n:G System --- p.26Chapter 2.6 --- Special Cases --- p.31Chapter 2.6.1 --- Case I: Gracefully Degrading System --- p.31Chapter 2.6.2 --- Case II: Standby Redundant System --- p.33Chapter 2.6.3 --- Case III: l-out-of-3:G System with --- p.34Chapter 2.7 --- Numerical Results --- p.37Chapter 2.7.1 --- Experimental Setup --- p.37Chapter 2.7.2 --- Experimental Results and Discussion --- p.40Chapter 2.8 --- Conclusion --- p.43Chapter 2.9 --- Appendix --- p.44Chapter II --- Simulation Framework --- p.47Chapter 3 --- AgeSim: A Simulation Framework --- p.48Chapter 3.1 --- Introduction --- p.48Chapter 3.2 --- Preliminaries and Motivation --- p.51Chapter 3.2.1 --- Prior Work on Lifetime Reliability Analysis of Processor- Based Systems --- p.51Chapter 3.2.2 --- Motivation of This Work --- p.53Chapter 3.3 --- The Proposed Framework --- p.54Chapter 3.4 --- Aging Rate Calculation --- p.57Chapter 3.4.1 --- Lifetime Reliability Calculation --- p.58Chapter 3.4.2 --- Aging Rate Extraction --- p.60Chapter 3.4.3 --- Discussion on Representative Workload --- p.63Chapter 3.4.4 --- Numerical Validation --- p.65Chapter 3.4.5 --- Miscellaneous --- p.66Chapter 3.5 --- Lifetime Reliability Model for MPSoCs with Redundancy --- p.68Chapter 3.6 --- Case Studies --- p.70Chapter 3.6.1 --- Dynamic Voltage and Frequency Scaling --- p.71Chapter 3.6.2 --- Burst Task Arrival --- p.75Chapter 3.6.3 --- Task Allocation on Multi-Core Processors --- p.77Chapter 3.6.4 --- Timeout Policy on Multi-Core Processors with Gracefully Degrading Redundancy --- p.78Chapter 3.7 --- Conclusion --- p.79Chapter 4 --- Evaluating Redundancy Schemes --- p.83Chapter 4.1 --- Introduction --- p.83Chapter 4.2 --- Preliminaries and Motivation --- p.85Chapter 4.2.1 --- Failure Mechanisms --- p.85Chapter 4.2.2 --- Related Work and Motivation --- p.86Chapter 4.3 --- Proposed Analytical Model for the Lifetime Reliability of Proces- sor Cores --- p.88Chapter 4.3.1 --- "Impact of Temperature, Voltage, and Frequency" --- p.88Chapter 4.3.2 --- Impact of Workloads --- p.92Chapter 4.4 --- Lifetime Reliability Analysis for Multi-core Processors with Vari- ous Redundancy Schemes --- p.95Chapter 4.4.1 --- Gracefully Degrading System (GDS) --- p.95Chapter 4.4.2 --- Processor Rotation System (PRS) --- p.97Chapter 4.4.3 --- Standby Redundant System (SRS) --- p.98Chapter 4.4.4 --- Extension to Heterogeneous System --- p.99Chapter 4.5 --- Experimental Methodology --- p.101Chapter 4.5.1 --- Workload Description --- p.102Chapter 4.5.2 --- Temperature Distribution Extraction --- p.102Chapter 4.5.3 --- Reliability Factors --- p.103Chapter 4.6 --- Results and Discussions --- p.103Chapter 4.6.1 --- Wear-out Rate Computation --- p.103Chapter 4.6.2 --- Comparison on Lifetime Reliability --- p.105Chapter 4.6.3 --- Comparison on Performance --- p.110Chapter 4.6.4 --- Comparison on Expected Computation Amount --- p.112Chapter 4.7 --- Conclusion --- p.118Chapter III --- Applications --- p.119Chapter 5 --- Task Allocation and Scheduling for MPSoCs --- p.120Chapter 5.1 --- Introduction --- p.120Chapter 5.2 --- Prior Work and Motivation --- p.122Chapter 5.2.1 --- IC Lifetime Reliability --- p.122Chapter 5.2.2 --- Task Allocation and Scheduling for MPSoC Designs --- p.124Chapter 5.3 --- Proposed Task Allocation and Scheduling Strategy --- p.126Chapter 5.3.1 --- Problem Definition --- p.126Chapter 5.3.2 --- Solution Representation --- p.128Chapter 5.3.3 --- Cost Function --- p.129Chapter 5.3.4 --- Simulated Annealing Process --- p.130Chapter 5.4 --- Lifetime Reliability Computation for MPSoC Embedded Systems --- p.133Chapter 5.5 --- Efficient MPSoC Lifetime Approximation --- p.138Chapter 5.5.1 --- Speedup Technique I - Multiple Periods --- p.139Chapter 5.5.2 --- Speedup Technique II - Steady Temperature --- p.139Chapter 5.5.3 --- Speedup Technique III - Temperature Pre- calculation --- p.140Chapter 5.5.4 --- Speedup Technique IV - Time Slot Quantity Control --- p.144Chapter 5.6 --- Experimental Results --- p.144Chapter 5.6.1 --- Experimental Setup --- p.144Chapter 5.6.2 --- Results and Discussion --- p.146Chapter 5.7 --- Conclusion and Future Work --- p.152Chapter 6 --- Energy-Efficient Task Allocation and Scheduling --- p.154Chapter 6.1 --- Introduction --- p.154Chapter 6.2 --- Preliminaries and Problem Formulation --- p.157Chapter 6.2.1 --- Related Work --- p.157Chapter 6.2.2 --- Problem Formulation --- p.159Chapter 6.3 --- Analytical Models --- p.160Chapter 6.3.1 --- Performance and Energy Models for DVS-Enabled Pro- cessors --- p.160Chapter 6.3.2 --- Lifetime Reliability Model --- p.163Chapter 6.4 --- Proposed Algorithm for Single-Mode Embedded Systems --- p.165Chapter 6.4.1 --- Task Allocation and Scheduling --- p.165Chapter 6.4.2 --- Voltage Assignment for DVS-Enabled Processors --- p.168Chapter 6.5 --- Proposed Algorithm for Multi-Mode Embedded Systems --- p.169Chapter 6.5.1 --- Feasible Solution Set --- p.169Chapter 6.5.2 --- Searching Procedure for a Single Mode --- p.171Chapter 6.5.3 --- Feasible Solution Set Identification --- p.171Chapter 6.5.4 --- Multi-Mode Combination --- p.177Chapter 6.6 --- Experimental Results --- p.178Chapter 6.6.1 --- Experimental Setup --- p.178Chapter 6.6.2 --- Case Study --- p.180Chapter 6.6.3 --- Sensitivity Analysis --- p.181Chapter 6.6.4 --- Extensive Results --- p.183Chapter 6.7 --- Conclusion --- p.185Chapter 7 --- Customer-Aware Task Allocation and Scheduling --- p.186Chapter 7.1 --- Introduction --- p.186Chapter 7.2 --- Prior Work and Problem Formulation --- p.188Chapter 7.2.1 --- Related Work and Motivation --- p.188Chapter 7.2.2 --- Problem Formulation --- p.191Chapter 7.3 --- Proposed Design-Stage Task Allocation and Scheduling --- p.192Chapter 7.3.1 --- Solution Representation and Moves --- p.193Chapter 7.3.2 --- Cost Function --- p.196Chapter 7.3.3 --- Impact of DVFS --- p.198Chapter 7.4 --- Proposed Algorithm for Online Adjustment --- p.200Chapter 7.4.1 --- Reliability Requirement for Online Adjustment --- p.201Chapter 7.4.2 --- Analytical Model --- p.203Chapter 7.4.3 --- Overall Flow --- p.204Chapter 7.5 --- Experimental Results --- p.205Chapter 7.5.1 --- Experimental Setup --- p.205Chapter 7.5.2 --- Results and Discussion --- p.207Chapter 7.6 --- Conclusion --- p.211Chapter 7.7 --- Appendix --- p.211Chapter 8 --- Conclusion and Future Work --- p.214Chapter 8.1 --- Conclusion --- p.214Chapter 8.2 --- Future Work --- p.215Bibliography --- p.23