Evaluation of uncertainty in dynamic, reduced-order power system models
Thesis (Ph.D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2000. Includes bibliographical references (leaves 209-213).

With the advent of high-speed computation and the desire to analyze increasingly complex behavior in power systems, simulation techniques are gaining importance and prevalence. However, while simulations of large, interconnected power systems are feasible, they remain time-consuming. Additionally, the models and parameters used in simulations are uncertain, due to measurement uncertainty, the need to approximate complex behavior with low-order models, and the inherent changing nature of the power system. This thesis explores the use of model reduction techniques to enable the study of uncertainty in large-scale power system models. The main goal of this thesis is to demonstrate that uncertainty analyses of transient simulations of large, interconnected power systems are possible. To achieve this, we demonstrate that a basic three-stage approach to the problem yields useful results without significantly increasing the computational burden. The first stage is to reduce the order of the original power system model, which reduces simulation times and allows the system to be simulated multiple times in a reasonable time-frame. Second, the mechanics of the model reduction are closely studied; how uncertainties affect the reduction process and the parameters in the reduced-order model, as well as how the process of reduction increases uncertainty, are of particular interest. Third, the reduced-order model and its accompanying uncertainty description are used to study the uncertainty of the original model. Our demonstration uses a particular model reduction technique, synchronic modal equivalencing (SME), and a particular uncertainty analysis method, the probabilistic collocation method (PCM). Though our ideas are applicable more generally, a concrete demonstration of the principle is instructive and necessary.
Further, while these particular techniques are not relevant to every system, they do apply to a broad class of systems and illustrate the salient features of our methodology. As mentioned above, a detailed analysis of the model reduction technique, in this case SME, is necessary. As an ancillary benefit of the thesis work, interesting theoretical results relevant to the SME algorithm, which is still under development, are derived.

by James R. Hockenberry. Ph.D.
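The general idea behind collocation-style uncertainty analysis like PCM can be sketched as follows: evaluate the (reduced-order) model at quadrature points chosen from the uncertain parameter's distribution and estimate output statistics from weighted evaluations. The model and parameter values below are toy placeholders, not the thesis's power system model.

```python
import numpy as np

def collocation_moments(model, mean, std, order=3):
    # Gauss-Hermite nodes/weights for weight exp(-t^2);
    # rescale the nodes for a parameter distributed as N(mean, std^2)
    t, w = np.polynomial.hermite.hermgauss(order)
    params = mean + np.sqrt(2.0) * std * t
    outputs = np.array([model(p) for p in params])
    weights = w / np.sqrt(np.pi)            # normalized so they sum to 1
    out_mean = weights @ outputs
    out_var = weights @ (outputs - out_mean) ** 2
    return out_mean, out_var

# Toy stand-in for a reduced-order model: output quadratic in the parameter
mean, var = collocation_moments(lambda p: p ** 2 + 1.0, mean=0.0, std=1.0)
# For p ~ N(0, 1), the exact mean of p^2 + 1 is 2 and the variance is 2
```

Because a three-point Gauss-Hermite rule integrates polynomials up to degree five exactly, the quadratic toy model's moments come out exact here; a real power system model would only be approximated.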
Fourteenth NASTRAN (R) Users' Colloquium
The proceedings of a colloquium are presented along with technical papers contributed during the conference. Reviewed are general applications of finite element methodology and the specific application of the NASA Structural Analysis System, NASTRAN, to a variety of static and dynamic structural problems.
Hybrid Analog-Digital Co-Processing for Scientific Computation
In the past 10 years, computer architecture research has moved to more heterogeneity and less adherence to conventional abstractions. Scientists and engineers hold an unshakable belief that computing holds keys to unlocking humanity's Grand Challenges. Acting on that belief, they have looked deeper into computer architecture to find specialized support for their applications. Likewise, computer architects have looked deeper into circuits and devices in search of untapped performance and efficiency. The lines between computer architecture layers---applications, algorithms, architectures, microarchitectures, circuits and devices---have blurred. Against this backdrop, a menagerie of computer architectures is on the horizon, ones that forgo basic assumptions about computer hardware and require new thinking about how such hardware supports problems and algorithms.
This thesis is about revisiting hybrid analog-digital computing in support of diverse modern workloads. Hybrid computing had extensive applications in early computing history, and has been revisited for small-scale applications in embedded systems. But architectural support for using hybrid computing in modern workloads, at scale and with high accuracy solutions, has been lacking.
I demonstrate solving a variety of scientific computing problems, including stochastic ODEs, partial differential equations, linear algebra, and nonlinear systems of equations, as case studies in hybrid computing. I solve these problems on a system of multiple prototype analog accelerator chips built by a team at Columbia University. On that team I made contributions toward programming the chips, building the digital interface, and validating the chips' functionality. The analog accelerator chip is intended for use in conjunction with a conventional digital host computer.
The appeal of and motivation for using an analog accelerator are efficiency and performance, but the accelerator comes with limitations in accuracy and problem size that we have to work around.
The first problem is how to express problems in this unconventional computation model. Scientific computing phrases problems as differential equations and algebraic equations. Differential equations are a continuous view of the world, while algebraic equations are a discrete one. Prior work in analog computing mostly focused on differential equations; algebraic equations played only a minor role. The secret to using the analog accelerator to support modern workloads on conventional computers is that these two viewpoints are interchangeable. The algebraic equations that underlie most workloads can be solved as differential equations, and differential equations are naturally solvable in the analog accelerator chip. A hybrid analog-digital computer architecture can therefore focus on solving linear and nonlinear algebra problems to support many workloads.
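The interchangeability claim can be made concrete with a minimal sketch: the linear system A x = b is the steady state of dx/dt = b - A x, which is the kind of flow an analog integrator array evolves in continuous time. Here explicit Euler steps stand in for the continuous-time hardware; the matrix and values are illustrative only.

```python
import numpy as np

# The linear system A x = b is the steady state of dx/dt = b - A x.
# An analog integrator array would evolve this flow continuously;
# explicit Euler steps stand in for the hardware here.
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])        # positive definite, so the flow converges
b = np.array([5.0, 5.0])

x = np.zeros(2)
dt = 0.1                          # small enough for stability (dt < 2 / lambda_max)
for _ in range(500):
    x = x + dt * (b - A @ x)      # one Euler step of the flow
# x now approximates the solution [1.0, 2.0] of A x = b
```

The same trick extends to nonlinear systems f(x) = 0 by evolving dx/dt = -f(x), provided the flow is stable around the desired root.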
The second problem is how to get accurate solutions using hybrid analog-digital computing. The reason the analog computation model gives less accurate solutions is that it gives up representing numbers as digital binary numbers, and instead uses the full range of analog voltage and current to represent real numbers. Prior work has established that encoding data in analog signals gives an energy efficiency advantage as long as the analog data precision is limited. While the analog accelerator alone may be useful for energy-constrained applications where inputs and outputs are imprecise, we are more interested in using analog in conjunction with digital for precise solutions. This thesis gives the novel insight that the trick to doing so is to solve nonlinear problems where low-precision guesses are useful starting points for conventional digital algorithms.
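That insight can be sketched as follows: a low-precision guess (here simply a truncated value, standing in for an analog solve) seeds a digital Newton iteration, which recovers full precision in a few steps. This is a toy example, not the thesis's workload.

```python
# A low-precision "analog" guess seeds a full-precision digital
# Newton iteration; a handful of steps recovers machine precision.
def newton(f, df, x0, steps=6):
    x = x0
    for _ in range(steps):
        x = x - f(x) / df(x)       # standard Newton update
    return x

f = lambda x: x ** 3 - 2.0         # root is 2 ** (1/3), about 1.2599
df = lambda x: 3.0 * x ** 2

coarse_guess = 1.25                # pretend this came from an analog solve
root = newton(f, df, coarse_guess)
```

Because Newton's method converges quadratically near a root, even a two-decimal-digit seed reaches machine precision in a handful of iterations, which is what makes a low-precision analog front end useful.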
The third problem is how to solve large problems using hybrid analog-digital computing. The reason the analog computation model can't handle large problems is that it gives up step-by-step discrete-time operation, instead allowing variables to evolve smoothly in continuous time. To make that happen, the analog accelerator chains hardware for mathematical operations end-to-end. During computation, analog data flows through the hardware with no overheads in control logic and memory accesses. The downside is that the required hardware grows with problem size. While scientific computing researchers have long split large problems into smaller subproblems to fit digital computer constraints, this thesis is a first attempt to treat these divide-and-conquer algorithms as an essential tool in using the analog model of computation.
As we enter the post-Moore's law era of computing, unconventional architectures will offer specialized models of computation that uniquely support specific problem types. Two prominent examples are deep neural networks and quantum computers. Recent trends in computer science research show these unconventional architectures will soon have broad adoption. In this thesis I show that another specialized, unconventional architecture, the analog accelerator, can solve problems in scientific computing. Computer architecture researchers will discover other important models of computation in the future. This thesis is an example of the discovery process, implementation, and evaluation of how an unconventional architecture supports specialized workloads.
Towards Efficient Hardware Acceleration of Deep Neural Networks on FPGA
Deep neural networks (DNNs) have achieved remarkable success in many applications because of their powerful capability for data processing. Their performance in computer vision has matched and in some areas even surpassed human capabilities. Deep neural networks can capture complex nonlinear features; however, this ability comes at the cost of high computational and memory requirements. State-of-the-art networks require billions of arithmetic operations and millions of parameters. The brute-force computing model of DNNs often requires extremely large hardware resources, raising severe concerns about its scalability on traditional von Neumann architectures. The well-known memory wall and the latency brought by the long-range connectivity and communication of DNNs severely constrain their computation efficiency. DNN acceleration techniques, whether software or hardware, often suffer from the poor hardware execution efficiency of simplified models (software) or from inevitable accuracy degradation and a limited set of supported algorithms (hardware). In order to preserve inference accuracy and make the hardware implementation more efficient, a close investigation of hardware/software co-design methodologies for DNNs is needed.
The proposed work first presents an FPGA-based implementation framework for Recurrent Neural Network (RNN) acceleration. At the architectural level, we improve the parallelism of the RNN training scheme and reduce the computing resource requirement to enhance computation efficiency. The hardware implementation primarily targets reducing the data communication load. Secondly, we propose a data-locality-aware sparse matrix-vector multiplication (SpMV) kernel. At the software level, we reorganize a large sparse matrix into many modest-sized blocks by adopting hypergraph-based partitioning and clustering. Available hardware constraints have been taken into consideration for memory allocation and data access regularization. Thirdly, we present a holistic acceleration of sparse convolutional neural networks (CNNs). During network training, the data locality is regularized to ease the hardware mapping. The distributed architecture enables high computation parallelism and data reuse. The proposed research results in a hardware/software co-design methodology for fast and accurate DNN acceleration, through innovations in algorithm optimization, hardware implementation, and the interactive design process across these two domains.
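The locality idea behind a blocked SpMV kernel can be sketched as follows: split the matrix into modest-sized column blocks so each block reuses a small slice of the input vector while it is hot in cache. The thesis uses hypergraph-based partitioning of a sparse matrix; fixed-width blocks over a small dense array are a simplification, and all names here are illustrative.

```python
import numpy as np

# Split the matrix into modest-sized column blocks so each block's
# slice of x is reused while it stays hot in cache. (The thesis uses
# hypergraph-based partitioning of a sparse matrix; fixed-width
# blocks over a dense numpy array are a simplification.)
def blocked_spmv(A, x, block=2):
    y = np.zeros(A.shape[0])
    for start in range(0, A.shape[1], block):
        stop = min(start + block, A.shape[1])
        y += A[:, start:stop] @ x[start:stop]   # reuses x[start:stop]
    return y

A = np.array([[4.0, 0.0, 1.0, 0.0],
              [0.0, 3.0, 0.0, 2.0],
              [1.0, 0.0, 5.0, 0.0],
              [0.0, 2.0, 0.0, 6.0]])
x = np.array([1.0, 2.0, 3.0, 4.0])
y = blocked_spmv(A, x)             # identical result to A @ x
```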
Fast prediction of transonic aeroelasticity using computational fluid dynamics
The exploitation of computational fluid dynamics for nonlinear aeroelastic simulations is mainly based on time-domain simulations of the Euler and Navier-Stokes equations coupled with structural models. Current industrial practice relies heavily on linear methods, which can lead to conservative designs and flight envelope restrictions.
The significant aeroelastic effects caused by nonlinear aerodynamics include the transonic flutter dip and limit cycle oscillations. An intensive research effort is underway to account for aerodynamic nonlinearity at a practical computational cost. To achieve this, a large reduction in the number of degrees of freedom is required; this leads to the construction of reduced-order models which, compared with CFD simulations, provide an accurate description of the dynamical system at much lower cost.
In this thesis we consider limit cycle oscillations as local bifurcations of equilibria which are associated with degenerate behaviour of a system of linearised aeroelastic equations. This extra information can be used to formulate a method for the augmented solve of the onset point of instability, the flutter point. This method retains all the fidelity of the original aeroelastic equations at much lower cost, as the stability calculation has been reduced from multiple unsteady computations to a single steady-state one. Once the flutter point has been found, centre manifold theory is used to reduce the full-order system to two degrees of freedom. The thesis describes three methods for finding stability boundaries, the calculation of reduced-order models for damping, and the prediction of limit cycle oscillations. Results are shown for aerofoils and for the AGARD, Goland, and a supercritical transport wing. It is shown that the methods presented allow results comparable to the full-order system predictions to be obtained with CPU time reductions of between one and three orders of magnitude.
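As an illustrative stand-in for the stability-boundary idea (not the augmented solver of the thesis), one can take a toy linearised two-degree-of-freedom system A(mu) and bisect on the parameter mu to find where the largest eigenvalue real part crosses zero, i.e. the onset of instability. The matrix and parameter below are assumptions for the sketch.

```python
import numpy as np

# Toy linearised two-degree-of-freedom system A(mu): the parameter mu
# plays the role of flight speed, and instability begins where the
# largest eigenvalue real part crosses zero. Bisection stands in for
# the augmented steady-state solve described in the thesis.
def max_real_eig(mu):
    A = np.array([[0.0, 1.0],
                  [-1.0, mu]])     # damping-like term mu controls stability
    return np.max(np.linalg.eigvals(A).real)

lo, hi = -1.0, 1.0                 # stable at lo, unstable at hi
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if max_real_eig(mid) > 0.0:
        hi = mid                   # mid is unstable: boundary lies below
    else:
        lo = mid                   # mid is stable: boundary lies above
flutter_mu = 0.5 * (lo + hi)       # crossing is at mu = 0 for this toy system
```

A bisection needs many eigenvalue evaluations; the appeal of the augmented formulation in the thesis is precisely that it locates the crossing with a single steady-state solve.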
Scalable Graph Analysis and Clustering on Commodity Hardware
The abundance of large-scale datasets in both industry and academia has led to a need for scalable data analysis frameworks and libraries. This need is especially apparent for large-scale graph datasets. The vast majority of existing frameworks focus on distributing computation within a cluster while neglecting to fully utilize each individual node, leading to poor overall performance. This thesis is motivated by the prevalence of Non-Uniform Memory Access (NUMA) architectures within multicore machines and by advancements in the performance of external memory devices such as SSDs. This thesis focuses on the development of machine learning frameworks, libraries, and application development principles that enable scalable data analysis with minimal resource consumption. We develop novel optimizations that leverage fine-grain I/O and NUMA-awareness to advance the state of the art in scalable graph analytics and machine learning.
We focus on minimality, scalability, and memory parallelism when data reside (i) in memory, (ii) in semi-external memory, or (iii) in distributed memory. We target two core areas: (i) graph analytics and (ii) community detection (clustering).

The semi-external memory (SEM) paradigm is an attractive middle ground for limited resource consumption and near-in-memory performance on a single thick compute node. In recent years, its adoption has steadily risen in popularity with framework developers, despite limited adoption by application developers. We address key questions surrounding the development of state-of-the-art applications within an SEM, vertex-centric graph framework. Our target is to lower the barrier to entry for SEM, vertex-centric application development. As such, we develop Graphyti, a library of highly optimized SEM applications built on the FlashGraph framework. We use this library to identify the core principles that underlie the development of state-of-the-art vertex-centric graph applications in SEM.

We then address scaling the task of community detection through clustering under arbitrary hardware budgets. We develop clusterNOR, an extensible clustering framework and library with facilities for optimized scale-out and scale-up computation. In sum, this thesis develops key SEM design principles for graph analytics and introduces novel algorithmic and systems-oriented optimizations for scalable algorithms that follow a two-step Majorize-Minimization or Minorize-Maximization (MM) objective-function optimization pattern. The optimizations we develop enable the applications and libraries provided to attain state-of-the-art performance in varying memory settings.
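The two-step MM pattern can be illustrated with Lloyd's k-means algorithm, a classic minimize-minimize instance in which the assignment step and the centroid-update step each decrease the same sum-of-squared-distances objective. This is a toy sketch of the pattern, not clusterNOR code.

```python
import numpy as np

# Lloyd's k-means as a two-step minimize-minimize instance of the MM
# pattern: the assignment step and the centroid-update step each
# decrease the same sum-of-squared-distances objective.
def lloyd(X, centroids, iters=10):
    for _ in range(iters):
        # step 1: assign each point to its nearest centroid
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # step 2: move each centroid to the mean of its assigned points
        for k in range(len(centroids)):
            if np.any(labels == k):
                centroids[k] = X[labels == k].mean(axis=0)
    return labels, centroids

X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
init = np.array([[0.0, 0.1], [4.0, 4.0]])
labels, cents = lloyd(X, init)     # recovers the two tight clusters
```

Because both steps only touch one block of variables at a time, the pattern decomposes naturally across NUMA nodes, external memory, or a cluster, which is what makes it a useful target for systems optimization.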
Parallel implementation of the finite element method on shared memory multiprocessors
PhD Thesis

The work presented in this thesis concerns parallel methods for finite element analysis. The research has been funded by British Gas and some of the presented material involves work on their software. Practical problems involving the finite element method can use a large amount of processing power, and the execution times can be very long. It is consequently important to investigate the possibilities for the parallel implementation of the method. The research has been carried out on an Encore Multimax, a shared memory multiprocessor with 14 identical CPUs.

We first experimented with autoparallelising a large British Gas finite element program (GASP4) using Encore's parallelising Fortran compiler (epf). The parallel program generated by epf proved not to be efficient. The main reasons are the complexity of the code and small-grain parallelism. Since the program is hard for the compiler to analyse at high levels, only small-grain parallelism has been inserted automatically into the code. This involves a great deal of low-level synchronisation, which produces large overheads and causes inefficiency. A detailed analysis of the autoparallelised code has been made with a view to determining the reasons for the inefficiency. Suggestions have also been made about writing programs such that they are suitable for efficient autoparallelisation.
The finite element method consists of the assembly of a stiffness matrix and the solution of a set of simultaneous linear equations. A sparse representation of the stiffness matrix has been used to allow experimentation on large problems. Parallel assembly techniques for the sparse representation have been developed. Some of these methods have proved to be very efficient, giving speed-ups that are near ideal.
For the solution phase, we have used the preconditioned conjugate gradient method (PCG). An incomplete LU factorization of the stiffness matrix with no fill-in (ILU(0)) has been found to be an effective preconditioner. The factors can be obtained at a low cost. We have parallelised all the steps of the PCG method. The main bottleneck is the triangular solves (preconditioning operations) at each step. Two parallel methods of triangular solution have been implemented. One is based on level scheduling (row-oriented parallelism) and the other is a new approach called independent columns (column-oriented parallelism). The algorithms have been tested for row and red-black orderings of the nodal unknowns in the finite element meshes considered.
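The level-scheduling idea can be sketched as follows: in a sparse lower-triangular solve, row i can be computed as soon as every row it depends on is done, so rows are grouped into levels that are each internally parallel. This is a toy sketch of the scheduling step only; the matrix and names are illustrative.

```python
# Group the rows of a sparse lower-triangular matrix into "levels":
# row i belongs to level 1 + max(level of the rows it depends on),
# so all rows within one level can be solved in parallel.
def levels(lower):
    # lower[i] = column indices of the off-diagonal nonzeros in row i
    level = {}
    for i in sorted(lower):
        level[i] = 1 + max((level[j] for j in lower[i]), default=0)
    return level

# Row 0 depends on nothing; rows 1 and 2 depend only on row 0;
# row 3 depends on rows 1 and 2.
L = {0: [], 1: [0], 2: [0], 3: [1, 2]}
lv = levels(L)                     # {0: 1, 1: 2, 2: 2, 3: 3}
```

The number of levels bounds the sequential depth of the solve; orderings such as red-black aim to keep the levels few and wide, which matches the parallel performance observations reported below.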
The best speed-ups obtained are 7.29 (on 12 processors) for level scheduling and 7.11 (on 12 processors) for independent columns. Red-black ordering gives rise to better parallel performance than row ordering in general. An analysis of methods for the improvement of the parallel efficiency has been made.

British Gas