
    Evaluation of uncertainty in dynamic, reduced-order power system models

    Thesis (Ph.D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2000. Includes bibliographical references (leaves 209-213).

    With the advent of high-speed computation and the desire to analyze increasingly complex behavior in power systems, simulation techniques are gaining importance and prevalence. However, while simulations of large, interconnected power systems are feasible, they remain time-consuming. Additionally, the models and parameters used in simulations are uncertain, due to measurement uncertainty, the need to approximate complex behavior with low-order models, and the inherently changing nature of the power system. This thesis explores the use of model reduction techniques to enable the study of uncertainty in large-scale power system models.

    The main goal of this thesis is to demonstrate that uncertainty analyses of transient simulations of large, interconnected power systems are possible. To achieve this, we demonstrate that a basic three-stage approach to the problem yields useful results without significantly increasing the computational burden. The first stage is to reduce the order of the original power system model, which reduces simulation times and allows the system to be simulated multiple times in a reasonable time frame. Second, the mechanics of the model reduction are closely studied; of particular interest are how uncertainties affect the reduction process and the parameters in the reduced-order model, as well as how the process of reduction itself increases uncertainty. Third, the reduced-order model and its accompanying uncertainty description are used to study the uncertainty of the original model.

    Our demonstration uses a particular model reduction technique, synchronic modal equivalencing (SME), and a particular uncertainty analysis method, the probabilistic collocation method (PCM). Though our ideas are applicable more generally, a concrete demonstration of the principle is instructive and necessary. Further, while these particular techniques are not relevant to every system, they do apply to a broad class of systems and illustrate the salient features of our methodology. As mentioned above, a detailed analysis of the model reduction technique, in this case SME, is necessary. As an ancillary benefit of the thesis work, interesting theoretical results relevant to the SME algorithm, which is still under development, are derived.

    by James R. Hockenberry. Ph.D.
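    The probabilistic collocation idea can be sketched in a few lines: evaluate the model only at quadrature points of the uncertain parameter's distribution and recombine the results with the quadrature weights to obtain output statistics. The model below is a hypothetical scalar toy response standing in for a reduced-order power system model:

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

# Hypothetical "reduced-order model": a decaying response whose rate
# depends on an uncertain parameter theta ~ N(0, 1). Illustrative only.
def model_response(theta, t=1.0):
    damping = 0.5 + 0.1 * theta   # uncertain damping coefficient
    return np.exp(-damping * t)

# Collocation: run the model only at the Gauss-Hermite points of the
# parameter distribution, then combine with the normalised weights.
def pcm_statistics(model, n_points=5):
    nodes, weights = hermegauss(n_points)   # probabilists' Hermite rule
    weights = weights / weights.sum()       # normalise to a probability measure
    samples = np.array([model(x) for x in nodes])
    mean = np.dot(weights, samples)
    var = np.dot(weights, (samples - mean) ** 2)
    return mean, var

mean, var = pcm_statistics(model_response)
print(f"mean = {mean:.4f}, variance = {var:.6f}")
```

    Five model evaluations replace a large Monte Carlo ensemble, which is the computational point of the method.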

    Fourteenth NASTRAN (R) Users' Colloquium

    The proceedings of a colloquium are presented along with technical papers contributed during the conference. Reviewed are general applications of finite element methodology and the specific application of the NASA Structural Analysis System, NASTRAN, to a variety of static and dynamic structural problems.

    Towards Efficient Hardware Acceleration of Deep Neural Networks on FPGA

    Deep neural networks (DNNs) have achieved remarkable success in many applications because of their powerful capability for data processing. Their performance in computer vision has matched, and in some areas even surpassed, human capabilities. Deep neural networks can capture complex nonlinear features; however, this ability comes at the cost of high computational and memory requirements. State-of-the-art networks require billions of arithmetic operations and millions of parameters. The brute-force computing model of DNNs often requires extremely large hardware resources, introducing severe concerns about its scalability on traditional von Neumann architectures. The well-known memory wall, and the latency brought by the long-range connectivity and communication of DNNs, severely constrain their computation efficiency. Acceleration techniques for DNNs often suffer either from poor hardware execution efficiency of the simplified model (software approaches) or from inevitable accuracy degradation and a limited set of supportable algorithms (hardware approaches). In order to preserve inference accuracy while making the hardware implementation more efficient, a close investigation of hardware/software co-design methodologies for DNNs is needed.

    The proposed work first presents an FPGA-based implementation framework for Recurrent Neural Network (RNN) acceleration. At the architectural level, we improve the parallelism of the RNN training scheme and reduce the computing resource requirement to enhance computation efficiency. The hardware implementation primarily targets reducing the data communication load. Secondly, we propose a data locality-aware sparse matrix-vector multiplication (SpMV) kernel. At the software level, we reorganize a large sparse matrix into many modest-sized blocks by adopting hypergraph-based partitioning and clustering. Available hardware constraints are taken into consideration for memory allocation and data access regularization. Thirdly, we present a holistic acceleration of sparse convolutional neural networks (CNNs). During network training, data locality is regularized to ease the hardware mapping, and the distributed architecture enables high computation parallelism and data reuse. The proposed research results in a hardware/software co-design methodology for fast and accurate DNN acceleration, through innovations in algorithm optimization, hardware implementation, and the interactive design process across these two domains.
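    The locality-aware SpMV idea can be sketched as processing the matrix in column blocks so that only a small window of the input vector is active at a time. This is a minimal NumPy/SciPy illustration with fixed-width blocks; the thesis chooses block boundaries by hypergraph partitioning, which this sketch does not implement:

```python
import numpy as np
from scipy.sparse import random as sparse_random

# Blocked SpMV: each column block touches only a small slice of the dense
# vector, so that slice can stay cache- (or on-chip-) resident.
def blocked_spmv(A_csr, x, block_cols=256):
    y = np.zeros(A_csr.shape[0])
    for start in range(0, A_csr.shape[1], block_cols):
        stop = min(start + block_cols, A_csr.shape[1])
        block = A_csr[:, start:stop]    # column slice of the sparse matrix
        y += block @ x[start:stop]      # reuses only a small vector window
    return y

A = sparse_random(1000, 1000, density=0.01, format="csr", random_state=0)
x = np.ones(1000)
assert np.allclose(blocked_spmv(A, x), A @ x)
```

    On real hardware the win comes from data reuse within each block, not from the Python loop itself, which is purely illustrative.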

    Fast prediction of transonic aeroelasticity using computational fluid dynamics

    The exploitation of computational fluid dynamics for nonlinear aeroelastic simulations is mainly based on time-domain simulations of the Euler and Navier-Stokes equations coupled with structural models. Current industrial practice relies heavily on linear methods, which can lead to conservative designs and flight envelope restrictions. The significant aeroelastic effects caused by nonlinear aerodynamics include the transonic flutter dip and limit cycle oscillations. An intensive research effort is underway to account for aerodynamic nonlinearity at a practical computational cost. To achieve this, a large reduction in the number of degrees of freedom is required, leading to the construction of reduced-order models which, compared with CFD simulations, provide an accurate description of the dynamical system at much lower cost. In this thesis we consider limit cycle oscillations as local bifurcations of equilibria, which are associated with degenerate behaviour of a system of linearised aeroelastic equations. This extra information can be used to formulate a method for an augmented solve of the onset point of instability - the flutter point. This method retains all the fidelity of the original aeroelastic equations at much lower cost, as the stability calculation has been reduced from multiple unsteady computations to a single steady-state one. Once the flutter point has been found, centre manifold theory is used to reduce the full-order system to two degrees of freedom. The thesis describes three methods for finding stability boundaries, and the calculation of reduced-order models for damping and for limit cycle oscillation predictions. Results are shown for aerofoils, the AGARD and Goland wings, and a supercritical transport wing. It is shown that the methods presented allow results comparable to the full-order system predictions to be obtained with CPU time reductions of between one and three orders of magnitude.
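    The idea of solving directly for the onset of instability, rather than running many unsteady simulations, can be illustrated on a toy linear system: bisect a speed-like parameter for the point where an eigenvalue of the linearised system crosses into the right half-plane. The coefficients are illustrative, not from any real wing model:

```python
import numpy as np

# Toy 2-DOF oscillator x'' + c(V) x' + k x = 0 whose damping is eroded by
# a speed-like parameter V; the "flutter point" is where c(V) changes sign.
def system_matrix(V):
    c = 0.2 - 0.05 * V            # illustrative aerodynamic damping law
    k = 1.0
    return np.array([[0.0, 1.0],
                     [-k,  -c ]])

def max_real_eig(V):
    """Stability indicator: largest real part of the linearised spectrum."""
    return np.max(np.linalg.eigvals(system_matrix(V)).real)

# Bisection on the indicator finds the onset speed from steady eigenvalue
# computations alone, echoing the augmented-solve idea in the thesis.
lo, hi = 0.0, 10.0
for _ in range(60):
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if max_real_eig(mid) < 0 else (lo, mid)
print(f"flutter onset near V = {0.5 * (lo + hi):.3f}")
```

    For this damping law the crossing is at V = 4, which the bisection recovers; the thesis instead solves an augmented steady system whose unknowns include the bifurcation point itself.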

    Scalable Graph Analysis and Clustering on Commodity Hardware

    The abundance of large-scale datasets in both industry and academia today has led to a need for scalable data analysis frameworks and libraries. This need is especially apparent for large-scale graph datasets. The vast majority of existing frameworks focus on distributing computation within a cluster, neglecting to fully utilize each individual node, which leads to poor overall performance. This thesis is motivated by the prevalence of Non-Uniform Memory Access (NUMA) architectures within multicore machines and by advancements in the performance of external memory devices like SSDs.

    This thesis focuses on the development of machine learning frameworks, libraries, and application development principles that enable scalable data analysis with minimal resource consumption. We develop novel optimizations that leverage fine-grained I/O and NUMA-awareness to advance the state of the art in scalable graph analytics and machine learning. We focus on minimality, scalability, and memory parallelism when data reside (i) in memory, (ii) semi-externally, or (iii) in distributed memory. We target two core areas: (i) graph analytics and (ii) community detection (clustering).

    The semi-external memory (SEM) paradigm is an attractive middle ground between limited resource consumption and near-in-memory performance on a single thick compute node. In recent years, its adoption has steadily risen in popularity with framework developers, despite limited adoption by application developers. We address key questions surrounding the development of state-of-the-art applications within an SEM, vertex-centric graph framework. Our goal is to lower the barrier to entry for SEM, vertex-centric application development. To this end, we develop Graphyti, a library of highly optimized applications in semi-external memory built on the FlashGraph framework. We use this library to identify the core principles that underlie the development of state-of-the-art vertex-centric graph applications in SEM. We then address scaling the task of community detection through clustering under arbitrary hardware budgets, developing the clusterNOR extensible clustering framework and library with facilities for optimized scale-out and scale-up computation.

    In summary, this thesis develops key SEM design principles for graph analytics and introduces novel algorithmic and systems-oriented optimizations for scalable algorithms that follow a two-step Majorize-Minimization or Minorize-Maximization (MM) objective function optimization pattern. The optimizations we develop enable the applications and libraries provided to attain state-of-the-art performance in varying memory settings.
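    The two-step MM pattern mentioned above can be illustrated with k-means, a classic Majorize-Minimize algorithm (this NumPy sketch is an illustration of the pattern, not the clusterNOR implementation):

```python
import numpy as np

def kmeans_mm(X, k, iters=50, seed=0):
    """k-means as alternating MM steps: bound the cost, then minimize it."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        # Majorize: fixing assignments gives a tight upper bound on the cost.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Minimize: each centre moves to the mean of its assigned points.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers, labels

# Two well-separated blobs should be recovered as two clusters.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(5, 0.5, (50, 2))])
centers, labels = kmeans_mm(X, k=2)
```

    Each iteration is a bound-construction step followed by a bound-optimization step, which is exactly the two-phase structure a framework can schedule and parallelise generically.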

    Parallel implementation of the finite element method on shared memory multiprocessors

    PhD Thesis. The work presented in this thesis concerns parallel methods for finite element analysis. The research has been funded by British Gas, and some of the presented material involves work on their software. Practical problems involving the finite element method can use a large amount of processing power, and execution times can be very long. It is consequently important to investigate the possibilities for parallel implementation of the method. The research has been carried out on an Encore Multimax, a shared memory multiprocessor with 14 identical CPUs.

    We first experimented with autoparallelising a large British Gas finite element program (GASP4) using Encore's parallelising Fortran compiler (epf). The parallel program generated by epf proved not to be efficient. The main reasons are the complexity of the code and the small grain of the parallelism. Since the program is hard for the compiler to analyse at high levels, only small-grain parallelism was inserted automatically into the code. This involves a great deal of low-level synchronisation, which produces large overheads and causes inefficiency. A detailed analysis of the autoparallelised code has been made with a view to determining the reasons for the inefficiency. Suggestions have also been made about writing programs so that they are suitable for efficient autoparallelisation.

    The finite element method consists of the assembly of a stiffness matrix and the solution of a set of simultaneous linear equations. A sparse representation of the stiffness matrix has been used to allow experimentation on large problems. Parallel assembly techniques for the sparse representation have been developed; some of these methods have proved to be very efficient, giving speed-ups that are near ideal. For the solution phase, we have used the preconditioned conjugate gradient method (PCG). An incomplete LU factorization of the stiffness matrix with no fill-in (ILU(0)) has been found to be an effective preconditioner, and the factors can be obtained at low cost. We have parallelised all the steps of the PCG method. The main bottleneck is the triangular solves (preconditioning operations) at each step. Two parallel methods of triangular solution have been implemented: one based on level scheduling (row-oriented parallelism) and a new approach called independent columns (column-oriented parallelism). The algorithms have been tested for row and red-black orderings of the nodal unknowns in the finite element meshes considered. The best speed-ups obtained are 7.29 (on 12 processors) for level scheduling and 7.11 (on 12 processors) for independent columns. Red-black ordering generally gives rise to better parallel performance than row ordering. An analysis of methods for improving parallel efficiency has been made.

    British Gas
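    The PCG-with-incomplete-factorization solver structure described above can be sketched with SciPy. Note that SciPy's spilu is a threshold ILU rather than the exact no-fill ILU(0) of the thesis, though for the tridiagonal test matrix below the factorization has no fill anyway:

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import cg, spilu, LinearOperator

n = 200
# 1-D Poisson stiffness-like matrix: sparse, symmetric positive definite.
A = sp.diags([-1, 2, -1], [-1, 0, 1], shape=(n, n), format="csc")
b = np.ones(n)

# Incomplete LU factorisation used as the preconditioner: applying the
# preconditioner is exactly the pair of triangular solves that the thesis
# identifies as the parallel bottleneck of each PCG step.
ilu = spilu(A)
M = LinearOperator((n, n), matvec=ilu.solve)

x, info = cg(A, b, M=M)
assert info == 0 and np.allclose(A @ x, b, atol=1e-3)
```

    The `matvec=ilu.solve` wrapper is where a parallel implementation would substitute level scheduling or the independent-columns triangular solves.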