445 research outputs found

A fast parallel algorithm for special linear systems of equations using processor arrays with reconfigurable bus systems

Author: Chaudhari N. S.
Fehr Elfriede
Wankar Rajeev
Publication date: 01/01/1999

A parallel algorithm using Processor Arrays with Reconfigurable Bus Systems (PARBS) has been designed to solve dense Symmetric Positive Definite (SPD) systems of equations Ax = b. The key content of this report is the parallelisation of the algorithm by Delosme & Ipsen [8]. To obtain a parallel algorithm suited to PARBS, many of the procedures in [8] are handled in a slightly different way. The parallel time and processor complexity of each step of the algorithm are calculated. The parallel time complexity is O(n) using 2n × 2n × 5n processing elements.

Institutional Repository of the Freie Universität Berlin

A Finite Domain Constraint Approach for Placement and Routing of Coarse-Grained Reconfigurable Architectures

Author: Saraswat Rohit
Publication venue: DigitalCommons@USU
Publication date: 01/05/2010

Scheduling, placement, and routing are important steps in Very Large Scale Integration (VLSI) design. Researchers have developed numerous techniques to solve placement and routing problems. As the complexity of Application Specific Integrated Circuits (ASICs) increased over the past decades, so did the demand for improved place and route techniques. The primary objective of these place and route approaches has typically been wirelength minimization due to its impact on signal delay and design performance. With the advent of Field Programmable Gate Arrays (FPGAs), the same place and route techniques were applied to FPGA-based design. However, traditional place and route techniques may not work for Coarse-Grained Reconfigurable Architectures (CGRAs), which are reconfigurable devices offering wider path widths than FPGAs and more flexibility than ASICs, due to the differences in architecture and routing network. Further, the routing network of several types of CGRAs, including the Field Programmable Object Array (FPOA), has deterministic timing as compared to the routing fabric of most ASICs and FPGAs reported in the literature. This necessitates a fresh look at alternative approaches to place and route designs. This dissertation presents a finite domain constraint-based, delay-aware placement and routing methodology targeting an FPOA. The proposed methodology takes advantage of the deterministic routing network of CGRAs to perform a delay aware placement

DigitalCommons@USU

Applications of Broyden-based input space mapping to modeling and design optimization in high-tech companies in Mexico

Author: Bakr
Chávez-Hurtado
Koziel
Leal-Romo
Rangel-Patiño
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 02/10/2019

One of the most powerful and computationally efficient optimization approaches in RF and microwave engineering is the space mapping (SM) approach to design. SM optimization methods belong to the general class of surrogate-based optimization algorithms. They specialize in the efficient optimization of computationally expensive models. This paper reviews the Broyden-based input SM algorithm, better known as aggressive space mapping (ASM), which is perhaps the SM variation with the most industrial applications. The two main characteristics that explain its popularity in industry and academia are emphasized in this paper: simplicity and efficiency. The fundamentals behind the Broyden-based input SM algorithm are described, highlighting key steps for its successful implementation, as well as situations where it may fail. Recent applications of the Broyden-based input space mapping algorithm in high-tech industries located in Mexico are briefly described, including application areas such as signal integrity and high-speed interconnect design, as well as post-silicon validation of high-performance computer platforms, among others. Emerging new applications in multi-physics interconnect design and power-integrity design optimization are also mentioned.

ITESO, A.C.

Crossref

Repositorio Institucional del ITESO
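
The Broyden-based input space mapping iteration named in the abstract above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes a two-parameter design, and the function `toy_extract` is a hypothetical stand-in for parameter extraction (in practice, extraction is itself an optimization that matches the coarse model response to the fine model response). The loop seeks a fine-model design `x` whose extracted coarse parameters equal the coarse optimum `z_star`, updating the mapping matrix `B` with Broyden's rank-one formula.

```python
def solve2(B, r):
    """Solve the 2x2 linear system B h = r by Cramer's rule."""
    (a, b), (c, d) = B
    det = a * d - b * c
    return [(r[0] * d - b * r[1]) / det, (a * r[1] - c * r[0]) / det]


def aggressive_space_mapping(extract, z_star, tol=1e-9, max_iter=50):
    """Find x such that extract(x) == z_star using Broyden updates.

    extract : maps fine-model parameters x to coarse-model parameters z
              (hypothetical stand-in for the parameter-extraction step).
    z_star  : coarse-model optimum, also used as the starting point.
    """
    x = list(z_star)                      # start at the coarse optimum
    B = [[1.0, 0.0], [0.0, 1.0]]          # initial mapping: identity
    g = [p - s for p, s in zip(extract(x), z_star)]
    for k in range(max_iter):
        if max(abs(v) for v in g) < tol:  # extracted parameters match z_star
            return x, k
        h = solve2(B, [-v for v in g])    # quasi-Newton step: B h = -g
        x = [xi + hi for xi, hi in zip(x, h)]
        g = [p - s for p, s in zip(extract(x), z_star)]
        hh = h[0] * h[0] + h[1] * h[1]
        # Broyden rank-one update: B <- B + g h^T / (h^T h)
        B = [[B[i][j] + g[i] * h[j] / hh for j in range(2)] for i in range(2)]
    return x, max_iter


def toy_extract(x):
    """Assumed closed-form mapping from fine to coarse parameters."""
    return [1.2 * x[0] + 0.1 * x[1] - 0.3,
            -0.05 * x[0] + 0.9 * x[1] + 0.2]


z_star = [1.0, 2.0]
x_fine, iterations = aggressive_space_mapping(toy_extract, z_star)
```

Each evaluation of `extract` corresponds to one expensive fine-model simulation, so the iteration count is the quantity of practical interest. The sketch omits safeguards for the failure cases the abstract mentions, such as a singular `B` or a non-unique extraction.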

A protocol reconfiguration and optimization system for MPI

Author: Gorentla Venkata Manjunath
Publication venue: UNM Digital Repository
Publication date: 01/05/2010

Modern high performance computing (HPC) applications, for example adaptive mesh refinement and multi-physics codes, have dynamic communication characteristics which result in poor performance on current Message Passing Interface (MPI) implementations. The degraded application performance can be attributed to a mismatch between changing application requirements and static communication library functionality. To improve the performance of these applications, MPI libraries should change their protocol functionality in response to changing application requirements, and tailor their functionality to take advantage of hardware capabilities. This dissertation describes the Protocol Reconfiguration and Optimization system for MPI (PRO-MPI), a framework for constructing profile-driven reconfigurable MPI libraries; these libraries use past application characteristics (profiles) to dynamically change their functionality to match the changing application requirements. The framework addresses the challenges of designing and implementing the reconfigurable MPI libraries, which include collecting and reasoning about application characteristics to drive the protocol reconfiguration and defining abstractions required for implementing these reconfigurations. Two prototype reconfigurable MPI implementations based on the framework, Open PRO-MPI and Cactus PRO-MPI, are also presented to demonstrate the utility of the framework. To demonstrate the effectiveness of reconfigurable MPI libraries, this dissertation presents experimental results showing the impact of these libraries on application performance. The results show that PRO-MPI improves the performance of important HPC applications and benchmarks. They also show that HyperCLaw performance improves by approximately 22% when exact profiles are available, and by approximately 18% when only approximate profiles are available.

Reconfigurable Model Execution in the OpenMDAO Framework

Author: Hwang John T.

NASA's OpenMDAO framework facilitates constructing complex models and computing their derivatives for multidisciplinary design optimization. Decomposing a model into components that follow a prescribed interface enables OpenMDAO to assemble multidisciplinary derivatives from the component derivatives using what amounts to the adjoint method, direct method, chain rule, global sensitivity equations, or any combination thereof, using the MAUD architecture. OpenMDAO also handles the distribution of processors among the disciplines by hierarchically grouping the components, and it automates the data transfer between components that are on different processors. These features have made OpenMDAO useful for applications in aircraft design, satellite design, wind turbine design, and aircraft engine design, among others. This paper presents new algorithms for OpenMDAO that enable reconfigurable model execution. This concept refers to dynamically changing, during execution, one or more of: the variable sizes, solution algorithm, parallel load balancing, or set of variables, i.e., adding and removing components, perhaps to switch to a higher-fidelity sub-model. Any component can reconfigure at any point, even when running in parallel with other components, and the reconfiguration algorithm presented here performs the synchronized updates to all other components that are affected. A reconfigurable software framework for multidisciplinary design optimization enables new adaptive solvers, adaptive parallelization, and new applications such as gradient-based optimization with overset flow solvers and adaptive mesh refinement. Benchmarking results demonstrate the time savings for reconfiguration compared to setting up the model again from scratch, which can be significant in large-scale problems. Additionally, the new reconfigurability feature is applied to a mission profile optimization problem for commercial aircraft where both the parametrization of the mission profile and the time discretization are adaptively refined, resulting in computational savings of roughly 10% and the elimination of oscillations in the optimized altitude profile.

NASA Technical Reports Server

Center for Aeronautics and Space Information Sciences

Author: Flynn Michael J.

This report summarizes the research done during 1991/92 under the Center for Aeronautics and Space Information Sciences (CASIS) program. The topics covered are computer architecture, networking, and neural nets.

NASA Technical Reports Server

Fast, Scalable, and Interactive Software for Landau-de Gennes Numerical Modeling of Nematic Topological Defects

Author: Beller Daniel A
Sussman Daniel M
Publication venue: eScholarship, University of California
Publication date: 01/01/2019

Numerical modeling of nematic liquid crystals using the tensorial Landau-de Gennes (LdG) theory provides detailed insights into the structure and energetics of the enormous variety of possible topological defect configurations that may arise when the liquid crystal is in contact with colloidal inclusions or structured boundaries. However, these methods can be computationally expensive, making it challenging to predict (meta)stable configurations involving several colloidal particles, and they are often restricted to system sizes well below the experimental scale. Here we present an open-source software package that exploits the embarrassingly parallel structure of the lattice discretization of the LdG approach. Our implementation, combining CUDA/C++ and OpenMPI, allows users to accelerate simulations using both CPU and GPU resources in either single- or multiple-core configurations. We make use of an efficient minimization algorithm, the Fast Inertial Relaxation Engine (FIRE) method, that is well-suited to large-scale parallelization, requiring little additional memory or computational cost while offering performance competitive with other commonly used methods. In multi-core operation we are able to scale simulations up to supra-micron length scales of experimental relevance, and in single-core operation the simulation package includes a user-friendly GUI environment for rapid prototyping of interfacial features and the multifarious defect states they can promote. To demonstrate this software package, we examine in detail the competition between curvilinear disclinations and point-like hedgehog defects as size scale, material properties, and geometric features are varied. We also study the effects of an interface patterned with an array of topological point defects.

arXiv.org e-Print Archive

eScholarship - University of California
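
The FIRE minimization method mentioned in the abstract above can be illustrated with a small sketch. This is not the package's CUDA/C++ code: it is a simplified serial variant in plain Python, assuming a semi-implicit Euler integrator, commonly quoted default parameters, and a hypothetical quadratic energy (`quadratic_grad`) in place of the Landau-de Gennes free energy. The idea is damped molecular dynamics: the velocity is steered toward the force direction while the system moves downhill, and it is zeroed (with a smaller time step) as soon as the motion turns uphill.

```python
import math


def fire_minimize(grad, x0, dt=0.05, dt_max=0.3, n_min=5, f_inc=1.1,
                  f_dec=0.5, alpha_start=0.1, f_alpha=0.99,
                  force_tol=1e-8, max_steps=10000):
    """Minimize an energy with gradient `grad` using a simplified FIRE loop."""
    x = list(x0)
    v = [0.0] * len(x)
    alpha = alpha_start
    downhill_steps = 0
    for step in range(max_steps):
        force = [-g for g in grad(x)]
        f_norm = math.sqrt(sum(f * f for f in force))
        if f_norm < force_tol:
            return x, step
        # Velocity update (semi-implicit Euler, unit mass assumed).
        v = [vi + dt * f for vi, f in zip(v, force)]
        power = sum(f * vi for f, vi in zip(force, v))
        if power > 0.0:
            # Moving downhill: mix velocity toward the force direction.
            v_norm = math.sqrt(sum(vi * vi for vi in v))
            v = [(1.0 - alpha) * vi + alpha * v_norm * f / f_norm
                 for vi, f in zip(v, force)]
            downhill_steps += 1
            if downhill_steps > n_min:
                dt = min(dt * f_inc, dt_max)   # accelerate
                alpha *= f_alpha               # reduce steering
        else:
            # Moving uphill: stop, shrink the time step, reset steering.
            v = [0.0] * len(x)
            dt *= f_dec
            alpha = alpha_start
            downhill_steps = 0
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x, max_steps


def quadratic_grad(x):
    """Gradient of the toy energy E = 0.5 * (x0^2 + 4 * x1^2)."""
    return [x[0], 4.0 * x[1]]


x_min, n_steps = fire_minimize(quadratic_grad, [1.0, -0.5])
```

Only the positions, velocities, and current forces are stored, which is consistent with the abstract's remark that the method needs little additional memory. On a lattice, each site's update depends only on its local force, so the per-component loops above are the part that would be parallelized.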