35 research outputs found

    The Multicomputer Toolbox - First-Generation Scalable Libraries

    First-generation scalable parallel libraries have been achieved, and are maturing, within the Multicomputer Toolbox. The Toolbox includes sparse, dense, and iterative linear algebra, a stiff ODE/DAE solver, and an open software technology for additional numerical algorithms, plus an inter-architecture Makefile mechanism for building applications. We have devised C-based strategies for useful classes of distributed data structures, including distributed matrices and vectors. The underlying Zipcode message-passing system has enabled process-grid abstractions of multicomputers, communication contexts, and process groups, all characteristics needed for building scalable libraries and scalable application software. We describe the data-distribution-independent approach to building scalable libraries, which is needed so that applications do not have to redistribute data at high expense. We discuss the strategy used for implementing data-distribution mappings. We also describe high-level message-passing constructs used to achieve flexibility in the transmission of data structures (Zipcode invoices). We expect the Zipcode and MPI message-passing interfaces (the latter incorporating many features from Zipcode, mentioned above) to coexist in the future. We discuss progress thus far in achieving uniform interfaces for different algorithms for the same operation, which are needed to create poly-algorithms. Poly-algorithms widen the potential for scalability; uniform interfaces simplify testing alternative methods with an application (whether for parallelism, for convergence, or both). We indicate that data-distribution-independent algorithms are sometimes more efficient than their fixed-data-distribution counterparts, because redistribution of data can be avoided, and that this question is strongly application dependent.
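
    To make the data-distribution-independent idea concrete, here is a minimal C sketch (all names are invented for illustration, not the Toolbox's actual API) of a distribution mapping that library code can query instead of hard-wiring one layout; changing the descriptor switches between cyclic and block layouts without touching the algorithm or redistributing data.

    #include <stdio.h>

    /* Hypothetical data-distribution mapping in the spirit of the Toolbox's
     * distribution-independent matrices (names invented). A distribution maps
     * a global row index to (owner process, local index); library code calls
     * the mapping rather than assuming one fixed layout. */
    typedef struct {
        int P;       /* number of processes over which rows are spread      */
        int block;   /* block size; 1 gives a cyclic layout, larger a block */
    } dist1d;

    /* Block-cyclic map: global index -> owning process. */
    static int owner(const dist1d *d, int g)   { return (g / d->block) % d->P; }

    /* Block-cyclic map: global index -> index in the owner's local storage. */
    static int local_ix(const dist1d *d, int g)
    {
        return (g / (d->block * d->P)) * d->block + g % d->block;
    }

    int main(void)
    {
        dist1d cyclic  = { 4, 1 };  /* 4 processes, cyclic distribution */
        dist1d blocked = { 4, 8 };  /* same processes, block size 8     */

        for (int g = 0; g < 12; g++)
            printf("row %2d: cyclic->(p%d,%d)  blocked->(p%d,%d)\n",
                   g, owner(&cyclic, g), local_ix(&cyclic, g),
                   owner(&blocked, g), local_ix(&blocked, g));
        return 0;
    }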

    Live media production: multicast optimization and visibility for clos fabric in media data centers

    Media production data centers are undergoing a major architectural shift to introduce digitization concepts into media creation and media processing workflows. Content companies such as NBC Universal, CBS/Viacom, and Disney are modernizing their workflows to take advantage of the flexibility of IP and virtualization. In these new environments, multicast is used to provide point-to-multipoint communication, and it relies on an established set of control protocols, such as IGMP and PIM, to build point-to-multipoint trees. The existing multicast protocols do not optimize multicast tree formation for maximizing network throughput, which leads to decreased fabric utilization and a decreased total number of admitted flows. In addition, existing multicast protocols are not bandwidth-aware and can oversubscribe links, leading to packet loss and lower video quality. TV production traffic patterns are unique due to ultra-high bandwidth requirements and high sensitivity to packet loss, which causes video impairments. In such environments, operators need monitoring tools that can proactively monitor video flows and provide actionable alerts. Existing network monitoring tools are inadequate because they are reactive by design and perform generic monitoring of flows with no insight into the video domain. The first part of this dissertation presents the design and implementation of a novel Intelligent Rendezvous Point algorithm, iRP, for bandwidth-aware multicast routing in media data-center fabrics. iRP uses a controller-based architecture to optimize multicast tree formation and to increase bandwidth availability in the fabric, offering up to a 50% increase in fabric capacity for multicast flows passing through the fabric. In the second part of the dissertation, the DiRP algorithm is presented. DiRP takes a distributed decision-making approach to optimize multicast tree capacity while maintaining low multicast tree setup time. Tested on commercially available data center switches, DiRP offers substantially lower path setup time than centralized systems while remaining bandwidth-aware when setting up the fabric. The third part of the dissertation studies the use of machine learning to improve multicast efficiency in the fabric. This work includes the implementation and testing of the LiRP algorithm, which increases iRP's fabric efficiency by applying k-fold cross-validation to predict future multicast group memberships from time-series analysis; testing confirms that LiRP increases the efficiency of iRP by up to 40% through prediction of multicast group memberships with online arrival. The fourth part of the dissertation studies the problem of live video monitoring. MediaFlow is a robust system for active network monitoring and reporting of video quality for thousands of flows simultaneously, at a fraction of the cost of traditional monitoring solutions. MediaFlow can detect and report on the integrity of video flows at a granularity of 100 ms at line rate for thousands of flows, increasing video monitoring scale a thousand-fold compared to edge monitoring solutions.
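
    As a rough illustration of the bandwidth-aware idea behind iRP (the sketch and all names here are hypothetical, not the dissertation's implementation), a controller in a two-tier Clos fabric can track residual uplink capacity per spine and root each multicast tree on the spine with the most headroom, rejecting flows that would oversubscribe a link:

    #include <stdio.h>

    #define NSPINES 4

    typedef struct {
        double residual_gbps[NSPINES];  /* free capacity toward each spine */
    } fabric_state;

    /* Return the spine index chosen for a new flow, or -1 if no spine can
     * admit it without oversubscription. */
    static int pick_spine(fabric_state *f, double flow_gbps)
    {
        int best = -1;
        for (int s = 0; s < NSPINES; s++)
            if (f->residual_gbps[s] >= flow_gbps &&
                (best < 0 || f->residual_gbps[s] > f->residual_gbps[best]))
                best = s;
        if (best >= 0)
            f->residual_gbps[best] -= flow_gbps;  /* reserve the bandwidth */
        return best;
    }

    int main(void)
    {
        fabric_state f = { { 40.0, 25.0, 10.0, 40.0 } };
        double uhd_flow = 12.0;  /* e.g. one high-bandwidth video flow */

        for (int i = 0; i < 5; i++)
            printf("flow %d -> spine %d\n", i, pick_spine(&f, uhd_flow));
        return 0;
    }

    Unlike PIM's topology-blind tree building, each admission decision here consults residual capacity, which is the property that lets the controller pack more flows into the same fabric.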

    Algorithmic redistribution methods for block-cyclic decompositions


    Hardware-Software Co-Design, Acceleration and Prototyping of Control Algorithms on Reconfigurable Platforms

    Differential equations play a significant role in many disciplines of science and engineering. Solving and implementing ordinary differential equations (ODEs) and partial differential equations (PDEs) efficiently is essential, as most complex dynamic systems are modeled with these equations. High Performance Computing (HPC) methodologies are required to compute and implement complex, data-intensive applications modeled by differential equations at high speed. There are, however, challenges and limitations in implementing dynamic systems, modeled by non-linear ordinary differential equations, on digital hardware. Modeling an integrator involves data approximation, which introduces accuracy error if data values are not chosen properly. Accuracy and precision depend on the data types defined for each block of a system and its subsystems. Also, digital hardware mostly works on fixed-point data, which leads to further approximation. Using a Field Programmable Gate Array (FPGA), it is possible to solve ODEs at high speed, and FPGAs also provide scalable, flexible, and reconfigurable features. The goal of this thesis is to explore and compare implementations of control algorithms on reconfigurable logic. The thesis focuses on implementing control algorithms modeled by second- and fourth-order PDEs and ODEs using the Xilinx System Generator (XSG) and LabVIEW FPGA module synthesis tools. Xilinx System Generator for DSP allows integration of legacy HDL code, embedded IP cores, MATLAB functions, and hardware components targeted for Xilinx FPGAs to create complete system models that can be simulated and synthesized within the Simulink environment. The National Instruments (NI) LabVIEW FPGA Module extends LabVIEW graphical development to FPGAs on NI Reconfigurable I/O hardware. The thesis also focuses on efficient implementation and performance comparison of these implementations; optimization of area, latency, and power is explored, and comparison results are discussed.
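
    The fixed-point accuracy concern can be seen even off-hardware. The following C sketch (illustrative only, not from the thesis) runs a forward-Euler integrator for dy/dt = -y in Q16.16 fixed point next to a double-precision reference, so the quantization error that a narrow fractional field accumulates is directly visible:

    #include <stdio.h>
    #include <stdint.h>

    typedef int32_t q16_16;                   /* Q16.16 fixed-point value   */
    #define Q_ONE  (1 << 16)

    static q16_16 q_mul(q16_16 a, q16_16 b)   /* fixed-point multiply       */
    {
        return (q16_16)(((int64_t)a * b) >> 16);
    }

    int main(void)
    {
        const double  h  = 0.01;              /* integration step size      */
        const q16_16  qh = (q16_16)(h * Q_ONE);

        q16_16 yq = Q_ONE;                    /* y(0) = 1 in Q16.16         */
        double yd = 1.0;                      /* double-precision reference */

        for (int k = 0; k < 500; k++) {       /* integrate out to t = 5     */
            yq -= q_mul(qh, yq);              /* y += h * (-y), fixed point */
            yd -= h * yd;
        }
        /* Truncation in q_mul accumulates step by step; the gap printed
         * below is the error a wider fractional field (or rounding instead
         * of truncation) would shrink. */
        printf("fixed: %.6f  double: %.6f\n", yq / (double)Q_ONE, yd);
        return 0;
    }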

    Parallel skeletons for the branch-and-bound technique

    For a large number of combinatorial problems, the time needed to obtain a solution on a sequential computer is very high. One way to overcome this drawback is to use parallel computing. In a parallel computer, several processors collaborate to solve a problem simultaneously in a fraction of the time required by a single processor. Among the key components needed to apply parallel computing are the architecture, the operating system, the programming-language compilers and, most important of all, the parallel algorithm. No problem can be solved in parallel without a parallel algorithm, since parallel algorithms are the core of parallel computing. The goal of this doctoral thesis was to develop a working methodology for tackling combinatorial optimization problems with the branch-and-bound technique using parallelism. Starting from concrete cases, a general way of working was derived that led to the solution of a variety of problems. To this end, the thesis builds on the concept of algorithmic skeleton introduced by Murray Cole in 1987.
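
    For readers unfamiliar with Cole's skeletons, this minimal C sketch (all names invented, a sequential toy rather than the thesis's framework) shows the shape of a branch-and-bound skeleton: the generic search loop with pruning is written once, and each concrete problem plugs in its own bound, leaf, and branching callbacks. A parallel instantiation would hand subproblems from the pool to worker processes instead of looping sequentially.

    #include <stdio.h>
    #include <limits.h>

    typedef struct { int level, value; } node;   /* toy subproblem state    */

    typedef struct {
        int (*lower_bound)(const node *);        /* optimistic cost bound   */
        int (*is_leaf)(const node *);
        int (*branch)(const node *, node out[], int max);  /* children      */
    } bb_problem;

    /* Generic skeleton: depth-first search over an explicit work pool,
     * pruning any subproblem whose bound cannot beat the incumbent. */
    static int bb_solve(const bb_problem *p, node root)
    {
        node pool[1024];
        int top = 0, best = INT_MAX;
        pool[top++] = root;

        while (top > 0) {
            node n = pool[--top];
            if (p->lower_bound(&n) >= best) continue;   /* prune subtree   */
            if (p->is_leaf(&n)) {
                if (n.value < best) best = n.value;     /* new incumbent   */
            } else {
                top += p->branch(&n, &pool[top], 1024 - top);
            }
        }
        return best;
    }

    /* Toy instantiation: choose a cost of 1 or 2 at each of three levels
     * and minimize the total (the optimum is clearly 3). */
    static int toy_bound(const node *n) { return n->value; }
    static int toy_leaf(const node *n)  { return n->level == 3; }
    static int toy_branch(const node *n, node out[], int max)
    {
        if (max < 2) return 0;
        out[0] = (node){ n->level + 1, n->value + 1 };
        out[1] = (node){ n->level + 1, n->value + 2 };
        return 2;
    }

    int main(void)
    {
        bb_problem toy = { toy_bound, toy_leaf, toy_branch };
        printf("best = %d\n", bb_solve(&toy, (node){ 0, 0 }));
        return 0;
    }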

    Third CLIPS Conference Proceedings, volume 2

    Expert systems are computer programs which emulate human expertise in well-defined problem domains. The C Language Integrated Production System (CLIPS) is an expert system building tool, developed at the Johnson Space Center, which provides a complete environment for the development and delivery of rule- and/or object-based expert systems. CLIPS was specifically designed to provide a low-cost option for developing and deploying expert system applications across a wide range of hardware platforms. The development of CLIPS has helped to improve the ability to deliver expert system technology throughout the public and private sectors for a wide range of applications and diverse computing environments. The Third Conference on CLIPS provided a forum for CLIPS users to present and discuss papers relating to CLIPS applications, uses, and extensions.