35 research outputs found

    The Multicomputer Toolbox - First-Generation Scalable Libraries

    First-generation scalable parallel libraries have been achieved, and are maturing, within the Multicomputer Toolbox. The Toolbox includes sparse, dense, and iterative linear algebra, a stiff ODE/DAE solver, and an open software technology for additional numerical algorithms, plus an inter-architecture Makefile mechanism for building applications. We have devised C-based strategies for useful classes of distributed data structures, including distributed matrices and vectors. The underlying Zipcode message-passing system has enabled process-grid abstractions of multicomputers, communication contexts, and process groups, all characteristics needed for building scalable libraries and scalable application software. We describe the data-distribution-independent approach to building scalable libraries, which is needed so that applications do not have to redistribute data at high expense. We discuss the strategy used for implementing data-distribution mappings. We also describe high-level message-passing constructs used to achieve flexibility in the transmission of data structures (Zipcode invoices). We expect the Zipcode and MPI message-passing interfaces (the latter incorporating many features from Zipcode, mentioned above) to coexist in the future. We discuss progress thus far in achieving uniform interfaces for different algorithms for the same operation, which are needed to create poly-algorithms. Poly-algorithms widen the potential for scalability; uniform interfaces simplify testing alternative methods with an application (whether for parallelism, for convergence, or both). We indicate that data-distribution-independent algorithms are sometimes more efficient than their fixed-data-distribution counterparts, because redistribution of data can be avoided, and that this question is strongly application dependent.
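
    To make the data-distribution-independent idea concrete, here is a minimal C sketch (all names are invented for illustration, not the Toolbox's actual API) of a distribution mapping that library code can query instead of hard-wiring one layout; changing the descriptor switches between cyclic and block layouts without touching the algorithm or redistributing data.

    #include <stdio.h>

    /* Hypothetical data-distribution mapping in the spirit of the Toolbox's
     * distribution-independent matrices (names invented). A distribution maps
     * a global row index to (owner process, local index); library code calls
     * the mapping rather than assuming one fixed layout. */
    typedef struct {
        int P;       /* number of processes over which rows are spread      */
        int block;   /* block size; 1 gives a cyclic layout, larger a block */
    } dist1d;

    /* Block-cyclic map: global index -> owning process. */
    static int owner(const dist1d *d, int g)   { return (g / d->block) % d->P; }

    /* Block-cyclic map: global index -> index in the owner's local storage. */
    static int local_ix(const dist1d *d, int g)
    {
        return (g / (d->block * d->P)) * d->block + g % d->block;
    }

    int main(void)
    {
        dist1d cyclic  = { 4, 1 };  /* 4 processes, cyclic distribution */
        dist1d blocked = { 4, 8 };  /* same processes, block size 8     */

        for (int g = 0; g < 12; g++)
            printf("row %2d: cyclic->(p%d,%d)  blocked->(p%d,%d)\n",
                   g, owner(&cyclic, g), local_ix(&cyclic, g),
                   owner(&blocked, g), local_ix(&blocked, g));
        return 0;
    }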

    Live media production: multicast optimization and visibility for clos fabric in media data centers

    Media production data centers are undergoing a major architectural shift to introduce digitization concepts into media creation and media processing workflows. Content companies such as NBC Universal, CBS/Viacom, and Disney are modernizing their workflows to take advantage of the flexibility of IP and virtualization. In these new environments, multicast is used to provide point-to-multipoint communication, and it relies on an established set of control protocols, such as IGMP and PIM, to build point-to-multipoint trees. The existing multicast protocols do not optimize multicast tree formation for maximizing network throughput, which leads to decreased fabric utilization and a decreased total number of admitted flows. In addition, existing multicast protocols are not bandwidth-aware and can oversubscribe links, leading to packet loss and lower video quality. TV production traffic patterns are unique due to ultra-high bandwidth requirements and high sensitivity to packet loss, which causes video impairments. In such environments, operators need monitoring tools that can proactively monitor video flows and provide actionable alerts. Existing network monitoring tools are inadequate because they are reactive by design and perform generic monitoring of flows with no insight into the video domain. The first part of this dissertation presents the design and implementation of a novel Intelligent Rendezvous Point algorithm, iRP, for bandwidth-aware multicast routing in media data-center fabrics. iRP uses a controller-based architecture to optimize multicast tree formation and to increase bandwidth availability in the fabric, offering up to a 50% increase in fabric capacity for multicast flows passing through the fabric. In the second part of the dissertation, the DiRP algorithm is presented. DiRP takes a distributed decision-making approach to optimize multicast tree capacity while maintaining low multicast tree setup time. Tested on commercially available data center switches, DiRP offers substantially lower path setup time than centralized systems while remaining bandwidth-aware when setting up the fabric. The third part of the dissertation studies the use of machine learning to improve multicast efficiency in the fabric. This work includes the implementation and testing of the LiRP algorithm, which increases iRP's fabric efficiency by applying k-fold cross-validation to predict future multicast group memberships from time-series analysis; testing confirms that LiRP increases the efficiency of iRP by up to 40% through prediction of multicast group memberships with online arrival. The fourth part of the dissertation studies the problem of live video monitoring. MediaFlow is a robust system for active network monitoring and reporting of video quality for thousands of flows simultaneously, at a fraction of the cost of traditional monitoring solutions. MediaFlow can detect and report on the integrity of video flows at a granularity of 100 ms at line rate for thousands of flows, increasing video monitoring scale a thousand-fold compared to edge monitoring solutions.
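
    As a rough illustration of the bandwidth-aware idea behind iRP (the sketch and all names here are hypothetical, not the dissertation's implementation), a controller in a two-tier Clos fabric can track residual uplink capacity per spine and root each multicast tree on the spine with the most headroom, rejecting flows that would oversubscribe a link:

    #include <stdio.h>

    #define NSPINES 4

    typedef struct {
        double residual_gbps[NSPINES];  /* free capacity toward each spine */
    } fabric_state;

    /* Return the spine index chosen for a new flow, or -1 if no spine can
     * admit it without oversubscription. */
    static int pick_spine(fabric_state *f, double flow_gbps)
    {
        int best = -1;
        for (int s = 0; s < NSPINES; s++)
            if (f->residual_gbps[s] >= flow_gbps &&
                (best < 0 || f->residual_gbps[s] > f->residual_gbps[best]))
                best = s;
        if (best >= 0)
            f->residual_gbps[best] -= flow_gbps;  /* reserve the bandwidth */
        return best;
    }

    int main(void)
    {
        fabric_state f = { { 40.0, 25.0, 10.0, 40.0 } };
        double uhd_flow = 12.0;  /* e.g. one high-bandwidth video flow */

        for (int i = 0; i < 5; i++)
            printf("flow %d -> spine %d\n", i, pick_spine(&f, uhd_flow));
        return 0;
    }

    Unlike PIM's topology-blind tree building, each admission decision here consults residual capacity, which is the property that lets the controller pack more flows into the same fabric.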

    Algorithmic redistribution methods for block-cyclic decompositions


    Hardware-Software Co-Design, Acceleration and Prototyping of Control Algorithms on Reconfigurable Platforms

    Differential equations play a significant role in many disciplines of science and engineering. Solving and implementing ordinary differential equations (ODEs) and partial differential equations (PDEs) efficiently is essential, as most complex dynamic systems are modeled with these equations. High Performance Computing (HPC) methodologies are required to compute and implement complex, data-intensive applications modeled by differential equations at high speed. There are, however, challenges and limitations in implementing dynamic systems, modeled by non-linear ordinary differential equations, on digital hardware. Modeling an integrator involves data approximation, which introduces accuracy error if data values are not chosen properly. Accuracy and precision depend on the data types defined for each block of a system and its subsystems. Also, digital hardware mostly works on fixed-point data, which leads to further approximation. Using a Field Programmable Gate Array (FPGA), it is possible to solve ODEs at high speed, and FPGAs also provide scalable, flexible, and reconfigurable features. The goal of this thesis is to explore and compare implementations of control algorithms on reconfigurable logic. The thesis focuses on implementing control algorithms modeled by second- and fourth-order PDEs and ODEs using the Xilinx System Generator (XSG) and LabVIEW FPGA module synthesis tools. Xilinx System Generator for DSP allows integration of legacy HDL code, embedded IP cores, MATLAB functions, and hardware components targeted for Xilinx FPGAs to create complete system models that can be simulated and synthesized within the Simulink environment. The National Instruments (NI) LabVIEW FPGA Module extends LabVIEW graphical development to FPGAs on NI Reconfigurable I/O hardware. The thesis also focuses on efficient implementation and performance comparison of these implementations; optimization of area, latency, and power is explored, and comparison results are discussed.
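
    The fixed-point accuracy concern can be seen even off-hardware. The following C sketch (illustrative only, not from the thesis) runs a forward-Euler integrator for dy/dt = -y in Q16.16 fixed point next to a double-precision reference, so the quantization error that a narrow fractional field accumulates is directly visible:

    #include <stdio.h>
    #include <stdint.h>

    typedef int32_t q16_16;                   /* Q16.16 fixed-point value   */
    #define Q_ONE  (1 << 16)

    static q16_16 q_mul(q16_16 a, q16_16 b)   /* fixed-point multiply       */
    {
        return (q16_16)(((int64_t)a * b) >> 16);
    }

    int main(void)
    {
        const double  h  = 0.01;              /* integration step size      */
        const q16_16  qh = (q16_16)(h * Q_ONE);

        q16_16 yq = Q_ONE;                    /* y(0) = 1 in Q16.16         */
        double yd = 1.0;                      /* double-precision reference */

        for (int k = 0; k < 500; k++) {       /* integrate out to t = 5     */
            yq -= q_mul(qh, yq);              /* y += h * (-y), fixed point */
            yd -= h * yd;
        }
        /* Truncation in q_mul accumulates step by step; the gap printed
         * below is the error a wider fractional field (or rounding instead
         * of truncation) would shrink. */
        printf("fixed: %.6f  double: %.6f\n", yq / (double)Q_ONE, yd);
        return 0;
    }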

    Parallel skeletons for the branch-and-bound technique

    For a large number of combinatorial problems, the time needed to obtain a solution on a sequential computer is very high. One way to overcome this drawback is to use parallel computing. In a parallel computer, several processors collaborate to solve a problem simultaneously in a fraction of the time required by a single processor. Among the key components needed to apply parallel computing are the architecture, the operating system, the programming-language compilers and, most important of all, the parallel algorithm. No problem can be solved in parallel without a parallel algorithm, since parallel algorithms are the core of parallel computing. The goal of this doctoral thesis was to develop a working methodology for tackling combinatorial optimization problems with the branch-and-bound technique using parallelism. Starting from concrete cases, a general way of working was derived that led to the solution of a variety of problems. To this end, the thesis builds on the concept of algorithmic skeleton introduced by Murray Cole in 1987.
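
    For readers unfamiliar with Cole's skeletons, this minimal C sketch (all names invented, a sequential toy rather than the thesis's framework) shows the shape of a branch-and-bound skeleton: the generic search loop with pruning is written once, and each concrete problem plugs in its own bound, leaf, and branching callbacks. A parallel instantiation would hand subproblems from the pool to worker processes instead of looping sequentially.

    #include <stdio.h>
    #include <limits.h>

    typedef struct { int level, value; } node;   /* toy subproblem state    */

    typedef struct {
        int (*lower_bound)(const node *);        /* optimistic cost bound   */
        int (*is_leaf)(const node *);
        int (*branch)(const node *, node out[], int max);  /* children      */
    } bb_problem;

    /* Generic skeleton: depth-first search over an explicit work pool,
     * pruning any subproblem whose bound cannot beat the incumbent. */
    static int bb_solve(const bb_problem *p, node root)
    {
        node pool[1024];
        int top = 0, best = INT_MAX;
        pool[top++] = root;

        while (top > 0) {
            node n = pool[--top];
            if (p->lower_bound(&n) >= best) continue;   /* prune subtree   */
            if (p->is_leaf(&n)) {
                if (n.value < best) best = n.value;     /* new incumbent   */
            } else {
                top += p->branch(&n, &pool[top], 1024 - top);
            }
        }
        return best;
    }

    /* Toy instantiation: choose a cost of 1 or 2 at each of three levels
     * and minimize the total (the optimum is clearly 3). */
    static int toy_bound(const node *n) { return n->value; }
    static int toy_leaf(const node *n)  { return n->level == 3; }
    static int toy_branch(const node *n, node out[], int max)
    {
        if (max < 2) return 0;
        out[0] = (node){ n->level + 1, n->value + 1 };
        out[1] = (node){ n->level + 1, n->value + 2 };
        return 2;
    }

    int main(void)
    {
        bb_problem toy = { toy_bound, toy_leaf, toy_branch };
        printf("best = %d\n", bb_solve(&toy, (node){ 0, 0 }));
        return 0;
    }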

    Third CLIPS Conference Proceedings, volume 2

    Expert systems are computer programs which emulate human expertise in well-defined problem domains. The C Language Integrated Production System (CLIPS) is an expert system building tool, developed at the Johnson Space Center, which provides a complete environment for the development and delivery of rule- and/or object-based expert systems. CLIPS was specifically designed to provide a low-cost option for developing and deploying expert system applications across a wide range of hardware platforms. The development of CLIPS has helped to improve the ability to deliver expert system technology throughout the public and private sectors for a wide range of applications and diverse computing environments. The Third Conference on CLIPS provided a forum for CLIPS users to present and discuss papers relating to CLIPS applications, uses, and extensions.