Scalable parallel communications
Coarse-grain parallelism in networking (that is, the use of multiple protocol processors running replicated software sending over several physical channels) can be used to provide gigabit communications for a single application. Since parallel network performance is highly dependent on real-world issues such as hardware properties (e.g., memory speeds and cache hit rates), operating system overhead (e.g., interrupt handling), and protocol performance (e.g., effect of timeouts), we have performed detailed simulation studies of both a bus-based multiprocessor workstation node (based on the Sun Galaxy MP multiprocessor) and a distributed-memory parallel computer node (based on the Touchstone DELTA) to evaluate the behavior of coarse-grain parallelism. Our results indicate: (1) coarse-grain parallelism can deliver multiples of 100 Mbps with currently available hardware platforms and existing networking protocols (such as Transmission Control Protocol/Internet Protocol (TCP/IP) and parallel Fiber Distributed Data Interface (FDDI) rings); (2) scale-up is near linear in n, the number of protocol processors and channels (for small n and up to a few hundred Mbps); and (3) since these results are based on existing hardware without specialized devices (except perhaps for some simple modifications of the FDDI boards), this is a low-cost solution for providing multiples of 100 Mbps on current machines.
In addition, from both the performance analysis and the properties of these architectures, we conclude: (1) multiple processors providing identical services and the use of space division multiplexing for the physical channels can provide better reliability than monolithic approaches (this approach also provides graceful degradation and low-cost load balancing); (2) coarse-grain parallelism supports running several transport protocols in parallel to provide different types of service (for example, one TCP handles small messages for many users, while other TCPs running in parallel provide high-bandwidth service to a single application); and (3) coarse-grain parallelism will be able to incorporate many future improvements from related work (e.g., reduced data movement, fast TCP, fine-grain parallelism), also with near-linear speed-ups.
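Point (2) above, running several transport instances side by side for different kinds of service, can be illustrated with a minimal Python sketch. It assumes a hypothetical size threshold separating small messages from bulk transfers; `Connection`, `ServiceDispatcher`, and `SMALL_MESSAGE_LIMIT` are illustrative names not taken from the original work, and each `Connection` only stands in for one transport instance such as a TCP connection.

```python
from dataclasses import dataclass, field

# Hypothetical threshold (bytes) separating small messages from bulk data.
SMALL_MESSAGE_LIMIT = 1024


@dataclass
class Connection:
    """Stand-in for one transport instance (e.g., one TCP connection)."""
    name: str
    sent: list[bytes] = field(default_factory=list)

    def send(self, payload: bytes) -> None:
        self.sent.append(payload)


class ServiceDispatcher:
    """Route small messages to one shared connection and bulk messages
    to a pool of dedicated high-bandwidth connections, round-robin."""

    def __init__(self, shared: Connection, bulk: list[Connection]) -> None:
        if not bulk:
            raise ValueError("need at least one bulk connection")
        self.shared = shared
        self.bulk = bulk
        self._next = 0  # index of the bulk connection to use next

    def send(self, payload: bytes) -> Connection:
        if len(payload) <= SMALL_MESSAGE_LIMIT:
            conn = self.shared
        else:
            conn = self.bulk[self._next]
            self._next = (self._next + 1) % len(self.bulk)
        conn.send(payload)
        return conn
```

Bulk messages are spread per message here for simplicity; splitting one large transfer across several channels would also need a striping scheduler at the sender and resequencing at the receiver.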
Octopus - an energy-efficient architecture for wireless multimedia systems
Multimedia computing and mobile computing are two trends that will lead to a new application domain in the near future. However, the technological challenges to establishing this paradigm of computing are non-trivial. Personal mobile computing offers a vision of the future with a much richer and more exciting set of architecture research challenges than extrapolations of the current desktop architectures. In particular, these devices will have limited battery resources, will handle diverse data types, and will operate in environments that are insecure, dynamic, and vary significantly in time and location. Our approach to achieving such a system is to use autonomous, adaptable modules, interconnected by a switch rather than by a bus, and to offload as much work as possible from the CPU to programmable modules placed in the data streams. A reconfigurable internal communication network switch called Octopus exploits locality of reference and eliminates wasteful data copies.
The Design of a System Architecture for Mobile Multimedia Computers
This chapter discusses the system architecture of a portable computer, called Mobile Digital Companion, which provides support for handling multimedia applications energy efficiently. Because battery life is limited and battery weight is an important factor for the size and the weight of the Mobile Digital Companion, energy management plays a crucial role in the architecture. As the Companion must remain usable in a variety of environments, it has to be flexible and adaptable to various operating conditions. The Mobile Digital Companion has an unconventional architecture that saves energy by using system decomposition at different levels of the architecture and exploits locality of reference with dedicated, optimised modules. The approach is based on dedicated functionality and the extensive use of energy reduction techniques at all levels of system design. The system has an architecture with a general-purpose processor accompanied by a set of heterogeneous autonomous programmable modules, each providing an energy efficient implementation of dedicated tasks. A reconfigurable internal communication network switch exploits locality of reference and eliminates wasteful data copies.
Operating-system support for distributed multimedia
Multimedia applications place new demands upon processors, networks and operating systems. While some network designers, through ATM for example, have considered revolutionary approaches to supporting multimedia, the same cannot be said for operating systems designers. Most work is evolutionary in nature, attempting to identify additional features that can be added to existing systems to support multimedia. Here we describe the Pegasus project's attempt to build an integrated hardware and operating system environment from the ground up specifically targeted towards multimedia.
Analytical Modeling of High Performance Reconfigurable Computers: Prediction and Analysis of System Performance.
The use of a network of shared, heterogeneous workstations each harboring a Reconfigurable Computing (RC) system offers high-performance users an inexpensive platform for a wide range of computationally demanding problems. However, effectively using the full potential of these systems can be challenging without knowledge of the system's performance characteristics. While some performance models exist for shared, heterogeneous workstations, none thus far account for the addition of Reconfigurable Computing systems. This dissertation develops and validates an analytic performance modeling methodology for a class of fork-join algorithms executing on a High Performance Reconfigurable Computing (HPRC) platform. The model includes the effects of the reconfigurable device, application load imbalance, background user load, basic message passing communication, and processor heterogeneity. Three fork-join applications (a Boolean Satisfiability Solver, a Matrix-Vector Multiplication algorithm, and an Advanced Encryption Standard algorithm) are used to validate the model with homogeneous and simulated heterogeneous workstations. A synthetic load is used to validate the model under various loading conditions, including simulating heterogeneity by using background loading to make some workstations appear slower than others. The performance modeling methodology proves to be accurate in characterizing the effects of reconfigurable devices, application load imbalance, background user load and heterogeneity for applications running on shared, homogeneous and heterogeneous HPRC resources. The model error in all cases was found to be less than five percent for application runtimes greater than thirty seconds and less than fifteen percent for runtimes less than thirty seconds. The performance modeling methodology enables us to characterize applications running on shared HPRC resources.
Cost functions are used to impose system usage policies, and the results of the modeling methodology are utilized to find the optimal (or near-optimal) set of workstations to use for a given application. The usage policies investigated include determining the computational costs for the workstations and balancing the priority of the background user load with the parallel application. The applications studied fall within the Master-Worker paradigm and are well suited for a grid computing approach. A method for using NetSolve, a grid middleware, with the model and cost functions is introduced whereby users can produce optimal workstation sets and schedules for Master-Worker applications running on shared HPRC resources.
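The idea of predicting a fork-join runtime and then choosing a workstation set with a cost function can be sketched in Python. This is not the dissertation's model: it is a generic estimate assuming an equal static split of the work, effective speed reduced by background load, a per-node RC speedup factor, and a fixed message cost per worker. The cost function (runtime plus a weighted usage charge) and all names are hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Optional


@dataclass(frozen=True)
class Workstation:
    name: str
    speed: float                  # relative CPU speed (1.0 = reference machine)
    background_load: float = 0.0  # fraction of CPU taken by other users, 0 <= u < 1
    rc_speedup: float = 1.0       # assumed speedup from this node's RC device
    unit_cost: float = 0.0        # hypothetical usage charge per second occupied


def predicted_runtime(work: float, serial: float, msg_cost: float,
                      nodes: tuple[Workstation, ...]) -> float:
    """Generic fork-join estimate: a serial phase, one message per worker,
    then equal shares computed in parallel; the slowest worker sets the join."""
    share = work / len(nodes)
    worker_times = [
        share / (n.speed * n.rc_speedup * (1.0 - n.background_load))
        for n in nodes
    ]
    return serial + msg_cost * len(nodes) + max(worker_times)


def best_subset(pool: list[Workstation], work: float, serial: float,
                msg_cost: float, cost_weight: float
                ) -> Optional[tuple[float, tuple[Workstation, ...], float]]:
    """Score every non-empty subset and keep the cheapest one.
    Cost = runtime + cost_weight * runtime * summed usage charges."""
    best = None
    for k in range(1, len(pool) + 1):
        for nodes in combinations(pool, k):
            t = predicted_runtime(work, serial, msg_cost, nodes)
            cost = t + cost_weight * t * sum(n.unit_cost for n in nodes)
            if best is None or cost < best[0]:
                best = (cost, nodes, t)
    return best  # (cost, chosen nodes, predicted runtime)
```

With three equal machines, one of them 90% busy with background work, this model prefers the two idle machines: adding the loaded one makes it the slowest worker and lengthens the join. Exhaustive search grows as 2^N in the pool size, so it only suits small pools.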
Multilevel Parallel Communications
The research reported in this thesis investigates the use of parallelism at multiple levels to realize high-speed networks that offer advantages in throughput, cost, reliability, and flexibility over alternative approaches. This research specifically considers the use of parallelism at two levels: the upper level and the lower level. At the upper level, N protocol processors perform functions included in the transport and network layers. At the lower level, M channels provide data and physical layer functions. The resulting system provides very high bandwidth to an application. A key concept of this research is the use of replicated channels to provide a single, high-bandwidth channel to a single application. The parallelism provided by the network is transparent to communicating applications, thus differentiating this strategy from schemes that provide a collection of disjoint channels between applications on different nodes. Another innovative aspect of this research is that parallelism is exploited at multiple layers of the network to provide high throughput not only at the physical layer, but also at upper protocol layers. Schedulers are used to distribute data from a single stream to multiple channels and to merge data from multiple channels to reconstruct a single coherent stream. High throughput is possible by providing the combined bandwidth of multiple channels to a single source and destination through use of parallelism at multiple protocol layers. This strategy is cost effective since systems can be built using standard technologies that benefit from the economies of a broad applications base. The exotic and revolutionary components needed in non-parallel approaches to build high-speed networks are not required. The replicated channels can be used to achieve high reliability as well. Multilevel parallelism is flexible since the degree of parallelism provided at any level can be matched to protocol processing demands and application requirements.
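The two scheduler roles described above, splitting one stream across channels and merging it back into a coherent stream, can be sketched in Python. The sketch assumes fixed-size segments tagged with sequence numbers, round-robin assignment to channels, and reliable channels that may deliver segments out of order relative to one another. `Segment`, `stripe`, and `Resequencer` are hypothetical names; the thesis's actual schedulers are not reproduced here.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    seq: int     # position of this segment in the original stream
    data: bytes


def stripe(stream: bytes, n_channels: int, segment_size: int) -> list[list[Segment]]:
    """Sending-side scheduler: cut one stream into numbered segments and
    deal them round-robin onto n_channels per-channel queues."""
    channels: list[list[Segment]] = [[] for _ in range(n_channels)]
    for i, start in enumerate(range(0, len(stream), segment_size)):
        seg = Segment(i, stream[start:start + segment_size])
        channels[i % n_channels].append(seg)
    return channels


class Resequencer:
    """Receiving-side scheduler: accept segments from any channel in any
    order and release bytes only in sequence-number order."""

    def __init__(self) -> None:
        self._next = 0                               # next seq to release
        self._pending: list[tuple[int, bytes]] = []  # min-heap keyed by seq

    def receive(self, seg: Segment) -> bytes:
        heapq.heappush(self._pending, (seg.seq, seg.data))
        out = bytearray()
        # Drain every segment that is now contiguous with what was released.
        while self._pending and self._pending[0][0] == self._next:
            _, data = heapq.heappop(self._pending)
            out += data
            self._next += 1
        return bytes(out)
```

Round-robin assignment is the simplest policy and balances load only when channels have equal bandwidth; channels of unequal speed would call for a weighted scheduler, and lossy channels for retransmission, neither of which this sketch handles.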
Coordinating complex problem-solving among distributed intelligent agents
A process-oriented control model is described for distributed problem solving. The model coordinates the transfer and manipulation of information across independent networked applications, both intelligent and conventional. The model was implemented using SOCIAL, a set of object-oriented tools for distributed computing. Complex sequences of distributed tasks are specified in terms of high-level scripts. Scripts are executed by SOCIAL objects called Manager Agents, which realize an intelligent coordination model that routes individual tasks to suitable server applications across the network. These tools are illustrated in a prototype distributed system for decision support of ground operations for NASA's Space Shuttle fleet.
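The routing role of a Manager Agent can be illustrated with a minimal Python sketch, assuming a script is an ordered list of task names and each server advertises the set of tasks it can perform. `Server`, `ManagerAgent`, and the "fewest tasks dispatched so far" selection rule are hypothetical stand-ins, not SOCIAL's API or its coordination model.

```python
from dataclasses import dataclass, field
from typing import Callable

Context = dict  # shared state passed from one task to the next


@dataclass
class Server:
    """Hypothetical stand-in for a networked server application."""
    name: str
    capabilities: set[str]
    handler: Callable[[str, Context], Context]
    dispatched: int = 0  # tasks routed to this server so far


@dataclass
class ManagerAgent:
    """Executes a high-level script by routing each task to a suitable server."""
    servers: list[Server] = field(default_factory=list)

    def route(self, task: str) -> Server:
        candidates = [s for s in self.servers if task in s.capabilities]
        if not candidates:
            raise LookupError(f"no server offers task {task!r}")
        # Spread work: pick the capable server that has received the fewest tasks.
        return min(candidates, key=lambda s: s.dispatched)

    def run_script(self, script: list[str], context: Context) -> Context:
        for task in script:
            server = self.route(task)
            server.dispatched += 1
            context = server.handler(task, context)
        return context
```

Each task's output becomes the next task's input, which is one simple way to chain conventional and intelligent applications; a real coordinator would also handle remote calls, failures, and retries.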
Operating Systems Support for End-to-End Gbps Networking
This paper argues that workstation host interfaces and operating systems are a crucial element in achieving end-to-end Gbps bandwidths for applications in distributed environments. We describe several host interface architectures, discuss the interaction between the interface and host operating system, and report on an ATM host interface built at the University of Pennsylvania. Concurrently designing a host interface and software support allows careful balancing of hardware and software functions. Key ideas include use of buffer management techniques to reduce copying and scheduling data transfers using clocked interrupts. Clocked interrupts also aid with bandwidth allocation. Our interface can deliver a sustained 130 Mbps bandwidth to applications, roughly the OC-3c link speed. Ninety-three percent of the host hardware subsystem throughput is delivered to the application with a small measured impact on other application processing.
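One way clocked interrupts can drive both transfer scheduling and bandwidth allocation is sketched below in Python. It assumes a fixed tick length, integer rates in bits per second, and a credit scheme in which each flow earns a byte allowance per tick proportional to its allocated rate. `Flow`, `on_clock_tick`, `bytes_per_tick`, and the two-tick burst cap are hypothetical; this is not the Penn interface's driver.

```python
from dataclasses import dataclass

BURST_TICKS = 2  # hypothetical cap: a flow may bank at most two ticks of credit


@dataclass
class Flow:
    name: str
    rate_bps: int     # allocated bandwidth, bits per second
    backlog: int = 0  # bytes queued for transfer
    credit: int = 0   # bytes the flow may still send
    sent: int = 0     # bytes transferred so far


def bytes_per_tick(rate_bps: int, tick_us: int) -> int:
    """Convert a rate in bits/s into whole bytes per tick of tick_us microseconds."""
    return rate_bps * tick_us // 8_000_000


def on_clock_tick(flows: list[Flow], tick_us: int, link_bps: int) -> None:
    """Run once per clocked interrupt: top up each flow's credit in proportion
    to its allocated rate, then move as much queued data as both the flow's
    credit and the link's remaining capacity for this tick allow."""
    link_budget = bytes_per_tick(link_bps, tick_us)
    for f in flows:
        earned = bytes_per_tick(f.rate_bps, tick_us)
        f.credit = min(f.credit + earned, BURST_TICKS * earned)
        n = min(f.backlog, f.credit, link_budget)
        f.backlog -= n
        f.credit -= n
        f.sent += n
        link_budget -= n
```

For example, with a 1 ms tick, a 130 Mbps link, and flows allocated 100 Mbps and 30 Mbps that always have data queued, 1000 ticks move 12,500,000 and 3,750,000 bytes respectively. The tick length bounds both scheduling latency and allocation granularity, and flows earlier in the list get first claim on the link when allocations exceed capacity.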