1,562 research outputs found
On-Line End-to-End Congestion Control
Congestion control in the current Internet is accomplished mainly by TCP/IP.
To understand the macroscopic network behavior that results from TCP/IP and
similar end-to-end protocols, one main analytic technique is to show that the
the protocol maximizes some global objective function of the network traffic.
Here we analyze a particular end-to-end, MIMD (multiplicative-increase,
multiplicative-decrease) protocol. We show that if all users of the network use
the protocol, and all connections last for at least logarithmically many
rounds, then the total weighted throughput (value of all packets received) is
near the maximum possible. Our analysis includes round-trip-times, and (in
contrast to most previous analyses) gives explicit convergence rates, allows
connections to start and stop, and allows capacities to change.Comment: Proceedings IEEE Symp. Foundations of Computer Science, 200
On the approximability of the maximum induced matching problem
In this paper we consider the approximability of the maximum induced matching problem (MIM). We give an approximation algorithm with asymptotic performance ratio <i>d</i>-1 for MIM in <i>d</i>-regular graphs, for each <i>d</i>≥3. We also prove that MIM is APX-complete in <i>d</i>-regular graphs, for each <i>d</i>≥3
Turbomachinery CFD on parallel computers
The role of multistage turbomachinery simulation in the development of propulsion system models is discussed. Particularly, the need for simulations with higher fidelity and faster turnaround time is highlighted. It is shown how such fast simulations can be used in engineering-oriented environments. The use of parallel processing to achieve the required turnaround times is discussed. Current work by several researchers in this area is summarized. Parallel turbomachinery CFD research at the NASA Lewis Research Center is then highlighted. These efforts are focused on implementing the average-passage turbomachinery model on MIMD, distributed memory parallel computers. Performance results are given for inviscid, single blade row and viscous, multistage applications on several parallel computers, including networked workstations
Dynamically allocating sets of fine-grained processors to running computations
Researchers explore an approach to using general purpose parallel computers which involves mapping hardware resources onto computations instead of mapping computations onto hardware. Problems such as processor allocation, task scheduling and load balancing, which have traditionally proven to be challenging, change significantly under this approach and may become amenable to new attacks. Researchers describe the implementation of this approach used by the FFP Machine whose computation and communication resources are repeatedly partitioned into disjoint groups that match the needs of available tasks from moment to moment. Several consequences of this system are examined
An Efficient Transport Protocol for delivery of Multimedia An Efficient Transport Protocol for delivery of Multimedia Content in Wireless Grids
A grid computing system is designed for solving complicated scientific and
commercial problems effectively,whereas mobile computing is a traditional
distributed system having computing capability with mobility and adopting
wireless communications. Media and Entertainment fields can take advantage from
both paradigms by applying its usage in gaming applications and multimedia data
management. Multimedia data has to be stored and retrieved in an efficient and
effective manner to put it in use. In this paper, we proposed an application
layer protocol for delivery of multimedia data in wireless girds i.e.
multimedia grid protocol (MMGP). To make streaming efficient a new video
compression algorithm called dWave is designed and embedded in the proposed
protocol. This protocol will provide faster, reliable access and render an
imperceptible QoS in delivering multimedia in wireless grid environment and
tackles the challenging issues such as i) intermittent connectivity, ii) device
heterogeneity, iii) weak security and iv) device mobility.Comment: 20 pages, 15 figures, Peer Reviewed Journa
Probabilistic structural mechanics research for parallel processing computers
Aerospace structures and spacecraft are a complex assemblage of structural components that are subjected to a variety of complex, cyclic, and transient loading conditions. Significant modeling uncertainties are present in these structures, in addition to the inherent randomness of material properties and loads. To properly account for these uncertainties in evaluating and assessing the reliability of these components and structures, probabilistic structural mechanics (PSM) procedures must be used. Much research has focused on basic theory development and the development of approximate analytic solution methods in random vibrations and structural reliability. Practical application of PSM methods was hampered by their computationally intense nature. Solution of PSM problems requires repeated analyses of structures that are often large, and exhibit nonlinear and/or dynamic response behavior. These methods are all inherently parallel and ideally suited to implementation on parallel processing computers. New hardware architectures and innovative control software and solution methodologies are needed to make solution of large scale PSM problems practical
The Impact of Parallel Processing on Operating Systems
The base entity in computer programming is the process or task. The parallelism can be achieved by executing multiple processes on different processors. Distributed systems are managed by distributed operating systems that represent the extension for multiprocessor architectures of multitasking and multiprogramming operating systems.
Hardware Barrier Synchronization: Static Barrier MIMD (SBM)
In this paper, we give the design, and performance analysis, of a new, highly efficient, synchronization mechanism called “Static Barrier MIMD” or “SBM.” Unlike traditional barrier synchronization, the proposed barriers are designed to facilitate the use of static (compile-time) code scheduling for eliminating some synchronizations. For this reason, our barrier hardware is more general than most hardware barrier mechanisms, allowing any subset of the processors to participate in each barrier. Since code scheduling typically operates on fine-grain parallelism, it is also vital that barriers be able to execute in a small number of clock ticks. The SBM is actually only one of two new classes of barrier machines proposed to facilitate static code scheduling; the other architecture is the “Dynamic Barrier MIMD,” or “DBM,” which is described in a companion paper1. The DBM differs from the SBM in that the DBM employs more complex hardware to make the system less dependent on the precision of the static analysis and code scheduling; for example, an SBM cannot efficiently manage simultaneous execution of independent parallel programs, whereas a DBM can
Recommended from our members
Executing matrix multiply on a process oriented data flow machine
The Process-Oriented Dataflow System (PODS) is an execution model that combines the von Neumann and dataflow models of computation to gain the benefits of each. Central to PODS is the concept of array distribution and its effects on partitioning and mapping of processes.In PODS arrays are partitioned by simply assigning consecutive elements to each processing element (PE) equally. Since PODS uses single assignment, there will be only one producer of each element. This producing PE owns that element and will perform the necessary computations to assign it. Using this approach the filling loop is distributed across the PEs. This simple partitioning and mapping scheme provides excellent results for executing scientific code on MIMD machines. In this way PODS allows MIMD machines to exploit vector and data parallelism easily while still providing the flexibility of MIMD over SIMD for multi-user systems.In this paper, the classic matrix multiply algorithm, with 1024 data points, is executed on a PODS simulator and the results are presented and discussed. Matrix multiply is a good example because it has several interesting properties: there are multiple code-blocks; a new array must be dynamically allocated and distributed; there is a loop-carried dependency in the innermost loop; the two input arrays have different access patterns; and the sizes of the input arrays are not known at compile time. Matrix multiply also forms the basis for many important scientific algorithms such as: LU decomposition, convolution, and the Fast-Fourier Transform.The results show that PODS is comparable to both Iannucci's Hybrid Architecture and MIT's TTDA in terms of overhead and instruction power. They also show that PODS easily distributes the work load evenly across the PEs. The key result is that PODS can scale matrix multiply in a near linear fashion until there is little or no work to be performed for each PE. Then overhead and message passing become a major component of the execution time. With larger problems (e.g., >/=16k data points) this limit would be reached at around 256 PEs
- …