Computing in the RAIN: a reliable array of independent nodes
The RAIN project is a research collaboration between Caltech and NASA-JPL on distributed computing and data-storage systems for future spaceborne missions. The goal of the project is to identify and develop key building blocks for reliable distributed systems built with inexpensive off-the-shelf components. The RAIN platform consists of a heterogeneous cluster of computing and/or storage nodes connected via multiple interfaces to networks configured in fault-tolerant topologies. The RAIN software components run in conjunction with operating system services and standard network protocols. Through software-implemented fault tolerance, the system tolerates multiple node, link, and switch failures, with no single point of failure. The RAIN technology has been transferred to Rainfinity, a start-up company focusing on creating clustered solutions for improving the performance and availability of Internet data centers. In this paper, we describe the following contributions: 1) fault-tolerant interconnect topologies and communication protocols providing consistent error reporting of link failures, 2) fault management techniques based on group membership, and 3) data storage schemes based on computationally efficient error-control codes. We present several proof-of-concept applications: a highly available video server, a highly available Web server, and a distributed checkpointing system. We also describe a commercial product, Rainwall, built with the RAIN technology.
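The abstract above mentions storage schemes based on computationally efficient error-control codes. As a minimal illustration of the general idea (not the actual RAIN codes, which the abstract does not specify), a single XOR parity block lets an array of nodes survive the loss of any one data block:

```python
# Hypothetical sketch: single-parity erasure coding across storage nodes.
# Any one lost data block can be rebuilt from the survivors plus parity.

def encode(blocks):
    """Compute an XOR parity block over equal-length data blocks."""
    parity = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            parity[i] ^= b
    return bytes(parity)

def recover(surviving_blocks, parity):
    """Reconstruct the single missing block from survivors and parity."""
    missing = bytearray(parity)
    for block in surviving_blocks:
        for i, b in enumerate(block):
            missing[i] ^= b
    return bytes(missing)

data = [b"node0", b"node1", b"node2"]  # one block per storage node
p = encode(data)
# Simulate losing node1 and rebuilding it from the other nodes + parity.
rebuilt = recover([data[0], data[2]], p)
assert rebuilt == b"node1"
```

Production systems use stronger codes (e.g., Reed-Solomon) to tolerate multiple simultaneous failures; XOR parity is the simplest instance of the family.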
The Greedy Dirichlet Process Filter - An Online Clustering Multi-Target Tracker
Reliable collision avoidance is one of the main requirements for autonomous
driving. Hence, it is important to correctly estimate the states of an unknown
number of static and dynamic objects in real-time. Here, data association is a
major challenge for every multi-target tracker. We propose a novel multi-target
tracker called Greedy Dirichlet Process Filter (GDPF) based on the
non-parametric Bayesian model called Dirichlet Processes and the fast posterior
computation algorithm Sequential Updating and Greedy Search (SUGS). By adding a
temporal dependence, we obtain a real-time-capable tracking framework without
the need for a separate clustering or data-association step. Real-world tests
show that GDPF outperforms other multi-target trackers in terms of accuracy
and stability.
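The core mechanism described above, greedy sequential assignment under a Dirichlet-process prior, can be sketched as follows. This is an illustrative toy (1D points, Gaussian scores, made-up `alpha` and `sigma` values), not the GDPF algorithm from the paper: each observation is greedily assigned to the existing cluster, or a new one, with the highest CRP-weighted likelihood.

```python
import math

# Toy sketch of greedy sequential Dirichlet-process clustering (SUGS-style):
# assign each point to the argmax over existing clusters and a "new cluster"
# option, instead of sampling. alpha/sigma are illustrative assumptions.

def greedy_dp_cluster(points, alpha=1.0, sigma=1.0):
    clusters = []  # per-cluster running count "n" and sum of assigned points
    labels = []
    for x in points:
        scores = []
        for k, c in enumerate(clusters):
            mean = c["sum"] / c["n"]
            # CRP weight (cluster size) times a Gaussian likelihood term.
            lik = math.exp(-((x - mean) ** 2) / (2.0 * sigma ** 2))
            scores.append((c["n"] * lik, k))
        # Opening a new cluster: concentration alpha times a flat
        # base-measure likelihood (crude placeholder).
        scores.append((alpha * 0.5, None))
        _, best_k = max(scores, key=lambda t: t[0])
        if best_k is None:
            clusters.append({"n": 1, "sum": x})
            labels.append(len(clusters) - 1)
        else:
            clusters[best_k]["n"] += 1
            clusters[best_k]["sum"] += x
            labels.append(best_k)
    return labels

print(greedy_dp_cluster([0.0, 0.1, 5.0, 5.1]))  # → [0, 0, 1, 1]
```

Because the assignment is a deterministic argmax rather than a posterior sample, each point is processed in a single pass, which is what makes this family of methods attractive for real-time tracking.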
Randomized Assignment of Jobs to Servers in Heterogeneous Clusters of Shared Servers for Low Delay
We consider the job assignment problem in a multi-server system consisting of
parallel processor-sharing servers, categorized into a finite number of
different types according to their processing capacity or speed. Jobs of
random sizes arrive at the system according to a Poisson process. Upon each
arrival, a small number of servers from each type is sampled uniformly at
random. The job is then assigned to one of the sampled servers based on a
selection rule. We propose two schemes, each corresponding to a specific
selection rule that aims at reducing the mean sojourn time of jobs in the
system.
We first show that both schemes achieve the maximal stability region. We then
analyze the system operating under the proposed schemes in the limit as the
number of servers grows large, which corresponds to the mean field. Our
results show that asymptotic independence among servers holds even when the
number of sampled servers is finite and exchangeability holds only within
servers of the same type. We further establish the existence and uniqueness of
the stationary solution of the mean field and show that the tail distribution
of server occupancy decays doubly exponentially for each server type. When
estimates of the arrival rates are not available, the proposed schemes offer
simpler alternatives for achieving a low mean sojourn time of jobs, as shown
by our numerical studies.
A Parallel Adaptive P3M code with Hierarchical Particle Reordering
We discuss the design and implementation of HYDRA_OMP, a parallel
implementation of the Smoothed Particle Hydrodynamics-Adaptive P3M (SPH-AP3M)
code HYDRA. The code is designed primarily for conducting cosmological
hydrodynamic simulations and is written in Fortran77+OpenMP. A number of
optimizations for RISC processors and SMP-NUMA architectures have been
implemented, the most important optimization being hierarchical reordering of
particles within chaining cells, which greatly improves data locality thereby
removing the cache misses typically associated with linked lists. Parallel
scaling is good, with a minimum parallel scaling of 73% achieved on 32 nodes
for a variety of modern SMP architectures. We give performance data in terms of
the number of particle updates per second, which is a more useful performance
metric than raw MFlops. A basic version of the code will be made available to
the community in the near future. Comment: 34 pages, 12 figures, accepted for publication in Computer Physics Communications.
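The key optimization named in the abstract, reordering particles within chaining cells for data locality, can be illustrated with a deliberately simplified flat sketch (the paper's scheme is hierarchical, and HYDRA is Fortran77+OpenMP; the cell geometry below is invented for illustration):

```python
# Illustrative sketch only: sort particles by chaining-cell index so that
# particles in the same cell become contiguous in memory. Traversals then
# stream through arrays instead of chasing linked-list pointers, which is
# the cache-miss pattern the abstract says the reordering removes.

def cell_index(pos, cell_size, ncells):
    """Map a 3D position to the linear index of its chaining cell."""
    ix, iy, iz = (int(c / cell_size) % ncells for c in pos)
    return (ix * ncells + iy) * ncells + iz

def reorder(particles, cell_size=1.0, ncells=4):
    # Stable sort: particles in the same cell keep their relative order.
    return sorted(particles, key=lambda p: cell_index(p, cell_size, ncells))

pts = [(3.5, 0.2, 0.1), (0.1, 0.1, 0.1), (3.6, 0.3, 0.2), (0.2, 0.2, 0.2)]
ordered = reorder(pts)
# Particles sharing a cell are now adjacent in the array.
```

A production code would apply the same permutation to all per-particle arrays (velocity, mass, smoothing length) so every field enjoys the improved locality.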