193 research outputs found
On data skewness, stragglers, and MapReduce progress indicators
We tackle the problem of predicting the performance of MapReduce
applications, designing accurate progress indicators that keep programmers
informed on the percentage of completed computation time during the execution
of a job. Through extensive experiments, we show that state-of-the-art progress
indicators (including the one provided by Hadoop) can be seriously harmed by
data skewness, load unbalancing, and straggling tasks. This is mainly due to
their implicit assumption that the running time depends linearly on the input
size. We thus design a novel profile-guided progress indicator, called
NearestFit, that operates without the linear hypothesis assumption and exploits
a careful combination of nearest neighbor regression and statistical curve
fitting techniques. Our theoretical progress model requires fine-grained
profile data, that can be very difficult to manage in practice. To overcome
this issue, we resort to computing accurate approximations for some of the
quantities used in our model through space- and time-efficient data streaming
algorithms. We implemented NearestFit on top of Hadoop 2.6.0. An extensive
empirical assessment over the Amazon EC2 platform on a variety of real-world
benchmarks shows that NearestFit is practical w.r.t. space and time overheads
and that its accuracy is generally very good, even in scenarios where
competitors incur non-negligible errors and wide prediction fluctuations.
Overall, NearestFit significantly improves the current state-of-art on progress
analysis for MapReduce
Python as a Federation Tool for GENESIS 3.0
The GENESIS simulation platform was one of the first broad-scale modeling systems in computational biology to encourage modelers to develop and share model features and components. Supported by a large developer community, it participated in innovative simulator technologies such as benchmarking, parallelization, and declarative model specification and was the first neural simulator to define bindings for the Python scripting language. An important feature of the latest version of GENESIS is that it decomposes into self-contained software components complying with the Computational Biology Initiative federated software architecture. This architecture allows separate scripting bindings to be defined for different necessary components of the simulator, e.g., the mathematical solvers and graphical user interface. Python is a scripting language that provides rich sets of freely available open source libraries. With clean dynamic object-oriented designs, they produce highly readable code and are widely employed in specialized areas of software component integration. We employ a simplified wrapper and interface generator to examine an application programming interface and make it available to a given scripting language. This allows independent software components to be ‘glued’ together and connected to external libraries and applications from user-defined Python or Perl scripts. We illustrate our approach with three examples of Python scripting. (1) Generate and run a simple single-compartment model neuron connected to a stand-alone mathematical solver. (2) Interface a mathematical solver with GENESIS 3.0 to explore a neuron morphology from either an interactive command-line or graphical user interface. (3) Apply scripting bindings to connect the GENESIS 3.0 simulator to external graphical libraries and an open source three dimensional content creation suite that supports visualization of models based on electron microscopy and their conversion to computational models. Employed in this way, the stand-alone software components of the GENESIS 3.0 simulator provide a framework for progressive federated software development in computational neuroscience
Tracking studies of the Compact Linear Collider collimation system
A collimation system performance study includes several types of computations performed by different codes. Optics calculations are performed with codes such as MADX, tracking studies including additional effects such as wakefields, halo and tail generation, and dynamical machine alignment are done with codes such as PLACET, and energy deposition can be studied with BDSIM. More detailed studies of hadron production in the beam halo interaction with collimators are better performed with GEANT4 and FLUKA. A procedure has been developed that allows one to perform a single tracking study using several codes simultaneously. In this paper we study the performance of the Compact Linear Collider collimation system using such a procedure
Distributed Operating Systems
Distributed operating systems have many aspects in common with centralized ones, but they also differ in certain ways. This paper is intended as an introduction to distributed operating systems, and especially to current university research about them. After a discussion of what constitutes a distributed operating system and how it is distinguished from a computer network, various key design issues are discussed. Then several examples of current research projects are examined in some detail, namely, the Cambridge Distributed Computing System, Amoeba, V, and Eden. © 1985, ACM. All rights reserved
- …