5,602 research outputs found
Revisiting Matrix Product on Master-Worker Platforms
This paper is aimed at designing efficient parallel matrix-product algorithms
for heterogeneous master-worker platforms. While matrix-product is
well-understood for homogeneous 2D-arrays of processors (e.g., Cannon algorithm
and ScaLAPACK outer product algorithm), there are three key hypotheses that
render our work original and innovative:
- Centralized data. We assume that all matrix files originate from, and must
be returned to, the master.
- Heterogeneous star-shaped platforms. We target fully heterogeneous
platforms, where computational resources have different computing powers.
- Limited memory. Because we investigate the parallelization of large
problems, we cannot assume that full matrix panels can be stored in the worker
memories and re-used for subsequent updates (as in ScaLAPACK).
We have devised efficient algorithms for resource selection (deciding which
workers to enroll) and communication ordering (both for input and result
messages), and we report a set of numerical experiments on various platforms at
Ecole Normale Superieure de Lyon and the University of Tennessee. However, we
point out that in this first version of the report, experiments are limited to
homogeneous platforms
Performance Reproduction and Prediction of Selected Dynamic Loop Scheduling Experiments
Scientific applications are complex, large, and often exhibit irregular and
stochastic behavior. The use of efficient loop scheduling techniques in
computationally-intensive applications is crucial for improving their
performance on high-performance computing (HPC) platforms. A number of dynamic
loop scheduling (DLS) techniques have been proposed between the late 1980s and
early 2000s, and efficiently used in scientific applications. In most cases,
the computing systems on which they have been tested and validated are no
longer available. This work is concerned with the minimization of the sources
of uncertainty in the implementation of DLS techniques to avoid unnecessary
influences on the performance of scientific applications. Therefore, it is
important to ensure that the DLS techniques employed in scientific applications
today adhere to their original design goals and specifications. The goal of
this work is to attain and increase the trust in the implementation of DLS
techniques in present studies. To achieve this goal, the performance of a
selection of scheduling experiments from the 1992 original work that introduced
factoring is reproduced and predicted via both, simulative and native
experimentation. The experiments show that the simulation reproduces the
performance achieved on the past computing platform and accurately predicts the
performance achieved on the present computing platform. The performance
reproduction and prediction confirm that the present implementation of the DLS
techniques considered both, in simulation and natively, adheres to their
original description. The results confirm the hypothesis that reproducing
experiments of identical scheduling scenarios on past and modern hardware leads
to an entirely different behavior from expected
On data skewness, stragglers, and MapReduce progress indicators
We tackle the problem of predicting the performance of MapReduce
applications, designing accurate progress indicators that keep programmers
informed on the percentage of completed computation time during the execution
of a job. Through extensive experiments, we show that state-of-the-art progress
indicators (including the one provided by Hadoop) can be seriously harmed by
data skewness, load unbalancing, and straggling tasks. This is mainly due to
their implicit assumption that the running time depends linearly on the input
size. We thus design a novel profile-guided progress indicator, called
NearestFit, that operates without the linear hypothesis assumption and exploits
a careful combination of nearest neighbor regression and statistical curve
fitting techniques. Our theoretical progress model requires fine-grained
profile data, that can be very difficult to manage in practice. To overcome
this issue, we resort to computing accurate approximations for some of the
quantities used in our model through space- and time-efficient data streaming
algorithms. We implemented NearestFit on top of Hadoop 2.6.0. An extensive
empirical assessment over the Amazon EC2 platform on a variety of real-world
benchmarks shows that NearestFit is practical w.r.t. space and time overheads
and that its accuracy is generally very good, even in scenarios where
competitors incur non-negligible errors and wide prediction fluctuations.
Overall, NearestFit significantly improves the current state-of-art on progress
analysis for MapReduce
Surveillance and alienation in the online economy
The critical literature on commercial monitoring and so-called ‘free labour’ (Terranova 2000) locates exploitation in realms beyond the workplace proper, noting the productivity of networked activity including the creation of user-generated-content and the profitability of commercial sites for social networking and communication. The changing context of productivity in these realms, however, requires further development of a critical concept of exploitation. This article defines exploitation as the extraction of unpaid, coerced, and alienated labour. It considers how such a definition might apply to various forms of unpaid but profit-generating online activity, arguing that commercial monitoring redoubles the conscious, intentional activity of users in ways that render it amenable to a critique of exploitation. Given the role of commercial monitoring in the emerging online economy, the paper emphasizes the importance of supplementing privacy critiques with approaches that identify the ways in which new forms of surveillance represent a form of power that seeks to manage and control consumer behaviour
Recommended from our members
A grid computing framework for commercial simulation packages
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University.An increased need for collaborative research among different organizations, together with continuing advances in communication technology and computer hardware, has facilitated the development of distributed systems that can provide users non-trivial access to geographically dispersed computing resources (processors, storage, applications, data, instruments, etc.) that are administered in multiple computer domains. The term grid computing or grids is popularly used to refer to such distributed systems. A broader definition of grid computing includes the use of computing resources within an organization for running organization-specific applications. This research is in the context of using grid computing within an enterprise to maximize the use of available hardware and software resources for processing enterprise applications. Large scale scientific simulations have traditionally been the primary benefactor of grid computing. The application of this technology to simulation in industry has, however, been negligible. This research investigates how grid technology can be effectively exploited by simulation practitioners using Windows-based commercially available simulation packages to model simulations in industry. These packages are commonly referred to as Commercial Off-The-Shelf (COTS) Simulation Packages (CSPs). The study identifies several higher level grid services that could be potentially used to support the practise of simulation in industry. It proposes a grid computing framework to investigate these services in the context of CSP-based simulations. This framework is called the CSP-Grid Computing (CSP-GC) Framework. Each identified higher level grid service in this framework is referred to as a CSP-specific service. A total of six case studies are presented to experimentally evaluate how grid computing technologies can be used together with unmodified simulation packages to support some of the CSP-specific services. The contribution of this thesis is the CSP-GC framework that identifies how simulation practise in industry may benefit from the use of grid technology. A further contribution is the recognition of specific grid computing software (grid middleware) that can possibly be used together with existing CSPs to provide grid support. With its focus on end-users and end-user tools, it is intended that this research will encourage wider adoption of grid computing in the workplace and that simulation users will derive benefit from using this technology
- …