6,923 research outputs found

Speculative Approximations for Terascale Analytics

Author: Qin Chengjie
Rusu Florin
Publication date: 31/12/2014

Model calibration is a major challenge faced by the plethora of statistical analytics packages that are increasingly used in Big Data applications. Identifying the optimal model parameters is a time-consuming process that has to be executed from scratch for every dataset/model combination even by experienced data scientists. We argue that the incapacity to evaluate multiple parameter configurations simultaneously and the lack of support to quickly identify sub-optimal configurations are the principal causes. In this paper, we develop two database-inspired techniques for efficient model calibration. Speculative parameter testing applies advanced parallel multi-query processing methods to evaluate several configurations concurrently. The number of configurations is determined adaptively at runtime, while the configurations themselves are extracted from a distribution that is continuously learned following a Bayesian process. Online aggregation is applied to identify sub-optimal configurations early in the processing by incrementally sampling the training dataset and estimating the objective function corresponding to each configuration. We design concurrent online aggregation estimators and define halting conditions to accurately and timely stop the execution. We apply the proposed techniques to distributed gradient descent optimization -- batch and incremental -- for support vector machines and logistic regression models. We implement the resulting solutions in GLADE PF-OLA -- a state-of-the-art Big Data analytics system -- and evaluate their performance over terascale-size synthetic and real datasets. The results confirm that as many as 32 configurations can be evaluated concurrently almost as fast as one, while sub-optimal configurations are detected accurately in as little as a 1/20th fraction of the time.

arXiv.org e-Print Archive

eScholarship - University of California

Precise Modelling of Compensating Business Transactions and its Application to BPEL

Author: Butler Michael
Ferreira Carla
Ng Muan Yong
Publication date: 01/01/2005

We describe the StAC language, which can be used to specify the orchestration of activities in long running business transactions. Long running business transactions use compensation to cope with exceptions. StAC supports sequential and parallel behaviour as well as exception and compensation handling. We also show how the B notation may be combined with StAC to specify the data aspects of transactions. The combination of StAC and B provides a rich formal notation which allows for succinct and precise specification of business transactions. BPEL is an industry standard language for specifying business transactions and includes compensation constructs. We show how a substantial subset of BPEL can be mapped to StAC, thus demonstrating the expressiveness of StAC and providing a formal semantics for BPEL.

CiteSeerX

Southampton (e-Prints Soton)

ZENODO

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

ARPHA OAI-PMH Endpoint

ARPHA Preprints

AutoParallel: A Python module for automatic parallelization and distributed execution of affine loop nests

Author: Amela Ramon
Badia Rosa M.
Clauss Philippe
Ejarque Jorge
Ramon-Cortes Cristian
Publication date: 26/10/2018

The last improvements in programming languages, programming models, and frameworks have focused on abstracting the users from many programming issues. Among others, recent programming frameworks include simpler syntax, automatic memory management and garbage collection, which simplifies code re-usage through library packages, and easily configurable tools for deployment. For instance, Python has risen to the top of the list of the programming languages due to the simplicity of its syntax, while still achieving a good performance even being an interpreted language. Moreover, the community has helped to develop a large number of libraries and modules, tuning them to obtain great performance. However, there is still room for improvement when preventing users from dealing directly with distributed and parallel computing issues. This paper proposes and evaluates AutoParallel, a Python module to automatically find an appropriate task-based parallelization of affine loop nests to execute them in parallel in a distributed computing infrastructure. This parallelization can also include the building of data blocks to increase task granularity in order to achieve a good execution performance. Moreover, AutoParallel is based on sequential programming and only contains a small annotation in the form of a Python decorator so that anyone with little programming skills can scale up an application to hundreds of cores. Comment: Accepted to the 8th Workshop on Python for High-Performance and Scientific Computing (PyHPC 2018)

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

HAL Descartes

Hal-Diderot

A speculative execution approach to provide semantically aware contention management for concurrent systems

Author: Sharp Craig
Publication venue: Newcastle University
Publication date: 01/01/2013

PhD Thesis. Most modern platforms offer ample potential for parallel execution of concurrent programs, yet concurrency control is required to exploit parallelism while maintaining program correctness. Pessimistic concurrency control, featuring blocking synchronization and mutual exclusion, has given way to transactional memory, which allows the composition of concurrent code in a manner more intuitive for the application programmer. An important component in any transactional memory technique, however, is the policy for resolving conflicts on shared data, commonly referred to as the contention management policy. In this thesis, a Universal Construction is described which provides contention management for software transactional memory. The technique differs from existing approaches given that multiple execution paths are explored speculatively and in parallel. In the resolution of conflicts by state space exploration, we demonstrate that both concurrent conflicts and semantic conflicts can be solved, promoting multi-threaded program progression. We define a model of computation called Many Systems, which defines the execution of concurrent threads as a state space management problem. An implementation is then presented based on concepts from the model, and we extend the implementation to incorporate nested transactions. Results are provided which compare the performance of our approach with an established contention management policy, under varying degrees of concurrent and semantic conflicts. Finally, we provide performance results from a number of search strategies, when nested transactions are introduced.

Newcastle University eTheses

Rethinking affordance

Author: Scarlett Ashley
Zeilinger Martin
Publication date: 23/08/2019

n/a – Critical survey essay retheorising the concept of 'affordance' in a digital media context. Lead article in a special issue on the topic, co-edited by the authors for the journal Media Theory.

Abertay Research Portal

Anglia Ruskin Research

Probabilistic data flow analysis: a linear equational approach

Author: Di Pierro Alessandra
Wiklicky Herbert
Publication venue: Open Publishing Association
Publication date: 01/01/2013

Speculative optimisation relies on the estimation of the probabilities that certain properties of the control flow are fulfilled. Concrete or estimated branch probabilities can be used for searching and constructing advantageous speculative and bookkeeping transformations. We present a probabilistic extension of the classical equational approach to data-flow analysis that can be used for this purpose. More precisely, we show how the probabilistic information introduced in a control flow graph by branch prediction can be used to extract a system of linear equations from a program and present a method for calculating correct (numerical) solutions. Comment: In Proceedings GandALF 2013, arXiv:1307.416

arXiv.org e-Print Archive

Crossref

Directory of Open Access Journals

Catalogo dei prodotti della ricerca

Spiral - Imperial College Digital Repository

Open Access Repository

A Cost-based Optimizer for Gradient Descent Optimization

Author: Abadi M.
Agrawal D.
Ben-David S.
Bottou L.
Bousquet O.
Johnson R.
Kraska T.
Liu J.
Recht B.
Publication venue: Association for Computing Machinery (ACM)
Publication date: 27/03/2017

As the use of machine learning (ML) permeates into diverse application domains, there is an urgent need to support a declarative framework for ML. Ideally, a user will specify an ML task in a high-level and easy-to-use language and the framework will invoke the appropriate algorithms and system configurations to execute it. An important observation towards designing such a framework is that many ML tasks can be expressed as mathematical optimization problems, which take a specific form. Furthermore, these optimization problems can be efficiently solved using variations of the gradient descent (GD) algorithm. Thus, to decouple a user specification of an ML task from its execution, a key component is a GD optimizer. We propose a cost-based GD optimizer that selects the best GD plan for a given ML task. To build our optimizer, we introduce a set of abstract operators for expressing GD algorithms and propose a novel approach to estimate the number of iterations a GD algorithm requires to converge. Extensive experiments on real and synthetic datasets show that our optimizer not only chooses the best GD plan but also allows for optimizations that achieve orders of magnitude performance speed-up. Comment: Accepted at SIGMOD 201

arXiv.org e-Print Archive

Crossref

The Family of MapReduce and Large Scale Data Processing Systems

Author: Anna Liu
Ayman G. Fayoumi
Sherif Sakr
Publication date: 12/02/2013

In the last two decades, the continuous increase of computational power has produced an overwhelming flow of data which has called for a paradigm shift in the computing architecture and large scale data processing mechanisms. MapReduce is a simple and powerful programming model that enables easy development of scalable parallel applications to process vast amounts of data on large clusters of commodity machines. It isolates the application from the details of running a distributed program such as issues on data distribution, scheduling and fault tolerance. However, the original implementation of the MapReduce framework had some limitations that have been tackled by many research efforts in several followup works after its introduction. This article provides a comprehensive survey of a family of approaches and mechanisms for large scale data processing that have been implemented based on the original idea of the MapReduce framework and are currently gaining a lot of momentum in both research and industrial communities. We also cover a set of introduced systems that have been implemented to provide declarative programming interfaces on top of the MapReduce framework. In addition, we review several large scale data processing systems that resemble some of the ideas of the MapReduce framework for different purposes and application scenarios. Finally, we discuss some of the future research directions for implementing the next generation of MapReduce-like solutions. Comment: arXiv admin note: text overlap with arXiv:1105.4252 by other authors

arXiv.org e-Print Archive

CiteSeerX

The Eureka Programming Model for Speculative Task Parallelism

Author: Imam Shams
Sarkar Vivek
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 29th European Conference on Object-Oriented Programming (ECOOP 2015)
Publication date: 01/01/2015

In this paper, we describe the Eureka Programming Model (EuPM) that simplifies the expression of speculative parallel tasks, and is especially well suited for parallel search and optimization applications. The focus of this work is to provide a clean semantics for, and efficiently support, such "eureka-style" computations (EuSCs) in general structured task parallel programming models. In EuSCs, a eureka event is a point in a program that announces that a result has been found. A eureka triggered by a speculative task can cause a group of related speculative tasks to become redundant, and enable them to be terminated at well-defined program points. Our approach provides a bound on the additional work done in redundant speculative tasks after such a eureka event occurs. We identify various patterns that are supported by our eureka construct, which include search, optimization, convergence, and soft real-time deadlines. These different patterns of computations can also be safely combined or nested in the EuPM, along with regular task-parallel constructs, thereby enabling high degrees of composability and reusability. As demonstrated by our implementation, the EuPM can also be implemented efficiently. We use a cooperative runtime that uses delimited continuations to manage the termination of redundant tasks and their synchronization at join points. In contrast to current approaches, EuPM obviates the need for cumbersome manual refactoring by the programmer that may (for example) require the insertion of if checks and early return statements in every method in the call chain. Experimental results show that solutions using the EuPM simplify programmability, achieve performance comparable to hand-coded speculative task-based solutions, and outperform non-speculative task-based solutions.

Dagstuhl Research Online Publication Server