Search CORE

211 research outputs found

Recommended from our members

Parallelisation of greedy algorithms for compressive sensing reconstruction

Author: Turner David William
Publication venue: University of Cambridge
Publication date: 25/07/2019
Field of study

Compressive Sensing (CS) is a technique which allows a signal to be compressed at the same time as it is captured. The process of capturing and simultaneously compressing the signal is represented as linear sampling, which can encompass a variety of physical processes or signal processing. Instead of explicitly identifying redundancies in the source signal, CS relies on the property of sparsity in order to reconstruct the compressed signal. While linear sampling is much less burdensome than conventional compression, this is more than made up for by the high computational cost of reconstructing a signal which has been captured using CS. Even when using some of the fastest reconstruction techniques, known as greedy pursuits, reconstruction of large problems can pose a significant burden, consuming a great deal of memory as well as compute time. Parallel computing is the foundation of the field of High Performance Computing (HPC). Modern supercomputers are generally composed of large clusters of standard servers, with a dedicated low-latency high-bandwidth interconnect network. On such a cluster, an appropriately written program can harness vast quantities of memory and computational power. However, in order to exploit a parallel compute resource, an algorithm usually has to be redesigned from the ground up. In this thesis I describe the development of parallel variants of two algorithms commonly used in CS reconstruction, Matching Pursuit (MP) and Orthogonal Matching Pursuit (OMP), resulting in the new distributed compute algorithms DistMP and DistOMP. I present the results from experiments showing how DistMP and DistOMP can utilise a compute cluster to solve CS problems much more quickly than a single computer could alone. Speed-up of as much as a factor of 76 is observed with DistMP when utilising 210 workers across 14 servers, compared to a single worker. Finally, I demonstrate how DistOMP can solve a problem with a 429GB equivalent sampling matrix in as little as 62 minutes using a 16-node compute cluster.Funded by an ICASE award from the Engineering and Physical Sciences Research Council, with sponsorship provided by Thales Research and Technology

Apollo (Cambridge)

XMDS2: Fast, scalable simulation of coupled stochastic partial differential equations

Author: Dennis Graham R.
Hope Joseph J.
Johnsson Mattias T.
Publication venue: 'Elsevier BV'
Publication date: 16/07/2012
Field of study

XMDS2 is a cross-platform, GPL-licensed, open source package for numerically integrating initial value problems that range from a single ordinary differential equation up to systems of coupled stochastic partial differential equations. The equations are described in a high-level XML-based script, and the package generates low-level optionally parallelised C++ code for the efficient solution of those equations. It combines the advantages of high-level simulations, namely fast and low-error development, with the speed, portability and scalability of hand-written code. XMDS2 is a complete redesign of the XMDS package, and features support for a much wider problem space while also producing faster code.Comment: 9 pages, 5 figure

arXiv.org e-Print Archive

The Australian National University

E-AMOM: An Energy-Aware Modeling and Optimization Methodology for Scientific Applications on Multicore Systems

Author: Lively Charles
Publication venue
Publication date
Field of study

Power consumption is an important constraint in achieving efficient execution on High Performance Computing Multicore Systems. As the number of cores available on a chip continues to increase, the importance of power consumption will continue to grow. In order to achieve improved performance on multicore systems scientific applications must make use of efficient methods for reducing power consumption and must further be refined to achieve reduced execution time. In this dissertation, we introduce a performance modeling framework, E-AMOM, to enable improved execution of scientific applications on parallel multicore systems with regards to a limited power budget. We develop models for each application based upon performance hardware counters. Our models utilize different performance counters for each application and for each performance component (runtime, system power consumption, CPU power consumption, and memory power consumption) that are selected via our performance-tuned principal component analysis method. Models developed through E-AMOM provide insight into the performance characteristics of each application that affect performance for each component on a parallel multicore system. Our models are more than 92% accurate across both Hybrid (MPI/OpenMP) and MPI implementations for six scientific applications. E-AMOM includes an optimization component that utilizes our models to employ run-time Dynamic Voltage and Frequency Scaling (DVFS) and Dynamic Concurrency Throttling to reduce power consumption of the scientific applications. Further, we optimize our applications based upon insights provided by the performance models to reduce runtime of the applications. Our methods and techniques are able to save up to 18% in energy consumption for Hybrid (MPI/OpenMP) and MPI scientific applications and reduce the runtime of the applications up to 11% on parallel multicore systems

Texas A&M Repository

Software Tool Evaluation Methodology

Author: Hariri Salim
Park Sung Yong
Reddy Rajashekar
Subramanyan Mahesh
Publication venue: SURFACE at Syracuse University
Publication date: 01/01/1995
Field of study

The recent development of parallel and distributed computing software has introduced a variety of software tools that support several programming paradigms and languages. This variety of tools makes the selection of the best tool to run a given class of applications on a parallel or distributed system a non-trivial task that requires some investigation. We expect tool evaluation to receive more attention as the deployment and usage of distributed systems increases. In this paper, we present a multi-level evaluation methodology for parallel/distributed tools in which tools are evaluated from different perspectives. We apply our evaluation methodology to three message passing tools viz Express, p4, and PVM. The approach covers several important distributed systems platforms consisting of different computers (e.g., IBM-SP1, Alpha cluster, SUN workstations) interconnected by different types of networks (e.g., Ethernet, FDDI, ATM)

Syracuse University Research Facility and Collaborative Environment

Complementing user-level coarse-grain parallelism with implicit speculative parallelism

Author: Ioannou Nikolas
Publication venue: The University of Edinburgh
Publication date: 29/11/2012
Field of study

Multi-core and many-core systems are the norm in contemporary processor technology and are expected to remain so for the foreseeable future. Parallel programming is, thus, here to stay and programmers have to endorse it if they are to exploit such systems for their applications. Programs using parallel programming primitives like PThreads or OpenMP often exploit coarse-grain parallelism, because it offers a good trade-off between programming effort versus performance gain. Some parallel applications show limited or no scaling beyond a number of cores. Given the abundant number of cores expected in future many-cores, several cores would remain idle in such cases while execution performance stagnates. This thesis proposes using cores that do not contribute to performance improvement for running implicit fine-grain speculative threads. In particular, we present a many-core architecture and protocols that allow applications with coarse-grain explicit parallelism to further exploit implicit speculative parallelism within each thread. We show that complementing parallel programs with implicit speculative mechanisms offers significant performance improvements for a large and diverse set of parallel benchmarks. Implicit speculative parallelism frees the programmer from the additional effort to explicitly partition the work into finer and properly synchronized tasks. Our results show that, for a many-core comprising 128 cores supporting implicit speculative parallelism in clusters of 2 or 4 cores, performance improves on top of the highest scalability point by 44% on average for the 4-core cluster and by 31% on average for the 2-core cluster. We also show that this approach often leads to better performance and energy efficiency compared to existing alternatives such as Core Fusion and Turbo Boost. Moreover, we present a dynamic mechanism to choose the number of explicit and implicit threads, which performs within 6% of the static oracle selection of threads. To improve energy efficiency processors allow for Dynamic Voltage and Frequency Scaling (DVFS), which enables changing their performance and power consumption on-the-fly. We evaluate the amenability of the proposed explicit plus implicit threads scheme to traditional power management techniques for multithreaded applications and identify room for improvement. We thus augment prior schemes and introduce a novel multithreaded power management scheme that accounts for implicit threads and aims to minimize the Energy Delay2 product (ED2). Our scheme comprises two components: a “local” component that tries to adapt to the different program phases on a per explicit thread basis, taking into account implicit thread behavior, and a “global” component that augments the local components with information regarding inter-thread synchronization. Experimental results show a reduction of ED2 of 8% compared to having no power management, with an average reduction in power of 15% that comes at a minimal loss of performance of less than 3% on average

Edinburgh Research Archive

Performance and Energy Optimization of the Iterative Solution of Sparse Linear Systems on Multicore Processors

Author: Barreda Vayá María
Publication venue: 'Universitat Jaume I'
Publication date: 01/01/2017
Field of study

En esta tesis doctoral se aborda la solución de sistemas dispersos de ecuaciones lineales utilizando métodos iterativos precondicionados basados en subespacios de Krylov. En concreto, se centra en ILUPACK, una biblioteca que implementa precondicionadores de tipo ILU multinivel para la solución eficiente de sistemas lineales dispersos. El incremento en el número de ecuaciones, y la aparición de nuevas arquitecturas, motiva el desarrollo de una versión paralela de ILUPACK que optimice tanto el tiempo de ejecución como el consumo energético en arquitecturas multinúcleo actuales y en clusters de nodos construidos con esta tecnología. El objetivo principal de la tesis es el diseño, implementación y valuación de resolutores paralelos energéticamente eficientes para sistemas lineales dispersos orientados a procesadores multinúcleo así como aceleradores hardware como el Intel Xeon Phi. Para lograr este objetivo, se aprovecha el paralelismo de tareas mediante OmpSs y MPI, y se desarrolla un entorno automático para detectar ineficiencias energéticas.In this dissertation we target the solution of large sparse systems of linear equations using preconditioned iterative methods based on Krylov subspaces. Specifically, we focus on ILUPACK, a library that offers multi-level ILU preconditioners for the effective solution of sparse linear systems. The increase of the number of equations and the introduction of new HPC architectures motivates us to develop a parallel version of ILUPACK which optimizes both execution time and energy consumption on current multicore architectures and clusters of nodes built from this type of technology. Thus, the main goal of this thesis is the design, implementation and evaluation of parallel and energy-efficient iterative sparse linear system solvers for multicore processors as well as recent manycore accelerators such as the Intel Xeon Phi. To fulfill the general objective, we optimize ILUPACK exploiting task parallelism via OmpSs and MPI, and also develope an automatic framework to detect energy inefficiencies

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Tesis Doctorals en Xarxa

Repositori Institucional de la Universitat Jaume I

Information Fusion for Improved Motion Estimation

Author: Peacock Andrew M
Publication venue: University of Edinburgh; College of Science and Engineering; School of Engineering and Electronics
Publication date: 01/07/2001
Field of study

studentship award number 98318229Motion Estimation is an important research field with many commercial applications including surveillance, navigation, robotics, and image compression. As a result, the field has received a great deal of attention and there exist a wide variety of Motion Estimation techniques which are often specialised for particular problems. The relative performance of these techniques, in terms of both accuracy and of computational requirements, is often found to be data dependent, and no single technique is known to outperform all others for all applications under all conditions. Information Fusion strategies seek to combine the results of different classifiers or sensors to give results of a better quality for a given problem than can be achieved by any single technique alone. Information Fusion has been shown to be of benefit to a number of applications including remote sensing, personal identity recognition, target detection, forecasting, and medical diagnosis. This thesis proposes and demonstrates that Information Fusion strategies may also be applied to combine the results of different Motion Estimation techniques in order to give more robust, more accurate and more timely motion estimates than are provided by any of the individual techniques alone. Information Fusion strategies for combining motion estimates are investigated and developed. Their usefulness is first demonstrated by combining scalar motion estimates of the frequency of rotation of spinning biological cells. Then the strategies are used to combine the results from three popular 2D Motion Estimation techniques, chosen to be representative of the main approaches in the field. Results are presented, from both real and synthetic test image sequences, which illustrate the potential benefits of Information Fusion to Motion Estimation applications. There is often a trade-off between accuracy of Motion Estimation techniques and their computational requirements. An architecture for Information Fusion that allows faster, less accurate techniques to be effectively combined with slower, more accurate techniques is described. This thesis describes a number of novel techniques for both Information Fusion and Motion Estimation which have potential scope beyond that examined here. The investigations presented in this thesis have also been reported in a number of workshop, conference and journal papers, which are listed at the end of the document

Edinburgh Research Archive

Attitudes towards old age and age of retirement across the world: findings from the future of retirement survey

Author: Khan H.
Khan H.
Leeson G.
Leeson G.
Publication venue: ISOSS
Publication date: 01/01/2013
Field of study

The 21st century has been described as the first era in human history when the world will no longer be young and there will be drastic changes in many aspects of our lives including socio-demographics, financial and attitudes towards the old age and retirement. This talk will introduce briefly about the Global Ageing Survey (GLAS) 2004 and 2005 which is also popularly known as “The Future of Retirement”. These surveys provide us a unique data source collected in 21 countries and territories that allow researchers for better understanding the individual as well as societal changes as we age with regard to savings, retirement and healthcare. In 2004, approximately 10,000 people aged 18+ were surveyed in nine counties and one territory (Brazil, Canada, China, France, Hong Kong, India, Japan, Mexico, UK and USA). In 2005, the number was increased to twenty-one by adding Egypt, Germany, Indonesia, Malaysia, Poland, Russia, Saudi Arabia, Singapore, Sweden, Turkey and South Korea). Moreover, an additional 6320 private sector employers was surveyed in 2005, some 300 in each country with a view to elucidating the attitudes of employers to issues relating to older workers. The paper aims to examine the attitudes towards the old age and retirement across the world and will indicate some policy implications

Middlesex University Research Repository

Energy Efficiency Models for Scientific Applications on Supercomputers

Author: Endrei Mark
Publication venue: 'University of Queensland Library'
Publication date: 17/01/2020
Field of study

University of Queensland eSpace