Search CORE

3,459 research outputs found

Toolflows for Mapping Convolutional Neural Networks on FPGAs: A Survey and Future Directions

Author: Bouganis Christos-Savvas
Kouris Alexandros
Venieris Stylianos I.
Publication venue
Publication date: 19/02/2018
Field of study

In the past decade, Convolutional Neural Networks (CNNs) have demonstrated state-of-the-art performance in various Artificial Intelligence tasks. To accelerate the experimentation and development of CNNs, several software frameworks have been released, primarily targeting power-hungry CPUs and GPUs. In this context, reconfigurable hardware in the form of FPGAs constitutes a potential alternative platform that can be integrated in the existing deep learning ecosystem to provide a tunable balance between performance, power consumption and programmability. In this paper, a survey of the existing CNN-to-FPGA toolflows is presented, comprising a comparative study of their key characteristics which include the supported applications, architectural choices, design space exploration methods and achieved performance. Moreover, major challenges and objectives introduced by the latest trends in CNN algorithmic research are identified and presented. Finally, a uniform evaluation methodology is proposed, aiming at the comprehensive, complete and in-depth evaluation of CNN-to-FPGA toolflows.Comment: Accepted for publication at the ACM Computing Surveys (CSUR) journal, 201

arXiv.org e-Print Archive

Spiral - Imperial College Digital Repository

GNAQPMS v1.1: accelerating the Global Nested Air Quality Prediction Modeling System (GNAQPMS) on Intel Xeon Phi processors

Author
Publication venue: 'Copernicus GmbH'
Publication date
Field of study

Crossref

virtFlow: guest independent execution flow analysis across virtualized environments

Author: Dagenais Michel R.
Nemati Hani
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/07/2020
Field of study

An agent-less technique to understand virtual machines (VMs) behavior and their changes during the VM life-cycle is essential for many performance analysis and debugging tasks in the cloud environment. Because of privacy and security issues, ease of deployment and execution overhead, the method preferably limits its data collection to the physical host level, without internal access to the VMs. We propose a host-based, precise method to recover execution flow of virtualized environments, regardless of the level of virtualization. Given a VM, the Any-Level VM Detection Algorithm (ADA) and Nested VM State Detection (NSD) Algorithm compute its execution path along with the state of virtual CPUs (vCPUs) from the host kernel trace. The state of vCPUs is displayed in an interactive trace viewer (TraceCompass) for further inspection. Then, a new approach for profiling threads and processes inside the VMs is proposed. Our proposed VM trace analysis algorithms have been open-sourced for further enhancements and to the benefit of other developers. Our new techniques are being evaluated with workloads generated by different benchmarking tools. These approaches are based on host hypervisor tracing, which brings a lower overhead (around 1%) as compared to other approaches

PolyPublie

C-RAM: Breaking Mobile Device Memory Barriers Using the Cloud

Author: Pamboris Andreas
Pietzuch Peter
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 09/12/2015
Field of study

Mobile applications are constrained by the available memory of mobile devices. We present C-RAM, a system that uses cloud-based memory to extend the memory of mobile devices. It splits application state and its associated computation between a mobile device and a cloud node to allow applications to consume more memory, while minimising the performance impact. C-RAM thus enables developers to realise new applications or port legacy desktop applications with a large memory footprint to mobile platforms without explicitly designing them to account for memory limitations. To handle network failures with partitioned application state, C-RAM uses a new snapshot-based fault tolerance mechanism in which changes to remote memory objects are periodically backed up to the device. After failure, or when network usage exceeds a given limit, the device rolls back execution to continue from the last snapshot. C-RAM supports local execution with an application state that exceeds the available device memory through a user-level virtual memory: objects are loaded on-demand from snapshots in flash memory. Our C-RAM prototype supports Objective-C applications on the unmodified iOS platform. With C-RAM, applications can consume 10× more memory than the device capacity, with a negligible impact on application performance. In some cases, C-RAM even achieves a significant speed-up in execution time (up to 9.7×)

CLoK

Spiral - Imperial College Digital Repository

Adaptive heterogeneous parallelism for semi-empirical lattice dynamics in computational materials science.

Author: Garba Michael
Publication venue
Publication date: 30/04/2015
Field of study

With the variability in performance of the multitude of parallel environments available today, the conceptual overhead created by the need to anticipate runtime information to make design-time decisions has become overwhelming. Performance-critical applications and libraries carry implicit assumptions based on incidental metrics that are not portable to emerging computational platforms or even alternative contemporary architectures. Furthermore, the significance of runtime concerns such as makespan, energy efficiency and fault tolerance depends on the situational context. This thesis presents a case study in the application of both Mattsons prescriptive pattern-oriented approach and the more principled structured parallelism formalism to the computational simulation of inelastic neutron scattering spectra on hybrid CPU/GPU platforms. The original ad hoc implementation as well as new patternbased and structured implementations are evaluated for relative performance and scalability. Two new structural abstractions are introduced to facilitate adaptation by lazy optimisation and runtime feedback. A deferred-choice abstraction represents a unified space of alternative structural program variants, allowing static adaptation through model-specific exhaustive calibration with regards to the extrafunctional concerns of runtime, average instantaneous power and total energy usage. Instrumented queues serve as mechanism for structural composition and provide a representation of extrafunctional state that allows realisation of a market-based decentralised coordination heuristic for competitive resource allocation and the Lyapunov drift algorithm for cooperative scheduling

Open Access Institutional Repository at Robert Gordon University

Agentless robust load sharing strategy for utilising hetero-geneous resources over wide area network

Author: Natthakrit Sanguandikul
Publication venue: Maejo University
Publication date: 01/06/2011
Field of study

Resource monitoring and performance prediction services have always been regarded as important keys to improving the performance of load sharing strategy. However, the traditional methodologies usually require specific performance information, which can only be collected by installing proprietary agents on all participating resources. This requirement of implementing a single unified monitoring service may not be feasible because of the differences in the underlying systems and organisation policies. To address this problem, we define a new load sharing strategy which bases the load decision on a simple performance estimation that can be measured easily at the coordinator node. Our proposed strategy relies on a stage-based dynamic task allocation to handle the imprecision of our performance estimation and to correct load distribution on-the-fly. The simulation results showed that the performance of our strategy is comparable or better than traditional strategies, especially when the performance information from the monitoring service is not accurate

Directory of Open Access Journals

Data Replication and Its Alignment with Fault Management in the Cloud Environment

Author: Xie Fei
Publication venue: School of Computing and Information Technology
Publication date: 01/01/2021
Field of study

Nowadays, the exponential data growth becomes one of the major challenges all over the world. It may cause a series of negative impacts such as network overloading, high system complexity, and inadequate data security, etc. Cloud computing is developed to construct a novel paradigm to alleviate massive data processing challenges with its on-demand services and distributed architecture. Data replication has been proposed to strategically distribute the data access load to multiple cloud data centres by creating multiple data copies at multiple cloud data centres. A replica-applied cloud environment not only achieves a decrease in response time, an increase in data availability, and more balanced resource load but also protects the cloud environment against the upcoming faults. The reactive fault tolerance strategy is also required to handle the faults when the faults already occurred. As a result, the data replication strategies should be aligned with the reactive fault tolerance strategies to achieve a complete management chain in the cloud environment. In this thesis, a data replication and fault management framework is proposed to establish a decentralised overarching management to the cloud environment. Three data replication strategies are firstly proposed based on this framework. A replica creation strategy is proposed to reduce the total cost by jointly considering the data dependency and the access frequency in the replica creation decision making process. Besides, a cloud map oriented and cost efficiency driven replica creation strategy is proposed to achieve the optimal cost reduction per replica in the cloud environment. The local data relationship and the remote data relationship are further analysed by creating two novel data dependency types, Within-DataCentre Data Dependency and Between-DataCentre Data Dependency, according to the data location. Furthermore, a network performance based replica selection strategy is proposed to avoid potential network overloading problems and to increase the number of concurrent-running instances at the same time

Research Online

Sharing interoperable workflow provenance: A review of best practices and their practical application in CWLProv

Author: Crusoe Michael R.
Goble Carole
Khan Farah Zaib
Lonie Andrew
Sinnott Richard O.
Soiland-Reyes Stian
Publication venue
Publication date: 04/12/2018
Field of study

Background: The automation of data analysis in the form of scientific workflows has become a widely adopted practice in many fields of research. Computationally driven data-intensive experiments using workflows enable Automation, Scaling, Adaption and Provenance support (ASAP). However, there are still several challenges associated with the effective sharing, publication and reproducibility of such workflows due to the incomplete capture of provenance and lack of interoperability between different technical (software) platforms. Results: Based on best practice recommendations identified from literature on workflow design, sharing and publishing, we define a hierarchical provenance framework to achieve uniformity in the provenance and support comprehensive and fully re-executable workflows equipped with domain-specific information. To realise this framework, we present CWLProv, a standard-based format to represent any workflow-based computational analysis to produce workflow output artefacts that satisfy the various levels of provenance. We utilise open source community-driven standards; interoperable workflow definitions in Common Workflow Language (CWL), structured provenance representation using the W3C PROV model, and resource aggregation and sharing as workflow-centric Research Objects (RO) generated along with the final outputs of a given workflow enactment. We demonstrate the utility of this approach through a practical implementation of CWLProv and evaluation using real-life genomic workflows developed by independent groups. Conclusions: The underlying principles of the standards utilised by CWLProv enable semantically-rich and executable Research Objects that capture computational workflows with retrospective provenance such that any platform supporting CWL will be able to understand the analysis, re-use the methods for partial re-runs, or reproduce the analysis to validate the published findings.Submitted to GigaScience (GIGA-D-18-00483

ZENODO

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

The University of Manchester - Institutional Repository

Recommended from our members

Improving shared access to Cloud of Things resources.

Author: Al Rawahi AS
Publication venue
Publication date: 01/10/2019
Field of study

Cloud of Things (CoT) is an emerging paradigm that integrates Cloud Computing and Internet of Things (IoT) to support a wide range of real-world applications. Resource allocation plays a vital role in CoT, especially when allocating IoT physical resources to Cloud-based applications to ensure seamless application execution. Due to the heterogeneity and the constrained capacities of IoT resources, resource allocation is a challenge. This complexity leads to missing/limiting shared access to the IoT physical resources and consequently lessen the reusability of the resources across multiple applications. This issue results in, 1) replicating IoT deployments making them expensive and not feasible for many prospective users, 2) existing IoT infrastructures are over-provisioned to meet the unpredictable application requirements in which resources may be signiﬁcantly underutilised, and 3) the adoption of CoT is slowed. Improving shared access to CoT resources can provide eﬃcient resource allocation, improve resource utilisation and likely to reduce the cost of IoT deployments. Existing solutions include small-scale, hardware and platform-dependent mechanisms to enable or improve shared access to IoT resources. The research presented in this thesis considers trading CoT resources in a marketplace as an approach to improve shared access to CoT resources. It proposes a solution to Cot resource allocation that re-imagines CoT resources as commodities that can be provided and consumed by the marketplace participants. The novel contributions of the research presented in this thesis are summarised as follows: 1) a model to describe and quantify the value of CoT resources, 2) a resource sharing and allocation strategy called Exclusive Shared Access (ESA) to CoT resources, 3) a QoS-aware optimisation model for trading CoT resources as a single and multipleobjective optimisation problem, and 4) a marketplace architecture and experimental evaluation to verify its performance and scalability

Nottingham Trent Institutional Repository (IRep)