159 research outputs found
Median architecture by accumulative parallel counters
The time to process each of the W/B processing blocks of a median calculation method on a set of N W-bit integers is improved here by a factor of three compared to the literature. Parallelism uncovered in blocks containing B-bit slices is exploited by independent accumulative parallel counters so that the median is calculated faster than with any previously known method for any N and W. The improvements to the method are discussed in the context of calculating the median of a moving set of N integers, for which a pipelined architecture is developed. An extra benefit of smaller area for the architecture is also reported
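As a hedged illustration of the underlying selection principle, the sketch below implements the bit-serial special case (B = 1) in Python: each pass over one bit position reduces to a population count, which is the role the accumulative parallel counters play per slice in hardware. It is a radix-style selection sketch under those assumptions, not the block-parallel, pipelined architecture itself.

```python
def bitwise_median(values, width):
    """k-th smallest (k = N//2) of unsigned W-bit integers via radix selection.

    Minimal software sketch: one population count per bit position stands in
    for the per-slice counting done by accumulative parallel counters; the
    B-bit block parallelism and pipelining of the architecture are not modelled.
    """
    k = len(values) // 2                  # rank of the (upper) median
    candidates = list(values)
    median = 0
    for bit in range(width - 1, -1, -1):
        zeros = [v for v in candidates if not (v >> bit) & 1]
        if k < len(zeros):                # the median has a 0 in this position
            candidates = zeros
        else:                             # the median has a 1 in this position
            median |= 1 << bit
            k -= len(zeros)
            candidates = [v for v in candidates if (v >> bit) & 1]
    return median

# e.g. bitwise_median([5, 3, 9, 1, 7], width=4) -> 5
```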
A job response time prediction method for production Grid computing environments
A major obstacle to the widespread adoption of Grid Computing in both the scientific
community and the industry sector is the difficulty of knowing in advance the running cost of a
job submission, which is needed to plan a correct allocation of resources.
Traditional distributed computing solutions take advantage of homogeneous and open
environments to propose prediction methods that use a detailed analysis of the hardware and
software components. However, production Grid computing environments, which are large and
use a complex and dynamic set of resources, present a different challenge. In Grid computing
the source code of applications, programme libraries, and third-party software are not always
available. In addition, Grid security policies may not permit running hardware or software analysis
tools to generate models of Grid components.
The objective of this research is the prediction of a job response time in production Grid
computing environments. The solution is inspired by the concept of predicting future Grid
behaviours based on previous experiences learned from heterogeneous Grid workload trace
data. The research objective was selected with the aim of improving the Grid resource usability
and the administration of Grid environments. The predicted data can be used to allocate
resources in advance and to report forecast finishing times and running costs before submission.
The proposed Grid Computing Response Time Prediction (GRTP) method implements
several internal stages where the workload traces are mined to produce a response time
prediction for a given job. In addition, the GRTP method assesses the predicted result against
the actual response time of the target job to infer information that is used to tune the method's
setting parameters.
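As a rough, hedged sketch of what such a trace-based predictor can look like (the actual GRTP stages and setting parameters are not reproduced here, and the record fields are assumptions), a new job's response time can be estimated from similar jobs in the workload trace and the prediction later assessed against the observed outcome:

```python
from statistics import median

def predict_response_time(job, history, k=25):
    """Estimate a job's response time from similar jobs in a workload trace.

    Hedged sketch only: trace records are assumed to be dicts with
    hypothetical fields 'user', 'queue', 'req_cpus' and 'response_time'.
    """
    # keep the k most recent trace records that resemble the submitted job
    similar = [r for r in reversed(history)
               if r["user"] == job["user"]
               and r["queue"] == job["queue"]
               and abs(r["req_cpus"] - job["req_cpus"]) <= 2][:k]
    if not similar:                 # no similar history: fall back to recent jobs
        similar = history[-k:]
    return median(r["response_time"] for r in similar)

def assess(predicted, actual, tolerance=0.10):
    """Compare a prediction with the observed response time; the outcome could
    be fed back to tune the predictor's parameters (e.g. k or tolerance)."""
    return abs(predicted - actual) <= tolerance * actual
```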
The GRTP method was implemented and tested using a cross-validation technique to assess
how the proposed solution generalises to independent data sets. The training set was taken from
the Grid environment DAS (Distributed ASCI Supercomputer). The two testing sets were taken
from AuverGrid and Grid5000 Grid environments
Three consecutive tests were carried out: assuming stable jobs, assuming unstable jobs, and using a
job-type method to select the most appropriate prediction function. The tests showed a significant
increase in prediction performance for data-mining-based methods applied in Grid computing
environments. For instance, in Grid5000 the GRTP method answered 77 percent of job
prediction requests with an error of less than 10 percent, while in the same environment the most effective and accurate existing method using workload traces was only able to predict 32 percent of
the cases within the same range of error.
The GRTP method was able to handle unexpected changes in resources and services which
affect the job response time trends and was able to adapt to new scenarios. The tests showed
that the proposed GRTP method is capable of predicting job response time requests and it also
improves the prediction quality when compared to other current solutions
Stochastic computing system hardware design for convolutional neural networks optimized for accuracy area and energy efficiency
Stochastic computing (SC) is an alternative computing paradigm that can lead to designs with lower area and power consumption compared to conventional binary-encoded (BE) deterministic computing. In SC, numbers are encoded as bit-streams of '0's and '1's, where SC computation elements (or functions) operate on one or more bit-streams. To obtain accurate results, some functions require the bit-streams to be correlated, while others require uncorrelated bit-streams or a combination of both. The relationship between SC function accuracy and correlation is not well studied in previous works; thus, managing correlation across the SC system is a key challenge in the effort to achieve optimum accuracy. In addition, to perform SC computation, the input values are converted from the BE domain to SC and, on completion of the computation, back to BE to obtain the results. The conversion processes require circuitry that typically consumes over 80% of the overall SC system area, which is another key challenge. To address these challenges, this thesis proposes a framework for end-to-end system design optimized for accuracy and area. The framework provides guidelines for designing an effective SC function or system that exploits correlation. This framework is applied in designing the SC functional units and the complete SC system for a convolutional neural network (CNN), the dominant approach in the implementation of recognition systems. This thesis shows that although CNN is a compute-intensive and resource-demanding algorithm, through the proposed SC design framework it is possible to implement a CNN in an embedded system with a limited area and power budget. Several novel SC-based functions are proposed that outperform previous works, obtaining significant area savings and high accuracy to replace the equivalent BE functions. These functions include the inner product, max pooling, ReLU activation function, and average pooling. Then, some training considerations are specified to enable low error rates for the SC-based CNN. Experimental results show that the SC-based CNN attained no or minor accuracy degradation compared to its BE counterpart. The SC-based CNN achieves 99.6% and 96.25% classification accuracy on the MNIST digit classification and AT&T face recognition datasets, respectively. Moreover, the SC-based CNN of the ResNet-20 model achieves 86.5% classification accuracy on the CIFAR-10 object dataset. To rapidly map an SC system onto an FPGA, a generic design strategy for high-level synthesis of SC computation engines is proposed. The SC-based CNN hardware on FPGA obtains the lowest resource utilization compared to previous works on FPGA-based CNN accelerators. In addition, the proposed hardware architecture achieves 277.46 GOP/s/W energy efficiency, which outperforms previous works
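The role of correlation can be seen in a small, hedged Python sketch (unipolar encoding assumed, not taken from the thesis): with uncorrelated streams a single AND gate approximates multiplication, whereas with maximally correlated streams the same gate computes a minimum, which is why correlation must be managed per function.

```python
import random

def encode(value, length, rng):
    """Encode a probability in [0, 1] as a unipolar stochastic bit-stream."""
    return [1 if rng.random() < value else 0 for _ in range(length)]

def decode(stream):
    """Convert a bit-stream back to the binary-encoded (BE) domain."""
    return sum(stream) / len(stream)

n, a, b = 4096, 0.75, 0.50

# Uncorrelated streams: bitwise AND approximates the product a * b.
sa = encode(a, n, random.Random(1))
sb = encode(b, n, random.Random(2))
print(decode([x & y for x, y in zip(sa, sb)]))      # ~0.375

# Maximally correlated streams (shared random source): the same AND gate
# now approximates min(a, b) instead of the product.
rng = random.Random(3)
shared = [rng.random() for _ in range(n)]
sa_c = [1 if r < a else 0 for r in shared]
sb_c = [1 if r < b else 0 for r in shared]
print(decode([x & y for x, y in zip(sa_c, sb_c)]))  # ~0.50 = min(a, b)
```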
Complexity-reduced hardware-based track-trigger for CMS upgrade
This thesis was submitted for the award of Doctor of Philosophy and was awarded by Brunel University London. The Compact Muon Solenoid (CMS) detector at the Large Hadron Collider (LHC)
is designed to study the results of proton-proton collisions. The Tracker
sub-detector is designed to detect and reconstruct the trajectories of charged
particles produced by the collisions. During the lifetime of the CMS detector,
there have been several upgrades aimed at increasing the chance of discovering
new physics through increased luminosity levels and instrumentation of
advanced technology. The High-Luminosity upgrade optimises the LHC to
accelerate high-energy particles with an average of 200 proton-proton
interactions per bunch crossing. The Level-1 Trigger system promptly analyses
and filters collisions using hardware to reduce the data volume in real-time. For
the upgrade, the trigger mechanism will use a particle trajectory estimator that
discriminates between particles based on their transverse momentum (pT).
Particles with pT ≥ 2 GeV/c will be transmitted to the Level-1 Track-Trigger
system for trajectory reconstruction within a fixed 3 μs latency. This thesis
presents a novel Hardware-based Multivariate Linear Fitter (MVLF) system
focusing on robustness in tracking efficiency and reduction in logic resource
usage within the specified latency. The system components are implemented in
Field Programmable Gate Arrays (FPGA), targeting 16 nm FinFET UltraScale+
silicon technology. The development was performed using the High-Level
Synthesis (HLS) automation tools and the Hardware acceleration platform for
Application-Specific Integrated Circuits (ASIC). A firmware demonstrator has
been assembled to verify the feasibility and compatibility of the scaled system
with the CMS Level-1 Track-Trigger infrastructure. The system's performance is
compared to past and current system developments, and the results are
presented accordingly
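As a hedged, simplified illustration of the kind of linear fit involved (not the MVLF firmware, and using assumed toy numbers), a straight-line least-squares fit of stub coordinates in the r-phi plane yields the track curvature and hence an estimate of pT via pT ≈ 0.3·B·R:

```python
import numpy as np

def fit_track(r, phi, b_field=3.8):
    """Least-squares straight-line fit of stubs in the r-phi plane.

    Simplified sketch: in the small-angle approximation phi ≈ phi0 - r / (2R),
    so the fitted slope gives the bend radius R and pT ≈ 0.3 * B * R
    (pT in GeV/c, B in tesla, R in metres).
    """
    design = np.column_stack([np.ones_like(r), r])   # columns: [1, r]
    (phi0, slope), *_ = np.linalg.lstsq(design, phi, rcond=None)
    radius = 1.0 / (2.0 * abs(slope))                # bend radius in metres
    return phi0, 0.3 * b_field * radius              # (phi0, pT estimate)

# Toy stubs for a ~3 GeV/c track (radii in metres, no measurement noise)
r = np.array([0.25, 0.35, 0.50, 0.68, 0.88, 1.08])
phi = 1.2 - r / (2 * (3.0 / (0.3 * 3.8)))
print(fit_track(r, phi))                              # ~(1.2, 3.0)
```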
A COMPUTATION METHOD/FRAMEWORK FOR HIGH LEVEL VIDEO CONTENT ANALYSIS AND SEGMENTATION USING AFFECTIVE LEVEL INFORMATION
Video segmentation facilitates efficient video indexing and navigation in large digital video archives. It is an important process in a content-based video indexing and retrieval (CBVIR) system. Many automated solutions performed segmentation by utilizing information about the "facts" of the video. These "facts" come in the form of labels that describe the objects which are captured by the camera. This type of solution was able to achieve good and consistent results for some video genres such as news programs and informational presentations. The content format of this type of video is generally quite standard, and automated solutions were designed to follow these format rules. For example, in [1] the presence of news anchor persons was used as a cue to determine the start and end of a meaningful news segment.
The same cannot be said for video genres such as movies and feature films.
This is because the makers of this type of video utilize different filming techniques to design their videos in order to elicit certain affective responses from their target audience. Humans usually perform manual video segmentation by trying to relate
changes in time and locale to discontinuities in meaning [2]. As a result, viewers
usually have doubts about the boundary locations of a meaningful video segment
due to their different affective responses.
This thesis presents an entirely new view to the problem of high level video
segmentation. We developed a novel probabilistic method for affective level video
content analysis and segmentation. Our method had two stages. In the first stage,
affective content labels were assigned to video shots by means of a dynamic Bayesian network (DBN). A novel hierarchical-coupled dynamic Bayesian network (HCDBN) topology was proposed for this stage. The topology was based on the pleasure-arousal-dominance (P-A-D) model of affect representation [3]. In principle, this model can represent a large number of emotions. In the second stage, the visual, audio and affective information of the video was used to compute a statistical feature vector to represent the content of each shot. Affective level video segmentation was achieved by applying spectral clustering to the feature vectors.
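A generic, hedged sketch of that second stage (using scikit-learn as a stand-in, with assumed parameters rather than the thesis's exact pipeline) clusters one statistical feature vector per shot while biasing the affinity towards temporally close shots so that segments stay contiguous:

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def segment_shots(features, n_segments, sigma=1.0, tau=5.0):
    """Group shots into affective segments by spectral clustering.

    'features' is an (n_shots, d) array, one statistical feature vector per
    shot (visual, audio and affective statistics); sigma and tau are assumed
    bandwidths for feature similarity and temporal proximity respectively.
    """
    idx = np.arange(len(features))
    feat_dist = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=-1)
    time_dist = np.abs(idx[:, None] - idx[None, :])
    # similar shots that are also close in time get the highest affinity
    affinity = np.exp(-feat_dist**2 / (2 * sigma**2)) * np.exp(-time_dist / tau)
    return SpectralClustering(n_clusters=n_segments,
                              affinity="precomputed").fit_predict(affinity)
```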
We evaluated the first stage of our proposal by comparing its emotion detection ability with all the existing works which are related to the field of affective video
content analysis. To evaluate the second stage, we used the time adaptive clustering
(TAC) algorithm as our performance benchmark. The TAC algorithm was the best
high level video segmentation method [2]. However, it is a very computationally
intensive algorithm. To accelerate its computation speed, we developed a modified
TAC (modTAC) algorithm which was designed to be mapped easily onto a field
programmable gate array (FPGA) device. Both the TAC and modTAC algorithms
were used as performance benchmarks for our proposed method.
Since affective video content is a perceptual concept, the segmentation performance and human agreement rates were used as our evaluation criteria. To obtain
our ground truth data and viewer agreement rates, a pilot panel study which was
based on the work of Gross et al. [4] was conducted. Experiment results will show
the feasibility of our proposed method. For the first stage of our proposal, our
experiment results will show that an average improvement of as high as 38% was
achieved over previous works. As for the second stage, an improvement of as high
as 37% was achieved over the TAC algorithm
A Scalable Approach to Modeling on Accelerated Neuromorphic Hardware.
Neuromorphic systems open up opportunities to enlarge the explorative space for computational research. However, it is often challenging to unite efficiency and usability. This work presents the software aspects of this endeavor for the BrainScaleS-2 system, a hybrid accelerated neuromorphic hardware architecture based on physical modeling. We introduce key aspects of the BrainScaleS-2 Operating System: experiment workflow, API layering, software design, and platform operation. We present use cases to discuss and derive requirements for the software and showcase the implementation. The focus lies on novel system and software features such as multi-compartmental neurons, fast re-configuration for hardware-in-the-loop training, applications for the embedded processors, the non-spiking operation mode, interactive platform access, and sustainable hardware/software co-development. Finally, we discuss further developments in terms of hardware scale-up, system usability, and efficiency
Datacenter Traffic Control: Understanding Techniques and Trade-offs
Datacenters provide cost-effective and flexible access to scalable compute
and storage resources necessary for today's cloud computing needs. A typical
datacenter is made up of thousands of servers connected with a large network
and usually managed by one operator. To provide quality access to the variety
of applications and services hosted on datacenters and maximize performance, it
is necessary to use datacenter networks effectively and efficiently.
Datacenter traffic is often a mix of several classes with different priorities
and requirements. This includes user-generated interactive traffic, traffic
with deadlines, and long-running traffic. To this end, custom transport
protocols and traffic management techniques have been developed to improve
datacenter network performance.
In this tutorial paper, we review the general architecture of datacenter
networks, various topologies proposed for them, their traffic properties,
general traffic control challenges in datacenters and general traffic control
objectives. The purpose of this paper is to bring out the important
characteristics of traffic control in datacenters and not to survey all
existing solutions (as it is virtually impossible due to the massive body of
existing research). We hope to provide readers with a wide range of options and
factors while considering a variety of traffic control mechanisms. We discuss
various characteristics of datacenter traffic control including management
schemes, transmission control, traffic shaping, prioritization, load balancing,
multipathing, and traffic scheduling. Next, we point to several open challenges
as well as new and interesting networking paradigms. At the end of this paper,
we briefly review inter-datacenter networks that connect geographically
dispersed datacenters which have been receiving increasing attention recently
and pose interesting and novel research problems.
Horus: Interference-Aware and Prediction-Based Scheduling in Deep Learning Systems
To accelerate the training of Deep Learning (DL) models, clusters of machines equipped with hardware accelerators such as GPUs are leveraged to reduce execution time. State-of-the-art resource managers are needed to increase GPU utilization and maximize throughput. While co-locating DL jobs on the same GPU has been shown to be effective, this can incur interference causing slowdown. In this article we propose Horus: an interference-aware and prediction-based resource manager for DL systems. Horus proactively predicts GPU utilization of heterogeneous DL jobs extrapolated from the DL model's computation graph features, removing the need for online profiling and isolated reserved GPUs. Through micro-benchmarks and job co-location combinations across heterogeneous GPU hardware, we identify GPU utilization as a general proxy metric to determine good placement decisions, in contrast to current approaches which reserve isolated GPUs to perform online profiling and directly measure GPU utilization for each unique submitted job. Our approach promotes high resource utilization and makespan reduction; via real-world experimentation and large-scale trace-driven simulation, we demonstrate that Horus outperforms other DL resource managers by up to 61.5 percent for GPU resource utilization, 23.7–30.7 percent for makespan reduction and 68.3 percent in job wait time reduction
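A minimal, hedged sketch of the placement idea (assumed inputs and threshold, not Horus itself): given a job's predicted GPU utilization, a scheduler can co-locate it only on GPUs whose combined predicted utilization stays under a cap, and otherwise queue it.

```python
def place_job(predicted_util, gpus, cap=100.0):
    """Choose a GPU for a DL job using predicted utilization as a proxy for
    interference.

    Hedged sketch: 'predicted_util' is the job's predicted GPU utilization in
    percent (e.g. derived offline from computation-graph features) and 'gpus'
    maps GPU ids to the summed predicted utilization of jobs already placed.
    """
    feasible = {g: u + predicted_util for g, u in gpus.items()
                if u + predicted_util <= cap}
    if not feasible:
        return None                         # queue the job rather than overload a GPU
    best = min(feasible, key=feasible.get)  # least-loaded feasible GPU
    gpus[best] = feasible[best]
    return best

# Example: a job predicted to use 35% of a GPU
gpus = {"gpu0": 80.0, "gpu1": 45.0, "gpu2": 10.0}
print(place_job(35.0, gpus))                # -> "gpu2"
```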