82 research outputs found

    Using AVX2 Instruction Set to Increase Performance of High Performance Computing Code

    In this paper we discuss the new Intel instruction set extensions, Intel Advanced Vector Extensions 2 (AVX2), and what they bring to high performance computing (HPC). To illustrate this, systems supporting AVX2 are evaluated to demonstrate how to exploit AVX2 effectively for typical HPC codes and to expose situations in which AVX2 might not be the most effective way to increase performance
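    The workflow the paper describes, widening loops to use the 256-bit AVX2 registers, can be illustrated with a short intrinsics kernel. The following is a minimal sketch, not taken from the paper: an element-wise 32-bit integer addition that processes eight lanes per iteration, with a scalar tail loop for the remainder. The function and array names are illustrative, and the code assumes a compiler flag such as -mavx2.

        /* Minimal sketch (not from the paper): element-wise 32-bit integer
         * addition using AVX2 intrinsics, eight lanes per iteration.
         * Compile with e.g. gcc -O2 -mavx2. */
        #include <immintrin.h>
        #include <stddef.h>
        #include <stdint.h>

        void add_i32_avx2(const int32_t *a, const int32_t *b, int32_t *c, size_t n)
        {
            size_t i = 0;
            for (; i + 8 <= n; i += 8) {
                __m256i va = _mm256_loadu_si256((const __m256i *)(a + i));
                __m256i vb = _mm256_loadu_si256((const __m256i *)(b + i));
                __m256i vc = _mm256_add_epi32(va, vb);       /* 8 additions at once */
                _mm256_storeu_si256((__m256i *)(c + i), vc);
            }
            for (; i < n; ++i)      /* scalar tail for lengths not divisible by 8 */
                c[i] = a[i] + b[i];
        }

    In a memory-bound kernel like this one the gain from wider registers can be limited by bandwidth rather than arithmetic throughput, one plausible example of the situations the abstract mentions where AVX2 may not be the most effective route.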

    KfK-SUPRENUM Seminar, 19-20 October 1989: Conference Report (Tagungsbericht)


    Memristive crossbars as hardware accelerators: modelling, design and new uses

    Digital electronics has given rise to reliable, affordable, and scalable computing devices. However, new computing paradigms present challenges. For example, machine learning requires repeatedly processing large amounts of data; this creates a bottleneck in conventional computers, where computing and memory are separated. To add to that, Moore’s “law” is plateauing and is thus unlikely to address the increasing demand for computational power. In-memory computing, and specifically hardware accelerators for linear algebra, may address both of these issues. Memristive crossbar arrays are a promising candidate for such hardware accelerators. Memristive devices are fast, energy-efficient, and—when arranged in a crossbar structure—can compute vector-matrix products. Unfortunately, they come with their own set of limitations. The analogue nature of these devices makes them stochastic and thus less reliable compared to digital devices. It does not, however, necessarily make them unsuitable for computing. Nevertheless, successful deployment of analogue hardware accelerators requires a proper understanding of their drawbacks, ways of mitigating the effects of undesired physical behaviour, and applications where some degree of stochasticity is tolerable. In this thesis, I investigate the effects of nonidealities in memristive crossbar arrays, introduce techniques of minimising those negative effects, and present novel crossbar circuit designs for new applications. I mostly focus on physical implementations of neural networks and investigate the influence of device nonidealities on classification accuracy. To make memristive neural networks more reliable, I explore committee machines, rearrangement of crossbar lines, nonideality-aware training, and other techniques. I find that they all may contribute to the higher accuracy of physically implemented neural networks, often comparable to the accuracy of their digital counterparts. Finally, I introduce circuits that extend dot product computations to higher-rank arrays, different linear algebra operations, and quaternion vectors and matrices. These present opportunities for using crossbar arrays in new ways, including the processing of coloured images
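    The core operation the thesis builds on, a crossbar computing a vector-matrix product, can be stated in a few lines. The sketch below is an idealised model, not the thesis code: input voltages V drive the word lines, device conductances G encode the matrix, and by Ohm's and Kirchhoff's laws the current collected on bit line j is I[j] = sum_i V[i] * G[i][j]. Names and sizes are illustrative, and real devices add the nonidealities the thesis studies.

        /* Idealised crossbar model (illustrative, not the thesis code):
         * currents on the bit lines form the vector-matrix product V * G. */
        #include <stdio.h>

        #define ROWS 3   /* word lines (input voltages)  */
        #define COLS 2   /* bit lines (output currents)  */

        void crossbar_vmp(const double V[ROWS], const double G[ROWS][COLS],
                          double I[COLS])
        {
            for (int j = 0; j < COLS; ++j) {
                I[j] = 0.0;
                for (int i = 0; i < ROWS; ++i)
                    I[j] += V[i] * G[i][j];   /* each device contributes V_i * G_ij */
            }
        }

        int main(void)
        {
            const double V[ROWS] = {0.1, 0.2, 0.3};            /* volts   */
            const double G[ROWS][COLS] = {{1e-3, 2e-3},
                                          {3e-3, 4e-3},
                                          {5e-3, 6e-3}};       /* siemens */
            double I[COLS];
            crossbar_vmp(V, G, I);
            printf("I = [%g, %g] A\n", I[0], I[1]);
            return 0;
        }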

    Medical image reconstruction/processing with GPU in tomosynthesis

    Dissertation submitted for the degree of Master in Biomedical Engineering. Digital Breast Tomosynthesis (DBT) is a recent three-dimensional medical imaging technique based on digital mammography that allows better observation of overlapping tissues, especially in dense breasts. The technique consists of acquiring multiple images (slices) of the volume to be reconstructed, thereby allowing a more effective diagnosis, since the various tissues are not superimposed as they are in a 2D image. The image reconstruction algorithms used in DBT are quite similar to those used in Computed Tomography (CT). There are two classes of image reconstruction algorithms: analytical and iterative. In this work, two iterative reconstruction algorithms were implemented: Maximum Likelihood - Expectation Maximization (ML-EM) and Ordered Subsets - Expectation Maximization (OS-EM). Iterative algorithms give better results but are computationally very demanding, which is why analytical algorithms have been preferred in clinical practice. With recent advances in computing hardware, the time needed to reconstruct an image with an iterative algorithm can now be reduced considerably. The algorithms were implemented using General-Purpose computing on Graphics Processing Units (GPGPU). This technique uses a graphics card (GPU - Graphics Processing Unit) to process tasks usually assigned to a computer's processor (CPU - Central Processing Unit), rather than the graphics workloads GPUs are usually associated with. For this project an NVIDIA® GPU was used, with the Compute Unified Device Architecture (CUDA™) used to code the reconstruction algorithms. The results showed that implementing the algorithms on the GPU reduced the reconstruction time by approximately 6.2 times relative to the time obtained on the CPU. Regarding image quality, the GPU achieved a level of detail similar to the CPU images, with only minor differences
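    To make the iterative reconstruction concrete, the sketch below shows one ML-EM update on the CPU in plain C. It is a hedged sketch, not the dissertation's code: the dissertation runs this kind of update on the GPU with CUDA, and the dense system matrix, array names and sizes here are illustrative assumptions.

        /* One ML-EM update (illustrative sketch, not the dissertation code):
         *   x_j <- x_j / (sum_i A_ij) * sum_i A_ij * y_i / (A x)_i
         * A is the system matrix (n_det x n_vox, row-major), y the measured
         * projections and x the current image estimate. */
        #include <stdlib.h>

        void mlem_update(const double *A, const double *y, double *x,
                         size_t n_det, size_t n_vox)
        {
            double *fwd = malloc(n_det * sizeof *fwd);     /* forward projection A*x */
            for (size_t i = 0; i < n_det; ++i) {
                fwd[i] = 0.0;
                for (size_t j = 0; j < n_vox; ++j)
                    fwd[i] += A[i * n_vox + j] * x[j];
            }
            for (size_t j = 0; j < n_vox; ++j) {
                double num = 0.0, norm = 0.0;
                for (size_t i = 0; i < n_det; ++i) {
                    double a = A[i * n_vox + j];
                    norm += a;                             /* voxel sensitivity      */
                    if (fwd[i] > 0.0)
                        num += a * y[i] / fwd[i];          /* back-project the ratio */
                }
                if (norm > 0.0)
                    x[j] *= num / norm;                    /* multiplicative update  */
            }
            free(fwd);
        }

    OS-EM applies the same multiplicative update but cycles through ordered subsets of the projections within each iteration, which is why it converges in fewer passes over the data; the per-detector and per-voxel loops are independent, which is what makes the mapping to a GPU effective.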

    Commodity clusters: performance comparison between PCs and workstations

    Workstation clusters were originally developed as a way to leverage the better cost basis of UNIX workstations to perform computations previously handled only by relatively more expensive supercomputers. Commodity workstation clusters take this evolutionary process one step further by replacing equivalent proprietary workstation functionality with less expensive PC technology. As PC technology encroaches on proprietary UNIX workstation vendor markets, these vendors will see a declining share of the overall market. As technology advances continue, the ability to upgrade a workstation's performance plays a large role in cost analysis. For example, a major upgrade to a typical UNIX workstation means replacing the whole machine. As major revisions to the UNIX vendor's product line come out, brand new systems are introduced. IBM compatibles, however, are modular by design, and nothing needs to be replaced except the components that are truly improved. The DAISy cluster, for example, is about to undergo a major upgrade from 90MHz Pentiums to 200MHz Pentium Pros. All of the memory -- the system's largest expense -- as well as the disks, power supply, etc., can be reused. As a result, commodity workstation clusters ought to gain an increasingly large share of the distributed computing market

    NASA Tech Briefs, February 1991

    Topics: New Product Ideas; NASA TU Services; Electronic Components and Circuits; Electronic Systems; Physical Sciences; Materials; Computer Programs; Mechanics; Machinery; Fabrication Technology; Mathematics and Information Sciences; Life Sciences

    A comparison of statistical machine learning methods in heartbeat detection and classification

    In health care, patients with heart problems require quick responsiveness in a clinical setting or in the operating theatre. Towards that end, automated classification of heartbeats is vital, as some heartbeat irregularities are time consuming to detect. Therefore, analysis of electrocardiogram (ECG) signals is an active area of research. The methods proposed in the literature depend on the structure of a heartbeat cycle. In this paper, we use interval- and amplitude-based features together with a few samples from the ECG signal as a feature vector. We studied a variety of classification algorithms, focusing especially on a type of arrhythmia known as the ventricular ectopic beat (VEB). We compare the performance of the classifiers against algorithms proposed in the literature and make recommendations regarding features, sampling rate, and choice of the classifier to apply in a real-time clinical setting. The extensive study is based on the MIT-BIH arrhythmia database. Our main contributions are the evaluation of existing classifiers over a range of sampling rates, the recommendation of a detection methodology to employ in a practical setting, and an extension of the notion of a mixture of experts to a larger class of algorithms
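    As an illustration of the kind of feature vector described above (interval and amplitude features plus a few raw samples), the sketch below assembles one per-beat record in C. It is a hedged sketch under assumed conventions, not the paper's code: the struct fields, window length and sampling-rate handling are illustrative choices, and the R-peak indices are assumed to come from a separate detector.

        /* Illustrative per-beat feature vector (not the paper's code):
         * RR intervals, R-peak amplitude and a short raw-sample window.
         * k must index an interior beat (k >= 1 and k + 1 < number of peaks)
         * and the sample window must stay inside the signal. */
        #include <stddef.h>

        #define N_SAMPLES 8                 /* raw samples kept around the R peak */

        typedef struct {
            double pre_rr;                  /* interval to previous beat (s)      */
            double post_rr;                 /* interval to next beat (s)          */
            double r_amplitude;             /* signal value at the R peak         */
            double samples[N_SAMPLES];      /* raw window centred on the peak     */
        } beat_features;

        /* ecg: signal samples, fs: sampling rate in Hz, r_peaks: detected R-peak
         * indices, k: index of the beat to describe. */
        beat_features extract_beat(const double *ecg, double fs,
                                   const size_t *r_peaks, size_t k)
        {
            beat_features f;
            size_t r = r_peaks[k];
            f.pre_rr  = (r - r_peaks[k - 1]) / fs;
            f.post_rr = (r_peaks[k + 1] - r) / fs;
            f.r_amplitude = ecg[r];
            for (int i = 0; i < N_SAMPLES; ++i)
                f.samples[i] = ecg[r - N_SAMPLES / 2 + i];
            return f;
        }

    Each such vector would then be passed to whichever classifier is being compared.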

    Design and Modelling of Small Scale Low Temperature Power Cycles
