82 research outputs found

    Using AVX2 Instruction Set to Increase Performance of High Performance Computing Code

    In this paper we discuss the new Intel instruction set extensions, Intel Advanced Vector Extensions 2 (AVX2), and what they bring to high performance computing (HPC). To illustrate this, systems supporting AVX2 are evaluated to demonstrate how to exploit AVX2 effectively for typical HPC codes and to expose situations in which AVX2 might not be the most effective way to increase performance
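    The workflow the paper describes, widening loops to use the 256-bit AVX2 registers, can be illustrated with a short intrinsics kernel. The following is a minimal sketch, not taken from the paper: an element-wise 32-bit integer addition that processes eight lanes per iteration, with a scalar tail loop for the remainder. The function and array names are illustrative, and the code assumes a compiler flag such as -mavx2.

        /* Minimal sketch (not from the paper): element-wise 32-bit integer
         * addition using AVX2 intrinsics, eight lanes per iteration.
         * Compile with e.g. gcc -O2 -mavx2. */
        #include <immintrin.h>
        #include <stddef.h>
        #include <stdint.h>

        void add_i32_avx2(const int32_t *a, const int32_t *b, int32_t *c, size_t n)
        {
            size_t i = 0;
            for (; i + 8 <= n; i += 8) {
                __m256i va = _mm256_loadu_si256((const __m256i *)(a + i));
                __m256i vb = _mm256_loadu_si256((const __m256i *)(b + i));
                __m256i vc = _mm256_add_epi32(va, vb);       /* 8 additions at once */
                _mm256_storeu_si256((__m256i *)(c + i), vc);
            }
            for (; i < n; ++i)      /* scalar tail for lengths not divisible by 8 */
                c[i] = a[i] + b[i];
        }

    In a memory-bound kernel like this one the gain from wider registers can be limited by bandwidth rather than arithmetic throughput, one plausible example of the situations the abstract mentions where AVX2 may not be the most effective route.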

    KfK-SUPRENUM Seminar, 19-20 October 1989: Conference Report (Tagungsbericht)


    Memristive crossbars as hardware accelerators: modelling, design and new uses

    Digital electronics has given rise to reliable, affordable, and scalable computing devices. However, new computing paradigms present challenges. For example, machine learning requires repeatedly processing large amounts of data; this creates a bottleneck in conventional computers, where computing and memory are separated. To add to that, Moore’s “law” is plateauing and is thus unlikely to address the increasing demand for computational power. In-memory computing, and specifically hardware accelerators for linear algebra, may address both of these issues. Memristive crossbar arrays are a promising candidate for such hardware accelerators. Memristive devices are fast, energy-efficient, and—when arranged in a crossbar structure—can compute vector-matrix products. Unfortunately, they come with their own set of limitations. The analogue nature of these devices makes them stochastic and thus less reliable compared to digital devices. It does not, however, necessarily make them unsuitable for computing. Nevertheless, successful deployment of analogue hardware accelerators requires a proper understanding of their drawbacks, ways of mitigating the effects of undesired physical behaviour, and applications where some degree of stochasticity is tolerable. In this thesis, I investigate the effects of nonidealities in memristive crossbar arrays, introduce techniques of minimising those negative effects, and present novel crossbar circuit designs for new applications. I mostly focus on physical implementations of neural networks and investigate the influence of device nonidealities on classification accuracy. To make memristive neural networks more reliable, I explore committee machines, rearrangement of crossbar lines, nonideality-aware training, and other techniques. I find that they all may contribute to the higher accuracy of physically implemented neural networks, often comparable to the accuracy of their digital counterparts. Finally, I introduce circuits that extend dot product computations to higher-rank arrays, different linear algebra operations, and quaternion vectors and matrices. These present opportunities for using crossbar arrays in new ways, including the processing of coloured images
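    The core operation the thesis builds on, a crossbar computing a vector-matrix product, can be stated in a few lines. The sketch below is an idealised model, not the thesis code: input voltages V drive the word lines, device conductances G encode the matrix, and by Ohm's and Kirchhoff's laws the current collected on bit line j is I[j] = sum_i V[i] * G[i][j]. Names and sizes are illustrative, and real devices add the nonidealities the thesis studies.

        /* Idealised crossbar model (illustrative, not the thesis code):
         * currents on the bit lines form the vector-matrix product V * G. */
        #include <stdio.h>

        #define ROWS 3   /* word lines (input voltages)  */
        #define COLS 2   /* bit lines (output currents)  */

        void crossbar_vmp(const double V[ROWS], const double G[ROWS][COLS],
                          double I[COLS])
        {
            for (int j = 0; j < COLS; ++j) {
                I[j] = 0.0;
                for (int i = 0; i < ROWS; ++i)
                    I[j] += V[i] * G[i][j];   /* each device contributes V_i * G_ij */
            }
        }

        int main(void)
        {
            const double V[ROWS] = {0.1, 0.2, 0.3};            /* volts   */
            const double G[ROWS][COLS] = {{1e-3, 2e-3},
                                          {3e-3, 4e-3},
                                          {5e-3, 6e-3}};       /* siemens */
            double I[COLS];
            crossbar_vmp(V, G, I);
            printf("I = [%g, %g] A\n", I[0], I[1]);
            return 0;
        }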

    Medical image reconstruction/processing with GPU in tomosynthesis

    Dissertation submitted for the degree of Master in Biomedical Engineering. Digital Breast Tomosynthesis (DBT) is a recent three-dimensional medical imaging technique based on digital mammography that allows better observation of overlapping tissues, especially in dense breasts. The technique consists of acquiring multiple images (slices) of the volume to be reconstructed, thereby allowing a more effective diagnosis, since the various tissues are not superimposed as they are in a 2D image. The image reconstruction algorithms used in DBT are quite similar to those used in Computed Tomography (CT). There are two classes of image reconstruction algorithms: analytical and iterative. In this work, two iterative reconstruction algorithms were implemented: Maximum Likelihood - Expectation Maximization (ML-EM) and Ordered Subsets - Expectation Maximization (OS-EM). Iterative algorithms give better results but are computationally very demanding, which is why analytical algorithms have been preferred in clinical practice. With recent advances in computing hardware, the time needed to reconstruct an image with an iterative algorithm can now be reduced considerably. The algorithms were implemented using General-Purpose computing on Graphics Processing Units (GPGPU). This technique uses a graphics card (GPU - Graphics Processing Unit) to process tasks usually assigned to a computer's processor (CPU - Central Processing Unit), rather than the graphics workloads GPUs are usually associated with. For this project an NVIDIA® GPU was used, with the Compute Unified Device Architecture (CUDA™) used to code the reconstruction algorithms. The results showed that implementing the algorithms on the GPU reduced the reconstruction time by approximately 6.2 times relative to the time obtained on the CPU. Regarding image quality, the GPU achieved a level of detail similar to the CPU images, with only minor differences
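    To make the iterative reconstruction concrete, the sketch below shows one ML-EM update on the CPU in plain C. It is a hedged sketch, not the dissertation's code: the dissertation runs this kind of update on the GPU with CUDA, and the dense system matrix, array names and sizes here are illustrative assumptions.

        /* One ML-EM update (illustrative sketch, not the dissertation code):
         *   x_j <- x_j / (sum_i A_ij) * sum_i A_ij * y_i / (A x)_i
         * A is the system matrix (n_det x n_vox, row-major), y the measured
         * projections and x the current image estimate. */
        #include <stdlib.h>

        void mlem_update(const double *A, const double *y, double *x,
                         size_t n_det, size_t n_vox)
        {
            double *fwd = malloc(n_det * sizeof *fwd);     /* forward projection A*x */
            for (size_t i = 0; i < n_det; ++i) {
                fwd[i] = 0.0;
                for (size_t j = 0; j < n_vox; ++j)
                    fwd[i] += A[i * n_vox + j] * x[j];
            }
            for (size_t j = 0; j < n_vox; ++j) {
                double num = 0.0, norm = 0.0;
                for (size_t i = 0; i < n_det; ++i) {
                    double a = A[i * n_vox + j];
                    norm += a;                             /* voxel sensitivity      */
                    if (fwd[i] > 0.0)
                        num += a * y[i] / fwd[i];          /* back-project the ratio */
                }
                if (norm > 0.0)
                    x[j] *= num / norm;                    /* multiplicative update  */
            }
            free(fwd);
        }

    OS-EM applies the same multiplicative update but cycles through ordered subsets of the projections within each iteration, which is why it converges in fewer passes over the data; the per-detector and per-voxel loops are independent, which is what makes the mapping to a GPU effective.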

    Commodity clusters: performance comparison between PCs and workstations

    Workstation clusters were originally developed as a way to leverage the better cost basis of UNIX workstations to perform computations previously handled only by relatively more expensive supercomputers. Commodity workstation clusters take this evolutionary process one step further by replacing equivalent proprietary workstation functionality with less expensive PC technology. As PC technology encroaches on proprietary UNIX workstation vendor markets, these vendors will see a declining share of the overall market. As technology advances continue, the ability to upgrade a workstation's performance plays a large role in cost analysis. For example, a major upgrade to a typical UNIX workstation means replacing the whole machine. As major revisions to the UNIX vendor's product line come out, brand new systems are introduced. IBM compatibles, however, are modular by design, and nothing needs to be replaced except the components that are truly improved. The DAISy cluster, for example, is about to undergo a major upgrade from 90MHz Pentiums to 200MHz Pentium Pros. All of the memory -- the system's largest expense -- as well as the disks, power supply, etc., can be reused. As a result, commodity workstation clusters ought to gain an increasingly large share of the distributed computing market

    NASA Tech Briefs, February 1991

    Topics: New Product Ideas; NASA TU Services; Electronic Components and Circuits; Electronic Systems; Physical Sciences; Materials; Computer Programs; Mechanics; Machinery; Fabrication Technology; Mathematics and Information Sciences; Life Sciences

    A comparison of statistical machine learning methods in heartbeat detection and classification

    In health care, patients with heart problems require quick responsiveness in a clinical setting or in the operating theatre. Towards that end, automated classification of heartbeats is vital, as some heartbeat irregularities are time consuming to detect. Therefore, analysis of electrocardiogram (ECG) signals is an active area of research. The methods proposed in the literature depend on the structure of a heartbeat cycle. In this paper, we use interval- and amplitude-based features together with a few samples from the ECG signal as a feature vector. We studied a variety of classification algorithms, focusing especially on a type of arrhythmia known as the ventricular ectopic beat (VEB). We compare the performance of the classifiers against algorithms proposed in the literature and make recommendations regarding features, sampling rate, and choice of the classifier to apply in a real-time clinical setting. The extensive study is based on the MIT-BIH arrhythmia database. Our main contributions are the evaluation of existing classifiers over a range of sampling rates, the recommendation of a detection methodology to employ in a practical setting, and an extension of the notion of a mixture of experts to a larger class of algorithms
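    As an illustration of the kind of feature vector described above (interval and amplitude features plus a few raw samples), the sketch below assembles one per-beat record in C. It is a hedged sketch under assumed conventions, not the paper's code: the struct fields, window length and sampling-rate handling are illustrative choices, and the R-peak indices are assumed to come from a separate detector.

        /* Illustrative per-beat feature vector (not the paper's code):
         * RR intervals, R-peak amplitude and a short raw-sample window.
         * k must index an interior beat (k >= 1 and k + 1 < number of peaks)
         * and the sample window must stay inside the signal. */
        #include <stddef.h>

        #define N_SAMPLES 8                 /* raw samples kept around the R peak */

        typedef struct {
            double pre_rr;                  /* interval to previous beat (s)      */
            double post_rr;                 /* interval to next beat (s)          */
            double r_amplitude;             /* signal value at the R peak         */
            double samples[N_SAMPLES];      /* raw window centred on the peak     */
        } beat_features;

        /* ecg: signal samples, fs: sampling rate in Hz, r_peaks: detected R-peak
         * indices, k: index of the beat to describe. */
        beat_features extract_beat(const double *ecg, double fs,
                                   const size_t *r_peaks, size_t k)
        {
            beat_features f;
            size_t r = r_peaks[k];
            f.pre_rr  = (r - r_peaks[k - 1]) / fs;
            f.post_rr = (r_peaks[k + 1] - r) / fs;
            f.r_amplitude = ecg[r];
            for (int i = 0; i < N_SAMPLES; ++i)
                f.samples[i] = ecg[r - N_SAMPLES / 2 + i];
            return f;
        }

    Each such vector would then be passed to whichever classifier is being compared.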

    Design and Modelling of Small Scale Low Temperature Power Cycles
