
48 research outputs found

A Library for Pattern-based Sparse Matrix Vector Multiply

Author: Back Godmar
Belgin Mehmet
Ribbens Calvin
Publication date: 01/01/2009

Pattern-based Representation (PBR) is a novel approach to improving the performance of Sparse Matrix-Vector Multiply (SMVM) numerical kernels. Motivated by our observation that many matrices can be divided into blocks that share a small number of distinct patterns, we generate custom multiplication kernels for frequently recurring block patterns. The resulting reduction in index overhead significantly reduces memory bandwidth requirements and improves performance. Unlike existing methods, PBR requires neither detection of dense blocks nor zero filling, making it particularly advantageous for matrices that lack dense nonzero concentrations. SMVM kernels for PBR can benefit from explicit prefetching and vectorization, and are amenable to parallelization. The analysis and format conversion to PBR are implemented as a library, making it suitable for applications that generate matrices dynamically at runtime. We present sequential and parallel performance results for PBR on two current multicore architectures, which show that PBR outperforms available alternatives for the matrices to which it is applicable, and that the analysis and conversion overhead is amortized in realistic application scenarios.

Computer Science Technical Reports @Virginia Tech

Exploitation of Dynamic Communication Patterns through Static Analysis

Author: de Supinski B
Kranzlmueller D
Panas T
Preissl R
Quinlan D
Schulz M
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 27/06/2010

Abstract not provided

Crossref

UNT Digital Library

Deep Artificial Neural Networks and Neuromorphic Chips for Big Data Analysis: Pharmaceutical and Bioinformatics Applications

Author: Cedrón Francisco
Pastur-Romay L.A.
Pazos A.
Porto-Pazos Ana B.
Publication venue: 'MDPI AG'
Publication date: 01/01/2016

Over the past decade, Deep Artificial Neural Networks (DNNs) have become the state-of-the-art algorithms in Machine Learning (ML), speech recognition, computer vision, natural language processing and many other tasks. This was made possible by the advancement in Big Data, Deep Learning (DL) and drastically increased chip processing abilities, especially general-purpose graphical processing units (GPGPUs). All this has created a growing interest in making the most of the potential offered by DNNs in almost every field. An overview of the main architectures of DNNs and of their usefulness in Pharmacology and Bioinformatics is presented in this work. The featured applications are: drug design, virtual screening (VS), Quantitative Structure–Activity Relationship (QSAR) research, protein structure prediction and genomics (and other omics) data mining. The future need of neuromorphic hardware for DNNs is also discussed, and the two most advanced chips are reviewed: IBM TrueNorth and SpiNNaker. In addition, this review points out the importance of considering not only neurons, as DNNs and neuromorphic chips should also include glial cells, given the proven importance of astrocytes, a type of glial cell which contributes to information processing in the brain. The Deep Artificial Neuron–Astrocyte Networks (DANAN) could overcome the difficulties in architecture design, learning process and scalability of the current ML methods.

Funding: Galicia. Consellería de Cultura, Educación e Ordenación Universitaria; GRC2014/049. Galicia. Consellería de Cultura, Educación e Ordenación Universitaria; R2014/039. Instituto de Salud Carlos III; PI13/0028

Multidisciplinary Digital Publishing Institute

Repositorio da Universidade da Coruña

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Directory of Open Access Journals

PubMed Central

Optimizing the Performance of Streaming Numerical Kernels on the IBM Blue Gene/P PowerPC 450 Processor

Author: Bailey D
Ganapathi A
IBM Blue Gene Team
Kamil S
Nguyen A
Peng L
Sosa C and International Business Machines Corporation
Williams S
Publication venue: 'SAGE Publications'
Publication date: 17/01/2012

Several emerging petascale architectures use energy-efficient processors with vectorized computational units and in-order thread processing. On these architectures the sustained performance of streaming numerical kernels, ubiquitous in the solution of partial differential equations, represents a challenge despite the regularity of memory access. Sophisticated optimization techniques are required to fully utilize the Central Processing Unit (CPU). We propose a new method for constructing streaming numerical kernels using a high-level assembly synthesis and optimization framework. We describe an implementation of this method in Python targeting the IBM Blue Gene/P supercomputer's PowerPC 450 core. This paper details the high-level design, construction, simulation, verification, and analysis of these kernels utilizing a subset of the CPU's instruction set. We demonstrate the effectiveness of our approach by implementing several three-dimensional stencil kernels over a variety of cached memory scenarios and analyzing the mechanically scheduled variants, including a 27-point stencil achieving a 1.7x speedup over the best previously published results.

arXiv.org e-Print Archive

Crossref

B3: Fuzzy-Based Data Center Load Optimization in Cloud Computing

Author: A. Vincent Antony Kumar
M. Jaiganesh
Publication venue: 'Hindawi Limited'
Publication date: 01/01/2013

Cloud computing started a new era in getting a variety of information puddles through various internet connections by any connective devices. It provides a pay-and-use method for clients to access services. A data center is a sophisticated high-definition server that runs applications virtually in cloud computing. It moves the applications, services, and data to a large data center. A data center provides a higher service level, which covers the maximum number of users. In order to find the overall load efficiency, the utilization service in the data center is a definite task. Hence, we propose a novel method to find the efficiency of the data center in cloud computing. The goal is to optimize data center utilization in terms of three big factors: Bandwidth, Memory, and Central Processing Unit (CPU) cycle. We constructed a fuzzy expert system model to obtain maximum Data Center Load Efficiency (DCLE) in cloud computing environments. The advantage of the proposed system lies in DCLE computing. While computing, it allows regular evaluation of services to any number of clients. This approach indicates that the current cloud needs an order of magnitude in data center management to be used in next generation computing.

Crossref

Directory of Open Access Journals

Building Scientific Clouds: The Distributed, Peer-to-Peer Approach

Author: Vadakedathu Linton
Publication venue: Clemson University Libraries
Publication date: 01/05/2010

The scientific community is constantly growing in size. The increase in the number of personnel and projects has resulted in the requirement of large amounts of storage, CPU power and other computing resources. It has also become necessary to acquire these resources in an affordable manner that is sensitive to workloads. In this thesis, the author presents a novel approach that provides the communication platform that will support such large-scale scientific projects. These resources could be difficult to acquire due to NATs, firewalls and other site-based restrictions and policies. Methods used to overcome these hurdles are discussed in detail, along with other advantages of using such a system, which include: increased availability of necessary computing infrastructure; increased grid resource utilization; reduced user dependability; reduced job execution time. Experiments conducted included local infrastructure on the Clemson University campus as well as resources provided by other federated grid sites.

Clemson University: TigerPrints

Indirect cube: A power-efficient topology for compute clusters

Author: Boden
Cunningham
Dally
Javier Navaridas
José Miguel-Alonso
Miguel-Alonso
Navaridas
Puente
Seitz
Publication venue: 'Elsevier BV'

Crossref

Prediction based task scheduling in distributed computing

Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/1995

Crossref