Search CORE

5,952 research outputs found

Fine-grained timing using genetic programming

Author: A.F. Webster
J. Kelsey
N.L. Binkert
P. Kocher
P.C. Kocher
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2010
Field of study

In previous work, we have demonstrated that it is possible to use Genetic Programming to minimise the resource consumption of software, such as its power consumption or execution time. In this paper, we investigate the extent to which Genetic Programming can be used to gain fine-grained control over software timing. We introduce the ideas behind our work, and carry out experimentation to find that Genetic Programming is indeed able to produce software with unusual and desirable timing properties, where it is not obvious how a manual approach could replicate such results. In general, we discover that Genetic Programming is most effective in controlling statistical properties of software rather than precise control over its timing for individual inputs. This control may find useful application in cryptography and embedded systems

Crossref

Enlighten

Fine-grained parallelization of fitness functions in bioinformatics optimization problems: gene selection for cancer classification and biclustering of gene expression data

Author: Cerrada Barrios José Luis
Crawford Broderick
Fernández Díaz Ramón
Gómez Pulido Juan Antonio
Lanza Gutiérrez José Manuel
Soto Guzmán Ricardo
Trinidad Amado Sebastián
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2016
Field of study

ANTECEDENTES: las metaheurísticas se utilizan ampliamente para resolver grandes problemas de optimización combinatoria en bioinformática debido al enorme conjunto de posibles soluciones. Dos problemas representativos son la selección de genes para la clasificación del cáncer y el agrupamiento de los datos de expresión génica. En la mayoría de los casos, estas metaheurísticas, así como otras técnicas no lineales, aplican una función de adecuación a cada solución posible con una población de tamaño limitado, y ese paso involucra latencias más altas que otras partes de los algoritmos, lo cual es la razón por la cual el tiempo de ejecución de las aplicaciones dependerá principalmente del tiempo de ejecución de la función de aptitud. Además, es habitual encontrar formulaciones aritméticas de punto flotante para las funciones de fitness. De esta manera, una paralelización cuidadosa de estas funciones utilizando la tecnología de hardware reconfigurable acelerará el cálculo, especialmente si se aplican en paralelo a varias soluciones de la población. RESULTADOS: una paralelización de grano fino de dos funciones de aptitud de punto flotante de diferentes complejidades y características involucradas en el biclustering de los datos de expresión génica y la selección de genes para la clasificación del cáncer permitió obtener mayores aceleraciones y cómputos de potencia reducida con respecto a los microprocesadores habituales. CONCLUSIONES: Los resultados muestran mejores rendimientos utilizando tecnología de hardware reconfigurable en lugar de los microprocesadores habituales, en términos de tiempo de consumo y consumo de energía, no solo debido a la paralelización de las operaciones aritméticas, sino también gracias a la evaluación de aptitud concurrente para varios individuos de la población en La metaheurística. Esta es una buena base para crear soluciones aceleradas y de bajo consumo de energía para escenarios informáticos intensivos.BACKGROUND: Metaheuristics are widely used to solve large combinatorial optimization problems in bioinformatics because of the huge set of possible solutions. Two representative problems are gene selection for cancer classification and biclustering of gene expression data. In most cases, these metaheuristics, as well as other non-linear techniques, apply a fitness function to each possible solution with a size-limited population, and that step involves higher latencies than other parts of the algorithms, which is the reason why the execution time of the applications will mainly depend on the execution time of the fitness function. In addition, it is usual to find floating-point arithmetic formulations for the fitness functions. This way, a careful parallelization of these functions using the reconfigurable hardware technology will accelerate the computation, specially if they are applied in parallel to several solutions of the population. RESULTS: A fine-grained parallelization of two floating-point fitness functions of different complexities and features involved in biclustering of gene expression data and gene selection for cancer classification allowed for obtaining higher speedups and power-reduced computation with regard to usual microprocessors. CONCLUSIONS: The results show better performances using reconfigurable hardware technology instead of usual microprocessors, in computing time and power consumption terms, not only because of the parallelization of the arithmetic operations, but also thanks to the concurrent fitness evaluation for several individuals of the population in the metaheuristic. This is a good basis for building accelerated and low-energy solutions for intensive computing scenarios.• Ministerio de Economía y Competitividad y Fondos FEDER. Contrato TIN2012-30685 (I+D+i) • Gobierno de Extremadura. Ayuda GR15011 para grupos TIC015 • CONICYT/FONDECYT/REGULAR/1160455. Beca para Ricardo Soto Guzmán • CONICYT/FONDECYT/REGULAR/1140897. Beca para Broderick CrawfordpeerReviewe

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Springer - Publisher Connector

PubMed Central

Dehesa. Repositorio Institucional de la Universidad de Extremadura

Hardware support for Local Memory Transactions on GPU Architectures

Author: Asenjo-Plaza Rafael
Kaeli David
Navarro Ángeles
Plata-Gonzalez Oscar Guillermo
Ubal Rafael
Villegas Alejandro
Publication venue
Publication date: 26/06/2015
Field of study

Graphics Processing Units (GPUs) are popular hardware accelerators for data-parallel applications, enabling the execution of thousands of threads in a Single Instruction - Multiple Thread (SIMT) fashion. However, the SIMT execution model is not efficient when code includes critical sections to protect the access to data shared by the running threads. In addition, GPUs offer two shared spaces to the threads, local memory and global memory. Typical solutions to thread synchronization include the use of atomics to implement locks, the serialization of the execution of the critical section, or delegating the execution of the critical section to the host CPU, leading to suboptimal performance. In the multi-core CPU world, transactional memory (TM) was proposed as an alternative to locks to coordinate concurrent threads. Some solutions for GPUs started to appear in the literature. In contrast to these earlier proposals, our approach is to design hardware support for TM in two levels. The first level is a fast and lightweight solution for coordinating threads that share the local memory, while the second level coordinates threads through the global memory. In this paper we present GPU-LocalTM as a hardware TM (HTM) support for the first level. GPU-LocalTM offers simple conflict detection and version management mechanisms that minimize the hardware resources required for its implementation. For the workloads studied, GPU-LocalTM provides between 1.25-80X speedup over serialized critical sections, while the overhead introduced by transaction management is lower than 20%.Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech

Repositorio Institucional Universidad de Málaga

Evolving hardware with genetic algorithms

Author: Kerr Kevin
Publication venue: RIT Scholar Works
Publication date: 01/07/1998
Field of study

Genetic techniques are applied to the problem of electronic circuit design, with an emphasis on VLSI circuits. The goal is to have a tool which has the performance and flexibility to attack a wide range of problems. A genetic algorithm is used to design a circuit specified by the desired input /output characteristics. A software system is implemented to synthesize and optimize circuits using an asynchronous parallel genetic algorithm. The software is designed with object-oriented constructs in order to maintain scalability and provide for future enhancements. The system is executed on a heterogeneous network of workstations ranging from Sun Sparc Ultras to HP multiprocessors. Testing of this software is done with examples of both digital and analog CMOS VLSI circuits. Performance is measured in both the quality of the solutions and in the time it took to evolve them

RIT Scholar Works

NeuroFlow: A General Purpose Spiking Neural Network Simulation Platform using Customizable Processors

Author: Cheung K
Luk W
Schultz SR
Publication venue: 'Frontiers Media SA'
Publication date: 11/12/2015
Field of study

© 2016 Cheung, Schultz and Luk.NeuroFlow is a scalable spiking neural network simulation platform for off-the-shelf high performance computing systems using customizable hardware processors such as Field-Programmable Gate Arrays (FPGAs). Unlike multi-core processors and application-specific integrated circuits, the processor architecture of NeuroFlow can be redesigned and reconfigured to suit a particular simulation to deliver optimized performance, such as the degree of parallelism to employ. The compilation process supports using PyNN, a simulator-independent neural network description language, to configure the processor. NeuroFlow supports a number of commonly used current or conductance based neuronal models such as integrate-and-fire and Izhikevich models, and the spike-timing-dependent plasticity (STDP) rule for learning. A 6-FPGA system can simulate a network of up to ~600,000 neurons and can achieve a real-time performance of 400,000 neurons. Using one FPGA, NeuroFlow delivers a speedup of up to 33.6 times the speed of an 8-core processor, or 2.83 times the speed of GPU-based platforms. With high flexibility and throughput, NeuroFlow provides a viable environment for large-scale neural network simulation

Spiral - Imperial College Digital Repository

High Performance Biological Pairwise Sequence Alignment: FPGA versus GPU versus Cell BE versus GPP

Author: Akoglu Ali
Benkrid Khaled
Ling Cheng
Liu Ying
Song Yang
Tian Xiang
Publication venue: 'Hindawi Limited'
Publication date: 01/01/2012
Field of study

This paper explores the pros and cons of reconfigurable computing in the form of FPGAs for high performance efficient computing. In particular, the paper presents the results of a comparative study between three different acceleration technologies, namely, Field Programmable Gate Arrays (FPGAs), Graphics Processor Units (GPUs), and IBM’s Cell Broadband Engine (Cell BE), in the design and implementation of the widely-used Smith-Waterman pairwise sequence alignment algorithm, with general purpose processors as a base reference implementation. Comparison criteria include speed, energy consumption, and purchase and development costs. The study shows that FPGAs largely outperform all other implementation platforms on performance per watt criterion and perform better than all other platforms on performance per dollar criterion, although by a much smaller margin. Cell BE and GPU come second and third, respectively, on both performance per watt and performance per dollar criteria. In general, in order to outperform other technologies on performance per dollar criterion (using currently available hardware and development tools), FPGAs need to achieve at least two orders of magnitude speed-up compared to general-purpose processors and one order of magnitude speed-up compared to domain-specific technologies such as GPUs

Crossref

Directory of Open Access Journals

Edinburgh Research Explorer

Phenotypic landscape inference reveals multiple evolutionary paths to C $_4$ photosynthesis

Author: Abramoff
Adams
Akyildiz
Aubry
Beebe
Brown
Brown
Bruhl
Bustin
Casati
Cheng
Christin
Christin
Christin
Christin
Christin
Devi
Devi
Edwards
Furbank
Gavrilets
Goldstein
Gowik
Griffiths
Hatch
Hattersley
Hattersley
Heckman
Hibberd
Hill
Holaday
Holaday
Holaday
Hylton
Johnson
Kajala
Keeley
Kennedy
Khoshravesh
Kidwell
Koteyeva
Kozmik
Ku
Ku
Ku
Ku
Langdale
Livak
Lobkovsky
McKown
McKown
Mooers
Moore
Muhaidat
Osborne
P’yankov
Rabiner
Rajendrudu
Rathnam
Rathnam
Rawsthorne
Rawsthorne
Revuz
Rosche
Roweis
Sage
Sage
Sage
Santos
Sayre
Slack
Smith
Steiner
Ueno
Ueno
Vicentini
Vogan
Voznesenskaya
Voznesenskaya
Voznesenskaya
Wasserman
Weinreich
Whibley
Williams
Wilson
Wiludda
Wittkopp
Wright
Zhu
Publication venue: 'eLife Sciences Publications, Ltd'
Publication date: 01/01/2013
Field of study

_4

photosynthesis has independently evolved from the ancestral C

_3

pathway in at least 60 plant lineages, but, as with other complex traits, how it evolved is unclear. Here we show that the polyphyletic appearance of C

_4

photosynthesis is associated with diverse and flexible evolutionary paths that group into four major trajectories. We conducted a meta-analysis of 18 lineages containing species that use C

_3

, C

_4

, or intermediate C

_3

-C

_4

forms of photosynthesis to parameterise a 16-dimensional phenotypic landscape. We then developed and experimentally verified a novel Bayesian approach based on a hidden Markov model that predicts how the C

_4

phenotype evolved. The alternative evolutionary histories underlying the appearance of C

_4

photosynthesis were determined by ancestral lineage and initial phenotypic alterations unrelated to photosynthesis. We conclude that the order of C

_4

trait acquisition is flexible and driven by non-photosynthetic drivers. This flexibility will have facilitated the convergent evolution of this complex trait

arXiv.org e-Print Archive

Crossref

University of Birmingham Research Portal

PubMed Central

Spiral - Imperial College Digital Repository