30 research outputs found

Fine-grained parallelization of fitness functions in bioinformatics optimization problems: gene selection for cancer classification and biclustering of gene expression data

Author: Cerrada Barrios José Luis
Crawford Broderick
Fernández Díaz Ramón
Gómez Pulido Juan Antonio
Lanza Gutiérrez José Manuel
Soto Guzmán Ricardo
Trinidad Amado Sebastián
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2016
Field of study

BACKGROUND: Metaheuristics are widely used to solve large combinatorial optimization problems in bioinformatics because of the huge set of possible solutions. Two representative problems are gene selection for cancer classification and biclustering of gene expression data. In most cases, these metaheuristics, as well as other non-linear techniques, apply a fitness function to each candidate solution in a size-limited population, and that step involves higher latencies than other parts of the algorithms, which is why the execution time of the applications depends mainly on the execution time of the fitness function. In addition, fitness functions are usually formulated in floating-point arithmetic. A careful parallelization of these functions using reconfigurable hardware technology will therefore accelerate the computation, especially if they are applied in parallel to several solutions of the population. RESULTS: A fine-grained parallelization of two floating-point fitness functions of different complexities and features, involved in biclustering of gene expression data and gene selection for cancer classification, yielded higher speedups and lower-power computation than conventional microprocessors. CONCLUSIONS: The results show better performance using reconfigurable hardware technology instead of conventional microprocessors, in terms of computing time and power consumption, not only because of the parallelization of the arithmetic operations, but also thanks to the concurrent fitness evaluation for several individuals of the population in the metaheuristic. This is a good basis for building accelerated and low-energy solutions for intensive computing scenarios.
Funding: Ministerio de Economía y Competitividad and FEDER funds, contract TIN2012-30685 (R&D&I) • Gobierno de Extremadura, grant GR15011 for research groups (TIC015) • CONICYT/FONDECYT/REGULAR/1160455, grant for Ricardo Soto Guzmán • CONICYT/FONDECYT/REGULAR/1140897, grant for Broderick Crawford
Peer reviewed

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Springer - Publisher Connector

PubMed Central

Dehesa. Repositorio Institucional de la Universidad de Extremadura

Fine-grained parallelization of fitness functions in bioinformatics optimization problems: gene selection for cancer classification and biclustering of gene expression data

Author: A Rathod
AI Funie
AR Omondi
B Liu
B Pontes
B Sukhwani
Broderick Crawford
C Ambroise
C Maxfield
CW Ahn
D Buell
D Pelta
DA Patterson
DB Thomas
EB Huerta
EJN Segundo
F Divina
F Vahid
G Chrysos
GB Fogel
H Emam
J Gonzalez-Dominguez
JI Hidalgo
Jose L. Cerrada-Barrios
Jose M. Lanza-Gutierrez
Juan A. Gomez-Pulido
K Glette
M Gokhale
M Khabzaoui
MC Herbordt
MS Mohamad
N Nedjah
P Layzell
R Baraglia
R Peesapati
Ramon A. Fernandez-Diaz
Ricardo Soto
RP Sidhu
S Bleuler
S Che
Sebastian Trinidad-Amado
V Sriram
VA Pedroni
W Tang
Y Zhang
Z Michalewicz
Z Vasicek
Publication venue: 'Springer Science and Business Media LLC'
Publication date
Field of study

Crossref

A biclustering algorithm based on a Bicluster Enumeration Tree: application to DNA microarray data

Author: A Ben-Dor
A Dharan
A Prelic
A Schliep
A Tanay
A Yip
B Pontes
C Cano
C Gallo
DD Lewis
EL Lehmann
F Angiulli
F Divina
GF Berriz
H Turner
H Wang
IS Dhillon
J Liu
J Yang
JA Hartigan
Jin-Kao Hao
JS Aguilar-Ruiz
K Bryan
K Cheng
L Lazzeroni
L Teng
Mourad Elloumi
R Agrawal
R Balasubramaniyan
S Barkow
S Bergmann
S Bleuler
S Mitra
S Tavazoie
SC Madeira
SC Madeira
SD Peddada
T Hofmann
U Maulik
W Gaul
Wassim Ayadi
X Liu
Y Cheng
Y Cheng
Y Christinat
Y Luan
Y Okada
Z Zhang
Publication venue: BioMed Central
Publication date: 01/01/2009
Field of study

BACKGROUND: In a number of domains, such as DNA microarray data analysis, we need to cluster rows (genes) and columns (conditions) of a data matrix simultaneously to identify groups of rows coherent with groups of columns. This kind of clustering is called biclustering. Biclustering algorithms are extensively used in DNA microarray data analysis. More effective biclustering algorithms are highly desirable and needed. METHODS: We introduce BiMine, a new enumeration algorithm for biclustering of DNA microarray data. The proposed algorithm is based on three original features. First, BiMine relies on a new evaluation function called Average Spearman's rho (ASR). Second, BiMine uses a new tree structure, called Bicluster Enumeration Tree (BET), to represent the different biclusters discovered during the enumeration process. Third, to avoid the combinatorial explosion of the search tree, BiMine introduces a parametric rule that allows the enumeration process to cut tree branches that cannot lead to good biclusters. RESULTS: The performance of the proposed algorithm is assessed using both synthetic and real DNA microarray data. The experimental results show that BiMine competes well with several other biclustering methods. Moreover, we test the biological significance using a gene annotation web-tool to show that our proposed method is able to produce biologically relevant biclusters. The software is available upon request from the authors to academic users.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Okina

Biclustering of Gene Expression Data by Correlation-Based Scatter Search

Author: Aguilar Ruiz Jesús Salvador
Nepomuceno Chamorro Juan Antonio
Troncoso Lora Alicia
Publication venue
Publication date: 01/01/2011
Field of study

BACKGROUND: The analysis of data generated by microarray technology is very useful for understanding how genetic information becomes functional gene products. Biclustering algorithms can determine a group of genes which are co-expressed under a set of experimental conditions. Recently, new biclustering methods based on metaheuristics have been proposed. Most of them use the Mean Squared Residue as the merit function, but patterns that are interesting and relevant from a biological point of view, such as shifting and scaling patterns, may not be detected using this measure. However, it is important to discover this type of pattern, since genes commonly present similar behavior although their expression levels vary in different ranges or magnitudes. METHODS: Scatter Search is an evolutionary technique based on the evolution of a small set of solutions which are chosen according to quality and diversity criteria. This paper presents a Scatter Search that aims to find biclusters in gene expression data. In this algorithm the proposed fitness function is based on the linear correlation among genes to detect shifting and scaling patterns, and an improvement method is included in order to select only positively correlated genes. RESULTS: The proposed algorithm has been tested with three real data sets (the Yeast Cell Cycle dataset, the human B-cells lymphoma dataset and the Yeast Stress dataset), finding a remarkable number of biclusters with shifting and scaling patterns. In addition, the performance of the proposed method and fitness function is compared to that of CC, OPSM, ISA, BiMax, xMotifs and Samba using the Gene Ontology Database.

Springer - Publisher Connector

PubMed Central

idUS. Depósito de Investigación Universidad de Sevilla

Aco-based feature selection algorithm for classification

Author: Al-mazini Hassan Fouad Abbas
Publication venue
Publication date: 01/01/2022
Field of study

A dataset with a small number of records but a large number of attributes exhibits a phenomenon called the “curse of dimensionality”. The classification of this type of dataset requires Feature Selection (FS) methods for the extraction of useful information. The modified graph clustering ant colony optimisation (MGCACO) algorithm is an effective FS method that was developed based on grouping the highly correlated features. However, the MGCACO algorithm has three main drawbacks in producing a feature subset because of its clustering method, parameter sensitivity, and the final subset determination. An enhanced graph clustering ant colony optimisation (EGCACO) algorithm is proposed to solve the three (3) MGCACO algorithm problems. The proposed improvement includes: (i) an ACO feature clustering method to obtain clusters of highly correlated features; (ii) an adaptive selection technique for subset construction from the clusters of features; and (iii) a genetic-based method for producing the final subset of features. The ACO feature clustering method utilises the ability of various mechanisms such as intensification and diversification for local and global optimisation to provide highly correlated features. The adaptive technique for ant selection enables the parameter to adaptively change based on the feedback of the search space. The genetic method determines the final subset automatically, based on the crossover and subset quality calculation. The performance of the proposed algorithm was evaluated on 18 benchmark datasets from the University of California, Irvine (UCI) repository and nine (9) deoxyribonucleic acid (DNA) microarray datasets against 15 benchmark metaheuristic algorithms. The experimental results of the EGCACO algorithm on the UCI datasets are superior to other benchmark optimisation algorithms in terms of the number of selected features for 16 out of the 18 UCI datasets (88.89%) and the best in eight (8) (44.47%) of the datasets for classification accuracy. Further, experiments on the nine (9) DNA microarray datasets showed that the EGCACO algorithm is superior to the benchmark algorithms in terms of classification accuracy (first rank) for seven (7) datasets (77.78%) and demonstrates the lowest number of selected features in six (6) datasets (66.67%). The proposed EGCACO algorithm can be utilised for FS in DNA microarray classification tasks that involve large dataset sizes in various application domains.

Universiti Utara Malaysia: UUM eTheses

Preventing premature convergence and proving the optimality in evolutionary algorithms

Author: Alliot Jean-Marc
Durand Nicolas
Gotteland Jean-Baptiste
Vanaret Charlie
Publication venue: HAL CCSD
Publication date: 01/01/2013
Field of study

http://ea2013.inria.fr//proceedings.pdf
Evolutionary Algorithms (EA) usually carry out an efficient exploration of the search space, but often get trapped in local minima and do not prove the optimality of the solution. Interval-based techniques, on the other hand, yield a numerical proof of optimality of the solution. However, they may fail to converge within a reasonable time due to their inability to quickly compute a good approximation of the global minimum and their exponential complexity. The contribution of this paper is a hybrid algorithm called Charibde in which a particular EA, Differential Evolution, cooperates with a Branch and Bound algorithm endowed with interval propagation techniques. It prevents premature convergence toward local optima and outperforms both deterministic and stochastic existing approaches. We demonstrate its efficiency on a benchmark of highly multimodal problems, for which we provide previously unknown global minima and certification of optimality.

CiteSeerX

Scientific Publications of the University of Toulouse II Le Mirail

Open Archive Toulouse Archive Ouverte

Multi-layered model of individual HIV infection progression and mechanisms of phenotypical expression

Author: Perrin Dimitri
Publication venue: Dublin City University. School of Computing
Publication date: 01/01/2008
Field of study

Cite as: Perrin, Dimitri (2008) Multi-layered model of individual HIV infection progression and mechanisms of phenotypical expression. PhD thesis, Dublin City University

Irish Universities

Queensland University of Technology ePrints Archive

DCU Online Research Access Service

A multi-objective genetic algorithm for biclustering of gene expression data with probabilistic encoding and overlapping control

Author: Marcozzi Michaël
Publication venue
Publication date: 29/09/2010
Field of study

Repository of the University of Namur

Multiobjective optimization in bioinformatics and computational biology

Author: Handl Julia
Kell Douglas B.
Knowles Joshua
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/04/2007
Field of study

The University of Manchester - Institutional Repository

Matrix Reordering Methods for Table and Network Visualization

Author: Alper
Andrecut
Applegate
Atkins
Bar-Joseph
Batagelj
Behrisch
Bentley
Bertin
Bertin
Borg
Brandes
Brandes
Breiger
Brusco
Brusco
BRUSCO
Caraux
Chan
Chen
Cheng
Croes
Cuthill
Deutsch
Díaz
Eades
Eisen
Fiedler
Friendly
Friendly
Gansner
Gelfand
GEORGE
Ghoniem
Gibbs
Gregor
Gruvaeus
Hahsler
Harel
Hartigan
Henry
Hill
Hubert
Hubert
Jin
KAISER
Kaiser
Kendall
KIM
King
Knyazev
Koren
Koren
Lazzeroni
Lee
Lenstra
Leung
Liiv
Liu
Liu
Lozano
Lozano
Madeira
Mafteiu-Scai
McCormick
McGrath
McQuitty
Mueller
Murali
Mäkinen
Mäkinen
Niermann
Osei-Kuffuor
Petit
PICH
Prelić
Rao
Raspaud
Robinson
Rodgers
Rosen
Siek
Siirtola
Sloan
Sloan
Spence
Spenke
Tanay
Turner
von Rüden
Watkins
WILKINSON
Wong
Wu
Publication venue: 'Wiley'
Publication date: 01/01/2016
Field of study

This survey provides a description of algorithms to reorder visual matrices of tabular data and adjacency matrices of networks. The goal of this survey is to provide a comprehensive list of reordering algorithms published in different fields such as statistics, bioinformatics, or graph theory. While several of these algorithms are described in publications and others are available in software libraries and programs, there is little awareness of what is done across all fields. Our survey aims at describing these reordering algorithms in a unified manner to enable a wide audience to understand their differences and subtleties. We organize this corpus in a consistent manner, independently of the application or research field. We also provide practical guidance on how to select appropriate algorithms depending on the structure and size of the matrix to reorder, and point to implementations when available.

HAL-CentraleSupelec

Crossref

INRIA a CCSD electronic archive server

Edinburgh Research Explorer

HAL-Rennes 1