20 research outputs found
Parallel evolutionary biclustering of short-term electric energy consumption
Presentation given as part of Project PINV18-661: Analysis of energy efficiency in non-residential buildings using metaheuristic and artificial intelligence techniques.
CONACYT - Consejo Nacional de Ciencias y Tecnología. PROCIENCI
Binary Particle Swarm Optimization based Biclustering of Web usage Data
Web mining is the nontrivial process of discovering valid, novel, potentially
useful knowledge from web data using data mining techniques. It
may give information that is useful for improving the services offered by web
portals and information access and retrieval tools. With the rapid development
of biclustering, more researchers have applied the biclustering technique to
different fields in recent years. When a biclustering approach is applied to
web usage data, it automatically captures the hidden browsing patterns
in the form of biclusters. In this work, a swarm intelligence technique is
combined with a biclustering approach to propose an algorithm called Binary
Particle Swarm Optimization (BPSO) based Biclustering for Web Usage Data. The
main objective of this algorithm is to retrieve the globally optimal bicluster
from the web usage data. These biclusters contain relationships between web
users and web pages which are useful for E-Commerce applications such as web
advertising and marketing. Experiments are conducted on a real dataset to
demonstrate the efficiency of the proposed algorithm.
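The abstract does not give the update rules or the fitness function, so the following is only a minimal sketch of a generic binary PSO applied to a biclustering encoding. The bit-string encoding (one bit per user, then one bit per page), the toy fitness `bicluster_score`, the inertia weight `w`, and all parameter values are assumptions made for illustration, not the authors' algorithm.

```python
import math
import random

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def bicluster_score(bits, data):
    """Toy fitness (an assumption): +1 per visited cell and -2 per
    unvisited cell inside the selected users x pages submatrix."""
    n_rows = len(data)
    rows = [i for i in range(n_rows) if bits[i]]
    cols = [j for j in range(len(data[0])) if bits[n_rows + j]]
    return sum(1 if data[i][j] else -2 for i in rows for j in cols)

def bpso(data, n_particles=20, n_iters=50, w=0.7, c1=1.5, c2=1.5,
         v_max=4.0, seed=0):
    rng = random.Random(seed)
    n_bits = len(data) + len(data[0])  # user bits followed by page bits
    xs = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(n_particles)]
    vs = [[0.0] * n_bits for _ in range(n_particles)]
    pbest = [x[:] for x in xs]
    pbest_fit = [bicluster_score(x, data) for x in xs]
    g = max(range(n_particles), key=lambda k: pbest_fit[k])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    history = [gbest_fit]
    for _ in range(n_iters):
        for k in range(n_particles):
            for d in range(n_bits):
                # Velocity is pulled toward the personal and global bests.
                v = (w * vs[k][d]
                     + c1 * rng.random() * (pbest[k][d] - xs[k][d])
                     + c2 * rng.random() * (gbest[d] - xs[k][d]))
                vs[k][d] = max(-v_max, min(v_max, v))
                # Binary PSO: the velocity sets the probability of a 1 bit.
                xs[k][d] = 1 if rng.random() < sigmoid(vs[k][d]) else 0
            fit = bicluster_score(xs[k], data)
            if fit > pbest_fit[k]:
                pbest[k], pbest_fit[k] = xs[k][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = xs[k][:], fit
        history.append(gbest_fit)
    return gbest, gbest_fit, history

# Hypothetical 6 users x 5 pages visit matrix with a dense block
# in the top-left corner (users 0-2, pages 0-2).
data = [
    [1, 1, 1, 0, 0],
    [1, 1, 1, 0, 0],
    [1, 1, 1, 0, 1],
    [0, 0, 0, 1, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 0, 1],
]
best_bits, best_fit, history = bpso(data)
print(best_bits, best_fit)
```

The sketch returns the best bit string seen during the run, which is not guaranteed to be the global optimum; a real fitness for web usage data would likely balance bicluster size against coherence.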
SUBIC: A Supervised Bi-Clustering Approach for Precision Medicine
Traditional medicine typically applies one-size-fits-all treatment for the
entire patient population whereas precision medicine develops tailored
treatment schemes for different patient subgroups. The fact that some factors
may be more significant for a specific patient subgroup motivates clinicians
and medical researchers to develop new approaches to subgroup detection and
analysis, which is an effective strategy to personalize treatment. In this
study, we propose a novel patient subgroup detection method, called Supervised
Biclustering (SUBIC), using convex optimization and apply our approach to detect
patient subgroups and prioritize risk factors for hypertension (HTN) in a
vulnerable demographic subgroup (African-American). Our approach not only finds
patient subgroups with the guidance of a clinically relevant target variable but
also identifies and prioritizes risk factors by pursuing sparsity of the input
variables and encouraging similarity among the input variables and between the
input variables and the target variable.
An Archived Multi Objective Simulated Annealing Method to Discover Biclusters in Microarray Data
With the advent of microarray technology, it has become possible to measure thousands of gene expression values in a single experiment. Analysis of large-scale genomics data, notably gene expression, initially focused on clustering methods. More recently, biclustering techniques were proposed for revealing submatrices that show unique patterns. Biclustering, the simultaneous clustering of both genes and conditions, is challenging, particularly for the analysis of high-dimensional gene expression data in information retrieval, knowledge discovery, and data mining. In biclustering of microarray data, several objectives have to be optimized simultaneously, and these objectives often conflict with each other. A multi-objective model is therefore well suited to this problem. We propose an algorithm based on multi-objective Simulated Annealing for discovering biclusters in gene expression data. Experimental results on a benchmark database show a significant improvement in the overlap among biclusters, the coverage of elements in the gene expression data, and the quality of the biclusters.
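The archive in an archived multi-objective method typically stores the non-dominated solutions found so far. A minimal sketch of that bookkeeping follows, assuming every objective is minimized; the objective pair (mean squared residue, negative bicluster volume) and the candidate values are hypothetical, and the annealing acceptance step is omitted.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives minimized)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def update_archive(archive, candidate):
    """Offer a candidate to the archive and return the new archive,
    which keeps only mutually non-dominated objective vectors."""
    # Reject the candidate if an archived vector dominates or equals it.
    if any(dominates(kept, candidate) or kept == candidate for kept in archive):
        return archive
    # Otherwise drop every archived vector the candidate dominates.
    return [kept for kept in archive if not dominates(candidate, kept)] + [candidate]

# Hypothetical objective vectors: (mean squared residue, -volume).
candidates = [(300.0, -120), (250.0, -100), (320.0, -90),
              (280.0, -125), (250.0, -100)]
archive = []
for cand in candidates:
    archive = update_archive(archive, cand)
print(archive)
```

Negating the volume turns "larger bicluster" into a minimization objective, so a single dominance test covers both criteria.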
Biclustering in data mining using a memetic multi-objective evolutionary algorithm
In this paper, a new memetic strategy that integrates a multi-objective evolutionary algorithm (SPEA2) with a local search technique for data mining is presented. The algorithm explores a Term Frequency-Inverse Document Frequency (TF-IDF) data matrix in order to find biclusters that fulfill several objectives. The case study was a dataset corresponding to the Reuters-21578 corpus. Our algorithm performed satisfactorily, finding biclusters that are large and have coherent values, which is a promising outcome. Nonetheless, more experiments with data from other corpora are necessary to reach more conclusive results.
Workshop de Agentes y Sistemas Inteligentes (WASI). Red de Universidades con Carreras en Informática (RedUNCI)
Fine-grained parallelization of fitness functions in bioinformatics optimization problems: gene selection for cancer classification and biclustering of gene expression data
BACKGROUND: Metaheuristics are widely used to solve large combinatorial optimization problems in bioinformatics because of the huge set of possible solutions. Two representative problems are gene selection for cancer classification and biclustering of gene expression data. In most cases, these metaheuristics, as well as other non-linear techniques, apply a fitness function to each possible solution in a size-limited population, and that step involves higher latencies than other parts of the algorithms, which is why the execution time of the applications depends mainly on the execution time of the fitness function. In addition, fitness functions are usually formulated in floating-point arithmetic. Thus, a careful parallelization of these functions using reconfigurable hardware technology will accelerate the computation, especially if they are applied in parallel to several solutions of the population. RESULTS: A fine-grained parallelization of two floating-point fitness functions of different complexities and features, involved in biclustering of gene expression data and gene selection for cancer classification, yielded higher speedups and lower-power computation than usual microprocessors. CONCLUSIONS: The results show better performance using reconfigurable hardware technology instead of usual microprocessors, in terms of computing time and power consumption, not only because of the parallelization of the arithmetic operations, but also thanks to the concurrent fitness evaluation for several individuals of the population in the metaheuristic. This is a good basis for building accelerated and low-energy solutions for intensive computing scenarios.
• Ministerio de Economía y Competitividad and FEDER funds. Contract TIN2012-30685 (I+D+i)
• Gobierno de Extremadura. Grant GR15011 for research groups, TIC015
• CONICYT/FONDECYT/REGULAR/1160455. Grant for Ricardo Soto Guzmán
• CONICYT/FONDECYT/REGULAR/1140897. Grant for Broderick Crawford
Biclustering of Gene Expression Data by Correlation-Based Scatter Search
BACKGROUND: The analysis of data generated by microarray technology is very useful for understanding how genetic information becomes functional gene products. Biclustering algorithms can determine a group of genes which are co-expressed under a set of experimental conditions. Recently, new biclustering methods based on metaheuristics have been proposed. Most of them use the Mean Squared Residue as the merit function, but patterns that are interesting and relevant from a biological point of view, such as shifting and scaling patterns, may not be detected using this measure. However, it is important to discover these types of patterns, since genes commonly show similar behavior even though their expression levels vary in different ranges or magnitudes. METHODS: Scatter Search is an evolutionary technique based on the evolution of a small set of solutions which are chosen according to quality and diversity criteria. This paper presents a Scatter Search algorithm for finding biclusters in gene expression data. In this algorithm, the proposed fitness function is based on the linear correlation among genes in order to detect shifting and scaling patterns, and an improvement method is included in order to select only positively correlated genes. RESULTS: The proposed algorithm has been tested on three real data sets, namely the Yeast Cell Cycle dataset, the human B-cell lymphoma dataset and the Yeast Stress dataset, finding a remarkable number of biclusters with shifting and scaling patterns. In addition, the performance of the proposed method and fitness function is compared with that of CC, OPSM, ISA, BiMax, xMotifs and Samba using the Gene Ontology database.
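The contrast between the Mean Squared Residue and a correlation-based measure can be illustrated with a small sketch. The mean pairwise Pearson correlation used here is an assumption chosen for illustration and is not necessarily the paper's exact fitness function; the toy expression values are hypothetical.

```python
from statistics import mean

def mean_squared_residue(rows):
    """Mean Squared Residue of a bicluster given as a list of gene rows."""
    n_rows, n_cols = len(rows), len(rows[0])
    row_means = [mean(r) for r in rows]
    col_means = [mean(r[j] for r in rows) for j in range(n_cols)]
    total_mean = mean(row_means)
    residues = [
        (rows[i][j] - row_means[i] - col_means[j] + total_mean) ** 2
        for i in range(n_rows) for j in range(n_cols)
    ]
    return sum(residues) / (n_rows * n_cols)

def pearson(x, y):
    """Pearson linear correlation between two equal-length profiles."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def mean_pairwise_correlation(rows):
    """Average Pearson correlation over all pairs of gene rows."""
    pairs = [(i, j) for i in range(len(rows)) for j in range(i + 1, len(rows))]
    return sum(pearson(rows[i], rows[j]) for i, j in pairs) / len(pairs)

base = [1.0, 4.0, 2.0, 8.0]  # hypothetical expression profile
shifted = [base, [v + 5.0 for v in base], [v - 3.0 for v in base]]
scaled = [base, [2.0 * v for v in base], [0.5 * v for v in base]]

for name, bicluster in (("shifted", shifted), ("scaled", scaled)):
    print(name,
          mean_squared_residue(bicluster),
          mean_pairwise_correlation(bicluster))
```

In this toy case the residue is essentially zero for the purely shifted genes but clearly positive for the scaled genes, while the correlation stays at essentially 1 for both, which is the behavior the abstract attributes to correlation-based measures.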