Biclustering of Gene Expression Data by Correlation-Based Scatter Search
BACKGROUND: The analysis of data generated by microarray technology is very useful for understanding how genetic information becomes functional gene products. Biclustering algorithms can determine a group of genes that are co-expressed under a set of experimental conditions. Recently, new biclustering methods based on metaheuristics have been proposed. Most of them use the Mean Squared Residue as the merit function, but patterns that are interesting and relevant from a biological point of view, such as shifting and scaling patterns, may not be detected using this measure. It is nevertheless important to discover this type of pattern, since genes commonly present similar behavior even though their expression levels vary in different ranges or magnitudes. METHODS: Scatter Search is an evolutionary technique based on the evolution of a small set of solutions, which are chosen according to quality and diversity criteria. This paper presents a Scatter Search that aims to find biclusters in gene expression data. In this algorithm, the proposed fitness function is based on the linear correlation among genes in order to detect shifting and scaling patterns, and an improvement method is included to select only positively correlated genes. RESULTS: The proposed algorithm has been tested on three real datasets (the Yeast Cell Cycle dataset, the human B-cell lymphoma dataset and the Yeast Stress dataset), finding a remarkable number of biclusters with shifting and scaling patterns. In addition, the performance of the proposed method and fitness function is compared with that of CC, OPSM, ISA, BiMax, xMotifs and Samba using the Gene Ontology database.
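The claim that the Mean Squared Residue (MSR) can miss scaling patterns can be checked numerically. The sketch below uses the standard Cheng and Church MSR and, as an assumed stand-in for the paper's correlation-based fitness (whose exact form is not given here), the mean Pearson correlation over all pairs of genes. The toy matrices `shifting` and `scaling` are invented for illustration.

```python
import numpy as np

def mean_squared_residue(b: np.ndarray) -> float:
    """Cheng and Church mean squared residue (rows = genes, columns = conditions)."""
    residue = b - b.mean(axis=1, keepdims=True) - b.mean(axis=0, keepdims=True) + b.mean()
    return float(np.mean(residue ** 2))

def mean_pairwise_correlation(b: np.ndarray) -> float:
    """Average Pearson correlation over all distinct pairs of genes (rows)."""
    r = np.corrcoef(b)  # gene-by-gene correlation matrix
    return float(r[np.triu_indices(b.shape[0], k=1)].mean())

base = np.array([1.0, 3.0, 2.0, 5.0])
shifting = np.vstack([base, base + 2.0, base + 10.0])  # genes differ by additive offsets
scaling = np.vstack([base, base * 2.0, base * 10.0])   # genes differ by multiplicative factors

for name, b in (("shifting", shifting), ("scaling", scaling)):
    print(name, round(mean_squared_residue(b), 3), round(mean_pairwise_correlation(b), 3))
```

For a pure shifting pattern the residue of every cell cancels to zero, whereas for a scaling pattern it equals (s_i - mean(s)) * (x_j - mean(x)) and stays positive; the mean correlation is 1 in both cases. This is why a correlation-based fitness can reward scaling patterns that MSR penalises.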
A mixture of physicochemical and evolutionary–based feature extraction approaches for protein fold recognition
Griffith Sciences, Griffith School of Engineering
High-Performance Modelling and Simulation for Big Data Applications
This open access book was prepared as a Final Publication of the COST Action IC1406 “High-Performance Modelling and Simulation for Big Data Applications (cHiPSet)” project. Long considered important pillars of the scientific method, Modelling and Simulation have evolved from traditional discrete numerical methods to complex data-intensive continuous analytical optimisations. Resolution, scale, and accuracy have become essential to predict and analyse natural and complex systems in science and engineering. As their level of abstraction rises to give a better understanding of the domain at hand, their representation becomes increasingly demanding of computational and data resources. On the other hand, High Performance Computing typically entails the effective use of parallel and distributed processing units coupled with efficient storage, communication and visualisation systems to underpin complex data-intensive applications in distinct scientific and technical domains. A seamless interaction of High Performance Computing with Modelling and Simulation is therefore arguably required in order to store, compute, analyse, and visualise large data sets in science and engineering. Funded by the European Commission, cHiPSet has provided a dynamic trans-European forum for its members and distinguished guests to openly discuss novel perspectives and topics of interest for these two communities. This cHiPSet compendium presents a set of selected case studies related to healthcare, biological data, computational advertising, multimedia, finance, bioinformatics, and telecommunications.
Biclustering of gene expression data based on scatter search
Gene expression data, because of their particular nature and importance, motivate not only the development of new techniques but also the formulation of new problems, such as the biclustering problem. Biclustering is an unsupervised learning technique that groups both genes and conditions. This simultaneous grouping distinguishes it from traditional clustering of this type of data, which groups either genes or conditions, but not both.
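The difference between grouping only genes and grouping genes and conditions together can be shown on a toy matrix. In this minimal NumPy sketch the matrix `expr` and the index lists are hypothetical; a bicluster is simply represented as a pair of index sets (genes, conditions).

```python
import numpy as np

# Hypothetical toy expression matrix: rows are genes, columns are conditions.
expr = np.arange(20, dtype=float).reshape(5, 4)

# A traditional gene cluster keeps every condition for a subset of genes.
gene_cluster = [0, 2, 3]
cluster_view = expr[gene_cluster, :]

# A bicluster selects a subset of genes AND a subset of conditions.
bicluster_genes = [0, 2, 3]
bicluster_conditions = [1, 3]
bicluster_view = expr[np.ix_(bicluster_genes, bicluster_conditions)]

print(cluster_view.shape)    # (3, 4)
print(bicluster_view.shape)  # (3, 2)
```

`np.ix_` builds the cross product of the two index lists, so the result is the submatrix formed by the selected genes under the selected conditions only.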
This thesis presents a new biclustering algorithm that makes it possible to study different search criteria. The algorithm uses a scatter search scheme, which decouples the search mechanism from the criterion being used.
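The decoupling of search mechanism and criterion can be sketched as a scatter search that receives the fitness function as a parameter. This is a minimal generic sketch, not the thesis's algorithm: the binary solution encoding (in a biclustering setting, one bit per gene and per condition), the greedy bit-flip improvement method, uniform crossover as the combination method, pairwise subset generation and the `b1`/`b2` reference set sizes are all assumptions. The two toy criteria at the end are hypothetical stand-ins for a biclustering fitness.

```python
import itertools
import numpy as np

def scatter_search(fitness, n_bits, pop_size=20, b1=5, b2=5, max_iter=20, seed=0):
    """Minimal scatter search over binary vectors; `fitness` is any callable to maximise.

    The search loop never looks at what the bits mean, so the criterion can be swapped
    without touching the search mechanism.
    """
    rng = np.random.default_rng(seed)

    def improve(x):
        # Improvement method (assumed): one greedy pass of single-bit flips.
        x = x.copy()
        best = fitness(x)
        for i in range(n_bits):
            x[i] = not x[i]
            f = fitness(x)
            if f > best:
                best = f
            else:
                x[i] = not x[i]  # undo a flip that did not help
        return x

    def build_refset(candidates):
        # Reference set: b1 solutions by quality, then b2 by diversity (Hamming distance).
        unique = list({c.tobytes(): c for c in candidates}.values())
        ranked = sorted(unique, key=fitness, reverse=True)
        refset, rest = ranked[:b1], ranked[b1:]
        for _ in range(min(b2, len(rest))):
            dists = [min(int(np.sum(c != r)) for r in refset) for c in rest]
            refset.append(rest.pop(int(np.argmax(dists))))
        return refset

    # Diversification generation: random starting solutions, each improved.
    refset = build_refset([improve(rng.random(n_bits) < 0.5) for _ in range(pop_size)])
    for _ in range(max_iter):
        children = []
        # Subset generation (all pairs) and combination (uniform crossover, assumed).
        for a, b in itertools.combinations(refset, 2):
            mask = rng.random(n_bits) < 0.5
            children.append(improve(np.where(mask, a, b)))
        new_refset = build_refset(refset + children)
        if {r.tobytes() for r in new_refset} == {r.tobytes() for r in refset}:
            break  # stop when no new solution enters the reference set
        refset = new_refset
    return max(refset, key=fitness)

# Two unrelated criteria run through the same, unchanged search mechanism.
target = np.array([1, 0, 1, 1, 0, 0, 1, 0, 1, 0], dtype=bool)
best_match = scatter_search(lambda x: float(np.sum(x == target)), n_bits=10)
best_three = scatter_search(lambda x: -abs(int(x.sum()) - 3), n_bits=10)
print(best_match.astype(int), best_three.astype(int))
```

Only the lambda passed as `fitness` changes between the two calls; a correlation-based or GO-aware bicluster score could be plugged in the same way.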
Three different search criteria have been studied, and they give rise to the three main contributions of the thesis. First, the linear correlation between genes is studied and integrated as part of the objective function used by the biclustering algorithm. Linear correlation makes it possible to find biclusters with shifting and scaling patterns, which improves on previous proposals. Second, motivated by the biological significance of activation-inhibition patterns between genes, the linear correlation is modified so that these patterns are taken into account. Finally, the information about genes available in public repositories, such as the Gene Ontology (GO), is taken into account and incorporated as part of the search criterion. An extra term is added that reflects, for each bicluster evaluated, the quality of that group of genes according to the information stored in GO. Two alternatives for this biological-information integration term are studied and compared with each other, and the results are shown to be better when biological information is used in the biclustering algorithm.
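The abstract does not state how the correlation is modified or how the GO term is combined with it, so the following is only a hedged sketch under two assumptions: activation-inhibition patterns are admitted by taking the absolute value of the pairwise Pearson correlation, and the GO term is added linearly with a weight. The names `fitness`, `weight` and `go_score` are hypothetical; `go_score` is a placeholder for whatever GO-based quality of the gene group is used.

```python
import numpy as np

def correlation_term(b: np.ndarray, allow_inhibition: bool) -> float:
    """Mean pairwise Pearson correlation between genes (rows) of bicluster `b`."""
    r = np.corrcoef(b)
    upper = r[np.triu_indices(b.shape[0], k=1)]
    # Assumption: with allow_inhibition, anti-correlated (inhibited) genes count as coherent.
    return float(np.mean(np.abs(upper) if allow_inhibition else upper))

def fitness(b: np.ndarray, go_score: float, weight: float = 0.5,
            allow_inhibition: bool = True) -> float:
    """Hypothetical combined criterion: expression coherence plus a weighted GO term."""
    return correlation_term(b, allow_inhibition) + weight * go_score

base = np.array([1.0, 3.0, 2.0, 5.0])
# Gene 2 is inhibited (mirrors gene 1 with opposite sign); gene 3 is a shifted copy of gene 1.
b = np.vstack([base, -2.0 * base + 1.0, base + 3.0])
print(correlation_term(b, allow_inhibition=False), correlation_term(b, allow_inhibition=True))
print(fitness(b, go_score=0.8))
```

With plain correlation the two negative pairs drag the mean down to about -1/3, whereas the absolute-value variant scores the same bicluster as fully coherent; the GO term then shifts the score up for gene groups with better functional annotation.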
The three contributions described, together with a series of intermediate steps, have led to results published both in journals and at national and international conferences.