
White Rose Research Online

Regulatory motif discovery using a population clustering evolutionary algorithm

Author: Lones Michael A.
Tyrrell Andy M.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/07/2007

This paper describes a novel evolutionary algorithm for regulatory motif discovery in DNA promoter sequences. The algorithm uses data clustering to logically distribute the evolving population across the search space. Mating then takes place within local regions of the population, promoting overall solution diversity and encouraging discovery of multiple solutions. Experiments using synthetic data sets have demonstrated the algorithm's capacity to find position frequency matrix models of known regulatory motifs in relatively long promoter sequences. These experiments have also shown the algorithm's ability to maintain diversity during search and discover multiple motifs within a single population. The utility of the algorithm for discovering motifs in real biological data is demonstrated by its ability to find meaningful motifs within muscle-specific regulatory sequences.
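To make the term "position frequency matrix" concrete, the following is a minimal sketch of how such a model can be built from aligned binding sites and used to scan a promoter sequence. It is not the authors' evolutionary algorithm: the function names (`build_pfm`, `log_score`, `best_match`), the pseudocount of 0.5, the uniform background of 0.25 per base, and the example sites are all assumptions made for illustration.

```python
import math

BASES = "ACGT"


def build_pfm(sites):
    """Count how often each base occurs at each position of aligned sites."""
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    pfm = [{base: 0 for base in BASES} for _ in range(length)]
    for site in sites:
        for pos, base in enumerate(site):
            pfm[pos][base] += 1
    return pfm


def log_score(pfm, window, pseudocount=0.5):
    """Log-odds score of a window against a uniform background (assumed 0.25)."""
    n_sites = sum(pfm[0].values())
    total = 0.0
    for column, base in zip(pfm, window):
        # The pseudocount keeps unseen bases from producing log(0).
        prob = (column[base] + pseudocount) / (n_sites + 4 * pseudocount)
        total += math.log(prob / 0.25)
    return total


def best_match(pfm, sequence):
    """Return (start, score) of the highest-scoring window in the sequence."""
    width = len(pfm)
    scored = [
        (log_score(pfm, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    ]
    score, start = max(scored)
    return start, score


# Hypothetical aligned sites, chosen only to exercise the functions.
sites = ["TATAAT", "TATAAT", "TACAAT", "TATGAT"]
pfm = build_pfm(sites)
start, score = best_match(pfm, "GGCGTATAATGCC")
print(start, round(score, 3))
```

In a motif discovery setting, a search procedure such as an evolutionary algorithm would propose candidate matrices and evaluate them with a scoring function of this general kind; the scoring used in the paper itself is not stated in the abstract.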

Extracting expression modules from perturbational gene expression compendia

Author: A Joshi
A Prelić
A Tanay
AL Barabási
AW Rives
C Stark
CE Horak
CT Harbison
D Pe'er
DJ Reiss
Dk Lee
E Ragni
E Ravasz
E Segal
G Getz
G Lesage
GD Bader
GK Smyth
H Kitano
I Laloux
J Ihmels
J Supper
JA Ubersax
JDJ Han
L Lazzeroni
LA Amaral
LF Wu
LH Hartwell
M Ashburner
M Gaisne
M Halkidi
M Schmid
Martin Kuiper
MB Eisen
MG Walker
MZ Bao
N Bolshakova
N Metropolis
P D'haeseleer
Patrick Van Dijck
Q Sheng
R Albert
R Shamir
R Tanaka
S Barkow
S Bergmann
S Erdman
S Hohmann
S Kirkpatrick
S Maere
SC Madeira
SK Kim
Steven Maere
T Ideker
T Michoel
TR Hughes
W Zhang
X Cui
Y Benjamini
Y Cheng
Y Kluger
Z Bar-Joseph
Publication venue: BioMed Central
Publication date: 01/01/2008

Background: Compendia of gene expression profiles under chemical and genetic perturbations constitute an invaluable resource from a systems biology perspective. However, the perturbational nature of such data imposes specific challenges on the computational methods used to analyze them. In particular, traditional clustering algorithms have difficulties in handling one of the prominent features of perturbational compendia, namely partial coexpression relationships between genes. Biclustering methods on the other hand are specifically designed to capture such partial coexpression patterns, but they show a variety of other drawbacks. For instance, some biclustering methods are less suited to identify overlapping biclusters, while others generate highly redundant biclusters. Also, none of the existing biclustering tools takes advantage of the staple of perturbational expression data analysis: the identification of differentially expressed genes.
Results: We introduce a novel method, called ENIGMA, that addresses some of these issues. ENIGMA leverages differential expression analysis results to extract expression modules from perturbational gene expression data. The core parameters of the ENIGMA clustering procedure are automatically optimized to reduce the redundancy between modules. In contrast to the biclusters produced by most other methods, ENIGMA modules may show internal substructure, i.e. subsets of genes with distinct but significantly related expression patterns. The grouping of these (often functionally) related patterns in one module greatly aids in the biological interpretation of the data. We show that ENIGMA outperforms other methods on artificial datasets, using a quality criterion that, unlike other criteria, can be used for algorithms that generate overlapping clusters and that can be modified to take redundancy between clusters into account. Finally, we apply ENIGMA to the Rosetta compendium of expression profiles for Saccharomyces cerevisiae and we analyze one pheromone response-related module in more detail, demonstrating the potential of ENIGMA to generate detailed predictions.
Conclusion: It is increasingly recognized that perturbational expression compendia are essential to identify the gene networks underlying cellular function, and efforts to build these for different organisms are currently underway. We show that ENIGMA constitutes a valuable addition to the repertoire of methods to analyze such data.

Springer - Publisher Connector

Ghent University Academic Bibliography

The Jackson Laboratory: The Mouseion at the JAXlibrary

Nearest Neighbor Networks: clustering expression data based on gene neighborhoods

Author: Coller Hilary A
Flamholz Avi I
Hibbs Matthew A
Huttenhower Curtis
Landis Jessica N
Myers Chad L
Olszewski Kellen L
Sahi Sauhard
Siemers Nathan O
Troyanskaya Olga G
Publication venue: BioMed Central
Publication date: 01/01/2007

Background: The availability of microarrays measuring thousands of genes simultaneously across hundreds of biological conditions represents an opportunity to understand both individual biological pathways and the integrated workings of the cell. However, translating this amount of data into biological insight remains a daunting task. An important initial step in the analysis of microarray data is clustering of genes with similar behavior. A number of classical techniques are commonly used to perform this task, particularly hierarchical and K-means clustering, and many novel approaches have been suggested recently. While these approaches are useful, they are not without drawbacks; these methods can find clusters in purely random data, and even clusters enriched for biological functions can be skewed towards a small number of processes (e.g. ribosomes).
Results: We developed Nearest Neighbor Networks (NNN), a graph-based algorithm to generate clusters of genes with similar expression profiles. This method produces clusters based on overlapping cliques within an interaction network generated from mutual nearest neighborhoods. This focus on nearest neighbors rather than on absolute distance measures allows us to capture clusters with high connectivity even when they are spatially separated, and requiring mutual nearest neighbors allows genes with no sufficiently similar partners to remain unclustered. We compared the clusters generated by NNN with those generated by eight other clustering methods. NNN was particularly successful at generating functionally coherent clusters with high precision, and these clusters generally represented a much broader selection of biological processes than those recovered by other methods.
Conclusion: The Nearest Neighbor Networks algorithm is a valuable clustering method that effectively groups genes that are likely to be functionally related. It is particularly attractive due to its simplicity, its success in the analysis of large datasets, and its ability to span a wide range of biological functions with high precision.
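The idea of a network built from mutual nearest neighborhoods can be sketched in a few lines. The example below is an illustration only, not the NNN implementation: it assumes Pearson correlation as the similarity measure, uses a hypothetical neighborhood size `k`, invents the toy profiles, and stops at the mutual-neighbor graph without the overlapping-clique step described in the abstract.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if one is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


def nearest_neighbors(profiles, k):
    """Map each gene to the set of its k most correlated other genes."""
    neighbors = {}
    for gene, profile in profiles.items():
        ranked = sorted(
            (other for other in profiles if other != gene),
            key=lambda other: pearson(profile, profiles[other]),
            reverse=True,
        )
        neighbors[gene] = set(ranked[:k])
    return neighbors


def mutual_neighbor_edges(profiles, k):
    """Keep an edge only when each gene is among the other's k nearest neighbors."""
    neighbors = nearest_neighbors(profiles, k)
    return {
        frozenset((a, b))
        for a in neighbors
        for b in neighbors[a]
        if a in neighbors[b]
    }


# Toy expression profiles (invented values, four conditions each).
profiles = {
    "g1": [1, 2, 3, 4],
    "g2": [2, 4, 6, 8.5],
    "g3": [4, 3, 2, 1],
    "g4": [8, 6, 4.5, 2],
    "g5": [1, 3, 1, 3],
}
edges = mutual_neighbor_edges(profiles, k=1)
print(sorted(sorted(edge) for edge in edges))
```

With `k=1`, `g5` picks `g2` as its nearest neighbor, but `g2` prefers `g1`, so `g5` ends up with no edges. This mirrors the point made in the abstract that requiring mutual neighbors lets genes without sufficiently similar partners remain unclustered.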

Springer - Publisher Connector

Construction, visualisation, and clustering of transcription networks from microarray expression data.

Author: Brosch Markus
Enright Anton J
Freeman Tom C
Freilich Shiri
Goldovsky Leon
Grocock Russell J
Mazière Pierre
Thornton Janet
van Dongen Stijn
Publication venue: PLoS Comput Biol
Publication date: 01/01/2007

Network analysis transcends conventional pairwise approaches to data analysis as the context of components in a network graph can be taken into account. Such approaches are increasingly being applied to genomics data, where functional linkages are used to connect genes or proteins. However, while microarray gene expression datasets are now abundant and of high quality, few approaches have been developed for analysis of such data in a network context. We present a novel approach for 3-D visualisation and analysis of transcriptional networks generated from microarray data. These networks consist of nodes representing transcripts connected by virtue of their expression profile similarity across multiple conditions. Analysing genome-wide gene transcription across 61 mouse tissues, we describe the unusual topography of the large and highly structured networks produced, and demonstrate how they can be used to visualise, cluster, and mine large datasets. This approach is fast, intuitive, and versatile, and allows the identification of biological relationships that may be missed by conventional analysis techniques. This work has been implemented in a freely available open-source application named BioLayout Express(3D).
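A network in which transcripts are connected by expression profile similarity can be approximated with a simple correlation threshold. The sketch below is an assumption-laden illustration and not the BioLayout Express(3D) code: the Pearson measure, the 0.9 cut-off, the function names, and the toy profiles are all hypothetical, and connected components stand in for whatever clustering the application actually applies.

```python
from collections import deque
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if one is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


def build_network(profiles, threshold=0.9):
    """Connect two transcripts when their correlation reaches the threshold."""
    graph = {name: set() for name in profiles}
    for a, b in combinations(profiles, 2):
        if pearson(profiles[a], profiles[b]) >= threshold:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def connected_components(graph):
    """Group nodes reachable from one another, using breadth-first search."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), set()
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        components.append(component)
    return components


# Toy profiles across four conditions (invented values).
profiles = {
    "t1": [1, 2, 3, 4],
    "t2": [2, 4, 6, 8.5],
    "t3": [4, 3, 2, 1],
    "t4": [8, 6, 4.5, 2],
    "t5": [1, 3, 1, 3],
}
network = build_network(profiles, threshold=0.9)
print(sorted(sorted(c) for c in connected_components(network)))
```

Raising or lowering the threshold trades network density against specificity; the value used here is arbitrary and would need tuning on real data.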

CiteSeerX

Edinburgh Research Explorer

Apollo (Cambridge)

From Machine Learning to Learning Machines - A Perspective toward Personalized Medicine

Author: Malay Bhattacharyya
Publication date: 03/04/2012

We describe how to learn a network using a bottom-up approach, building networks from expression profiles. These networks can then be analyzed with different graph mining approaches and by studying their topological behavior. Finally, we discuss how personalized medicine can be achieved through network biology.

Helmholtz Zentrum für Infektionsforschung Repository

Nature Precedings

Microevolution of Group A Streptococci In Vivo: Capturing Regulatory Networks Engaged in Sociomicrobiology, Niche Adaptation, and Hypervirulence

Author: Bruce J. Aronow
Gursharan S. Chhatwal
Malak Kotb
Mark J. Walker
Michael Kubal
Niyaz Ahmed
Ramy K. Aziz
Rita Kansal
Sarah L. Rowe
William L. Taylor
Publication venue: Public Library of Science
Publication date: 01/01/2010

The onset of infection and the switch from primary to secondary niches are dramatic environmental changes that not only alter bacterial transcriptional programs, but also perturb their sociomicrobiology, often driving minor subpopulations with mutant phenotypes to prevail in specific niches. Having previously reported that M1T1 Streptococcus pyogenes become hypervirulent in mice due to selection of mutants in the covRS regulatory genes, we set out to dissect the impact of these mutations in vitro and in vivo from the impact of other adaptive events. Using a murine subcutaneous chamber model to sample the bacteria prior to selection or expansion of mutants, we compared gene expression dynamics of wild type (WT) and previously isolated animal-passaged (AP) covS mutant bacteria both in vitro and in vivo, and we found extensive transcriptional alterations of pathoadaptive and metabolic gene sets associated with invasion, immune evasion, tissue-dissemination, and metabolic reprogramming. In contrast to the virulence-associated differences between WT and AP bacteria, Phenotype Microarray analysis showed minor in vitro phenotypic differences between the two isogenic variants. Additionally, our results reflect that WT bacteria's rapid host-adaptive transcriptional reprogramming was not sufficient for their survival, and they were outnumbered by hypervirulent covS mutants with SpeB−/Sdahigh phenotype, which survived up to 14 days in mice chambers. Our findings demonstrate the engagement of unique regulatory modules in niche adaptation, implicate a critical role for bacterial genetic heterogeneity that surpasses transcriptional in vivo adaptation, and portray the dynamics underlying the selection of hypervirulent covS mutants over their parental WT cells.

Public Library of Science (PLOS)