Seeking unique and common biological themes in multiple gene lists or datasets: pathway pattern extraction pipeline for pathway-level comparative analysis
<p>Abstract</p> <p>Background</p> <p>One of the challenges in the analysis of microarray data is to integrate and compare the selected (e.g., differential) gene lists from multiple experiments for common or unique underlying biological themes. A common way to approach this problem is to extract common genes from these gene lists and then subject these genes to enrichment analysis to reveal the underlying biology. However, the capacity of this approach is largely restricted by the limited number of common genes shared by datasets from multiple experiments, which could be caused by the complexity of the biological system itself.</p> <p>Results</p> <p>We now introduce a new Pathway Pattern Extraction Pipeline (PPEP), which extends the existing WPS application by providing a new pathway-level comparative analysis scheme. To facilitate comparing and correlating results from different studies and sources, PPEP contains new interfaces that allow evaluation of the pathway-level enrichment patterns across multiple gene lists. As an exploratory tool, this analysis pipeline may help reveal the underlying biological themes at both the pathway and gene levels. The analysis scheme provided by PPEP begins with multiple gene lists, which may be derived from different studies in terms of the biological contexts, applied technologies, or methodologies. These lists are then subjected to pathway-level comparative analysis for extraction of pathway-level patterns. This analysis pipeline helps to explore the commonality or uniqueness of these lists at the level of pathways or biological processes from different but relevant biological systems using a combination of statistical enrichment measurements, pathway-level pattern extraction, and graphical display of the relationships of genes and their associated pathways as Gene-Term Association Networks (GTANs) within the WPS platform. 
As a proof of concept, we have used the new method to analyze many datasets from our collaborators as well as some public microarray datasets.</p> <p>Conclusion</p> <p>This tool provides a new pathway-level analysis scheme for integrative and comparative analysis of data derived from different but relevant systems. The tool is freely available as a Pathway Pattern Extraction Pipeline implemented in our existing software package WPS, which can be obtained at <url>http://www.abcc.ncifcrf.gov/wps/wps_index.php</url></p>
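The pathway-level comparison described in the abstract above can be illustrated with a minimal sketch. It is not the PPEP or WPS implementation: it assumes a one-sided hypergeometric test as the enrichment measure, and the pathway names, gene identifiers, background size, and function names are all hypothetical.

```python
from math import comb

def hypergeom_pvalue(k, n, K, N):
    """P(X >= k) when n genes are drawn from a background of N genes,
    K of which belong to the pathway (one-sided hypergeometric test)."""
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)

def enrichment_pattern(gene_lists, pathways, background_size):
    """Return {pathway: {list_name: p-value}} for every pathway/list pair."""
    pattern = {}
    for pathway_name, members in pathways.items():
        row = {}
        for list_name, genes in gene_lists.items():
            overlap = len(members & genes)
            row[list_name] = hypergeom_pvalue(
                overlap, len(genes), len(members), background_size
            )
        pattern[pathway_name] = row
    return pattern

# Hypothetical toy data: the two lists share no genes,
# yet both overlap the same pathway.
pathways = {
    "hypoxia": {"g1", "g2", "g3", "g4"},
    "wnt": {"g5", "g6", "g7", "g8"},
}
gene_lists = {
    "study_a": {"g1", "g2", "g9"},
    "study_b": {"g3", "g4", "g10"},
}
pattern = enrichment_pattern(gene_lists, pathways, background_size=20)
for pathway_name, row in pattern.items():
    print(pathway_name, {name: round(p, 4) for name, p in row.items()})
```

In this toy case a gene-level intersection of the two lists is empty, while the pathway-level pattern still shows both lists overlapping the same pathway, which is the situation the abstract describes.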
Transformation of metabolism with age and lifestyle in Antarctic seals: a case study of systems biology approach to cross-species microarray experiment
*_Background:_* The metabolic transformation that changes Weddell seal pups born on land into aquatic animals is not only interesting for the study of general biology, but it also provides a model for the acquired and congenital muscle disorders which are associated with oxygen metabolism in skeletal muscle. However, the analysis of gene expression in seals is hampered by the lack of specific microarrays and the very limited annotation of known Weddell seal (_Leptonychotes weddellii_) genes.

*_Results:_* Muscle samples from newborn, juvenile, and adult Weddell seals were collected during an Antarctic expedition. Extracted RNA was hybridized on Affymetrix Human Expression chips. Preliminary studies showed a detectable signal from at least 7000 probe sets present in all samples and replicates. Relative expression levels for these genes were used for further analysis of the biological pathways implicated in the metabolic transformation that occurs in the transition from newborn, to juvenile, to adult seals. Cytoskeletal remodeling, WNT signaling, FAK signaling, hypoxia-induced HIF1 activation, and insulin regulation were identified as being among the most important biological pathways involved in the transformation.

*_Conclusion:_* In spite of certain losses in specificity and sensitivity, the cross-species application of gene expression microarrays is capable of solving challenging puzzles in biology. A Systems Biology approach based on gene interaction patterns can compensate adequately for the lack of species-specific genomics information.

A systems biology design and implementation of novel bioinformatics software tools for high throughput gene expression analysis
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University. Microarray technology has revolutionized the field of molecular biology by offering an efficient and cost-effective platform for the simultaneous quantification of thousands of genes or even entire genomes in a single experiment. Unlike Southern blotting, which is restricted to the measurement of one gene at a time, microarrays offer biologists the opportunity to carry out genome-wide experiments in order to help them gain a systems-level understanding of cell regulation and control. The application of bioinformatics in the milieu of gene expression analysis has attracted a great deal of attention in the recent past due to specific algorithms and software solutions that attempt to illustrate complex multidimensional microarray data in a biologically coherent fashion so that it can be understood by the biologist. This has given rise to some exciting prospects for deciphering microarray data, by helping us refine our understanding of the underlying physiological dynamics of disease.
Although much progress is being made in the development of specialized bioinformatics software pipelines with the purpose of decoding large volumes of gene expression data in the context of systems biology, several gaps remain. Perhaps the most notable of these is the increasing demand for software solutions that specialize in automating the comparison of multiple gene expression profiles derived from microarray experiments sharing a common biological theme. This is no doubt an important challenge, since common genes that have similar expression patterns across different biological conditions are likely to be involved in the same biological process and hence may share the same regulatory signatures. The potential benefits of this in refining our understanding of the physiology of disease are undeniable.
The research presented in this thesis provides a systematic walkthrough of a series of software pipelines developed for the purpose of streamlining gene expression analysis in a systems biology context. Firstly, we present BiSAn, a software tool that deciphers expression data from the perspective of transcriptional regulation. Following this, we present Genome Interaction Analyzer (GIA), which analyzes microarray data in the integrative framework of transcription factor binding sites, protein-protein interactions and molecular pathways. The final contribution is a software pipeline called MicroPath, which analyzes multiple sets of gene expression profiles and attempts to extract common regulatory signatures that may be implicating the biological question
Discovering study-specific gene regulatory networks
This article has been made available through the Brunel Open Access Publishing Fund. Microarrays are commonly used in biology because of their ability to simultaneously measure thousands of genes under different conditions. Because microarray datasets typically contain a large number of variables but far fewer samples, scalable network analysis techniques are often employed. In particular, consensus approaches have recently been used that combine multiple microarray studies in order to find networks that are more robust. The purpose of this paper, however, is to combine multiple microarray studies to automatically identify subnetworks that are distinctive to specific experimental conditions rather than common to them all. To better understand key regulatory mechanisms and how they change under different conditions, we derive unique networks from multiple independent networks built using glasso, which goes beyond standard correlations. This involves calculating cluster prediction accuracies to detect the most predictive genes for a specific set of conditions. We differentiate between accuracies calculated using cross-validation within a selected cluster of studies (the intra prediction accuracy) and those calculated on a set of independent studies belonging to different study clusters (the inter prediction accuracy). Finally, we compare our method's results to related state-of-the-art techniques. We explore how the proposed pipeline performs on both synthetic data and real data (wheat and Fusarium). Our results show that subnetworks specific to subsets of studies can be identified reliably and that these networks reflect key mechanisms that are fundamental to the experimental conditions in each of those subsets
Error, reproducibility and sensitivity : a pipeline for data processing of Agilent oligonucleotide expression arrays
Background
Expression microarrays are increasingly used to obtain large scale transcriptomic information on a wide range of biological samples. Nevertheless, there is still much debate on the best ways to process data, to design experiments and analyse the output. Furthermore, many of the more sophisticated mathematical approaches to data analysis in the literature remain inaccessible to much of the biological research community. In this study we examine ways of extracting and analysing a large data set obtained using the Agilent long oligonucleotide transcriptomics platform, applied to a set of human macrophage and dendritic cell samples.
Results
We describe and validate a series of data extraction, transformation and normalisation steps which are implemented via a new R function. Analysis of replicate normalised reference data demonstrates that intra-array variability is small (only around 2% of the mean log signal), while inter-array variability from replicate array measurements has a standard deviation (SD) of around 0.5 log2 units (6% of the mean). The common practice of working with ratios of Cy5/Cy3 signal offers little further improvement in terms of reducing error. Comparison to expression data obtained using Arabidopsis samples demonstrates that the large number of genes in each sample showing a low level of transcription reflects the real complexity of the cellular transcriptome. Multidimensional scaling is used to show that the processed data identify an underlying structure which reflects some of the key biological variables which define the data set. This structure is robust, allowing reliable comparison of samples collected over a number of years and by a variety of operators.
Conclusions
This study outlines a robust and easily implemented pipeline for extracting, transforming, normalising and visualising transcriptomic array data from the Agilent expression platform. The analysis is used to obtain quantitative estimates of the SD arising from experimental (non-biological) intra- and inter-array variability, and of a lower threshold for determining whether an individual gene is expressed. The study provides a reliable basis for further, more extensive studies of the systems biology of eukaryotic cells
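The variability estimates mentioned in this abstract can be illustrated with a small sketch. It is not the authors' R function: it assumes made-up log2 signals for a single probe across replicate arrays, and the function name is hypothetical. It reports the sample standard deviation both in log2 units and as a percentage of the mean log2 signal, the two forms used in the abstract.

```python
from statistics import mean, stdev

def replicate_variability(log2_signals):
    """Return (SD in log2 units, SD as a percentage of the mean log2 signal)."""
    sd = stdev(log2_signals)  # sample standard deviation (n - 1 denominator)
    return sd, 100.0 * sd / mean(log2_signals)

# Hypothetical log2 signals for one probe measured on four replicate arrays.
replicates = [8.0, 8.5, 9.0, 8.5]
sd, pct = replicate_variability(replicates)
print(f"SD = {sd:.2f} log2 units ({pct:.1f}% of mean log2 signal)")
```

Applying the same function to repeated spots within one array rather than across arrays would give the intra-array counterpart.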
Model-based clustering with data correction for removing artifacts in gene expression data
The NIH Library of Integrated Network-based Cellular Signatures (LINCS)
contains gene expression data from over a million experiments, using Luminex
Bead technology. Only 500 colors are used to measure the expression levels of
the 1,000 landmark genes, and the data for the resulting pairs of
genes are deconvolved. The raw data are sometimes inadequate for reliable
deconvolution, leading to artifacts in the final processed data. These include
the expression levels of paired genes being flipped or given the same value,
and clusters of values that are not at the true expression level. We propose a
new method called model-based clustering with data correction (MCDC) that is
able to identify and correct these three kinds of artifacts simultaneously. We
show that MCDC improves the resulting gene expression data in terms of
agreement with external baselines, as well as improving results from subsequent
analysis. Comment: 28 page
How to understand the cell by breaking it: network analysis of gene perturbation screens
Modern high-throughput gene perturbation screens are key technologies at the
forefront of genetic research. Combined with rich phenotypic descriptors they
enable researchers to observe detailed cellular reactions to experimental
perturbations on a genome-wide scale. This review surveys the current
state-of-the-art in analyzing perturbation screens from a network point of
view. We describe approaches to make the step from the parts list to the wiring
diagram by using phenotypes for network inference and integrating them with
complementary data sources. The first part of the review describes methods to
analyze one- or low-dimensional phenotypes like viability or reporter activity;
the second part concentrates on high-dimensional phenotypes showing global
changes in cell morphology, transcriptome or proteome. Comment: Review based on ISMB 2009 tutorial; after two rounds of revisio