130 research outputs found

    Analysis with respect to instrumental variables for the exploration of microarray data structures

    Get PDF
    BACKGROUND: Evaluating the importance of the different sources of variations is essential in microarray data experiments. Complex experimental designs generally include various factors structuring the data which should be taken into account. The objective of these experiments is the exploration of some given factors while controlling other factors. RESULTS: We present here a family of methods, the analyses with respect to instrumental variables, which can be easily applied to the particular case of microarray data. An illustrative example of analysis with instrumental variables is given in the case of microarray data investigating the effect of beverage intake on peripheral blood gene expression. This approach is compared to an ANOVA-based gene-by-gene statistical method. CONCLUSION: Instrumental variables analyses provide a simple way to control several sources of variation in a multivariate analysis of microarray data. Due to their flexibility, these methods can be associated with a large range of ordination techniques combined with one or several qualitative and/or quantitative descriptive variables

    Public data and open source tools for multi-assay genomic investigation of disease

    Get PDF
    Molecular interrogation of a biological sample through DNA sequencing, RNA and microRNA profiling, proteomics and other assays, has the potential to provide a systems level approach to predicting treatment response and disease progression, and to developing precision therapies. Large publicly funded projects have generated extensive and freely available multi-assay data resources; however, bioinformatic and statistical methods for the analysis of such experiments are still nascent. We review multi-assay genomic data resources in the areas of clinical oncology, pharmacogenomics and other perturbation experiments, population genomics and regulatory genomics and other areas, and tools for data acquisition. Finally, we review bioinformatic tools that are explicitly geared toward integrative genomic data visualization and analysis. This review provides starting points for accessing publicly available data and tools to support development of needed integrative methods

    Investigating the global genomic diversity of Escherichia coli using a multi-genome DNA microarray platform with novel gene prediction strategies

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>The gene content of a diverse group of 183 unique <it>Escherichia coli </it>and <it>Shigella </it>isolates was determined using the Affymetrix GeneChip<sup>® </sup><it>E. coli </it>Genome 2.0 Array, originally designed for transcriptome analysis, as a genotyping tool. The probe set design utilized by this array provided the opportunity to determine the gene content of each strain very accurately and reliably. This array constitutes 10,112 independent genes representing four individual <it>E. coli </it>genomes, therefore providing the ability to survey genes of several different pathogen types. The entire ECOR collection, 80 EHEC-like isolates, and a diverse set of isolates from our FDA strain repository were included in our analysis.</p> <p>Results</p> <p>From this study we were able to define sets of genes that correspond to, and therefore define, the EHEC pathogen type. Furthermore, our sampling of 63 unique strains of O157:H7 showed the ability of this array to discriminate between closely related strains. We found that individual strains of O157:H7 differed, on average, by 197 probe sets. Finally, we describe an analysis method that utilizes the power of the probe sets to determine accurately the presence/absence of each gene represented on this array.</p> <p>Conclusions</p> <p>These elements provide insights into understanding the microbial diversity that exists within extant <it>E. coli </it>populations. Moreover, these data demonstrate that this novel microarray-based analysis is a powerful tool in the field of molecular epidemiology and the newly emerging field of microbial forensics.</p

    Supervised multivariate analysis of sequence groups to identify specificity determining residues

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Proteins that evolve from a common ancestor can change functionality over time, and it is important to be able identify residues that cause this change. In this paper we show how a supervised multivariate statistical method, Between Group Analysis (BGA), can be used to identify these residues from families of proteins with different substrate specifities using multiple sequence alignments.</p> <p>Results</p> <p>We demonstrate the usefulness of this method on three different test cases. Two of these test cases, the Lactate/Malate dehydrogenase family and Nucleotidyl Cyclases, consist of two functional groups. The other family, Serine Proteases consists of three groups. BGA was used to analyse and visualise these three families using two different encoding schemes for the amino acids.</p> <p>Conclusion</p> <p>This overall combination of methods in this paper is powerful and flexible while being computationally very fast and simple. BGA is especially useful because it can be used to analyse any number of functional classes. In the examples we used in this paper, we have only used 2 or 3 classes for demonstration purposes but any number can be used and visualised.</p

    Integrated Analysis of Multiple Microarray Datasets Identifies a Reproducible Survival Predictor in Ovarian Cancer

    Get PDF
    BACKGROUND: Public data integration may help overcome challenges in clinical implementation of microarray profiles. We integrated several ovarian cancer datasets to identify a reproducible predictor of survival. METHODOLOGY/PRINCIPAL FINDINGS: Four microarray datasets from different institutions comprising 265 advanced stage tumors were uniformly reprocessed into a single training dataset, also adjusting for inter-laboratory variation ("batch-effect"). Supervised principal component survival analysis was employed to identify prognostic models. Models were independently validated in a 61-patient cohort using a custom array genechip and a publicly available 229-array dataset. Molecular correspondence of high- and low-risk outcome groups between training and validation datasets was demonstrated using Subclass Mapping. Previously established molecular phenotypes in the 2(nd) validation set were correlated with high and low-risk outcome groups. Functional representational and pathway analysis was used to explore gene networks associated with high and low risk phenotypes. A 19-gene model showed optimal performance in the training set (median OS 31 and 78 months, p < 0.01), 1(st) validation set (median OS 32 months versus not-yet-reached, p = 0.026) and 2(nd) validation set (median OS 43 versus 61 months, p = 0.013) maintaining independent prognostic power in multivariate analysis. There was strong molecular correspondence of the respective high- and low-risk tumors between training and 1(st) validation set. Low and high-risk tumors were enriched for favorable and unfavorable molecular subtypes and pathways, previously defined in the public 2(nd) validation set. CONCLUSIONS/SIGNIFICANCE: Integration of previously generated cancer microarray datasets may lead to robust and widely applicable survival predictors. These predictors are not simply a compilation of prognostic genes but appear to track true molecular phenotypes of good- and poor-outcome

    Stability of gene contributions and identification of outliers in multivariate analysis of microarray data

    Get PDF
    BACKGROUND: Multivariate ordination methods are powerful tools for the exploration of complex data structures present in microarray data. These methods have several advantages compared to common gene-by-gene approaches. However, due to their exploratory nature, multivariate ordination methods do not allow direct statistical testing of the stability of genes. RESULTS: In this study, we developed a computationally efficient algorithm for: i) the assessment of the significance of gene contributions and ii) the identification of sample outliers in multivariate analysis of microarray data. The approach is based on the use of resampling methods including bootstrapping and jackknifing. A statistical package of R functions was developed. This package includes tools for both inferring the statistical significance of gene contributions and identifying outliers among samples. CONCLUSION: The methodology was successfully applied to three published data sets with varying levels of signal intensities. Its relevance was compared with alternative methods. Overall, it proved to be particularly effective for the evaluation of the stability of microarray data

    A Resource for Discovering Specific and Universal Biomarkers for Distributed Stem Cells

    Get PDF
    Specific and universal biomarkers for distributed stem cells (DSCs) have been elusive. A major barrier to discovery of such ideal DSC biomarkers is difficulty in obtaining DSCs in sufficient quantity and purity. To solve this problem, we used cell lines genetically engineered for conditional asymmetric self-renewal, the defining DSC property. In gene microarray analyses, we identified 85 genes whose expression is tightly asymmetric self-renewal associated (ASRA). The ASRA gene signature prescribed DSCs to undergo asymmetric self-renewal to a greater extent than committed progenitor cells, embryonic stem cells, or induced pluripotent stem cells. This delineation has several significant implications. These include: 1) providing experimental evidence that DSCs in vivo undergo asymmetric self-renewal as individual cells; 2) providing an explanation why earlier attempts to define a common gene expression signature for DSCs were unsuccessful; and 3) predicting that some ASRA proteins may be ideal biomarkers for DSCs. Indeed, two ASRA proteins, CXCR6 and BTG2, and two other related self-renewal pattern associated (SRPA) proteins identified in this gene resource, LGR5 and H2A.Z, display unique asymmetric patterns of expression that have a high potential for universal and specific DSC identification

    Combinations of newly confirmed Glioma-Associated loci link regions on chromosomes 1 and 9 to increased disease risk

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Glioblastoma multiforme (GBM) tends to occur between the ages of 45 and 70. This relatively early onset and its poor prognosis make the impact of GBM on public health far greater than would be suggested by its relatively low frequency. Tissue and blood samples have now been collected for a number of populations, and predisposing alleles have been sought by several different genome-wide association (GWA) studies. The Cancer Genome Atlas (TCGA) at NIH has also collected a considerable amount of data. Because of the low concordance between the results obtained using different populations, only 14 predisposing single nucleotide polymorphism (SNP) candidates in five genomic regions have been replicated in two or more studies. The purpose of this paper is to present an improved approach to biomarker identification.</p> <p>Methods</p> <p>Association analysis was performed with control of population stratifications using the EIGENSTRAT package, under the null hypothesis of "no association between GBM and control SNP genotypes," based on an additive inheritance model. Genes that are strongly correlated with identified SNPs were determined by linkage disequilibrium (LD) or expression quantitative trait locus (eQTL) analysis. A new approach that combines meta-analysis and pathway enrichment analysis identified additional genes.</p> <p>Results</p> <p>(i) A meta-analysis of SNP data from TCGA and the Adult Glioma Study identifies 12 predisposing SNP candidates, seven of which are reported for the first time. These SNPs fall in five genomic regions (5p15.33, 9p21.3, 1p21.2, 3q26.2 and 7p15.3), three of which have not been previously reported. (ii) 25 genes are strongly correlated with these 12 SNPs, eight of which are known to be cancer-associated. (iii) The relative risk for GBM is highest for risk allele combinations on chromosomes 1 and 9. (iv) A combined meta-analysis/pathway analysis identified an additional four genes. All of these have been identified as cancer-related, but have not been previously associated with glioma. (v) Some SNPs that do not occur reproducibly across populations are in reproducible (invariant) pathways, suggesting that they affect the same biological process, and that population discordance can be partially resolved by evaluating processes rather than genes.</p> <p>Conclusion</p> <p>We have uncovered 29 glioma-associated gene candidates; 12 of them known to be cancer related (<it>p </it>= 1. 4 × 10<sup>-6</sup>), providing additional statistical support for the relevance of the new candidates. This additional information on risk loci is potentially important for identifying Caucasian individuals at risk for glioma, and for assessing relative risk.</p

    Therapeutic Implications of GIPC1 Silencing in Cancer

    Get PDF
    GIPC1 is a cytoplasmic scaffold protein that interacts with numerous receptor signaling complexes, and emerging evidence suggests that it plays a role in tumorigenesis. GIPC1 is highly expressed in a number of human malignancies, including breast, ovarian, gastric, and pancreatic cancers. Suppression of GIPC1 in human pancreatic cancer cells inhibits in vivo tumor growth in immunodeficient mice. To better understand GIPC1 function, we suppressed its expression in human breast and colorectal cancer cell lines and human mammary epithelial cells (HMECs) and assayed both gene expression and cellular phenotype. Suppression of GIPC1 promotes apoptosis in MCF-7, MDA-MD231, SKBR-3, SW480, and SW620 cells and impairs anchorage-independent colony formation of HMECs. These observations indicate GIPC1 plays an essential role in oncogenic transformation, and its expression is necessary for the survival of human breast and colorectal cancer cells. Additionally, a GIPC1 knock-down gene signature was used to interrogate publically available breast and ovarian cancer microarray datasets. This GIPC1 signature statistically correlates with a number of breast and ovarian cancer phenotypes and clinical outcomes, including patient survival. Taken together, these data indicate that GIPC1 inhibition may represent a new target for therapeutic development for the treatment of human cancers

    SIGNATURE: A workbench for gene expression signature analysis

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>The biological phenotype of a cell, such as a characteristic visual image or behavior, reflects activities derived from the expression of collections of genes. As such, an ability to measure the expression of these genes provides an opportunity to develop more precise and varied sets of phenotypes. However, to use this approach requires computational methods that are difficult to implement and apply, and thus there is a critical need for intelligent software tools that can reduce the technical burden of the analysis. Tools for gene expression analyses are unusually difficult to implement in a user-friendly way because their application requires a combination of biological data curation, statistical computational methods, and database expertise.</p> <p>Results</p> <p>We have developed SIGNATURE, a web-based resource that simplifies gene expression signature analysis by providing software, data, and protocols to perform the analysis successfully. This resource uses Bayesian methods for processing gene expression data coupled with a curated database of gene expression signatures, all carried out within a GenePattern web interface for easy use and access.</p> <p>Conclusions</p> <p>SIGNATURE is available for public use at <url>http://genepattern.genome.duke.edu/signature/</url>.</p
    corecore