9 research outputs found

    Multi-scale approaches for the statistical analysis of microarray data (with an application to 3D vesicle tracking)

    Recent developments in experimental methods for gene data analysis, known as microarrays, make it possible to interrogate changes in the expression of a vast number of genes in cell or tissue cultures, and thus to explore disease conditions in depth. As part of an ongoing program of research in the Guy A. Rutter (G.A.R.) laboratory, Department of Biochemistry, University of Bristol, UK, with support from the Wellcome Trust, we study the impact of established and potentially new methods on the statistical analysis of gene expression data.
    EThOS - Electronic Theses Online Service (GB, United Kingdom)

    A robust tool for discriminative analysis and feature selection in paired samples impacts the identification of the genes essential for reprogramming lung tissue to adenocarcinoma

    Background: Lung cancer is the leading cause of cancer deaths in the world. The most common type of lung cancer is lung adenocarcinoma (AC). The genetic mechanisms of the early stages and of the progression steps of lung AC are poorly understood. There is currently no clinically applicable gene test for early diagnosis or for assessing AC aggressiveness. Among the major reasons for the lack of reliable diagnostic biomarkers are the extraordinary heterogeneity of the cancer cells, complex and poorly understood interactions of the AC cells with adjacent tissue and the immune system, gene variation across patient cohorts, measurement variability, small sample sizes and sub-optimal analytical methods. We suggest that gene expression profiling of the primary tumours and adjacent tissues (PT-AT), handled with a rational statistical and bioinformatics strategy of biomarker prediction and validation, could provide significant progress in the identification of clinical biomarkers of AC. To minimise sample-to-sample variability, repeated multivariate measurements in the same object (organ or tissue, e.g. PT-AT in lung) across patients should be designed, but prediction and validation on the genome scale with a small sample size is a great methodological challenge.

    Results: To analyse PT-AT relationships efficiently in the statistical modelling, we propose an Extreme Class Discrimination (ECD) feature selection method that identifies a sub-set of the most discriminative variables (e.g. expressed genes). Our method consists of a paired Cross-normalization (CN) step followed by a modified sign Wilcoxon test with multivariate adjustment carried out for each variable. Using an Affymetrix U133A microarray paired dataset of 27 AC patients, we reviewed the global reprogramming of the transcriptome in human lung AC tissue versus normal lung tissue, which is associated with about 2,300 genes discriminating the tissues with 100% accuracy. Cluster analysis applied to these genes resulted in four distinct gene groups, which we classified as associated with (i) up-regulated mitotic cell cycle genes in lung AC, (ii) silenced/suppressed genes specific to normal lung tissue, (iii) cell communication and cell motility and (iv) immune system features. The genes related to mutagenesis, specific lung cancers, early stage of AC development, tumour aggressiveness and metabolic pathway alterations and adaptations of cancer cells are strongly enriched in the AC PT-AT discriminative gene set. Two AC diagnostic biomarkers, SPP1 and CENPA, were successfully validated on an RT-PCR tissue array. The ECD method was systematically compared with several alternative methods and showed better performance; it was also validated by comparing the predicted gene set with a literature meta-signature.

    Conclusions: We developed a method that identifies and selects highly discriminative variables from high-dimensional data spaces of potential biomarkers, based on a statistical analysis of paired samples when the number of samples is small. This method provides superior selection in comparison to conventional methods and can be widely used in different applications. Our method revealed at least 2,300 patho-biologically essential genes associated with the global transcriptional reprogramming of human lung epithelium cells and lung AC aggressiveness. This gene set includes many previously published AC biomarkers, reflecting inherent disease complexity, and specifies the mechanisms of carcinogenesis in lung AC. SPP1, CENPA and many other PT-AT discriminative genes could be considered as prospective diagnostic and prognostic biomarkers of lung AC.
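
    As a rough illustration of the selection idea in the abstract above (keeping variables whose paired tumour/adjacent differences separate the two tissues in every patient), the following minimal Python sketch applies a plain exact sign test to paired differences. It is not the authors' ECD method: the cross-normalization step and the modified sign Wilcoxon test with multivariate adjustment are not reproduced, and the function names, gene labels and expression values are hypothetical.

```python
from math import comb


def sign_test_p(diffs):
    """Two-sided exact sign test p-value for paired differences (zeros dropped)."""
    nonzero = [d for d in diffs if d != 0]
    n = len(nonzero)
    if n == 0:
        return 1.0
    pos = sum(1 for d in nonzero if d > 0)
    k = min(pos, n - pos)
    # P(X <= k) under Binomial(n, 0.5), doubled for two sides and capped at 1
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def perfectly_discriminating(tumour, adjacent):
    """Return {gene: p-value} for genes whose paired differences all share one sign."""
    selected = {}
    for gene, t_values in tumour.items():
        diffs = [t - a for t, a in zip(t_values, adjacent[gene])]
        if diffs and (all(d > 0 for d in diffs) or all(d < 0 for d in diffs)):
            selected[gene] = sign_test_p(diffs)
    return selected


# Hypothetical toy data: three made-up genes measured in four patients,
# once in the primary tumour and once in the adjacent tissue.
tumour = {
    "GENE_A": [8.1, 7.9, 9.0, 8.4],
    "GENE_B": [5.0, 6.2, 4.8, 5.9],
    "GENE_C": [3.1, 2.9, 3.0, 2.5],
}
adjacent = {
    "GENE_A": [6.0, 6.5, 7.2, 6.1],
    "GENE_B": [5.5, 5.8, 5.1, 5.2],
    "GENE_C": [4.0, 3.8, 4.4, 3.9],
}
selected = perfectly_discriminating(tumour, adjacent)
print(selected)
```

    In this toy example GENE_A (higher in every tumour) and GENE_C (lower in every tumour) are kept, while GENE_B is dropped because its differences change sign. With 27 pairs, as in the dataset described above, a variable whose differences all share one sign would have a two-sided sign-test p-value of 2 × 0.5^27, roughly 1.5e-8, before any multiple-testing adjustment.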

    Complex sense-antisense architecture of TNFAIP1/POLDIP2 on 17q11.2 represents a novel transcriptional structural-functional gene module involved in breast cancer progression

    Background: A sense-antisense gene pair (SAGP) is a gene pair where two oppositely transcribed genes share a common nucleotide sequence region. In eukaryotic genomes, SAGPs can be organized in complex sense-antisense architectures (CSAGAs) in which at least one sense gene shares loci with two or more antisense partners. As shown in several case studies, SAGPs may be involved in cancers, neurological diseases and complex syndromes. However, CSAGAs have not yet been characterized in the context of human disease or cancer.

    Results: We characterize five genes (TMEM97, IFT20, TNFAIP1, POLDIP2 and TMEM199) organized in a CSAGA on 17q11.2 (we term this the TNFAIP1/POLDIP2 CSAGA) and demonstrate their strong and reproducible co-regulatory transcription pattern in breast cancer tumours. Genes of the TNFAIP1/POLDIP2 CSAGA are located inside the smallest region of recurrent amplification on 17q11.2 and their expression profile correlates with the DNA copy number of the region. Survival analysis of a group of 410 breast cancer patients revealed significant survival-associated individual genes and gene pairs in the TNFAIP1/POLDIP2 CSAGA. Moreover, several of the gene pairs associated with survival demonstrated synergistic effects. Expression of the member genes of the TNFAIP1/POLDIP2 CSAGA also strongly correlated with expression of genes of the ERBB2 core region of recurrent amplification on 17q12. We clearly demonstrate that the observed co-regulatory transcription profile of the TNFAIP1/POLDIP2 CSAGA is maintained not only by a DNA amplification mechanism, but also by chromatin remodelling and local transcription activation.

    Conclusion: We have identified a novel TNFAIP1/POLDIP2 CSAGA and characterized its co-regulatory transcription profile in cancerous breast tissues. We suggest that the TNFAIP1/POLDIP2 CSAGA represents a clinically significant transcriptional structural-functional gene module associated with amplification of the genomic region on 17q11.2 and correlated with the expression of ERBB2 amplicon core genes in breast cancer. The co-expression pattern of this module, when over-expressed, correlates with histological grades and a poor prognosis in breast cancer. The TNFAIP1/POLDIP2 CSAGA maps the risks of breast cancer relapse onto the complex genomic locus on 17q11.2.

    Macrostate Identification from Biomolecular Simulations through Time Series Analysis

    This paper builds upon the need for a more descriptive and accurate understanding of the landscape of intermolecular interactions, particularly those involving macromolecules such as proteins. For this, we need methods that move away from the single-conformation description of binding events, toward a descriptive free energy landscape where different macrostates can coexist. Molecular dynamics simulations and molecular mechanics Poisson–Boltzmann surface area (MM-PBSA) methods provide an excellent approach for such a dynamic description of binding events. An alternative to the standard method for the statistical reporting of such results is proposed.

    Transposon insertional mutagenesis in mice identifies human breast cancer susceptibility genes and signatures for stratification

    Robust prognostic gene signatures and therapeutic targets are difficult to derive from expression profiling because of the significant heterogeneity within breast cancer (BC) subtypes. Here, we performed forward genetic screening in mice using Sleeping Beauty transposon mutagenesis to identify candidate BC driver genes in an unbiased manner, using a stabilized N-terminal truncated β-catenin gene as a sensitizer. We identified 134 mouse susceptibility genes from 129 common insertion sites within 34 mammary tumors. Of these, 126 genes were orthologous to protein-coding genes in the human genome (hereafter, human BC susceptibility genes, hBCSGs), 70% of which are previously reported cancer-associated genes, and ∼16% are known BC suppressor genes. Network analysis revealed a gene hub consisting of E1A binding protein P300 (EP300), CD44 molecule (CD44), neurofibromin (NF1) and phosphatase and tensin homolog (PTEN), which are linked to a significant number of mutated hBCSGs. From our survival prediction analysis of the expression of human BC genes in 2,333 BC cases, we isolated a six-gene-pair classifier that stratifies BC patients with high confidence into prognostically distinct low-, moderate-, and high-risk subgroups. Furthermore, we proposed prognostic classifiers identifying three basal and three claudin-low tumor subgroups. Intriguingly, our hBCSGs are mostly unrelated to cell cycle/mitosis genes and are distinct from the prognostic signatures currently used for stratifying BC patients. Our findings illustrate the strength and validity of integrating functional mutagenesis screens in mice with human cancer transcriptomic data to identify highly prognostic BC subtyping biomarkers.
    ASTAR (Agency for Sci., Tech. and Research, S’pore)