1,857 research outputs found

    Zipf's Law in Gene Expression

    Using data from gene expression databases on various organisms and tissues, including yeast, nematodes, human normal and cancer tissues, and embryonic stem cells, we found that the abundances of expressed genes exhibit a power-law distribution with an exponent close to -1, i.e., they obey Zipf's law. Furthermore, by simulations of a simple model with an intra-cellular reaction network, we found that Zipf's law of chemical abundance is a universal feature of cells where such a network optimizes the efficiency and faithfulness of self-reproduction. These findings provide novel insights into the nature of the organization of reaction dynamics in living cells. Comment: revtex, 11 pages, 3 figures, submitted to Phys. Rev. Lett.
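As a rough illustration of what an exponent close to -1 means in rank-abundance terms, the sketch below fits a straight line to log(abundance) against log(rank). It is not the authors' analysis: the function name `zipf_exponent` and the synthetic data are assumptions made for this example, and ordinary least squares on a log-log plot is only a crude estimator of a power-law exponent.

```python
import math


def zipf_exponent(abundances):
    """Estimate the rank-abundance exponent by least squares on a log-log scale.

    Hypothetical helper for illustration only; zero or negative abundances
    are dropped because their logarithm is undefined.
    """
    ranked = sorted((a for a in abundances if a > 0), reverse=True)
    if len(ranked) < 2:
        raise ValueError("need at least two positive abundances")

    # x = log(rank), y = log(abundance); rank 1 is the most abundant gene.
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(a) for a in ranked]

    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx  # slope of the fitted line


# Synthetic abundances that follow abundance = 1000 / rank exactly.
synthetic = [1000.0 / rank for rank in range(1, 201)]
slope = zipf_exponent(synthetic)
print(f"fitted exponent: {slope:.3f}")
```

On real expression data the fit would normally be restricted to the range of ranks where the log-log plot is approximately linear.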

    STATISTICAL METHODS FOR THE ANALYSIS OF CANCER GENOME SEQUENCING DATA

    The purpose of cancer genome sequencing studies is to determine the nature and types of alterations present in a typical cancer and to discover genes mutated at high frequencies. In this article we discuss statistical methods for the analysis of data generated in these studies. We place special emphasis on a two-stage study design introduced by Sjoblom et al.[1]. In this context, we describe statistical methods for constructing scores that can be used to prioritize candidate genes for further investigation and to assess the statistical significance of the candidates thus identified.

    Simcluster: clustering enumeration gene expression data on the simplex space

    Transcript enumeration methods such as SAGE, MPSS, and sequencing-by-synthesis EST "digital northern" are important high-throughput techniques for digital gene expression measurement. As with other counting or voting processes, these measurements constitute compositional data exhibiting properties particular to the simplex space, where the summation of the components is constrained. These properties are not present in regular Euclidean spaces, on which hybridization-based microarray data is often modeled. Therefore, pattern recognition methods commonly used for microarray data analysis may be non-informative for the data generated by transcript enumeration techniques, since they ignore certain fundamental properties of this space.

Here we present a software tool, Simcluster, designed to perform clustering analysis for data on the simplex space. We present Simcluster as a stand-alone command-line C package and as a user-friendly on-line tool. Both versions are available at: http://xerad.systemsbiology.net/simcluster.

Simcluster is designed in accordance with a well-established mathematical framework for compositional data analysis, which provides principled procedures for dealing with the simplex space, and is thus applicable in a number of contexts, including enumeration-based gene expression data.
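To make the simplex-space idea concrete, here is a minimal sketch of one standard tool from compositional data analysis: the centered log-ratio (clr) transform and the Aitchison distance derived from it. This is not Simcluster's implementation. The function names and the `pseudocount` parameter (a simple way to avoid taking the logarithm of zero counts) are assumptions made for this example.

```python
import math


def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform of a vector of tag counts.

    The pseudocount is an illustrative assumption for handling zero counts;
    with pseudocount=0 every count must be strictly positive.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


def aitchison_distance(a, b, pseudocount=0.5):
    """Euclidean distance between the clr transforms of two count vectors."""
    if len(a) != len(b):
        raise ValueError("count vectors must have the same length")
    ca, cb = clr(a, pseudocount), clr(b, pseudocount)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(ca, cb)))


# Two libraries with the same composition but different sequencing depth.
shallow = [10, 20, 70]
deep = [100, 200, 700]
different = [70, 20, 10]

print(aitchison_distance(shallow, deep, pseudocount=0.0))
print(aitchison_distance(shallow, different, pseudocount=0.0))
```

With `pseudocount=0.0` the distance depends only on the relative proportions, so libraries that differ only in total tag count are treated as identical, which a plain Euclidean distance on raw counts would not do.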

    Genomic run-on evaluates transcription rates for all yeast genes and identifies gene regulatory mechanisms

    Most studies of eukaryotic gene regulation have been done looking at mature mRNA levels. Nevertheless, the steady-state mRNA level is the result of two opposing factors: transcription rate (TR) and mRNA degradation. Both can be important control points for regulating gene expression. Here we present a new method that combines the use of nylon macroarrays and in vivo radioactive labeling of nascent RNA to quantify TRs, mRNA levels, and mRNA stabilities for all S. cerevisiae genes. We found that during the shift from glucose to galactose, most genes undergo drastic changes in TR and mRNA stability. However, changes in mRNA levels are less pronounced. Some genes, such as those encoding mitochondrial proteins, are coordinately regulated in mRNA stability, behaving as decay regulons. These results indicate that, although TR is the main determinant of mRNA abundance in yeast, modulation of mRNA stability is a key factor for gene regulation.

    Genome-scale bacterial transcriptional regulatory networks: reconstruction and integrated analysis with metabolic models

    Advances in sequencing technology are resulting in the rapid emergence of large numbers of complete genome sequences. High throughput annotation and metabolic modeling of these genomes is now a reality. The high throughput reconstruction and analysis of genome-scale transcriptional regulatory networks represents the next frontier in microbial bioinformatics. The fruition of this next frontier will depend upon the integration of numerous data sources relating to mechanisms, components, and behavior of the transcriptional regulatory machinery, as well as the integration of the regulatory machinery into genome-scale cellular models. Here we review existing repositories for different types of transcriptional regulatory data, including expression data, transcription factor data, and binding site locations, and we explore how these data are being used for the reconstruction of new regulatory networks. From template network based methods to de novo reverse engineering from expression data, we discuss how regulatory networks can be reconstructed and integrated with metabolic models to improve model predictions and performance. Finally, we explore the impact these integrated models can have in simulating phenotypes, optimizing the production of compounds of interest or paving the way to a whole-cell model. J.P.F. acknowledges funding from [SFRH/BD/70824/2010] of the FCT (Portuguese Foundation for Science and Technology) PhD program. The work was supported in part by the ERDF (European Regional Development Fund) through the COMPETE Programme (operational programme for competitiveness), National Funds through the FCT within projects [FCOMP-01-0124-FEDER015079] (ToMEGIM: Computational Tools for Metabolic Engineering using Genome-scale Integrated Models) and FCOMP-01-0124-FEDER009707 (HeliSysBio: molecular Systems Biology in Helicobacter pylori), the U.S. Department of Energy under contract [DE-AC02-06CH11357] and the National Science Foundation under [0850546].

    Somatic mutations in the chromatin remodeling gene ARID1A occur in several tumor types

    Mutations in the chromatin remodeling gene ARID1A have recently been identified in the majority of ovarian clear cell carcinomas (OCCCs). To determine the prevalence of mutations in other tumor types, we evaluated 759 malignant neoplasms including those of the pancreas, breast, colon, stomach, lung, prostate, brain, and blood (leukemias). We identified truncating mutations in 6% of the neoplasms studied; nontruncating somatic mutations were identified in an additional 0.4% of neoplasms. Mutations were most commonly found in gastrointestinal samples with 12 of 119 (10%) colorectal and 10 of 100 (10%) gastric neoplasms, respectively, harboring changes. More than half of the mutated colorectal and gastric cancers displayed microsatellite instability (MSI) and the mutations in these tumors were out‐of‐frame insertions or deletions at mononucleotide repeats. Mutations were also identified in 2–8% of tumors of the pancreas, breast, brain (medulloblastomas), prostate, and lung, and none of these tumors displayed MSI. These findings suggest that the aberrant chromatin remodeling consequent to ARID1A inactivation contributes to a variety of different types of neoplasms. Hum Mutat 33:100–103, 2012. © 2011 Wiley Periodicals, Inc. Peer Reviewed. http://deepblue.lib.umich.edu/bitstream/2027.42/89516/1/humu_21633_sm_Mat.pdf http://deepblue.lib.umich.edu/bitstream/2027.42/89516/2/21633_ftp.pd

    Analysis of the functional repertoire of a mutant form of survivin, K129E, which has been linked to lung cancer

    Background: Survivin is a protein that is normally present only in G2 and M-phases in somatic cells; however, in cancer cells, it is expressed throughout the cell cycle. A prosurvival factor, survivin is both an inhibitor of apoptosis and an essential mitotic protein, and thus it has attracted much attention as a target for new oncotherapies. Despite its prevalence in cancer, reports of survivin mutations have mostly been restricted to loci within its promoter, which increase the abundance of the protein. To date the only published mutation within the coding sequence is an adenine > guanine substitution in exon 4. This polymorphism, which was found in a cohort of Korean lung cancer patients, causes a lysine > glutamic acid mutation (K129E) in the protein. However, whether it plays a causative role in cancer has not been addressed. Methods: Using site directed mutagenesis we recapitulate K129E expression in cultured human cells and assess its anti-apoptotic and mitotic activities. Results: K129E retains its anti-apoptotic activity, but causes errors in mitosis and cytokinesis, which may be linked to its reduced affinity for borealin. Conclusion: K129E expression can induce genomic instability by introducing mitotic aberrations, and thus it may play a causative role in cancer.