41 research outputs found

    Clustering protein sequences with a novel metric transformed from sequence similarity scores and sequence alignments with neural networks

    BACKGROUND: The sequencing of the human genome has given us access to a comprehensive list of genes (both experimental and predicted) for further analysis. While a majority of the approximately 30000 known and predicted human coding genes are characterized and have been assigned at least one function, there remain a fair number of genes (about 12000) for which no annotation has been made. The recent sequencing of other genomes has provided a large amount of auxiliary sequence data that could help in the characterization of the human genes. Clustering these sequences into families is one of the first steps in performing comparative studies across several genomes. RESULTS: Here we report a novel clustering algorithm (CLUGEN) that has been used to cluster sequences of experimentally verified and predicted proteins from all sequenced genomes using a novel distance metric, which is a neural network score between a pair of protein sequences. This distance metric is based on the pairwise sequence similarity score and the similarity between their domain structures. The distance metric is the probability that a pair of protein sequences are of the same InterPro family/domain, which facilitates the modelling of transitive homology closure to detect remote homologues. The hierarchical average clustering method is applied with the new distance metric. CONCLUSION: Benchmarking studies of our algorithm against those reported in the literature show that our algorithm provides clustering results with lower false positive and false negative rates. The clustering algorithm is applied to cluster several eukaryotic genomes and several dozen prokaryotic genomes.
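The hierarchical average clustering step described in this abstract can be illustrated with a minimal sketch. It is not the CLUGEN implementation: the function name `average_linkage`, the pairwise probabilities, the cutoff, and the choice of distance as one minus the same-family probability are all assumptions made for illustration.

```python
from itertools import combinations


def average_linkage(names, prob_same_family, cutoff):
    """Agglomerative average-linkage clustering (illustrative sketch).

    prob_same_family maps frozenset({a, b}) to a hypothetical probability
    that a and b belong to the same family. The distance is ASSUMED to be
    1 - probability. Merging stops once the closest pair of clusters is
    farther apart than `cutoff`.
    """
    def dist(a, b):
        return 1.0 - prob_same_family[frozenset((a, b))]

    def cluster_dist(c1, c2):
        # Average linkage: mean distance over all cross-cluster pairs.
        total = sum(dist(a, b) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    clusters = [(n,) for n in names]
    while len(clusters) > 1:
        (i, j), best = min(
            (((i, j), cluster_dist(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best > cutoff:
            break
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return sorted(tuple(sorted(c)) for c in clusters)


# Made-up probabilities for four hypothetical sequences.
probs = {
    frozenset(("A", "B")): 0.90,
    frozenset(("A", "C")): 0.60,
    frozenset(("B", "C")): 0.70,
    frozenset(("A", "D")): 0.10,
    frozenset(("B", "D")): 0.10,
    frozenset(("C", "D")): 0.05,
}
clusters = average_linkage(["A", "B", "C", "D"], probs, cutoff=0.5)
print(clusters)
```

In this toy input, C joins the A/B cluster because its average distance to both members is below the cutoff, while D stays on its own.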

    Dynamic resolution of functionally related gene sets in response to acute heat stress

    BACKGROUND: Using a gene clustering strategy, we determined intracellular pathway relationships within skeletal myotubes in response to an acute heat stress stimulus. Following heat shock, the transcriptome was analyzed by microarray over time to characterize the dynamic relationship of signaling pathways. RESULTS: Bioinformatics analyses exposed coordination of functionally related gene sets, depicting mechanism-based responses to heat shock. Protein turnover-related pathways were significantly affected, including protein folding, pre-mRNA processing, mRNA splicing, proteolysis and proteasome-related pathways. Many responses were transient, tending to normalize within 24 hours. CONCLUSION: In summary, we show that the transcriptional response to acute cell stress is largely transient and proteasome-centric.

    Uncovering mechanisms of transcriptional regulations by systematic mining of cis regulatory elements with gene expression profiles

    BACKGROUND: In contrast to the traditional biology approach, where the expression patterns of a handful of genes are studied at a time, microarray experiments enable biologists to study the expression patterns of many genes simultaneously and to decipher the underlying biological mechanism from the observed gene expression changes. While the statistical significance of the gene expression data can be deduced by various methods, the biological interpretation of the data presents a challenge. RESULTS: A method, called CisTransMine, is proposed to help infer the underlying biological mechanisms for the observed gene expression changes in microarray experiments. Specifically, this method predicts potential cis-regulatory elements in promoter regions which could regulate gene expression changes. This approach builds on the MotifADE method published in 2004 and extends it with two modifications: up-regulated genes and down-regulated genes are tested separately, and tests have been implemented to identify combinations of transcription factors that work synergistically. The method has been applied to a genome-wide expression dataset intended to study myogenesis in a mouse C2C12 cell differentiation model. The results shown here both confirm prior biological knowledge and facilitate the discovery of new biological insights. CONCLUSION: The results validate that the CisTransMine approach is a robust method for uncovering hidden mechanisms of transcriptional regulation.

    Identification of Novel Genes and Pathways Regulating SREBP Transcriptional Activity

    BACKGROUND: Lipid metabolism in mammals is orchestrated by a family of transcription factors called sterol regulatory element-binding proteins (SREBPs) that control the expression of genes required for the uptake and synthesis of cholesterol, fatty acids, and triglycerides. SREBPs are thus essential for insulin-induced lipogenesis and for cellular membrane homeostasis and biogenesis. Although multiple players have been identified that control the expression and activation of SREBPs, gaps remain in our understanding of how SREBPs are coordinated with other physiological pathways. METHODOLOGY: To identify novel regulators of SREBPs, we performed a genome-wide cDNA over-expression screen to identify proteins that might modulate the transcription of a luciferase gene driven from an SREBP-specific promoter. The results were verified through secondary biological assays, and expression data were analyzed by a novel application of the Gene Set Enrichment Analysis (GSEA) method. CONCLUSIONS/SIGNIFICANCE: We screened 10,000 different cDNAs and identified a number of genes and pathways that have not previously been implicated in SREBP control and cellular cholesterol homeostasis. These findings further our understanding of lipid biology and should lead to new insights into lipid-associated disorders.

    DeconRNASeq: A Statistical Framework for Deconvolution of Heterogeneous Tissue Samples Based on mRNA-Seq data

    Heterogeneous tissues (e.g. blood, tumor) are frequently collected from humans or model animals. mRNA-Seq samples are therefore often heterogeneous with regard to those cell types, which makes it difficult to distinguish whether gene expression variation reflects a shift in cell populations, a chang

    Type: Package. Title: Deconvolution of Heterogeneous Tissue Samples for mRNA-Seq data. Version: 1.3.0. Date: 2013-01-22

    Description: DeconSeq is an R package for deconvolution of heterogeneous tissues based on mRNA-Seq data. It modeled expression levels from heterogeneous cell populations in mRNA-Seq as the weighted average of expression from different constituting cell types and predicted cell type proportions of single expression profiles. License: GPL-2. R topics documented: DeconRNASeq-package, all.datasets, array.proportions, array.signatures, condplot, datasets, decon.bootstrap, DeconRNASeq, fraction, liver_kidney, multiplot
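The weighted-average model in this description can be sketched in a few lines. The sketch below is not the package's solver, which the description does not specify: it assumes a least-squares fit with non-negative proportions that sum to one, solved here by projected gradient descent. The function names, the signature values, and the 0.3/0.7 mixture are all made up for illustration.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {p : p >= 0, sum(p) == 1}."""
    u = sorted(v, reverse=True)
    cumulative = 0.0
    theta = 0.0
    for k, u_k in enumerate(u, start=1):
        cumulative += u_k
        t = (cumulative - 1.0) / k
        if u_k - t > 0:
            theta = t
    return [max(x - theta, 0.0) for x in v]


def estimate_proportions(signatures, mixture, iterations=500):
    """Fit mixture[g] ~ sum_k p[k] * signatures[k][g] (illustrative sketch).

    signatures: one expression vector per cell type (hypothetical values).
    mixture: observed expression of the heterogeneous sample.
    Uses projected gradient descent; the step size comes from the squared
    Frobenius norm, an upper bound on the largest eigenvalue of S^T S.
    """
    n_types = len(signatures)
    n_genes = len(mixture)
    step = 1.0 / (2.0 * sum(x * x for sig in signatures for x in sig))
    p = [1.0 / n_types] * n_types
    for _ in range(iterations):
        residual = [
            sum(p[k] * signatures[k][g] for k in range(n_types)) - mixture[g]
            for g in range(n_genes)
        ]
        grad = [
            2.0 * sum(signatures[k][g] * residual[g] for g in range(n_genes))
            for k in range(n_types)
        ]
        p = project_to_simplex([p[k] - step * grad[k] for k in range(n_types)])
    return p


# Made-up signatures for two cell types over four genes.
liver = [10.0, 0.0, 5.0, 2.0]
kidney = [0.0, 8.0, 5.0, 4.0]
# Synthetic sample built as 0.3 * liver + 0.7 * kidney.
sample = [0.3 * l + 0.7 * k for l, k in zip(liver, kidney)]
proportions = estimate_proportions([liver, kidney], sample)
print([round(x, 4) for x in proportions])
```

Because the sample is constructed without noise, the estimate should recover the proportions used to build it; real data would need a noise-tolerant evaluation.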