
    High Sensitivity TSS Prediction: Estimates of Locations Where TSS Cannot Occur

    Although transcription in mammalian genomes can initiate from various genomic positions (e.g., 3′UTR, coding exons), most locations on genomes are not prone to transcription initiation. It is of practical and theoretical interest to be able to estimate such collections of non-TSS locations (NTLs). Identifying large portions of NTLs can help focus the search for TSS locations and thus contribute to promoter and gene finding. It can also help in assessing the 5′ completeness of expressed sequences, and contribute to more successful experimental designs and more accurate gene annotation. Using comprehensive collections of Cap Analysis of Gene Expression (CAGE) and other transcript data from mouse and human genomes, we developed a methodology that, by performing computational TSS prediction with very high sensitivity, allows us to annotate, with high accuracy and in a strand-specific manner, locations of mammalian genomes that are highly unlikely to harbor transcription start sites (TSSs). The properties of the immediate genomic neighborhood of 98,682 accurately determined mouse and 113,814 human TSSs are used to determine features that distinguish genomic transcription initiation locations from those that are not likely to initiate transcription. Our algorithm uses various constraining properties of features identified in the upstream and downstream regions around TSSs, as well as statistical analyses of these surrounding regions.
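
    The core idea (predict TSSs with very high sensitivity, then treat strand-specific regions left without any prediction as non-TSS locations) can be illustrated with a short Python sketch. This is not the authors' algorithm: the function name `non_tss_windows`, the fixed window size and the input format (0-based predicted TSS coordinates per strand) are assumptions made for illustration, and the predictions themselves would have to come from a separate high-sensitivity model.

```python
from typing import Dict, List, Tuple


def non_tss_windows(
    genome_length: int,
    predicted_tss: Dict[str, List[int]],
    window: int = 50,
) -> Dict[str, List[Tuple[int, int]]]:
    """Per strand, return half-open windows [start, end) that contain no
    high-sensitivity TSS prediction, i.e. candidate non-TSS locations.
    Hypothetical helper; window size and input format are assumptions."""
    result: Dict[str, List[Tuple[int, int]]] = {}
    for strand, positions in predicted_tss.items():
        # Indices of windows holding at least one predicted TSS on this strand.
        occupied = {pos // window for pos in positions if 0 <= pos < genome_length}
        result[strand] = [
            (start, min(start + window, genome_length))
            for start in range(0, genome_length, window)
            if start // window not in occupied
        ]
    return result


# Toy example: a 200 bp sequence with predictions on both strands.
print(non_tss_windows(200, {"+": [10, 120], "-": [199]}))
# {'+': [(50, 100), (150, 200)], '-': [(0, 50), (50, 100), (100, 150)]}
```

    The higher the sensitivity of the underlying predictor, the fewer true TSSs fall into windows reported as free, which is why the approach trades many false-positive TSS calls for reliable non-TSS annotation.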

    RBF-TSS: Identification of Transcription Start Site in Human Using Radial Basis Functions Network and Oligonucleotide Positional Frequencies

    Accurate identification of promoter regions and transcription start sites (TSSs) in genomic DNA allows for a more complete understanding of gene structure and gene regulation within a given genome. Many recently published methods have achieved high accuracy in TSS identification; however, models that capture promoters and TSSs more accurately are still needed. We propose a novel method for identifying transcription start sites that improves on the TSS recognition accuracy of recently published methods. The method incorporates a metric feature based on oligonucleotide positional frequencies, taking into account the nature of promoters, and employs a radial basis function neural network for identifying transcription start sites (RBF-TSS) as the classification algorithm. Using non-overlapping chunks (windows) of size 50 and 500 on the human genome, the proposed method achieves an area under the Receiver Operating Characteristic curve (auROC) of 94.75% and 95.08%, respectively, improving on the performance of existing TSS prediction methods.
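
    The two ingredients named above, positional oligonucleotide frequencies and a radial basis function network, can be sketched in Python as follows. This is a hypothetical simplification, not the RBF-TSS implementation: the abstract does not give the exact metric feature, the value of k, or how centres, widths and weights are trained, so all function names and parameters here are assumptions.

```python
import math
from collections import Counter
from typing import List, Sequence


def positional_kmer_freqs(seqs: Sequence[str], k: int = 2) -> List[Counter]:
    """For equal-length, TSS-aligned training sequences, return one table per
    position mapping each k-mer starting there to its relative frequency."""
    n_positions = len(seqs[0]) - k + 1
    tables = []
    for pos in range(n_positions):
        counts = Counter(s[pos:pos + k] for s in seqs)
        total = sum(counts.values())
        tables.append(Counter({kmer: c / total for kmer, c in counts.items()}))
    return tables


def positional_features(seq: str, tables: List[Counter], k: int = 2) -> List[float]:
    """Feature vector: training frequency of the k-mer observed at each
    position of `seq` (0.0 if it never occurred there)."""
    return [float(tables[pos][seq[pos:pos + k]]) for pos in range(len(tables))]


def rbf_output(x, centers, widths, weights, bias=0.0) -> float:
    """Gaussian RBF network output:
    bias + sum_j weights[j] * exp(-||x - centers[j]||^2 / (2 * widths[j]^2))."""
    out = bias
    for c, s, w in zip(centers, widths, weights):
        dist2 = sum((xi - ci) ** 2 for xi, ci in zip(x, c))
        out += w * math.exp(-dist2 / (2 * s ** 2))
    return out


# Toy usage with made-up sequences and parameters.
tables = positional_kmer_freqs(["ACGT", "ACGA", "TCGT"])
x = positional_features("ACGT", tables)
print(rbf_output(x, centers=[x], widths=[1.0], weights=[2.0], bias=0.5))  # 2.5
```

    Each hidden unit responds most strongly when a window's positional profile is close to its centre, so centres learned from true TSS windows act as prototypes of promoter-like sequence.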

    Genomes of coral dinoflagellate symbionts highlight evolutionary adaptations conducive to a symbiotic lifestyle.

    Despite half a century of research, the biology of dinoflagellates remains enigmatic: they defy many functional and genetic traits attributed to typical eukaryotic cells. Genomic approaches to studying dinoflagellates are often stymied by their large, multi-gigabase genomes. Members of the genus Symbiodinium are photosynthetic endosymbionts of stony corals that provide the foundation of coral reef ecosystems. Their smaller genome sizes provide an opportunity to interrogate the evolution and functionality of dinoflagellate genomes and endosymbiosis. We sequenced the genome of the ancestral Symbiodinium microadriaticum and compared it to the genomes of the more derived Symbiodinium minutum and Symbiodinium kawagutii and to eukaryotic model systems, as well as to transcriptomes from other dinoflagellates. Comparative analyses of genome and transcriptome protein sets show that all dinoflagellates, not only Symbiodinium, possess significantly more transmembrane transporters involved in the exchange of amino acids, lipids, and glycerol than other eukaryotes. Importantly, we find that only Symbiodinium species harbor an extensive transporter repertoire associated with the provisioning of carbon and nitrogen. Analyses of these transporters show species-specific expansions, which provide a genomic basis for explaining differential compatibilities with an array of hosts and environments, and highlight the putative importance of gene duplication as an evolutionary mechanism in dinoflagellates and Symbiodinium.

    Computational analyses of eukaryotic promoters

    Computational analysis of eukaryotic promoters is one of the most difficult problems in computational genomics and is essential for understanding gene expression profiles and reverse-engineering gene regulatory network circuits. Here I give a basic introduction to the problem and a recent update on both experimental and computational approaches. More details can be found in the extended references. This review is based on a summer lecture given at the Max Planck Institute in Berlin in 2005.

    E2F5 status significantly improves malignancy diagnosis of epithelial ovarian cancer

    Background: Ovarian epithelial cancer (OEC) usually presents in the later stages of the disease. The factors affecting the genesis and progression of ovarian cancer, especially those associated with cell-cycle genes, are largely unknown. We hypothesized that over-expressed transcription factors (TFs), as well as those driving the expression of OEC over-expressed genes, could be key to OEC genesis and potentially useful tissue and serum markers for malignancy associated with OEC. Methods: Using a combination of computational approaches (selection of candidate TF markers and malignancy prediction) and experimental approaches (tissue microarray and western blotting on patient samples), we identified and evaluated the E2F5 transcription factor, which is involved in cell proliferation, as a promising candidate regulatory target in early-stage disease. Our hypothesis was supported by tissue array experiments that showed E2F5 expression only in OEC samples and not in normal and benign tissues, and by significantly positively biased expression in serum samples in western blotting studies. Results: Analysis of clinical cases shows that E2F5 status characterizes a different patient population from the one covered by CA125, a conventional OEC biomarker. When E2F5 is combined with CA125 to distinguish malignant from benign cysts, the presence of either CA125 or E2F5 increases the sensitivity of OEC detection to 97.9% (from 87.5% with CA125 alone) and, more importantly, the presence of both CA125 and E2F5 increases the specificity of OEC detection to 72.5% (from 55% with CA125 alone). This significantly improved accuracy suggests the possibility of improved OEC diagnostics. Furthermore, detection of malignancy status in 86 cases (38 benign, 48 early- and late-stage OEC) shows that using E2F5 status in combination with other clinical characteristics improves the detection of malignant cases, with sensitivity, specificity, F-measure and accuracy of 97.92%, 97.37%, 97.92% and 97.67%, respectively. Conclusions: Overall, in addition to opening a realistic possibility of improved OEC diagnosis, our findings provide indirect evidence that the cell-cycle regulatory protein E2F5 might play a significant role in OEC pathogenesis.
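
    The "either marker" and "both markers" rules and the four reported metrics can be made concrete with a small Python sketch. The confusion counts used below (47 true positives, 1 false negative, 37 true negatives, 1 false positive) are an inference, not figures from the abstract: they are simply the counts for 48 malignant and 38 benign cases that reproduce the reported percentages. The function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def confusion(pred: Sequence[bool], truth: Sequence[bool]) -> Tuple[int, int, int, int]:
    """Count (TP, FP, TN, FN), with True meaning 'malignant'."""
    tp = sum(p and t for p, t in zip(pred, truth))
    fp = sum(p and not t for p, t in zip(pred, truth))
    tn = sum(not p and not t for p, t in zip(pred, truth))
    fn = sum(not p and t for p, t in zip(pred, truth))
    return tp, fp, tn, fn


def combine(ca125: Sequence[bool], e2f5: Sequence[bool], rule: str) -> List[bool]:
    """'or': positive if either marker is positive (favours sensitivity).
    'and': positive only if both markers are positive (favours specificity)."""
    if rule == "or":
        return [a or b for a, b in zip(ca125, e2f5)]
    if rule == "and":
        return [a and b for a, b in zip(ca125, e2f5)]
    raise ValueError(f"unknown rule: {rule!r}")


def metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Sensitivity, specificity, F-measure and accuracy from confusion counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "f_measure": f_measure, "accuracy": accuracy}


# Inferred counts for 48 malignant and 38 benign cases (not reported directly).
m = metrics(tp=47, fp=1, tn=37, fn=1)
print({k: round(v * 100, 2) for k, v in m.items()})
# {'sensitivity': 97.92, 'specificity': 97.37, 'f_measure': 97.92, 'accuracy': 97.67}
```

    The OR rule can only add positive calls, so it can raise sensitivity but never specificity; the AND rule does the opposite, which matches the direction of the CA125/E2F5 results described above.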

    Semantic prioritization of novel causative genomic variants

    Discriminating the causative disease variant(s) in individuals with inherited or de novo mutations is one of the main challenges faced by the clinical genetics community today. Computational approaches to variant prioritization include machine learning methods that use a large number of features, such as molecular information, interaction networks, or phenotypes. Here, we demonstrate the PhenomeNET Variant Predictor (PVP) system, which exploits semantic technologies and automated reasoning over genotype-phenotype relations to filter and prioritize variants in whole exome and whole genome sequencing datasets. We demonstrate the performance of PVP in identifying causative variants on a large number of synthetic whole exome and whole genome sequences covering a wide range of diseases and syndromes. In a retrospective study, we further illustrate the application of PVP to the interpretation of whole exome sequencing data from patients with congenital hypothyroidism. We find that PVP accurately identifies causative variants in whole exome and whole genome sequencing datasets and provides a powerful resource for the discovery of causal variants. NS was funded by the Wellcome Trust (Grant 100585/Z/12/Z) and the National Institute for Health Research Cambridge Biomedical Research Centre. IB, RBMR, MK, YH, VBB and RH were funded by the King Abdullah University of Science and Technology. GVG acknowledges funding from the National Science Foundation (NSF grant number: IOS-1340112) and the European Commission H2020 (Grant Agreement No. 731075).
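
    To give a feel for phenotype-driven prioritization with ontology reasoning, here is a deliberately simple Python sketch: phenotype terms are expanded to all their ancestors in an is-a hierarchy (a basic form of inference), and candidate variants are ranked by the Jaccard similarity between the patient's expanded phenotypes and those annotated to the variant's gene. This is not PVP's scoring method, which the abstract does not describe; the toy ontology, gene names and function names are all hypothetical, and a real system would also weigh variant-level evidence, which is omitted here.

```python
from typing import Dict, List, Set, Tuple


def ancestors(term: str, parents: Dict[str, Set[str]]) -> Set[str]:
    """The term itself plus every ancestor reachable via is-a edges."""
    seen, stack = {term}, [term]
    while stack:
        for parent in parents.get(stack.pop(), set()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def closure(terms: Set[str], parents: Dict[str, Set[str]]) -> Set[str]:
    """Union of the ancestor sets of all terms."""
    result: Set[str] = set()
    for term in terms:
        result |= ancestors(term, parents)
    return result


def rank_variants(
    variants: List[Tuple[str, str]],          # (variant_id, gene)
    patient_phenotypes: Set[str],
    gene_phenotypes: Dict[str, Set[str]],
    parents: Dict[str, Set[str]],
) -> List[Tuple[str, float]]:
    """Rank variants by Jaccard similarity between the inferred phenotype
    sets of the patient and of the variant's gene (highest first)."""
    patient = closure(patient_phenotypes, parents)
    scored = []
    for variant_id, gene in variants:
        gene_set = closure(gene_phenotypes.get(gene, set()), parents)
        union = patient | gene_set
        score = len(patient & gene_set) / len(union) if union else 0.0
        scored.append((variant_id, score))
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy ontology and annotations (all hypothetical).
parents = {
    "hypothyroidism": {"thyroid_abnormality"},
    "goitre": {"thyroid_abnormality"},
    "thyroid_abnormality": {"phenotype"},
    "seizure": {"nervous_system_abnormality"},
    "nervous_system_abnormality": {"phenotype"},
}
genes = {"GENE_A": {"goitre"}, "GENE_B": {"seizure"}}
print(rank_variants([("var1", "GENE_B"), ("var2", "GENE_A")],
                    {"hypothyroidism"}, genes, parents))
# [('var2', 0.5), ('var1', 0.2)]
```

    Expanding to ancestors is what lets "hypothyroidism" and "goitre" match through their shared parent even though the two terms never co-occur directly.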

    ContDist: a tool for the analysis of quantitative gene and promoter properties

    Background: How promoter regions regulate gene expression is complex and far from fully understood. Histone regulation of DNA compactness, DNA methylation, transcription factor binding sites and CpG islands are known to play a role in the transcriptional regulation of a gene. Many high-throughput techniques now permit the detection of epigenetic marks and regulatory elements in the promoter regions of thousands of genes. However, the subsequent analysis of such experiments (e.g. the resulting gene lists) has so far been hampered by the lack of any tool for detailed analysis of promoter regions. Results: We present ContDist, a tool to statistically analyze quantitative gene and promoter properties. The software includes approximately 200 quantitative features of gene and promoter regions for 7 commonly studied species. In contrast to traditional ontological analysis, which works only on qualitative data, all the features in the underlying annotation database are quantitative gene and promoter properties. Exploiting the tool's strong focus on the promoter region, we show its usefulness in two case studies: the first on differentially methylated promoters and the second on the fundamental differences between housekeeping and tissue-specific genes. The two case studies both confirm recent findings and reveal previously unreported biological relations. Conclusion: ContDist is a new tool with two important properties: 1) it has a strong focus on the promoter region, which is usually disregarded by virtually all ontology tools, and 2) it uses quantitative (continuously distributed) features of genes and their promoter regions that are not available in any other tool. ContDist is available from http://web.bioinformatics.cicbiogune.es/CD/ContDistribution.php
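
    One common way to ask whether a gene list differs from the background in a continuously distributed promoter property is to compare the two empirical distributions. The Python sketch below computes the two-sample Kolmogorov-Smirnov statistic for that purpose. Which statistical test ContDist actually applies is not stated in the abstract, so this is an assumed, illustrative choice, and the CpG values are made up.

```python
from bisect import bisect_right
from typing import Sequence


def ks_statistic(sample: Sequence[float], background: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic D = max_x |F_sample(x) - F_background(x)|.
    Both empirical CDFs are step functions that only change at observed values,
    so checking every observed value is enough to find the maximum."""
    a, b = sorted(sample), sorted(background)
    d = 0.0
    for x in a + b:
        fa = bisect_right(a, x) / len(a)   # fraction of sample values <= x
        fb = bisect_right(b, x) / len(b)   # fraction of background values <= x
        d = max(d, abs(fa - fb))
    return d


# Hypothetical CpG observed/expected ratios of promoters.
gene_list = [0.8, 0.9, 1.0]
background = [0.2, 0.3, 0.9]
print(ks_statistic(gene_list, background))  # D = 2/3
```

    For a p-value in practice, `scipy.stats.ks_2samp(sample, background)` returns both the statistic and its p-value; with roughly 200 features tested per gene list, some correction for multiple testing would also be needed.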

    Estrogen-Dependent Gene Expression in the Mouse Ovary

    Estrogen (E) plays a pivotal role in regulating the female reproductive system, particularly the ovary. However, the number and type of ovarian genes influenced by estrogen remain to be fully elucidated. In this study, we used wild-type (WT) and aromatase knockout (ArKO; estrogen-free) mouse ovaries as an in vivo model to profile estrogen-dependent genes. RNA from each individual ovary (n = 3) was analyzed in a microarray-based screen using the Illumina Sentrix Mouse WG-6 BeadChip (45,281 transcripts). Comparative analysis (GeneSpring) showed differential expression of 450 genes influenced by E, with 291 genes up-regulated and 159 down-regulated by 2-fold or greater in the ArKO ovary compared to WT. Genes previously reported to be E-regulated in ArKO ovaries were confirmed, in addition to novel genes not previously reported to be expressed or regulated by E in the ovary. Of the genes involved in 5 diverse functional processes (hormonal processes, reproduction, sex differentiation and determination, apoptosis, and cellular processes), 78 had estrogen-responsive elements (EREs). These analyses define the transcriptome regulated by E in the mouse ovary. Further investigation will increase our knowledge of how E influences follicular development and other ovarian functions.
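
    The 2-fold cut-off used to call up- and down-regulated genes in ArKO versus WT can be sketched in Python as below. This ignores GeneSpring's normalization and any statistical testing, which the abstract does not detail; the gene names, intensities and the `classify_fold_change` helper are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Sequence, Tuple


def classify_fold_change(
    expr: Dict[str, Tuple[Sequence[float], Sequence[float]]],
    threshold: float = 2.0,
) -> Dict[str, List[str]]:
    """expr maps gene -> (ArKO replicate values, WT replicate values), both on a
    linear (not log) scale. Genes with mean(ArKO)/mean(WT) >= threshold are
    'up' in ArKO; genes with the ratio <= 1/threshold are 'down'."""
    up, down = [], []
    for gene, (arko, wt) in expr.items():
        ratio = mean(arko) / mean(wt)
        if ratio >= threshold:
            up.append(gene)
        elif ratio <= 1 / threshold:
            down.append(gene)
    return {"up": sorted(up), "down": sorted(down)}


# Made-up signal intensities for three replicates per genotype.
toy = {
    "gene1": ([400, 420, 380], [100, 110, 90]),   # ratio 4.0
    "gene2": ([50, 45, 55], [200, 210, 190]),     # ratio 0.25
    "gene3": ([100, 105, 95], [90, 100, 110]),    # ratio 1.0
}
print(classify_fold_change(toy))  # {'up': ['gene1'], 'down': ['gene2']}
```

    Because the comparison is ArKO relative to WT, a gene "up" here is one whose expression rises when estrogen is absent, i.e. one that estrogen normally represses.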

    Gene prediction in metagenomic fragments: A large scale machine learning approach

    Background: Metagenomics is an approach to the characterization of microbial genomes via the direct isolation of genomic sequences from the environment without prior cultivation. The amount of metagenomic sequence data is growing fast, while computational methods for metagenome analysis are still in their infancy. In contrast to genomic sequences of single species, which can usually be assembled and analyzed by many available methods, a large proportion of metagenome data remains as unassembled anonymous sequencing reads. One of the aims of all metagenomic sequencing projects is the identification of novel genes. The short length of the fragments (Sanger sequencing, for example, yields fragments of about 700 bp on average) and the unknown phylogenetic origin of most fragments require approaches to gene prediction that differ from the currently available methods for genomes of single species. In particular, the large size of metagenomic samples requires fast and accurate methods with small numbers of false positive predictions. Results: We introduce a novel gene prediction algorithm for metagenomic fragments based on a two-stage machine learning approach. In the first stage, we use linear discriminants for monocodon usage, dicodon usage and translation initiation sites to extract features from DNA sequences. In the second stage, an artificial neural network combines these features with open reading frame length and fragment GC content to compute the probability that this open reading frame encodes a protein. This probability is used for the classification and scoring of gene candidates. With large-scale training, our method provides fast single-fragment predictions with good sensitivity and specificity on artificially fragmented genomic DNA. Additionally, the method predicts translation initiation sites accurately and distinguishes complete from incomplete genes with high reliability. Conclusion: Large-scale machine learning methods are well suited for gene prediction in metagenomic DNA fragments. In particular, the combination of linear discriminants and neural networks is promising and should be considered for integration into metagenomic analysis pipelines. The data sets can be downloaded from the URL provided (see Availability and requirements section).
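
    The raw inputs described above (candidate open reading frames, their length, fragment GC content and monocodon usage) can be extracted with a short Python sketch. It covers only feature extraction: the linear discriminants and the neural network are not reproduced, only the three forward frames are scanned (the reverse complement is omitted), ORFs are taken stop-to-stop rather than from predicted translation initiation sites, and `min_len` and all function names are hypothetical.

```python
from collections import Counter
from itertools import product
from typing import Dict, List

STOP_CODONS = {"TAA", "TAG", "TGA"}
ALL_CODONS = ["".join(c) for c in product("ACGT", repeat=3)]  # 64 codons


def gc_content(seq: str) -> float:
    """Fraction of G and C bases."""
    return sum(base in "GC" for base in seq) / len(seq) if seq else 0.0


def forward_orfs(fragment: str, min_len: int = 60) -> List[str]:
    """Stop-to-stop open reading frames (stop codon excluded) in the three
    forward frames. Stretches running off either end of the fragment are kept,
    because genes in short reads are often incomplete."""
    orfs = []
    for frame in range(3):
        start = frame
        for i in range(frame, len(fragment) - 2, 3):
            if fragment[i:i + 3] in STOP_CODONS:
                if i - start >= min_len:
                    orfs.append(fragment[start:i])
                start = i + 3
        end = start + (len(fragment) - start) // 3 * 3   # end of last full codon
        if end - start >= min_len:
            orfs.append(fragment[start:end])
    return orfs


def orf_features(orf: str, fragment: str) -> Dict[str, object]:
    """ORF length, fragment GC content and monocodon usage frequencies."""
    codons = [orf[i:i + 3] for i in range(0, len(orf) - 2, 3)]
    counts = Counter(codons)
    usage = {c: counts[c] / len(codons) for c in ALL_CODONS} if codons else {}
    return {"length": len(orf), "gc": gc_content(fragment), "codon_usage": usage}


# Toy fragment; real Sanger reads would be several hundred bases long.
frag = "ATGAAATAA"
for orf in forward_orfs(frag, min_len=6):
    features = orf_features(orf, frag)
    print(orf, features["length"], round(features["gc"], 3))
```

    Keeping ORFs that run off the fragment ends matters for metagenomic reads, since a 700 bp fragment frequently cuts a gene at one or both ends.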

    Chicken genome analysis reveals novel genes encoding biotin-binding proteins related to avidin family

    BACKGROUND: A chicken egg contains several biotin-binding proteins (BBPs) whose complete DNA and amino acid sequences are not known. To identify and characterise these genes and proteins, we studied chicken cDNAs and genes available in the NCBI database and the chicken genome database, using the reported N-terminal amino acid sequences of chicken egg-yolk BBPs as search strings. RESULTS: Two separate hits showing significant homology to these N-terminal sequences were discovered. For one of these hits, a chromosomal location in the immediate proximity of the avidin gene family was found. Both hits encode proteins with high sequence similarity to avidin, suggesting that chicken BBPs are paralogous to the avidin family. In particular, almost all residues corresponding to biotin binding in avidin are conserved in these putative BBP proteins. One of the identified DNA sequences, however, seems to encode a carboxy-terminal extension not present in avidin. CONCLUSION: We describe here the predicted properties of the putative BBP genes and proteins. Our observations link the BBP genes with the avidin gene family and shed more light on the genetic arrangement and variability of this family. In addition, comparative modelling revealed potential structural elements important for the functional and structural properties of the putative BBP proteins.
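
    Using a short N-terminal peptide as a search string amounts to looking for the best-matching stretch in candidate protein sequences. The Python sketch below shows the simplest version of that idea, an ungapped sliding-window identity search. The abstract does not name the search tool used; real searches would typically rely on BLAST-family programs with gaps and substitution matrices, and the sequences and function name below are made up for illustration, not real BBP or avidin data.

```python
from typing import Tuple


def best_ungapped_match(query: str, target: str) -> Tuple[int, float]:
    """Slide `query` along `target` without gaps and return the 0-based offset
    with the highest fraction of identical residues, plus that fraction."""
    if not query or len(query) > len(target):
        raise ValueError("query must be non-empty and no longer than target")
    best_pos, best_identity = 0, -1.0
    for pos in range(len(target) - len(query) + 1):
        window = target[pos:pos + len(query)]
        identity = sum(q == t for q, t in zip(query, window)) / len(query)
        if identity > best_identity:
            best_pos, best_identity = pos, identity
    return best_pos, best_identity


# Made-up peptide and protein sequences, not real BBP or avidin data.
print(best_ungapped_match("KCAL", "MAKCSLTG"))  # (2, 0.75)
```

    An N-terminal peptide is usually only a few residues long, so a hit like this is a starting point that needs confirming by full-length similarity, as was done here by comparing the hits with avidin.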