    Extensive Copy-Number Variation of Young Genes across Stickleback Populations

    MM received funding from the Max Planck innovation funds for this project. PGDF was supported by a Marie Curie European Reintegration Grant (proposal nr 270891). CE was supported by German Science Foundation grants (DFG, EI 841/4-1 and EI 841/6-1). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript

    Genome Majority Vote Improves Gene Predictions

    Recent studies have noted extensive inconsistencies in gene start sites among orthologous genes in related microbial genomes. Here we provide the first documented evidence that imposing gene start consistency improves the accuracy of gene start-site prediction. We applied an algorithm using a genome majority vote (GMV) scheme to increase the consistency of gene starts among orthologs. We used a set of validated Escherichia coli genes as a standard to quantify accuracy. Results showed that the GMV algorithm can correct hundreds of gene prediction errors in sets of five or ten genomes while introducing few errors. Using a conservative calculation, we project that GMV would resolve many inconsistencies and errors in publicly available microbial gene maps. Our simple and logical solution provides a notable advance toward accurate gene maps
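The abstract does not spell out the voting procedure, so the following is only a minimal sketch of the general idea. It assumes each ortholog's predicted start has already been mapped to a column of a shared alignment. The function names, the strict-majority rule, and the `has_start_codon` callback are illustrative assumptions, not the published GMV implementation.

```python
from collections import Counter


def majority_start(aligned_starts):
    """Return the alignment column backed by a strict majority of orthologs, or None."""
    counts = Counter(aligned_starts.values())
    column, votes = counts.most_common(1)[0]
    return column if votes > len(aligned_starts) / 2 else None


def apply_vote(aligned_starts, has_start_codon):
    """Move minority predictions to the majority start where a start codon exists.

    aligned_starts: dict mapping genome name -> alignment column of the predicted start.
    has_start_codon: hypothetical callback (genome, column) -> bool that reports
        whether that genome has a valid start codon at the given column.
    """
    winner = majority_start(aligned_starts)
    if winner is None:
        # No strict majority: leave every prediction unchanged.
        return dict(aligned_starts)
    corrected = {}
    for genome, column in aligned_starts.items():
        if column != winner and has_start_codon(genome, winner):
            corrected[genome] = winner
        else:
            corrected[genome] = column
    return corrected
```

With five genomes where four agree on a start column, the fifth is moved only if it can actually start there; a tie leaves all predictions as they were.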

    A Meta-Analysis of Microarray Gene Expression in Mouse Stem Cells: Redefining Stemness

    While much progress has been made in understanding stem cell (SC) function, a complete description of the molecular mechanisms regulating SCs is not yet established. This lack of knowledge is a major barrier holding back the discovery of therapeutic uses of SCs. We investigated the value of a novel meta-analysis of microarray gene expression in mouse SCs to aid the elucidation of regulatory mechanisms common to SCs and particular SC types. We added value to previously published microarray gene expression data by characterizing the promoter type likely to regulate transcription. Promoters of up-regulated genes in SCs were characterized in terms of alternative promoter (AP) usage and CpG-richness, with the aim of correlating features known to affect transcriptional control with SC function. We found that SCs have a higher proportion of up-regulated genes using CpG-rich promoters compared with the negative controls. Comparing subsets of SC type with the controls, a slightly different story unfolds: the differences between the proliferating adult SCs and the embryonic SCs versus the negative controls are statistically significant, whilst the difference between the quiescent adult SCs and the negative controls is not. On examination of AP usage, no difference was observed between SCs and the controls. However, comparing the subsets of SC type with the controls, the quiescent adult SCs are found to up-regulate a larger proportion of genes that have APs compared to the controls, and the converse is true for the proliferating adult SCs and the embryonic SCs. These findings suggest that looking at features associated with control of transcription is a promising future approach for characterizing “stemness” and that further investigations of stemness could benefit from separate considerations of different SC states. For example, “proliferating-stemness” is shown here, in terms of promoter usage, to be distinct from “quiescent-stemness”.

    Ortho2ExpressMatrix—a web server that interprets cross-species gene expression data by gene family information

    Background: The study of gene families is pivotal for the understanding of gene evolution across different organisms, and such phylogenetic background is often used to infer biochemical functions of genes. Modern high-throughput experiments offer the possibility to analyze the entire transcriptome of an organism; however, it is often difficult to deduce functional information from that data. Results: To improve functional interpretation of gene expression we introduce Ortho2ExpressMatrix, a novel tool that integrates complex gene family information, computed from sequence similarity, with comparative gene expression profiles of two pre-selected biological objects: gene families are displayed with two-dimensional matrices. Parameters of the tool are object type (two organisms, two individuals, two tissues, etc.), type of computational gene family inference, experimental meta-data, microarray platform, gene annotation level and genome build. Family information in Ortho2ExpressMatrix is based on computationally different protein family approaches such as EnsemblCompara, InParanoid, SYSTERS and Ensembl Family. Currently, respective all-against-all associations are available for five species: human, mouse, worm, fruit fly and yeast. Additionally, microRNA expression can be examined with respect to miRBase or TargetScan families. The visualization, which is typical for Ortho2ExpressMatrix, is performed as a matrix view that displays functional traits of genes (differential expression) as well as sequence similarity of protein family members (BLAST e-values) in colour codes. Such translations are intended to facilitate the user's perception of the research object. Conclusions: Ortho2ExpressMatrix integrates gene family information with genome-wide expression data in order to enhance functional interpretation of high-throughput analyses on diseases, environmental factors, or genetic modification or compound treatment experiments. The tool explores differential gene expression in the light of orthology, paralogy and structure of gene families up to the point of ambiguity analyses. Results can be used for filtering and prioritization in functional genomic, biomedical and systems biology applications. The web server is freely accessible at http://bioinf-data.charite.de/o2em/cgi-bin/o2em.pl.

    pubmed2ensembl: A Resource for Mining the Biological Literature on Genes

    The last two decades have witnessed a dramatic acceleration in the production of genomic sequence information and publication of biomedical articles. Despite the fact that genome sequence data and publications are two of the most heavily relied-upon sources of information for many biologists, very little effort has been made to systematically integrate data from genomic sequences directly with the biological literature. For a limited number of model organisms, dedicated teams manually curate publications about genes; however, for species with no such dedicated staff, many thousands of articles are never mapped to genes or genomic regions. To overcome the lack of integration between genomic data and biological literature, we have developed pubmed2ensembl (http://www.pubmed2ensembl.org), an extension to the BioMart system that links over 2,000,000 articles in PubMed to nearly 150,000 genes in Ensembl from 50 species. We use several curated (e.g., Entrez Gene) and automatically generated (e.g., gene names extracted through text-mining on MEDLINE records) sources of gene-publication links, allowing users to filter and combine different data sources to suit their individual needs for information extraction and biological discovery. In addition to extending the Ensembl BioMart database to include published information on genes, we also implemented a scripting language for automated BioMart construction and a novel BioMart interface that allows text-based queries to be performed against PubMed and PubMed Central documents in conjunction with constraints on genomic features. Finally, we illustrate the potential of pubmed2ensembl through typical use cases that involve integrated queries across the biomedical literature and genomic data. By allowing biologists to find the relevant literature on specific genomic regions or sets of functionally related genes more easily, pubmed2ensembl offers a much-needed genome-informatics-inspired solution to accessing the ever-increasing biomedical literature.

    The malignant phenotype in breast cancer is driven by eIF4A1-mediated changes in the translational landscape

    Human mRNA DeXD/H-box helicases are ubiquitous molecular motors that are required for the majority of cellular processes that involve RNA metabolism. One of the most abundant is eIF4A, which is required during the initiation phase of protein synthesis to unwind regions of highly structured mRNA that would otherwise impede the scanning ribosome. Dysregulation of protein synthesis is associated with tumorigenesis, but little is known about the detailed relationships between RNA helicase function and the malignant phenotype in solid malignancies. Therefore, immunohistochemical analysis was performed on over 3000 breast tumors to investigate the relationship among expression of eIF4A1, the helicase-modulating proteins eIF4B, eIF4E and PDCD4, and clinical outcome. We found eIF4A1, eIF4B and eIF4E to be independent predictors of poor outcome in ER-negative disease, while in contrast, the eIF4A1 inhibitor PDCD4 was related to improved outcome in ER-positive breast cancer. Consistent with these data, modulation of eIF4A1, eIF4B and PDCD4 expression in cultured MCF7 cells all restricted breast cancer cell growth and cycling. The eIF4A1-dependent translatome of MCF7 cells was defined by polysome profiling, and was shown to be highly enriched for several classes of oncogenic genes, including G-protein constituents, cyclins and protein kinases, and for mRNAs with G/C-rich 5′UTRs with potential to form G-quadruplexes and with 3′UTRs containing microRNA target sites. Overall, our data show that dysregulation of mRNA unwinding contributes to the malignant phenotype in breast cancer via preferential translation of a class of genes involved in pro-oncogenic signaling at numerous levels. Furthermore, immunohistochemical tests are promising biomarkers for tumors sensitive to anti-helicase therapies.

    The Tetraodon nigroviridis reference transcriptome: Developmental transition, length retention and microsynteny of long non-coding RNAs in a compact vertebrate genome

    Pufferfish such as fugu and tetraodon carry the smallest genomes among all vertebrates and are ideal for studying genome evolution. However, comparative genomics using these species is hindered by the poor annotation of their genomes. We performed RNA sequencing during key stages of the maternal to zygotic transition of Tetraodon nigroviridis and report its first developmental transcriptome. We assembled 61,033 transcripts (23,837 loci) representing 80% of the annotated gene models and 3816 novel coding transcripts from 2667 loci. We demonstrate the similarities of gene expression profiles between pufferfish and zebrafish during the maternal to zygotic transition and annotated 1120 long non-coding RNAs (lncRNAs), many of which are differentially expressed during development. The promoters for 60% of the assembled transcripts were validated by CAGE-seq. Despite the extreme compaction of the tetraodon genome and the dramatic loss of transposons, the length of lncRNA exons remains comparable to that of other vertebrates, and a small set of lncRNAs appears enriched for transposable elements, suggesting a selective pressure acting on lncRNA length and composition. Finally, a set of lncRNAs are microsyntenic between teleosts and other vertebrates, which indicates potential regulatory interactions between lncRNAs and their flanking coding genes. Our work provides a fundamental molecular resource for vertebrate comparative genomics and embryogenesis studies.

    Identifying Consensus Disease Pathways in Parkinson's Disease Using an Integrative Systems Biology Approach

    Six genome-wide association studies (GWAS) and several gene expression studies have been conducted for Parkinson's disease (PD). However, only variants in MAPT and SNCA have been consistently replicated. To improve the utility of these approaches, we applied pathway analyses integrating both GWAS and gene expression. The top 5000 SNPs (p<0.01) from a joint analysis of three existing PD GWAS were identified and each assigned to a gene. For gene expression, rather than the traditional comparison of one anatomical region between sets of patients and controls, we identified differentially expressed genes between adjacent Braak regions in each individual and adjusted using average control expression profiles. Over-represented pathways were calculated using a hypergeometric statistical comparison. An integrated, systems meta-analysis of the over-represented pathways combined the expression and GWAS results using Fisher's combined probability test. Four of the top seven pathways from each approach were identical. The top three pathways in the meta-analysis, with their corrected p-values, were axonal guidance (p = 2.8E-07), focal adhesion (p = 7.7E-06) and calcium signaling (p = 2.9E-05). These results suggest that a systems biology (pathway) approach can provide additional insight into the genetic etiology of PD and that these pathways have both biological and statistical support for a role in PD.
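The two statistical steps named in this abstract can be illustrated with a minimal sketch, assuming the standard textbook forms of both tests: a one-sided hypergeometric tail for pathway over-representation, and Fisher's method, which combines k independent p-values through the statistic -2 Σ ln p_i, referred to a chi-square distribution with 2k degrees of freedom. The function names and arguments are hypothetical, and the study's own gene universe, multiple-testing correction, and software are not reproduced here.

```python
import math


def hypergeom_sf(k, N, K, n):
    """P(X >= k): chance of seeing at least k pathway genes among n selected genes.

    N: genes in the universe, K: genes in the pathway, n: genes selected.
    """
    upper = min(K, n)
    total = math.comb(N, n)
    hits = sum(math.comb(K, i) * math.comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


def fisher_combined(p_values):
    """Combine independent p-values with Fisher's method.

    The statistic -2 * sum(ln p) follows a chi-square distribution with 2k degrees
    of freedom; for even degrees of freedom the upper tail has the closed form
    exp(-x/2) * sum_{i<k} (x/2)^i / i!, so no external library is needed.
    """
    k = len(p_values)
    half = -sum(math.log(p) for p in p_values)  # equals x / 2
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(k))
```

For one pathway, `fisher_combined([p_gwas, p_expression])` would merge the over-representation p-value from the GWAS gene list with the one from the expression gene list. The combination is only valid if the two p-values are independent.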