675 research outputs found

    Expedited batch processing and analysis of transposon insertions

    Background: With advances in sequencing technology, ever larger amounts of eukaryotic genome data are becoming available. Large portions of these genomes often consist of transposable elements, which frequently account for 50% or more of the genome in vertebrates. Each transposable element family may have thousands or tens of thousands of individual copies within a given genome, so processing the data in a meaningful fashion can take an exorbitant amount of time and effort. Findings: To combat this problem, we developed a set of bioinformatics techniques and programs to streamline the analysis. These include a unique Perl script that automates the process of taking BLAST, RepeatMasker and similar data and extracting and manipulating the hit sequences from the genome. The script, called Process_hits, uses an object-oriented methodology to compile all hit locations from a given file for processing, organize the data into usable categories, and output it in multiple formats. Conclusions: The program proved capable of handling large amounts of transposon data efficiently. It is equipped with a number of useful sub-functions, each contained within its own sub-module to allow for greater expandability and to serve as a foundation for future program design.
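    The hit-processing workflow described in this abstract (compile hit locations, organize them, extract the hit sequences) can be illustrated with a minimal Python sketch. This is not the authors' Perl script: the names `Hit`, `parse_blast_tabular`, `group_by_subject` and `extract_sequence` are hypothetical, and the input is assumed to be standard BLAST tabular output (`-outfmt 6`), where a subject start greater than the subject end indicates a minus-strand hit.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One hit location (hypothetical container, not from Process_hits)."""
    query: str
    subject: str
    start: int    # 1-based inclusive, always <= end
    end: int      # 1-based inclusive
    strand: str   # "+" or "-"


def parse_blast_tabular(lines):
    """Parse BLAST tabular (-outfmt 6) lines into Hit objects.

    Column order assumed: qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore.
    """
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        sstart, send = int(fields[8]), int(fields[9])
        strand = "+" if sstart <= send else "-"
        hits.append(Hit(fields[0], fields[1],
                        min(sstart, send), max(sstart, send), strand))
    return hits


def group_by_subject(hits):
    """Organize hits by the genomic sequence they fall on."""
    grouped = defaultdict(list)
    for hit in hits:
        grouped[hit.subject].append(hit)
    return dict(grouped)


_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def extract_sequence(hit, genome):
    """Return the hit sequence, reverse-complemented for minus-strand hits.

    `genome` is assumed to be a dict mapping sequence names to strings.
    """
    seq = genome[hit.subject][hit.start - 1:hit.end]
    if hit.strand == "-":
        seq = seq.translate(_COMPLEMENT)[::-1]
    return seq
```

    For whole genomes, a real pipeline would read sequences from an indexed FASTA file rather than an in-memory dict; the dict is used here only to keep the sketch self-contained.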

    Linking microarray reporters with protein functions

    Background: The analysis of microarray experiments requires accurate and up-to-date functional annotation of the microarray reporters to optimize the interpretation of the biological processes involved. Pathway visualization tools connect gene expression data with existing biological pathways by using specific database identifiers that link reporters with elements in the pathways. Results: This paper proposes a novel method that aims to improve microarray reporter annotation by BLASTing the original reporter sequences against a species-specific EMBL subset that was derived from and crosslinked back to the highly curated UniProt database. The resulting alignments were filtered using high-quality alignment criteria and compared with the outcome of a more traditional approach, in which reporter sequences were BLASTed against EnsEMBL and the corresponding protein (UniProt) entry was then located for the high-quality hits. Combining the results of both methods led to successful annotation of > 58% of all reporter sequences with UniProt IDs on two commercial array platforms, increasing the number of Incyte reporters that could be coupled to Gene Ontology terms from 32.7% to 58.3% and to a local GenMAPP pathway from 9.6% to 16.7%. For Agilent, 35.3% of the total reporters are now linked to GO nodes and 7.1% to local pathways. Conclusion: Our methods increased the annotation quality of microarray reporter sequences and allowed us to visualize more reporters using pathway visualization tools. Even in cases where the original reporter annotation showed the correct description, the new identifiers often allowed improved pathway and Gene Ontology linking. These methods are freely available at http://www.bigcat.unimaas.nl/public/publications/Gaj_Annotation/.

    Linkage mapping bovine EST-based SNP

    BACKGROUND: Existing linkage maps of the bovine genome primarily contain anonymous microsatellite markers. These maps have proved valuable for mapping quantitative trait loci (QTL) to broad regions of the genome, but more closely spaced markers are needed to fine-map QTL, and markers associated with genes and annotated sequence are needed to identify genes and sequence variation that may explain QTL. RESULTS: Bovine expressed sequence tag (EST) and bacterial artificial chromosome (BAC) sequence data were used to develop 918 single nucleotide polymorphism (SNP) markers to map genes on the bovine linkage map. DNA of sires from the MARC reference population was used to detect SNPs, and progeny and mates of heterozygous sires were genotyped. Chromosome assignments for 861 SNPs were determined by two-point analysis, and positions for 735 SNPs were established by multipoint analyses. Linkage maps of bovine autosomes with these SNPs represent 4585 markers in 2475 positions spanning 3058 cM. Markers include 3612 microsatellites, 913 SNPs and 60 other markers. Mean separation between marker positions is 1.2 cM. New SNP markers appear in 511 positions, with mean separation of 4.7 cM. Multi-allelic markers, mostly microsatellites, had a mean (maximum) of 216 (366) informative meioses, and a mean 3-lod confidence interval of 3.6 cM. Bi-allelic markers, including SNP and other marker types, had a mean (maximum) of 55 (191) informative meioses, and were placed within a mean 8.5 cM 3-lod confidence interval. Homologous human sequences were identified for 1159 markers, including 582 newly developed and mapped SNP. CONCLUSION: Addition of these EST- and BAC-based SNPs to the bovine linkage map not only increases marker density, but also provides connections to gene-rich physical maps, including annotated human sequence. The map provides a resource for fine-mapping quantitative trait loci and identifying positional candidate genes, and can be integrated with other data to guide and refine assembly of bovine genome sequence. Even after the bovine genome is completely sequenced, the map will continue to be a useful tool to link observable phenotypes and animal genotypes to underlying genes and molecular mechanisms influencing economically important beef and dairy traits.

    Reliability analysis of the Ahringer Caenorhabditis elegans RNAi feeding library: a guide for genome-wide screens

    Background: The Ahringer C. elegans RNAi feeding library, prepared by cloning genomic DNA fragments, has been widely used in genome-wide analysis of gene function. However, the library has not been thoroughly validated by direct sequencing, and there are potential errors, including: 1) mis-annotation (a clone with a retired gene name should be remapped to the actual target gene); 2) nonspecific PCR amplification; 3) cross-RNAi; 4) mis-operation such as sample loading error, etc. Results: Here we performed a reliability analysis of the Ahringer C. elegans RNAi feeding library, which contains 16,256 bacterial strains, using a bioinformatics approach. The results demonstrated that most (98.3%) of the bacterial strains in the library are reliable. However, we also found that 2,851 (17.54%) bacterial strains need to be re-annotated even though they are reliable. Most of these strains are clones carrying retired gene names. In addition, 28 strains are classified as unreliable and 226 strains as marginal because they probably express unrelated double-stranded RNAs (dsRNAs). The accuracy of the prediction was further confirmed by direct sequencing analysis of 496 bacterial strains. Finally, a freely accessible database named CelRNAi (http://biocompute.bmi.ac.cn/CelRNAi/) was developed as a valuable complementary resource for the feeding RNAi library, providing the predicted information on all bacterial strains. Moreover, users may submit direct sequencing results or any other annotations for the bacterial strains, which will be integrated into the CelRNAi database to improve the accuracy of the library. In addition, we provide five candidate primer sets for each of the unreliable and marginal bacterial strains so that users can construct an alternative vector for their own RNAi studies. Conclusions: Because of the potential unreliability of the Ahringer C. elegans RNAi feeding library, we strongly suggest that users examine the reliability information of the bacterial strains in the CelRNAi database before performing RNAi experiments, as well as during post-RNAi experiment analysis.

    Large introns in relation to alternative splicing and gene evolution: a case study of Drosophila bruno-3

    Background: Alternative splicing (AS) of maturing mRNA can generate structurally and functionally distinct transcripts from the same gene. Recent bioinformatic analyses of available genome databases inferred a positive correlation between intron length and AS. To study the interplay between intron length and AS empirically and in more detail, we analyzed the diversity of alternatively spliced transcripts (ASTs) in the Drosophila RNA-binding Bruno-3 (Bru-3) gene. This gene was known to encode thirteen exons separated by introns of diverse sizes, ranging from 71 to 41,973 nucleotides in D. melanogaster. Although Bru-3's structure is expected to be conducive to AS, only two ASTs of this gene were previously described. Results: Cloning of RT-PCR products of the entire ORF from four species representing three diverged Drosophila lineages provided an evolutionary perspective, high sensitivity, and long-range contiguity of splice choices currently unattainable by high-throughput methods. Consequently, we identified three new exons, a new exon fragment and thirty-three previously unknown ASTs of Bru-3. All exon-skipping events in the gene were mapped to the exons surrounded by introns of at least 800 nucleotides, whereas exons split by introns of less than 250 nucleotides were always spliced contiguously in mRNA. Cases of exon loss and creation during Bru-3 evolution in Drosophila were also localized within large introns. Notably, we identified a true de novo exon gain: exon 8 was created along the lineage of the obscura group from intronic sequence between cryptic splice sites conserved among all Drosophila species surveyed. Exon 8 was included in mature mRNA by the species representing all the major branches of the obscura group. To our knowledge, the origin of exon 8 is the first documented case of exonization of intronic sequence outside vertebrates. 
Conclusion: We found that large introns can promote AS via exon-skipping and exon turnover during evolution, likely due to frequent errors in their removal from maturing mRNA. Large introns could be a reservoir of genetic diversity, because they have a greater number of mutable sites than short introns. Taken together, gene structure can constrain and/or promote gene evolution.

    Genomorama: genome visualization and analysis

    Background: The ability to visualize genomic features and design experimental assays that can target specific regions of a genome is essential for modern biology. To assist in these tasks, we present Genomorama, a software program for interactively displaying multiple genomes and identifying potential DNA hybridization sites for assay design. Results: Useful features of Genomorama include genome search by DNA hybridization (probe binding and PCR amplification), efficient multi-scale display and manipulation of multiple genomes, support for many genome file types and the ability to search for and retrieve data from the National Center for Biotechnology Information (NCBI) Entrez server. Conclusion: Genomorama provides an efficient computational platform for visualizing and analyzing multiple genomes.

    Structural Heterogeneity and Quantitative FRET Efficiency Distributions of Polyprolines through a Hybrid Atomistic Simulation and Monte Carlo Approach

    Förster Resonance Energy Transfer (FRET) experiments probe molecular distances via distance-dependent energy transfer from an excited donor dye to an acceptor dye. Single-molecule experiments probe not only average distances but also distance distributions and even fluctuations, and thus provide a powerful tool to study biomolecular structure and dynamics. However, the measured energy transfer efficiency depends not only on the distance between the dyes but also on their mutual orientation, which is typically inaccessible to experiments. Thus, assumptions about the orientation distributions and averages are usually made, limiting the accuracy of the distance distributions extracted from FRET experiments. Here, we demonstrate that combining single-molecule FRET experiments with the mutual dye orientation statistics obtained from Molecular Dynamics (MD) simulations yields improved estimates of distances and distributions. From the simulated time-dependent mutual orientations, FRET efficiencies are calculated, and the full statistics of individual photon absorption, energy transfer, and photon emission events are obtained from subsequent Monte Carlo (MC) simulations of the FRET kinetics. All recorded emission events are collected into bursts, from which efficiency distributions are calculated in close resemblance to the actual FRET experiment, taking shot noise fully into account. Using polyproline chains with attached Alexa 488 and Alexa 594 dyes as a test system, we demonstrate the feasibility of this approach by direct comparison to experimental data. We identified cis-isomers and different static local environments as sources of the experimentally observed heterogeneity. Reconstructions of distance distributions from experimental data at different levels of theory demonstrate how the respective underlying assumptions and approximations affect the obtained accuracy. Our results show that dye fluctuations obtained from MD simulations, combined with MC single-photon kinetics, provide a versatile tool to improve the accuracy of distance distributions that can be extracted from measured single-molecule FRET efficiencies.
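    The dependence of transfer efficiency on both distance and mutual dye orientation can be sketched with the standard Förster relations. This is only an illustration of a single-snapshot calculation, not the authors' MD and Monte Carlo pipeline. The function names are hypothetical, `r0_iso` is assumed to be the Förster radius quoted for the isotropic average κ² = 2/3, and the numeric values in the usage note are placeholders rather than measured values for the Alexa dye pair.

```python
import math


def _unit(v):
    """Return v scaled to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return tuple(x / norm for x in v)


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def kappa_squared(donor_dipole, acceptor_dipole, separation):
    """Orientation factor kappa^2 for one snapshot.

    kappa = d.a - 3 (d.r)(a.r), where d and a are the unit transition
    dipoles of donor and acceptor and r is the unit vector joining them.
    """
    d = _unit(donor_dipole)
    a = _unit(acceptor_dipole)
    r = _unit(separation)
    kappa = _dot(d, a) - 3.0 * _dot(d, r) * _dot(a, r)
    return kappa * kappa


def fret_efficiency(distance, r0_iso, kappa2=2.0 / 3.0):
    """Transfer efficiency E = 1 / (1 + (R / R0)^6).

    R0^6 is proportional to kappa^2, so an R0 quoted for the isotropic
    average (kappa^2 = 2/3) is rescaled to the given kappa^2.
    """
    if kappa2 == 0.0:
        return 0.0  # perpendicular geometry: no transfer
    ratio6 = (distance / r0_iso) ** 6
    return 1.0 / (1.0 + (2.0 / 3.0) / kappa2 * ratio6)
```

    With placeholder values, `fret_efficiency(5.0, 5.0)` gives 0.5 under the isotropic assumption, while passing a `kappa2` computed from a particular dipole geometry shifts the efficiency at the same distance. This is the effect that orientation statistics from simulation are meant to capture.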

    Id4 and FABP7 are preferentially expressed in cells with astrocytic features in oligodendrogliomas and oligoastrocytomas

    BACKGROUND: Oligodendroglioma (ODG) and oligoastrocytoma (OAC) are diffusely infiltrating primary brain tumors whose pathogenesis remains unclear. We previously identified a group of genes whose expression was inversely correlated with survival in a cohort of patients with glioblastoma (GBM), and some of these genes are also reportedly expressed in ODG and OAC. We examined the expression patterns and localization of these survival-associated genes in ODG and OAC in order to analyze their possible roles in the oncogenesis of these two tumor types. METHODS: We used UniGene libraries derived from GBM and ODG specimens to examine the expression levels of the transcripts for each of the 50 GBM survival-associated genes. We used immunohistochemistry and cDNA microarrays to examine expression of selected survival-associated genes and Id4, a gene believed to control the timing of oligodendrocyte development. The expression of FABP7 and Id4 and the survival of patients with ODG and OAC were also analyzed. RESULTS: Transcripts of most survival-associated genes as well as Id4 were present in both GBM and ODG tumors, whereas protein expression of Id4 and one of the survival-associated genes, brain-type fatty acid-binding protein (FABP7), was present in cells with astrocytic features, including reactive and neoplastic astrocytes, but not in neoplastic oligodendrocytes. Id4 was co-expressed with FABP7 in microgemistocytes in ODG and in neoplastic astrocytes in OAC. Id4 and FABP7 expression, however, did not correlate with the clinical outcome of patients with ODG or OAC tumors. CONCLUSION: Expression of Id4 and some of our previously identified GBM survival-associated genes is present in developing or mature oligodendrocytes. 
However, protein expression of Id4 and FABP7 in GBM, ODG, and OAC suggests that this group of functionally important genes might demonstrate two patterns of expression in these glioma subtypes: one group is universally expressed in glioma cells, and the other group is expressed primarily in neoplastic astrocytes but not in neoplastic oligodendrocytes. Differential protein expression of these two groups of genes in ODG and OAC may be related to the cellular origins and the histological features of the neoplastic cells.