27 research outputs found

    ReCombine: A Suite of Programs for Detection and Analysis of Meiotic Recombination in Whole-Genome Datasets

    In meiosis, the exchange of DNA between chromosomes by homologous recombination is a critical step that ensures proper chromosome segregation and increases genetic diversity. Products of recombination include reciprocal exchanges, known as crossovers, and non-reciprocal gene conversions or non-crossovers. The mechanisms underlying meiotic recombination remain elusive, largely because of the difficulty of analyzing large numbers of recombination events by traditional genetic methods. These traditional methods are increasingly being superseded by high-throughput techniques capable of surveying meiotic recombination on a genome-wide basis. Next-generation sequencing or microarray hybridization is used to genotype thousands of polymorphic markers in the progeny of hybrid yeast strains. New computational tools are needed to perform this genotyping and to find and analyze recombination events. We have developed a suite of programs, ReCombine, for using short sequence reads from next-generation sequencing experiments to genotype yeast meiotic progeny. Upon genotyping, the program CrossOver, a component of ReCombine, then detects recombination products and classifies them into categories based on the features found at each location and their distribution among the various chromatids. CrossOver is also capable of analyzing segregation data from microarray experiments or other sources. This package of programs is designed to allow even researchers without computational expertise to use high-throughput, whole-genome methods to study the molecular mechanisms of meiotic recombination.
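The kind of segregation analysis described in this abstract can be illustrated with a minimal sketch. It is not the CrossOver program itself: the function names, the "A"/"B" genotype encoding, and the toy tetrad are all assumptions made for illustration. It assumes each of the four spores has already been genotyped at the same ordered markers, then reports the per-marker segregation ratio and the positions where a spore switches parental genotype.

```python
def segregation_ratios(tetrad):
    """Count how many of the four spores carry parent 'A' at each marker.

    A count of 2 is the Mendelian 2:2 pattern; 1 or 3 suggests a
    non-reciprocal event such as a gene conversion.
    """
    return [sum(allele == "A" for allele in column) for column in zip(*tetrad)]


def genotype_switches(spore):
    """Return marker indices where the genotype differs from the previous marker."""
    return [i for i in range(1, len(spore)) if spore[i] != spore[i - 1]]


# Hypothetical tetrad: four spores, six ordered markers on one chromosome.
tetrad = [
    "AAABBB",  # switches between markers 2 and 3
    "BBBAAA",  # reciprocal switch at the same interval
    "AABAAA",  # short 'B' tract at marker 2
    "BBBBBB",  # no switch
]

ratios = segregation_ratios(tetrad)
non_mendelian = [i for i, count in enumerate(ratios) if count != 2]
switches = [genotype_switches(spore) for spore in tetrad]
```

In this toy example the first two spores switch genotype at the same interval, which is the signature of a reciprocal exchange, while marker 2 segregates 1:3 and would be flagged as a candidate conversion tract.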

    HMMSplicer: A Tool for Efficient and Sensitive Discovery of Known and Novel Splice Junctions in RNA-Seq Data

    Background: High-throughput sequencing of an organism’s transcriptome, or RNA-Seq, is a valuable and versatile new strategy for capturing snapshots of gene expression. However, transcriptome sequencing creates a new class of alignment problem: mapping short reads that span exon-exon junctions back to the reference genome, especially in the case where a splice junction is previously unknown. Methodology/Principal Findings: Here we introduce HMMSplicer, an accurate and efficient algorithm for discovering canonical and non-canonical splice junctions in short read datasets. HMMSplicer identifies more splice junctions than currently available algorithms when tested on publicly available A. thaliana, P. falciparum, and H. sapiens datasets without a reduction in specificity. Conclusions/Significance: HMMSplicer was found to perform especially well in compact genomes and on genes with low expression levels, alternative splice isoforms, or non-canonical splice junctions. Because HMMSplicer does not rely on prebuilt gene models, the products of inexact splicing are also detected. For H. sapiens, we find 3.6% of 3′ splice sites and 1.4% of 5′ splice sites are inexact, typically differing by 3 bases in either direction. In addition, HMMSplicer provides a score for every predicted junction allowing the user to set a threshold to tune false positive rates depending on the needs of the experiment. HMMSplicer is implemented in Python. Code and documentation are freely available a

    RNA-seq analyses of blood-induced changes in gene expression in the mosquito vector species, Aedes aegypti

    Background: Hematophagy is a common trait of insect vectors of disease. Extensive genome-wide transcriptional changes occur in mosquitoes after blood meals, and these are related to digestive and reproductive processes, among others. Studies of these changes are expected to reveal molecular targets for novel vector control and pathogen transmission-blocking strategies. The mosquito Aedes aegypti (Diptera, Culicidae), a vector of Dengue viruses, Yellow Fever Virus (YFV) and Chikungunya virus (CV), is the subject of this study to look at genome-wide changes in gene expression following a blood meal. Results: Transcriptional changes that follow a blood meal in Ae. aegypti females were explored using RNA-seq technology. Over 30% of more than 18,000 investigated transcripts accumulate differentially in mosquitoes at five hours after a blood meal when compared to those fed only on sugar. Forty transcripts accumulate only in blood-fed mosquitoes. The list of regulated transcripts correlates with an enhancement of digestive activity and a suppression of environmental stimuli perception and innate immunity. The alignment of more than 65 million high-quality short reads to the Ae. aegypti reference genome permitted the refinement of the current annotation of transcript boundaries, as well as the discovery of novel transcripts, exons and splicing variants. Cis-regulatory elements (CRE) and cis-regulatory modules (CRM) enriched significantly at the 5' end flanking sequences of blood meal-regulated genes were identified. Conclusions: This study provides the first global view of the changes in transcript accumulation elicited by a blood meal in Ae. aegypti females. This information permitted the identification of classes of potentially co-regulated genes and a description of biochemical and physiological events that occur immediately after blood feeding. The data presented here serve as a basis for novel vector control and pathogen transmission-blocking strategies including those in which the vectors are modified genetically to express anti-pathogen effector molecules.

    IMSA: Integrated metagenomic sequence analysis for identification of exogenous reads in a host genomic background

    Metagenomics, the study of microbial genomes within diverse environments, is a rapidly developing field. The identification of microbial sequences within a host organism enables the study of human intestinal, respiratory, and skin microbiota, and has allowed the identification of novel viruses in diseases such as Merkel cell carcinoma. There are few publicly available tools for metagenomic high-throughput sequence analysis. We present Integrated Metagenomic Sequence Analysis (IMSA), a flexible, fast, and robust computational analysis pipeline that is available for public use. IMSA takes input sequence from high-throughput datasets and uses a user-defined host database to filter out host sequence. IMSA then aligns the filtered reads to a user-defined universal database to characterize exogenous reads within the host background. IMSA assigns a score to each node of the taxonomy based on read frequency, and can output this as a taxonomy report suitable for cluster analysis or as a taxonomy map (TaxMap). IMSA also outputs the specific sequence reads assigned to a taxon of interest for downstream analysis. We demonstrate the use of IMSA to detect pathogens and normal flora within sequence data from a primary human cervical cancer carrying HPV16, a primary human cutaneous squamous cell carcinoma carrying HPV16, the CaSki cell line carrying HPV16, and the HeLa cell line carrying HPV18.
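One way to picture the taxonomy scoring step is the sketch below. It is not IMSA's code: the toy taxonomy, the function names, and the rule that a read's weight of 1 is split evenly among the taxa it hits are assumptions made for illustration. The sketch adds each read's share to the hit taxon and to all of its ancestors, so that higher nodes accumulate the read frequency of everything beneath them.

```python
from collections import defaultdict

# Hypothetical toy taxonomy: child -> parent (None marks the root).
PARENT = {
    "root": None,
    "Viruses": "root",
    "Papillomaviridae": "Viruses",
    "HPV16": "Papillomaviridae",
    "HPV18": "Papillomaviridae",
}


def lineage(taxon, parent=PARENT):
    """Return the taxon followed by all of its ancestors up to the root."""
    path = []
    while taxon is not None:
        path.append(taxon)
        taxon = parent[taxon]
    return path


def score_taxonomy(read_hits, parent=PARENT):
    """Score every taxonomy node from per-read hits.

    read_hits maps a read ID to the taxa it aligned to after host
    filtering. Each read carries a total weight of 1, split evenly
    among its hits (an assumption of this sketch).
    """
    scores = defaultdict(float)
    for taxa in read_hits.values():
        if not taxa:
            continue  # read matched nothing in the database
        share = 1.0 / len(taxa)
        for taxon in taxa:
            for node in lineage(taxon, parent):
                scores[node] += share
    return dict(scores)


# Hypothetical non-host reads and the taxa they hit.
reads = {"r1": ["HPV16"], "r2": ["HPV16"], "r3": ["HPV16", "HPV18"]}
scores = score_taxonomy(reads)
```

With these three reads, HPV16 scores 2.5, HPV18 scores 0.5, and every ancestor scores 3.0, the total number of reads assigned beneath it.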

    Datasets.

    *The 48-bp reads in the NCBI SRA set have a 2-bp initial barcode that was trimmed, resulting in 46-bp reads. Datasets used for benchmark tests. For H. sapiens and P. falciparum, two times are given for TopHat. For H. sapiens, the longer time is with more sensitive settings, but the shorter time resulted in less than 5% fewer junctions at a similar specificity. For P. falciparum, the longer time is with more sensitive but less stringent settings whereas the shorter time is for the more stringent settings that resulted in significantly fewer junctions but with a much higher specificity.

    XBP1 non-canonical intron.

    HMMSplicer discovers the non-canonical XBP1 intron. HMMSplicer identifies three reads containing the non-canonical CA-AG splice site in XBP1. Because the reads are fairly evenly split, both read-halves aligned to the genome. The edges identified by HMMSplicer are 2 and 4 bp off from the actual splice site because the sequence at the beginning of the intron repeats the sequence at the beginning of the subsequent exon. When identical junctions are collapsed, there are two junctions, one with a score of 1024 and one with a score of 1030, which puts them in the top 0.5% of the collapsed non-canonical junctions.

    HMMSplicer pipeline.

    After removing reads that have full-length alignments to the genome, reads are divided in half and aligned to the genome (step 1 as defined in the Materials and Methods, http://www.plosone.org/article/info:doi/10.1371/journal.pone.0013875#s4). The HMM is trained using a subset of the read-half alignments (step 2a). The HMM bins quality scores into five levels. Although only three levels are shown in this overview for simplification, the values for all five levels can be found in Table 1 (http://www.plosone.org/article/info:doi/10.1371/journal.pone.0013875#pone-0013875-t001). The trained HMM is then used to determine the splice position within each read-half alignment (step 2b). The remaining second piece of the read is then matched downstream to find the other intron edge (step 3). The initial set of splice junctions then proceeds to rescue (step 4) and filter and collapse (step 5) to generate the final set of splice junctions.
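The first two ideas in this caption, dividing each read in half and binning quality scores into five levels, can be sketched as follows. This is an illustration only and not HMMSplicer's implementation: the function names and the bin edges are hypothetical, and the actual values for the five levels are the ones given in the paper's Table 1.

```python
from bisect import bisect_right

# Hypothetical Phred cut points giving five quality levels (0 to 4).
BIN_EDGES = (10, 20, 30, 40)


def split_read(seq, quals):
    """Divide a read and its per-base qualities into two halves.

    For odd-length reads the extra base goes to the second half
    (an arbitrary choice made for this sketch).
    """
    mid = len(seq) // 2
    return (seq[:mid], quals[:mid]), (seq[mid:], quals[mid:])


def bin_quality(phred, edges=BIN_EDGES):
    """Map a Phred score to a level: the number of edges it meets or exceeds."""
    return bisect_right(edges, phred)


# Hypothetical 9-nt read with per-base Phred scores.
seq = "ACGTACGTA"
quals = [38, 40, 35, 29, 22, 18, 12, 9, 5]

first_half, second_half = split_read(seq, quals)
levels = [bin_quality(q) for q in quals]
```

Binning keeps the number of emission symbols the model has to learn small, which is presumably why the qualities are grouped into levels instead of being used as raw scores.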

    Overview of HMMSplicer and TopHat results in (a) A. thaliana, (b) P. falciparum, and (c) H. sapiens.

    For each dataset, HMMSplicer results are shown at five different score thresholds. The numbers on the bottom axis (200 to 600) are the thresholds for junctions with multiple reads; the threshold was set 200 points higher for junctions with a single read. The * indicates HMMSplicer's default score threshold. SpliceMap results are shown for the A. thaliana dataset only, as SpliceMap cannot be run on datasets with reads less than 50 nt long. For P. falciparum, TopHat was run with two different parameter sets. TopHat A was run with a segment length of 23, resulting in more junctions but a lower specificity, whereas TopHat B used the default segment length of 25, resulting in fewer junctions with higher specificity.
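The two-tier thresholding described in this caption can be sketched as below. Only the 200-point offset for single-read junctions comes from the caption. The `Junction` record, the function names, the example scores, and the use of an inclusive cutoff are assumptions made for illustration.

```python
from typing import NamedTuple


class Junction(NamedTuple):
    """Hypothetical record for a collapsed splice junction."""
    name: str
    score: float
    read_count: int


# Single-read junctions must clear a cutoff 200 points higher (from the caption).
SINGLE_READ_OFFSET = 200


def passes(junction, threshold):
    """Apply the base threshold to multi-read junctions and the raised one otherwise."""
    cutoff = threshold if junction.read_count > 1 else threshold + SINGLE_READ_OFFSET
    return junction.score >= cutoff  # inclusive comparison is an assumption


def filter_junctions(junctions, threshold):
    """Keep only the junctions that pass at the given threshold."""
    return [j for j in junctions if passes(j, threshold)]


# Hypothetical junctions scored on the same scale as the figure's axis.
junctions = [
    Junction("j1", 450, 3),  # multi-read, above 400: kept
    Junction("j2", 450, 1),  # single-read, below 600: dropped
    Junction("j3", 650, 1),  # single-read, above 600: kept
    Junction("j4", 390, 5),  # multi-read, below 400: dropped
]

kept = [j.name for j in filter_junctions(junctions, 400)]
```

Sweeping `threshold` over the five values on the axis (200 to 600) would reproduce the kind of sensitivity versus specificity comparison the figure reports.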