44 research outputs found

    Basecalling for Traces Derived for Multiple Templates

    Three methods for analyzing sequencing traces derived from sequencing reactions containing two DNA templates are presented. All rely on alignment to a segment of assembled genomic sequence containing the original template sequence. Spliced alignment algorithms are used so that traces derived from processed mRNA can be analyzed. The main application of these techniques is the elucidation of alternately spliced transcripts. Experimental verification of one of the techniques is presented, including testing on a set of 48 alternately spliced targets from the human genome and 47 negative controls.

    Gene prediction and verification in a compact genome with numerous small introns

    The genomes of clusters of related eukaryotes are now being sequenced at an increasing rate, creating a need for accurate, low-cost annotation of exon–intron structures. In this paper, we demonstrate that reverse transcription-polymerase chain reaction (RT–PCR) and direct sequencing based on predicted gene structures satisfy this need, at least for single-celled eukaryotes. The TWINSCAN gene prediction algorithm was adapted for the fungal pathogen Cryptococcus neoformans by using a precise model of intron lengths in combination with ungapped alignments between the genome sequences of the two closely related Cryptococcus varieties. This approach resulted in ∼60% of known genes being predicted exactly right at every coding base and splice site. When previously unannotated TWINSCAN predictions were tested by RT–PCR and direct sequencing, 75% of targets spanning two predicted introns were amplified and produced high-quality sequence. When targets spanning the complete predicted open reading frame were tested, 72% of them amplified and produced high-quality sequence. We conclude that sequencing a small number of expressed sequence tags (ESTs) to provide training data, running TWINSCAN on an entire genome, and then performing RT–PCR and direct sequencing on all of its predictions would be a cost-effective method for obtaining an experimentally verified genome annotation

    Foregut microbiome in development of esophageal adenocarcinoma

    Esophageal adenocarcinoma (EA), the type of cancer linked to heartburn due to gastroesophageal reflux disease (GERD), has increased sixfold in the past 30 years. This increase cannot currently be explained by the usual environmental factors or by host genetic factors. EA is the end result of a sequence of GERD-related diseases, preceded by reflux esophagitis (RE) and Barrett’s esophagus (BE). Preliminary studies by Pei and colleagues at NYU on elderly male veterans identified two types of microbiota in the esophagus. Patients who carry the type II microbiota are more than 15-fold more likely to have esophagitis and BE than those harboring the type I microbiota. In a small-scale study, we also found that 3 of 3 cases of EA harbored the type II microbiota. These findings have opened a new approach to understanding the recent surge in the incidence of EA.

    Our long-term goal is to identify the cause of the GERD sequence. The hypothesis to be tested is that changes in the foregut microbiome are associated with EA and its precursors, RE and BE, in the GERD sequence. We will conduct a case-control study to demonstrate the microbiome-disease association at every stage of the GERD sequence, and to analyze the trend in microbiome changes along disease progression toward EA, through two specific aims. Aim 1 is to conduct a comprehensive population survey of the foregut microbiome and demonstrate its association with the GERD sequence. The spatial relationship between the esophageal microbiota and the upstream (mouth) and downstream (stomach) foregut microbiotas, as well as the temporal stability of the microbiome-disease association, will also be examined. Aim 2 is to define the distal esophageal metagenome and demonstrate its association with the GERD sequence. Detailed analyses will include pathway-disease and gene-disease associations. Archaea, fungi, and viruses, if identified, will also be correlated with the diseases. A significant association between the foregut microbiome and the GERD sequence, if demonstrated, will be the first step toward eventually testing whether an abnormal microbiome is required for the development of the sequence of phenotypic changes toward EA. If EA and its precursors represent a microecological disease, treating the cause of GERD might become possible, for example, by normalizing the microbiota through the use of antibiotics, probiotics, or prebiotics. Causative therapy of GERD could prevent its progression and reverse the current trend of increasing incidence of EA.

    Molecular simulation of multi-component adsorption processes related to carbon capture in a high surface area, disordered activated carbon

    We employ a previously developed model of a high surface area activated carbon, based on a random packing of small fragments of a carbon sheet functionalized with hydroxyl surface groups, to explore adsorption of water and multicomponent mixtures under conditions representing typical carbon capture processes. Adsorption of water initiates and proceeds through the growth of clusters around the surface groups, in a process predominantly governed by hydrogen bond interactions. In contrast, the energetically favorable locations for carbon dioxide molecules differ from those for water, with the main contribution coming from the Lennard-Jones interactions with the extended surfaces of the fragments. This explains why, over a broad range of conditions, small amounts of water do not have any substantial impact on adsorption of carbon dioxide and other species in activated carbons. In studies of various carbon capture processes, the model material shows promising properties for pre-combustion capture due to its large capacity at high pressures and other favorable characteristics.

    Constrained Optimization for Validation-Guided Conditional Random Field Learning

    Conditional random fields (CRFs) are a class of undirected graphical models that have been widely used for classifying and labeling sequence data. The training of CRFs is typically formulated as an unconstrained optimization problem that maximizes the conditional likelihood. However, maximum likelihood training is prone to overfitting. To address this issue, we propose a novel constrained nonlinear optimization formulation in which the prediction accuracy on cross-validation sets is included as a set of constraints. Instead of requiring multiple passes of training, the constrained formulation allows cross-validation to be handled in one pass of constrained optimization. The new formulation is discontinuous, and classical Lagrangian-based constraint handling methods are not applicable. A new constrained optimization algorithm based on the recently proposed extended saddle point theory is developed to learn the constrained CRF model. Experimental results on gene and stock-price prediction tasks show that the constrained formulation is able to significantly improve the generalization ability of CRF training.

    A tale of two templates: Automatically resolving double traces has many applications, including efficient PCR-based elucidation of alternative splices

    Trace Recalling is a novel method for deconvoluting double traces that result from simultaneously sequencing two DNA templates. Trace Recalling identifies up to two bases at each position of such a trace. The resulting ambiguity sequence is aligned to the genome, identifying one template sequence. A second template sequence is then inferred from this alignment. This technique makes possible many exciting biological applications. Here we present two such applications: alternate splice finding and elucidation of multiple insertion sites in a random insertional mutagenesis library. Our results demonstrate that RT–PCR followed by Trace Recalling is a more efficient and cost-effective way to find alternate splices than traditional methods. We also present a method for mapping double-insertion events in a random insertional mutagenesis library.
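
    The inference step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the ambiguity sequence is written in standard IUPAC two-base codes and that the first template is already aligned to it position by position with no gaps. The function name `infer_second_template` is hypothetical.

```python
# Standard IUPAC codes for single bases and two-base ambiguities.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG",
    "W": "AT", "K": "GT", "M": "AC",
}


def infer_second_template(ambiguity_seq: str, first_template: str) -> str:
    """Infer the second template from an ambiguity sequence and one known template.

    Assumption: both strings are the same length and aligned without gaps.
    Positions where the known base is inconsistent with the ambiguity code
    are reported as 'N' instead of being guessed.
    """
    if len(ambiguity_seq) != len(first_template):
        raise ValueError("sequences must be the same length")

    second = []
    for code, base in zip(ambiguity_seq.upper(), first_template.upper()):
        bases = IUPAC.get(code, "")
        if base not in bases:
            second.append("N")      # known base not called at this position
        elif len(bases) == 1:
            second.append(base)     # both templates share this base
        else:
            second.append(bases.replace(base, ""))  # the other called base
    return "".join(second)


if __name__ == "__main__":
    print(infer_second_template("ARCY", "AGCT"))
```

    A real double trace would also need spliced alignment and handling of insertions, deletions, and miscalled peaks, none of which this sketch attempts.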