
    Identification of NAD interacting residues in proteins

    Background: Small molecular cofactors and ligands play a crucial role in the proper functioning of cells. Accurate annotation of their target proteins and binding sites is required for a complete understanding of reaction mechanisms. Nicotinamide adenine dinucleotide (NAD+ or NAD) is one of the most commonly used organic cofactors in living cells and plays a critical role in cellular metabolism, storage and regulatory processes. Several NAD binding proteins (NADBP), responsible for a wide range of cellular activities, have been reported in the literature. Attempts have been made to derive a rule for the binding of NAD+ to its target proteins. However, no efficient model has been derived so far, owing to the time-consuming process of structure determination and the limitations of similarity-based approaches. A sequence-based, non-similarity-based method is therefore needed to characterize NAD binding sites and support annotation. In this study, we attempted to predict NAD binding proteins and their NAD interacting residues (NIRs) from amino acid sequence using bioinformatics tools. Results: We extracted 1556 protein chains from 555 NAD binding proteins whose structures are available in the Protein Data Bank. After removing redundant chains, we obtained 195 non-redundant NAD binding protein chains, no two of which share more than 40% sequence identity. All models were developed and evaluated on this dataset of 195 NAD binding proteins using five-fold cross-validation. Certain types of residues (e.g. Gly, Tyr, Thr, His) are preferred in NAD interaction, whereas residues such as Ala, Glu, Leu and Lys are not. A support vector machine (SVM) based method using various window lengths of amino acid sequence was developed for predicting NAD interacting residues. It achieved a maximum Matthews correlation coefficient (MCC) of 0.47 with an accuracy of 74.13% at window length 17. We also developed an SVM-based method using evolutionary information in the form of a position-specific scoring matrix (PSSM), which achieved a maximum MCC of 0.75 with an accuracy of 87.25%. Conclusion: For the first time, a sequence-based method has been developed for the prediction of NAD binding proteins and their interacting residues in the absence of any prior structural information. The present model will aid in understanding NAD+ dependent mechanisms of action in the cell. To serve the scientific community, we have developed a user-friendly web server, available at http://www.imtech.res.in/raghava/nadbinder/
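The abstract does not spell out how a sequence window becomes SVM input. A common choice, assumed here rather than taken from the paper, is to one-hot encode the residues in a window centred on each position and pad past the chain ends. The sketch below does this for window length 17 and also computes MCC and accuracy, the two reported metrics. The padding symbol `X`, the function names and the toy counts are hypothetical. Training the SVM itself and the five-fold cross-validation loop are omitted.

```python
import math

# 20 standard amino acids plus a hypothetical padding symbol "X", used for
# window positions beyond either end of the chain (and for unknown letters).
ALPHABET = "ACDEFGHIKLMNPQRSTVWYX"
PAD_INDEX = ALPHABET.index("X")


def window_features(sequence, window=17):
    """Return one binary feature vector per residue.

    Each vector one-hot encodes the `window` residues centred on that
    residue, so its length is window * len(ALPHABET).
    """
    if window % 2 == 0:
        raise ValueError("window length must be odd so it has a centre residue")
    half = window // 2
    padded = "X" * half + sequence.upper() + "X" * half
    features = []
    for i in range(len(sequence)):
        vector = []
        for residue in padded[i:i + window]:
            one_hot = [0] * len(ALPHABET)
            idx = ALPHABET.find(residue)
            one_hot[idx if idx >= 0 else PAD_INDEX] = 1
            vector.extend(one_hot)
        features.append(vector)
    return features


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def accuracy(tp, tn, fp, fn):
    """Fraction of residues classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


# Usage: features for a toy chain, then scores for made-up counts.
feats = window_features("MKTAYIAKQR")
print(len(feats), len(feats[0]))  # 10 357
print(round(mcc(40, 80, 10, 20), 3), accuracy(40, 80, 10, 20))  # 0.577 0.8
```

MCC is a useful headline metric here because interacting residues are a minority class. On imbalanced data, accuracy alone can look high even when few true binding residues are recovered.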

    Monitoring Influenza Activity in the United States: A Comparison of Traditional Surveillance Systems with Google Flu Trends

    Google Flu Trends was developed to estimate US influenza-like illness (ILI) rates from internet searches; however, ILI does not necessarily correlate with actual influenza virus infections. Influenza activity data from 2003-04 through 2007-08 were obtained from three US surveillance systems: Google Flu Trends, the CDC Outpatient ILI Surveillance Network (CDC ILI Surveillance), and the US Influenza Virologic Surveillance System (CDC Virus Surveillance). Pearson's correlation coefficients with 95% confidence intervals (95% CI) were calculated to compare surveillance data. An analysis was performed to identify outlier observations and determine the extent to which they affected the correlations between surveillance data. The Pearson correlation coefficient between Google Flu Trends and CDC Virus Surveillance over the study period was 0.72 (95% CI: 0.64, 0.79). The correlation between CDC ILI Surveillance and CDC Virus Surveillance over the same period was 0.85 (95% CI: 0.81, 0.89). Most of the outlier observations in both comparisons were from the 2003-04 influenza season. Exclusion of the outlier observations did not substantially improve the correlation between Google Flu Trends and CDC Virus Surveillance (0.82; 95% CI: 0.76, 0.87) or between CDC ILI Surveillance and CDC Virus Surveillance (0.86; 95% CI: 0.82, 0.90). This analysis demonstrates that while Google Flu Trends is highly correlated with rates of ILI, it has a lower correlation with surveillance for laboratory-confirmed influenza. Most of the outlier observations occurred during the 2003-04 influenza season, which was characterized by early and intense influenza activity that potentially altered health care seeking behavior, physician testing practices, and internet search behavior.
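The abstract reports Pearson correlations with 95% confidence intervals but does not say how the intervals were computed. A standard approach, assumed here, is Fisher's z-transformation. Under it, z = atanh(r) is approximately normal with standard error 1/sqrt(n - 3), and the interval is mapped back to the r scale with tanh. This produces intervals that are slightly asymmetric around r, consistent with those reported above. The function names and the sample size in the usage lines are hypothetical, because the number of observations in the study is not stated here.

```python
import math
from statistics import NormalDist


def pearson_r(x, y):
    """Pearson's product-moment correlation coefficient."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_ci(r, n, level=0.95):
    """Approximate confidence interval for r via Fisher's z-transformation.

    Requires -1 < r < 1 and n > 3.
    """
    if n <= 3:
        raise ValueError("Fisher's interval needs n > 3")
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


# Usage with a hypothetical number of weekly observations (n = 150).
lo, hi = fisher_ci(0.72, 150)
print(f"r = 0.72, 95% CI: {lo:.2f}, {hi:.2f}")
```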

    Genome of the Avirulent Human-Infective Trypanosome—Trypanosoma rangeli

    Background: Trypanosoma rangeli is a hemoflagellate protozoan parasite infecting humans and other wild and domestic mammals across Central and South America. It does not cause human disease, but it can be mistaken for the etiologic agent of Chagas disease, Trypanosoma cruzi. We have sequenced the T. rangeli genome to provide new tools for elucidating the distinct and intriguing biology of this species and the key pathways related to interaction with its arthropod and mammalian hosts. Methodology/Principal Findings: The T. rangeli haploid genome is ∼24 Mb in length and is the smallest and least repetitive trypanosomatid genome sequenced thus far. This parasite genome has shorter subtelomeric sequences than those of T. cruzi and T. brucei, displays intraspecific karyotype variability and lacks minichromosomes. Of the 7,613 predicted protein-coding sequences, functional annotations could be determined for 2,415, while 5,043 are hypothetical proteins, some with evidence of protein expression. In total, 7,101 genes (93%) are shared with other trypanosomatids that infect humans. An ortholog of the dcl2 gene involved in the T. brucei RNAi pathway was found in T. rangeli, but the RNAi machinery is non-functional because the other genes in this pathway are pseudogenized. T. rangeli is highly susceptible to oxidative stress, a phenotype that may be explained by a smaller number of antioxidant defense enzymes and heat shock proteins. Conclusions/Significance: Phylogenetic comparison of nuclear and mitochondrial genes indicates that T. rangeli and T. cruzi are equidistant from T. brucei. In addition to revealing new aspects of trypanosome co-evolution within the vertebrate and invertebrate hosts, comparative genomic analysis with pathogenic trypanosomatids provides valuable new information that can be further explored with the aim of developing better diagnostic tools and/or therapeutic targets.

    The Bicoid Stability Factor Controls Polyadenylation and Expression of Specific Mitochondrial mRNAs in Drosophila melanogaster

    The bicoid stability factor (BSF) of Drosophila melanogaster has been reported to be present in the cytoplasm, where it stabilizes the maternally contributed bicoid mRNA and binds mRNAs expressed from early zygotic genes. BSF may also have other roles, as it is ubiquitously expressed and essential for survival of adult flies. We have performed immunofluorescence and cell fractionation analyses and show here that BSF is mainly a mitochondrial protein. We studied two independent RNAi knockdown fly lines and report that reduced BSF protein levels lead to a severe respiratory deficiency and delayed development at the late larval stage. Ubiquitous knockdown of BSF results in a severe reduction of the poly(A) tail lengths of specific mitochondrial mRNAs, accompanied by an enrichment of unprocessed polycistronic RNA intermediates. Furthermore, we observed a significant reduction in steady-state mRNA levels, despite increased de novo transcription. Surprisingly, mitochondrial de novo translation is increased and abnormal mitochondrial translation products are present in knockdown flies, suggesting that BSF also has a role in coordinating mitochondrial translation in addition to its role in mRNA maturation and stability. We thus report a novel function of BSF in flies and demonstrate that it has an important intra-mitochondrial role, which is essential for maintaining mtDNA gene expression and oxidative phosphorylation.

    Chromatin-associated regulation of sorbitol synthesis in flower buds of peach

    Key message: The PpeS6PDH gene is postulated to mediate sorbitol synthesis in flower buds of peach, concomitantly with specific chromatin modifications. Abstract: Perennial plants have evolved an adaptive mechanism involving protection of meristems within specialized structures called buds in order to survive low temperatures and water deprivation during winter. A seasonal period of dormancy further improves the tolerance of buds to environmental stresses through specific mechanisms that are poorly understood at the molecular level. We have shown that the peach PpeS6PDH gene is down-regulated in flower buds after dormancy release, concomitantly with changes in the methylation level at specific lysine residues of histone H3 (H3K27 and H3K4) in the chromatin around the translation start site of the gene. PpeS6PDH encodes an NADPH-dependent sorbitol-6-phosphate dehydrogenase, the key enzyme for sorbitol biosynthesis. Consistently, sorbitol accumulates in dormant buds, which show higher PpeS6PDH expression. Moreover, PpeS6PDH expression is affected by cold and water deficit stress. In particular, it is up-regulated by low temperature in buds and leaves, whereas desiccation treatment induces PpeS6PDH in buds and represses it in leaves. These data reveal the concurrent participation of chromatin modification mechanisms, transcriptional regulation of PpeS6PDH and sorbitol accumulation in flower buds of peach. In addition to its role as a major translocatable photosynthate in Rosaceae species, sorbitol is a widespread compatible solute and cryoprotectant, which suggests its participation in tolerance to environmental stresses in flower buds of peach.
This work was funded by the Instituto Nacional de Investigacion y Tecnologia Agraria y Alimentaria (INIA)-FEDER (RF2013-00043-C02-02) and the Ministry of Science and Innovation of Spain (AGL2010-20595).
AL was funded by a fellowship co-financed by the European Social Fund and the Instituto Valenciano de Investigaciones Agrarias (IVIA).
Lloret, A.; Martinez Fuentes, A.; Agustí Fonfría, M.; Badenes, ML.; Rios, G. (2017). Chromatin-associated regulation of sorbitol synthesis in flower buds of peach. Plant Molecular Biology. 95(4-5):507-517. https://doi.org/10.1007/s11103-017-0669-6

    Comparative Genomics of the Mating-Type Loci of the Mushroom Flammulina velutipes Reveals Widespread Synteny and Recent Inversions

    Mating-type loci of mushroom fungi contain master regulatory genes that control recognition between compatible nuclei, maintenance of compatible nuclei as heterokaryons, and fruiting body development. Regions near mating-type loci in fungi often show adapted recombination, which facilitates the generation of novel mating types and reduces the production of self-compatible mating types. Compared to other fungi, mushroom fungi have complex mating-type systems, with both loci of redundant function (subloci) and subloci with many alleles. The genomic organization of mating-type loci has been resolved in very few mushroom species, which complicates the interpretation of mating-type evolution and the use of these genes in breeding programs. We report the complete genetic structure of the mating-type loci of the tetrapolar edible mushroom Flammulina velutipes, mating type A3B3. Two matB3 subloci, matB3a (which contains a unique pheromone) and matB3b, were mapped 177 kb apart on scaffold 1. The matA locus of F. velutipes contains three homeodomain genes distributed over the matA3a and matA3b subloci, which lie 73 kb apart. The conserved matA region in Agaricales approaches 350 kb and contains conserved recombination hotspots that show major rearrangements in F. velutipes and Schizophyllum commune. Important evolutionary differences were identified. The separation of the matA subloci in F. velutipes diverged from the Coprinopsis cinerea arrangement through two large inversions, whereas the separation in S. commune emerged through transposition of gene clusters. In our study, we determined that the Agaricales have very large-scale synteny at matA (∼350 kb) and that this synteny is maintained even when parts of this region are separated by chromosomal rearrangements. Four conserved recombination hotspots allow reshuffling of large fragments of this region. In addition, we found that subloci separated by large distances can also exist in matB. Finally, the genes that were linked to specific mating types will serve as molecular markers in breeding.
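The following is a simplified model of the tetrapolar logic, assumed rather than taken from this study. Nuclei are treated as fully compatible only when they carry different alleles at both matA and matB. Because the subloci act redundantly, a difference at any one sublocus counts as a difference at that locus. The allele numbers and function names are hypothetical. Real systems have further complications, such as pheromone and receptor specificities within matB, which are ignored here.

```python
from typing import Dict, Tuple

# Hypothetical representation: for each locus ("A", "B"), a tuple of allele
# numbers, one per sublocus (e.g. sublocus a, sublocus b).
MatingType = Dict[str, Tuple[int, ...]]


def locus_differs(alleles1: Tuple[int, ...], alleles2: Tuple[int, ...]) -> bool:
    """Subloci are treated as redundant: one mismatching sublocus suffices."""
    return any(a != b for a, b in zip(alleles1, alleles2))


def compatible(mt1: MatingType, mt2: MatingType) -> bool:
    """Tetrapolar rule: full compatibility needs both matA and matB to differ."""
    return locus_differs(mt1["A"], mt2["A"]) and locus_differs(mt1["B"], mt2["B"])


a3b3 = {"A": (3, 3), "B": (3, 3)}
partner = {"A": (1, 3), "B": (3, 2)}  # differs at one sublocus of each locus
same_a = {"A": (3, 3), "B": (1, 1)}   # identical matA alleles
print(compatible(a3b3, partner), compatible(a3b3, same_a))  # True False
```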

    Efficient representation of uncertainty in multiple sequence alignments using directed acyclic graphs

    Background: A standard procedure in many areas of bioinformatics is to use a single multiple sequence alignment (MSA) as the basis for various types of analysis. However, downstream results may be highly sensitive to the alignment used, and neglecting the uncertainty in the alignment can lead to significant bias in the resulting inference. In recent years, a number of approaches have been developed for probabilistic sampling of alignments, rather than simply generating a single optimum. However, this type of probabilistic information is currently not widely used in the context of downstream inference, since most existing algorithms are set up to make use of a single alignment. Results: In this work we present a framework for representing a set of sampled alignments as a directed acyclic graph (DAG) whose nodes are alignment columns; each path through this DAG then represents a valid alignment. Since the probabilities of individual columns can be estimated from empirical frequencies, this approach enables sample-based estimation of posterior alignment probabilities. Moreover, due to conditional independencies between columns, the graph structure encodes a much larger set of alignments than the original set of sampled MSAs, such that the effective sample size is greatly increased. Conclusions: The alignment DAG provides a natural way to represent a distribution in the space of MSAs, and allows existing algorithms to be efficiently scaled up to operate on large sets of alignments. As an example, we show how this can be used to compute marginal probabilities for tree topologies, averaging over a very large number of MSAs. This framework can also be used to generate a statistically meaningful summary alignment; example applications show that this summary alignment is consistently more accurate than the majority of the alignment samples, leading to improvements in downstream tree inference. Implementations of the methods described in this article are available at http://statalign.github.io/WeaveAlign
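The following minimal Python sketch shows how a handful of samples can encode more alignments than were drawn. It makes three assumptions for illustration, none taken from the WeaveAlign code. A column is identified by the residue index each sequence contributes, with a gap encoded by the negated index of that sequence's preceding residue. Edges are the consecutive column pairs observed in the samples. Column probabilities are plain sample frequencies. The function names and toy alignments are also hypothetical.

```python
from collections import defaultdict

GAP = "-"


def column_keys(alignment):
    """Turn an alignment (equal-length gapped rows) into hashable column keys.

    Per sequence, a key stores the 1-based index of the residue in that column
    or, for a gap, minus the index of the last residue seen so far (0 before
    the first residue). The key therefore fixes how far each sequence has
    advanced, so an observed successor of a column is valid in any path.
    """
    seen = [0] * len(alignment)
    keys = []
    for j in range(len(alignment[0])):
        key = []
        for i, row in enumerate(alignment):
            if row[j] != GAP:
                seen[i] += 1
                key.append(seen[i])
            else:
                key.append(-seen[i])
        keys.append(tuple(key))
    return keys


def build_dag(samples):
    """Merge sampled MSAs into a DAG whose nodes are alignment columns.

    Returns (column_prob, successors): the empirical frequency of each column
    across samples, and the edges between columns seen consecutively.
    """
    counts = defaultdict(int)
    successors = defaultdict(set)
    for aln in samples:
        path = ["START"] + column_keys(aln) + ["END"]
        for col in path[1:-1]:
            counts[col] += 1
        for a, b in zip(path, path[1:]):
            successors[a].add(b)
    column_prob = {col: c / len(samples) for col, c in counts.items()}
    return column_prob, successors


def count_alignments(successors):
    """Count START-to-END paths, i.e. the alignments the DAG encodes."""
    memo = {}

    def paths_from(node):
        if node == "END":
            return 1
        if node not in memo:
            memo[node] = sum(paths_from(nxt) for nxt in successors[node])
        return memo[node]

    return paths_from("START")


# Two samples that disagree independently before and after a shared G/G column.
samples = [["AAGAA", "A-GA-"], ["AAGAA", "-AG-A"]]
probs, succ = build_dag(samples)
print(count_alignments(succ))  # 4: two choices before the shared column, two after
print(probs[(3, 2)])           # 1.0: the G/G column occurs in every sample
```

Two samples already yield four distinct alignments, because the choices on either side of the shared column combine freely. This is the effect the abstract describes as an increased effective sample size. A summary alignment could then be chosen by a path search over the same graph, for example maximizing summed column probabilities, but that criterion is an assumption and is not shown.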