563 research outputs found

    Validating module network learning algorithms using simulated data

    In recent years, several authors have used probabilistic graphical models to learn expression modules and their regulatory programs from gene expression data. Here, we demonstrate the use of the synthetic data generator SynTReN for the purpose of testing and comparing module network learning algorithms. We introduce a software package for learning module networks, called LeMoNe, which incorporates a novel strategy for learning regulatory programs. Novelties include the use of a bottom-up Bayesian hierarchical clustering to construct the regulatory programs, and the use of a conditional entropy measure to assign regulators to the regulation program nodes. Using SynTReN data, we test the performance of LeMoNe in a completely controlled situation and assess the effect of the methodological changes we made with respect to an existing software package, namely Genomica. Additionally, we assess the effect of various parameters, such as the size of the data set and the amount of noise, on the inference performance. Overall, application of Genomica and LeMoNe to simulated data sets gave comparable results. However, LeMoNe offers some advantages, one of them being that the learning process is considerably faster for larger data sets. Additionally, we show that the location of the regulators in the LeMoNe regulation programs and their conditional entropy may be used to prioritize regulators for functional validation, and that the combination of the bottom-up clustering strategy with the conditional entropy-based assignment of regulators improves the handling of missing or hidden regulators. Comment: 13 pages, 6 figures + 2 pages, 2 figures supplementary information

    Improved chromosome-level genome assembly and annotation of the seagrass, Zostera marina (eelgrass)

    Background: Seagrasses (Alismatales) are the only fully marine angiosperms. Zostera marina (eelgrass) plays a crucial role in the functioning of coastal marine ecosystems and global carbon sequestration. It is the most widely studied seagrass and has become a marine model system for exploring adaptation under rapid climate change. The original draft genome (v.1.0) of the seagrass Z. marina (L.) was based on a combination of Illumina mate-pair libraries and fosmid-ends. A total of 25.55 Gb of Illumina and 0.14 Gb of Sanger sequence was obtained representing 47.7× genomic coverage. The assembly resulted in ~2000 unordered scaffolds (L50 of 486 Kb), a final genome assembly size of 203 Mb, 20,450 protein coding genes and 63% TE content. Here, we present an upgraded chromosome-scale genome assembly and compare v.1.0 and the new v.3.1, reconfirming previous results from Olsen et al. (2016), as well as pointing out new findings. Methods: The same high molecular weight DNA used in the original sequencing of the Finnish clone was used. A high-quality reference genome was assembled with the MECAT assembly pipeline combining PacBio long-read sequencing and Hi-C scaffolding. Results: In total, 75.97 Gb PacBio data was produced. The final assembly comprises six pseudo-chromosomes and 304 unanchored scaffolds with a total length of 260.5 Mb and an N50 of 34.6 Mb, showing high contiguity and few gaps (~0.5%). 21,483 protein-encoding genes are annotated in this assembly, of which 20,665 (96.2%) obtained at least one functional assignment based on similarity to known proteins. Conclusions: As an important marine angiosperm, the improved Z. marina genome assembly will further assist evolutionary, ecological, and comparative genomics at the chromosome level. The new genome assembly will further our understanding of the structural and physiological adaptations from land to marine life.

    Genome-wide analysis of butterfly bush in three uplands provides insights into biogeography, demography and speciation

    Understanding processes that generate and maintain large disjunctions within plant species can provide valuable insights into plant diversity and speciation. The butterfly bush Buddleja alternifolia has an unusual disjunct distribution, occurring in the Himalaya, Hengduan Mountains (HDM) and the Loess Plateau (LP) in China. We generated a high-quality, chromosome-level genome assembly of B. alternifolia, the first within the family Scrophulariaceae. Whole-genome re-sequencing data from 48 populations plus morphological and petal colour reflectance data covering its full distribution range were collected. Three distinct genetic lineages of B. alternifolia were uncovered, corresponding to Himalayan, HDM and LP populations, with the last also differentiated morphologically and phenologically, indicating occurrence of allopatric speciation likely to be facilitated by geographic isolation and divergent adaptation to distinct ecological niches. Moreover, speciation with gene flow between populations from either side of a mountain barrier could be under way within LP. The current disjunctions within B. alternifolia might result from vicariance of a once widespread distribution, followed by several past contraction and expansion events, possibly linked to climate fluctuations promoted by the Kunlun-Yellow river tectonic movement. Several adaptive genes are likely to be either uniformly or diversely selected among regions, providing a footprint of local adaptations. These findings provide new insights into plant biogeography, adaptation and different processes of allopatric speciation.

    An Update on MyoD Evolution in Teleosts and a Proposed Consensus Nomenclature to Accommodate the Tetraploidization of Different Vertebrate Genomes

    DJM was supported by a Natural Environment Research Council studentship (NERC/S/A/2004/12435). Background: MyoD is a muscle specific transcription factor that is essential for vertebrate myogenesis. In several teleost species, including representatives of the Salmonidae and Acanthopterygii, but not zebrafish, two or more MyoD paralogues are conserved that are thought to have arisen from distinct, possibly lineage-specific duplication events. Additionally, two MyoD paralogues have been characterised in the allotetraploid frog, Xenopus laevis. This has led to a confusing nomenclature since MyoD paralogues have been named outside of an appropriate phylogenetic framework. Methods and Principal Findings: Here we initially show that directly depicting the evolutionary relationships of teleost MyoD orthologues and paralogues is hindered by the asymmetric evolutionary rate of Acanthopterygian MyoD2 relative to other MyoD proteins. Thus our aim was to confidently position the event from which teleost paralogues arose in different lineages by a comparative investigation of genes neighbouring myod across the vertebrates. To this end, we show that genes on the single myod-containing chromosome of mammals and birds are retained in both zebrafish and Acanthopterygian teleosts in a striking pattern of double conserved synteny. Further, phylogenetic reconstruction of these neighbouring genes using Bayesian and maximum likelihood methods supported a common origin for teleost paralogues following the split of the Actinopterygii and Sarcopterygii. Conclusion: Our results strongly suggest that myod was duplicated during the basal teleost whole genome duplication event, but was subsequently lost in the Ostariophysi (zebrafish) and Protacanthopterygii lineages. We propose a sensible consensus nomenclature for vertebrate myod genes that accommodates polyploidization events in teleost and tetrapod lineages and is justified from a phylogenetic perspective. Publisher PDF. Peer reviewed.

    Chromosome-scale assembly of the Moringa oleifera Lam. genome uncovers polyploid history and evolution of secondary metabolism pathways through tandem duplication

    The African Orphan Crops Consortium (AOCC) selected the highly nutritious, fast growing and drought tolerant tree crop moringa (Moringa oleifera Lam.) as one of the first of 101 plant species to have its genome sequenced and a first draft assembly was published in 2019. Given the extensive uses and culture of moringa, often referred to as the multipurpose tree, we generated a significantly improved new version of the genome based on long-read sequencing into 14 pseudochromosomes equivalent to n = 14 haploid chromosomes. We leveraged this nearly complete version of the moringa genome to investigate main drivers of gene family and genome evolution that may be at the origin of relevant biological innovations including agronomically favorable traits. Our results reveal that moringa has not undergone any additional whole-genome duplication (WGD) or polyploidy event beyond the gamma WGD shared by all core eudicots. Moringa duplicates retained following that ancient gamma event are also enriched for functions commonly considered as dosage balance sensitive. Furthermore, tandem duplications seem to have played a prominent role in the evolution of specific secondary metabolism pathways including those involved in the biosynthesis of bioactive glucosinolate, flavonoid, and alkaloid compounds as well as of defense response pathways and might, at least partially, explain the outstanding phenotypic plasticity attributed to this species. This study provides a genetic roadmap to guide future breeding programs in moringa, especially those aimed at improving secondary metabolism related traits. https://wileyonlinelibrary.com/journal/tpg2 dm2022 Biochemistry; Genetics; Microbiology and Plant Pathology

    Improved chromosome-level genome assembly and annotation of the seagrass, Zostera marina (eelgrass)

    BACKGROUND: Seagrasses (Alismatales) are the only fully marine angiosperms. Zostera marina (eelgrass) plays a crucial role in the functioning of coastal marine ecosystems and global carbon sequestration. It is the most widely studied seagrass and has become a marine model system for exploring adaptation under rapid climate change. The original draft genome (v.1.0) of the seagrass Z. marina (L.) was based on a combination of Illumina mate-pair libraries and fosmid-ends. A total of 25.55 Gb of Illumina and 0.14 Gb of Sanger sequence was obtained representing 47.7× genomic coverage. The assembly resulted in ~2000 unordered scaffolds (L50 of 486 Kb), a final genome assembly size of 203 Mb, 20,450 protein coding genes and 63% TE content. Here, we present an upgraded chromosome-scale genome assembly and compare v.1.0 and the new v.3.1, reconfirming previous results from Olsen et al. (2016), as well as pointing out new findings. METHODS: The same high molecular weight DNA used in the original sequencing of the Finnish clone was used. A high-quality reference genome was assembled with the MECAT assembly pipeline combining PacBio long-read sequencing and Hi-C scaffolding. RESULTS: In total, 75.97 Gb PacBio data was produced. The final assembly comprises six pseudo-chromosomes and 304 unanchored scaffolds with a total length of 260.5 Mb and an N50 of 34.6 Mb, showing high contiguity and few gaps (~0.5%). 21,483 protein-encoding genes are annotated in this assembly, of which 20,665 (96.2%) obtained at least one functional assignment based on similarity to known proteins. CONCLUSIONS: As an important marine angiosperm, the improved Z. marina genome assembly will further assist evolutionary, ecological, and comparative genomics at the chromosome level. The new genome assembly will further our understanding of the structural and physiological adaptations from land to marine life. The DOE-Joint Genome Institute, Berkeley, CA, USA, Community Sequencing Program 2019. http://f1000research.com/ am2022 Biochemistry; Genetics; Microbiology and Plant Pathology

    Elusive Origins of the Extra Genes in Aspergillus oryzae

    The genome sequence of Aspergillus oryzae revealed unexpectedly that this species has approximately 20% more genes than its congeneric species A. nidulans and A. fumigatus. Where did these extra genes come from? Here, we evaluate several possible causes of the elevated gene number. Many gene families are expanded in A. oryzae relative to A. nidulans and A. fumigatus, but we find no evidence of ancient whole-genome duplication or other segmental duplications, either in A. oryzae or in the common ancestor of the genus Aspergillus. We show that the presence of divergent pairs of paralogs is a feature peculiar to A. oryzae and is not shared with A. nidulans or A. fumigatus. In phylogenetic trees that include paralog pairs from A. oryzae, we frequently find that one of the genes in a pair from A. oryzae has the expected orthologous relationship with A. nidulans, A. fumigatus and other species in the subphylum Eurotiomycetes, whereas the other A. oryzae gene falls outside this clade but still within the Ascomycota. We identified 456 such gene pairs in A. oryzae. Further phylogenetic analysis did not however indicate a single consistent evolutionary origin for the divergent members of these pairs. Approximately one-third of them showed phylogenies that are suggestive of horizontal gene transfer (HGT) from Sordariomycete species, and these genes are closer together in the A. oryzae genome than expected by chance, but no unique Sordariomycete donor species was identifiable. The postulated HGTs from Sordariomycetes still leave the majority of extra A. oryzae genes unaccounted for. One possible explanation for our observations is that A. oryzae might have been the recipient of many separate HGT events from diverse donors.

    Genome-wide analysis of butterfly bush (Buddleja alternifolia) in three uplands provides insights into biogeography, demography and speciation

    Understanding processes that generate and maintain large disjunctions within plant species can provide valuable insights into plant diversity and speciation. The butterfly bush Buddleja alternifolia has an unusual disjunct distribution, occurring in the Himalaya, Hengduan Mountains (HDM) and the Loess Plateau (LP) in China. We generated a high-quality, chromosome-level genome assembly of B. alternifolia, the first within the family Scrophulariaceae. Whole-genome re-sequencing data from 48 populations plus morphological and petal colour reflectance data covering its full distribution range were collected. Three distinct genetic lineages of B. alternifolia were uncovered, corresponding to Himalayan, HDM and LP populations, with the last also differentiated morphologically and phenologically, indicating occurrence of allopatric speciation likely to be facilitated by geographic isolation and divergent adaptation to distinct ecological niches. Moreover, speciation with gene flow between populations from either side of a mountain barrier could be under way within LP. The current disjunctions within B. alternifolia might result from vicariance of a once widespread distribution, followed by several past contraction and expansion events, possibly linked to climate fluctuations promoted by the Kunlun–Yellow river tectonic movement. Several adaptive genes are likely to be either uniformly or diversely selected among regions, providing a footprint of local adaptations. These findings provide new insights into plant biogeography, adaptation and different processes of allopatric speciation. Supplementary Material 1: Dataset S1 Morphological measurement and floral colour reflectance data for populations of Buddleja alternifolia. Fig. S1 Phylogenetic trees inferred by ASTRAL- and ML-based approaches. Fig. S2 Patterns of linkage disequilibrium (LD). Fig. S3 Models 1–3, during the process of divergence among the three linkages, no gene flows with no changes in effective population size (Model 1); with changes in effective population sizes starting from the divergence of TB (TDIV1), as well as SC and HT (TDIV2, Model 2); with changes in effective population sizes starting from TDIV1. Fig. S4 The phylogenomic tree used for time assignment of divergence for ancestral area reconstruction using representative samples of B. alternifolia and three species in the genus that are currently available with re-sequencing data. Fig. S5 Cross-validation (CV) error and marginal likelihood values for different model K. Fig. S6 Reconstructing the phylogenomic relationships for 46 species of Buddleja using single-copy genes. Methods S1 Site ancestral state estimation. Methods S2 Estimating mutation rate of B. alternifolia. Methods S3 Reconstructing the phylogenomic relationships for 46 species of Buddleja using single-copy genes. Supplementary Material 2: Notes S1 Reproducibility of analyses for BEAST and r8s files. Supplementary Material 3: Table S1 Statistics of all assemblies. Table S2 Basic information with regards to genomes of 17 plants that were used for gene family analysis and the phylogenetic tree construction. Table S3 A matrix of geographic distances among populations. Table S4 Environmental parameters used for assessment of ecological niche differentiation in B. alternifolia. Table S5 Geographical coordinates of B. alternifolia. Table S6 WGS-PacBio sequencing statistics. Table S7 WGS Illumina sequencing statistics. Table S8 HiC sequencing statistics. Table S9 Repeat annotations of the Buddleja alternifolia genome assembly. Table S10 Gene annotation statistics of the Buddleja alternifolia assembly. Table S11 Functional annotation of predicted genes in the Buddleja alternifolia genome. Table S12 Summary of the gene family analyses. Table S13 Basic information on location and genome mapping characteristics of all sampled individuals. Table S14 Summary of SNP annotations. Table S15 Global pairwise Fst between areas at the whole-genome level. Table S16 Pairwise Fst between areas in the divided nine subgroups of the whole genome, that is, eight in the gene region and one in the intergene region. Table S17 Results of nine models used in the fastsimcoal analysis. Table S18 Basic parameters of three models compared in BioGeoBears, that is, Dec and Divalike based on dispersal-vicariance analysis, and Bayarea based on Bayesian inference of historical biogeography for discrete areas, with and without the founder-event speciation ‘J’ parameter. Table S19 Results of IBD and IBA analysis using simple and partial Mantel tests. Table S20 Shared genes detected by both approaches, red colour font indicating the shared genes of a significant overrepresentation with a specific GO term (P < 0.05). Table S21 Annotation of genes with significant GO terms (P < 0.05) detected by both approaches. Please note: Wiley Blackwell are not responsible for the content or functionality of any Supporting Information supplied by the authors. Any queries (other than missing material) should be directed to the New Phytologist Central Office. National Natural Science Foundation of China; Second Tibetan Plateau Scientific Expedition and Research Programme; Yunnan Science and Technology Innovation Team Programme for PSESP Conservation and Utilisation; Youth Innovation Promotion Association, Chinese Academy of Sciences. https://nph.onlinelibrary.wiley.com/journal/14698137 am2022 Biochemistry; Genetics; Microbiology and Plant Pathology

    Anaerobic utilization of pectinous substrates at extremely haloalkaline conditions by Natranaerovirga pectinivora gen. nov., sp. nov., and Natranaerovirga hydrolytica sp. nov., isolated from hypersaline soda lakes

    Anaerobic enrichments at pH 10, with pectin and polygalacturonates as substrates and inoculated with samples of sediments of hypersaline soda lakes from the Kulunda Steppe (Altai, Russia), demonstrated the potential for microbial pectin degradation up to soda-saturating conditions. The enrichments resulted in the isolation of six strains of obligately anaerobic fermentative bacteria, which represented a novel deep lineage within the order Clostridiales loosely associated with the family Lachnospiraceae. The isolates were rod-shaped and formed terminal round endospores. One of the striking features of the novel group is a very narrow substrate spectrum for growth, restricted to galacturonic acid and its polymers (e.g. pectin). Acetate and formate were the final fermentation products. Growth was possible in a pH range from 8 to 10.5, with an optimum at pH 9.5–10, and in a salinity range from 0.2 to 3.5 M Na+. On the basis of unique phenotypic properties and distinct phylogeny, the pectinolytic isolates are proposed to be assigned to a new genus Natranaerovirga with two species N. hydrolytica (APP2T=DSM24176T=UNIQEM U806T) and N. pectinivora (AP3T=DSM24629T=UNIQEM U805T).