15 research outputs found

    The European Reference Genome Atlas: piloting a decentralised approach to equitable biodiversity genomics.

    ABSTRACT: A global genome database of all of Earth’s species diversity could be a treasure trove of scientific discoveries. However, despite major advances in genome sequencing technologies, only a tiny fraction of species have genomic information available. To contribute to a more complete planetary genomic database, scientists and institutions across the world have united under the Earth BioGenome Project (EBP), which plans to sequence and assemble high-quality reference genomes for all ~1.5 million recognized eukaryotic species through a stepwise, phased approach. As the initiative transitions into Phase II, in which 150,000 species are to be sequenced in just four years, worldwide participation in the project will be fundamental to its success. As the European node of the EBP, the European Reference Genome Atlas (ERGA) seeks to implement a new decentralised, accessible, equitable and inclusive model for producing high-quality reference genomes, which will inform EBP as it scales. To embark on this mission, ERGA launched a Pilot Project to establish a network across Europe to develop and test the first infrastructure of its kind for coordinated, distributed reference genome production, covering 98 European eukaryotic species from sample providers across 33 European countries. Here we outline the process and challenges faced during the development of a pilot infrastructure for the production of reference genome resources, and explore the effectiveness of this approach in terms of high-quality reference genome production, while also considering equity and inclusion. The outcomes and lessons learned during this pilot provide a solid foundation for ERGA while offering key learnings to other transnational and national genomic resource projects.

    GraphUnzip: unzipping assembly graphs with long reads and Hi-C

    Long reads and Hi-C have revolutionized the field of genome assembly, as they have made highly contiguous assemblies accessible even for challenging genomes. As haploid chromosome-level assemblies are now commonly achieved for all types of organisms, phasing assemblies has become the new frontier for genome reconstruction. Several tools that use long reads and/or Hi-C to phase assemblies have already been released, but they all start from a set of linear sequences and are ill-suited to non-model organisms with high levels of heterozygosity. We present GraphUnzip, a fast, memory-efficient and flexible tool to unzip assembly graphs into their constituent haplotypes using long reads and/or Hi-C data. Because GraphUnzip only connects sequences that already had a potential link in the assembly graph, it yields high-quality, gapless supercontigs. To demonstrate the efficiency of GraphUnzip, we tested it on the human HG00733 genome and on the potato Solanum tuberosum. In both cases, GraphUnzip yielded phased assemblies with improved contiguity.
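GraphUnzip's key constraint, connecting only sequences that are already adjacent in the assembly graph, can be illustrated with a toy example. The sketch below is a hypothetical Python illustration, not GraphUnzip's code or interface: it takes a collapsed node `H` flanked by haplotype-specific contigs, counts how often long reads traverse each left → `H` → right combination, and greedily pairs neighbours by read support. The names `unzip_node` and `haplotype_paths` and the `min_support` cutoff are assumptions made for illustration, and Hi-C evidence is ignored.

```python
from collections import Counter


def unzip_node(node, in_neighbors, out_neighbors, read_paths, min_support=2):
    """Pair incoming and outgoing neighbours of a collapsed `node`.

    Hypothetical helper, not GraphUnzip's API: a pair (left, right) is
    considered only if both are already graph neighbours of `node`,
    so no new adjacencies are invented.
    """
    support = Counter()
    for path in read_paths:
        # Count every left -> node -> right traversal seen in a long read.
        for left, mid, right in zip(path, path[1:], path[2:]):
            if mid == node and left in in_neighbors and right in out_neighbors:
                support[(left, right)] += 1

    pairs, used_left, used_right = [], set(), set()
    # Greedily accept the best-supported pairings first.
    for (left, right), count in support.most_common():
        if count < min_support:
            break
        if left in used_left or right in used_right:
            continue
        pairs.append((left, right))
        used_left.add(left)
        used_right.add(right)
    return pairs


def haplotype_paths(node, pairs):
    """Duplicate `node` once per accepted pairing: one path per haplotype."""
    return [[left, f"{node}_copy{i}", right] for i, (left, right) in enumerate(pairs)]


# Toy data: two haplotypes A1-H-B1 and A2-H-B2, plus one noisy read.
reads = [["A1", "H", "B1"]] * 3 + [["A2", "H", "B2"]] * 3 + [["A1", "H", "B2"]]
pairs = unzip_node("H", {"A1", "A2"}, {"B1", "B2"}, reads)
print(pairs)  # [('A1', 'B1'), ('A2', 'B2')]
print(haplotype_paths("H", pairs))
```

Pairing greedily by support keeps the sketch short; a real tool would likely also need to weigh Hi-C contacts, conflicting reads and coverage when deciding how many copies of a collapsed node to create.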

    Revisiting genomes of non-model species with long reads yields new insights into their biology and evolution (Data Sheet 1)

    Compared with genomes obtained using short-read technologies, high-quality genomes obtained from long-read data not only allow a better understanding of heterozygosity levels and repeat content, as well as more accurate gene annotation and prediction, but also make it possible to study haplotype divergence. Advances in long-read sequencing technologies in recent years have made it possible to produce such high-quality assemblies for non-model organisms. This allows us to revisit genomes that had been difficult to scaffold to chromosome scale with previous generations of data and assembly software. Nematoda, one of the most diverse and species-rich phyla within Metazoa, remains poorly studied, and many previously assembled nematode genomes are fragmented. Using long reads obtained with Nanopore R10.4.1 and PacBio HiFi, we generated highly contiguous assemblies of a diploid nematode of the family Mermithidae, for which no closely related genomes are available to date, as well as a collapsed assembly and a phased assembly of a triploid nematode from the family Panagrolaimidae. Both genomes had been analysed before, but the earlier fragmented assemblies had scaffold sizes comparable to the length of unassembled long reads. Our new assemblies illustrate how long-read technologies allow a much better representation of species genomes. We are now able to conduct more accurate downstream assays based on more complete gene and transposable element predictions.

    SeSAM: software for automatic construction of order-robust linkage maps

    Genotyping and sequencing technologies produce increasingly large numbers of genetic markers with potentially high rates of missing or erroneous data. As a result, the construction of linkage maps is becoming increasingly complex. Moreover, the size of segregating populations remains constrained by cost and is increasingly out of proportion with the number of SNPs available. Thus, guaranteeing a statistically robust marker order requires that maps include only a carefully selected subset of SNPs. In this context, the SeSAM software allows automatic genetic map construction using seriation and placement approaches to produce (1) a high-robustness framework map, which includes as many markers as possible while keeping order robustness above a given statistical threshold, and (2) a high-density total map including the framework plus almost all polymorphic markers. During this process, care is taken to limit the impact of genotyping errors and missing data on mapping quality. SeSAM can be used with a wide range of biparental populations, including those from outcrossing species, for which phases are inferred on the fly by maximum likelihood during map elongation. The package also includes functions to simulate data sets, convert data formats, detect putative genotyping errors, visualize data and map quality (including graphical genotypes), and merge several maps into a consensus. SeSAM is also suitable for interactive map construction, as it provides lower-level functions for 2-point and multipoint EM analyses. The software is implemented as an R package including functions in C++. SeSAM is a fully automatic linkage mapping software designed to (1) produce a framework map as robust as desired by optimizing the selection of a subset of markers, and (2) produce a high-density map including almost all polymorphic markers. The software can be used with a wide range of biparental mapping populations, including outcrossing populations.
SeSAM is freely available under a GNU GPL v3 license and works on Linux, Windows, and macOS platforms. It is available as Additional file 1 and can be downloaded together with its user manual and quick-start tutorial from ForgeMIA (SeSAM project) at https://forgemia.inra.fr/gqe-acep/sesam/-/release
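The abstract mentions lower-level functions for 2-point EM analyses. In the simplest case, a backcross with known phase where both markers are typed in every individual considered, the two-point maximum-likelihood recombination fraction has a closed form (recombinants divided by individuals), so no EM iterations are needed; EM becomes necessary with missing genotypes or unknown phases, as in outcrossing populations. The sketch below assumes that simple backcross case and uses hypothetical function names; it is not part of SeSAM, which is an R package. It returns the estimate r̂ and the LOD score log10[L(r̂)/L(0.5)], and converts r̂ to centimorgans with Haldane's mapping function.

```python
import math


def two_point_backcross(geno_a, geno_b):
    """Two-point recombination fraction and LOD for a backcross (hypothetical helper).

    Genotypes are coded 0/1; None marks missing data, and individuals
    missing either marker are skipped. Assumes phase is known.
    """
    pairs = [(a, b) for a, b in zip(geno_a, geno_b) if a is not None and b is not None]
    n = len(pairs)
    if n == 0:
        raise ValueError("no individuals typed at both markers")
    k = sum(a != b for a, b in pairs)  # recombinant individuals
    r = min(k / n, 0.5)  # MLE constrained to [0, 0.5]

    def xlog10(x, y):
        # x * log10(y), with the convention 0 * log10(0) = 0
        return 0.0 if x == 0 else x * math.log10(y)

    # LOD = log10( L(r) / L(0.5) ), where L(r) = r^k * (1 - r)^(n - k)
    lod = xlog10(k, r) + xlog10(n - k, 1 - r) - n * math.log10(0.5)
    return r, lod


def haldane_cm(r):
    """Convert a recombination fraction to map distance (cM) with Haldane's function."""
    return math.inf if r >= 0.5 else -50.0 * math.log(1 - 2 * r)


a = [0, 0, 0, 0, 1, 1, 1, 1, 0, 1]
b = [0, 0, 0, 1, 1, 1, 1, 1, 0, 1]  # one recombinant out of ten
r, lod = two_point_backcross(a, b)
print(round(r, 3), round(lod, 2), round(haldane_cm(r), 2))  # 0.1 1.6 11.16
```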

    Chromosome-level genome assembly and annotation of two lineages of the ant Cataglyphis hispanica: stepping stones towards genomic studies of hybridogenesis and thermal adaptation in desert ants

    Cataglyphis are thermophilic ants that forage during the day when temperatures are highest and sometimes close to their critical thermal limit. Several Cataglyphis species have evolved unusual reproductive systems such as facultative queen parthenogenesis or social hybridogenesis, which have not yet been investigated in detail at the molecular level. We generated high-quality genome assemblies for two hybridogenetic lineages of the Iberian ant Cataglyphis hispanica using long-read Nanopore sequencing and used chromosome conformation capture (3C) sequencing to assemble contigs into 26 and 27 chromosomes, respectively. Further karyotype analyses confirm this difference in chromosome numbers between lineages; however, they also suggest it may not be fixed among lineages. We obtained transcriptomic data to assist gene annotation and built custom repeat libraries for each of the two assemblies. Comparative analyses with 19 other published ant genomes were also conducted. These new genomic resources pave the way for exploring the genetic mechanisms underlying the remarkable thermal adaptation and the molecular mechanisms associated with transitions between different genetic systems characteristic of the ant genus Cataglyphis.

    Computer vision for pattern detection in chromosome contact maps

    Chromosomes of all species studied so far display a variety of higher-order organisational features, such as self-interacting domains or loops. These structures, which are often associated with biological functions, form distinct, visible patterns on genome-wide contact maps generated by chromosome conformation capture approaches such as Hi-C. Here we present Chromosight, an algorithm inspired by computer vision that can detect patterns in contact maps. Chromosight has greater sensitivity than existing methods on synthetic simulated data, while being faster and applicable to any type of genome, including those of bacteria, viruses, yeasts and mammals. Our method does not require any prior training dataset and works well with default parameters on data generated with various protocols.
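Chromosight is described as a computer-vision-inspired method that needs no training data. One standard way to detect a pattern without training is template matching: slide a small kernel over the contact map and score each window by its Pearson correlation with the kernel. The Python sketch below assumes that technique and a hypothetical Gaussian "loop" kernel; it is not Chromosight's implementation or API, and it omits the normalisation and distance-decay correction a real Hi-C workflow would apply before matching. The toy map is not symmetric, unlike a real intrachromosomal contact map.

```python
import numpy as np


def gaussian_kernel(size=5, sigma=1.0):
    """Hypothetical 'loop' template: a 2-D Gaussian spot."""
    x = np.arange(size) - (size - 1) / 2
    return np.exp(-(x[:, None] ** 2 + x[None, :] ** 2) / (2 * sigma ** 2))


def pearson_match(contact_map, kernel):
    """Pearson correlation between `kernel` and every same-sized window of `contact_map`.

    Entry (i, j) scores the window whose top-left corner is (i, j);
    windows with zero variance are left as NaN.
    """
    kh, kw = kernel.shape
    h, w = contact_map.shape
    k = kernel - kernel.mean()
    k_norm = np.sqrt((k ** 2).sum())
    scores = np.full((h - kh + 1, w - kw + 1), np.nan)
    for i in range(h - kh + 1):
        for j in range(w - kw + 1):
            win = contact_map[i:i + kh, j:j + kw]
            centred = win - win.mean()
            denom = k_norm * np.sqrt((centred ** 2).sum())
            if denom > 0:
                scores[i, j] = (k * centred).sum() / denom
    return scores


rng = np.random.default_rng(0)
kernel = gaussian_kernel()
hic = rng.random((20, 20)) * 0.01  # weak random background
hic[6:11, 9:14] += kernel          # plant one loop-like spot
scores = pearson_match(hic, kernel)
best = np.unravel_index(np.nanargmax(scores), scores.shape)
print(best, scores[best])  # top-left corner of the best-matching window
```

Pearson correlation is insensitive to the overall signal level of a window, which is one reason template matching of this kind can work without training data; calling patterns would additionally need a score threshold and local-maximum filtering.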

    Chromosomal assembly of the flat oyster (Ostrea edulis L.) genome as a new genetic resource for aquaculture

    The European flat oyster (Ostrea edulis L.) is a bivalve native to European coasts. Harvests of this species have declined over recent decades because of the emergence of two parasites that have led to the collapse of stocks and the loss of natural oyster beds. O. edulis has been the subject of numerous studies in population genetics and on the detection of the parasites Bonamia ostreae and Marteilia refringens. These studies investigated immune responses to these parasites at the molecular and cellular levels. Several genetic improvement programs have been initiated, especially for parasite resistance. Within the framework of a European project (PERLE 2) that aims to produce genetic lines of O. edulis with hardiness traits (growth, survival, resistance) to repopulate natural oyster beds in Brittany and revive the culture of this species on the foreshore, obtaining a reference genome has become essential, as has recently been done for many bivalve species of aquaculture interest. Here, we present a chromosome-level genome assembly and annotation for the European flat oyster, generated by combining PacBio, Illumina, 10X linked-read, and Hi-C sequencing. The finished assembly is 887.2 Mb, with a scaffold N50 of 97.1 Mb, scaffolded onto the expected 10 pseudochromosomes. Annotation of the genome revealed 35,962 protein-coding genes. We analyzed in detail the transposable element (TE) diversity in the flat oyster genome, highlighted some specificities in tRNA and miRNA composition, and provided the first insight into the molecular response of O. edulis to M. refringens. This genome provides a reference for genomic studies of O. edulis, helping to better understand its basic physiology, and a useful resource for genetic breeding in support of aquaculture and natural reef restoration.
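The scaffold N50 quoted for the flat oyster (97.1 Mb for an 887.2 Mb assembly) means that scaffolds at least 97.1 Mb long together contain at least half of the assembled sequence. A minimal sketch of the standard computation follows; the function name and the toy lengths are illustrative, not the oyster data.

```python
def n50(lengths):
    """Return the N50 of a set of sequence lengths.

    Sort lengths in descending order and return the length at which the
    running total first reaches at least half of the total assembly size.
    """
    if not lengths:
        raise ValueError("no sequences")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # 8  (10 + 9 + 8 = 27 = half of 54)
```

Some tools instead report NG50, which uses an estimated genome size rather than the assembly size as the denominator, so the two values can differ for the same assembly.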

    Chromosome-level genome assembly reveals homologous chromosomes and recombination in asexual rotifer Adineta vaga

    Bdelloid rotifers are notorious as a speciose ancient clade comprising only asexual lineages. Thanks to their ability to repair highly fragmented DNA, most bdelloid species also withstand complete desiccation and ionizing radiation. Producing a well-assembled reference genome is a critical step towards understanding the effects of long-term asexuality and DNA breakage on genome evolution. To this end, we present the first high-quality chromosome-level genome assemblies for the bdelloid Adineta vaga, composed of six pairs of homologous (diploid) chromosomes with a footprint of paleotetraploidy. The observed large-scale losses of heterozygosity are signatures of recombination between homologous chromosomes, either during mitotic DNA double-strand break repair or when resolving programmed DNA breaks during a modified meiosis. Dynamic subtelomeric regions harbor more structural diversity (e.g., chromosome rearrangements, transposable elements, and haplotypic divergence). Our results prompt a reappraisal of potential meiotic processes in bdelloid rotifers and help unravel the factors underlying their long-term asexual evolutionary success.
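Large-scale losses of heterozygosity (LOH) typically show up as long stretches along a chromosome where heterozygous sites between homologues become rare. The sketch below is a hypothetical illustration of spotting such stretches with a simple threshold-and-run-length scan over per-window heterozygous-site densities; the function name, threshold and minimum run length are assumptions for illustration, not the method or parameters used for A. vaga.

```python
import math


def loh_runs(het_density, threshold, min_windows):
    """Return half-open (start, end) window ranges where heterozygosity stays
    below `threshold` for at least `min_windows` consecutive windows."""
    runs, start = [], None
    # A trailing sentinel above any threshold closes a run that reaches the end.
    for i, d in enumerate(list(het_density) + [math.inf]):
        if d < threshold:
            if start is None:
                start = i  # a low-heterozygosity run begins
        else:
            if start is not None and i - start >= min_windows:
                runs.append((start, i))  # keep runs that are long enough
            start = None
    return runs


# Toy heterozygous sites per kb in consecutive windows along one chromosome.
density = [5, 4, 0.1, 0.2, 0.0, 0.1, 6, 5, 0.1, 7, 0, 0, 0]
print(loh_runs(density, threshold=0.5, min_windows=3))  # [(2, 6), (10, 13)]
```

Requiring a minimum run length filters out isolated low-density windows, such as window 8 above, which are more likely to reflect local sequence conservation or mapping artefacts than a genuine LOH tract.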