223 research outputs found

    Identification of long non-coding RNAs in insects genomes

    The development of high-throughput sequencing (HTS) technologies has allowed researchers to better assess the complexity and diversity of the transcriptome. Among the many classes of non-coding RNAs (ncRNAs) identified over the last decade, long non-coding RNAs (lncRNAs) represent a diverse and numerous repertoire of important ncRNAs, reinforcing the view that they are of central importance to the cell machinery in all branches of life. Although lncRNAs have been implicated in essential biological processes such as imprinting, gene regulation and dosage compensation, especially in mammals, the repertoire of lncRNAs is poorly characterized for many non-model organisms. In this review, we first focus on what is known about experimentally validated lncRNAs in insects and then review bioinformatic methods to annotate lncRNAs in the genomes of hexapods.

    LEVIATHAN: efficient discovery of large structural variants by leveraging long-range information from Linked-Reads data

    Linked-Reads technologies, popularized by 10x Genomics, combine the high quality and low cost of short-read sequencing with long-range information by adding barcodes that tag reads originating from the same long DNA fragment. Thanks to their high quality and long-range information, such reads are particularly useful for applications such as genome scaffolding and structural variant calling. As a result, multiple structural variant calling methods have been developed within the last few years. However, these methods were mainly tested on human data, and do not run well on non-human organisms, for which reference genomes are highly fragmented or sequencing data display high levels of heterozygosity. Moreover, even on human data, most tools still require large amounts of computing resources. We present LEVIATHAN, a new structural variant calling tool that aims to address these issues, and especially to scale better and apply to a wide variety of organisms. Our method relies on a barcode index that allows the similarity of all possible pairs of regions to be compared quickly in terms of the number of barcodes they share. Region pairs sharing a sufficient number of barcodes are then considered as potential structural variants, and complementary, classical short-read methods are applied to further refine the breakpoint coordinates. Our experiments on simulated data show that our method compares well to the state of the art, in terms of recall and precision as well as resource consumption. Moreover, LEVIATHAN was successfully applied to a real dataset from a non-model organism, while all other tools either failed to run or required unreasonable amounts of resources. LEVIATHAN is implemented in C++, supported on Linux platforms, and available under the AGPL-3.0 License at https://github.com/morispi/LEVIATHAN
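
    The barcode-sharing idea described in this abstract can be illustrated with a minimal Python sketch. It is not LEVIATHAN's implementation (the tool is written in C++); the function names, the fixed region size, the threshold and the toy alignments are all assumptions made for illustration. A real caller would also have to discount adjacent regions, which share barcodes simply because they lie on the same molecule.

```python
from collections import defaultdict
from itertools import combinations


def build_barcode_index(alignments, region_size=1000):
    """Map each (chrom, region_number) to the set of barcodes seen in it.

    `alignments` is an iterable of (chrom, position, barcode) tuples;
    this input format is a simplification chosen for the sketch.
    """
    index = defaultdict(set)
    for chrom, pos, barcode in alignments:
        index[(chrom, pos // region_size)].add(barcode)
    return index


def candidate_region_pairs(index, min_shared=2):
    """Return region pairs sharing at least `min_shared` barcodes."""
    # Invert the index so that only regions that actually share a
    # barcode are ever paired, instead of testing every pair of regions.
    regions_by_barcode = defaultdict(set)
    for region, barcodes in index.items():
        for barcode in barcodes:
            regions_by_barcode[barcode].add(region)

    shared = defaultdict(int)
    for regions in regions_by_barcode.values():
        for a, b in combinations(sorted(regions), 2):
            shared[(a, b)] += 1
    return {pair: n for pair, n in shared.items() if n >= min_shared}


# Hypothetical toy data: two distant chr1 regions share two barcodes.
toy_alignments = [
    ("chr1", 100, "AAAC"),
    ("chr1", 150, "AAAG"),
    ("chr1", 50200, "AAAC"),
    ("chr1", 50900, "AAAG"),
    ("chr2", 300, "AAAT"),
]
toy_index = build_barcode_index(toy_alignments)
toy_candidates = candidate_region_pairs(toy_index, min_shared=2)
```

    With the toy data, the only candidate pair is the one linking regions 0 and 50 of chr1. Candidate pairs of this kind are what the abstract says are then refined with classical short-read methods.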

    LRez: C++ API and toolkit for analyzing and managing Linked-Reads data

    Linked-Reads technologies, such as 10x Genomics, Haplotagging, stLFR and TELL-Seq, partition and tag high-molecular-weight DNA molecules with a barcode using a microfluidic device prior to classical short-read sequencing. In this way, Linked-Reads combine the high quality of short reads with long-range information, which can be inferred by using the barcodes to identify distant reads belonging to the same DNA molecule. This technology can thus be employed efficiently in various applications, such as structural variant calling, genome assembly, phasing and scaffolding. To benefit from Linked-Reads data, most methods first map the reads against a reference genome, and then rely on the analysis of the barcode contents of genomic regions, which often requires fetching all reads or alignments with a given barcode. However, although various tools and libraries are available for processing BAM files, to the best of our knowledge no tool currently exists for managing Linked-Reads barcodes and offering features such as indexing, querying, and comparison of barcode contents. LRez aims to address this issue by providing a complete and easy-to-use API and suite of tools that are directly compatible with various Linked-Reads sequencing technologies. LRez provides functionalities such as extracting, indexing and querying Linked-Reads barcodes in BAM, FASTQ, and gzipped FASTQ files (Table 1). The API is compiled as a shared library, easing its integration into external projects. Moreover, all functionalities are implemented in a thread-safe fashion. Our experiments show that, on a 70 GB Haplotagging BAM file from Heliconius erato [1], index construction took an hour and resulted in an index occupying 11 GB of RAM. Using this index, querying time per barcode averaged 11 ms. In comparison, using a naive approach without a barcode-based index, querying time per barcode reached an hour.
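
    The gap between indexed and naive querying reported in this abstract comes from replacing a full scan per query with a single lookup. The Python sketch below shows the principle on an in-memory list; it does not use the LRez API, and the record layout, function names and toy records are assumptions made for illustration.

```python
from collections import defaultdict


def build_index(records):
    """Build a barcode -> list of record offsets index in one pass.

    `records` is a list of (read_name, barcode) tuples standing in for
    the alignments of a BAM file (a simplification for this sketch).
    """
    index = defaultdict(list)
    for offset, (_, barcode) in enumerate(records):
        index[barcode].append(offset)
    return dict(index)


def query_indexed(records, index, barcode):
    """Fetch read names for a barcode by jumping to the stored offsets."""
    return [records[offset][0] for offset in index.get(barcode, [])]


def query_naive(records, barcode):
    """Fetch read names for a barcode by scanning every record."""
    return [name for name, record_barcode in records if record_barcode == barcode]


# Hypothetical toy records.
toy_records = [
    ("read1", "ACGT"),
    ("read2", "TTGA"),
    ("read3", "ACGT"),
    ("read4", "GGCC"),
]
toy_barcode_index = build_index(toy_records)
```

    Both query functions return the same reads; the indexed version pays the scanning cost once, at construction time, which mirrors the trade-off the abstract reports between index construction and per-barcode query time.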

    Integration of phenotypic, environmental and biodiversity data using Semantic Web technologies

    Many studies rely on phenotypic observations of species of interest, data on the environmental conditions under which the observations were made, and metagenomic data measuring biodiversity. The problem is that each study develops an ad hoc data model that experts must then learn. This complicates both the data acquisition phase and the analysis phase, especially when the latter requires automated reasoning over associated knowledge bases such as the NCBI Taxon species hierarchy or phenotype ontologies.

    MTG-Link: filling gaps in draft genome assemblies with linked read data

    De novo genome assembly is a challenging task, especially for large non-model organism genomes. Low sequence coverage, genomic repeats and heterozygosity often create ambiguities in the assembly, and result in undefined sequences between contigs called "gaps". Hence, filling gaps in draft genomes has become a natural sub-problem of many de novo genome assembly projects. Even though there are several tools for closing gaps, to our knowledge none uses the long-range information of linked read data. Linked read technologies have great potential for filling gaps in draft genomes as they provide long-range information while maintaining the power and accuracy of short-read sequencing. In this work, we present MTG-Link, a novel gap-filling tool dedicated to linked read data. Taking advantage of the barcode information contained in the linked read dataset, a subsample of reads is first selected for each gap. These reads are then locally assembled and the resulting gap-filled sequences are automatically evaluated. We validated our approach on a real 10x Genomics linked read dataset, on a set of simulated gaps, and showed that the read subsampling step of MTG-Link yields better gap assemblies in a time- and memory-efficient manner. We also applied MTG-Link to individual genomes of a mimetic butterfly (Heliconius numata), where it significantly improved the contiguity of a 1.3 Mb locus of biological interest. MTG-Link is freely available at https://github.com/anne-gcd/MTG-Link
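
    The read subsampling step mentioned in this abstract can be sketched in Python as follows. The abstract does not say how barcodes are selected, so the rule used here (keep barcodes observed at least `min_count` times in the regions flanking the gap) is an assumption, as are the function names, the flank size and the toy data. This is not MTG-Link's code.

```python
from collections import Counter


def flank_barcodes(alignments, chrom, gap_start, gap_end, flank=10000, min_count=2):
    """Collect barcodes seen at least `min_count` times in the gap flanks.

    `alignments` is an iterable of (chrom, position, barcode) tuples.
    The selection rule is an assumption made for this sketch.
    """
    counts = Counter()
    for a_chrom, pos, barcode in alignments:
        if a_chrom != chrom:
            continue
        in_left_flank = gap_start - flank <= pos < gap_start
        in_right_flank = gap_end <= pos < gap_end + flank
        if in_left_flank or in_right_flank:
            counts[barcode] += 1
    return {barcode for barcode, n in counts.items() if n >= min_count}


def subsample_reads(reads, barcodes):
    """Keep only the reads whose barcode belongs to the selected set."""
    return [name for name, barcode in reads if barcode in barcodes]


# Hypothetical toy data: a gap at chr1:20000-21000 with 10 kb flanks.
toy_gap_alignments = [
    ("chr1", 15000, "ACGT"),   # left flank
    ("chr1", 22000, "ACGT"),   # right flank
    ("chr1", 19000, "TTGA"),   # left flank, seen only once
    ("chr1", 90000, "GGCC"),   # far from the gap
    ("chr2", 20500, "GGCC"),   # other chromosome
]
toy_reads = [("r1", "ACGT"), ("r2", "TTGA"), ("r3", "GGCC"), ("r4", "ACGT")]
toy_selected = flank_barcodes(toy_gap_alignments, "chr1", 20000, 21000)
toy_subsample = subsample_reads(toy_reads, toy_selected)
```

    Only the reads carrying a selected barcode would be passed on to local assembly, which is how subsampling can reduce time and memory compared with assembling the full read set.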

    Multi-scale characterization of symbiont diversity in the pea aphid complex through metagenomic approaches

    Most metazoans are involved in durable symbiotic relationships with microbes, which can take several forms, from mutualism to parasitism. Advances in NGS technologies and bioinformatics tools have opened new opportunities to shed light on this hidden but very influential diversity. The pea aphid is a model insect system for symbiont studies. It harbors both an obligatory symbiont supplying key nutrients and several facultative symbionts bringing novel functions to the host, such as protection against natural enemies and thermal stress. The pea aphid is organized in a complex of biotypes, each adapted to a specific host plant of the legume family and having its own symbiont composition. Yet, the metagenomic diversity of the biotype-associated symbiotic community is still largely unknown. In particular, little is known about how the symbiotic genomic diversity is structured at different scales: across host biotypes, among individuals of the same biotype, or within individual aphids. We used high-throughput whole-genome metagenomic sequencing to characterize at fine resolution the metagenomic diversity of both individual resequenced aphids and biotype-specific pooled aphids. Using a reference genome mapping approach, we first assessed the taxonomic diversity of the samples and built symbiont-specific read sets. We then performed genome-wide SNP calling to examine the differences in bacterial strains between samples. Our results revealed different diversity patterns at the three considered scales for the pea aphid symbionts. At the inter-biotype and intra-biotype scales, the primary symbiont Buchnera and some secondary symbionts such as Serratia showed biotype-specific diversity. We showed evidence for horizontal transfer of a Hamiltonella strain between biotypes, and found two distinct strains of Regiella symbionts within some biotypes. At the finest scale, intra-host diversity, we also showed that these two strains of Regiella may coexist inside the same aphid host. This study highlights the huge potential of bioinformatics analyses of metagenomic datasets for exploring microbiota diversity in relation to host variation.

    Spodoptera frugiperda (Lepidoptera: Noctuidae) host-plant variants: two host strains or two distinct species?

    The moth Spodoptera frugiperda is a well-known pest of crops throughout the Americas, which consists of two strains adapted to different host-plants: the first feeds preferentially on corn, cotton and sorghum whereas the second is more associated with rice and several pasture grasses. Though morphologically indistinguishable, they differ in their mating behavior and pheromone composition, and show developmental variability according to the host-plant. Though these differences suggest that the two strains are different species, the issue is still highly controversial because hybrids naturally occur in the wild, not to mention the discrepancies among published results concerning mating success between the two strains. In order to clarify the status of the two host-plant strains of S. frugiperda, we analyzed features that possibly reflect the level of post-zygotic isolation: (1) first generation (F1) hybrid lethality and sterility; (2) patterns of meiotic segregation of hybrids in the reciprocal second generation (F2), as compared to the meiosis of the two parental strains. We found a significant reduction of mating success in F1 in one direction of the cross and a high proportion of microsatellite markers showing transmission ratio distortion in the F2 progeny. Our results support the existence of post-zygotic reproductive isolation between the two laboratory strains and are in accordance with the marked level of genetic differentiation that was recovered between individuals of the two strains collected from the field. Altogether these results provide additional evidence in favor of a sibling species status for the two strains.

    A comparison of the olfactory gene repertoires of adults and larvae in the noctuid moth Spodoptera littoralis

    To better understand the olfactory mechanisms in a lepidopteran pest model species, the cotton leafworm Spodoptera littoralis, we recently established a partial transcriptome from adult antennae. Here, we completed this transcriptome using next-generation sequencing technologies, namely 454 and Illumina, on both adult antennae and larval tissues, including caterpillar antennae and maxillary palps. All sequences were assembled into 77,643 contigs. Their analysis greatly enriched the repertoire of chemosensory genes in this species, with a total of 57 candidate odorant-binding and chemosensory proteins, 47 olfactory receptors, 6 gustatory receptors and 17 ionotropic receptors. Using RT-PCR, we conducted the first exhaustive comparison of olfactory gene expression between larvae and adults in a lepidopteran species. All 127 candidate olfactory genes were profiled for expression in male and female adult antennae and in caterpillar antennae and maxillary palps. We found that caterpillars expressed a smaller set of olfactory genes than adults, with a large overlap between these two developmental stages. Two binding proteins appeared to be larva-specific and two others were adult-specific. Interestingly, comparison between caterpillar antennae and maxillary palps revealed numerous organ-specific transcripts, suggesting the complementary involvement of these two organs in larval chemosensory detection. Adult males and females shared the same set of olfactory transcripts, except for two male-specific candidate pheromone receptors, and two male-specific and two female-specific odorant-binding proteins. This study identified transcripts that may be important for sex-specific or developmental stage-specific chemosensory behaviors.