92 research outputs found

    GAM-NGS: genomic assemblies merger for next generation sequencing

    Background: In recent years more than 20 assemblers have been proposed to tackle the hard task of assembling NGS data. A common heuristic when assembling a genome is to use several assemblers and then select the best assembly according to some criteria. However, recent results clearly show that some assemblers lead to better statistics than others on specific regions but are outperformed on other regions or on different evaluation measures. To limit these problems we developed GAM-NGS (Genomic Assemblies Merger for Next Generation Sequencing), whose primary goal is to merge two or more assemblies in order to enhance the contiguity and correctness of both. GAM-NGS does not rely on global alignment: regions of the two assemblies representing the same genomic locus (called blocks) are identified through read alignments and stored in a weighted graph. The merging phase is carried out with the help of this weighted graph, which allows an optimal resolution of local problematic regions. Results: GAM-NGS has been tested on six different datasets and compared to other assembly reconciliation tools. The availability of a reference sequence for three of them allowed us to show that GAM-NGS outputs an improved, reliable set of sequences. GAM-NGS is also very efficient, merging assemblies with substantially fewer computational resources than comparable tools. To achieve these goals, GAM-NGS avoids global alignment between contigs, making its strategy unique among assembly reconciliation tools. Conclusions: The difficulty of obtaining correct and reliable assemblies with a single assembler is forcing the introduction of new algorithms able to enhance de novo assemblies. GAM-NGS is a tool able to merge two or more assemblies in order to improve contiguity and correctness. It can be used on all NGS-based assembly projects and it shows its full potential with multi-library Illumina-based projects.
With more than 20 available assemblers it is hard to select the best tool. In this context we propose a tool that improves assemblies (and, as a by-product, perhaps even assemblers) by merging them and selecting the sequence that is most likely to be correct.

    Metassembler: merging and optimizing de novo genome assemblies

    Genome assembly projects typically run multiple algorithms in an attempt to find the single best assembly, although those assemblies often have complementary, if untapped, strengths and weaknesses. We present our metassembler algorithm that merges multiple assemblies of a genome into a single superior sequence. We apply it to the four genomes from the Assemblathon competitions and show it consistently and substantially improves the contiguity and quality of each assembly. We also develop guidelines for meta-assembly by systematically evaluating 120 permutations of merging the top 5 assemblies of the first Assemblathon competition. The software is open-source at http://metassembler.sourceforge.net

    Reevaluating Assembly Evaluations with Feature Response Curves: GAGE and Assemblathons

    In just the last decade, a multitude of bio-technologies and software pipelines have emerged to revolutionize genomics. To further their central goal, they aim to accelerate and improve the quality of de novo whole-genome assembly starting from short DNA reads. However, the performance of each of these tools is contingent on the length and quality of the sequencing data, the structure and complexity of the genome sequence, and the resolution and quality of long-range information. Furthermore, in the absence of any metric that captures the most fundamental "features" of a high-quality assembly, there is no obvious recipe for users to select the most desirable assembler/assembly. International competitions such as Assemblathons or GAGE tried to identify the best assembler(s) and their features. Somewhat circuitously, the only available approach to gauge de novo assemblies and assemblers relies solely on the availability of a high-quality fully assembled reference genome sequence. Still worse, reference-guided evaluations are often difficult to analyze, leading to conclusions that are difficult to interpret. In this paper, we circumvent many of these issues by relying upon a tool, dubbed FRCbam, which is capable of evaluating de novo assemblies from the read layouts even when no reference exists. We extend the FRCurve approach to cases where layout information may have been obscured, as is true in many de Bruijn-graph-based algorithms. As a by-product, FRCurve now expands its applicability to a much wider class of assemblers -- thus, identifying higher-quality members of this group, their inter-relations as well as sensitivity to carefully selected features, with or without the support of a reference sequence or layout for the reads. The paper concludes by reevaluating several recently conducted assembly competitions and the datasets that have resulted from them. Comment: Submitted to PLoS One.
Supplementary material available at http://www.nada.kth.se/~vezzi/publications/supplementary.pdf and http://cs.nyu.edu/mishra/PUBLICATIONS/12.supplementaryFRC.pd

    The genome of cowpea (Vigna unguiculata [L.] Walp.)

    Cowpea (Vigna unguiculata [L.] Walp.) is a major crop for worldwide food and nutritional security, especially in sub-Saharan Africa, that is resilient to hot and drought-prone environments. An assembly of the single-haplotype inbred genome of cowpea IT97K-499-35 was developed by exploiting the synergies between single-molecule real-time sequencing, optical and genetic mapping, and an assembly reconciliation algorithm. A total of 519 Mb is included in the assembled sequences. Nearly half of the assembled sequence is composed of repetitive elements, which are enriched within recombination-poor pericentromeric regions. A comparative analysis of these elements suggests that genome size differences between Vigna species are mainly attributable to changes in the amount of Gypsy retrotransposons. Conversely, genes are more abundant in more distal, high-recombination regions of the chromosomes; there appears to be more duplication of genes within the NBS-LRR and the SAUR-like auxin superfamilies compared with other warm-season legumes that have been sequenced. A surprising outcome is the identification of an inversion of 4.2 Mb among landraces and cultivars, which includes a gene that has been associated in other plants with interactions with the parasitic weed Striga gesnerioides. The genome sequence facilitated the identification of a putative syntelog for multiple organ gigantism in legumes. A revised numbering system has been adopted for cowpea chromosomes based on synteny with common bean (Phaseolus vulgaris). An estimate of nuclear genome size of 640.6 Mbp based on cytometry is presented.

    Inserting Space into the Transformation of Higher Education

    In this article we argue for a socio-political conception of space in order to show how conceptualisations of space can provide conceptual tools in the reframing of policy and designing of policy interventions in pursuit of higher education transformation goals. In keeping with Lefebvre and others, we conceptualise space as a co-producer of social relations with agentic capability in the transformation of higher education. Using this understanding of space as a conceptual framework, we analyse four national cornerstone policy documents on higher education transformation in South Africa. We find that space is almost consistently conceived of only as an object in transformation – be it with respect to macro policy on mergers to reconfigure the apartheid spatial landscape of higher education, or with respect to discriminatory institutional cultures and the need to create secure and safe campus environments. Since the landmark White Paper on Higher Education of 1997, it is only the most recent policy document we analyse, the Draft National Plan for Post-school Education and Training of 2017, which blurs the lines between the social ills affecting higher education, the student experience and student academic performance, and different functions of space. We conclude by introducing the conceptual tool of spatial types as an opening gambit for a research agenda that aims to explore the organisation of space in higher education institutions to identify the underlying rules that govern their social nature and promote conceptualisations of social space in the reframing and design of policy that respond to calls for the creation of transformed and ‘decolonised’ higher education, as heard in student movement campaigns in 2015/16.

    Evolution of genes and genomes on the Drosophila phylogeny

    Comparative analysis of multiple genomes in a phylogenetic framework dramatically improves the precision and sensitivity of evolutionary inference, producing more robust results than single-genome analyses can provide. The genomes of 12 Drosophila species, ten of which are presented here for the first time (sechellia, simulans, yakuba, erecta, ananassae, persimilis, willistoni, mojavensis, virilis and grimshawi), illustrate how rates and patterns of sequence divergence across taxa can illuminate evolutionary processes on a genomic scale. These genome sequences augment the formidable genetic tools that have made Drosophila melanogaster a pre-eminent model for animal genetics, and will further catalyse fundamental research on mechanisms of development, cell biology, genetics, disease, neurobiology, behaviour, physiology and evolution. Despite remarkable similarities among these Drosophila species, we identified many putatively non-neutral changes in protein-coding genes, non-coding RNA genes, and cis-regulatory regions. These may prove to underlie differences in the ecology and behaviour of these diverse species.

    Proteogenomics of the Spotted Hyena (Crocuta crocuta)

    Thesis (MMed)--Stellenbosch University, 2021. ENGLISH ABSTRACT: The spotted hyena (Crocuta crocuta) is an important yet understudied organism that could provide insights into the fields of disease resistance, pathogen movement and disease evolution. They exist in matrilineally controlled, transient, clan-like groups that feed on a variety of organic matter and, subsequently, control the spread of pathogenic infections within an environment. Due to this, they appear to possess a high degree of resistance to pathogens. In this project, RNA-Seq data were utilized to assemble a transcriptome for the spotted hyena and tissue samples were further used to acquire protein data via MS/MS analysis. The aim of this study was to produce an accurate assembly from the transcriptomic data and then to validate this assembly further through proteomics, to better demonstrate its quality. The assembly was produced using the Trinity de novo assembly software tool and assessed via the BUSCO and TransRate analysis tools. Orthology detection was carried out using ProteinOrtho, using closely related species (tiger, house cat, leopard, cheetah). Finally, LC-MS/MS data (consisting of tissue samples from peripheral, abdominal, head and thoracic lymph, as well as lung and liver tissue), and fractionated data from the sample containing the most diverse spectra, were searched against both the assembly itself and the translated genome data from the NCBI. These data served as the means by which the proteomic data were assessed and to determine whether the fractionation was successful in diversifying the sample, based on the comparative quantity of spectra between initial and fractionated analyses. Further, these data were utilized to determine whether the translated transcriptome assembly could be successfully aligned against the proteomic data.
The analysis of the quality control results found that the assembly was of appropriate quality when compared to the standards found within NCBI and those described by the quality analysis tools. This, coupled with the analysis of the proteomic data, suggests that the assembly is usable, though it requires further refinement. Based on the above, the inclusion of more data for assembly is required for it to be a completely viable and ideal model assembly; however, current results are promising. AFRIKAANS SUMMARY: Although the current timeline prevented it, had hyena sequence data been available before this project, the analysis would have been extended further. The first step would have been more extensive trimming followed by a quality assessment step, to determine whether trimming succeeds in improving the quality of the assembly from the outset. The availability of a genome would call for a different assembler, possibly a reference-based assembly tool that uses the genome to produce a better assembly. A further assessment using a collection of assembly tools could be beneficial, since a single tool is probably insufficient to capture all the data. Testing an appropriate assembly reconciliation tool would follow the previous step, enabling the researcher to examine whether the assemblies together yield better results than when used separately. Quality testing would retain BUSCO and TransRate, although these could not be used as readily for reference-based analysis. In that case it would be best to perform a comparative step against the NCBI assembly or to investigate tools better suited to this type of analysis, although TransRate can still be used, since it maps the assembly against the original FASTQ files.
Several other genome assessment tools exist, such as GAGE, but it is uncertain whether they apply correctly to an RNA-Seq assembly or to a reconciled assembly built from RNA-Seq data. After reconciliation and quality assessment, further analysis using the protein data is needed. This step would include the NCBI protein data from the start of the analysis. This may be simpler, since proteogenomic research has been carried out with RNA, DNA and protein together, rather than starting from RNA-Seq data or genomic data alone. One method involves determining the level of overlap between the two protein sets, as well as between the protein sets and the different assemblies, as a form of comparative analysis. The control in this case could be a more closely related organism, a member of the Felidae family, or an even more distantly related species, such as human, for which an extensive assembly is available.

    Pingu virus: a new picornavirus in penguins from Antarctica

    The Picornaviridae family comprises single-stranded, positive-sense RNA viruses distributed into forty-seven genera. Picornaviruses have a broad host range and a geographic distribution spanning all continents. In this study, we applied a high-throughput sequencing approach to examine the presence of picornaviruses in penguins from King George Island, Antarctica. We discovered and characterized a novel picornavirus from cloacal swab samples of gentoo penguins (Pygoscelis papua), which we tentatively named Pingu virus. Also, using RT-PCR we detected this virus in 12.9 per cent of cloacal swabs derived from P. papua, but not in samples from Adélie penguins (Pygoscelis adeliae) or chinstrap penguins (Pygoscelis antarcticus). Attempts to isolate the virus in a chicken cell line and in embryonated chicken eggs were unsuccessful. Our results expand the viral diversity, host range, and geographical distribution of the Picornaviridae. This work was supported by the Fundação de Amparo à Pesquisa do Estado de São Paulo, Brazil (Grant no. 13/14929-1, and Scholarships nos. 17/13981-0; 12/24150-9; 15/05778-5; 14/20851-8; 16/01414-1; 06/00572-0). P.R.M. was supported by the Medical Research Council of the UK (Grant no. MC_UU_120/14/9