146 research outputs found

    Allele-specific miRNA-binding analysis identifies candidate target genes for breast cancer risk

    Most breast cancer (BC) risk-associated single-nucleotide polymorphisms (raSNPs) identified in genome-wide association studies (GWAS) are believed to cis-regulate the expression of genes. We hypothesise that cis-regulatory variants contributing to disease risk may be affecting microRNA (miRNA) genes and/or miRNA binding. To test this, we adapted two miRNA-binding prediction algorithms (TargetScan and miRanda) to perform allele-specific queries, and integrated differential allelic expression (DAE) and expression quantitative trait loci (eQTL) data, to query 150 genome-wide significant (P ≤ 5×10⁻⁸) raSNPs, plus proxies. We found that no raSNP mapped to a miRNA gene, suggesting that altered miRNA targeting is an unlikely mechanism involved in BC risk. Also, 11.5% (6 out of 52) raSNPs located in 3'-untranslated regions of putative miRNA target genes were predicted to alter miRNA::mRNA (messenger RNA) pair binding stability in five candidate target genes. Of these, we propose RNF115, at locus 1q21.1, as a strong novel target gene associated with BC risk, and reinforce the role of miRNA-mediated cis-regulation at locus 19p13.11. We believe that integrating allele-specific querying in miRNA-binding prediction, and data supporting cis-regulation of expression, improves the identification of candidate target genes in BC risk, as well as in other common cancers and complex diseases. Funding Agency: Portuguese Foundation for Science and Technology; CRESC ALGARVE 2020; European Union (EU) 303745; Maratona da Saude Award; DL 57/2016/CP1361/CT0042; SFRH/BPD/99502/2014; CBMR-UID/BIM/04773/2013; POCI-01-0145-FEDER-022184.

    Assembly complexity of prokaryotic genomes using short reads

    Background: De Bruijn graphs are a theoretical framework underlying several modern genome assembly programs, especially those that deal with very short reads. We describe an application of de Bruijn graphs to analyze the global repeat structure of prokaryotic genomes. Results: We provide the first survey of the repeat structure of a large number of genomes. The analysis gives an upper bound on the performance of genome assemblers for de novo reconstruction of genomes across a wide range of read lengths. Further, we demonstrate that the majority of genes in prokaryotic genomes can be reconstructed uniquely using very short reads even if the genomes themselves cannot. The non-reconstructible genes are overwhelmingly related to mobile elements (transposons, IS elements, and prophages). Conclusions: Our results improve upon previous studies on the feasibility of assembly with short reads and provide a comprehensive benchmark against which to compare the performance of the short-read assemblers currently being developed.

    Comparing de novo assemblers for 454 transcriptome data

    Background: Roche 454 pyrosequencing has become a method of choice for generating transcriptome data from non-model organisms. Once the tens to hundreds of thousands of short (250-450 base) reads have been produced, it is important to correctly assemble these to estimate the sequence of all the transcripts. Most transcriptome assembly projects use only one program for assembling 454 pyrosequencing reads, but there is no evidence that the programs used to date are optimal. We have carried out a systematic comparison of five assemblers (CAP3, MIRA, Newbler, SeqMan and CLC) to establish best practices for transcriptome assemblies, using a new dataset from the parasitic nematode Litomosoides sigmodontis. Results: Although no single assembler performed best on all our criteria, Newbler 2.5 gave longer contigs, better alignments to some reference sequences, and was fast and easy to use. SeqMan assemblies performed best on the criterion of recapitulating known transcripts, and had more novel sequence than the other assemblers, but generated an excess of small, redundant contigs. The remaining assemblers all performed almost as well, with the exception of Newbler 2.3 (the version currently used by most assembly projects), which generated assemblies that had significantly lower total length. As different assemblers use different underlying algorithms to generate contigs, we also explored merging of assemblies and found that the merged datasets not only aligned better to reference sequences than individual assemblies, but were also more consistent in the number and size of contigs. Conclusions: Transcriptome assemblies are smaller than genome assemblies and thus should be more computationally tractable, but are often harder because individual contigs can have highly variable read coverage. Comparing single assemblers, Newbler 2.5 performed best on our trial data set, but other assemblers were closely comparable. Combining differently optimal assemblies from different programs, however, gave a more credible final product, and this strategy is recommended.

    Linkage Mapping and Comparative Genomics Using Next-Generation RAD Sequencing of a Non-Model Organism

    Restriction-site associated DNA (RAD) sequencing is a powerful new method for targeted sequencing across the genomes of many individuals. This approach has broad potential for genetic analysis of non-model organisms including genotype-phenotype association mapping, phylogeography, population genetics and scaffolding genome assemblies through linkage mapping. We constructed a RAD library using genomic DNA from a Plutella xylostella (diamondback moth) backcross that segregated for resistance to the insecticide spinosad. Sequencing of 24 individuals was performed on a single Illumina GAIIx lane (51 base paired-end reads). Taking advantage of the lack of crossing over in homologous chromosomes in female Lepidoptera, 3,177 maternally inherited RAD alleles were assigned to the 31 chromosomes, enabling identification of the spinosad resistance and W/Z sex chromosomes. Paired-end reads for each RAD allele were assembled into contigs and compared to the genome of Bombyx mori (n = 28) using BLAST, revealing 28 homologous matches plus 3 expected fusion/breakage events which account for the difference in chromosome number. A genome-wide linkage map (1292 cM) was inferred with 2,878 segregating RAD alleles inherited from the backcross father, producing chromosome and location specific sequenced RAD markers. Here we have used RAD sequencing to construct a genetic linkage map de novo for an organism that has no previous genome data. Comparative analysis of P. xylostella linkage groups with B. mori chromosomes shows, for the first time, that genetic synteny appears common beyond the Macrolepidoptera. RAD sequencing is a powerful system capable of rapidly generating chromosome specific data for non-model organisms.

    RNA-Seq reveals large quantitative differences between the transcriptomes of outbreak and non-outbreak locusts

    Outbreaks of locust populations repeatedly devastate economies and ecosystems in large parts of the world. The consequent behavioural shift from solitarious to gregarious and the concomitant changes in the locusts’ biology are of relevant scientific interest. Yet, research on the main locust species has not benefitted from recent advances in genomics. In this first RNA-Seq study on Schistocerca gregaria, we report two transcriptomes, including many novel genes, as well as differential gene expression results. In line with the large biological differences between solitarious and gregarious locusts, almost half of the transcripts are differentially expressed between their central nervous systems. Most of these transcripts are over-expressed in the gregarious locusts, suggesting positive correlations between the levels of activity at the population, individual, tissue and gene expression levels. We group these differentially expressed transcripts by gene function and highlight those that are most likely to be associated with locusts’ phase change either in a species-specific or general manner. Finally, we discuss our findings in the context of population-level and physiological events leading to gregariousness. M. Bakkali wishes to thank the Spanish Ministerio de Ciencia y Tecnología for the Ramón y Cajal fellowship and for the BFU2010-16438 grant that supported both this research and the FPI studentship to Rubén Martín Blázquez. We thank Mrs. Pernille Lavgesen for revision of the English language writing of this manuscript. We also thank the editor for the valuable comments on the manuscript.

    Pathoadaptive mutations of Escherichia coli K1 in experimental neonatal systemic infection

    Although Escherichia coli K1 strains are benign commensals in adults, their acquisition at birth by the newborn may result in life-threatening systemic infections, most commonly sepsis and meningitis. Key features of these infections, including stable gastrointestinal (GI) colonization and age-dependent invasion of the bloodstream, can be replicated in the neonatal rat. We previously increased the capacity of a septicemia isolate of E. coli K1 to elicit systemic infection following colonization of the small intestine by serial passage through two-day-old (P2) rat pups. The passaged strain, A192PP (belonging to sequence type 95), induces lethal infection in all pups fed 2–6 × 10⁶ CFU. Here we use whole-genome sequencing to identify mutations responsible for the threefold increase in lethality between the initial clinical isolate and the passaged derivative. Only four single nucleotide polymorphisms (SNPs), in genes (gloB, yjgV, tdcE) or promoters (thrA) involved in metabolic functions, were found: no changes were detected in genes encoding virulence determinants associated with the invasive potential of E. coli K1. The passaged strain differed in carbon source utilization in comparison to the clinical isolate, most notably its inability to metabolize glucose for growth. Deletion of each of the four genes from the E. coli A192PP chromosome altered the proteome, reduced the number of colonizing bacteria in the small intestine and increased the number of P2 survivors. This work indicates that changes in metabolic potential lead to increased colonization of the neonatal GI tract, increasing the potential for translocation across the GI epithelium into the systemic circulation.

    Biology of archaea from a novel family Cuniculiplasmataceae (Thermoplasmata) ubiquitous in hyperacidic environments

    The order Thermoplasmatales (Euryarchaeota) is represented by the most acidophilic organisms known so far that are poorly amenable to cultivation. Earlier culture-independent studies in Iron Mountain (California) pointed at an abundant archaeal group, dubbed 'G-plasma'. We examined the genomes and physiology of two cultured representatives of the family Cuniculiplasmataceae, recently isolated from acidic (pH 1-1.5) sites in Spain and the UK, that are 16S rRNA gene sequence-identical with 'G-plasma'. The organisms had the largest genomes among Thermoplasmatales (1.87-1.94 Mbp), shared 98.7-98.8% average nucleotide identities between themselves and 'G-plasma', and exhibited high genome conservation even within their genomic islands, despite their remote geographical localisations. Facultatively anaerobic heterotrophs, they possess an ancestral form of A-type terminal oxygen reductase from a distinct parental clade. The lack of complete pathways for biosynthesis of histidine, valine, leucine, isoleucine, lysine and proline pre-determines the reliance on external sources of amino acids and hence the lifestyle of these organisms as scavengers of proteinaceous compounds from surrounding microbial community members. In contrast to earlier metagenomics-based assumptions, isolates were S-layer-deficient, non-motile, non-methylotrophic and devoid of iron-oxidation despite the abundance of methylotrophy substrates and ferrous iron in situ, which underlines the essentiality of experimental validation of bioinformatic predictions.

    Capturing the cloud of diversity reveals complexity and heterogeneity of MRSA carriage, infection and transmission.

    Genome sequencing is revolutionizing clinical microbiology and our understanding of infectious diseases. Previous studies have largely relied on the sequencing of a single isolate from each individual. However, it is not clear what degree of bacterial diversity exists within, and is transmitted between, individuals. Understanding this 'cloud of diversity' is key to accurate identification of transmission pathways. Here, we report the deep sequencing of methicillin-resistant Staphylococcus aureus among staff and animal patients involved in a transmission network at a veterinary hospital. We demonstrate considerable within-host diversity and that within-host diversity may rise and fall over time. Isolates from invasive disease contained multiple mutations in the same genes, including inactivation of a global regulator of virulence and changes in phage copy number. This study highlights the need for sequencing of multiple isolates from individuals to gain an accurate picture of transmission networks and to further understand the basis of pathogenesis. Thanks to Dr Alex O’Neill, University of Leeds and Dr Matthew Ellington, Public Health England for provision of RN4220 and RN4200mutS. We thank the core sequencing and informatics team at the Wellcome Trust Sanger Institute for sequencing of the isolates described in this study. This work was supported by a Medical Research Council Partnership grant (G1001787/1) held between the Department of Veterinary Medicine, University of Cambridge (M.A.H.), the School of Clinical Medicine, University of Cambridge (S.J.P.), the Moredun Research Institute, and the Wellcome Trust Sanger Institute (J.P. and S.J.P.). S.J.P. receives support from the NIHR Cambridge Biomedical Research Centre. M.T.G.H., S.R.H. and J.P. were funded by Wellcome Trust grant no. 098051. G.G.R.M. was funded by an MRC studentship. This is the final version of the article. It first appeared from Nature Publishing Group via http://dx.doi.org/10.1038/ncomms756

    Evaluation of next-generation sequencing software in mapping and assembly

    Next-generation high-throughput DNA sequencing technologies have advanced progressively in sequence-based genomic research and novel biological applications with the promise of sequencing DNA at unprecedented speed. These new non-Sanger-based technologies feature several advantages when compared with traditional sequencing methods in terms of higher sequencing speed, lower per run cost and higher accuracy. However, reads from next-generation sequencing (NGS) platforms, such as 454/Roche, ABI/SOLiD and Illumina/Solexa, are usually short, thereby restricting the applications of NGS platforms in genome assembly and annotation. We presented an overview of the challenges that these novel technologies meet and particularly illustrated various bioinformatics attempts on mapping and assembly for problem solving. We then compared the performance of several programs in these two fields, and further provided advice on selecting suitable tools for specific biological applications.

    Exploring the Zoonotic Potential of Mycobacterium avium Subspecies paratuberculosis through Comparative Genomics

    A comparative genomics approach was utilised to compare the genomes of Mycobacterium avium subspecies paratuberculosis (MAP) isolated from early onset paediatric Crohn's disease (CD) patients as well as Johne's diseased animals. Draft genome sequences were produced for MAP isolates derived from four CD patients, one ulcerative colitis (UC) patient, and two non-inflammatory bowel disease (IBD) control individuals using Illumina sequencing, complemented by comparative genome hybridisation (CGH). MAP isolates derived from two bovine and one ovine host were also subjected to whole genome sequencing and CGH. All seven human derived MAP isolates were highly genetically similar and clustered together with one bovine type isolate following phylogenetic analysis. Three other sequenced isolates (including the reference bovine derived isolate K10) were genetically distinct. The human isolates contained two large tandem duplications, the organisations of which were confirmed by PCR. Designated vGI-17 and vGI-18, these duplications spanned 63 and 109 open reading frames, respectively. PCR screening of over 30 additional MAP isolates (3 human derived, 27 animal derived and one environmental isolate) confirmed that vGI-17 and vGI-18 are common across many isolates. Quantitative real-time PCR of vGI-17 demonstrated that the proportion of cells containing the vGI-17 duplication varied between 0.01 and 15% amongst isolates, with human isolates containing a higher proportion of vGI-17 compared to most animal isolates. These findings suggest these duplications are transient genomic rearrangements. We hypothesise that the over-representation of vGI-17 in human derived MAP strains may enhance their ability to infect or persist within a human host by increasing genome redundancy and conferring crude regulation of protein expression across biologically important regions.