18 research outputs found

    Tersect: a set theoretical utility for exploring sequence variant data

    Comparing genomic features among a large panel of individuals of the same species is now considered a core part of bioinformatics analysis. This typically involves a series of complex theoretical expressions to compare, intersect, and extract symmetric differences between individuals within a large set of genotypes. Several publicly available tools can perform such tasks; however, because of the sheer number of variants being queried, such tasks can be computationally expensive, with runtimes ranging from a few minutes to several hours depending on the dataset size. This makes existing tools unsuitable for interactive data queries or for use within genomic data visualization platforms such as genome browsers. Tersect is a lightweight, high-performance command-line utility which interprets and applies flexible set theoretical expressions to sets of sequence variant data. It can be used both for interactive data exploration and as part of a larger pipeline thanks to its highly optimized storage and indexing algorithms for variant data.
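As a rough illustration of the kind of set theoretical query the abstract describes, the sketch below uses plain Python sets. It is not Tersect's query syntax or storage format; the `Variant` tuple, the sample names, and the toy variants are all hypothetical.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    """A sequence variant identified by position and alleles (hypothetical layout)."""
    chrom: str
    pos: int
    ref: str
    alt: str


# Hypothetical toy genotypes: each sample is the set of variants it carries.
samples = {
    "sample_a": {Variant("chr1", 100, "A", "G"), Variant("chr1", 250, "C", "T")},
    "sample_b": {Variant("chr1", 100, "A", "G"), Variant("chr2", 40, "G", "A")},
    "sample_c": {Variant("chr1", 250, "C", "T"), Variant("chr2", 40, "G", "A")},
}

# Intersection: variants shared by sample_a and sample_b.
shared_ab = samples["sample_a"] & samples["sample_b"]

# Difference: variants in sample_a that are absent from sample_b.
only_a = samples["sample_a"] - samples["sample_b"]

# Symmetric difference: variants found in exactly one of the two samples.
sym_diff_ab = samples["sample_a"] ^ samples["sample_b"]

# A compound expression: variants private to sample_a across the whole panel.
private_to_a = samples["sample_a"] - (samples["sample_b"] | samples["sample_c"])
```

In-memory sets like these stop scaling long before a panel of whole genomes, which is presumably why the abstract stresses optimized storage and indexing.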

    De novo genome assembly of Solanum sitiens reveals structural variation associated with drought and salinity tolerance

    Motivation: Solanum sitiens is a self-incompatible wild relative of tomato, characterised by salt and drought resistance traits, with the potential to contribute through breeding programmes to crop improvement in cultivated tomato. This species has a distinct morphology, classification and ecotype compared to other stress-resistant wild tomato relatives such as S. pennellii and S. chilense. Therefore, the availability of a reference genome for S. sitiens will facilitate the genetic and molecular understanding of salt and drought resistance. Results: A high-quality de novo genome and transcriptome assembly for S. sitiens (Accession LA1974) has been developed. A hybrid assembly strategy was followed using Illumina short reads (~159X coverage) and PacBio long reads (~44X coverage), generating a total of ~262 Gbp of DNA sequence. A reference genome of 1,245 Mbp, arranged in 1,483 scaffolds with an N50 of 1.826 Mbp, was generated. Genome completeness was estimated at 95% using the Benchmarking Universal Single-Copy Orthologs (BUSCO) and the K-mer Analysis Tool (KAT). In addition, ~63 Gbp of RNA-Seq data were generated to support the prediction of 31,164 genes from the assembly, and to perform a de novo transcriptome assembly. Lastly, we identified three large inversions compared to S. lycopersicum, containing several drought resistance-related genes, such as beta-amylase 1 and YUCCA7. Availability: S. sitiens (LA1974) raw sequencing, transcriptome and genome assembly data have been deposited at the NCBI’s Sequence Read Archive, under the BioProject number “PRJNA633104”.
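For readers unfamiliar with the N50 statistic quoted above, the sketch below shows the standard definition: the length of the scaffold at which the running total, taken from longest to shortest, first reaches half of the assembly length. The function name `n50` and the toy lengths are illustrative only and are not taken from the study.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of scaffold (or contig) lengths."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one scaffold length is required")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Stop at the first scaffold that brings the running sum to >= 50% of the total.
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable: the running sum always reaches the total")


# Toy example with made-up lengths (total = 20, so half = 10).
toy_n50 = n50([2, 3, 4, 5, 6])
```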

    BIFURCATE FLOWER TRUSS: a novel locus controlling inflorescence branching in tomato contains a defective MAP kinase gene

    A mutant line, bifurcate flower truss (bif), was recovered from a tomato breeding program. Plants from the control line LAM183 produced a mean of 0.16 branches per truss, whereas the value for bif plants was 4.1. This increase in branching was accompanied by a 3.3-fold increase in flower number and showed a significant interaction with exposure to low temperature during truss development. The LAM183 and bif genomes were resequenced and the bif gene was mapped to a 2.01 Mbp interval on chromosome 12; all coding region polymorphisms in the interval were surveyed and five candidate genes displaying altered protein sequences were detected. One of these genes, SlMAPK1, encoding a MAP kinase, contained a leucine-to-stop codon mutation predicted to disrupt kinase function. SlMAPK1 is an excellent candidate for bif because knock-out mutations of an Arabidopsis orthologue, MPK6, were reported to have increased flower number. An introgression browser was used to demonstrate that the origin of the bif genomic DNA at the BIF locus was Solanum galapagense and that the SlMAPK1 null mutant is a naturally occurring allele widespread only on the Galápagos Islands. This work strongly implicates SlMAPK1 as part of the network of genes controlling inflorescence branching in tomato.

    Missense mutation of a class B heat shock factor is responsible for the tomato bushy root-2 phenotype

    The bushy root-2 (brt-2) tomato mutant has twisting roots and slower plant development. Here we used whole genome resequencing and genetic mapping to show that brt-2 is caused by a serine to cysteine (S75C) substitution in the DNA binding domain (DBD) of a heat shock factor class B (HsfB) encoded by SolycHsfB4a. This gene is orthologous to the Arabidopsis SCHIZORIZA gene, also known as AtHsfB4. The brt-2 phenotype is very similar to that of Arabidopsis lines in which the function of AtHsfB4 is altered: a proliferation of lateral root cap and root meristematic tissues, and a tendency for lateral root cap cells to easily separate. The brt-2 S75C mutation is unusual because all other reported amino acid substitutions in the highly conserved DBD of eukaryotic heat shock factors are dominant negative mutations, but brt-2 is recessive. We further show through reciprocal grafting that brt-2 exerts its effects predominantly through the root genotype even though BRT-2 is expressed at similar levels in both root and shoot meristems. Since AtHsfB4 is induced by root knot nematodes (RKN), and loss-of-function mutants of this gene are resistant to RKNs, BRT-2 could be a target gene for RKN resistance, an important trait in tomato rootstock breeding. Biotechnology and Biological Sciences Research Council (BBSRC): BB/L01954X/

    Genes involved in auxin biosynthesis, transport and signalling underlie the extreme adventitious root phenotype of the tomato aer mutant

    The use of tomato rootstocks has helped to alleviate the growing abiotic stresses caused by climate change. Lateral and adventitious roots can improve topsoil exploration and nutrient uptake, shoot biomass and resulting overall yield. It is essential to understand the genetic basis of root structure development and how lateral and adventitious roots are produced. Existing mutant lines with specific root phenotypes are an excellent resource to analyse and comprehend the molecular basis of root developmental traits. The tomato aerial roots (aer) mutant exhibits an extreme adventitious rooting phenotype on the primary stem. It is known that this phenotype is associated with restricted polar auxin transport from the juvenile to the more mature stem, but prior to this study, the genetic loci responsible for the aer phenotype were unknown. We used genomic approaches to define the polygenic nature of the aer phenotype and provide evidence that increased expression of specific auxin biosynthesis, transport and signalling genes in different loci causes the initiation of adventitious root primordia in tomato stems. Our results allow the selection of different levels of adventitious rooting using molecular markers, potentially contributing to rootstock breeding strategies in grafted vegetable crops, especially in tomato. In crops vegetatively propagated as cuttings, such as fruit trees and cane fruits, orthologous genes may be useful for the selection of cultivars more amenable to propagation. The research was supported by BBSRC—UKRI funding; the RootLINK (BB/L01954X/1) and AdRoot (BB/S007970/1) projects.

    A chromosome-level genome assembly of Solanum chilense, a tomato wild relative associated with resistance to salinity and drought

    Introduction: Solanum chilense is a wild relative of tomato reported to exhibit resistance to biotic and abiotic stresses. There is potential to improve tomato cultivars via breeding with wild relatives, a process greatly accelerated by suitable genomic and genetic resources. Methods: In this study we generated a high-quality, chromosome-level, de novo assembly for the S. chilense accession LA1972 using a hybrid assembly strategy with ~180 Gbp of Illumina short reads and ~50 Gbp of long PacBio reads. Further scaffolding was performed using Bionano optical maps and 10x Chromium reads. Results: The resulting sequences were arranged into 12 pseudomolecules using Hi-C sequencing. This resulted in a 901 Mbp assembly, with a completeness of 95%, as determined by Benchmarking with Universal Single-Copy Orthologs (BUSCO). Sequencing of RNA from multiple tissues, resulting in ~219 Gbp of reads, was used to annotate the genome assembly with an RNA-Seq guided gene prediction, and for a de novo transcriptome assembly. This chromosome-level, high-quality reference genome for S. chilense accession LA1972 will support future breeding efforts for more sustainable tomato production. Discussion: Gene sequences related to drought and salt resistance were compared between S. chilense and S. lycopersicum to identify amino acid variations with high potential for functional impact. These variants were subsequently analysed in 84 resequenced tomato lines across 12 different related species to explore the variant distributions. We identified a set of 7 putative impactful amino acid variants, some of which may also affect fruit development, for example in the ethylene-responsive transcription factor WIN1 and ethylene-insensitive protein 2. These variants could be tested for their ability to confer functional phenotypes to cultivars that have lost these variants. This work was jointly supported by the UK’s Biotechnology and Biological Sciences Research Council and the Indian Department of Biotechnology (BB/L011611/1).

    Mutagenesis of Puccinia graminis f. sp. tritici and selection of gain-of-virulence mutants

    Wheat stem rust, caused by the fungus Puccinia graminis f. sp. tritici (Pgt), is regaining prominence due to the recent emergence of virulent isolates and epidemics in Africa, Europe and Central Asia. The development and deployment of wheat cultivars with multiple stem rust resistance (Sr) genes stacked together will provide durable resistance. However, certain disease resistance genes can suppress each other or fail in particular genetic backgrounds. Therefore, the function of each Sr gene must be confirmed after incorporation into an Sr-gene stack. This is difficult when using pathogen disease assays due to epistasis from recognition of multiple avirulence (Avr) effectors. Heterologous delivery of single Avr effectors can circumvent this limitation, but this strategy is currently limited by the paucity of cloned Pgt Avrs. To accelerate Avr gene cloning, we outline a procedure to develop a mutant population of Pgt spores and select for gain-of-virulence mutants. We used ethyl methanesulphonate (EMS) to mutagenize urediniospores and create a library of > 10,000 independent mutant isolates that were combined into 16 bulks of ~658 pustules each. We sequenced random mutants and determined the average mutation density to be 1 single nucleotide variant (SNV) per 258 kb. From this, we calculated that a minimum of three independently derived gain-of-virulence mutants is required to identify a given Avr gene. We inoculated the mutant library onto plants containing Sr43, Sr44, or Sr45 and obtained 9, 4, and 14 mutants with virulence toward Sr43, Sr44, or Sr45, respectively. However, only mutants identified on Sr43 and Sr45 maintained their virulence when reinoculated onto the lines from which they were identified. We further characterized 8 mutants with virulence toward Sr43. These also maintained their virulence profile on the stem rust international differential set containing 20 Sr genes, indicating that they were most likely not accidental contaminants. In conclusion, our method allows selection of virulent mutants toward targeted resistance (R) genes. The development of a mutant library from as little as 320 mg of spores creates a resource that enables screening against several R genes without the need for multiple rounds of spore multiplication and mutagenesis.

    CRAMER: A lightweight, highly customisable web-based genome browser supporting multiple visualisation instances

    In recent years the ability to generate genomic data has increased dramatically, along with the demand for easily personalised and customisable genome browsers for effective visualisation of diverse types of data. Despite the large number of web-based genome browsers now available, none of the existing tools provides a means of creating multiple visualisation instances without manual setup on the deployment server side. The Cranfield Genome Browser (CRAMER) is an open-source, lightweight and highly customisable web application for interactive visualisation of genomic data. Once deployed, CRAMER supports seamless creation of multiple visualisation instances in parallel while allowing users to control and customise multiple tracks. The application is deployed on a Node.js server and is supported by a MongoDB database which stores all customisations made by the users, allowing quick navigation between instances. Currently, the browser supports visualising a large number of file formats for genome annotation, variant calling, read coverage and gene expression. Additionally, the browser supports direct JavaScript coding for personalised tracks, providing a whole new level of customisation both functionally and visually. Tracks can be added via direct file upload or processed in real time via links to files stored remotely on an FTP repository. Furthermore, additional tracks can be added by users via simple drag and drop to an existing visualisation instance.

    Catalytic residues in hydrolases: analysis of methods designed for ligand-binding site prediction

    A comparison of eight tools applicable to ligand-binding site prediction is presented. The methods examined cover three types of approaches: the geometrical (CASTp, PASS, Pocket-Finder), the physicochemical (Q-SiteFinder, FOD) and the knowledge-based (ConSurf, SuMo, WebFEATURE). The accuracy of predictions was measured in reference to the catalytic residues documented in the Catalytic Site Atlas. The test was performed on a set comprising selected chains of hydrolases. The results were analysed with regard to the size, polarity, secondary structure and accessible solvent area of predicted sites, as well as parameters commonly used in machine learning (F-measure, MCC). The relative accuracies of predictions are presented in the ROC space, allowing determination of the optimal methods by means of the ROC convex hull. Additionally, a minimum expected cost analysis was performed. Both advantages and disadvantages of the eight methods are presented. A characterization of protein chains with respect to the level of difficulty of active site prediction is introduced. The main reasons for failures are discussed. Overall, SuMo offers the best performance, followed by FOD, while Pocket-Finder is the best method among the geometrical approaches.
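The F-measure and MCC mentioned above can be computed from per-residue confusion counts. The sketch below uses the standard textbook formulas and assumes each residue is labelled either catalytic or not; the function names and the example counts are hypothetical and do not come from the study.

```python
import math


def f_measure(tp: int, fp: int, fn: int) -> float:
    """F1 score: harmonic mean of precision and recall (0.0 when undefined)."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def mcc(tp: int, fp: int, fn: int, tn: int) -> float:
    """Matthews correlation coefficient (0.0 when the denominator is zero)."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Hypothetical counts for one protein chain of 100 residues:
# 5 catalytic residues found, 5 false alarms, 5 missed, 85 correctly rejected.
example_f1 = f_measure(tp=5, fp=5, fn=5)
example_mcc = mcc(tp=5, fp=5, fn=5, tn=85)
```

Because catalytic residues are a small minority of any chain, MCC is often the more informative of the two: it also accounts for true negatives, which the F-measure ignores.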