
    Using semantic web technology to accelerate plant breeding

    One goal within plant breeding is to find the causal gene(s) explaining a given phenotype. Semantic web technology creates opportunities to integrate data and information across distributed data sources. Chebi2gene and Marker2sequence are two applications that rely on this semantic web technology to integrate genes, proteins, metabolites, pathways and literature. Their web-based interfaces allow biologists to use and explore this network of information.
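The following is a minimal sketch of the kind of integration described above. It assumes that each source has already been converted into (subject, predicate, object) triples that share identifiers. The identifiers and predicate names (`encodes`, `catalyzes`, `produces`) are hypothetical, and the code is not the actual implementation of Chebi2gene or Marker2sequence. Semantic web stacks usually express such a join as a SPARQL query over RDF. Because the sources share identifiers, merging them only requires concatenating their triples. A metabolite can then be traced back to the genes that encode the enzymes producing it.

```python
# Hypothetical triples from two separate sources; identifiers are placeholders.
GENE_SOURCE = [
    ("gene:G1", "encodes", "protein:P1"),
    ("gene:G2", "encodes", "protein:P2"),
]
PROTEIN_SOURCE = [
    ("protein:P1", "catalyzes", "reaction:R1"),
    ("protein:P2", "catalyzes", "reaction:R2"),
    ("reaction:R1", "produces", "chebi:C1"),
]

def match(triples, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

def genes_for_metabolite(graph, metabolite):
    """Follow metabolite <- reaction <- protein <- gene through shared identifiers."""
    hits = set()
    for reaction, _, _ in match(graph, p="produces", o=metabolite):
        for protein, _, _ in match(graph, p="catalyzes", o=reaction):
            for gene, _, _ in match(graph, p="encodes", o=protein):
                hits.add(gene)
    return sorted(hits)

# Merging the sources is simply concatenating their triples.
graph = GENE_SOURCE + PROTEIN_SOURCE
print(genes_for_metabolite(graph, "chebi:C1"))  # ['gene:G1']
```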

    Bioinformatics assisted breeding, from QTL to candidate genes

    Over the last decade, a single run of an NGS sequencer has come to generate more data than days of work with Sanger sequencing. Metabolomics, proteomics and transcriptomics technologies have also evolved, producing more and more information at an ever faster rate. In addition, the number of databases available to biologists and breeders is increasing every year. The challenge for them is therefore two-fold: to cope with the increased amount of data produced by these new technologies, and to cope with the distribution of the information across the Web. Chapter 2 describes an example of a study with a large amount of ~omics data, in which more than 600 peaks were measured using liquid chromatography-mass spectrometry (LC-MS) in the peel and flesh of a segregating F1 apple population. In total, 669 mQTL were identified in this study. This number of mQTL is vast and almost overwhelming. Extracting meaningful information from such an experiment requires appropriate data filtering and data visualization techniques. Visualizing the distribution of the mQTL on the genetic map led to the discovery of QTL hotspots on linkage groups 1, 8, 13 and 16. The mQTL hotspot on linkage group 16 was investigated further and mainly contained compounds involved in the phenylpropanoid pathway. The apple genome sequence and its annotation were used to gain insight into genes potentially regulating this QTL hotspot. This led to the identification of the structural gene leucoanthocyanidin reductase (LAR1), as well as seven genes encoding transcription factors, as putative candidates regulating the phenylpropanoid pathway. These genes are therefore also candidates for the biosynthesis of health-beneficial compounds. However, this study also revealed two bottlenecks: a lack of biologist-friendly tools to visualize large-scale QTL mapping results, and a lack of smart ways to mine the genes underlying QTL intervals.
In this thesis, we provide bioinformatics solutions that allow regions of interest on the genome to be explored more efficiently. In Chapter 3, we describe MQ2, a tool to visualize the results of large-scale QTL mapping experiments. Biologists and breeders can use their favorite QTL mapping tool, such as MapQTL or R/qtl, and then use MQ2 to visualize the distribution of these QTL across the genetic map used in the analysis. MQ2 provides the distribution of the QTL over the markers of the genetic map for a few hundred traits. MQ2 is accessible online via its web interface, but it can also be used locally via its command-line interface. In Chapter 4, we describe Marker2sequence (M2S), a tool to select genes of interest from all the genes underlying a QTL. M2S returns the list of genes for a specific genome interval and provides a search function that selects the genes related to the provided keyword(s) through their annotation. Genome annotations often contain cross-references to resources such as the Gene Ontology (GO) or the UniProt protein database. Via these annotations, additional information can be gathered about each gene. M2S integrates information from different resources and offers a way to mine the list of genes present in a QTL interval. In this way it can reduce a list of hundreds of genes to possibly tens of genes, or fewer, that are potentially related to the trait of interest. Because it uses semantic web technologies, M2S integrates multiple resources and can extend this integration to more resources as they become available through these technologies. Besides showing the importance of efficient bioinformatics tools to analyze and visualize data, the work in Chapter 2 also revealed the importance of regulatory elements controlling key genes of pathways. The limitation of M2S is that it only considers genes within the interval.
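MQ2's core summary is the number of QTL per marker across many traits, which is how hotspots become visible. It can be sketched in a few lines. This is a hypothetical illustration, not MQ2's code. The input tuple layout, the marker names (assumed to be prefixed with their linkage group) and the LOD threshold of 3.0 are assumptions. MQ2 itself reads the output of QTL mapping tools such as MapQTL or R/qtl.

```python
from collections import Counter

# Hypothetical QTL mapping output: (trait, peak marker, LOD score).
# Marker names are assumed to start with their linkage group, e.g. "LG16_...".
qtl_results = [
    ("trait_001", "LG16_m12", 5.1),
    ("trait_002", "LG16_m12", 7.3),
    ("trait_003", "LG08_m04", 3.2),
    ("trait_004", "LG16_m13", 4.4),
    ("trait_005", "LG01_m02", 2.1),
]

def qtls_per_marker(results, lod_threshold=3.0):
    """Count how many significant QTL peak at each marker, across all traits."""
    return Counter(marker for _, marker, lod in results if lod >= lod_threshold)

def qtls_per_linkage_group(marker_counts):
    """Sum the per-marker counts per linkage group (the prefix before '_')."""
    totals = Counter()
    for marker, n in marker_counts.items():
        totals[marker.split("_")[0]] += n
    return totals

marker_counts = qtls_per_marker(qtl_results)
print(marker_counts.most_common(1))  # [('LG16_m12', 2)]
print(qtls_per_linkage_group(marker_counts))
```

Counting first per marker and then per linkage group keeps both resolutions available. A marker-level peak points to a narrow region, while the linkage-group totals match the coarser "hotspot on linkage group 16" view.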
In genome annotations, transcription factors are linked neither to the trait (keyword) nor to the genes they control, so these relationships are not considered. If information about the gene regulatory network of the organism were integrated into Marker2sequence, it should become possible to include genes that lie outside the QTL interval but are regulated by elements present within it. In tomato, the genome annotation already lists a number of transcription factors. However, it does not provide any information about their targets. In Chapter 5, we describe how we combined transcriptomics data from six genotypes of an Introgression Line (IL) population. The aim was to find genes that are differentially expressed while lying in a similar genomic background (i.e. outside of any introgression segment) as the reference genotype (with no introgression). These genes may be differentially expressed as a result of a regulatory element present in an introgression. The promoter regions of these genes were analyzed for DNA motifs, and putative transcription factor binding sites were found. The approach taken in M2S (Chapter 4) focuses on a specific region of the genome, namely the QTL interval. In Chapter 6, we generalized this approach to develop Annotex. Annotex provides a simple way to browse the cross-references between biological databases (ChEBI, Rhea, UniProt, GO) and genome annotations. The main concept of Annotex is that, starting from any type of data present in the databases, one can navigate the cross-references to retrieve the desired type of information. This thesis has resulted in three tools that biologists and breeders can use to speed up their research and build new hypotheses. This thesis also revealed the state of bioinformatics with regard to data integration.
It also reveals the need to integrate more ontologies than just the Gene Ontology (GO) into annotations (for example, genome, protein and pathway annotations). Multiple platforms are emerging to build these new ontologies, but integrating them into existing resources remains to be done. It also confirms the state of plant data, where multiple resources may contain overlapping information. Finally, this thesis shows what can be achieved when data are made interoperable. This should be an incentive for the community to work together and build interoperable, non-overlapping resources, creating a bioinformatics Web for plant research.
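The navigation idea behind Annotex can be illustrated as a breadth-first search over a graph of cross-references. This is a hypothetical sketch, not Annotex's code. The `XREFS` dictionary and its placeholder identifiers stand in for the cross-references that a real system would retrieve from ChEBI, Rhea, UniProt, GO and a genome annotation. Starting from any entry, the search returns every reachable entry of the requested database, together with a shortest chain of cross-references leading to it.

```python
from collections import deque

# Hypothetical cross-reference graph: each entry lists related entries in
# other databases. Identifiers are placeholders, not real accessions.
XREFS = {
    "ChEBI:c1": ["Rhea:r1"],
    "Rhea:r1": ["ChEBI:c1", "UniProt:u1"],
    "UniProt:u1": ["Rhea:r1", "GO:g1", "Gene:x1"],
    "GO:g1": ["UniProt:u1"],
    "Gene:x1": ["UniProt:u1"],
}

def navigate(start, target_db):
    """Map every reachable entry of `target_db` to a shortest cross-reference path.

    Breadth-first search marks nodes when they are enqueued, so the first path
    found to each node is a shortest one.
    """
    paths = {}
    seen = {start}
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node != start and node.split(":")[0] == target_db:
            paths[node] = path
        for nxt in XREFS.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return paths

# From a metabolite to the genes whose products act on it.
print(navigate("ChEBI:c1", "Gene"))
```

The same function works in the opposite direction, for example from a gene to metabolites, because navigation only depends on the cross-reference graph and not on the starting data type.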

    Combined biotic and abiotic stress resistance in tomato

    Abiotic and biotic stress factors are the major constraints on realizing crop yield potential. As climate change progresses, the spread and intensity of both abiotic and biotic stressors are expected to increase, which raises the probability that crops are exposed to both types of stress. Shielding crops from combinatorial stress requires a better understanding of the plant’s response and its genetic architecture. In this study, we evaluated resistance to salt stress, to powdery mildew and to both stresses combined in tomato, using the Solanum habrochaites LYC4 introgression line (IL) population. The IL population segregated for both salt stress tolerance and powdery mildew resistance. Using SNP array marker data, QTLs were identified for salt tolerance as well as for Na+ and Cl- accumulation. Salt stress increased the susceptibility of the population to powdery mildew in an additive manner. Phenotypic variation for disease resistance was reduced under combined stress, as indicated by the coefficient of variation. No correlation was found between disease resistance and Na+ and Cl- accumulation under combined stress. Most genetic loci were specific to either salt stress tolerance or powdery mildew resistance. These findings increase our understanding of the genetic regulation of responses to combinations of abiotic and biotic stress. They can also provide leads for more efficiently breeding tomatoes and other crops with a high level of disease resistance that maintain their performance under combined abiotic stress.
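The coefficient of variation mentioned above is the standard deviation divided by the mean, CV = s / x̄. This makes variation comparable between treatments with different mean disease levels. A minimal sketch using Python's `statistics` module follows. The disease scores are invented for illustration and are not data from this study.

```python
import statistics

def coefficient_of_variation(values):
    """Return the sample coefficient of variation: sample stdev divided by the mean."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return statistics.stdev(values) / mean

# Hypothetical disease scores for the same set of lines under two treatments.
mildew_only = [1, 2, 3, 4, 5]
salt_plus_mildew = [3, 3.5, 4, 4.5, 5]

print(round(coefficient_of_variation(mildew_only), 3))       # 0.527
print(round(coefficient_of_variation(salt_plus_mildew), 3))  # 0.198
```

Both invented series have the same absolute spread pattern around their means. The CV is lower for the second series because its scores are compressed into a narrower range relative to a higher mean.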

    Genomics data integration for knowledge discovery using genome annotations from molecular databases and scientific literature

    One of the major global challenges of today is to meet the food demands of an ever-increasing population (food demand is expected to increase by 50% by 2030). One approach to this challenge is to breed new crop varieties that yield more even under unfavorable conditions, e.g. varieties with improved tolerance to drought and/or resistance to pathogens. However, designing a breeding program is laborious and time-consuming, and such programs often cannot generate new cultivars quickly in response to the required traits. Recent advances in biotechnology and genomics data science have the potential to greatly accelerate breeding programs and make them more precise. Large-scale genomic data sets for crop species are available in multiple independent data sources and in the scientific literature. This thesis therefore provides innovative technologies that use natural language processing (NLP) and semantic web technologies to address the challenges of integrating genomic data for improving plant breeding. Firstly, we developed a supervised NLP model with the help of IBM Watson to extract knowledge networks containing genotypic-phenotypic associations for potato tuber flesh color from the scientific literature. Secondly, we developed a table mining tool called QTLTableMiner++ (QTM), which enables the discovery of novel genomic regions (such as QTL regions) that positively or negatively affect the traits of interest. The objective of both NLP techniques was to extract information that is implicitly described in the literature and is not available in structured resources such as databases. Thirdly, using semantic web technology, we developed a linked-data platform called the Solanaceae linked data platform (pbg-ld) to semantically integrate genotypic and phenotypic data of Solanaceae species.
This platform uses the Linked Data approach to combine unstructured data from the scientific literature with structured data from publicly available biological databases. Lastly, we tested analysis workflows for prioritizing candidate genes within QTL regions using pbg-ld. Hence, this research provides in-silico knowledge discovery tools and a genomic data infrastructure that help researchers and breeders design precise and improved breeding programs.
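The abstract does not detail how QTLTableMiner++ recognizes QTL tables. The following is a rough, hypothetical sketch of the general idea of table mining. It assumes a table is already available as a list of rows whose first row is the header. Each column is classified by matching its header against keyword patterns, and each data row is then turned into a structured QTL record. The patterns, role names and example table are assumptions, not QTM's actual rules.

```python
import re

# Hypothetical header patterns; QTM's real rules are not described in the abstract.
HEADER_PATTERNS = {
    "trait": re.compile(r"\btrait\b", re.I),
    "marker": re.compile(r"\bmarker\b", re.I),
    "chromosome": re.compile(r"\b(chr|chromosome|lg)\b", re.I),
    "lod": re.compile(r"\blod\b", re.I),
}

def classify_columns(header):
    """Map each recognized semantic role to the index of its first matching column."""
    roles = {}
    for idx, cell in enumerate(header):
        for role, pattern in HEADER_PATTERNS.items():
            if role not in roles and pattern.search(cell):
                roles[role] = idx
    return roles

def extract_qtl_records(table):
    """Turn a table (header row first) into QTL records, or [] if it is not a QTL table."""
    header, *rows = table
    roles = classify_columns(header)
    if not {"trait", "marker"}.issubset(roles):
        return []
    return [{role: row[idx] for role, idx in roles.items()} for row in rows]

# Hypothetical table as it might be extracted from an article.
table = [
    ["Trait", "Chr.", "Nearest marker", "LOD score"],
    ["trait_A", "1", "marker_X", "5.0"],
]
print(extract_qtl_records(table))
```

Requiring at least a trait column and a marker column is a simple way to skip tables that are not about QTL. A real tool would also need to normalize cell values and link them to ontology terms.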

    Whitefly resistance in tomato: from accessions to mechanisms

    Tomato (Solanum lycopersicum) is affected by a wide range of biotic stresses, of which Bemisia tabaci is one of the most important. Bemisia tabaci affects tomato directly through phloem sap feeding, and indirectly as the vector of a large number of viruses. Different methods are available for whitefly control. Several biological control agents are used against whiteflies in greenhouse cultivation, but chemical control is still an essential component of open-field tomato production. Breeding for host plant resistance is considered one of the most promising methods of insect pest control in crop plants, and it is an especially promising alternative for whitefly control. Resistance to whiteflies has been found in several wild relatives of tomato, such as Solanum peruvianum, S. pennellii, S. habrochaites, S. lycopersicum var. cerasiforme, S. pimpinellifolium and S. galapagense. In spite of previous breeding efforts, whiteflies are still a problem in tomato cultivation. The aim of my research was to identify and understand resistance mechanisms targeting specific stages of the whitefly life cycle, in order to provide breeders with tools for developing whitefly-resistant varieties. I assessed the natural variation and whitefly resistance in Solanum galapagense and S. cheesmaniae, two wild tomato species endemic to the Galapagos Islands. Solanum galapagense and S. cheesmaniae were previously classified as two species based on a morphological species concept, but no clear separation could be made with molecular markers. So far, only a limited number of accessions/populations of S. galapagense and S. cheesmaniae have been evaluated for insect resistance. It was therefore unknown whether insect resistance coincides with the morphological species boundaries. Nor was anything known about the relation between present-day geographical and climatic conditions on the Galapagos and the occurrence of the two species.
We used no-choice experiments to characterize twelve accessions of S. galapagense, 22 of S. cheesmaniae and, as a reference, one of S. lycopersicum for whitefly resistance. Whitefly resistance was found only in S. galapagense and was associated with relatively high levels of acyl sugars and the presence of glandular trichomes of types I and IV. It is likely that a minimum level of acyl sugars and the presence of type IV glandular trichomes are needed to achieve an effective level of resistance. Genetic fingerprinting using 3316 polymorphic SNP markers did not show a clear differentiation between the two species endemic to the Galapagos. Neither acyl sugar accumulation nor the climatic and geographical conditions at the collection sites of the accessions followed the morphological species boundaries. Altogether, our results suggest that S. galapagense and S. cheesmaniae might be considered morphotypes rather than two species, and that their co-existence is likely the result of selective pressure. Plants possess several resistance mechanisms that act at different time points during the interaction with herbivorous insects. Before any contact with the insects, plants emit an array of volatile organic compounds that can act as attractants or repellents of insects. Bemisia tabaci uses a set of plant-derived cues in the process of host plant selection. It mainly recognizes monoterpenes (p-cymene, γ-terpinene, β-myrcene and α-phellandrene) and sesquiterpenes (7-epizingiberene and R-curcumene). The line FCN93-6-2, derived from a cross between a susceptible tomato cultivar (Uco Plata INTA) and S. habrochaites (FCN3-5), was previously shown to be non-preferred by the greenhouse whitefly Trialeurodes vaporariorum. We identified chemical cues produced by FCN93-6-2 and S. habrochaites that can affect the preference of the whitefly B. tabaci. We also identified the potential chromosomal region(s) of S. habrochaites harbouring the genes involved in this preference.
Two S. habrochaites accessions (CGN1.1561 and FCN3-5) and the line FCN93-6-2 were non-preferred by B. tabaci. This held both when the whiteflies could get in direct contact with the plant and when they were offered olfactory cues only. The non-preference was independent of type IV trichomes and of the presence of methyl ketones, but it was associated with lower concentrations of monoterpenes. Functional validation of the candidate metabolites and of the different introgressions is still needed. Once the insect has landed on a plant, another set of resistance mechanisms comes into action. We described a 3.06 Mbp introgression at the top of Chromosome 5 (OR-5) from the wild tomato species S. habrochaites (CGN1.1561). To identify OR-5, we went from the selection of specific F2 plants to the development of F2BC4S1 and F2BC4S2 families. This introgression was sufficient to reduce whitefly fecundity without an evident effect on whitefly survival. The identification of mechanisms that exclusively affect whitefly fecundity and are independent of type IV trichomes opens new doors for breeding resistance to whiteflies. This may be especially interesting in greenhouse cultivation, in combination with natural enemies of the whitefly. As an additional layer of defence, plants can perceive stress signals and respond to them in a specific way by inducing their immune system. This induction can also be triggered by exposing the plants to priming agents such as hormones, some xenobiotic chemicals like benzothiadiazole (BTH) and β-aminobutyric acid (BABA), and sugars. The effect of priming agents has been shown in laboratory and field studies. However, little is known about how the genetic background of tomato affects the extent of priming, i.e. whether genotypes varying in their level of resistance to insects and pathogens respond in the same way to a priming agent. We assessed the effect of selected priming agents on the effectiveness of natural defence in tomato.
A set of no-choice and choice bioassays was conducted using tomato genotypes varying in their level of basal resistance to Bemisia tabaci and pathogens. In no-choice assays, whitefly survival and oviposition were not affected by the priming treatments. Overall, in choice assays, whiteflies preferred fructose-treated plants over control plants. A genotype-specific effect of priming was seen for the line FCN93-6-2. On this line, jasmonic acid (JA) and BABA applications decreased the number of whiteflies, i.e. they made the plants less preferred. In this thesis, I have gone from screening wild relatives of tomato to the in-depth characterization of resistance mechanisms. I have identified resistance mechanisms targeting specific stages of the whitefly life cycle, thus providing new tools for breeding durable whitefly resistance in tomato.

    Marker2sequence, mine your QTL regions for candidate genes

    Marker2sequence (M2S) aims to mine quantitative trait loci (QTLs) for candidate genes. For each gene within the QTL region, M2S uses data integration technology to integrate putative gene function with associated Gene Ontology terms, proteins, pathways and literature. A typical QTL region easily contains several hundred genes, so this gene list can then be filtered further using a keyword-based query on the aggregated annotations. M2S will help breeders identify potential candidate genes for their traits of interest.
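The two steps described here, selecting the genes of a QTL interval and then filtering them by keyword on their aggregated annotations, can be sketched as follows. This is a hypothetical illustration, not M2S's implementation, which relies on semantic web technology to aggregate the annotations. The `Gene` class, coordinates and annotation strings are assumptions, and the keyword match is a simple case-insensitive substring test.

```python
from dataclasses import dataclass, field

@dataclass
class Gene:
    gene_id: str
    chromosome: str
    start: int
    end: int
    # Aggregated annotation text (e.g. GO term names, protein names); hypothetical.
    annotations: list[str] = field(default_factory=list)

def genes_in_interval(genes, chromosome, start, end):
    """Return genes overlapping the closed interval [start, end] on one chromosome."""
    return [g for g in genes
            if g.chromosome == chromosome and g.start <= end and g.end >= start]

def filter_by_keywords(genes, keywords):
    """Keep genes whose annotations contain any keyword (case-insensitive substring)."""
    kws = [k.lower() for k in keywords]
    return [g for g in genes
            if any(k in text.lower() for text in g.annotations for k in kws)]

# Hypothetical genes and annotations, for illustration only.
genes = [
    Gene("g1", "chr16", 1_000, 2_000, ["flavonoid biosynthetic process"]),
    Gene("g2", "chr16", 5_000, 6_000, ["protein kinase activity"]),
    Gene("g3", "chr16", 50_000, 51_000, ["flavonoid biosynthetic process"]),
    Gene("g4", "chr08", 1_500, 2_500, ["flavonoid biosynthetic process"]),
]

in_qtl = genes_in_interval(genes, "chr16", 0, 10_000)  # g1 and g2
candidates = filter_by_keywords(in_qtl, ["flavonoid"])
print([g.gene_id for g in candidates])  # ['g1']
```

Genes g3 and g4 carry the same annotation as g1 but are excluded because they lie outside the interval. This mirrors the limitation noted above, namely that only genes within the QTL interval are considered.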