    Prioritising candidate genes causing QTL using hierarchical orthologous groups.

    A key goal in plant biotechnology applications is the identification of genes associated with particular phenotypic traits (for example, yield, fruit size, or root length). Quantitative Trait Loci (QTL) studies identify genomic regions associated with a trait of interest. However, to infer potential causal genes in these regions, each of which can contain hundreds of genes, these data are usually intersected with prior functional knowledge of the genes. This process is laborious, particularly if the experiment is performed in a non-model species, and the statistical significance of the inferred candidates is typically unknown. This paper introduces QTLSearch, a method and software tool to search for candidate causal genes in QTL studies by combining Gene Ontology annotations across many species, leveraging hierarchical orthologous groups. The usefulness of this approach is demonstrated by re-analysing two metabolic QTL studies: one in Arabidopsis thaliana, the other in Oryza sativa subsp. indica. Even after controlling for statistical significance, QTLSearch inferred potential causal genes for more QTL than BLAST-based functional propagation against UniProtKB/Swiss-Prot, and for more QTL than in the original studies. QTLSearch is distributed under the LGPLv3 license. It is available to install from the Python Package Index (as qtlsearch), with the source available from https://bitbucket.org/alex-warwickvesztrocy/qtlsearch. Supplementary data are available at Bioinformatics online.

    Identification of new candidate genes associated with metabolic traits applying a multiomics approach in the obese mouse model BFMI861

    Background: The Berlin Fat Mouse Inbred line (BFMI) is a model for obesity and the metabolic syndrome. This study aimed to identify genetic variants associated with impaired glucose metabolism using the obese lines BFMI861-S1 and BFMI861-S2, which are genetically closely related but differ in several traits. BFMI861-S1 is insulin resistant and stores ectopic fat in the liver, whereas BFMI861-S2 is insulin sensitive. Methods: QTL analysis was performed in two advanced intercross lines (AILs) in generation. One AIL was obtained from the cross BFMI861-S1 x BFMI861-S2 and the second from the cross BFMI861-S1 x BFMI861-B6N. Phenotypes were collected over 25 and 20 weeks, respectively. Whole-genome sequencing and gene expression data of the parental lines were used to prioritize positional candidate genes. Results: For the AIL BFMI861-S1 x BFMI861-S2, overlapping QTL for gonadal adipose tissue weight and blood glucose concentration were detected on chromosome (Chr) 3 (95.8–100.1 Mb), and for gonadal adipose tissue weight, liver weight, and blood glucose concentration on Chr 17 (9.5–26.1 Mb). For the AIL BFMI861-S1 x BFMI861-B6N, one highly significant QTL on Chr 1 (157–168 Mb) was associated with liver weight. A QTL for body weight at 20 weeks was found on Chr 3 (34.1–40 Mb), overlapping with a QTL for scAT weight. In a multiple QTL mapping approach, an additional QTL affecting body weight at 16 weeks was identified on Chr 6 (9.5–26.1 Mb). Conclusions: QTL mapping together with a detailed prioritization approach allowed us to identify candidate genes associated with traits of the metabolic syndrome in both AILs.

    Genomics data integration for knowledge discovery using genome annotations from molecular databases and scientific literature

    One of the major global challenges of today is to meet the food demands of an ever-increasing population (food demand will increase by 50% by 2030). One approach to address this challenge is to breed new crop varieties that yield more even under unfavorable conditions, e.g. varieties with improved tolerance to drought and/or resistance to pathogens. However, designing a breeding program is a laborious and time-consuming effort that often lacks the capacity to generate new cultivars quickly in response to the required traits. Recent advances in biotechnology and genomics data science have the potential to greatly accelerate breeding programs and make them more precise. As large-scale genomic data sets for crop species are available in multiple independent data sources and in the scientific literature, this thesis provides innovative technologies that use natural language processing (NLP) and semantic web technologies to address the challenges of integrating genomic data for improving plant breeding. Firstly, we developed a supervised NLP model with the help of IBM Watson to extract knowledge networks containing genotypic-phenotypic associations of potato tuber flesh color from the scientific literature. Secondly, a table mining tool called QTLTableMiner++ (QTM) was developed, which enables knowledge discovery of novel genomic regions (such as QTL regions) that positively or negatively affect the traits of interest. The objective of both NLP techniques was to extract information that is implicitly described in the literature and is not available in structured resources such as databases. Thirdly, with the help of semantic web technology, a linked-data platform called the Solanaceae linked data platform (pbg-ld) was developed to semantically integrate genotypic and phenotypic data of Solanaceae species. This platform combines unstructured data from the scientific literature with structured data from publicly available biological databases using the Linked Data approach. Lastly, analysis workflows for prioritizing candidate genes within QTL regions were tested using pbg-ld. Hence, this research provides in-silico knowledge discovery tools and genomic data infrastructure that aid researchers and breeders in the design of precise and improved breeding programs.