842 research outputs found

    Knowledge Discovery in Biological Databases for Revealing Candidate Genes Linked to Complex Phenotypes

    Genetics and “omics” studies designed to uncover genotype to phenotype relationships often identify large numbers of potential candidate genes, among which the causal genes are hidden. Scientists generally lack the time and technical expertise to review all relevant information available from the literature, from key model species and from a potentially wide range of related biological databases in a variety of data formats with variable quality and coverage. Computational tools are needed for the integration and evaluation of heterogeneous information in order to prioritise candidate genes and components of interaction networks that, if perturbed through potential interventions, have a positive impact on the biological outcome in the whole organism without producing negative side effects. Here we review several bioinformatics tools and databases that play an important role in biological knowledge discovery and candidate gene prioritization. We conclude with several key challenges that need to be addressed in order to facilitate biological knowledge discovery in the future.

    Whole-genome analysis of Fusarium graminearum insertional mutants identifies virulence associated genes and unmasks untagged chromosomal deletions

    BACKGROUND: Identifying pathogen virulence genes required to cause disease is crucial to understand the mechanisms underlying the pathogenic process. Plasmid insertion mutagenesis of fungal protoplasts is frequently used for this purpose in filamentous ascomycetes. Post transformation, the mutant population is screened for loss of virulence to a specific plant or animal host. Identifying the insertion event has previously met with varying degrees of success, from a cleanly disrupted gene with minimal deletion of nucleotides at the insertion point to multiple-copy insertion events and large deletions of chromosomal regions. Currently, extensive mutant collections exist in laboratories globally where it was hitherto impossible to identify all the affected genes. RESULTS: We used a whole-genome sequencing (WGS) approach using Illumina HiSeq 2000 technology to investigate DNA tag insertion points and chromosomal deletion events in mutagenised, reduced virulence F. graminearum isolates identified in disease tests on wheat (Triticum aestivum). We developed the FindInsertSeq workflow to localise the DNA tag insertions to the nucleotide level. The workflow was tested using four mutants showing evidence of single and multi-copy insertions in DNA blot analysis. FindInsertSeq was able to identify both single and multi-copy concatenation insertion sites. By comparing sequencing coverage, unexpected molecular recombination events such as large tagged and untagged chromosomal deletions, and DNA amplification were observed in three of the analysed mutants. A random data sampling approach revealed the minimum genome coverage required to survey the F. graminearum genome for alterations. CONCLUSIONS: This study demonstrates that whole-genome re-sequencing to 22-fold genome coverage is an efficient tool to characterise single and multi-copy insertion mutants in the filamentous ascomycete Fusarium graminearum. In some cases insertion events are accompanied by large untagged chromosomal deletions, while in other cases a straightforward insertion event could be confirmed. The FindInsertSeq analysis workflow presented in this study enables researchers to efficiently characterise insertion and deletion mutants. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12864-015-1412-9) contains supplementary material, which is available to authorized users.

    The Wheat GENIE3 Network Provides Biologically-Relevant Information in Polyploid Wheat

    Gene regulatory networks are powerful tools which facilitate hypothesis generation and candidate gene discovery. However, the extent to which the network predictions are biologically relevant is often unclear. Recently, a GENIE3 network which predicted targets of wheat transcription factors was produced. Here we used an independent RNA-Seq dataset to test the predictions of the wheat GENIE3 network for the senescence-regulating transcription factor NAM-A1 (TraesCS6A02G108300). We re-analyzed the RNA-Seq data against the RefSeqv1.0 genome and identified a set of differentially expressed genes (DEGs) between the wild-type and nam-a1 mutant which recapitulated the known role of NAM-A1 in senescence and nutrient remobilisation. We found that the GENIE3-predicted target genes of NAM-A1 overlap significantly with the DEGs, more than would be expected by chance. Based on high levels of overlap between GENIE3-predicted target genes and the DEGs, we identified candidate senescence regulators. We then explored genome-wide trends in the network related to polyploidy and found that only homeologous transcription factors are likely to share predicted targets in common. However, homeologs which vary in expression levels across tissues are less likely to share predicted targets than those that do not, suggesting that they may be more likely to act in distinct pathways. This work demonstrates that the wheat GENIE3 network can provide biologically-relevant predictions of transcription factor targets, which can be used for candidate gene prediction and for global analyses of transcription factor function. The GENIE3 network has now been integrated into the KnetMiner web application, facilitating its use in future studies.

    A query suggestion workflow for life science IR-systems

    Summary: Information Retrieval (IR) plays a central role in the exploration and interpretation of integrated biological datasets that represent the heterogeneous ecosystem of life sciences. Here, keyword-based query systems are popular user interfaces. In turn, to a large extent, the query phrases used determine the quality of the search result and the effort a scientist has to invest for query refinement. In this context, computer-aided query expansion and suggestion is one of the most challenging tasks for life science information systems. Existing query front-ends support aspects like spelling correction, query refinement or query expansion. However, the majority of the front-ends only make limited use of enhanced IR algorithms to implement comprehensive and computer-aided query refinement workflows. In this work, we present the design of a multi-stage query suggestion workflow and its implementation in the life science IR system LAILAPS. The presented workflow includes enhanced tokenisation, word breaking, spelling correction, query expansion and query suggestion ranking. A spelling correction benchmark with 5,401 queries and manually selected use cases for query expansion demonstrate the performance of the implemented workflow and its advantages compared with state-of-the-art systems.

    KnetMaps: a BioJS component to visualize biological knowledge networks

    KnetMaps is a BioJS component for the interactive visualization of biological knowledge networks. It is well suited for applications that need to visualise complementary, connected and content-rich data in a single view in order to help users to traverse pathways linking entities of interest, for example to go from genotype to phenotype. KnetMaps loads data in JSON format, visualizes the structure and content of knowledge networks using lightweight JavaScript libraries, and supports interactive touch gestures. KnetMaps uses effective visualization techniques to prevent information overload and to allow researchers to progressively build their knowledge.

    Daisychain Search and Interactive Visualisation of Homologs in Genome Assemblies

    Daisychain is an interactive graph visualisation and search tool for custom-built gene homology databases. The main goal of Daisychain is to allow researchers working with specific genes to identify homologs in other annotation releases. The gene-centric representation includes the local gene neighborhood to distinguish orthologs and paralogs by local synteny. The software supports genome sequences in FASTA format and GFF3-formatted annotation files, and the process of building the homology database requires a minimum amount of user interaction. Daisychain includes an integrated web viewer that can be used for both data analysis and data publishing. The web interface extends KnetMaps.js and is based on JavaScript.

    Towards FAIRer Biological Knowledge Networks Using a Hybrid Linked Data and Graph Database Approach

    The speed and accuracy of new scientific discoveries – be it by humans or artificial intelligence – depend on the quality of the underlying data and on the technology to connect, search and share the data efficiently. In recent years, we have seen the rise of graph databases and semi-formal data models such as knowledge graphs to facilitate software approaches to scientific discovery. These approaches extend work based on formalised models, such as the Semantic Web. In this paper, we present our developments to connect, search and share data about genome-scale knowledge networks (GSKN). We have developed a simple application ontology based on OWL/RDF with mappings to standard schemas. We are employing the ontology to power data access services like resolvable URIs, SPARQL endpoints, JSON-LD web APIs and Neo4j-based knowledge graphs. We demonstrate how the proposed ontology and graph databases considerably improve search and access to interoperable and reusable biological knowledge (i.e. the FAIRness data principles).

    The Role of Trehalose 6-Phosphate in Crop Yield and Resilience

    Significant increases in global food security require improving crop yields in favorable and poor conditions alike. However, it is challenging to increase both crop yield potential and yield resilience simultaneously, since the mechanisms that determine productivity and stress tolerance are typically inversely related. Carbon allocation and use may be amenable to improving yields in a range of conditions. The interaction between trehalose 6-phosphate (T6P) and SnRK1 (SNF1-related/AMPK protein kinases) significantly affects the regulation of carbon allocation and utilization in plants. Targeting T6P appropriately to certain cell types, tissue types, and developmental stages results in an increase in both yield potential and resilience. Increasing T6P levels promotes flux through biosynthetic pathways associated with growth and yield, whereas decreasing T6P levels promotes the mobilization of carbon reserves and the movement of carbon associated with stress responses. Genetic modification, gene discovery through quantitative trait locus mapping, and chemical intervention approaches have been used to modify the T6P pathway and improve crop performance under favourable conditions, drought, and flooding in the three main food security crops: wheat (Triticum aestivum), maize (Zea mays), and rice (Oryza sativa). Interestingly, both trehalose phosphate synthase (TPS) and trehalose phosphate phosphatase (TPP) genes are associated with maize domestication. A phylogenetic comparison of wheat TPS and TPP with eudicots and other cereals shows strong distinctions in wheat in both gene families. This Update highlights recent research examining the potential of the trehalose pathway in crop improvement and highlights an emerging strategy to increase cereal yields by targeting T6P in reproductive tissue.