3,768 research outputs found

    How to understand the cell by breaking it: network analysis of gene perturbation screens

    Modern high-throughput gene perturbation screens are key technologies at the forefront of genetic research. Combined with rich phenotypic descriptors, they enable researchers to observe detailed cellular reactions to experimental perturbations on a genome-wide scale. This review surveys the current state of the art in analyzing perturbation screens from a network point of view. We describe approaches to make the step from the parts list to the wiring diagram by using phenotypes for network inference and integrating them with complementary data sources. The first part of the review describes methods to analyze one- or low-dimensional phenotypes like viability or reporter activity; the second part concentrates on high-dimensional phenotypes showing global changes in cell morphology, transcriptome or proteome. Comment: Review based on ISMB 2009 tutorial; after two rounds of revisio

    Towards a Reference Architecture with Modular Design for Large-scale Genotyping and Phenotyping Data Analysis: A Case Study with Image Data

    With the rapid advancement of computing technologies, various scientific research communities have been extensively using cloud-based software tools or applications. Cloud-based applications allow users to access software from web browsers while relieving them from installing any software in their desktop environment. For example, Galaxy, GenAP, and iPlant Collaborative are popular cloud-based systems for scientific workflow analysis in the domain of plant Genotyping and Phenotyping. These systems are being used for conducting research, devising new techniques, and sharing computer-assisted analysis results among collaborators. Researchers need to integrate their new workflows/pipelines, tools or techniques with the base system over time. Moreover, large-scale data need to be processed within the timeline for more effective analysis. Recently, Big Data technologies have been emerging to facilitate large-scale data processing with commodity hardware. Among the above-mentioned systems, GenAP is utilizing Big Data technologies for specific cases only. The structure of such a cloud-based system is highly variable and complex in nature. Software architects and developers need to consider totally different properties and challenges during the development and maintenance phases compared to traditional business/service-oriented systems. Recent studies report that software engineers and data engineers confront challenges in developing analytic tools for supporting large-scale and heterogeneous data analysis. Unfortunately, software researchers have given little attention to devising a well-defined methodology and frameworks for the flexible design of a cloud system for the Genotyping and Phenotyping domain. To that end, more effective design methodologies and frameworks are urgently needed for the development of cloud-based Genotyping and Phenotyping analysis systems that also support large-scale data processing.
In our thesis, we conduct several studies to devise a stable reference architecture and modularity model for software developers and data engineers in the domain of Genotyping and Phenotyping. In the first study, we analyze the architectural changes of existing candidate systems to identify stability issues. Then, we extract architectural patterns of the candidate systems and propose a conceptual reference architectural model. Finally, we present a case study on the modularity of computation-intensive tasks as an extension of the data-centric development. We show that the data-centric modularity model is at the core of the flexible development of a Genotyping and Phenotyping analysis system. Our proposed model and case study with thousands of images provide a useful knowledge base for software researchers, developers, and data engineers for cloud-based Genotyping and Phenotyping analysis system development.

    A framework for gene mapping in wheat demonstrated using the Yr7 yellow rust resistance gene

    We used three approaches to map the yellow rust resistance gene Yr7 and identify associated SNPs in wheat. First, we used a traditional QTL mapping approach with a doubled haploid (DH) population and mapped Yr7 to a low-recombination region of chromosome 2B. To fine-map the QTL, we then used an association mapping panel. Both populations were genotyped with a SNP array, allowing alignment of QTL and genome-wide association scans based on common segregating SNPs. Analysis of the association panel spanning the QTL interval narrowed the interval down to a single haplotype block. Finally, we used mapping-by-sequencing of resistant and susceptible DH bulks to identify a candidate gene in the interval showing high homology to a previously suggested Yr7 candidate and to populate the Yr7 interval with a higher density of polymorphisms. We highlight the power of combining mapping-by-sequencing, which delivers a complete list of gene-based segregating polymorphisms in the interval, with the high-recombination, low-LD precision of the association mapping panel. Our mapping-by-sequencing methodology is applicable to any trait, and our results validate the approach in wheat, where, with a near-complete reference genome sequence, we are able to define a small interval containing the causative gene.

    Ontology-driven and weakly supervised rare disease identification from clinical notes

    BACKGROUND: Computational text phenotyping is the practice of identifying patients with certain disorders and traits from clinical notes. Rare diseases are challenging to identify because few cases are available for machine learning and data annotation requires domain experts. METHODS: We propose a method using ontologies and weak supervision, with recent pre-trained contextual representations from Bi-directional Transformers (e.g. BERT). The ontology-driven framework includes two steps: (i) Text-to-UMLS, extracting phenotypes by contextually linking mentions to concepts in the Unified Medical Language System (UMLS), with a Named Entity Recognition and Linking (NER+L) tool, SemEHR, and weak supervision with customised rules and contextual mention representation; (ii) UMLS-to-ORDO, matching UMLS concepts to rare diseases in the Orphanet Rare Disease Ontology (ORDO). The weakly supervised approach is proposed to learn a phenotype confirmation model to improve Text-to-UMLS linking, without annotated data from domain experts. We evaluated the approach on three clinical datasets, MIMIC-III discharge summaries, MIMIC-III radiology reports, and NHS Tayside brain imaging reports from two institutions in the US and the UK, with annotations. RESULTS: The improvements in precision were pronounced (by over 30% to 50% absolute score for Text-to-UMLS linking), with almost no loss of recall compared to the existing NER+L tool, SemEHR. Results on radiology reports from MIMIC-III and NHS Tayside were consistent with the discharge summaries. The overall pipeline processing clinical notes can extract rare disease cases, mostly uncaptured in structured data (manually assigned ICD codes). CONCLUSION: The study provides empirical evidence for the task by applying a weakly supervised NLP pipeline on clinical notes.
The proposed weakly supervised deep learning approach requires no human annotation except for validation and testing, by leveraging ontologies, NER+L tools, and contextual representations. The study also demonstrates that Natural Language Processing (NLP) can complement traditional ICD-based approaches to better estimate rare diseases in clinical notes. We discuss the usefulness and limitations of the weak supervision approach and propose directions for future studies.
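
The two-step linking described in this abstract (Text-to-UMLS, then UMLS-to-ORDO) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the identifiers (`C_TOY_1`, `ORDO_TOY_1`, ...) are placeholders rather than real UMLS or ORDO codes, the toy mentions are invented, and the hand-written length rule in `confirm_mention` merely stands in for the learned phenotype confirmation model, which the abstract does not specify. The numbers it produces say nothing about the reported results.

```python
def confirm_mention(mention_text, min_chars=4):
    """Hypothetical stand-in for the phenotype confirmation model:
    very short mentions are treated as ambiguous abbreviations and rejected."""
    return len(mention_text) >= min_chars


def link_notes(mentions, umls_to_ordo, use_confirmation):
    """Map (note_id, concept_id, mention_text) candidates to (note_id, ordo_id) pairs."""
    predicted = set()
    for note_id, concept_id, text in mentions:
        # Step (i): keep or drop the candidate Text-to-UMLS link.
        if use_confirmation and not confirm_mention(text):
            continue
        # Step (ii): UMLS-to-ORDO lookup in a placeholder mapping table.
        ordo_id = umls_to_ordo.get(concept_id)
        if ordo_id is not None:
            predicted.add((note_id, ordo_id))
    return predicted


def precision_recall(predicted, gold):
    """Precision and recall of predicted pairs against gold-standard pairs."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Toy candidates as an NER+L tool might emit them (all values invented).
mentions = [
    ("note1", "C_TOY_1", "toy syndrome alpha"),
    ("note2", "C_TOY_1", "TSA"),  # ambiguous abbreviation, not a true case
    ("note3", "C_TOY_2", "toy disease beta"),
    ("note4", "C_TOY_2", "TDB"),  # ambiguous abbreviation, not a true case
]
umls_to_ordo = {"C_TOY_1": "ORDO_TOY_1", "C_TOY_2": "ORDO_TOY_2"}
gold = {("note1", "ORDO_TOY_1"), ("note3", "ORDO_TOY_2")}

baseline = precision_recall(link_notes(mentions, umls_to_ordo, False), gold)
filtered = precision_recall(link_notes(mentions, umls_to_ordo, True), gold)
print("baseline (precision, recall):", baseline)
print("filtered (precision, recall):", filtered)
```

In this toy setting the filter removes only spurious links, so precision rises while recall is unchanged; a real confirmation model would trade the two off less cleanly.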

    Low-cost and automated phenotyping system “Phenomenon” for multi-sensor in situ monitoring in plant in vitro culture

    Background: The current development of sensor technologies towards ever more cost-effective and powerful systems is steadily increasing the application of low-cost sensors in different horticultural sectors. In plant in vitro culture, as a fundamental technique for plant breeding and plant propagation, the majority of evaluation methods to describe the performance of these cultures are based on destructive approaches, limiting data to unique endpoint measurements. Therefore, a non-destructive phenotyping system capable of automated, continuous and objective quantification of in vitro plant traits is desirable. Results: An automated low-cost multi-sensor system acquiring phenotypic data of plant in vitro cultures was developed and evaluated. Unique hardware and software components were selected to construct an xyz-scanning system with adequate accuracy for consistent data acquisition. Relevant plant growth predictors, such as projected area of explants and average canopy height, were determined employing multi-sensory imaging, and various developmental processes could be monitored and documented. The validation of the RGB image segmentation pipeline using a random forest classifier revealed a very strong correlation with manual pixel annotation. Depth imaging of plant in vitro cultures by a laser distance sensor enabled the description of the dynamic behavior of the average canopy height and the maximum plant height, as well as the culture media height and volume. The projected plant area derived from depth data with a RANSAC (random sample consensus) segmentation approach closely matched the projected plant area from the RGB image processing pipeline. In addition, a successful proof of concept for in situ spectral fluorescence monitoring was achieved and challenges of thermal imaging were documented. Potential use cases for the digital quantification of key performance parameters in research and commercial application are discussed.
Conclusion: The technical realization of “Phenomenon” allows phenotyping of plant in vitro cultures under highly challenging conditions and enables multi-sensory monitoring through closed vessels, ensuring the aseptic status of the cultures. Automated sensor application in plant tissue culture promises great potential for non-destructive growth analysis, enhancing commercial propagation as well as enabling research with novel digital parameters recorded over time.
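
As a rough illustration of the RANSAC-based depth segmentation mentioned in this abstract, the following Python sketch fits a plane to the culture medium surface and counts the grid cells that rise above it as projected plant area. It is a sketch under stated assumptions, not the authors' pipeline: the function names, the distance threshold, the iteration count, the unit cell area and the synthetic 10 x 10 depth grid are all hypothetical, and the abstract does not say which geometric model the system actually fits.

```python
import random


def fit_plane(p1, p2, p3):
    """Plane through three points as (a, b, c, d) with unit normal, or None if collinear."""
    ux, uy, uz = (p2[i] - p1[i] for i in range(3))
    vx, vy, vz = (p3[i] - p1[i] for i in range(3))
    # The normal vector is the cross product of the two edge vectors.
    a = uy * vz - uz * vy
    b = uz * vx - ux * vz
    c = ux * vy - uy * vx
    norm = (a * a + b * b + c * c) ** 0.5
    if norm < 1e-12:
        return None
    a, b, c = a / norm, b / norm, c / norm
    if c < 0:  # orient the normal upward so positive distances lie above the plane
        a, b, c = -a, -b, -c
    d = -(a * p1[0] + b * p1[1] + c * p1[2])
    return a, b, c, d


def signed_distance(plane, point):
    """Signed distance of a point from the plane (positive means above)."""
    a, b, c, d = plane
    return a * point[0] + b * point[1] + c * point[2] + d


def ransac_plane(points, threshold=0.5, iterations=200, seed=0):
    """Return the sampled plane with the most inliers (assumed to be the medium surface)."""
    rng = random.Random(seed)
    best_plane, best_count = None, -1
    for _ in range(iterations):
        plane = fit_plane(*rng.sample(points, 3))
        if plane is None:
            continue
        count = sum(abs(signed_distance(plane, p)) <= threshold for p in points)
        if count > best_count:
            best_plane, best_count = plane, count
    return best_plane


def projected_plant_area(points, plane, threshold=0.5, cell_area=1.0):
    """Area of the (x, y) cells whose depth reading lies above the fitted plane."""
    plant_cells = {(p[0], p[1]) for p in points if signed_distance(plane, p) > threshold}
    return len(plant_cells) * cell_area


# Synthetic depth grid (assumed units): flat medium at z = 0, a 3 x 3 explant at z = 5.
points = [
    (x, y, 5.0 if 4 <= x <= 6 and 4 <= y <= 6 else 0.0)
    for x in range(10)
    for y in range(10)
]
plane = ransac_plane(points)
area = projected_plant_area(points, plane)
print("projected plant area:", area)
```

Fitting the dominant plane first and treating everything above it as plant is one simple way to separate explants from medium in a top-down depth scan; tilted vessels or uneven media would need a looser threshold or a different model.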

    Review: New sensors and data-driven approaches—A path to next generation phenomics

    At the 4th International Plant Phenotyping Symposium meeting of the International Plant Phenotyping Network (IPPN) in 2016 at CIMMYT in Mexico, a workshop was convened to consider ways forward with sensors for phenotyping. The increasing number of field applications provides new challenges and requires specialised solutions. There are many traits vital to plant growth and development that demand phenotyping approaches that are still at early stages of development or elude current capabilities. Further, there is growing interest in low-cost sensor solutions and in mobile platforms that can be transported to the experiments, rather than the experiment coming to the platform. Various types of sensors are required to address diverse needs with respect to targets, precision and ease of operation and readout. Converting data into knowledge, and ensuring that those data (and the appropriate metadata) are stored in such a way that they will be sensible and available to others now and for future analysis, is also vital. Here we propose mechanisms for “next generation phenomics” based on our learning in the past decade, current practice and discussions at the IPPN Symposium, to encourage further thinking and collaboration by plant scientists, physicists and engineering experts.