584 research outputs found

    The utility of general purpose versus specialty clinical databases for research: Warfarin dose estimation from extracted clinical variables

    There is debate about the utility of clinical data warehouses for research. Using a clinical warfarin dosing algorithm derived from research-quality data, we evaluated the data quality of both a general-purpose database and a coagulation-specific database. We evaluated the functional utility of these repositories by using data extracted from them to predict warfarin dose. We reasoned that high-quality clinical data would predict doses nearly as accurately as research data, while poor-quality clinical data would predict doses less accurately. We evaluated the Mean Absolute Error (MAE) in predicted weekly dose as a metric of data quality. The MAE was comparable between the clinical gold standard (10.1 mg/wk) and the specialty database (10.4 mg/wk), but the MAE for the clinical warehouse was 40% greater (14.1 mg/wk). Our results indicate that the research utility of clinical data collected in focused clinical settings is greater than that of data collected during general-purpose clinical care.
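
    The MAE metric used in this abstract can be illustrated with a minimal sketch. The dose values below are hypothetical and chosen only for illustration; the function names are assumptions, not the authors' code. The last line checks the "40% greater" figure against the two MAE values reported in the abstract.

```python
from typing import Sequence


def mean_absolute_error(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Average of |predicted - actual| over paired observations."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


def relative_increase(baseline: float, value: float) -> float:
    """Fractional increase of value over baseline."""
    return (value - baseline) / baseline


# Hypothetical weekly doses in mg/wk (illustrative only, not from the study).
predicted_doses = [35.0, 28.0, 42.0, 50.0]
actual_doses = [30.0, 31.5, 45.0, 38.0]
print(mean_absolute_error(predicted_doses, actual_doses))

# MAE values reported in the abstract: gold standard vs. clinical warehouse.
print(round(relative_increase(10.1, 14.1) * 100, 1))
```

    The second printed value is the percentage by which the warehouse MAE exceeds the gold-standard MAE, which the abstract rounds to 40%.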

    M-BISON: Microarray-based integration of data sources using networks

    BACKGROUND: The accurate detection of differentially expressed (DE) genes has become a central task in microarray analysis. Unfortunately, the noise level and experimental variability of microarrays can be limiting. While a number of existing methods partially overcome these limitations by incorporating biological knowledge in the form of gene groups, these methods sacrifice gene-level resolution. This loss of precision can be inappropriate, especially if the desired output is a ranked list of individual genes. To address this shortcoming, we developed M-BISON (Microarray-Based Integration of data SOurces using Networks), a formal probabilistic model that integrates background biological knowledge with microarray data to predict individual DE genes. RESULTS: M-BISON improves signal detection on a range of simulated data, particularly when using very noisy microarray data. We also applied the method to the task of predicting heat shock-related differentially expressed genes in S. cerevisiae, using an hsf1 mutant microarray dataset and conserved yeast DNA sequence motifs. Our results demonstrate that M-BISON improves the analysis quality and makes predictions that are easy to interpret in concert with incorporated knowledge. Specifically, M-BISON increases the AUC of DE gene prediction from 0.541 to 0.623 when compared to a method using only microarray data, and M-BISON outperforms a related method, GeneRank. Furthermore, by analyzing M-BISON predictions in the context of the background knowledge, we identified YHR124W as a potentially novel player in the yeast heat shock response. CONCLUSION: This work provides a solid foundation for the principled integration of imperfect biological knowledge with gene expression data and other high-throughput data sources.

    Bioinformatics

    This article is made available for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic. After reading this chapter, you should know the answers to these questions: Why is sequence, structure, and biological pathway information relevant to medicine? Where on the Internet should you look for a DNA sequence, a protein sequence, or a protein structure? What are two problems encountered in analyzing biological sequence, structure, and function? How has the age of genomics changed the landscape of bioinformatics? What two changes should we anticipate in the medical record as a result of these new information sources? What are two computational challenges in bioinformatics for the future?

    Identification of recurring protein structure microenvironments and discovery of novel functional sites around CYS residues

    BACKGROUND: The emergence of structural genomics presents significant challenges in the annotation of biologically uncharacterized proteins. Unfortunately, our ability to analyze these proteins is restricted by the limited catalog of known molecular functions and their associated 3D motifs. RESULTS: In order to identify novel 3D motifs that may be associated with molecular functions, we employ an unsupervised, two-phase clustering approach that combines k-means and hierarchical clustering with knowledge-informed cluster selection and annotation methods. We applied the approach to approximately 20,000 cysteine-based protein microenvironments (3D regions 7.5 Å in radius) and identified 70 interesting clusters, some of which represent known motifs (e.g., metal binding and phosphatase activity), and some of which are novel, including several zinc binding sites. Detailed annotation results are available online for all 70 clusters at http://feature.stanford.edu/clustering/cys. CONCLUSIONS: The use of microenvironments instead of backbone geometric criteria enables flexible exploration of protein function space, and detection of recurring motifs that are discontinuous in sequence and diverse in structure. Clustering microenvironments may thus help to functionally characterize novel proteins and better understand the protein structure-function relationship.

    Time to Organize the Bioinformatics Resourceome

    The initial steps toward a bioinformatics resourceome are clear. First, an overall ontology with the high-level concepts (algorithms, databases, organizations, papers, people, etc.) must be created, with a set of standard attributes and a standard set of relations between these concepts (e.g., people publish papers, papers describe algorithms or databases, organizations house people, etc.). The initial ontology should be compact and built for distributed collaborative extension. Second, a mechanism for people to extend this ontology with subconcepts in order to describe their own resources should be designed. The precise location of a tool within a taxonomy is not critical; the author will place it somewhere based on the location of similar or competing resources, or based on a best-informed guess. Others may create links to the resource from other appropriate locations in the taxonomy in order to ensure that competing interpretations of the appropriate conceptual location for the resource are accommodated. Third, the formats for the ontologies and the resource descriptions should be published so enterprising software engineers can create interfaces for surfing, searching, and viewing the resources. The resulting distributed system of resource descriptions would be extensible, robust, and useful to the entire biomedical research community.
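
    One way to picture the first two steps described above is a small sketch in which high-level concepts and their allowed relations are fixed, and users extend the ontology with subconcepts. Everything here is hypothetical: the class name `Resourceome`, its methods, and the example resource names are assumptions made for illustration and do not correspond to any published format or tool.

```python
from dataclasses import dataclass, field

# Allowed relations between high-level concepts, taken from the examples
# in the text. The representation as triples is an assumption.
ALLOWED_RELATIONS = {
    ("Person", "publishes", "Paper"),
    ("Paper", "describes", "Algorithm"),
    ("Paper", "describes", "Database"),
    ("Organization", "houses", "Person"),
}


@dataclass
class Resourceome:
    parents: dict = field(default_factory=dict)  # subconcept -> parent concept
    kinds: dict = field(default_factory=dict)    # resource name -> concept
    links: list = field(default_factory=list)    # (subject, relation, object)

    def add_subconcept(self, name, parent):
        """Extend the ontology with a user-defined subconcept."""
        self.parents[name] = parent

    def top_level(self, concept):
        """Walk up the taxonomy to the high-level concept."""
        while concept in self.parents:
            concept = self.parents[concept]
        return concept

    def add_resource(self, name, concept):
        """Register a resource under a concept or subconcept."""
        self.kinds[name] = concept

    def relate(self, subject, relation, obj):
        """Link two resources if the ontology allows the relation."""
        triple = (
            self.top_level(self.kinds[subject]),
            relation,
            self.top_level(self.kinds[obj]),
        )
        if triple not in ALLOWED_RELATIONS:
            raise ValueError(f"relation not in ontology: {triple}")
        self.links.append((subject, relation, obj))


# Hypothetical usage: a paper describing a clustering algorithm.
r = Resourceome()
r.add_subconcept("ClusteringAlgorithm", "Algorithm")
r.add_resource("example-paper", "Paper")
r.add_resource("example-clusterer", "ClusteringAlgorithm")
r.relate("example-paper", "describes", "example-clusterer")
print(r.links)
```

    Because relations are checked against the high-level concepts only, a subconcept can be placed wherever its author thinks best without breaking existing links, which reflects the point that the precise location in the taxonomy is not critical.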

    The SeqFEATURE library of 3D functional site models: comparison to existing methods and applications to protein function annotation

    SeqFEATURE, a tool for protein function annotation, models protein functions described by sequence motifs using a structural representation. The tool shows significantly improved performance over other methods when sequence and structural similarity are low.