436 research outputs found

    Clinical Bioinformatics: challenges and opportunities

    Background: The Network Tools and Applications in Biology (NETTAB) workshops are a series of meetings focused on the most promising and innovative ICT tools and on their usefulness in bioinformatics. The NETTAB 2011 workshop, held in Pavia, Italy, in October 2011, aimed to present some of the most relevant methods, tools and infrastructures now available for Clinical Bioinformatics (CBI), the research field that deals with clinical applications of bioinformatics. Methods: This editorial reports the viewpoints and opinions of three world CBI leaders, who were invited to take part in a panel discussion at the NETTAB workshop on the next challenges and future opportunities of this field. These include the development of data warehouses and ICT infrastructures for data sharing, the definition of standards for sharing phenotypic data and the implementation of novel tools for efficient search computing solutions. Results: Some of the most important design features of a CBI-ICT infrastructure are presented, including data warehousing, modularity and flexibility, open-source development, semantic interoperability, and integrated search and retrieval of -omics information. Conclusions: The goals of Clinical Bioinformatics are ambitious. Many factors, including the availability of high-throughput "-omics" technologies and equipment, the widespread availability of clinical data warehouses and the noteworthy increase in data storage and computational power of the most recent ICT systems, justify research and efforts in this domain, which promises to be a crucial leveraging factor for biomedical research.

    Combining Unsupervised and Supervised Learning for Discovering Disease Subclasses

    Diseases are often umbrella terms for many subcategories of disease. Identifying these subcategories is vital if we are to develop personalised treatments that are better focussed on individual patients. In this short paper, we explore combining unsupervised learning, to identify potential subclasses, with supervised learning, to build models that better predict a number of different health outcomes for patients who suffer from systemic sclerosis, a rare chronic connective tissue disorder that nonetheless shares many characteristics with other diseases. We explore a number of different algorithms for constructing models that simultaneously predict health outcomes and identify subcategories.

    Phenotype forecasting with SNPs data through gene-based Bayesian networks

    Background: Bayesian networks are powerful instruments for learning genetic models from association study data. They can derive the correlation between genetic markers and phenotypic traits and, at the same time, find the relationships between the markers themselves. However, learning Bayesian networks is often non-trivial because the number of variables in the model is high relative to the number of instances in the dataset. It is therefore of interest to use an abstraction of the variable space that suitably reduces its dimensionality without losing information. In this paper we present a new strategy to achieve this goal by mapping the SNPs related to the same gene to one meta-variable. To assign states to the meta-variables we employ an approach based on classification trees. Results: We applied our approach to data from a genome-wide scan of 288 individuals affected by arterial hypertension and 271 nonagenarians without a history of hypertension. After pre-processing, we focused on a subset of 24 SNPs. We compared the performance of the proposed approach with the Bayesian network learned with SNPs as variables and with the network learned with haplotypes as meta-variables. The results were obtained by running a hold-out experiment five times. The mean accuracy of the new method was 64.28%, while the mean accuracy of the SNPs network was 58.99% and the mean accuracy of the haplotype network was 54.57%. Conclusion: The new approach presented in this paper is able to derive a gene-based predictive model from SNP data. Such a model is more parsimonious than the one based on single SNPs, while preserving the capability of highlighting predictive SNP configurations. The prediction performance of this approach was consistently superior to that of the SNP-based and haplotype-based ones in all the test sets of the evaluation procedure. The method can therefore be considered an alternative way to analyze data from association studies.

    A hierarchical Naïve Bayes Model for handling sample heterogeneity in classification problems: an application to tissue microarrays

    BACKGROUND: Uncertainty often affects molecular biology experiments and data for different reasons. Heterogeneity of gene or protein expression within the same tumor tissue is an example of biological uncertainty that should be taken into account when molecular markers are used in decision making. Tissue Microarray (TMA) experiments allow for large-scale profiling of tissue biopsies, investigating the protein patterns that characterize specific disease states. TMA studies involve multiple sampling of the same patient, and therefore multiple measurements of the same protein target, to account for possible biological heterogeneity. The aim of this paper is to provide and validate a classification model that takes into consideration the uncertainty associated with measuring replicate samples. RESULTS: We propose an extension of the well-known Naïve Bayes classifier, which accounts for biological heterogeneity in a probabilistic framework, relying on Bayesian hierarchical models. The model, which can be efficiently learned from the training dataset, exploits a closed-form classification equation and therefore adds no computational cost with respect to the standard Naïve Bayes classifier. We validated the approach on several simulated datasets, comparing its performance with that of the Naïve Bayes classifier. Moreover, we demonstrated that explicitly dealing with heterogeneity can improve classification accuracy on a TMA prostate cancer dataset. CONCLUSION: The proposed Hierarchical Naïve Bayes classifier can be conveniently applied to problems where within-sample heterogeneity must be taken into account, such as TMA experiments and biological contexts where several measurements (replicates) are available for the same biological sample. The new approach performs better than the standard Naïve Bayes model, in particular when the within-sample heterogeneity differs between classes.

    Nearest Consensus Clustering Classification to Identify Subclasses and Predict Disease

    Disease subtyping, which helps to develop personalized treatments, remains a challenge in data analysis because of the many different ways to group patients based upon their data. However, identifying subclasses of disease helps to develop better models that are more specific to individuals and should therefore improve prediction and understanding of the underlying characteristics of the disease in question. This paper proposes a new algorithm that integrates consensus clustering methods with classification in order to overcome issues with sample bias. The new algorithm combines K-means with consensus clustering in order to build cohort-specific decision trees that improve classification and aid the understanding of the underlying differences between the discovered groups. The methods are tested on a real-world, freely available breast cancer dataset and on data from a London hospital on systemic sclerosis, a rare, potentially fatal condition. Results show that “nearest consensus clustering classification” significantly improves prediction accuracy compared with similar competing methods.
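
The abstract does not spell out how K-means and consensus clustering are combined, so the following is only a minimal sketch of one common consensus step, not the authors' algorithm: K-means is run several times with different random initialisations, and a co-association matrix records how often each pair of samples lands in the same cluster. The function names `kmeans_labels` and `co_association`, the 1-D toy data and the number of runs are all illustrative assumptions; the cohort-specific decision trees mentioned in the abstract are not covered.

```python
import random


def kmeans_labels(points, k, seed, iters=20):
    """Plain Lloyd's K-means on 1-D points; returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k distinct points as initial centroids
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(p - centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster goes empty
                centroids[c] = sum(members) / len(members)
    return labels


def co_association(points, k, runs=10):
    """Fraction of K-means runs in which each pair of points shares a cluster."""
    n = len(points)
    counts = [[0] * n for _ in range(n)]
    for seed in range(runs):
        labels = kmeans_labels(points, k, seed)
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / runs for c in row] for row in counts]


# Toy data (hypothetical): two well-separated groups of three samples each.
points = [0.0, 0.1, 0.2, 10.0, 10.1, 10.2]
consensus = co_association(points, k=2)
```

Pairs with a co-association close to 1 are grouped together stably across runs; a final grouping can then be read off the matrix, and a separate classifier could be trained per group.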

    Review of rheological behaviour of sewage sludge and its importance in the management of wastewater treatment plants

    The operation of wastewater treatment plants (WWTPs) depends on the proper setup of several physical, chemical and biological parameters. Problems arising in the process are often closely linked to the rheological behaviour of sewage sludge (SeS). Rheological measurements, which have recently attracted growing interest, are therefore an important aspect to consider in the design and operation of WWTPs, especially in the sludge-handling processes. Knowledge of the rheological behaviour of SeS is a crucial step towards better understanding its flow behaviour and thus optimizing process performance while minimizing costs. SeS is a non-Newtonian fluid and, to date, the Bingham and Ostwald models are the most widely applied. This work presents an overview of the scientific literature on the rheological properties of SeS and discusses the importance of this knowledge for the management of WWTPs.
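
For reference, the two models named in the abstract are usually written in the following standard textbook form. The notation is an assumption, since the abstract itself gives no equations: $\tau$ is the shear stress, $\dot{\gamma}$ the shear rate, $\tau_0$ the yield stress, $\mu_p$ the plastic viscosity, $K$ the consistency index and $n$ the flow behaviour index.

```latex
\begin{align}
\tau &= \tau_0 + \mu_p \dot{\gamma} && \text{(Bingham, for } \tau > \tau_0\text{)} \\
\tau &= K \dot{\gamma}^{\,n} && \text{(Ostwald--de Waele power law)}
\end{align}
```

In the Bingham model the material does not flow until the shear stress exceeds $\tau_0$. In the power-law model, $n < 1$ describes shear-thinning behaviour, $n > 1$ shear-thickening behaviour, and $n = 1$ recovers a Newtonian fluid with viscosity $K$.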

    Advancing Critical Care in the ICU: A Human-Centered Biomedical Data Visualization Systems

    The purpose of this research is to provide medical clinicians with a new technology for interpreting large and diverse datasets to expedite critical care decision-making in the ICU. We refer to this technology as the medical information visualization assistant (MIVA). MIVA delivers multivariate biometric (bedside) data via a visualization display by transforming and organizing it into temporal resolutions that can provide contextual knowledge to clinicians. The result is a spatial organization of multiple datasets that allows rapid analysis and interpretation of trends. Findings from the usability study of the MIVA static prototype and the heuristic inspection of the dynamic prototype suggest that using MIVA can yield faster and more accurate results. Furthermore, comments from the majority of the experimental group and the heuristic inspectors indicate that MIVA can facilitate clinical task flow in context-dependent health care settings.

    Clusters of individuals recovering from an exacerbation of chronic obstructive pulmonary disease and response to in-hospital pulmonary rehabilitation

    Introduction and objectives: Because pulmonary rehabilitation (PR) is currently in short supply for individuals recovering from a COPD exacerbation (ECOPD), admission priority criteria are needed. We tested the hypothesis that these individuals might be clustered according to baseline characteristics to identify subpopulations with different responses to PR. Methods: Multicentric retrospective analysis of individuals who had undergone in-hospital PR. Baseline characteristics and outcome measures (six-minute walking test, 6MWT; Medical Research Council scale for dyspnoea, MRC; COPD assessment test, CAT) were used for clustering analysis. Results: Data analysis of 1159 individuals showed that, after the program, the proportion of individuals reaching the minimal clinically important difference (MCID) was 85.0%, 86.3%, and 65.6% for CAT, MRC, and 6MWT respectively. Three clusters were found (C1-severe: 10.9%; C2-intermediate: 74.4%; C3-mild: 14.7% of cases respectively). Cluster C1-severe showed the worst baseline conditions and the largest post-PR improvements in outcome measures; C3-mild showed the least severe baseline conditions but the smallest improvements. The proportion of participants reaching the MCID in all three outcome measures differed significantly among clusters, with C1-severe having the highest proportion of full success (69.0%) compared with C2-intermediate (48.3%) and C3-mild (37.4%). Participants in C2-intermediate and C1-severe had 1.7-fold and 4.6-fold higher odds of reaching the MCID in all three outcomes than those in C3-mild (OR = 1.72, 95% confidence interval [95% CI] = 1.2 - 2.49, p = 0.0035 and OR = 4.57, 95% CI = 2.68 - 7.91, p < 0.0001 respectively). Conclusions: Clustering analysis can identify subpopulations of individuals recovering from ECOPD that are associated with different responses to PR. Our results may help in defining priority criteria based on the probability of success of PR.
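
As a reading aid, the sketch below shows how a crude (unadjusted) odds ratio follows from two success proportions, using the full-success percentages quoted in the abstract. It is an illustration only: the helper names `odds` and `crude_odds_ratio` are hypothetical, and the crude values it yields (roughly 3.7 and 1.6) do not reproduce the reported ORs of 4.57 and 1.72. The abstract does not say how those were estimated, and they may come from a model or from counts that the abstract does not give.

```python
def odds(p):
    """Convert a probability into odds, p / (1 - p)."""
    return p / (1.0 - p)


def crude_odds_ratio(p_group, p_reference):
    """Unadjusted odds ratio of a group versus a reference group."""
    return odds(p_group) / odds(p_reference)


# Proportions reaching the MCID in all three outcomes, as quoted in the abstract.
full_success = {"C1-severe": 0.690, "C2-intermediate": 0.483, "C3-mild": 0.374}
reference = full_success["C3-mild"]

crude_or = {
    name: crude_odds_ratio(p, reference)
    for name, p in full_success.items()
    if name != "C3-mild"
}

for name, value in crude_or.items():
    print(f"{name} vs C3-mild: crude OR = {value:.2f}")
```

Note that an odds ratio compares odds, not probabilities, so a 4.6-fold odds ratio does not mean that success is 4.6 times as likely.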