81 research outputs found

    Predicting disease risks from highly imbalanced data using random forest

    Background: We present a method that uses the Healthcare Cost and Utilization Project (HCUP) dataset to predict the disease risk of individuals based on their medical diagnosis history. The methodology may be incorporated in a variety of applications such as risk management, tailored health communication and decision support systems in healthcare.
    Methods: We employed the National Inpatient Sample (NIS) data, which is publicly available through HCUP, to train random forest (RF) classifiers for disease prediction. Since the HCUP data is highly imbalanced, we employed an ensemble learning approach based on repeated random sub-sampling. This technique divides the training data into multiple sub-samples, while ensuring that each sub-sample is fully balanced. We compared the performance of support vector machine (SVM), bagging, boosting and RF to predict the risk of eight chronic diseases.
    Results: We predicted eight disease categories. Overall, the RF ensemble learning method outperformed SVM, bagging and boosting in terms of the area under the receiver operating characteristic (ROC) curve (AUC). In addition, RF has the advantage of computing the importance of each variable in the classification process.
    Conclusions: By combining repeated random sub-sampling with RF, we were able to overcome the class imbalance problem and achieve promising results. Using the national HCUP data set, we predicted eight disease categories with an average AUC of 88.79%.
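    The balanced sub-sampling step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `balanced_subsamples`, the chunking rule (majority class shuffled and cut into chunks the size of the minority class, leftovers discarded) and the toy records are all assumptions. In the paper's setting, one random forest would be trained per sub-sample and the per-model scores combined; that training step is omitted here.

```python
import random

def balanced_subsamples(majority, minority, seed=0):
    """Return a list of balanced sub-samples.

    Assumption: the majority class is shuffled and cut into chunks whose
    size equals the minority class; every chunk is paired with the full
    minority class. Leftover majority records are discarded.
    """
    rng = random.Random(seed)
    shuffled = list(majority)
    rng.shuffle(shuffled)
    k = len(minority)
    subsamples = []
    for start in range(0, len(shuffled) - k + 1, k):
        chunk = shuffled[start:start + k]
        subsamples.append(chunk + list(minority))
    return subsamples

# Hypothetical toy data: 10 negative records, 3 positive records.
negatives = [("neg", i) for i in range(10)]
positives = [("pos", i) for i in range(3)]

subs = balanced_subsamples(negatives, positives)
for s in subs:
    n_neg = sum(1 for label, _ in s if label == "neg")
    n_pos = sum(1 for label, _ in s if label == "pos")
    print(len(s), n_neg, n_pos)
```

    Each sub-sample holds equal numbers of both classes, so a classifier trained on it does not see the original imbalance; the ensemble still covers most of the majority class across its members.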

    A flexible mathematical model platform for studying branching networks : experimentally validated using the model actinomycete, Streptomyces coelicolor

    Branching networks are ubiquitous in nature and their growth often responds to environmental cues dynamically. Using the antibiotic-producing soil bacterium Streptomyces as a model, we have developed a flexible mathematical model platform for the study of branched biological networks. Streptomyces form large aggregates in liquid culture that can impair industrial antibiotic fermentations. Understanding the features of these aggregates could aid improvement of such processes. The model requires relatively few experimental values for parameterisation, yet delivers realistic simulations of Streptomyces pellets and is able to predict features such as the density of hyphae, the number of growing tips and the location of antibiotic production within a pellet in response to pellet size and external nutrient supply. The model is scalable and will find utility in a range of branched biological networks such as angiogenesis, plant root growth and fungal hyphal networks.

    Bistability versus Bimodal Distributions in Gene Regulatory Processes from Population Balance

    In recent times, stochastic treatments of gene regulatory processes have appeared in the literature in which a cell exposed to a signaling molecule in its environment triggers the synthesis of a specific protein through a network of intracellular reactions. The stochastic nature of this process leads to a distribution of protein levels in a population of cells as determined by a Fokker-Planck equation. Often instability occurs as a consequence of two (stable) steady state protein levels, one at the low end representing the “off” state, and the other at the high end representing the “on” state for a given concentration of the signaling molecule within a suitable range. A consequence of such bistability has been the appearance of bimodal distributions indicating two different populations, one in the “off” state and the other in the “on” state. The bimodal distribution can come about from stochastic analysis of a single cell. However, the concerted action of the population altering the extracellular concentration in the environment of individual cells and hence their behavior can only be accomplished by an appropriate population balance model which accounts for the reciprocal effects of interaction between the population and its environment. In this study, we show how to formulate a population balance model in which stochastic gene expression in individual cells is incorporated. Interestingly, the simulation of the model shows that bistability is neither sufficient nor necessary for bimodal distributions in a population. The original notion of linking bistability with bimodal distribution from single cell stochastic model is therefore only a special consequence of a population balance model

    Ensemble Analysis of Angiogenic Growth in Three-Dimensional Microfluidic Cell Cultures

    We demonstrate ensemble three-dimensional cell cultures and quantitative analysis of angiogenic growth from uniform endothelial monolayers. Our approach combines two key elements: a micro-fluidic assay that enables parallelized angiogenic growth instances subject to common extracellular conditions, and an automated image acquisition and processing scheme enabling high-throughput, unbiased quantification of angiogenic growth. Because of the increased throughput of the assay in comparison to existing three-dimensional morphogenic assays, statistical properties of angiogenic growth can be reliably estimated. We used the assay to evaluate the combined effects of vascular endothelial growth factor (VEGF) and the signaling lipid sphingosine-1-phosphate (S1P). Our results show the importance of S1P in amplifying the angiogenic response in the presence of VEGF gradients. Furthermore, the application of S1P with VEGF gradients resulted in angiogenic sprouts with higher aspect ratio than S1P with background levels of VEGF, despite reduced total migratory activity. This implies a synergistic effect between the growth factors in promoting angiogenic activity. Finally, the variance in the computed angiogenic metrics (as measured by ensemble standard deviation) was found to increase linearly with the ensemble mean. This finding is consistent with stochastic agent-based mathematical models of angiogenesis that represent angiogenic growth as a series of independent stochastic cell-level decisions.

    Modern temporal network theory: A colloquium

    The power of any kind of network approach lies in the ability to simplify a complex system so that one can better understand its function as a whole. Sometimes it is beneficial, however, to include more information than in a simple graph of only nodes and links. Adding information about times of interactions can make predictions and mechanistic understanding more accurate. The drawback, however, is that few methods are available, partly because temporal networks are a relatively young field and partly because such methods are more difficult to develop than those for static networks. In this colloquium, we review the methods to analyze and model temporal networks and processes taking place on them, focusing mainly on the last three years. This includes the spreading of infectious disease, opinions and rumors in social networks; information packets in computer networks; various types of signaling in biology; and more. We also discuss future directions.
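    To illustrate why the times of interactions matter, here is a small sketch of spreading over time-stamped contacts. It is a generic illustration and not a method taken from the colloquium; the function name `temporal_reach`, the contact format `(u, v, t)` and the rule that a node can only pass something on at a strictly later time than it received it are all assumptions.

```python
def temporal_reach(contacts, source, t_start=0):
    """Return {node: time first reached} from `source`.

    `contacts` is an iterable of (u, v, t) tuples: undirected, instantaneous
    contacts at time t. Spreading follows time-respecting paths only: a node
    reached at time s can pass the infection on through a contact at t > s.
    """
    reached_at = {source: t_start}
    for u, v, t in sorted(contacts, key=lambda c: c[2]):
        for a, b in ((u, v), (v, u)):
            if a in reached_at and reached_at[a] < t and b not in reached_at:
                reached_at[b] = t
    return reached_at

# Hypothetical contact list. As a static graph, A-B-C-D is a connected chain.
contacts = [("A", "B", 1), ("B", "C", 2), ("C", "D", 1)]

from_a = temporal_reach(contacts, "A")
from_d = temporal_reach(contacts, "D")
print(from_a)
print(from_d)
```

    In the static graph every node can reach every other, but with timestamps A never reaches D (the C-D contact happens before C is reached) and D never reaches A, which is the kind of effect a graph of only nodes and links cannot capture.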

    3D Multi-Cell Simulation of Tumor Growth and Angiogenesis

    We present a 3D multi-cell simulation of a generic simplification of vascular tumor growth which can be easily extended and adapted to describe more specific vascular tumor types and host tissues. Initially, tumor cells proliferate as they take up the oxygen which the pre-existing vasculature supplies. The tumor grows exponentially. When the oxygen level drops below a threshold, the tumor cells become hypoxic and start secreting pro-angiogenic factors. At this stage, the tumor reaches a maximum diameter characteristic of an avascular tumor spheroid. The endothelial cells in the pre-existing vasculature respond to the pro-angiogenic factors both by chemotaxing towards higher concentrations of pro-angiogenic factors and by forming new blood vessels via angiogenesis. The tumor-induced vasculature increases the growth rate of the resulting vascularized solid tumor compared to an avascular tumor, allowing the tumor to grow beyond the spheroid in these linear-growth phases. First, in the linear-spherical phase of growth, the tumor remains spherical while its volume increases. Second, in the linear-cylindrical phase of growth the tumor elongates into a cylinder. Finally, in the linear-sheet phase of growth, tumor growth accelerates as the tumor changes from cylindrical to paddle-shaped. Substantial periods during which the tumor grows slowly or not at all separate the exponential from the linear-spherical and the linear-spherical from the linear-cylindrical growth phases. In contrast to other simulations in which avascular tumors remain spherical, our simulated avascular tumors form cylinders following the blood vessels, leading to a different distribution of hypoxic cells within the tumor. Our simulations cover time periods which are long enough to produce a range of biologically reasonable complex morphologies, allowing us to study how tumor-induced angiogenesis affects the growth rate, size and morphology of simulated tumors

    Brachypodium distachyon as a model for defining the allergen potential of non-prolamin proteins

    Epitope databases and the protein sequences of published plant genomes are suitable to identify some of the proteins causing food allergies and sensitivities. Brachypodium distachyon, a diploid wild grass with a sequenced genome and low prolamin content, is the closest relative of the allergenic cereals, such as wheat or barley. Using the Brachypodium genome sequence, a workflow has been developed to identify potentially harmful proteins which may cause either celiac disease or wheat allergy-related symptoms. Seed tissue-specific expression of the potential allergens has been determined, and intact epitopes following an in silico digestion with several endopeptidases have been identified. Molecular function of allergen proteins has been evaluated using Gene Ontology terms. Biologically overrepresented proteins and potentially allergenic protein families have been identified. Electronic supplementary material: the online version of this article (doi:10.1007/s10142-012-0294-z) contains supplementary material, which is available to authorized users.

    Management of Patients With Crohn's Disease and Ulcerative Colitis During the Coronavirus Disease-2019 Pandemic: Results of an International Meeting

    The International Organization for the Study of Inflammatory Bowel Diseases (IOIBD) is the only global organization devoted to the study and management of the inflammatory bowel diseases (IBDs), namely, Crohn's disease and ulcerative colitis. Membership is composed of physician-scientists who have established expertise in these diseases. The organization hosts an annual meeting and a number of working groups addressing issues of the epidemiology of IBD, diet and nutrition, and the development and use of treatments for IBD. There are currently 89 members of IOIBD representing 26 different countries. The organization has taken particular interest in the coronavirus disease-2019 (COVID-19) pandemic and how it may affect the IBD patient population. This document summarizes the results of 2 recent virtual meetings of the group and subsequent expert guidance for patients and providers.