A BAYESIAN APPROACH TO LEARNING DECISION TREES FOR PATIENT-SPECIFIC MODELS
A principal goal of precision medicine is to identify genomic factors that are predictive of outcomes in complex diseases, to provide better insight into their molecular mechanisms. Based on our current understanding, there are many genomic factors that are likely to be pathogenic in small subpopulations while being rare in the population as a whole. This research introduces a new machine learning method for discovering single nucleotide variants (SNVs), both common and rare, that in a given person are predictive of that person developing a disease or disease outcome.
The new method described in this research constructs decision tree models, uses a Bayesian score to evaluate the models, and employs a person-specific search strategy to identify SNVs that are predictive in a subpopulation whose members are similar to the person of interest. This method, called the Personalized Decision Tree Algorithm (PDTA), works by constructing a decision tree model from the data and then identifying a path in the tree that has excellent prediction for the person of interest, or constructing a new path if none of the paths in the tree have excellent prediction.
The PDTA was refined iteratively on synthetic data and was experimentally evaluated on five datasets. One of the datasets was synthetic, one was semi-synthetic, and three were biological datasets collected from patients with chronic pancreatitis: a small genomic dataset, a whole exome dataset, and a whole exome dataset focused on patients with diabetes in chronic pancreatitis. The performance of the method was evaluated using the area under the Receiver Operating Characteristic curve and the F1 score, as well as the ability to retrieve known and unknown rare SNVs. The PDTA was found to be effective to varying degrees in the datasets that were evaluated, creating parsimonious genetic representations for patient-specific groups, with the potential to discover novel variants.
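To make the idea of scoring a decision tree with a Bayesian score concrete, here is a minimal sketch. The abstract does not say which Bayesian score the PDTA uses, so the Beta-Binomial marginal likelihood below is an assumption chosen for illustration; the function names, the variant names `snv_a` and `snv_b`, and the toy records are all hypothetical.

```python
from math import lgamma

def leaf_log_marginal(n_pos, n_neg, alpha=1.0, beta=1.0):
    """Log marginal likelihood of the binary outcomes in one leaf,
    assuming a Beta(alpha, beta) prior on the leaf's outcome probability."""
    n = n_pos + n_neg
    return (lgamma(alpha + beta) - lgamma(alpha + beta + n)
            + lgamma(alpha + n_pos) - lgamma(alpha)
            + lgamma(beta + n_neg) - lgamma(beta))

def tree_log_score(leaves, alpha=1.0, beta=1.0):
    """Score a tree as the sum of its leaf scores (leaves assumed independent)."""
    return sum(leaf_log_marginal(p, n, alpha, beta) for p, n in leaves)

def split_counts(records, variant):
    """Split records on a binary variant; return (n_pos, n_neg) for each branch."""
    counts = {0: [0, 0], 1: [0, 0]}
    for genotype, outcome in records:
        branch = counts[genotype[variant]]
        branch[0 if outcome == 1 else 1] += 1
    return [tuple(counts[0]), tuple(counts[1])]

# Hypothetical toy data: snv_a tracks the outcome exactly, snv_b does not.
records = [
    ({"snv_a": 1, "snv_b": 0}, 1), ({"snv_a": 1, "snv_b": 1}, 1),
    ({"snv_a": 1, "snv_b": 0}, 1), ({"snv_a": 1, "snv_b": 1}, 1),
    ({"snv_a": 0, "snv_b": 0}, 0), ({"snv_a": 0, "snv_b": 1}, 0),
    ({"snv_a": 0, "snv_b": 0}, 0), ({"snv_a": 0, "snv_b": 1}, 0),
]

n_pos = sum(outcome for _, outcome in records)
score_root = tree_log_score([(n_pos, len(records) - n_pos)])
score_a = tree_log_score(split_counts(records, "snv_a"))
score_b = tree_log_score(split_counts(records, "snv_b"))
print(score_root, score_a, score_b)
```

Under this score, splitting on the informative variant raises the score above the unsplit root, while splitting on the uninformative variant lowers it, which is the behavior a score-guided search relies on to keep trees parsimonious.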
Microiontophoresis as a technique to investigate Spike Timing Dependent Plasticity
Spike timing dependent plasticity (STDP) is a form of synaptic plasticity that depends on the relative time of activation of a presynaptic neuron and its postsynaptic neuron. STDP in the synapses made by Schaffer collateral afferents onto hippocampal CA1 pyramidal neurons (CA3-CA1 synapses) is NMDA receptor dependent. The objective of the current study was to develop and test a technique of glutamate iontophoresis that could replace the role of presynaptic neurotransmitter release at the CA3-CA1 synapse, so that the postsynaptic mechanisms involved in the induction of STDP could be isolated for study. Therefore, this document describes: (1) fabrication of electrodes that could be used for millisecond-level microiontophoresis in acute slice preparations of the juvenile rat hippocampus; (2) characterization of the properties and limitations of microiontophoresis in slice tissue, specifically for activation of postsynaptic ionotropic glutamate receptors at the CA3-CA1 synapse; (3) induction of STDP by pairing microiontophoresis with postsynaptic depolarization; and (4) characterization of the properties and limitations of microiontophoretically induced STDP. It was determined that microiontophoresis is a viable technique to study the postsynaptic mechanisms of STDP at the CA3-CA1 synapse. My results also show that microiontophoretically induced STDP exhibits many of the same general properties as STDP induced either synaptically or by exogenously applied agonists. Microiontophoretically induced STDP also exhibits other features that will need to be considered during the design and interpretation of further experiments.
Inferring causal molecular networks: empirical assessment through a community-based effort
Inferring molecular networks is a central challenge in computational biology. However, it has remained unclear whether causal, rather than merely correlational, relationships can be effectively inferred in complex biological settings. Here we describe the HPN-DREAM network inference challenge that focused on learning causal influences in signaling networks. We used phosphoprotein data from cancer cell lines as well as in silico data from a nonlinear dynamical model. Using the phosphoprotein data, we scored more than 2,000 networks submitted by challenge participants. The networks spanned 32 biological contexts and were scored in terms of causal validity with respect to unseen interventional data. A number of approaches were effective, and incorporating known biology was generally advantageous. Additional sub-challenges considered time-course prediction and visualization. Our results constitute the most comprehensive assessment of causal network inference in a mammalian setting carried out to date and suggest that learning causal relationships may be feasible in complex settings such as disease states. Furthermore, our scoring approach provides a practical way to empirically assess the causal validity of inferred molecular networks.
Creating a pipeline of talent for informatics: STEM initiative for high school students in computer science, biology, and biomedical informatics
This editorial provides insights into how informatics can attract highly trained students by involving them in science, technology, engineering, and math (STEM) training at the high school level and continuing to provide mentorship and research opportunities through the formative years of their education. Our central premise is that the trajectory necessary to be expert in the emergent fields in front of them requires acceleration at an early time point. Both pathology (and biomedical) informatics are new disciplines which would benefit from involvement by students at an early stage of their education. In 2009, Michael T. Lotze, MD, Kirsten Livesey (then a medical student, now a medical resident at University of Pittsburgh Medical Center (UPMC)), Richard Hersheberger, PhD (currently Dean at Roswell Park), and Megan Seippel, MS (the administrator) launched the University of Pittsburgh Cancer Institute (UPCI) Summer Academy to bring high school students together for an 8-week summer academy focused on Cancer Biology. Initially, pathology and biomedical informatics were involved only in the classroom component of the UPCI Summer Academy. In 2011, due to popular interest, an informatics track called Computer Science, Biology and Biomedical Informatics (CoSBBI) was launched. CoSBBI currently acts as a feeder program for the undergraduate degree program in bioinformatics at the University of Pittsburgh, which is a joint degree offered by the Departments of Biology and Computer Science. We believe training in bioinformatics is the best foundation for students interested in future careers in pathology informatics or biomedical informatics. We describe our approach to the recruitment, training and research mentoring of high school students to create a pipeline of exceptionally well-trained applicants for both the disciplines of pathology informatics and biomedical informatics.
We emphasize here how mentoring of high school students in pathology informatics and biomedical informatics will be critical to assuring their success as leaders in the era of big data and personalized medicine.
Parameter Discovery For Stochastic Computational Models In Systems Biology Using Bayesian Model Checking
Parameterized probabilistic complex computational (P2C2) models are being increasingly used in computational systems biology for analyzing biological systems. A key challenge is to build mechanistic P2C2 models by combining prior knowledge and empirical data, given that certain system properties are unknown. These unknown components are incorporated into a model as parameters, and determining their values has traditionally been a process of trial and error. We present a new algorithmic procedure for discovering parameters in agent-based models of biological systems against behavioral specifications mined from large datasets. Our approach uses Bayesian model checking, sequential hypothesis testing, and stochastic optimization to synthesize parameters of P2C2 models. We demonstrate our algorithm by discovering the amount and schedule of doses of bacterial lipopolysaccharide in a clinical agent-based model of the dynamics of acute inflammation that guarantee a set of desired clinical outcomes with high probability.
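The combination of Bayesian model checking and sequential hypothesis testing can be illustrated with a small sketch. The abstract does not give the details of the test, so the version below is an assumption: it treats each stochastic simulation as a Bernoulli trial (specification satisfied or not), places a uniform Beta(1, 1) prior on the satisfaction probability, and stops sampling once the Bayes factor for "probability is at least `theta`" crosses a threshold. The names `bayesian_sequential_test`, `simulate`, `theta`, and `threshold` are hypothetical.

```python
import random
from math import exp, lgamma, log

def posterior_prob_below(theta, successes, trials):
    """P(p < theta | data) under a uniform Beta(1, 1) prior.

    For integer counts this regularized incomplete beta value equals a
    binomial tail sum, evaluated here in log space for stability."""
    m = trials + 1
    log_theta, log_rest = log(theta), log(1.0 - theta)
    total = 0.0
    for j in range(successes + 1, m + 1):
        log_term = (lgamma(m + 1) - lgamma(j + 1) - lgamma(m - j + 1)
                    + j * log_theta + (m - j) * log_rest)
        total += exp(log_term)
    return min(1.0, total)

def bayesian_sequential_test(simulate, theta, threshold=100.0, max_runs=500):
    """Test H0: p >= theta against H1: p < theta, one simulation at a time.

    `simulate` is a hypothetical callable that runs the stochastic model once
    and returns True if the behavioral specification held on that run."""
    prior_odds_h1_over_h0 = theta / (1.0 - theta)  # uniform prior on p
    successes = 0
    for runs in range(1, max_runs + 1):
        successes += 1 if simulate() else 0
        p_h1 = posterior_prob_below(theta, successes, runs)
        p_h0 = 1.0 - p_h1
        if p_h1 <= 0.0:
            bayes_factor = float("inf")
        else:
            bayes_factor = (p_h0 / p_h1) * prior_odds_h1_over_h0
        if bayes_factor > threshold:
            return "accept", runs   # strong evidence that p >= theta
        if bayes_factor < 1.0 / threshold:
            return "reject", runs   # strong evidence that p < theta
    return "undecided", max_runs

# Stand-in for a stochastic model run that satisfies the specification
# about 95% of the time (an invented figure for illustration).
rng = random.Random(1)
decision, runs = bayesian_sequential_test(lambda: rng.random() < 0.95, theta=0.8)
print(decision, runs)
```

In a parameter-discovery loop, a test like this would serve as the inner check: a stochastic optimizer proposes parameter values, and the sequential test decides, using as few simulations as the evidence allows, whether the parameterized model meets the specification with the required probability.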
Automated Parameter Estimation For Biological Models Using Bayesian Statistical Model Checking
Background: Probabilistic models have gained widespread acceptance in the systems biology community as a useful way to represent complex biological systems. Such models are developed using existing knowledge of the structure and dynamics of the system, experimental observations, and inferences drawn from statistical analysis of empirical data. A key bottleneck in building such models is that some system variables cannot be measured experimentally. These variables are incorporated into the model as numerical parameters. Determining values of these parameters that justify existing experiments and provide reliable predictions when model simulations are performed is a key research problem. Domain experts usually estimate the values of these parameters by fitting the model to experimental data. Model fitting is usually expressed as an optimization problem that requires minimizing a cost-function which measures some notion of distance between the model and the data. This optimization problem is often solved by combining local and global search methods that tend to perform well for the specific application domain. When some prior information about parameters is available, methods such as Bayesian inference are commonly used for parameter learning. Choosing the appropriate parameter search technique requires detailed domain knowledge and insight into the underlying system. Results: Using an agent-based model of the dynamics of acute inflammation, we demonstrate a novel parameter estimation algorithm by discovering the amount and schedule of doses of bacterial lipopolysaccharide that guarantee a set of observed clinical outcomes with high probability. We synthesized values of twenty-eight unknown parameters such that the parameterized model instantiated with these parameter values satisfies four specifications describing the dynamic behavior of the model. 
Conclusions: We have developed a new algorithmic technique for discovering parameters in complex stochastic models of biological systems given behavioral specifications written in a formal mathematical logic. Our algorithm uses Bayesian model checking, sequential hypothesis testing, and stochastic optimization to automatically synthesize parameters of probabilistic biological models.
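The conventional model-fitting approach described in the Background, minimizing a cost function by combining global and local search, can be sketched as follows. This is not the authors' algorithm: the two-parameter exponential-decay model, the squared-error cost, the parameter bounds, and all function names are assumptions made for illustration.

```python
import math
import random

def simulate_model(params, times):
    """Hypothetical toy model: exponential decay with amplitude and rate."""
    amplitude, decay = params
    return [amplitude * math.exp(-decay * t) for t in times]

def cost(params, times, observed):
    """Sum of squared differences between model output and data."""
    predicted = simulate_model(params, times)
    return sum((p - o) ** 2 for p, o in zip(predicted, observed))

def global_search(bounds, times, observed, rng, n_samples=200):
    """Global stage: sample the parameter box uniformly, keep the best point."""
    candidates = [tuple(rng.uniform(lo, hi) for lo, hi in bounds)
                  for _ in range(n_samples)]
    return min(candidates, key=lambda c: cost(c, times, observed))

def local_search(start, bounds, times, observed, rng, n_steps=500, step=0.1):
    """Local stage: accept Gaussian perturbations only when they lower the cost."""
    best, best_cost = start, cost(start, times, observed)
    for _ in range(n_steps):
        proposal = tuple(min(hi, max(lo, x + rng.gauss(0.0, step)))
                         for x, (lo, hi) in zip(best, bounds))
        proposal_cost = cost(proposal, times, observed)
        if proposal_cost < best_cost:
            best, best_cost = proposal, proposal_cost
    return best, best_cost

rng = random.Random(0)
times = [0.0, 1.0, 2.0, 3.0, 4.0]
observed = simulate_model((2.0, 0.5), times)   # synthetic "data"
bounds = [(0.0, 5.0), (0.0, 2.0)]

coarse = global_search(bounds, times, observed, rng)
coarse_cost = cost(coarse, times, observed)
fitted, fitted_cost = local_search(coarse, bounds, times, observed, rng)
print(coarse, coarse_cost)
print(fitted, fitted_cost)
```

The global stage reduces the risk of starting in a poor region of parameter space, and the local stage refines the best coarse candidate. The approach in the abstract differs in that the objective is satisfaction of formal behavioral specifications with high probability rather than a distance to data.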
How can we improve Science, Technology, Engineering, and Math education to encourage careers in Biomedical and Pathology Informatics?
The Computer Science, Biology, and Biomedical Informatics (CoSBBI) program was initiated in 2011 to expose the critical role of informatics in biomedicine to talented high school students.[1] By involving them in Science, Technology, Engineering, and Math (STEM) training at the high school level and providing mentorship and research opportunities throughout the formative years of their education, CoSBBI creates a research infrastructure designed to develop young informaticians. Our central premise is that the trajectory necessary to be an expert in the emerging fields of biomedical informatics and pathology informatics requires accelerated learning at an early age. In our 4th year of CoSBBI as a part of the University of Pittsburgh Cancer Institute (UPCI) Academy (http://www.upci.upmc.edu/summeracademy/), and our 2nd year of CoSBBI as an independent informatics-based academy, we enhanced our classroom curriculum, added hands-on computer science instruction, and expanded research projects to include clinical informatics. We also conducted a qualitative evaluation of the program to identify areas that need improvement in order to achieve our goal of creating a pipeline of exceptionally well-trained applicants for both the disciplines of pathology informatics and biomedical informatics in the era of big data and personalized medicine.