11 research outputs found

    Is the most likely model likely to be the correct model?

    Get PDF
    Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2009. Cataloged from PDF version of thesis. Includes bibliographical references (p. 89-93).

    In this work, I test the hypothesis that the 2-dimensional dependencies of a deterministic model can be correctly recovered via hypothesis enumeration and Bayesian selection for a linear sequence, and what degree of 'ignorance' or 'uncertainty' Bayesian selection can tolerate concerning the properties of the model and the data. The experiment takes data created by a number of rules of size 3, compares the implied dependency map to the (correct) dependencies of the various generating rules, and then extends this to a composition of 2 rules of total size 5. I found that 'causal' belief networks do not map directly to the dependencies of actual causal structures. For deterministic rules satisfying the condition of multiple involvement (two tails), the correct model is not likely to be retrieved without augmenting model selection with a prior high enough to suggest that the desired dependency model is already known; simply restricting the class of models to trees, and placing other restrictions (such as ordering), is not sufficient. Second, the identified-model to correct-model map is not 1-to-1: in the rule cases where the correct model is identified, the identified model could just as easily have been produced by a different rule. Third, I discovered that uncertainty in identifying observations directly results in the loss of existing information and makes model selection the product of pure chance (such as the last observation); how to read and identify observations had to be agreed upon a priori by both the rule and the learner for model identification to be at all consistent. Finally, I discovered that it is not the rule observations that discriminate between models, but rather the noise, or uncaptured observations, that govern the identified model. In analysis, I found that in the enumeration of hypotheses (as dependency graphs) the differentiating space is very small; with representations of conditional independence, the equivalent factorizations of the graphs make the differentiating space smaller still. Because Bayesian model identification relies on convergence to the differentiating space, if that space shrinks (as the model size is allowed to grow) relative to the observation sequence, then maximizing the likelihood of a particular hypothesis may fail to converge on the correct one. Overall, I found that if a learning mechanism does not know either how to read observations or which dependencies it is looking for a priori, then it is not likely to identify them probabilistically. I also confirmed existing results: model selection always prefers increasingly connected models over independent models, and several conditional-independence graphs have equivalent factorizations. Shannon's Asymptotic Equipartition Property was likewise confirmed to apply both for novel observations and for an increasing model/parameter space size. These results are applicable to a number of domains: natural language processing and language induction by statistical means, bioinformatics and the statistical identification and merging of ontologies, and induction of real-world causal dependencies.

    by Beracah Yankama. S.M.
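
    The experiment the abstract describes can be pictured with a small, purely illustrative sketch (not the thesis code): data are generated by a deterministic size-3 rule with two tails (x3 = x1 XOR x2), a handful of candidate dependency graphs are enumerated, and each is scored by a Dirichlet-multinomial (K2-style) marginal likelihood. The model names, the alpha hyperparameter, and the sample size are assumptions for the example; which graph wins, and how the picture changes with noise or a growing model space, is exactly what the thesis investigates.

```python
import math
import random

def log_marginal_likelihood(data, parents_of, alpha=1.0):
    """K2-style log marginal likelihood of a discrete (binary) Bayesian network."""
    total = 0.0
    for child, parents in enumerate(parents_of):
        counts = {}  # parent configuration -> [count of child=0, count of child=1]
        for row in data:
            key = tuple(row[p] for p in parents)
            counts.setdefault(key, [0, 0])[row[child]] += 1
        for n0, n1 in counts.values():
            # Dirichlet-multinomial marginal for one parent configuration.
            total += (math.lgamma(2 * alpha) - math.lgamma(2 * alpha + n0 + n1)
                      + math.lgamma(alpha + n0) + math.lgamma(alpha + n1)
                      - 2 * math.lgamma(alpha))
    return total

# Deterministic generating rule with "multiple involvement" (two tails): x3 = x1 XOR x2.
random.seed(0)
data = []
for _ in range(200):
    x1, x2 = random.randint(0, 1), random.randint(0, 1)
    data.append((x1, x2, x1 ^ x2))

# Candidate hypotheses as parent sets: entry i lists the parents of variable i.
models = {
    "independent":          [(), (), ()],
    "chain x1->x2->x3":     [(), (0,), (1,)],
    "collider x1,x2 -> x3": [(), (), (0, 1)],  # the structure that mirrors the rule
    "reversed chain":       [(1,), (2,), ()],
}
for name, parents in models.items():
    print(f"{name:22s} log P(D|M) = {log_marginal_likelihood(data, parents):9.2f}")
```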

    OREMP: Ontology Reasoning Engine for Molecular Pathways

    Get PDF
    Information about molecular processes is shared continuously in the form of runnable pathway collections, and biomedical ontologies provide semantic context to the majority of those pathways. Recent advances in both fields pave the way for scalable information integration based on aggregate knowledge repositories, but the lack of an overall standard format impedes this progress. Here we propose a strategy that integrates these resources by means of extended ontologies built on top of a common meta-format. Information sharing, integration and discovery are the primary features provided by the system; additionally, two current field applications of the system are reported.
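
    As a purely illustrative sketch (not the OREMP implementation), the integration idea can be reduced to reconciling pathway entities from different sources through shared ontology annotations rather than through their local names; the record layout and the ChEBI/UniProt identifiers below are assumptions for the example.

```python
from collections import defaultdict

# Two toy pathway fragments from different sources; each species carries an
# ontology annotation that acts as the shared semantic key.
pathway_a = [
    {"local_id": "s1", "name": "nitric oxide", "annotation": "CHEBI:16480"},
    {"local_id": "s2", "name": "eNOS",         "annotation": "UNIPROT:P29474"},
]
pathway_b = [
    {"local_id": "x9", "name": "NO",           "annotation": "CHEBI:16480"},
    {"local_id": "x3", "name": "calmodulin",   "annotation": "UNIPROT:P0DP23"},
]

def merge_by_annotation(*pathways):
    """Group species records by ontology annotation across all source pathways."""
    merged = defaultdict(list)
    for source, pathway in enumerate(pathways):
        for species in pathway:
            merged[species["annotation"]].append((source, species["local_id"], species["name"]))
    return merged

for term, members in merge_by_annotation(pathway_a, pathway_b).items():
    shared = "shared" if len({src for src, *_ in members}) > 1 else "unique"
    print(f"{term:16s} [{shared}] -> {members}")
```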

    In Silico Modeling of Shear-Stress-Induced Nitric Oxide Production in Endothelial Cells through Systems Biology

    Get PDF
    Nitric oxide (NO) produced by vascular endothelial cells is a potent vasodilator and an anti-inflammatory mediator. Regulating production of endothelial-derived NO is a complex undertaking, involving multiple signaling and genetic pathways that are activated by diverse humoral and biomechanical stimuli. To gain a thorough understanding of the rich diversity of responses observed experimentally, it is necessary to account for an ensemble of these pathways acting simultaneously. In this article, we have assembled four quantitative molecular pathways previously proposed for shear-stress-induced NO production. In these pathways, endothelial NO synthase is activated (1) via calcium release, (2) via phosphorylation reactions, and (3) via enhanced protein expression. To these activation pathways, we have added a fourth: a pathway describing actual NO production from endothelial NO synthase and its various protein partners. These pathways were combined and simulated using CytoSolve, a computational environment for combining independent pathway calculations. The integrated model is able to describe the experimentally observed change in NO production with time after the application of fluid shear stress. This model can also be used to predict the specific effects on the system after interventional pharmacological or genetic changes. Importantly, this model reflects the up-to-date understanding of the NO system, providing a platform upon which information can be aggregated in an additive way. National Institutes of Health (U.S.) (Grant R01HL090856); Singapore-MIT Alliance Computational and Systems Biology Program.
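
    A toy kinetic sketch can illustrate the qualitative behaviour the article reports; the rate law and parameter values below are assumed for illustration and are not taken from the CytoSolve model: after shear stress is applied, active eNOS rises and NO accumulates toward a new steady state.

```python
from scipy.integrate import solve_ivp
import matplotlib.pyplot as plt

K_ACT, K_DEACT = 0.05, 0.02   # eNOS activation / deactivation rates (1/s), assumed
K_PROD, K_DEG = 0.10, 0.03    # NO production and degradation rates, assumed

def shear(t):
    """Step in wall shear stress applied at t = 100 s (illustrative magnitude)."""
    return 10.0 if t >= 100.0 else 0.0

def rhs(t, y):
    enos_active, no = y
    d_enos = K_ACT * shear(t) * (1.0 - enos_active) - K_DEACT * enos_active
    d_no = K_PROD * enos_active - K_DEG * no
    return [d_enos, d_no]

# Integrate from rest; small max_step so the step change in shear is resolved.
sol = solve_ivp(rhs, (0.0, 600.0), [0.0, 0.0], max_step=1.0)
plt.plot(sol.t, sol.y[1], label="NO (arbitrary units)")
plt.plot(sol.t, sol.y[0], label="active eNOS (fraction)")
plt.xlabel("time (s)")
plt.legend()
plt.show()
```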

    Colorless green ideas do sleep furiously: gradient acceptability and the nature of the grammar

    No full text
    In their recent paper, Lau, Clark, and Lappin explore the idea that the probability of the occurrence of word strings can form the basis of an adequate theory of grammar (Lau, Jey H., Alexander Clark & Shalom Lappin. 2017. Grammaticality, acceptability, and probability: A probabilistic view of linguistic knowledge. Cognitive Science 41(5): 1201-1241). To make their case, they present the results of correlating the output of several probabilistic models trained solely on naturally occurring sentences with the gradient acceptability judgments that humans report for ungrammatical sentences derived from round-trip machine translation errors. In this paper, we first explore the logic of the Lau et al. argument, both in the choice of evaluation metric (gradient acceptability) and in the choice of test data set (machine translation errors on random sentences from a corpus). We then present our own series of studies intended to allow for a better comparison between LCL's models and existing grammatical theories. We evaluate two of LCL's probabilistic models (trigrams and a recurrent neural network) against three data sets (taken from journal articles, a textbook, and Chomsky's famous colorless-green-ideas sentence), using three evaluation metrics (LCL's gradience metric, a categorical version of that metric, and the experimental-logic metric used in the syntax literature). Our results suggest there are very real, measurable cost-benefit tradeoffs inherent in LCL's models across the three evaluation metrics. The gain in explanation of gradience (between 13% and 31% of gradience) is offset by losses in the other two metrics: a 43%-49% loss in coverage based on a categorical metric of explaining acceptability, and a loss of 12%-35% in explaining experimentally defined phenomena. This suggests that anyone wishing to pursue LCL's models as competitors with existing syntactic theories must either be satisfied with this tradeoff or modify the models to capture the phenomena that are not currently captured. National Science Foundation [BCS-1347115].
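
    To make the class of models under discussion concrete, here is a minimal sketch of a Laplace-smoothed trigram language model scored with a SLOR-style normalization (log probability corrected for word frequency and sentence length), the kind of gradient acceptability score that LCL correlate with human judgments. The tiny corpus and the smoothing constant are placeholders; LCL train on large corpora and compare several normalization functions.

```python
import math
from collections import Counter

# Placeholder training corpus; a real run would use a large naturalistic corpus.
corpus = [
    "ideas sleep quietly".split(),
    "green ideas spread quickly".split(),
    "colorless liquids flow furiously".split(),
]

unigrams, bigrams, trigrams = Counter(), Counter(), Counter()
vocab = set()
for sent in corpus:
    padded = ["<s>", "<s>"] + sent + ["</s>"]
    vocab.update(padded)
    for w in sent + ["</s>"]:
        unigrams[w] += 1
    for i in range(len(padded) - 2):
        bigrams[(padded[i], padded[i + 1])] += 1          # context counts
        trigrams[(padded[i], padded[i + 1], padded[i + 2])] += 1

V = len(vocab)
N = sum(unigrams.values())

def logp_trigram(w1, w2, w3, k=1.0):
    # Add-k smoothed conditional log-probability  log P(w3 | w1, w2).
    return math.log((trigrams[(w1, w2, w3)] + k) / (bigrams[(w1, w2)] + k * V))

def slor(sentence):
    """SLOR-style score: (log P_model(s) - log P_unigram(s)) / length."""
    words = sentence.split()
    padded = ["<s>", "<s>"] + words + ["</s>"]
    lp_model = sum(logp_trigram(*padded[i:i + 3]) for i in range(len(padded) - 2))
    lp_unigram = sum(math.log((unigrams[w] + 1.0) / (N + V)) for w in words + ["</s>"])
    return (lp_model - lp_unigram) / (len(words) + 1)

for s in ["colorless green ideas sleep furiously",
          "furiously sleep ideas green colorless"]:
    print(f"{slor(s):7.3f}  {s}")
```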

    Get out but don’t fall down: verb-particle constructions in child language

    No full text
    Much has been discussed about the challenges posed by Multiword Expressions (MWEs), given their idiosyncratic, flexible and heterogeneous nature. Nonetheless, children successfully learn to use them and eventually acquire a number of Multiword Expressions comparable to that of simplex words. In this paper we report a wide-coverage investigation of a particular type of MWE: verb-particle constructions (VPCs) in English and their usage in child-produced and child-directed sentences. Given their potentially higher complexity relative to simplex verbs, we examine whether they appear less prominently in child-produced than in child-directed speech, and whether the VPCs that children produce are more conservative than those of adults, displaying a proportionally reduced lexical repertoire of VPCs or of the verbs in these combinations. The results obtained indicate that, regardless of any additional complexity, VPCs feature widely in child data, closely following adult usage. Studies like these can inform the development of computational models of language acquisition.
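
    The corpus measure described above can be sketched roughly as follows; the utterance format, the particle list and the adjacency heuristic are assumptions for the example, not the paper's pipeline.

```python
# Extract verb+particle bigrams from child (CHI) and child-directed (MOT) utterances
# and compare how many distinct VPC types and verbs each group uses.
PARTICLES = {"up", "down", "out", "off", "in", "on", "away", "back"}

utterances = [
    ("CHI", "pick up the ball"),
    ("CHI", "fall down"),
    ("MOT", "don't fall down"),
    ("MOT", "put on your shoes"),
    ("MOT", "take off the lid"),
]

def vpc_inventory(utterances, speaker):
    """Return the set of (verb, particle) pairs produced by one speaker group."""
    vpcs = set()
    for spk, utt in utterances:
        if spk != speaker:
            continue
        tokens = utt.split()
        for verb, nxt in zip(tokens, tokens[1:]):
            if nxt in PARTICLES:   # crude adjacency heuristic, purely illustrative
                vpcs.add((verb, nxt))
    return vpcs

child = vpc_inventory(utterances, "CHI")
adult = vpc_inventory(utterances, "MOT")
print("child VPC types:", sorted(child))
print("adult VPC types:", sorted(adult))
print("child verbs / adult verbs:",
      len({v for v, _ in child}), "/", len({v for v, _ in adult}))
```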

    Multiscale Mathematical Modeling to Support Drug Development

    No full text
