16,650 research outputs found

    Network-Based Prediction and Analysis of HIV Dependency Factors

    Get PDF
    HIV Dependency Factors (HDFs) are a class of human proteins that are essential for HIV replication, but are not lethal to the host cell when silenced. Three previous genome-wide RNAi experiments identified HDF sets with little overlap. We combine data from these three studies with a human protein interaction network to predict new HDFs, using an intuitive algorithm called SinkSource and four other algorithms published in the literature. Our algorithm achieves high precision and recall upon cross validation, as do the other methods. A number of HDFs that we predict are known to interact with HIV proteins. They belong to multiple protein complexes and biological processes that are known to be manipulated by HIV. We also demonstrate that many predicted HDF genes show significantly different programs of expression in early response to SIV infection in two non-human primate species that differ in AIDS progression. Our results suggest that many HDFs are yet to be discovered and that they have potential value as prognostic markers to determine pathological outcome and the likelihood of AIDS development. More generally, if multiple genome-wide gene-level studies have been performed at independent labs to study the same biological system or phenomenon, our methodology is applicable to interpret these studies simultaneously in the context of molecular interaction networks and to ask if they reinforce or contradict each other

    A Primer on Causality in Data Science

    Get PDF
    Many questions in Data Science are fundamentally causal in that our objective is to learn the effect of some exposure, randomized or not, on an outcome interest. Even studies that are seemingly non-causal, such as those with the goal of prediction or prevalence estimation, have causal elements, including differential censoring or measurement. As a result, we, as Data Scientists, need to consider the underlying causal mechanisms that gave rise to the data, rather than simply the pattern or association observed in those data. In this work, we review the 'Causal Roadmap' of Petersen and van der Laan (2014) to provide an introduction to some key concepts in causal inference. Similar to other causal frameworks, the steps of the Roadmap include clearly stating the scientific question, defining of the causal model, translating the scientific question into a causal parameter, assessing the assumptions needed to express the causal parameter as a statistical estimand, implementation of statistical estimators including parametric and semi-parametric methods, and interpretation of our findings. We believe that using such a framework in Data Science will help to ensure that our statistical analyses are guided by the scientific question driving our research, while avoiding over-interpreting our results. We focus on the effect of an exposure occurring at a single time point and highlight the use of targeted maximum likelihood estimation (TMLE) with Super Learner.Comment: 26 pages (with references); 4 figure

    Urban characteristics attributable to density-driven tie formation

    Get PDF
    Motivated by empirical evidence on the interplay between geography, population density and societal interaction, we propose a generative process for the evolution of social structure in cities. Our analytical and simulation results predict both super-linear scaling of social tie density and information flow as a function of the population. We demonstrate that our model provides a robust and accurate fit for the dependency of city characteristics with city size, ranging from individual-level dyadic interactions (number of acquaintances, volume of communication) to population-level variables (contagious disease rates, patenting activity, economic productivity and crime) without the need to appeal to modularity, specialization, or hierarchy.Comment: Early version of this paper was presented in NetSci 2012 as a contributed talk in June 2012. An improved version of this paper is published in Nature Communications in June 2013. It has 14 pages and 5 figure

    Quantitative modelling of the human–Earth System a new kind of science?

    No full text
    The five grand challenges set out for Earth System Science by the International Council for Science in 2010 require a true fusion of social science, economics and natural science—a fusion that has not yet been achieved. In this paper we propose that constructing quantitative models of the dynamics of the human–Earth system can serve as a catalyst for this fusion. We confront well-known objections to modelling societal dynamics by drawing lessons from the development of natural science over the last four centuries and applying them to social and economic science. First, we pose three questions that require real integration of the three fields of science. They concern the coupling of physical planetary boundaries via social processes; the extension of the concept of planetary boundaries to the human–Earth System; and the possibly self-defeating nature of the United Nation’s Millennium Development Goals. Second, we ask whether there are regularities or ‘attractors’ in the human–Earth System analogous to those that prompted the search for laws of nature. We nominate some candidates and discuss why we should observe them given that human actors with foresight and intentionality play a fundamental role in the human–Earth System. We conclude that, at sufficiently large time and space scales, social processes are predictable in some sense. Third, we canvass some essential mathematical techniques that this research fusion must incorporate, and we ask what kind of data would be needed to validate or falsify our models. Finally, we briefly review the state of the art in quantitative modelling of the human–Earth System today and highlight a gap between so-called integrated assessment models applied at regional and global scale, which could be filled by a new scale of model
    • …
    corecore