61 research outputs found

    Statistical integration of information

    Modern data analysis frequently involves multiple large and diverse data sets generated from current high-throughput technologies. An integrative analysis of these sources of information is very promising for improving knowledge discovery in various fields. This dissertation focuses on three distinct challenges in the integration of information. The variables obtained from diverse and novel platforms often have highly non-Gaussian marginal distributions and therefore are challenging to analyze by commonly used methods. The first part introduces an automatic transformation for improving data quality before integrating multiple data sources. For each variable, a new family of parametrizations of the shifted logarithm transformation is proposed, which allows transformation for both left and right skewness within the single family and an automatic selection of the parameter value. The second part discusses an integrative analysis of disparate data blocks measured on a common set of experimental subjects. This data integration naturally motivates the simultaneous exploration of the joint and individual variation within each data block resulting in new insights. We introduce Non-iterative Joint and Individual Variation Explained (Non-iterative JIVE), capturing both joint and individual variation within each data block. This is a major improvement over earlier approaches to this challenge in terms of both a new conceptual understanding and a fast linear algebra computation. An important mathematical contribution is the use of score subspaces as the principal descriptors of variation structure and the use of perturbation theory as the guide for variation segmentation. Furthermore, this makes our method robust against the heterogeneity among data blocks, without a need for normalization. 
    The last part proposes a Generalized Fiducial Inference-inspired method for finding a robust consensus among several independently derived confidence distributions (CDs) for a quantity of interest. The resulting fused CD is robust to the existence of potentially discrepant CDs in the collection. The method uses computationally efficient fiducial model averaging to obtain a robust consensus distribution without the need to eliminate discrepant CDs from the analysis. Doctor of Philosophy

    Characterising and modeling the co-evolution of transportation networks and territories

    The identification of structuring effects of transportation infrastructure on territorial dynamics remains an open research problem. This issue is one aspect of approaches to the complexity of territorial dynamics, within which territories and networks would be co-evolving. The aim of this thesis is to challenge this view on interactions between networks and territories, both at the conceptual and empirical level, by integrating them in simulation models of territorial systems. Comment: Doctoral dissertation (2017), Université Paris 7 Denis Diderot. Translated from French. Several papers compose this PhD thesis; overlap with: arXiv:{1605.08888, 1608.00840, 1608.05266, 1612.08504, 1706.07467, 1706.09244, 1708.06743, 1709.08684, 1712.00805, 1803.11457, 1804.09416, 1804.09430, 1805.05195, 1808.07282, 1809.00861, 1811.04270, 1812.01473, 1812.06008, 1908.02034, 2012.13367, 2102.13501, 2106.11996

    The structure and dynamics of multilayer networks

    In the past years, network theory has successfully characterized the interaction among the constituents of a variety of complex systems, ranging from biological to technological and social systems. However, up until recently, attention was almost exclusively given to networks in which all components were treated on equivalent footing, while neglecting all the extra information about the temporal- or context-related properties of the interactions under study. Only in recent years, taking advantage of the enhanced resolution in real data sets, have network scientists directed their interest to the multiplex character of real-world systems and explicitly considered the time-varying and multilayer nature of networks. We offer here a comprehensive review of both the structural and dynamical organization of graphs made of diverse relationships (layers) between their constituents, and cover several relevant issues, from a full redefinition of the basic structural measures to understanding how the multilayer nature of the network affects processes and dynamics. Comment: In Press, Accepted Manuscript, Physics Reports 201

    Evaluating Privacy-Friendly Mobility Analytics on Aggregate Location Data

    Information about people's movements and the locations they visit enables a wide range of mobility analytics applications, e.g., real-time traffic maps or urban planning, aiming to improve quality of life in modern smart cities. Alas, the availability of users' fine-grained location data reveals sensitive information about them, such as home and work places, lifestyles, and political or religious inclinations. In an attempt to mitigate this, aggregation is often employed as a strategy that allows analytics and machine learning tasks while protecting the privacy of individual users' location traces. In this thesis, we perform an end-to-end evaluation of crowdsourced privacy-friendly location aggregation, aiming to understand its usefulness for analytics as well as its privacy implications for users who contribute their data. First, we present a time-series methodology which, along with privacy-friendly crowdsourcing of aggregate locations, supports mobility analytics such as traffic forecasting and mobility anomaly detection. Next, we design quantification frameworks and methodologies that let us reason about the privacy loss stemming from the collection or release of aggregate location information against knowledgeable adversaries that aim to infer users' profiles, locations, or membership. We then utilize these frameworks to evaluate defenses ranging from generalization and hiding to differential privacy, which can be employed to prevent inferences on aggregate location statistics, in terms of privacy protection as well as utility loss for analytics tasks. Our results highlight that, while location aggregation is useful for mobility analytics, it is a weak privacy protection mechanism in this setting, and that additional defenses can only protect privacy if some statistical utility is sacrificed.
    Overall, the tools presented in this thesis can be used by providers who desire to assess the quality of privacy protection before data release, and its results have several implications for current location data practices and applications.

    STATISTICAL LEARNING OF INTEGRATIVE ANALYSIS

    Integrative analysis is of great interest in modern scientific research. This dissertation mainly focuses on developing new statistical methods for integrative analysis. We first discuss a clustering analysis of a microbiome dataset in combination with phylogenetic information. Discovering disease-related pneumotypes of the infected lower lung is difficult because the lower lung typically has few species of microbes and there is a low level of overlap from patient to patient, which makes it hard to calculate reliable distances between patients. We address this challenge by incorporating information from phylogenetic relationships, which results in improved clustering. When applied to an existing dataset, the method produces statistically distinct, easily described pneumotypes, which are better than those from standard approaches. In the second part, we discuss an integrative analysis of disparate data blocks measured on a common set of experimental subjects. We introduce Angle-Based Joint and Individual Variation Explained (AJIVE), capturing both joint and individual variation within each data block. This is a major improvement over earlier approaches to this challenge in terms of a new conceptual understanding, much better adaptation to data heterogeneity, and a fast linear algebra computation. A detailed comparison between AJIVE and its competitors is discussed from a particular optimization viewpoint. In the third part, we introduce a new perturbation framework, which estimates the angle between an arbitrary given direction and the underlying signal spaces. We also propose an efficient data-driven bootstrap procedure to compute this angle. While the Wedin bound in AJIVE is “subspace oriented” and uniform for both row space and column space, this angle is “direction oriented” and specially adapted to give improved inference in the row space. Doctor of Philosophy

    Urban Informatics

    This open access book is the first to systematically introduce the principles of urban informatics and its application to every aspect of the city that involves its functioning, control, management, and future planning. It introduces new models and tools being developed to understand and implement these technologies that enable cities to function more efficiently – to become ‘smart’ and ‘sustainable’. The smart city has quickly emerged as computers have become ever smaller to the point where they can be embedded into the very fabric of the city, as well as being central to new ways in which the population can communicate and act. When cities are wired in this way, they have the potential to become sentient and responsive, generating massive streams of ‘big’ data in real time as well as providing immense opportunities for extracting new forms of urban data through crowdsourcing. This book offers a comprehensive review of the methods that form the core of urban informatics, from various kinds of urban remote sensing to new approaches to machine learning and statistical modelling. It provides a detailed technical introduction to the wide array of tools information scientists need to develop the key urban analytics that are fundamental to learning about the smart city, and it outlines ways in which these tools can be used to inform design and policy so that cities can become more efficient with a greater concern for environment and equity.

    Spatiotemporal enabled Content-based Image Retrieval
