
    A Simultaneous Extraction of Context and Community from pervasive signals using nested Dirichlet process

    Understanding user contexts and group structures plays a central role in pervasive computing. These contexts and community structures are difficult to mine from data collected in the wild because of the unprecedented growth of data, noise, uncertainty and complexity. Typical existing approaches first extract latent patterns to explain human dynamics or behaviours, and then use these patterns to construct numerical representations for community detection, often via a clustering method. While able to capture high-order and complex representations, these two steps are performed separately. More importantly, they face a fundamental difficulty in determining the correct number of latent patterns and communities. This paper presents an approach that addresses these challenges by discovering latent patterns and communities simultaneously in a unified Bayesian nonparametric framework. Our Simultaneous Extraction of Context and Community (SECC) model is rooted in nested Dirichlet process theory, which allows a nested structure to be built to summarize data at multiple levels. We demonstrate the framework on five datasets, where the advantages of the proposed approach are validated.
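
    As rough intuition for the nested structure mentioned in the abstract (this is only an illustrative sketch, not the SECC model itself), the code below simulates a two-level Chinese-restaurant-style prior: users are clustered into communities, and each community maintains its own clustering of context patterns. The function name and concentration parameters are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def crp_assign(counts, alpha):
    """Pick an existing cluster in proportion to its size, or a new one with mass alpha."""
    probs = np.array(counts + [alpha], dtype=float)
    return rng.choice(len(probs), p=probs / probs.sum())

alpha_community, alpha_context = 1.0, 0.5   # illustrative concentration parameters
community_sizes = []                        # users per community (outer CRP)
context_counts = []                         # per-community context usage (inner CRPs)

assignments = []
for user in range(20):
    c = crp_assign(community_sizes, alpha_community)
    if c == len(community_sizes):           # user opens a new community
        community_sizes.append(0)
        context_counts.append([])
    community_sizes[c] += 1

    z = crp_assign(context_counts[c], alpha_context)
    if z == len(context_counts[c]):         # new context pattern within this community
        context_counts[c].append(0)
    context_counts[c][z] += 1
    assignments.append((c, z))

print(assignments)                          # (community, context) pair per user
```

    The point of the sketch is simply that the number of communities and of per-community context patterns is not fixed in advance; both grow with the data, which is the property the abstract attributes to the nested Dirichlet process.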

    Validating an Operational Flood Forecast Model Using Citizen Science in Hampton Roads, VA, USA

    Changes in the eustatic sea level have enhanced the impact of inundation events in the coastal zone, ranging in significance from tropical storm surges to pervasive nuisance flooding events. The increased frequency of these inundation events has stimulated the production of interactive web-map tracking tools to cope with our changing coastal environment. Tidewatch Maps, developed by the Virginia Institute of Marine Science (VIMS), is an effective example of an emerging street-level inundation mapping tool. Leveraging the Semi-implicit Cross-scale Hydro-science Integrated System Model (SCHISM) as the engine, Tidewatch operationally disseminates 36-h inundation forecast maps with a 12-h update frequency. SCHISM's storm tide forecasts provide surge guidance for the legacy VIMS Tidewatch Charts sensor-based tidal prediction platform, while simultaneously providing an interactive and operationally functional forecast mapping tool with hourly temporal resolution and 5 m spatial resolution throughout the coastal plain of Virginia, USA. This manuscript delves into the hydrodynamic modeling and geospatial methods used at VIMS to automate the 36-h street-level flood forecasts currently available via Tidewatch Maps, and the paradigm-altering efforts involved in validating the spatial, vertical, and temporal accuracy of the model. Supplementary material: Catch the King Tide GPS data points were collected by volunteers, who effectively breadcrumbed their paths tracing the tidal high-water contour lines by pressing the 'Save Data' button in the free Sea Level Rise Mobile App every few steps along the water's edge during the high tide on the morning of November 5th, 2017. https://doi.org/10.25773/276h-2b4

    Robust and Optimal Methods for Geometric Sensor Data Alignment

    Geometric sensor data alignment - the problem of finding the rigid transformation that correctly aligns two sets of sensor data without prior knowledge of how the data correspond - is a fundamental task in computer vision and robotics. It is inconvenient then that outliers and non-convexity are inherent to the problem and present significant challenges for alignment algorithms. Outliers are highly prevalent in sets of sensor data, particularly when the sets overlap incompletely. Despite this, many alignment objective functions are not robust to outliers, leading to erroneous alignments. In addition, alignment problems are highly non-convex, a property arising from the objective function and the transformation. While finding a local optimum may not be difficult, finding the global optimum is a hard optimisation problem. These key challenges have not been fully and jointly resolved in the existing literature, and so there is a need for robust and optimal solutions to alignment problems. Hence the objective of this thesis is to develop tractable algorithms for geometric sensor data alignment that are robust to outliers and not susceptible to spurious local optima. This thesis makes several significant contributions to the geometric alignment literature, founded on new insights into robust alignment and the geometry of transformations. Firstly, a novel discriminative sensor data representation is proposed that has better viewpoint invariance than generative models and is time and memory efficient without sacrificing model fidelity. Secondly, a novel local optimisation algorithm is developed for nD-nD geometric alignment under a robust distance measure. It manifests a wider region of convergence and a greater robustness to outliers and sampling artefacts than other local optimisation algorithms. Thirdly, the first optimal solution for 3D-3D geometric alignment with an inherently robust objective function is proposed. It outperforms other geometric alignment algorithms on challenging datasets due to its guaranteed optimality and outlier robustness, and has an efficient parallel implementation. Fourthly, the first optimal solution for 2D-3D geometric alignment with an inherently robust objective function is proposed. It outperforms existing approaches on challenging datasets, reliably finding the global optimum, and has an efficient parallel implementation. Finally, another optimal solution is developed for 2D-3D geometric alignment, using a robust surface alignment measure. Ultimately, robust and optimal methods, such as those in this thesis, are necessary to reliably find accurate solutions to geometric sensor data alignment problems
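
    The thesis targets the harder setting in which correspondences are unknown and global optimality is required; purely as background for the rigid-alignment objective it describes, the sketch below shows the classical least-squares rigid fit (Kabsch) plus a naive trimming loop that discards the worst-fitting pairs, one simple way to gain outlier robustness when correspondences are given. Function names, the inlier fraction and the toy data are all illustrative, not the thesis's algorithms.

```python
import numpy as np

def rigid_transform(P, Q):
    """Least-squares rigid transform (R, t) mapping points P onto Q (Kabsch method)."""
    cP, cQ = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cP).T @ (Q - cQ)
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # avoid reflections
    R = Vt.T @ D @ U.T
    t = cQ - R @ cP
    return R, t

def trimmed_rigid_transform(P, Q, inlier_frac=0.8, iters=10):
    """Crude robust variant: alternately fit on current inliers and re-select
    the best-fitting fraction of correspondences."""
    idx = np.arange(len(P))
    k = max(3, int(inlier_frac * len(P)))
    for _ in range(iters):
        R, t = rigid_transform(P[idx], Q[idx])
        residuals = np.linalg.norm((P @ R.T + t) - Q, axis=1)
        idx = np.argsort(residuals)[:k]
    return rigid_transform(P[idx], Q[idx])

# Toy usage: recover a known rotation/translation despite a few gross outliers.
rng = np.random.default_rng(0)
P = rng.normal(size=(100, 3))
theta = 0.3
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
Q = P @ R_true.T + np.array([1.0, -2.0, 0.5])
Q[:10] += 5.0                     # corrupt 10% of the correspondences
R_est, t_est = trimmed_rigid_transform(P, Q)
```

    A plain least-squares fit would be dragged off by the corrupted pairs; the trimming step is the simplest illustration of why the thesis insists on objective functions that are inherently robust to outliers.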

    Algorithms, applications and systems towards interpretable pattern mining from multi-aspect data

    How do humans move around in urban space, and how does this change when a city undergoes terrorist attacks? How do users behave in Massive Open Online Courses (MOOCs), and how do those who achieve certificates differ from those who do not? In what areas of the court do elite players, such as Stephen Curry and LeBron James, like to take their shots over the course of a game? How can we uncover the hidden habits that govern our online purchases? Are there unspoken agendas in how different states pass legislation of certain kinds? At the heart of these seemingly unconnected puzzles is the same mystery of multi-aspect mining, i.e., how can we mine and interpret hidden patterns from a dataset that simultaneously reveals the associations, or changes of the associations, among various aspects of the data (e.g., a shot could be described with three aspects: player, time of the game, and area of the court)? Solving this problem could open the gates to a deep understanding of the underlying mechanisms behind many real-world phenomena. While much of the research in multi-aspect mining contributes a broad scope of innovations to the mining itself, interpretation of patterns from the perspective of users (or domain experts) is often overlooked. Questions such as what users require from patterns, how good the patterns are, or how to read them have barely been addressed. Without efficient and effective ways of involving users in the process of multi-aspect mining, the results are likely to be difficult for them to comprehend. This dissertation proposes the M^3 framework, which consists of multiplex pattern discovery, multifaceted pattern evaluation, and multipurpose pattern presentation, to tackle the challenges of multi-aspect pattern discovery. Based on this framework, we develop algorithms, applications, and analytic systems to enable interpretable pattern discovery from multi-aspect data. Following the concept of meaningful multiplex pattern discovery, we propose PairFac to close the gap between human information needs and naive mining optimization. We demonstrate its effectiveness in the context of impact discovery in the aftermath of urban disasters. We develop iDisc to target the crossing of multiplex pattern discovery with multifaceted pattern evaluation. iDisc meets the specific information need of understanding multi-level, contrastive behavior patterns. As an example, we use iDisc to predict student performance outcomes in Massive Open Online Courses given users' latent behaviors. FacIt is an interactive visual analytic system that sits at the intersection of all three components and enables interpretable, fine-tunable, and scrutinizable pattern discovery from multi-aspect data. We demonstrate each work's significance and implications in its respective problem context. As a whole, this series of studies is an effort to instantiate the M^3 framework and push the field of multi-aspect mining towards a more human-centric process in real-world applications.
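
    For readers unfamiliar with multi-aspect mining, one common generic technique (not the dissertation's PairFac, iDisc or FacIt methods) is a non-negative CP/PARAFAC decomposition of a multi-way tensor. The sketch below, using the tensorly library on a purely synthetic player × game-period × court-area shot-count tensor, shows how each component couples the three aspects into one interpretable pattern; all dimensions, ranks and data are assumptions for illustration.

```python
import numpy as np
import tensorly as tl
from tensorly.decomposition import non_negative_parafac

# Hypothetical 3-way count tensor: players x game periods x court areas
# (a synthetic stand-in for the shot data described in the abstract).
rng = np.random.default_rng(1)
tensor = tl.tensor(rng.poisson(lam=1.5, size=(30, 4, 14)).astype(float))

# Rank-5 non-negative CP decomposition: each of the 5 components couples a
# group of players with the game periods and court areas they favour,
# i.e. one multi-aspect "pattern".
weights, factors = non_negative_parafac(tensor, rank=5, n_iter_max=200)
players, periods, areas = factors
print(players.shape, periods.shape, areas.shape)   # (30, 5), (4, 5), (14, 5)
```

    The factor matrices are what a domain expert would actually read, which is exactly where the dissertation argues interpretability, evaluation and presentation need to be treated as first-class concerns.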

    Automatically Characterizing Product and Process Incentives in Collective Intelligence

    Social media facilitate interaction and information dissemination among an unprecedented number of participants. Why do users contribute, and why do they contribute to a specific venue? Does the information they receive cover all relevant points of view, or is it biased? The substantial and increasing importance of online communication makes these questions more pressing, but also puts answers within reach of automated methods. I investigate scalable algorithms for understanding two classes of incentives which arise in collective intelligence processes. Product incentives exist when contributors have a stake in the information delivered to other users. I investigate product-relevant user behavior changes, algorithms for characterizing the topics and points of view presented in peer-produced content, and the results of a field experiment with a prediction market framework having associated product incentives. Process incentives exist when users find contributing to be intrinsically rewarding. Algorithms which are aware of process incentives predict the effect of feedback on where users will make contributions, and can learn about the structure of a conversation by observing when users choose to participate in it. Learning from large-scale social interactions allows us to monitor the quality of information and the health of venues, but also provides fresh insights into human behavior

    Modelling individual accessibility using Bayesian networks: A capabilities approach

    The ability of an individual to reach and engage with basic services such as healthcare and education, and activities such as employment, is a fundamental aspect of their wellbeing. Within transport studies, accessibility is considered a valuable concept that can be used to generate insights on issues related to social exclusion due to limited access to transport options. Recently, researchers have attempted to link accessibility with popular theories of social justice such as Amartya Sen's Capabilities Approach (CA). Such studies have set the theoretical foundations for the way accessibility can be expressed through the CA; however, attempts to operationalise this approach remain fragmented and predominantly qualitative in nature. The data landscape, however, has changed over the last decade, providing an unprecedented quantity of transport-related data at an individual level. Mobility data from different sources have the potential to contribute to the understanding of individual accessibility and its relation to phenomena such as social exclusion. At the same time, the unlabelled nature of such data presents a considerable challenge, as a non-trivial step of inference is required if one is to deduce the transportation modes used and the activities reached. This thesis develops a novel framework for accessibility modelling using the CA as its theoretical foundation. Within the scope of this thesis, the framework is used to assess the levels of equality experienced by individuals belonging to different population groups and their link to transport-related social exclusion. In the proposed approach, activities reached and transportation modes used are considered manifestations of an individual's hidden capabilities. A modelling framework using dynamic Bayesian networks is developed to quantify and assess the relationships and dynamics of the different components influencing the capabilities sets. The developed approach can also provide inferential capabilities for activity type and transportation mode detection, making it suitable for use with unlabelled mobility data such as Automatic Fare Collection (AFC), mobile phone and social media data. The usefulness of the proposed framework is demonstrated through three case studies. In the first case study, mobile phone data were used to explore the interaction of individuals with different public transportation modes. It was found that assumptions about individual mobility preferences derived from travel surveys may not always hold, providing evidence for the significance of personal characteristics in the choice of transportation modes. In the second case, the proposed framework is used for activity type inference, testing the limits of accuracy that can be achieved from unlabelled social media data. Combining elements of the previous case studies, the third case defines a generative model which is used to develop the proposed capabilities-approach accessibility model. Using data from London's Automatic Fare Collection (AFC) system, the elements of the capabilities set are explicitly defined and linked with an individual's personal characteristics, external variables and functionings. The results are used to explore the link between social exclusion and transport disadvantage, revealing distinct patterns that can be attributed to different accessibility levels.
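
    The thesis infers activities and transport modes from unlabelled data with dynamic Bayesian networks; as a stripped-down illustration of that kind of inference only (not the thesis's model), the sketch below runs the forward algorithm of a tiny hidden Markov model in which hidden activity types generate observed stop categories. All states, symbols and probabilities are made up.

```python
import numpy as np

# Hypothetical toy HMM: hidden states are activity types, observations are
# coarse stop categories one might derive from smart-card taps.
states = ["home", "work", "leisure"]
obs_symbols = ["residential_stop", "business_stop", "retail_stop"]

start = np.array([0.6, 0.3, 0.1])                 # initial activity probabilities
trans = np.array([[0.7, 0.2, 0.1],                # activity-to-activity transitions
                  [0.3, 0.6, 0.1],
                  [0.3, 0.3, 0.4]])
emit = np.array([[0.8, 0.1, 0.1],                 # P(stop category | activity)
                 [0.1, 0.8, 0.1],
                 [0.2, 0.2, 0.6]])

def forward(observations):
    """Forward algorithm: sequence likelihood and filtered activity probabilities."""
    alpha = start * emit[:, observations[0]]
    for o in observations[1:]:
        alpha = (alpha @ trans) * emit[:, o]
    return alpha.sum(), alpha / alpha.sum()

obs = [obs_symbols.index(s) for s in
       ["residential_stop", "business_stop", "business_stop", "retail_stop"]]
likelihood, filtered = forward(obs)
print(likelihood, dict(zip(states, filtered.round(3))))
```

    A dynamic Bayesian network of the kind described in the abstract generalises this structure with additional nodes for personal characteristics and external variables, but the inference question is the same: recover hidden activity and mode labels from observed, unlabelled traces.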

    From models to data: understanding biodiversity patterns from environmental DNA data

    Integrative patterns of biodiversity, such as the distribution of taxa abundances and the spatial turnover of taxonomic composition, have long been under scrutiny from ecologists, as they offer insight into the general rules governing the assembly of organisms into ecological communities. Thanks to recent progress in high-throughput DNA sequencing, these patterns can now be measured in a fast and standardized fashion through the sequencing of DNA sampled from the environment (e.g. soil or water), instead of relying on tedious fieldwork and rare naturalist expertise. They can also be measured for the whole tree of life, including the vast and previously unexplored diversity of microorganisms. Taking full advantage of this new type of data is challenging, however: DNA-based surveys are indirect, and as such suffer from many potential biases; they also produce large and complex datasets compared to classical censuses. The first goal of this thesis is to investigate how statistical tools and models classically used in ecology, or coming from other fields, can be adapted to DNA-based data so as to better understand the assembly of ecological communities. The second goal is to apply these approaches to soil DNA data from the Amazonian forest, the Earth's most diverse land ecosystem. Two broad types of mechanisms are classically invoked to explain the assembly of ecological communities: 'neutral' processes, i.e. the random birth, death and dispersal of organisms, and 'niche' processes, i.e. the interaction of organisms with their environment and with each other according to their phenotype. Disentangling the relative importance of these two types of mechanisms in shaping taxonomic composition is a key ecological question, with implications ranging from estimating global diversity to conservation issues. In the first chapter, this question is addressed across the tree of life by applying the classical analytic tools of community ecology to soil DNA samples collected from various forest plots in French Guiana. The second chapter focuses on the neutral aspect of community assembly. A mathematical model incorporating the key elements of neutral community assembly was proposed by S.P. Hubbell in 2001, making it possible to infer quantitative measures of dispersal and regional diversity from the local distribution of taxa abundances. In this chapter, the biases introduced when reconstructing the taxa abundance distribution from environmental DNA data are discussed, and their impact on the estimation of the dispersal and regional diversity parameters is quantified. The third chapter focuses on how non-random differences in taxonomic composition across a group of samples, resulting from various community assembly processes, can be efficiently detected, represented and interpreted. A method originally designed to model the different topics emerging from a set of text documents is applied here to soil DNA data sampled along a grid over a large forest plot in French Guiana. Spatial patterns of soil microorganism diversity are successfully captured and related to fine variations in environmental conditions across the plot. Finally, the implications of the thesis findings are discussed. In particular, the potential of topic models for modelling DNA-based biodiversity data is stressed.
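
    As a generic illustration of the topic-modelling idea used in the third chapter (not the thesis's exact model or data), the sketch below fits scikit-learn's LatentDirichletAllocation to a synthetic sample-by-taxon count matrix, treating each soil sample as a "document" and each taxon as a "word". The matrix dimensions, number of topics and counts are all assumptions.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

# Hypothetical sample-by-taxon read-count matrix (rows: soil samples,
# columns: taxa/OTUs), standing in for a real environmental-DNA table.
rng = np.random.default_rng(42)
counts = rng.poisson(lam=2.0, size=(50, 200))

# Topics then correspond to recurring assemblages of co-occurring taxa.
lda = LatentDirichletAllocation(n_components=5, random_state=0)
sample_topics = lda.fit_transform(counts)       # per-sample topic proportions
topic_taxa = lda.components_                    # per-topic taxon weights

print(sample_topics.shape, topic_taxa.shape)    # (50, 5), (5, 200)
```

    Mapping the per-sample topic proportions back onto the sampling grid is what lets spatial structure in community composition be visualised and compared against environmental covariates.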

    Models and Analysis of Vocal Emissions for Biomedical Applications

    The Models and Analysis of Vocal Emissions with Biomedical Applications (MAVEBA) workshop came into being in 1999 out of the strongly felt need to share know-how, objectives and results among areas that had until then seemed quite distinct, such as bioengineering, medicine and singing. MAVEBA deals with all aspects of the study of the human voice, with applications ranging from the neonate to the adult and elderly. Over the years the initial topics have grown and spread into other areas of research, such as occupational voice disorders, neurology, rehabilitation, and image and video analysis. MAVEBA takes place every two years, always in Firenze, Italy.

    Analyzing Granger causality in climate data with time series classification methods

    Attribution studies in climate science aim to scientifically ascertain the influence of climatic variations on natural or anthropogenic factors. Many of these studies adopt the concept of Granger causality to infer statistical cause-effect relationships, typically using traditional autoregressive models. In this article, we investigate the potential of state-of-the-art time series classification techniques to enhance causal inference in climate science. We conduct a comparative experimental study of different types of algorithms on a large test suite that comprises a unique collection of datasets from the area of climate-vegetation dynamics. The results indicate that specialized time series classification methods are able to improve existing inference procedures. Substantial differences are observed among the methods tested.
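
    The article benchmarks time series classification methods against the usual autoregressive approach; for reference only, here is a minimal sketch of that baseline lag-based Granger test using statsmodels, run on synthetic data in which x drives y with a one-step lag. The data, coefficients and lag order are illustrative and have nothing to do with the climate-vegetation test suite.

```python
import numpy as np
from statsmodels.tsa.stattools import grangercausalitytests

# Synthetic example: x drives y with a one-step lag, plus noise.
rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.6 * x[t - 1] + 0.2 * y[t - 1] + rng.normal(scale=0.5)

# statsmodels tests whether the second column Granger-causes the first.
data = np.column_stack([y, x])
results = grangercausalitytests(data, maxlag=3)

# Each lag entry holds F- and chi-square tests; small p-values reject
# "x does not Granger-cause y" at that lag order.
print(results[1][0]["ssr_ftest"])
```

    Replacing the linear autoregressive predictor in this test with a stronger time series model is, in essence, the direction the article explores.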