5,937 research outputs found

    Higher-order co-occurrences for exploratory point pattern analysis and decision tree clustering on spatial data

    Analyzing geographical patterns by collocating events, objects or their attributes has a long history in surveillance and monitoring, and is particularly applied in environmental contexts, such as ecology or epidemiology. The identification of patterns or structures at some scales can be addressed using spatial statistics, particularly marked point process methodologies. Classification and regression trees are also related to this goal of finding "patterns" by deducing the hierarchy of influence of variables on a dependent outcome. Such variable selection methods have been applied to spatial data, but often without explicitly acknowledging the spatial dependence. Many methods routinely used in exploratory point pattern analysis are second-order statistics, used in a univariate context, though there is also a wide literature on modelling methods for multivariate point pattern processes. This paper proposes an exploratory approach for multivariate spatial data using higher-order statistics built from co-occurrences of events or marks given by the point processes. A spatial entropy measure, derived from these multinomial distributions of co-occurrences at a given order, constitutes the basis of the proposed exploratory methods. © 2010 Elsevier Ltd
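
As a rough illustration of the idea in the abstract above, the sketch below computes a normalized Shannon entropy from the multinomial distribution of mark co-occurrences at order 2 (pairs of points within a fixed distance). The function name `cooccurrence_entropy`, the distance-based neighbourhood, and the normalization by the number of possible unordered mark pairs are assumptions made for illustration; the paper's own definition of spatial entropy may differ.

```python
import math
from collections import Counter
from itertools import combinations


def cooccurrence_entropy(points, marks, radius):
    """Normalized Shannon entropy of unordered mark pairs found within `radius`.

    points: list of (x, y) tuples; marks: list of labels, one per point.
    Returns a value in [0, 1]; 0.0 if no pair co-occurs or only one mark exists.
    """
    counts = Counter()
    for (i, p), (j, q) in combinations(enumerate(points), 2):
        if math.dist(p, q) <= radius:
            # Order-2 co-occurrence: the unordered pair of marks of two neighbours.
            counts[tuple(sorted((marks[i], marks[j])))] += 1

    total = sum(counts.values())
    if total == 0:
        return 0.0

    k = len(set(marks))
    n_types = k * (k + 1) // 2  # number of possible unordered mark pairs
    if n_types < 2:
        return 0.0

    # Shannon entropy of the empirical multinomial distribution of pair types.
    h = -sum((c / total) * math.log(c / total) for c in counts.values())
    return h / math.log(n_types)


# Hypothetical toy data: four points on a unit square with two marks.
example_points = [(0, 0), (0, 1), (1, 0), (1, 1)]
example_marks = ["a", "b", "a", "b"]
example_entropy = cooccurrence_entropy(example_points, example_marks, radius=1.0)
```

Values near 1 indicate that all pair types co-occur about equally often, while values near 0 indicate that a few pair types dominate the neighbourhood structure.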

    Spatially clustered associations in health GIS

    Overlaying maps using a desktop GIS is often the first step of a multivariate spatial analysis. The potential of this operation has increased considerably as data sources and Web services to manipulate them are becoming widely available via the Internet. Standards from the OGC enable such geospatial mashups to be seamless and user driven, involving discovery of thematic data. The user is naturally inclined to look for spatial clusters and correlation of outcomes. Using classical cluster detection scan methods to identify multivariate associations can be problematic in this context, because of a lack of control on or knowledge about background populations. For public health and epidemiological mapping, this limiting factor can be critical, but often the focus is on spatial identification of risk factors associated with health or clinical status. In this article we point out that this association itself can ensure some control on underlying populations, and develop an exploratory scan statistic framework for multivariate associations. Inference using statistical map methodologies can be used to test the clustered associations. The approach is illustrated with a hypothetical data example and an epidemiological study on community MRSA. Scenarios of potential use for online mashups are introduced, but full implementation is left for further research.
    [Figure caption: Spatial entropy index HSu for the ScankOO analysis of the hypothetical dataset, using a vicinity fixed by the number of points without distinction between their labels. The size of the labels is proportional to the inverse of the index.]

    On the role of pre and post-processing in environmental data mining

    The quality of discovered knowledge depends heavily on data quality. Unfortunately, real data tend to contain noise, uncertainty, errors, redundancies or even irrelevant information. The more complex the reality to be analyzed, the higher the risk of obtaining low-quality data. Knowledge Discovery from Databases (KDD) offers a global framework to prepare data in the right form to perform correct analyses. On the other hand, the quality of decisions taken upon KDD results depends not only on the quality of the results themselves, but also on the capacity of the system to communicate those results in an understandable form. Environmental systems are particularly complex, and environmental users particularly require clarity in their results. This paper provides some details about how this can be achieved and discusses the role of pre- and post-processing in the whole process of Knowledge Discovery in environmental systems.

    Recommended System for Optimizing Battery Energy Management with Floating Car Data

    Currently, heavy vehicles that transport temperature-sensitive goods use refrigeration systems that are noisy and consume large amounts of fuel. To counter these drawbacks, a system is being installed that can recover and produce electrical energy during braking and from photovoltaic panels. This energy is stored in a set of batteries to later power the refrigeration system in electric mode. In addition, real-time data on the behaviour of the vehicle and the system are being collected. Given that all the energy available while driving is conditioned by several operating variables, it is crucial to extract knowledge from the analysis of the collected data, identifying patterns that can optimize the production and management of energy predictively. This knowledge extraction process includes selecting and evaluating the data to collect, building the predictive model of the system, and studying its application. Thus, at a given moment, taking into account not only the metrics collected on the current trip but also historical data for a given route, the energy management system installed in the truck will be able to decide the best action to take in order to optimize the energy produced without stressing the system.
Nowadays, heavy vehicles that transport temperature-sensitive goods generally use a dedicated, fuel-hungry diesel engine. To address this problem, an energy management system (EMS) capable of producing energy on board the vehicle is being developed. This recovery is possible thanks to the regenerative braking (RB) functionality, which converts kinetic energy to electrical energy during a slowdown. The recovered energy is then stored in a set of batteries that supplies the refrigeration system when needed, allowing it to run in electrical mode.
Using data retrieved from the vehicle's operation and this management system, an opportunity to use the regenerative braking functionality intelligently emerges. By introducing an intelligence layer on the energy management system, a decision on applying the RB functionality could be made based on the trip's energetic potential. This decision will optimize the battery usage and reduce the load and wear on the EMS components. In order to calculate the energetic potential of a certain route, an estimate of the road ahead is needed. This document presents context information and different approaches towards this end. In the modeling approach recommended and implemented, a route is divided into several spatial segments and each segment is categorized into one of three pre-defined classes. A classification model that takes historical traffic data as input is used to predict each segment's class. By using this modeling approach based on travel times, information on traffic flow and intersection queues is incorporated, and by calculating the most likely sequence of states, an estimate of the road ahead is made. Using the information of the modeled path, when the RB system detects a situation where the functionality can be applied, a decision will be made by weighing the energetic potential of the path ahead against the energy need. When the algorithm sees fit, a higher torque may be applied to the generator, which will result in a larger quantity of energy recovered. Since this causes stress to the system, this functionality needs a robust intelligence layer.
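
The abstract above mentions "calculating the most likely sequence of states" over route segments. One common way to do this is Viterbi decoding on a hidden Markov model, sketched below. This is an assumption about the technique, not a description of the thesis's implementation: the three class names (`free`, `dense`, `congested`), the travel-time bins, and all probabilities are hypothetical values chosen for illustration.

```python
import math


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most likely hidden-state sequence for `obs` (log-space Viterbi).

    All probabilities must be strictly positive, since logarithms are taken.
    """
    # Best log-probability of any path ending in each state after the first observation.
    layers = [{s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}]
    backpointers = []
    for o in obs[1:]:
        prev_layer = layers[-1]
        scores, ptr = {}, {}
        for s in states:
            # Best predecessor of state s at this step.
            best = max(states, key=lambda r: prev_layer[r] + math.log(trans_p[r][s]))
            scores[s] = prev_layer[best] + math.log(trans_p[best][s]) + math.log(emit_p[s][o])
            ptr[s] = best
        layers.append(scores)
        backpointers.append(ptr)

    # Backtrack from the best final state.
    path = [max(states, key=lambda s: layers[-1][s])]
    for ptr in reversed(backpointers):
        path.append(ptr[path[-1]])
    return path[::-1]


# Hypothetical model: three segment classes and three travel-time bins.
STATES = ["free", "dense", "congested"]
START_P = {"free": 0.6, "dense": 0.3, "congested": 0.1}
TRANS_P = {
    "free": {"free": 0.7, "dense": 0.25, "congested": 0.05},
    "dense": {"free": 0.2, "dense": 0.6, "congested": 0.2},
    "congested": {"free": 0.05, "dense": 0.25, "congested": 0.7},
}
EMIT_P = {
    "free": {"short": 0.8, "medium": 0.15, "long": 0.05},
    "dense": {"short": 0.2, "medium": 0.6, "long": 0.2},
    "congested": {"short": 0.05, "medium": 0.15, "long": 0.8},
}

# Observed travel-time bins for four consecutive segments of a route.
decoded = viterbi(["short", "short", "long", "long"], STATES, START_P, TRANS_P, EMIT_P)
```

Working in log space avoids numerical underflow on long routes; in practice the transition and emission probabilities would be estimated from historical travel-time data rather than set by hand.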

    Statistical Models for Co-occurrence Data

    Modeling and predicting co-occurrences of events is a fundamental problem of unsupervised learning. In this contribution we develop a statistical framework for analyzing co-occurrence data in a general setting where elementary observations are joint occurrences of pairs of abstract objects from two finite sets. The main challenge for statistical models in this context is to overcome the inherent data sparseness and to estimate the probabilities for pairs which were rarely observed or even unobserved in a given sample set. Moreover, it is often of considerable interest to extract grouping structure or to find a hierarchical data organization. A novel family of mixture models is proposed which explain the observed data by a finite number of shared aspects or clusters. This provides a common framework for statistical inference and structure discovery and also includes several recently proposed models as special cases. Adopting the maximum likelihood principle, EM algorithms are derived to fit the model parameters. We develop improved versions of EM which largely avoid overfitting problems and overcome the inherent locality of EM-based optimization. Among the broad variety of possible applications, e.g., in information retrieval, natural language processing, data mining, and computer vision, we have chosen document retrieval, the statistical analysis of noun/adjective co-occurrence and the unsupervised segmentation of textured images to test and evaluate the proposed algorithms.
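
To make the mixture-model idea in the abstract above concrete, here is a minimal sketch of plain maximum-likelihood EM for one simple member of such a family, in which a pair (x, y) is generated as P(x, y) = sum over aspects a of P(a) P(x|a) P(y|a). The function name `fit_aspect_model`, the random initialization, and the fixed iteration count are assumptions; the improved EM variants the paper describes are not reproduced here.

```python
import random


def fit_aspect_model(counts, n_aspects, n_iter=50, seed=0):
    """Fit P(a), P(x|a), P(y|a) to a co-occurrence count matrix with plain EM.

    counts[x][y] is the number of times objects x and y were observed together.
    Returns (pa, px, py) where px[a][x] = P(x|a) and py[a][y] = P(y|a).
    """
    nx, ny = len(counts), len(counts[0])
    if sum(map(sum, counts)) == 0:
        raise ValueError("counts must contain at least one observation")
    rng = random.Random(seed)

    def random_distribution(n):
        weights = [rng.random() + 0.1 for _ in range(n)]
        total = sum(weights)
        return [w / total for w in weights]

    pa = random_distribution(n_aspects)
    px = [random_distribution(nx) for _ in range(n_aspects)]
    py = [random_distribution(ny) for _ in range(n_aspects)]

    for _ in range(n_iter):
        acc_a = [0.0] * n_aspects
        acc_x = [[0.0] * nx for _ in range(n_aspects)]
        acc_y = [[0.0] * ny for _ in range(n_aspects)]
        for x in range(nx):
            for y in range(ny):
                n = counts[x][y]
                if n == 0:
                    continue
                # E-step: posterior responsibility of each aspect for the pair (x, y).
                joint = [pa[a] * px[a][x] * py[a][y] for a in range(n_aspects)]
                z = sum(joint)
                if z == 0.0:
                    continue
                for a in range(n_aspects):
                    r = n * joint[a] / z
                    acc_a[a] += r
                    acc_x[a][x] += r
                    acc_y[a][y] += r
        # M-step: renormalize the expected counts into probabilities.
        total = sum(acc_a)
        for a in range(n_aspects):
            if acc_a[a] > 0.0:
                px[a] = [v / acc_a[a] for v in acc_x[a]]
                py[a] = [v / acc_a[a] for v in acc_y[a]]
            pa[a] = acc_a[a] / total
    return pa, px, py


# Hypothetical toy data: two groups of objects that only co-occur within their group.
toy_counts = [[5, 5, 0, 0], [5, 5, 0, 0], [0, 0, 5, 5], [0, 0, 5, 5]]
toy_pa, toy_px, toy_py = fit_aspect_model(toy_counts, n_aspects=2)
```

Because unobserved pairs still receive probability through the shared aspects, a model of this kind can smooth sparse counts, which is the data-sparseness problem the abstract highlights.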

    Fundamental structures of dynamic social networks

    Social systems are in a constant state of flux with dynamics spanning from minute-by-minute changes to patterns present on the timescale of years. Accurate models of social dynamics are important for understanding spreading of influence or diseases, formation of friendships, and the productivity of teams. While there has been much progress on understanding complex networks over the past decade, little is known about the regularities governing the micro-dynamics of social networks. Here we explore the dynamic social network of a densely-connected population of approximately 1000 individuals and their interactions in the network of real-world person-to-person proximity measured via Bluetooth, as well as their telecommunication networks, online social media contacts, geo-location, and demographic data. These high-resolution data allow us to observe social groups directly, rendering community detection unnecessary. Starting from 5-minute time slices we uncover dynamic social structures expressed on multiple timescales. On the hourly timescale, we find that gatherings are fluid, with members coming and going, but organized via a stable core of individuals. Each core represents a social context. Cores exhibit a pattern of recurring meetings across weeks and months, each with varying degrees of regularity. Taken together, these findings provide a powerful simplification of the social network, where cores represent fundamental structures expressed with strong temporal and spatial regularity. Using this framework, we explore the complex interplay between social and geospatial behavior, documenting how the formation of cores is preceded by coordination behavior in the communication networks, and demonstrating that social behavior can be predicted with high precision.
    Comment: Main Manuscript: 16 pages, 4 figures. Supplementary Information: 39 pages, 34 figures

    Knowledge discovery from trajectories

    Dissertation submitted in partial fulfilment of the requirements for the Degree of Master of Science in Geospatial Technologies. As a rapidly growing study area, knowledge discovery from trajectories has attracted more and more researchers from different backgrounds. However, there is so far no theoretical framework that gives researchers a systematic view of ongoing research. The complexity of spatial and temporal information, along with their combination, produces numerous spatio-temporal patterns. In addition, a pattern may well have different definitions and mining methodologies for researchers from different backgrounds, such as Geographic Information Science, Data Mining, Databases, and Computational Geometry. How can these patterns be defined systematically, so that the whole community can make better use of previous research? This paper tackles this challenge in three steps. First, the input trajectory data are classified; second, a taxonomy of spatio-temporal patterns is developed from a data mining point of view; lastly, the spatio-temporal patterns that appeared in previous publications are discussed and placed into the theoretical framework. In this way, researchers can easily find the methodology needed to mine a specific pattern in this framework; the algorithms that still need to be developed can also be identified for further research. Under the guidance of this framework, an application to a real data set from the Starkey Project is performed. Two questions are answered by applying data mining algorithms: first, where the elk prefer to stay within the whole range, and second, whether there are corridors among these regions of interest.

    Data pre-processing to identify environmental risk factors associated with diabetes

    Genetics, diet, obesity, and lack of exercise play a major role in the development of type II diabetes. Additionally, environmental conditions are also linked to type II diabetes. The aim of this research is to identify the environmental conditions associated with diabetes. To achieve this, the research study utilises hospital-admitted patient data in NSW integrated with weather, pollution, and demographic data. The environmental variables (air pollution and weather) change over time and space, necessitating spatiotemporal data analysis to identify associations. Moreover, the environmental variables are measured using sensors, and they often contain large gaps of missing values due to sensor failures. Therefore, enhanced methodologies in data cleaning and imputation are needed to facilitate research using this data. Hence, the objectives of this study are twofold: first, to develop a data cleaning and imputation framework with improved methodologies to clean and pre-process the environmental data, and second, to identify environmental conditions associated with diabetes. This study develops a novel data-cleaning framework that streamlines the practice of data analysis and visualisation, specifically for studying environmental factors such as climate change monitoring and the effects of weather and pollution. The framework is designed to efficiently handle data collected by remote sensors, enabling more accurate and comprehensive analyses of environmental phenomena that would otherwise not be possible. The study initially focuses on the Sydney Region, identifies missing data patterns, and utilises established imputation methods. It assesses the performance of existing techniques and finds that Kalman smoothing on structural time series models outperforms other methods. However, when dealing with larger gaps in missing data, none of the existing methods yield satisfactory results. 
To address this, the study proposes enhanced methodologies for filling substantial gaps in environmental datasets. The first proposed algorithm employs regularized regression models to fill large gaps in air quality data using a univariate approach. It is then extended to incorporate seasonal patterns and expand its applicability to weather data with similar patterns. Furthermore, the algorithm is enhanced by incorporating other correlated variables to accurately fill substantial gaps in environmental variables. Consistently, the algorithm presented in this thesis outperforms other methods in imputing large gaps. This algorithm is applicable for filling large gaps in air pollution and weather data, facilitating downstream analysis