5,937 research outputs found

Higher-order co-occurrences for exploratory point pattern analysis and decision tree clustering on spatial data

Author: Baddeley
Bastin
Bel
Bivand
Breiman
D.G. Leibovici
Diggle
L. Bastin
Leibovici
Li
Lotwick
M. Jackson
O’Neill
Phipps
Quinlan
Reza
Schabenberger
van Lieshout
Wagner
Publication venue: 'Elsevier BV'
Publication date: 01/03/2011

Analyzing geographical patterns by collocating events, objects or their attributes has a long history in surveillance and monitoring, and is particularly applied in environmental contexts, such as ecology or epidemiology. The identification of patterns or structures at some scales can be addressed using spatial statistics, particularly marked point process methodologies. Classification and regression trees are also related to this goal of finding "patterns" by deducing the hierarchy of influence of variables on a dependent outcome. Such variable selection methods have been applied to spatial data, but often without explicitly acknowledging the spatial dependence. Many methods routinely used in exploratory point pattern analysis are 2nd-order statistics, used in a univariate context, though there is also a wide literature on modelling methods for multivariate point pattern processes. This paper proposes an exploratory approach for multivariate spatial data using higher-order statistics built from co-occurrences of events or marks given by the point processes. A spatial entropy measure, derived from these multinomial distributions of co-occurrences at a given order, constitutes the basis of the proposed exploratory methods. © 2010 Elsevier Ltd
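To make the idea of an entropy built from co-occurrence counts concrete, here is a minimal sketch for the simplest case (order 2, pairs of labeled points). It is an illustration only: the function name `cooccurrence_entropy`, the fixed-radius definition of co-occurrence, and the normalization by the number of possible unordered label pairs are assumptions, and the paper's own definition may differ.

```python
import math
from collections import Counter
from itertools import combinations


def cooccurrence_entropy(points, labels, radius):
    """Normalized Shannon entropy of label-pair co-occurrences.

    Two points co-occur when their Euclidean distance is at most `radius`
    (an assumed definition). Returns a value in [0, 1].
    """
    counts = Counter()
    for (i, p), (j, q) in combinations(enumerate(points), 2):
        if math.dist(p, q) <= radius:
            # Unordered pair of labels, e.g. ("a", "b") regardless of order.
            counts[tuple(sorted((labels[i], labels[j])))] += 1

    total = sum(counts.values())
    if total == 0:
        return 0.0  # no co-occurrences at this scale

    k = len(set(labels))
    n_categories = k * (k + 1) // 2  # unordered pairs, same-label pairs included
    if n_categories < 2:
        return 0.0  # a single category carries no uncertainty

    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return entropy / math.log(n_categories)


if __name__ == "__main__":
    pts = [(0, 0), (1, 0), (0, 1), (5, 5)]
    lbl = ["a", "b", "a", "b"]
    print(cooccurrence_entropy(pts, lbl, radius=1.5))
```

Values near 1 mean the label pairs seen within the radius are close to evenly spread over all possible pairs; values near 0 mean a few pairings dominate at that scale.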

Crossref

Aston Publications Explorer

Spatially clustered associations in health GIS

Author: Anand Suchith
Bastin Lucy
Hobona Gobe
Jackson Michael
Leibovici Didier
Swan Jerry
Publication date: 01/01/2010

Overlaying maps using a desktop GIS is often the first step of a multivariate spatial analysis. The potential of this operation has increased considerably as data sources and Web services to manipulate them are becoming widely available via the Internet. Standards from the OGC enable such geospatial mashups to be seamless and user driven, involving discovery of thematic data. The user is naturally inclined to look for spatial clusters and correlation of outcomes. Using classical cluster detection scan methods to identify multivariate associations can be problematic in this context, because of a lack of control over, or knowledge about, background populations. For public health and epidemiological mapping, this limiting factor can be critical, but often the focus is on spatial identification of risk factors associated with health or clinical status. In this article we point out that this association itself can ensure some control on underlying populations, and develop an exploratory scan statistic framework for multivariate associations. Inference using statistical map methodologies can be used to test the clustered associations. The approach is illustrated with a hypothetical data example and an epidemiological study on community MRSA. Scenarios of potential use for online mashups are introduced, but full implementation is left for further research.
Figure caption: Spatial entropy index HSu for the ScankOO analysis of the hypothetical dataset using a vicinity which is fixed by the number of points without distinction between their labels. (The size of the labels is proportional to the inverse of the index.)

Aston Publications Explorer

On the role of pre and post-processing in environmental data mining

Author: Athanasiadis Ioannis
Comas Joaquim
Gibert Karina
Holmes Geoffrey
Izquierdo Joaquin
Sanchez-Marre Miquel
Publication venue: International Environmental Modelling and Software Society
Publication date: 01/01/2008

The quality of discovered knowledge depends heavily on data quality. Unfortunately, real data often contain noise, uncertainty, errors, redundancies or even irrelevant information. The more complex the reality to be analyzed, the higher the risk of getting low-quality data. Knowledge Discovery from Databases (KDD) offers a global framework to prepare data in the right form to perform correct analyses. On the other hand, the quality of decisions based on KDD results depends not only on the quality of the results themselves, but also on the capacity of the system to communicate those results in an understandable form. Environmental systems are particularly complex, and environmental users particularly require clarity in their results. This paper provides some details about how this can be achieved. The role of pre- and post-processing in the whole process of Knowledge Discovery in environmental systems is discussed.

Research Commons@Waikato

Recommended System for Optimizing Battery Energy Management with Floating Car Data

Author: Leonel Rocha Araujo
Publication date: 14/07/2016

Currently, heavy vehicles that transport temperature-sensitive goods use refrigeration systems that are noisy and consume large amounts of fuel. To counter these drawbacks, a system is being installed that can recover and produce electrical energy during braking and from photovoltaic panels. This energy is stored in a set of batteries to later power the refrigeration system in electric mode. Additionally, real-time data on the behaviour of the vehicle and the system are being collected. Since all the energy available while driving is conditioned by several operating variables, it is crucial to extract knowledge from the analysis of the collected data, identifying patterns that can predictively optimize energy production and management. This knowledge extraction process includes selecting and evaluating the data to collect, building the predictive model of the system, and studying its application. Thus, at any given moment, taking into account not only the metrics collected on the current trip but also historical data for a given route, the energy management system installed on the truck will be able to decide the best action to take in order to optimize the energy produced without stressing the system.
Nowadays, heavy vehicles that transport temperature-sensitive goods generally use a dedicated, fuel-hungry diesel engine. Towards solving this problem, an energy management system (EMS) capable of producing energy on board the vehicle is being developed. This recovery is possible due to the regenerative braking (RB) functionality, which consists of converting kinetic energy to electrical energy during a slowdown. The recovered energy is then stored in a set of batteries that supplies the refrigeration system when needed, allowing it to run in electrical mode.
Using data retrieved from the vehicle's operation and this management system, an opportunity to use the regenerative braking functionality intelligently emerges. By introducing an intelligence layer on the energy management system, a decision on applying the RB functionality could be made based on the trip's energetic potential. This decision will optimize the battery usage and reduce the load and wear on the EMS components.
In order to calculate the energetic potential of a certain route, an estimation of the road is needed. This document presents context information and different approaches towards this end. In the modeling approach recommended and implemented, a route is divided into several spatial segments and each segment is categorized into one of three pre-defined classes. A classification model predicts the class of each segment, using historical traffic data as input. By using this modeling approach based on travel times, information on traffic flow and intersection queues is incorporated, and by calculating the most likely sequence of states, an estimation of the road ahead is made.
Using the information of the modeled path, when the RB system detects a situation where the functionality can be applied, a decision will be made by weighing the energetic potential of the path ahead against the energy need. When the algorithm sees fit, a higher torque may be applied to the generator, which will result in a larger quantity of energy recovered. Since this causes stress to the system, this functionality needs a robust intelligence layer.
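The abstract mentions calculating the most likely sequence of states over route segments but does not name the algorithm. If the segments are treated as a hidden Markov model, one standard way to do this is the Viterbi algorithm. The sketch below works under that assumption; the three states, the probability tables, and the function name `viterbi` are hypothetical and are not taken from the thesis.

```python
import math


def _log(x):
    """Logarithm that maps zero probability to negative infinity."""
    return math.log(x) if x > 0 else float("-inf")


def viterbi(observations, start_p, trans_p, emit_p):
    """Most likely state sequence for a non-empty list of observation indices.

    start_p[s]     probability of starting in state s
    trans_p[r][s]  probability of moving from state r to state s
    emit_p[s][o]   probability of observing o while in state s
    """
    n = len(start_p)
    score = [_log(start_p[s]) + _log(emit_p[s][observations[0]]) for s in range(n)]
    back = []  # back[t][s]: best predecessor of state s at step t + 1

    for obs in observations[1:]:
        prev, score, pointers = score, [], []
        for s in range(n):
            best = max(range(n), key=lambda r: prev[r] + _log(trans_p[r][s]))
            pointers.append(best)
            score.append(prev[best] + _log(trans_p[best][s]) + _log(emit_p[s][obs]))
        back.append(pointers)

    # Trace back from the best final state.
    state = max(range(n), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


if __name__ == "__main__":
    # Hypothetical setup: 3 segment classes, observations are travel-time bins.
    start = [1 / 3, 1 / 3, 1 / 3]
    trans = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]
    emit = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]
    print(viterbi([0, 0, 2, 2, 2], start, trans, emit))
```

Working in log space avoids numerical underflow on long routes. With "sticky" transition probabilities such as these, a single odd travel-time reading tends to be smoothed away rather than treated as a change of segment class.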

Repositório Aberto da Universidade do Porto

Statistical Models for Co-occurrence Data

Author: Hofmann Thomas
Puzicha Jan
Publication date: 01/01/1998

Modeling and predicting co-occurrences of events is a fundamental problem of unsupervised learning. In this contribution we develop a statistical framework for analyzing co-occurrence data in a general setting where elementary observations are joint occurrences of pairs of abstract objects from two finite sets. The main challenge for statistical models in this context is to overcome the inherent data sparseness and to estimate the probabilities for pairs which were rarely observed or even unobserved in a given sample set. Moreover, it is often of considerable interest to extract grouping structure or to find a hierarchical data organization. A novel family of mixture models is proposed which explain the observed data by a finite number of shared aspects or clusters. This provides a common framework for statistical inference and structure discovery and also includes several recently proposed models as special cases. Adopting the maximum likelihood principle, EM algorithms are derived to fit the model parameters. We develop improved versions of EM which largely avoid overfitting problems and overcome the inherent locality of EM-based optimization. Among the broad variety of possible applications, e.g., in information retrieval, natural language processing, data mining, and computer vision, we have chosen document retrieval, the statistical analysis of noun/adjective co-occurrence and the unsupervised segmentation of textured images to test and evaluate the proposed algorithms.
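As a rough illustration of fitting a mixture model to co-occurrence counts by maximum likelihood, the sketch below runs plain EM for a model of the form P(x, y) = sum over a of P(a) P(x|a) P(y|a), where a ranges over a small number of latent aspects. It is a minimal sketch and not the authors' implementation: it uses plain EM rather than the improved variants the abstract refers to, and the function name `fit_aspect_model`, the random initialization, and the iteration count are assumptions.

```python
import math
import random


def fit_aspect_model(counts, n_aspects, n_iter=50, seed=0):
    """Fit P(x, y) = sum_a P(a) P(x|a) P(y|a) to a count matrix with plain EM.

    counts[i][j] is how often object i (first set) co-occurred with object j
    (second set). Returns (p_a, p_x, p_y, history), where history holds the
    log-likelihood measured at the start of each iteration.
    """
    rng = random.Random(seed)
    nx, ny = len(counts), len(counts[0])

    def random_dist(size):
        raw = [rng.random() + 0.1 for _ in range(size)]
        total = sum(raw)
        return [v / total for v in raw]

    p_a = [1.0 / n_aspects] * n_aspects
    p_x = [random_dist(nx) for _ in range(n_aspects)]
    p_y = [random_dist(ny) for _ in range(n_aspects)]
    history = []

    for _ in range(n_iter):
        acc_a = [0.0] * n_aspects
        acc_x = [[0.0] * nx for _ in range(n_aspects)]
        acc_y = [[0.0] * ny for _ in range(n_aspects)]
        loglik = 0.0

        # E-step: split each observed count among aspects by posterior weight.
        for i in range(nx):
            for j in range(ny):
                n = counts[i][j]
                if n == 0:
                    continue
                joint = [p_a[a] * p_x[a][i] * p_y[a][j] for a in range(n_aspects)]
                total = sum(joint)
                loglik += n * math.log(total)
                for a in range(n_aspects):
                    r = n * joint[a] / total
                    acc_a[a] += r
                    acc_x[a][i] += r
                    acc_y[a][j] += r
        history.append(loglik)

        # M-step: renormalize expected counts (tiny floor guards empty aspects).
        grand = sum(acc_a)
        p_a = [v / grand for v in acc_a]
        p_x = [[v / max(acc_a[a], 1e-300) for v in acc_x[a]] for a in range(n_aspects)]
        p_y = [[v / max(acc_a[a], 1e-300) for v in acc_y[a]] for a in range(n_aspects)]

    return p_a, p_x, p_y, history


if __name__ == "__main__":
    # Hypothetical block-structured counts: two groups that rarely mix.
    data = [[10, 8, 0, 0], [9, 11, 0, 0], [0, 0, 12, 7], [0, 0, 8, 10]]
    p_a, p_x, p_y, history = fit_aspect_model(data, n_aspects=2)
    print(p_a, history[0], history[-1])
```

Plain EM only guarantees that the log-likelihood never decreases; it can still stop in a poor local optimum and overfit sparse counts, which are the weaknesses the abstract says its improved versions address.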

CiteSeerX

DSpace@MIT

Fundamental structures of dynamic social networks

Author: Lehmann Sune
Sekara Vedran
Stopczynski Arkadiusz
Publication venue: 'Proceedings of the National Academy of Sciences'
Publication date: 01/01/2016

Social systems are in a constant state of flux with dynamics spanning from minute-by-minute changes to patterns present on the timescale of years. Accurate models of social dynamics are important for understanding spreading of influence or diseases, formation of friendships, and the productivity of teams. While there has been much progress on understanding complex networks over the past decade, little is known about the regularities governing the micro-dynamics of social networks. Here we explore the dynamic social network of a densely-connected population of approximately 1000 individuals and their interactions in the network of real-world person-to-person proximity measured via Bluetooth, as well as their telecommunication networks, online social media contacts, geo-location, and demographic data. These high-resolution data allow us to observe social groups directly, rendering community detection unnecessary. Starting from 5-minute time slices we uncover dynamic social structures expressed on multiple timescales. On the hourly timescale, we find that gatherings are fluid, with members coming and going, but organized via a stable core of individuals. Each core represents a social context. Cores exhibit a pattern of recurring meetings across weeks and months, each with varying degrees of regularity. Taken together, these findings provide a powerful simplification of the social network, where cores represent fundamental structures expressed with strong temporal and spatial regularity. Using this framework, we explore the complex interplay between social and geospatial behavior, documenting how the formation of cores is preceded by coordination behavior in the communication networks, and demonstrating that social behavior can be predicted with high precision.
Comment: Main Manuscript: 16 pages, 4 figures. Supplementary Information: 39 pages, 34 figures.

arXiv.org e-Print Archive

DSpace@MIT

PubMed Central

Copenhagen University Research Information System

Online Research Database In Technology

Knowledge discovery from trajectories

Author: Li Song
Publication date: 05/03/2009

Dissertation submitted in partial fulfilment of the requirements for the Degree of Master of Science in Geospatial Technologies.
As a newly proliferating study area, knowledge discovery from trajectories has attracted more and more researchers from different backgrounds. However, there is, until now, no theoretical framework that gives researchers a systematic view of the research going on. The complexity of spatial and temporal information, along with their combination, produces numerous spatio-temporal patterns. In addition, a pattern may well have different definitions and mining methodologies for researchers from different backgrounds, such as Geographic Information Science, Data Mining, Databases, and Computational Geometry. How can these patterns be defined systematically, so that the whole community can make better use of previous research? This paper tries to tackle this challenge in three steps. First, the input trajectory data is classified; second, a taxonomy of spatio-temporal patterns is developed from a data mining point of view; lastly, the spatio-temporal patterns that have appeared in previous publications are discussed and placed in the theoretical framework. In this way, researchers can easily find the methodology needed to mine a specific pattern in this framework; the algorithms that still need to be developed can also be identified for further research. Under the guidance of this framework, an application to a real data set from the Starkey Project is performed. Two questions are answered by applying data mining algorithms. The first is where the elk tend to stay within the whole range, and the second is whether there are corridors among these regions of interest.

Repositório da Universidade Nova de Lisboa

Data pre-processing to identify environmental risk factors associated with diabetes

Author: Wijesekara Lakmini
Publication date: 01/01/2023

Genetics, diet, obesity, and lack of exercise play a major role in the development of type II diabetes. Additionally, environmental conditions are also linked to type II diabetes. The aim of this research is to identify the environmental conditions associated with diabetes. To achieve this, the research study utilises hospital-admitted patient data in NSW integrated with weather, pollution, and demographic data. The environmental variables (air pollution and weather) change over time and space, necessitating spatiotemporal data analysis to identify associations. Moreover, the environmental variables are measured using sensors, and they often contain large gaps of missing values due to sensor failures. Therefore, enhanced methodologies in data cleaning and imputation are needed to facilitate research using this data. Hence, the objectives of this study are twofold: first, to develop a data cleaning and imputation framework with improved methodologies to clean and pre-process the environmental data, and second, to identify environmental conditions associated with diabetes. This study develops a novel data-cleaning framework that streamlines the practice of data analysis and visualisation, specifically for studying environmental factors such as climate change monitoring and the effects of weather and pollution. The framework is designed to efficiently handle data collected by remote sensors, enabling more accurate and comprehensive analyses of environmental phenomena that would otherwise not be possible. The study initially focuses on the Sydney Region, identifies missing data patterns, and utilises established imputation methods. It assesses the performance of existing techniques and finds that Kalman smoothing on structural time series models outperforms other methods. However, when dealing with larger gaps in missing data, none of the existing methods yield satisfactory results. 
To address this, the study proposes enhanced methodologies for filling substantial gaps in environmental datasets. The first proposed algorithm employs regularized regression models to fill large gaps in air quality data using a univariate approach. It is then extended to incorporate seasonal patterns and expand its applicability to weather data with similar patterns. Furthermore, the algorithm is enhanced by incorporating other correlated variables to accurately fill substantial gaps in environmental variables. Consistently, the algorithm presented in this thesis outperforms other methods in imputing large gaps. This algorithm is applicable for filling large gaps in air pollution and weather data, facilitating downstream analysis.

Western Sydney ResearchDirect

Stacking-based visualization of trajectory attribute data

Author: Andrienko G.
Andrienko N.
Schumann H.
Tominski C.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2012

Visualizing trajectory attribute data is challenging because it involves showing the trajectories in their spatio-temporal context as well as the attribute values associated with the individual points of trajectories. Previous work on trajectory visualization addresses selected aspects of this problem, but not all of them. We present a novel approach to visualizing trajectory attribute data. Our solution covers space, time, and attribute values. Based on an analysis of relevant visualization tasks, we designed the visualization solution around the principle of stacking trajectory bands. The core of our approach is a hybrid 2D/3D display. A 2D map serves as a reference for the spatial context, and the trajectories are visualized as stacked 3D trajectory bands along which attribute values are encoded by color. Time is integrated through appropriate ordering of bands and through a dynamic query mechanism that feeds temporally aggregated information to a circular time display. An additional 2D time graph shows temporal information in full detail by stacking 2D trajectory bands. Our solution is equipped with analytical and interactive mechanisms for selecting and ordering of trajectories, and adjusting the color mapping, as well as coordinated highlighting and dedicated 3D navigation. We demonstrate the usefulness of our novel visualization by three examples related to radiation surveillance, traffic analysis, and maritime navigation. User feedback obtained in a small experiment indicates that our hybrid 2D/3D solution can be operated quite well.

City Research Online

Fraunhofer-ePrints