208 research outputs found

    Multiple-Aspect Analysis of Semantic Trajectories

    This open access book constitutes the refereed post-conference proceedings of the First International Workshop on Multiple-Aspect Analysis of Semantic Trajectories, MASTER 2019, held in conjunction with the 19th European Conference on Machine Learning and Knowledge Discovery in Databases, ECML PKDD 2019, in Würzburg, Germany, in September 2019. The 8 full papers presented were carefully reviewed and selected from 12 submissions. They represent an interesting mix of techniques to solve recurrent as well as new problems in the semantic trajectory domain, such as data representation models, data management systems, machine learning approaches for anomaly detection, and common pathways identification.

    Mining sensor datasets with spatiotemporal neighborhoods

    Many spatiotemporal data mining methods depend on how relationships between a spatiotemporal unit and its neighbors are defined. These relationships are often termed the neighborhood of a spatiotemporal object. The focus of this paper is the discovery of spatiotemporal neighborhoods to automatically find spatiotemporal sub-regions in a sensor dataset. This research is motivated by the need to characterize large sensor datasets like those found in oceanographic and meteorological research. The approach presented in this paper finds spatiotemporal neighborhoods in sensor datasets by combining an agglomerative method to create temporal intervals and a graph-based method to find spatial neighborhoods within each temporal interval. These methods were tested on real-world datasets including (a) sea surface temperature data from the Tropical Atmospheric Ocean Project (TAO) array in the Equatorial Pacific Ocean and (b) NEXRAD precipitation data from the Hydro-NEXRAD system. The results were evaluated based on known patterns of the phenomenon being measured. Furthermore, the results were quantified by performing hypothesis testing to establish statistical significance using Monte Carlo simulations. The approach was also compared with existing approaches using validation metrics, namely spatial autocorrelation and temporal interval dissimilarity. The results of these experiments show that our approach indeed identifies highly refined spatiotemporal neighborhoods.

    Discovery of Spatiotemporal Event Sequences

    Finding frequent patterns plays a vital role in many analytics tasks such as finding itemsets, associations, correlations, and sequences. In recent decades, spatiotemporal frequent pattern mining has emerged with the main goal of developing data-driven analysis frameworks for understanding underlying spatial and temporal characteristics in massive datasets. In this thesis, we focus on discovering spatiotemporal event sequences from large-scale region trajectory datasets with event annotations. Spatiotemporal event sequences are the series of event types whose trajectory-based instances follow each other in spatiotemporal context. We introduce new data models for storing and processing evolving region trajectories, provide a novel framework for modeling spatiotemporal follow relationships, and present novel spatiotemporal event sequence mining algorithms.

    RANK-BASED TEMPO-SPATIAL CLUSTERING: A FRAMEWORK FOR RAPID OUTBREAK DETECTION USING SINGLE OR MULTIPLE DATA STREAMS

    In recent decades, algorithms for disease outbreak detection have become one of the main interests of public health practitioners, who aim to identify and localize an outbreak as early as possible in order to warrant further public health response before a pandemic develops. Today’s increased threat of biological warfare and terrorism provides an even stronger impetus to develop methods for outbreak detection based on symptoms as well as definitive laboratory diagnoses. In this dissertation work, I explore the problems of rapid disease outbreak detection using both spatial and temporal information. I develop a framework of non-parameterized algorithms which search for patterns of disease outbreak in spatial sub-regions of the monitored region within a certain period. Compared to existing spatial or tempo-spatial algorithms, the algorithms in this framework provide a methodology for fast searching of either univariate or multivariate data sets. The framework first measures which study area is more likely to have an outbreak occurring given the baseline data and currently observed data. Then it applies a greedy searching mechanism to look for clusters with high posterior probabilities, using the risk measurement for each unit area as a heuristic. I also explore the performance of the proposed algorithms. From the perspective of predictive modeling, I adopt a Gamma-Poisson (GP) model to compute the probability of having an outbreak in each cluster when analyzing univariate data. I build a multinomial generalized Dirichlet (MGD) model to identify outbreak clusters from multivariate data which include the OTC data streams collected by the national retail data monitor (NRDM) and the ED data streams collected by the RODS system.
Key contributions of this dissertation include: 1) a rank-based tempo-spatial clustering algorithm, RSC, which uses greedy searching and a Bayesian GP model for disease outbreak detection with comparable detection timeliness and cluster positive predictive value (PPV) and improved running time; 2) a multivariate extension of RSC (MRSC) which applies the MGD model. The evaluation demonstrated that the MGD model can effectively suppress false alarms caused by elevated signals that are not disease-relevant and occur in all the monitored data streams.

    Urban air pollution modelling with machine learning using fixed and mobile sensors

    Detailed air quality (AQ) information is crucial for sustainable urban management, and many regions in the world have built static AQ monitoring networks to provide AQ information. However, these networks can only monitor region-level AQ conditions or provide sparse point-based air pollutant measurements; they cannot capture the urban dynamics with high-resolution spatio-temporal variations over the region. Without pollution details, citizens will not be able to make fully informed decisions when choosing their everyday outdoor routes or activities, and policy-makers can only make macroscopic regulating decisions on controlling pollution triggering factors and emission sources. Increasing research effort has been devoted to mobile and ubiquitous sampling campaigns, as they are deemed the more economically and operationally feasible methods to collect urban AQ data with high spatio-temporal resolution. The current research proposes a Machine Learning based AQ Inference (Deep AQ) framework from a data-driven perspective, consisting of data pre-processing, feature extraction and transformation, and pixelwise (grid-level) AQ inference. The Deep AQ framework is adaptable to integrate AQ measurements from fixed monitoring sites (temporally dense but spatially sparse) and mobile low-cost sensors (temporally sparse but spatially dense). While instantaneous pollutant concentration varies in the micro-environment, this research samples representative values in each grid-cell unit and achieves AQ inference at a 1 km × 1 km pixelwise scale. This research explores the predictive power of the Deep AQ framework based on samples from only 40 fixed monitoring sites in Chengdu, China (4,900 km², 26 April - 12 June 2019) and collaborative sampling from 28 fixed monitoring sites and 15 low-cost sensors mounted on taxis deployed in Beijing, China (3,025 km², 19 June - 16 July 2018).
The proposed Deep AQ framework is capable of producing high-resolution (1 km × 1 km, hourly) pixelwise AQ inference based on multi-source AQ samples (fixed or mobile) and urban features (land use, population, traffic, and meteorological information, etc.). This research has achieved high-resolution (1 km × 1 km, hourly) AQ inference (Chengdu: less than 1% spatio-temporal coverage; Beijing: less than 5% spatio-temporal coverage) with reasonable and satisfactory accuracy in urban cases (Chengdu: SMAPE < 20%; Beijing: SMAPE < 15%). Detailed outcomes and main conclusions are provided in this thesis on the aspects of fixed and mobile sensing, spatio-temporal coverage and density, and the relative importance of urban features. Outcomes from this research help provide a scientific and detailed health impact assessment framework for exposure analysis and inform policy-makers with data-driven evidence for sustainable urban management.

    Superresolution Reconstruction for Magnetic Resonance Spectroscopic Imaging Exploiting Low-Rank Spatio-Spectral Structure

    Magnetic resonance spectroscopic imaging (MRSI) is a rapidly developing medical imaging modality, capable of conferring both spatial and spectral information content, and has become a powerful clinical tool. The ability to non-invasively observe spatial maps of metabolite concentrations, for instance, in the human brain, can offer functional, as well as pathological insights, perhaps even before structural aberrations or behavioral symptoms are evinced. Despite its lofty clinical prospects, MRSI has traditionally remained encumbered by a number of practical limitations. Of primary concern are the vastly reduced concentrations of tissue metabolites when compared to that of water, which forms the basis for conventional MR imaging. Moreover, the protracted exam durations required by MRSI routinely approach the limits for patient compliance. Taken in conjunction, the above considerations effectively circumscribe the data collection process, ultimately translating to coarse image resolutions that are of diminished clinical utility. Such shortcomings are compounded by spectral contamination artifacts due to the system point spread function, which arise as a natural consequence when reconstructing non-band-limited data by the inverse Fourier transform. These artifacts are especially pronounced near regions characterized by substantial discrepancies in signal intensity, for example, the interface between normal brain and adipose tissue, whereby the metabolite signals are inundated by the dominant lipid resonances. In recent years, concerted efforts have been made to develop alternative, non-Fourier MRSI reconstruction strategies that aim to surmount the aforementioned limitations. In this dissertation, we build upon the burgeoning medley of innovative and promising techniques, proffering a novel superresolution reconstruction framework predicated on the recent interest in low-rank signal modeling, along with state-of-the-art regularization methods.
The proposed framework is founded upon a number of key tenets. Firstly, we proclaim that the underlying spatio-spectral distribution of the investigated object admits a bilinear representation, whereby spatial and spectral signal components can be effectively segregated. We further maintain that the dimensionality of the subspace spanned by the components is, in principle, bounded by a modest number of observable metabolites. Secondly, we assume that local susceptibility effects represent the primary sources of signal corruption that tend to disallow such representations. Finally, we assert that the spatial components belong to a class of real-valued, non-negative, and piecewise linear functions, compelled in part through the use of a total variation regularization penalty. After demonstrating superior spatial and spectral localization properties in both numerical and physical phantom data when compared against standard Fourier methods, we proceed to evaluate reconstruction performance in typical in vivo settings, whereby the method is extended in order to promote the recovery of signal variations throughout the MRSI slice thickness. Aside from the various technical obstacles, one of the cardinal prospective challenges for high-resolution MRSI reconstruction is the shortfall of reliable ground truth data prudent for validation, thereby prompting reservations surrounding the resulting experimental outcomes. [...

    Attribute Relationship Analysis in Outlier Mining and Stream Processing

    The main theme of this thesis is to unite two important fields of data analysis: outlier mining and attribute relationship analysis. In this work we establish the connection between these two fields. We present techniques which exploit this connection, allowing us to improve outlier detection in high-dimensional data. In the second part of the thesis we extend our work to the emerging topic of data streams.