892 research outputs found

    Development and Applications of Similarity Measures for Spatial-Temporal Event and Setting Sequences

    Similarity or distance measures between data objects are applied in many fields, including geography, environmental science, biology, economics, computer science, linguistics, logic, business analytics, and statistics. One area where similarity measures are particularly important is the analysis of spatiotemporal event sequences and their associated environs or settings. This dissertation develops a framework for the modeling, representation, and construction of new similarity measures for sequences of spatiotemporal events and their corresponding settings, which can be applied to different event data types and used in different areas of data science. The first core part of this dissertation presents a matrix-based spatiotemporal event sequence representation that unifies punctual and interval-based representations of events. This framework supports different event data types and provides support for data mining, sequence classification, and clustering. The similarity measure is based on a modified Jaccard index with temporal order constraints and accommodates different event data types. The approach is demonstrated through simulated data examples, and the performance of the similarity measures is evaluated with a k-nearest neighbor (k-NN) classification test on synthetic datasets. These similarity measures are incorporated into a clustering method, and their usefulness is demonstrated in a case study of event sequences extracted from the space-time series of a water quality monitoring system. This dissertation further proposes a new similarity measure for event setting sequences, which involve the space and time in which events occur. While similarity measures for spatiotemporal event sequences have been studied, settings and setting sequences have not yet been considered.
In modeling event setting sequences, spatial and temporal scales are used to define the bounds of the setting, and dynamic variables are incorporated along with static ones. Using a matrix-based representation and an extended Jaccard index, new similarity measures are developed that allow the use of all variable data types. With these similarity measures coupled with other multivariate statistical analysis approaches, results from a case study involving setting sequences and pollution event sequences associated with the same monitoring stations support the hypothesis that more similar spatial-temporal settings or setting sequences may generate more similar events or event sequences. To test the scalability of the STES similarity measure on a larger dataset and its application in a different field, this dissertation compares and contrasts the prospective space-time scan statistic with the STES similarity approach for identifying COVID-19 hotspots. The COVID-19 pandemic has highlighted the importance of detecting hotspots or clusters of COVID-19 to provide decision makers at various levels with better information for managing the distribution of human and technical resources as the outbreak in the USA continues to grow. The prospective space-time scan statistic has been used to help identify emerging disease clusters, yet its results can be limited by the spatial constraints of the scanning window. The STES-based approach adapted for this pandemic context computes the similarity of evolving normalized COVID-19 daily cases by county and clusters these to identify counties with similarly evolving COVID-19 case histories. This dissertation analyzes the spread of COVID-19 within the continental US through four periods beginning in late January 2020, using the COVID-19 datasets maintained by the Johns Hopkins University Center for Systems Science and Engineering (CSSE).
The results of the two approaches complement each other and, taken together, can aid in tracking the progression of the pandemic. Overall, the dissertation highlights the importance of developing similarity measures for analyzing spatiotemporal event sequences and associated settings, which can be applied to different event data types and used for data mining, sequence classification, and clustering.
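
As a rough illustration of the idea of a Jaccard index with a temporal order constraint mentioned in the abstract above, the following minimal sketch compares two event sequences. It is not the dissertation's measure: the function names, the use of a longest common subsequence as the order-respecting "intersection", and the sample event labels are all assumptions made for illustration only.

```python
def jaccard(a, b):
    """Plain Jaccard index on the sets of event types (ignores order)."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def lcs_length(a, b):
    """Length of the longest common subsequence of a and b.

    Only matches that preserve the temporal order of both sequences count.
    """
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def ordered_jaccard(a, b):
    """Jaccard-style ratio with an order constraint (illustrative assumption).

    The intersection is the number of events matched in the same order (LCS);
    the union is len(a) + len(b) minus that intersection.
    """
    if not a and not b:
        return 1.0
    m = lcs_length(a, b)
    return m / (len(a) + len(b) - m)


# Hypothetical event sequences: same event types, different order.
s1 = ["rain", "flood", "pollution"]
s2 = ["flood", "rain", "pollution"]

print(jaccard(s1, s2))          # 1.0: identical sets of event types
print(ordered_jaccard(s1, s2))  # 0.5: only two events match in order
```

The two sequences are indistinguishable under the plain set-based index, while the order-aware variant penalizes the swapped events, which is the kind of distinction a temporal order constraint is meant to capture.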

    Applying data analytics to analyze activity sequences for an assessment of fragmentation in daily travel patterns: a case study of the metropolitan region of Barcelona

    Sequence analysis is a robust methodological approach that has gained popularity in various fields, including transportation research. It provides a comprehensive way to understand the dynamics and patterns of individual behaviors over time. In the context of the Metropolitan Region of Barcelona, applying sequence analysis to mobility surveys offers valuable insights into the sequencing of travel activities and modes, shedding light on the complex interrelationship between individuals, their travel choices, and the built environment. Sequence analysis allows us to examine travel behaviors as dynamic processes and reveal the underlying structure and evolution of travel patterns over a day. Here, we describe a data analytics approach that enables the identification of common travel patterns and the exploration of variations across demographic groups or geographical regions. We propose a method for discovering similarities in travel behavior segments from diaries included in travel surveys in order to refine transport policies for selected segments. Unfortunately, the data collected by the authorities in the analyzed travel surveys do not include family structure, which seems critical in explaining the segmentation of travel sequences. This research was funded by Spanish R+D Programs (PID2020-112967GB-C31) and by Secretaria d’Universitats i Recerca, Generalitat de Catalunya (2021 SGR 01252, Information Modeling and Processing). Peer Reviewed. Sustainable Development Goals: 11 - Sustainable Cities and Communities; 5 - Gender Equality. Postprint (published version).

    Event Discovery and Classification in Space-Time Series: A Case Study for Storms

    Recent advances in sensor technology have enabled the deployment of wireless sensors for surveillance and monitoring of phenomena in diverse domains such as environment and health. Data generated by these sensors are typically high-dimensional and therefore difficult to analyze and comprehend. Additionally, high level phenomena that humans commonly recognize, such as storms, fires, and traffic jams, are often complex and multivariate, and individual univariate sensors are incapable of detecting them. This thesis describes the Event Oriented approach, which addresses these challenges by providing a way to reduce the dimensionality of space-time series and a way to integrate multivariate data over space and/or time for the purpose of detecting and exploring high level events. The proposed Event Oriented approach is implemented using space-time series data from the Gulf of Maine Ocean Observation System (GOMOOS). GOMOOS is a long-standing network of wireless sensors in the Gulf of Maine monitoring the high energy ocean environment. As a case study, high level storm events are detected and classified using the Event Oriented approach. A domain-independent ontology for detecting high level composite events, called the General Composite Event Ontology, is presented and used as the basis of the Storm Event Ontology. Primitive events are detected from univariate sensors and assembled into Composite Storm Events using the Storm Event Ontology. To evaluate the effectiveness of the Event Oriented approach, the resulting candidate storm events are compared with an independent historic Storm Events Database from the National Climatic Data Center (NCDC); the comparison indicates that the Event Oriented approach detected about 92% of the storms recorded by the NCDC. The Event Oriented approach also facilitates classification of high level composite events. In the case study, candidate storms were classified based on their spatial progression and profile.
Since ontological knowledge is used for constructing a high level event ontology, detection of candidate high level events could help refine existing ontological knowledge about them. In summary, this thesis demonstrates how the Event Oriented approach reduces dimensionality in complex space-time series sensor data and integrates time series data over space for detecting high level phenomena.

    Analysing gender equality in Barcelona through (spatiotemporal) segmentation

    Citizens take part in different activities to satisfy their needs, to invest in their socio-economic progress, and to participate in social and health activities that improve their well-being. However, activity participation is influenced by many factors in the built environment as well as by individuals’ attributes. Herein we analyze activity participation and travel through sequence analysis. This method explores sequences of daily activity and travel, employing techniques from the sequencing of events in the life course of individuals. Studying sequences of daily episodes (each activity and each trip) considers the entire trajectory of a person’s activity during a day while jointly considering the number of activities, the order of activities in a day, and their durations. We applied this method to a sample of residents in the Metropolitan Area of Barcelona (RMB) in the 2018, 2019 and 2020 EMEF Travel Surveys. The EMEF2020 deserves a particular analysis since activity patterns are expected to differ from those before the spread of COVID-19. We focus on above-average fragmentation in activity participation among persons in specific gender, age, activity, and transportation mode groups. This research was funded by PID2020-112967GB-C31 Spanish R+D Programs and by Secretaria d’Universitats i Recerca, Generalitat de Catalunya (2017-SGR-1749). The datasets were kindly shared by the Autoritat del Transport Metropolità (ATM). Their contribution to our research is gratefully acknowledged. Sustainable Development Goals: 10 - Reduced Inequalities; 5 - Gender Equality. Preprint.

    End-to-end anomaly detection in stream data

    Nowadays, huge volumes of data are generated with increasing velocity through various systems, applications, and activities. This increases the demand for stream and time series analysis that reacts to changing conditions in real time, for enhanced efficiency and quality of service delivery as well as improved safety and security in the private and public sectors. Despite its very rich history, time series anomaly detection is still one of the vital topics in machine learning research and is receiving increasing attention. Identifying hidden patterns and selecting an appropriate model that fits the observed data well and also carries over to unobserved data is not a trivial task. Due to the increasing diversity of data sources and associated stochastic processes, this pivotal data analysis topic is loaded with challenges such as complex latent patterns, concept drift, and overfitting, which may mislead the model and cause a high false alarm rate. Handling these challenges leads advanced anomaly detection methods to develop sophisticated decision logic, which turns them into inexplicable black boxes. Contrary to this trend, end users expect transparency and verifiability in order to trust a model and the outcomes it produces. Also, pointing users to the most anomalous or malicious areas of a time series and to causal features could save them time, energy, and money. For these reasons, this thesis addresses the crucial challenges in an end-to-end pipeline of stream-based anomaly detection through the three essential phases of behavior prediction, inference, and interpretation. The first step focuses on devising a time series model that leads to high average accuracy as well as small error deviation. On this basis, we propose higher-quality anomaly detection and scoring techniques that use the related contexts to reclassify observations and post-prune unjustified events.
Last but not least, we make the predictive process transparent and verifiable by providing meaningful reasoning behind its generated results, based on concepts that a human can understand. The provided insight can pinpoint the anomalous regions of a time series and explain why the current status of a system has been flagged as anomalous. Stream-based anomaly detection research is a principal area of innovation to support our economy, security, and even the safety and health of societies worldwide. We believe our proposed analysis techniques can contribute to building a situational awareness platform and open new perspectives in a variety of domains such as cybersecurity and health.

    Probabilistic temporal multimedia datamining

    Ph.D. (Doctor of Philosophy)