4,486 research outputs found

    Discovery of Spatiotemporal Event Sequences

    Get PDF
    Finding frequent patterns plays a vital role in many analytics tasks such as finding itemsets, associations, correlations, and sequences. In recent decades, spatiotemporal frequent pattern mining has emerged with the main goal focused on developing data-driven analysis frameworks for understanding underlying spatial and temporal characteristics in massive datasets. In this thesis, we will focus on discovering spatiotemporal event sequences from large-scale region trajectory datasetes with event annotations. Spatiotemporal event sequences are the series of event types whose trajectory-based instances follow each other in spatiotemporal context. We introduce new data models for storing and processing evolving region trajectories, provide a novel framework for modeling spatiotemporal follow relationships, and present novel spatiotemporal event sequence mining algorithms

    Development and Applications of Similarity Measures for Spatial-Temporal Event and Setting Sequences

    Get PDF
    Similarity or distance measures between data objects are applied frequently in many fields or domains such as geography, environmental science, biology, economics, computer science, linguistics, logic, business analytics, and statistics, among others. One area where similarity measures are particularly important is in the analysis of spatiotemporal event sequences and associated environs or settings. This dissertation focuses on developing a framework of modeling, representation, and new similarity measure construction for sequences of spatiotemporal events and corresponding settings, which can be applied to different event data types and used in different areas of data science. The first core part of this dissertation presents a matrix-based spatiotemporal event sequence representation that unifies punctual and interval-based representation of events. This framework supports different event data types and provides support for data mining and sequence classification and clustering. The similarity measure is based on the modified Jaccard index with temporal order constraints and accommodates different event data types. This approach is demonstrated through simulated data examples and the performance of the similarity measures is evaluated with a k-nearest neighbor algorithm (k-NN) classification test on synthetic datasets. These similarity measures are incorporated into a clustering method and successfully demonstrate the usefulness in a case study analysis of event sequences extracted from space time series of a water quality monitoring system. This dissertation further proposes a new similarity measure for event setting sequences, which involve the space and time in which events occur. While similarity measures for spatiotemporal event sequences have been studied, the settings and setting sequences have not yet been considered. While modeling event setting sequences, spatial and temporal scales are considered to define the bounds of the setting and incorporate dynamic variables along with static variables. Using a matrix-based representation and an extended Jaccard index, new similarity measures are developed to allow for the use of all variable data types. With these similarity measures coupled with other multivariate statistical analysis approaches, results from a case study involving setting sequences and pollution event sequences associated with the same monitoring stations, support the hypothesis that more similar spatial-temporal settings or setting sequences may generate more similar events or event sequences. To test the scalability of STES similarity measure in a larger dataset and an extended application in different fields, this dissertation compares and contrasts the prospective space-time scan statistic with the STES similarity approach for identifying COVID-19 hotspots. The COVID-19 pandemic has highlighted the importance of detecting hotspots or clusters of COVID-19 to provide decision makers at various levels with better information for managing distribution of human and technical resources as the outbreak in the USA continues to grow. The prospective space-time scan statistic has been used to help identify emerging disease clusters yet results from this approach can encounter strategic limitations imposed by the spatial constraints of the scanning window. The STES-based approach adapted for this pandemic context computes the similarity of evolving normalized COVID-19 daily cases by county and clusters these to identify counties with similarly evolving COVID-19 case histories. This dissertation analyzes the spread of COVID-19 within the continental US through four periods beginning from late January 2020 using the COVID-19 datasets maintained by John Hopkins University, Center for Systems Science and Engineering (CSSE). Results of the two approaches can complement with each other and taken together can aid in tracking the progression of the pandemic. Overall, the dissertation highlights the importance of developing similarity measures for analyzing spatiotemporal event sequences and associated settings, which can be applied to different event data types and used for data mining, sequence classification, and clustering

    Extraction of Transcript Diversity from Scientific Literature

    Get PDF
    Transcript diversity generated by alternative splicing and associated mechanisms contributes heavily to the functional complexity of biological systems. The numerous examples of the mechanisms and functional implications of these events are scattered throughout the scientific literature. Thus, it is crucial to have a tool that can automatically extract the relevant facts and collect them in a knowledge base that can aid the interpretation of data from high-throughput methods. We have developed and applied a composite text-mining method for extracting information on transcript diversity from the entire MEDLINE database in order to create a database of genes with alternative transcripts. It contains information on tissue specificity, number of isoforms, causative mechanisms, functional implications, and experimental methods used for detection. We have mined this resource to identify 959 instances of tissue-specific splicing. Our results in combination with those from EST-based methods suggest that alternative splicing is the preferred mechanism for generating transcript diversity in the nervous system. We provide new annotations for 1,860 genes with the potential for generating transcript diversity. We assign the MeSH term ā€œalternative splicingā€ to 1,536 additional abstracts in the MEDLINE database and suggest new MeSH terms for other events. We have successfully extracted information about transcript diversity and semiautomatically generated a database, LSAT, that can provide a quantitative understanding of the mechanisms behind tissue-specific gene expression. LSAT (Literature Support for Alternative Transcripts) is publicly available at http://www.bork.embl.de/LSAT/

    When and Where: Predicting Human Movements Based on Social Spatial-Temporal Events

    Full text link
    Predicting both the time and the location of human movements is valuable but challenging for a variety of applications. To address this problem, we propose an approach considering both the periodicity and the sociality of human movements. We first define a new concept, Social Spatial-Temporal Event (SSTE), to represent social interactions among people. For the time prediction, we characterise the temporal dynamics of SSTEs with an ARMA (AutoRegressive Moving Average) model. To dynamically capture the SSTE kinetics, we propose a Kalman Filter based learning algorithm to learn and incrementally update the ARMA model as a new observation becomes available. For the location prediction, we propose a ranking model where the periodicity and the sociality of human movements are simultaneously taken into consideration for improving the prediction accuracy. Extensive experiments conducted on real data sets validate our proposed approach

    Urban Anomaly Analytics: Description, Detection, and Prediction

    Get PDF
    Urban anomalies may result in loss of life or property if not handled properly. Automatically alerting anomalies in their early stage or even predicting anomalies before happening is of great value for populations. Recently, data-driven urban anomaly analysis frameworks have been forming, which utilize urban big data and machine learning algorithms to detect and predict urban anomalies automatically. In this survey, we make a comprehensive review of the state-of-the-art research on urban anomaly analytics. We first give an overview of four main types of urban anomalies, traffic anomaly, unexpected crowds, environment anomaly, and individual anomaly. Next, we summarize various types of urban datasets obtained from diverse devices, i.e., trajectory, trip records, CDRs, urban sensors, event records, environment data, social media and surveillance cameras. Subsequently, a comprehensive survey of issues on detecting and predicting techniques for urban anomalies is presented. Finally, research challenges and open problems as discussed.Peer reviewe
    • ā€¦
    corecore