
    ST-Hadoop: A MapReduce Framework for Big Spatio-temporal Data Management

    University of Minnesota Ph.D. dissertation, May 2019. Major: Computer Science. Advisor: Mohamed Mokbel. 1 computer file (PDF); x, 123 pages. Apache Hadoop, employing the MapReduce programming paradigm, has been widely accepted as the standard framework for analyzing big data in distributed environments. Unfortunately, this rich framework has not been genuinely exploited for processing large-scale spatio-temporal data, especially with the emergence and popularity of applications that create such data at scale. Huge volumes of spatio-temporal data come from applications such as taxi fleets in urban computing, asteroid tracking in astronomy, animal movements in habitat studies, neuron analysis in neuroscience, and the contents of social networks (e.g., Twitter or Facebook). Space and time are the two fundamental characteristics that drive the demand for processing the spatio-temporal data these applications create. Besides the massive size of the data, the complexity of its shapes and formats raises many challenges in managing spatio-temporal data. The goal of this dissertation is to establish a full-fledged big spatio-temporal data management system that serves the needs of a wide range of spatio-temporal applications; this involves indexing, querying, and analyzing spatio-temporal data. We propose ST-Hadoop, the first full-fledged open-source system with native support for big spatio-temporal data, available for download at http://st-hadoop.cs.umn.edu/. ST-Hadoop injects spatio-temporal data awareness inside the highly popular Hadoop system, which is considered the state of the art for off-line analysis of big data. Considering a distributed environment, we focus on the following: (1) indexing spatio-temporal data; (2) supporting fundamental spatio-temporal operations such as range, kNN, and join queries; and (3) supporting the indexing and querying of trajectories, a special class of spatio-temporal data that requires special handling. Throughout this dissertation, we cover the background and related work, motivate the proposed system, and highlight our contributions.
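
    The range, kNN, and join operations mentioned above all reduce, at the lowest level, to predicates over a spatial window and a time interval. The following is a minimal sketch of such a spatio-temporal range filter, not ST-Hadoop's actual API; the record layout and field names are assumptions made here for illustration:

        from dataclasses import dataclass

        @dataclass
        class STRecord:
            # Hypothetical record layout: one point observation with a timestamp.
            x: float
            y: float
            t: float  # e.g., seconds since the epoch

        def in_range(rec, xmin, ymin, xmax, ymax, t_start, t_end):
            # True when the record lies inside the spatial window AND the time interval.
            return (xmin <= rec.x <= xmax and
                    ymin <= rec.y <= ymax and
                    t_start <= rec.t <= t_end)

        def range_query(records, window, interval):
            # Map-side style filter: emit only records satisfying the spatio-temporal predicate.
            xmin, ymin, xmax, ymax = window
            t_start, t_end = interval
            for rec in records:
                if in_range(rec, xmin, ymin, xmax, ymax, t_start, t_end):
                    yield rec

    In a system like ST-Hadoop, a spatio-temporal index would prune most partitions before any filter of this kind runs, which is what keeps such queries tractable at scale.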

    DCMS: A data analytics and management system for molecular simulation

    Molecular Simulation (MS) is a powerful tool for studying the physical and chemical features of large systems and has seen applications in many scientific and engineering domains. During the simulation process, the experiments generate data for a very large number of atoms, whose spatial and temporal relationships are then examined for scientific analysis. The sheer data volumes and the atoms' intensive interactions impose significant challenges for data access, management, and analysis. To date, existing MS software systems fall short on the storage and handling of MS data, mainly because they lack a platform to support applications that involve intensive data access and analytical processing. In this paper, we present the database-centric molecular simulation (DCMS) system that our team has developed over the past few years. The main idea behind DCMS is to store MS data in a relational database management system (DBMS) to take advantage of the declarative query interface (i.e., SQL), data access methods, query processing, and optimization mechanisms of modern DBMSs. A unique challenge is handling analytical queries that are often compute-intensive. For that, we developed novel indexing and query processing strategies (including algorithms running on modern co-processors) as integrated components of the DBMS. As a result, researchers can upload and analyze their data using efficient functions implemented inside the DBMS. Index structures are generated to store analysis results that may be interesting to other users, so that the results are readily available without duplicating the analysis. We have developed a prototype of DCMS based on the PostgreSQL system, and experiments using real MS data and workloads show that DCMS significantly outperforms existing MS software systems. We also used it as a platform to study other data management issues such as security and compression.
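
    As a rough, self-contained illustration of the database-centric idea (the table layout below is an assumption for this sketch, not DCMS's actual schema, and SQLite stands in for PostgreSQL only to keep the example dependency-free), atom positions can be stored in a relational table and queried declaratively with SQL:

        import sqlite3

        # Hypothetical schema: one row per atom per simulation frame.
        conn = sqlite3.connect(":memory:")
        conn.execute("""
            CREATE TABLE atom_positions (
                atom_id INTEGER,
                frame   INTEGER,
                x REAL, y REAL, z REAL
            )
        """)
        conn.executemany(
            "INSERT INTO atom_positions VALUES (?, ?, ?, ?, ?)",
            [(1, 0, 0.1, 0.2, 0.3), (2, 0, 5.0, 5.1, 5.2), (3, 0, 0.4, 0.1, 0.2)],
        )

        # Declarative spatial range query: atoms of frame 0 inside a 1x1x1 box at the origin.
        rows = conn.execute("""
            SELECT atom_id, x, y, z
            FROM atom_positions
            WHERE frame = 0
              AND x BETWEEN 0 AND 1
              AND y BETWEEN 0 AND 1
              AND z BETWEEN 0 AND 1
        """).fetchall()
        print(rows)  # e.g., [(1, 0.1, 0.2, 0.3), (3, 0.4, 0.1, 0.2)]

    In DCMS itself, the compute-intensive analytical queries are served by custom indexes and in-DBMS functions (some running on co-processors) rather than by plain scans like the one above.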

    Doctor of Philosophy

    Recent advancements in mobile devices - such as Global Positioning System (GPS) receivers, cellular phones, car navigation systems, and radio-frequency identification (RFID) tags - have greatly influenced the nature and volume of data about individual-based movement in space and time. Due to the prevalence of mobile devices, vast amounts of mobile object data are being produced and stored in databases, overwhelming the capacity of traditional spatial analytical methods. There is a growing need to discover the unexpected patterns, trends, and relationships hidden in this massive mobile object data. Geographic visualization (GVis) and knowledge discovery in databases (KDD) are two major research fields associated with knowledge discovery and construction. Their major research challenges are the integration of GVis and KDD, the ability to handle large volumes of mobile object data, and high interactivity between the computer and the users of GVis and KDD tools. This dissertation proposes a visualization toolkit that enables highly interactive visual data exploration of mobile object datasets. A vector algebraic representation and online analytical processing (OLAP) are used to manage and query the mobile object data, giving the visualization tool its high interactivity. In addition, reconstructing trajectories at user-defined levels of temporal granularity with time aggregation methods allows exploration of individual objects at different levels of movement generality. At a given level of generality, individual paths can be combined into synthetic summary paths based on three similarity measures: locational, directional, and geometric similarity functions. A visualization toolkit based on the space-time cube concept exploits these functionalities to create a user-interactive environment for exploring mobile object data. Furthermore, the characteristics of the visualized trajectories are exported for use in data mining, which leads to the integration of GVis and KDD. Case studies using three movement datasets (a personal travel survey in Lexington, Kentucky; wild chicken movement data in Thailand; and self-tracking data in Utah) demonstrate the potential of the system to extract meaningful patterns from otherwise difficult-to-comprehend collections of space-time trajectories.
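
    The three similarity measures are only named in the abstract; as a hedged illustration of the locational one, the sketch below scores two time-aligned trajectories by their mean pointwise distance (the dissertation's exact formulation may differ):

        import math

        def locational_similarity(traj_a, traj_b):
            # traj_a, traj_b: equal-length lists of (x, y) points sampled at the same times.
            # Returns the mean pointwise Euclidean distance; smaller means more similar.
            assert traj_a and len(traj_a) == len(traj_b), "trajectories must be time-aligned"
            total = sum(math.dist(p, q) for p, q in zip(traj_a, traj_b))
            return total / len(traj_a)

        # Tiny usage example: two nearly parallel paths.
        a = [(0, 0), (1, 0), (2, 0)]
        b = [(0, 1), (1, 1), (2, 1)]
        print(locational_similarity(a, b))  # 1.0

    A directional measure would instead compare successive headings, and a geometric one would compare shapes after alignment; the toolkit combines all three when grouping individual paths into synthetic summary paths.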

    THREE TEMPORAL PERSPECTIVES ON DECENTRALIZED LOCATION-AWARE COMPUTING: PAST, PRESENT, FUTURE

    During the past four decades, miniaturization has made computing devices ubiquitous and pervasive. Today, the number of objects connected to the Internet is increasing at a rapid pace, and this trend does not seem to be slowing down. These objects, which can be smartphones, vehicles, or any kind of sensor, generate large amounts of data that are almost always associated with a spatio-temporal context. The amount of data is often so large that processing it requires a distributed system involving the cooperation of several computers. The ability to process these data is important for society. For example, the data collected during car journeys already makes it possible to avoid traffic jams or to organize carpooling. As another example, in the near future, maintenance interventions on the road network will be planned using data collected by gyroscopes that detect potholes. The application domains are therefore numerous, as are the problems associated with them. The articles that make up this thesis deal with systems that share two key characteristics: a spatio-temporal context and a decentralized architecture. In addition, the systems described in these articles revolve around three temporal perspectives: the present, the past, and the future. Systems associated with the present perspective enable a very large number of connected objects to communicate in near real time according to a spatial context. Our contributions in this area enable this type of decentralized system to be scaled out on commodity hardware, i.e., to adapt as the volume of data arriving in the system increases. Systems associated with the past perspective, often referred to as trajectory indexes, provide access to the large volumes of spatio-temporal data collected by connected objects. Our contributions in this area make it possible to handle particularly dense trajectory datasets, a problem that had not been addressed previously. Finally, systems associated with the future perspective rely on past trajectories to predict the trajectories that connected objects will follow. Our contributions predict the trajectories followed by connected objects at a previously unmet granularity. Although they involve different domains, these contributions are structured around common denominators of the underlying systems, which opens the possibility of treating these problems more generically in the near future.

    Graph-based topic models for trajectory clustering in crowd videos

    Probabilistic topic models, such as latent Dirichlet allocation (LDA) and correlated topic models (CTM), have recently emerged as powerful statistical tools for processing video content. They share an important property: a common set of topics is used to model all data. However, this property can be too restrictive for modeling complex visual data such as crowd scenes, where multiple fields of heterogeneous data jointly provide rich information about objects and events. This paper proposes graph-based extensions of LDA and CTM, referred to as GLDA and GCTM, to learn and analyze motion patterns by trajectory clustering in highly cluttered and crowded environments. Unlike previous works that relied on a scene prior, we apply a spatio-temporal graph (STG) to uncover the spatial and temporal coherence between the trajectories of crowd motion during the learning process. The presented models advance the conventional approaches by integrating manifold-based clustering as initialization and iterative statistical inference as optimization. The outputs of GLDA and GCTM are mid-level features that represent the motion patterns, which are later used to generate trajectory clusters. Experiments on three different datasets show the effectiveness of the approaches in trajectory clustering and crowd motion modeling.
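
    The spatio-temporal graph is described only at a high level; a minimal, hypothetical construction is sketched below, connecting two tracklets whenever they are observed at a shared timestamp within a spatial threshold (the paper's actual STG definition may differ):

        import math

        def st_graph(tracklets, dist_thresh=2.0):
            # tracklets: dict mapping id -> list of (t, x, y) samples.
            # Returns the set of id pairs that are temporally co-occurring and
            # spatially close at some shared timestamp.
            edges = set()
            ids = list(tracklets)
            for i, a in enumerate(ids):
                pts_a = {t: (x, y) for t, x, y in tracklets[a]}
                for b in ids[i + 1:]:
                    for t, x, y in tracklets[b]:
                        if t in pts_a and math.dist(pts_a[t], (x, y)) <= dist_thresh:
                            edges.add((a, b))
                            break
            return edges

        # Tiny example: tracklets 1 and 2 move together; tracklet 3 is far away.
        tracks = {
            1: [(0, 0.0, 0.0), (1, 1.0, 0.0)],
            2: [(0, 0.5, 0.0), (1, 1.5, 0.0)],
            3: [(0, 9.0, 9.0), (1, 9.5, 9.0)],
        }
        print(st_graph(tracks))  # {(1, 2)}

    In GLDA and GCTM, coherence structure of this kind constrains the learning process in place of the scene prior used by earlier approaches.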

    On the Nature and Types of Anomalies: A Review

    Anomalies are occurrences in a dataset that are in some way unusual and do not fit the general patterns. The concept of the anomaly is generally ill-defined and perceived as vague and domain-dependent. Moreover, despite some 250 years of publications on the topic, no comprehensive and concrete overview of the different types of anomalies has hitherto been published. By means of an extensive literature review, this study therefore offers the first theoretically principled and domain-independent typology of data anomalies and presents a full overview of anomaly types and subtypes. To concretely define the concept of the anomaly and its different manifestations, the typology employs five dimensions: data type, cardinality of relationship, anomaly level, data structure, and data distribution. These fundamental and data-centric dimensions naturally yield 3 broad groups, 9 basic types, and 61 subtypes of anomalies. The typology facilitates the evaluation of the functional capabilities of anomaly detection algorithms, contributes to explainable data science, and provides insights into relevant topics such as local versus global anomalies.
    Comment: 38 pages (30 pages of content), 10 figures, 3 tables. Preprint; review comments will be appreciated. Improvements in version 2: explicit mention of the fifth anomaly dimension; added a section on explainable anomaly detection; added a section on variations on the anomaly concept; various minor additions and improvements.

    A Framework for Spatio-Temporal Trajectory Data Segmentation and Query

    Trajectory segmentation is a technique for dividing sequential trajectory data into segments. These segments are the building blocks of various applications for big trajectory data, so a system framework is essential to support trajectory segment indexing, storage, and querying. When the volume of segments exceeds the computing capacity of a single processing node, a distributed solution is required. In this thesis, a distributed trajectory segmentation framework that includes a greedy-split segmentation method is created. The framework consists of a distributed in-memory processing layer and a graph-storage cluster. For fast trajectory queries, a distributed spatial R-tree index over the trajectory segments is applied. Using these indexes, the framework answers segment queries both from in-memory processing and from the graph storage. On top of the segmentation framework, two metrics are defined to measure trajectory similarity and the chance of collision; these metrics are then applied to identify moving groups of trajectories. The study quantitatively evaluates the effects of data partitioning, parallelism, and data size on the system, identifies the bottleneck factors at the data partitioning stage, and validates two mitigation solutions. The evaluation demonstrates that the distributed segmentation method and the system framework scale with the growth of the workload and the size of the parallel cluster.
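
    The greedy-split method is not detailed in the abstract; the sketch below shows one common greedy-split formulation, recursively splitting a trajectory at its point of maximum deviation from the segment's chord until every segment meets an error tolerance. It is an illustration under that assumption, not necessarily the thesis's exact algorithm:

        import math

        def _deviation(p, a, b):
            # Perpendicular distance from point p to the chord a-b (all (x, y) tuples).
            if a == b:
                return math.dist(p, a)
            (ax, ay), (bx, by), (px, py) = a, b, p
            return abs((bx - ax) * (ay - py) - (ax - px) * (by - ay)) / math.dist(a, b)

        def greedy_split(points, tolerance):
            # Recursively split at the point of maximum deviation until every
            # segment stays within `tolerance`; returns a list of point lists.
            if len(points) <= 2:
                return [points]
            deviations = [_deviation(p, points[0], points[-1]) for p in points[1:-1]]
            worst = max(range(len(deviations)), key=deviations.__getitem__) + 1
            if deviations[worst - 1] <= tolerance:
                return [points]
            return greedy_split(points[:worst + 1], tolerance) + greedy_split(points[worst:], tolerance)

        # Usage: a path with a sharp corner splits into two near-straight segments.
        path = [(0, 0), (1, 0.05), (2, 0), (2, 1), (2, 2)]
        print(greedy_split(path, tolerance=0.1))

    Segments produced this way would then be bulk-loaded into the distributed R-tree index so that spatial queries only touch the relevant partitions.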