Representation and Exploitation of Event Sequences
Programa Oficial de Doutoramento en Computación. 5009V01
[Abstract]
The Ten Commandments, the thirty best smartphones on the market and
the five people most wanted by the FBI. Our life is ruled by sequences:
thought sequences, number sequences, event sequences... A history book
is nothing more than a compilation of events, and our favorite film is
just a sequence of scenes. All of them have something in common: it
is possible to extract relevant information from them. Frequently, by
accumulating some data from the elements of each sequence we may
access hidden information (e.g. the number of passengers transported by a
bus on a journey is the sum of the passengers who got on at each of the
stops made); other times, reordering the elements by one of their
characteristics facilitates access to the elements of interest (e.g. the
books published in 2019 can be ordered chronologically, by author,
by literary genre or even by a combination of characteristics); but the
goal is always to store them in the smallest possible space.
Thus, this thesis proposes technological solutions for the storage
and subsequent processing of events, focusing specifically on three
fundamental aspects that can be found in any application that needs
to manage them: compressed and dynamic storage, aggregation
or accumulation of elements of the sequence and element sequence
reordering by their different characteristics or dimensions.
The first contribution of this work is a compact structure for the
dynamic compression of event sequences. This structure allows any
sequence to be compressed in a single pass, that is, it is capable of
compressing in real time as elements arrive. This contribution is
a milestone in the field of compression since, to date, this is the
first proposal for a general-purpose variable-to-variable dynamic compressor.
Regarding aggregation, a data warehouse-like proposal is presented
capable of storing information on any characteristic of the events in a
sequence in an aggregated, compact and accessible way. Following the
philosophy of current data warehouses, we avoid repeating cumulative
operations and speed up aggregate queries by preprocessing the
information and keeping it in this separate structure.
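The idea of precomputing aggregates so that cumulative operations are not repeated can be illustrated with plain prefix sums. This is only a minimal sketch and not the compact structure proposed in the thesis; the function names and the boarding figures are hypothetical, chosen to mirror the bus example above.

```python
from itertools import accumulate


def build_prefix_sums(values):
    """Return prefix sums with a leading 0, so any range sum is one subtraction."""
    return [0] + list(accumulate(values))


def range_sum(prefix, start, end):
    """Sum of values[start:end] (end exclusive) from the precomputed prefix sums."""
    return prefix[end] - prefix[start]


# Hypothetical number of passengers boarding at each stop of one journey.
boardings = [4, 0, 7, 2, 5]
prefix = build_prefix_sums(boardings)

# Aggregate queries are answered without re-adding the sequence each time.
total_passengers = range_sum(prefix, 0, len(boardings))
first_three_stops = range_sum(prefix, 0, 3)
```

The preprocessing costs one pass over the sequence; afterwards every range aggregate takes constant time, at the price of keeping a separate structure alongside the data.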
Finally, this thesis addresses the problem of indexing event sequences
considering their different characteristics and possible reorderings. A new
approach for simultaneously keeping the elements of a sequence ordered
by different characteristics is presented through compact structures.
Thus, it is possible to consult the information and perform operations
on the elements of the sequence using any possible rearrangement in a
simple and efficient way.
Interpreting Pedestrian Behaviour by Visualising and Clustering Movement Data
Recent technological advances have increased the quantity of movement data being recorded. While valuable knowledge can be gained by analysing such data, its sheer volume creates challenges. Geovisual analytics, which helps the human cognition process by using tools to reason about data, offers powerful techniques to resolve these challenges. This paper introduces such a geovisual analytics environment for exploring movement trajectories, which provides visualisation interfaces, based on the classic space-time cube. Additionally, a new approach, using the mathematical description of motion within a space-time cube, is used to determine the similarity of trajectories and forms the basis for clustering them. These techniques were used to analyse pedestrian movement. The results reveal interesting and useful spatiotemporal patterns and clusters of pedestrians exhibiting similar behaviour
NEW METHODS FOR MINING SEQUENTIAL AND TIME SERIES DATA
Data mining is the process of extracting knowledge from large amounts of data. It covers a variety of techniques aimed at discovering diverse types of patterns on the basis of the requirements of the domain. These techniques include association rule mining, classification, cluster analysis and outlier detection. The availability of applications that produce massive amounts of spatial, spatio-temporal (ST) and time series data (TSD) is the rationale for developing specialized techniques to mine such data. In spatial data mining, the spatial co-location rule problem is different from the association rule problem, since there is no natural notion of transactions in spatial datasets that are embedded in continuous geographic space. Therefore, we have proposed an efficient algorithm (GridClique) to mine interesting spatial co-location patterns (maximal cliques). These patterns are used as the raw transactions for an association rule mining technique to discover complex co-location rules. Our proposal includes certain types of complex relationships, especially negative relationships, in the patterns. The relationships can be obtained from only the maximal clique patterns, which have never been used until now. Our approach is applied to a well-known astronomy dataset obtained from the Sloan Digital Sky Survey (SDSS). ST data is continuously collected and made accessible in the public domain. We present an approach to mine and query large ST data with the aim of finding interesting patterns and understanding the underlying process of data generation. An important class of queries is based on the flock pattern. A flock is a large subset of objects moving along paths close to each other for a predefined time.
One approach to processing a “flock query” is to map ST data into high-dimensional space and to reduce the query to a sequence of standard range queries that can be answered using a spatial indexing structure; however, the performance of spatial indexing structures rapidly deteriorates in high-dimensional space. This thesis sets out a preprocessing strategy that uses a random projection to reduce the dimensionality of the transformed space. We use probabilistic arguments to prove the accuracy of the projection and present experimental results that show the possibility of managing the curse of dimensionality in an ST setting by combining random projections with traditional data structures. In time series data mining, we devised a new space-efficient algorithm (SparseDTW) to compute the dynamic time warping (DTW) distance between two time series, which always yields the optimal result. This is in contrast to other approaches, which typically sacrifice optimality to attain space efficiency. The main idea behind our approach is to dynamically exploit the existence of similarity and/or correlation between the time series: the more similar the time series, the less space is required to compute the DTW between them. Other techniques for speeding up DTW impose a priori constraints and do not exploit similarity characteristics that may be present in the data. Our experiments demonstrate that SparseDTW outperforms these approaches. By applying the SparseDTW algorithm, we discover an interesting pattern, “pairs trading”, in a large stock-market dataset of daily index prices from the Australian stock exchange (ASX) from 1980 to 2002.
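For readers unfamiliar with the DTW distance mentioned above, the following is a minimal sketch of the classic full-matrix dynamic program. It is not SparseDTW: it always fills the whole quadratic cost matrix, which is exactly the space cost the abstract says SparseDTW reduces when the series are similar. The function name and the absolute-difference local cost are assumptions made for illustration.

```python
def dtw_distance(a, b):
    """Classic DTW between two numeric sequences using a full cost matrix."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of the first i items of a with the first j of b.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])  # assumed local cost
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# A repeated value in one series is absorbed by the warping.
same_shape = dtw_distance([1, 2, 3], [1, 2, 2, 3])
shifted = dtw_distance([0, 0], [1, 1])
```

Here `same_shape` is zero because the two series differ only by a repeated sample, while `shifted` is positive because every aligned pair differs by one.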
Spatio-temporal pattern mining from global positioning systems (GPS) trajectories dataset
Dissertation submitted in partial fulfilment of the requirements for the degree of Master of Science in Geospatial Technologies.
The increasing use of location-acquisition technologies such as the Global Positioning System is leading to the collection of large spatio-temporal datasets. The prospect of discovering usable knowledge about movement behavior encourages the discovery of interesting relationships and user characteristics that may exist implicitly in spatial databases. Spatial data mining is therefore emerging as a novel area of research.
In this study, the experiments were conducted following the Knowledge Discovery in Databases process model, which starts with the selection of the datasets. The GPS trajectory dataset for this research was collected from the Microsoft Research Asia Geolife project. The data was then preprocessed. The major preprocessing activities include:
Filling in missing values and removing outliers;
Resolving inconsistencies and integrating data that contains both labeled and unlabeled datasets;
Dimensionality reduction, size reduction and data transformation activities such as discretization.
A total of 4,273 trajectory records are used for training the models. To validate the performance of the selected model, a separate set of 1,018 records is used as a test set. To build the spatiotemporal model of this study, the K-nearest neighbors (KNN), decision tree and Bayes algorithms were tested as supervised approaches.
The model created using 10-fold cross validation with a K value of 11 and default values for the other parameters showed the best classification accuracy. The model has a prediction accuracy of 98.5% on the training dataset and 93.12% on the test dataset when classifying new instances into the bike, bus, car, subway, train and walk classes. The findings of this study show that spatiotemporal data mining methods help to classify users' transportation modes. Future research directions are proposed towards developing an applicable system in the area of the study.
Robotic Exploration for Learning Human Motion Patterns
Understanding how people are likely to move is key to efficient and safe robot navigation in human environments. However, mobile robots can only observe a fraction of the environment at a time, while the activity patterns of people may also change at different times. This paper introduces a new methodology for mobile robot exploration to maximise the knowledge of human activity patterns by deciding where and when to collect observations. We introduce an exploration policy driven by the entropy levels in a spatio-temporal map of pedestrian flows, and compare multiple spatio-temporal exploration strategies including both informed and uninformed approaches. The evaluation is performed by simulating mobile robot exploration using real sensory data from three long-term pedestrian datasets. The results show that for certain scenarios the models built with the proposed exploration system can better predict the flow patterns than uninformed strategies, allowing the robot to move in a more socially compliant way, and that the exploration ratio is a key factor when it comes to the model prediction accuracy.
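A rough sketch of what an entropy-driven choice of observation target could look like is given below. The abstract does not specify the policy, so everything here is an assumption for illustration: the map layout (location and hour mapped to counts of observed walking directions), the names, and the greedy "pick the highest entropy" rule.

```python
import math


def shannon_entropy(counts):
    """Shannon entropy in bits of a histogram of non-negative counts."""
    total = sum(counts)
    if total == 0:
        return 0.0  # simplification: no observations is treated as zero entropy
    entropy = 0.0
    for count in counts:
        if count > 0:
            p = count / total
            entropy -= p * math.log2(p)
    return entropy


# Hypothetical spatio-temporal map: (location, hour) -> counts of pedestrians
# observed walking in each of four directions.
flow_map = {
    ("corridor", 9): [40, 0, 0, 0],
    ("entrance", 9): [10, 10, 10, 10],
    ("entrance", 17): [25, 5, 5, 5],
}

# Greedy choice: observe where and when the flow is least predictable.
target = max(flow_map, key=lambda key: shannon_entropy(flow_map[key]))
```

In this toy map the corridor at 9 has a single dominant direction and therefore zero entropy, so the greedy rule sends the robot to the entrance at 9, where the flow is evenly spread over all directions.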
Movement Analytics: Current Status, Application to Manufacturing, and Future Prospects from an AI Perspective
Data-driven decision making is becoming an integral part of manufacturing
companies. Data is collected and commonly used to improve efficiency and
produce high quality items for the customers. IoT-based and other forms of
object tracking are an emerging tool for collecting movement data of
objects/entities (e.g. human workers, moving vehicles, trolleys etc.) over
space and time. Movement data can provide valuable insights like process
bottlenecks, resource utilization, effective working time etc. that can be used
for decision making and improving efficiency.
Turning movement data into valuable information for industrial management and
decision making requires analysis methods. We refer to this process as movement
analytics. The purpose of this document is to review the current state of work
for movement analytics both in manufacturing and more broadly.
We survey relevant work from both a theoretical perspective and an
application perspective. From the theoretical perspective, we put an emphasis
on useful methods from two research areas: machine learning, and logic-based
knowledge representation. We also review their combinations in view of movement
analytics, and we discuss promising areas for future development and
application. Furthermore, we touch on constraint optimization.
From an application perspective, we review applications of these methods to
movement analytics in a general sense and across various industries. We also
describe currently available commercial off-the-shelf products for tracking in
manufacturing, and we give an overview of the main concepts of digital twins
and their applications.
Pre-flight conflict detection and resolution for UAV integration in shared airspace: Sendai 2030 model case
The increasing demand for services performed by Unmanned Aerial Vehicles (UAVs) requires the simulation of Unmanned Aircraft System Traffic Management (UTM) systems. In particular, Pre-Flight Conflict Detection and Resolution (CDR) methods need to scale to future demand levels and generate conflict-free paths for a potentially large number of UAVs before actual takeoff. However, few studies have examined realistic scenarios and the requirements for the UTM system. In this paper, we focus on the Sendai 2030 model case, a realistic projection of UAV usage for deliveries in one area in Japan. This model case considers up to 21,000 requests for Unmanned Aircraft Systems (UAS) operations over a 13 hour service time, and thus poses a challenge for the Pre-Flight CDR methods. Therefore, we propose an airspace reservation method based on 4DT (3D plus time Trajectories) and map the Pre-Flight CDR problem to a Multi-Agent Path Finding (MAPF) problem. We study first-come first-served (FCFS) and “batch” processing of UAS operation requests, and compare the throughput of those methods. We analyze the air traffic topology of deliveries by UAVs, and discuss several metrics to better understand the complexity of air traffic in the Sendai model case
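The first-come first-served processing of requests against a 4D airspace reservation table can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's method: airspace is assumed to be discretised into `(x, y, z, t)` cells, each request arrives with a fixed candidate path, and a conflicting request is simply rejected, whereas the paper maps the problem to Multi-Agent Path Finding to search for a conflict-free path. All names are hypothetical.

```python
def reserve_fcfs(requests):
    """Process requests in arrival order and return the ids that were accepted.

    requests: list of (request_id, path), where path is a list of
    (x, y, z, t) cells the UAV would occupy.
    """
    reserved = set()  # all 4D cells already granted to earlier requests
    accepted = []
    for request_id, path in requests:
        cells = set(path)
        if reserved.isdisjoint(cells):
            # No overlap in both space and time, so the path is conflict-free.
            reserved |= cells
            accepted.append(request_id)
    return accepted


requests = [
    ("A", [(0, 0, 1, 0), (1, 0, 1, 1), (2, 0, 1, 2)]),
    ("B", [(1, 1, 1, 0), (1, 0, 1, 1)]),  # same cell as A at t=1
    ("C", [(1, 0, 1, 3), (1, 1, 1, 4)]),  # same place as A, later time
]
accepted = reserve_fcfs(requests)
```

Request B is rejected because it occupies a cell at the same time step as A, while C is accepted because it passes through the same location only after A has left it.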
Using Semantic Web technologies in the development of data warehouses: A systematic mapping
The exploration and use of Semantic Web technologies have attracted considerable attention from researchers examining data warehouse (DW) development. However, the impact of this research and the maturity level of its results are still unclear. The objective of this study is to examine recently published research articles that take into account the use of Semantic Web technologies in the DW arena with the intention of summarizing their results, classifying their contributions to the field according to publication type, evaluating the maturity level of the results, and identifying future research challenges. Three main conclusions were derived from this study: (a) there is a major technological gap that inhibits the wide adoption of Semantic Web technologies in the business domain; (b) there is limited evidence that the results of the analyzed studies are applicable and transferable to industrial use; and (c) interest in researching the relationship between DWs and Semantic Web has decreased because new paradigms, such as linked open data, have attracted the interest of researchers.
This study was supported by the Universidad de La Frontera, Chile, PROY. DI15-0020. Universidad de la Frontera, Chile, Grant Numbers: DI15-0020 and DI17-0043.