6 research outputs found

    Literature Review on Temporal, Spatial, and Spatiotemporal Data Models

    This paper reviews the literature on temporal databases, spatial databases, and spatio-temporal databases.

    An XML-based implementation of the parametric model for ad-hoc query of temporal and spatiotemporal data

    The parametric model is one of the data models for dimensional data. Values in the parametric model are defined as functions. This modeling concept helps achieve a one-to-one correspondence between objects in the real world and records in a database. One important requirement is that domains of values be closed under the set-theoretic operations of union, intersection, and complementation. Because of this, ParaSQL, the query language of the parametric model, is able to mimic natural languages more closely. In this dissertation we validate and implement the parametric model for temporal and spatiotemporal data. We also develop a preliminary prototype for the users of NC-94, an interesting dataset in agriculture.
    Viewing values as functions leads to variable-length tuples. Such values can range in size from a few bytes to gigabytes and beyond, which makes implementing the parametric model a challenging problem. To meet the challenge, we develop XML-based storage and deploy it in our implementation. XML is also used for interfacing the various modules and artifacts, such as the parse tree, the expression tree, and the iterators that fetch data from disk.
    The NC-94 dataset contains the most complete record of spatiotemporal variables that characterize the dynamics of agriculture in the north central region of the United States. To support ad-hoc querying of data in its geospatial context, a novel hybrid structure is designed and implemented. We use GML to describe geospatial information. GML is a good match because it is XML-based and, more importantly, because it meets the set-theoretic closure requirements of the parametric model.
    The validation and implementation methodologies introduced in this dissertation will contribute to the database and GIS communities. The validation demonstrates the ease of use and efficiency of the parametric model for temporal and spatiotemporal data. This should help settle a debate in the temporal database community that has continued since the mid-1980s. The findings also extend to spatial and spatiotemporal data. This work is an important early step toward a full-fledged implementation of the parametric model. We hope that it will also help bring the database and GIS communities together.

    A parametric prototype for spatiotemporal databases

    The main goal of this project is to design and implement the parametric database (ParaDB). Conceptually, ParaDB consists of the parametric data model (ParaDM) and the parametric structured query language (ParaSQL). The parametric data model is a data model for multi-dimensional databases such as temporal, spatial, spatiotemporal, or multi-level secure databases. The main difference from the classical relational data model is that ParaDM models an object as a single tuple, and an attribute is defined as a function from parametric elements. The set of parametric elements is closed under union, intersection, and complementation. These operations are the counterparts of "or", "and", and "not" in a natural language like English. The closure properties therefore provide very flexible ways to query objects without introducing the additional self-join operations that are frequently required in other multi-dimensional database models.
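
    The closure property described above can be illustrated with a minimal sketch. It is not ParaDB or ParaSQL code: it assumes a temporal element is a finite union of half-open integer intervals inside a bounded time line, and the names `UNIVERSE`, `normalize`, `union`, `complement`, and `intersection` are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]          # half-open interval [start, end)
TemporalElement = List[Interval]    # finite union of disjoint intervals

# Assumption: a bounded time line; all intervals are expected to lie inside it.
UNIVERSE: Interval = (0, 100)


def normalize(intervals: TemporalElement) -> TemporalElement:
    """Drop empty intervals, sort, and merge intervals that overlap or touch."""
    merged: TemporalElement = []
    for start, end in sorted(i for i in intervals if i[0] < i[1]):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def union(a: TemporalElement, b: TemporalElement) -> TemporalElement:
    """Counterpart of "or": instants in a or in b."""
    return normalize(a + b)


def complement(a: TemporalElement) -> TemporalElement:
    """Counterpart of "not": instants of UNIVERSE that are not in a."""
    result: TemporalElement = []
    cursor = UNIVERSE[0]
    for start, end in normalize(a):
        if cursor < start:
            result.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < UNIVERSE[1]:
        result.append((cursor, UNIVERSE[1]))
    return result


def intersection(a: TemporalElement, b: TemporalElement) -> TemporalElement:
    """Counterpart of "and", derived from union and complement (De Morgan)."""
    return complement(union(complement(a), complement(b)))


# Hypothetical attribute value: a function from temporal elements to values,
# stored here as value -> temporal element.
dept_history = {"Toys": [(0, 10), (20, 30)], "Shoes": [(10, 20)]}
in_toys_or_shoes = union(dept_history["Toys"], dept_history["Shoes"])
```

    Every operation returns another temporal element of the same kind, so a condition combining "or", "and", and "not" can be evaluated on a single tuple without a self-join.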

    A framework for the management of deformable moving objects

    A growing number of applications and services across the most diverse areas of knowledge and human activity are based on spatiotemporal data. The Internet of Things (IoT), technologies that make it possible to collect information about the evolution of real-world phenomena, and the widespread use of devices that can use the Global Positioning System (GPS), such as smartphones and navigation systems, suggest that the volume and value of these data will increase significantly in the future. Tools capable of extracting knowledge from these data are needed, and that requires managing the data efficiently: representing, manipulating, analyzing, and storing them. But these data can be complex, their management is not trivial, and there is not yet a complete system capable of performing this task. Work on moving points, which represent the position of objects over time, is frequent in the literature. By contrast, there are far fewer solutions for representing moving regions, which capture the continuous changes in position, shape, and extent of objects over time, e.g., storms, fires, and icebergs. Representing the evolution of moving regions is complex and requires more elaborate techniques, e.g., morphing and interpolation techniques, capable of producing realistic and geometrically valid representations. In this dissertation we propose a data model for moving objects (moving points and moving regions), in particular for moving regions, based on the concept of a mesh and on compatible triangulation and rigid interpolation methods. This model was implemented in a framework that is not client or application dependent. We also implemented a spatiotemporal extension for PostgreSQL that uses this framework to manipulate and analyze moving objects, as a proof of concept that our framework works with real applications. 
Test results using real data, obtained from satellite images of the evolution of 2 icebergs over time, show that our data model works. Beyond these results, one important contribution of this work is a basic framework for moving objects that can serve as a basis for further investigation in this area. A few problems remain that must be further studied and analyzed, in particular those found when using the compatible triangulation and rigid interpolation methods with real data. Master's degree in Informatics Engineering.
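
    The simpler of the two object types mentioned above, the moving point, can be sketched as follows. This is only an illustration under stated assumptions: it uses plain linear interpolation between timestamped samples, not the compatible triangulation and rigid interpolation methods the dissertation applies to moving regions, and the names `Sample` and `position_at` are hypothetical.

```python
from bisect import bisect_right
from typing import List, Tuple

Sample = Tuple[float, float, float]  # (t, x, y) observation of a moving point


def position_at(samples: List[Sample], t: float) -> Tuple[float, float]:
    """Estimate the position at time t by linear interpolation.

    Assumption: samples are sorted by strictly increasing timestamp.
    """
    if not samples or t < samples[0][0] or t > samples[-1][0]:
        raise ValueError("t lies outside the observed time span")
    times = [s[0] for s in samples]
    i = bisect_right(times, t)      # index of the first sample later than t
    if i == len(samples):           # t equals the last timestamp
        return samples[-1][1], samples[-1][2]
    t0, x0, y0 = samples[i - 1]
    t1, x1, y1 = samples[i]
    w = (t - t0) / (t1 - t0)        # fraction of the segment already travelled
    return x0 + w * (x1 - x0), y0 + w * (y1 - y0)


# Hypothetical track with three observations.
track = [(0.0, 0.0, 0.0), (10.0, 10.0, 20.0), (20.0, 10.0, 40.0)]
midway = position_at(track, 5.0)
```

    A moving region needs considerably more than this, because its boundary changes shape between observations; that is where the mesh-based interpolation described in the abstract comes in.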

    BigSQLTraj: A SQL-extended framework for storing & querying big mobility data

    In recent years, owing to the widespread use of sensors and smart devices, mobility data are being produced at an exponential rate and fall into the category of big data. For example, routing applications, traffic-flow monitoring, fleet control, and even risk prediction or avoidance rely on processing spatial and spatiotemporal data. These data must be stored and processed appropriately so that they can then become knowledge for organizations. Clearly, this process requires systems and technologies suited to the large volume of input data. In this thesis we used data from vessel movements, specifically data produced by the automatic identification system (AIS). For the purposes of this thesis we developed BigSQLTraj, a SQL-based framework for storing and querying big data from moving objects. Big data applications comprise the layers of data management, processing, analytics, and visualization, over heterogeneous sources and over either historical or streaming data. In this thesis we examine the management and processing layers for big historical data. The goal of the system is to let users store and process large geospatial and spatiotemporal data efficiently on top of a distributed system, extending or reproducing methods and algorithms from existing systems. The first goal of the work is to select tools that can communicate with one another and present a unified view to external users. 
The innovations the system provides are methods for load-balanced yet similarity-based partitioning of the data across the nodes of the computer cluster, and a SQL interface to the distributed system that provides advanced methods for processing the stored data and allows systems that already interact with SQL-based systems to move to big data technologies with the fewest possible changes. The first goal of this thesis is the integration of various technologies. The implementation is based on open-source libraries for big data processing: Apache Hadoop, Apache Spark, Apache Hive, and Apache Tez. The most basic functionality provided by Apache Hadoop is the distributed file system (Hadoop Distributed File System) to which data are written and from which they are read, together with Apache Hadoop's resource manager (Yarn), which monitors the workload of the cluster's computers and assigns the tasks to be executed. These two tools form the pillar of the integration among the computers of the cluster and among the libraries running on the cluster. Apache Spark, through the MapReduce programming framework, provides processing of either historical data or data streams and their storage in Hadoop's distributed file system. Apache Hive, in turn, makes it possible to run queries on files in Hadoop's distributed file system through the HiveQL language, which is equivalent to traditional SQL, while Apache Spark and Apache Tez act as the execution engine of a HiveQL query and translate the query into a MapReduce job. 
None of the above systems can process geospatial or mobility data in its basic version. The extensions we made include: 1) functions for cleaning spatiotemporal points and building trajectories of moving objects from those points, using Apache Spark; 2) spatiotemporal partitioning of the trajectories across the computers of the cluster and the creation of indexes, which comprise the spatiotemporal extent of the partitioned information and an encoding based on three-dimensional local indexes built from the information held by each computer, using Apache Spark and Apache Hadoop; 3) suitable methods that exploit the storage of the previous step for range queries and similarity (kNN) queries. The comparison we carried out concerns the time performance of range queries and kNN queries, depending on how the data are stored as described above. First, we compared the completion time of these queries for the available storage schemes and the available execution engines as a function of the number of computers running in the distributed system (scalability). Next, we compared the completion time of these queries for the available storage schemes and the available execution engines as a function of data volume (speed-up), increasing the data volume at each step. 
The results showed that the most efficient way to execute the queries is to use an index over the partitioned information and then an encoding based on local indexes to retrieve the final result, with Apache Spark as the execution engine.
In recent decades, the need to perform advanced queries over massively produced data, such as mobility traces, in efficient and scalable ways has become particularly important. This thesis describes BigSQLTraj, a framework that supports efficient storing, partitioning, indexing, and querying of spatial and spatio-temporal (i.e., mobility) data over a distributed engine. Every end-to-end big data application consists of four layers: data management, data processing, data analytics, and data visualization, over heterogeneous data sources and for batch or streaming data. This thesis focuses on data management and data processing for historical data. The first goal is to find systems that offer ready-to-use integration pipelines, so as to take advantage of what each tool does best. For our implementation we chose open-source big data frameworks: Apache Hadoop, Apache Spark, Apache Hive, and Apache Tez. Apache Hadoop, and especially its distributed file system (HDFS), gives all the other libraries a common read and write layer, while Hadoop's resource manager (Yarn) exploits all the available computing resources. BigSQLTraj extends the functionality of existing spatial and spatio-temporal systems, centralized or distributed, to create two core, independent components. The first component is responsible for storing, spatiotemporally partitioning, and indexing the data in a distributed file system, and it is implemented on top of Apache Spark. Several spatio-temporal partitioners and a 3D-STRtree index are implemented to support a collection of operators, in addition to existing partitioners and indexing methods inherited from state-of-the-art distributed spatial and spatiotemporal systems. The second component is a distributed SQL engine. It extends the functionality of HiveQL to achieve rapid access to, and storage of, this kind of data (i.e., geospatial and mobility data). Our final goal is to optimize Hive's join procedure, which is required for both query types, using the data structures from the first component. We demonstrate the functionality of our approach and conduct an extensive experimental study based on state-of-the-art benchmarks for mobility data. Our benchmark focuses on the total execution time of range queries and kNN queries, depending on the data storage model. First, we compare the time performance of the different storage alternatives and execution engines on the entire dataset, varying the number of workers to assess the system's scalability. We then vary the size of the dataset and measure the execution time of the queries. To study the effect of dataset size, we split the original dataset into 5 chunks (20%, 40%, 60%, 80%, 100%). Based on the results, we conclude that the best workflow combines a global index structure for worker metadata with a local index-based encoding that stores the entire trajectories of a partition in a single column, and that the execution time appears to follow linear behaviour.
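
    The two query types that the benchmark measures can be sketched in a few lines. This is a single-machine, brute-force illustration and not BigSQLTraj code: it ignores partitioning, the global index, and the 3D-STRtree, it treats kNN as nearest points rather than similar trajectories, and the names `Point`, `Box`, `range_query`, and `knn_query` are hypothetical.

```python
from dataclasses import dataclass
from math import hypot
from typing import List


@dataclass(frozen=True)
class Point:
    """One timestamped position report of a moving object."""
    obj_id: str
    x: float
    y: float
    t: float


@dataclass(frozen=True)
class Box:
    """Axis-aligned spatiotemporal box (two spatial dimensions plus time)."""
    x_min: float
    x_max: float
    y_min: float
    y_max: float
    t_min: float
    t_max: float

    def contains(self, p: Point) -> bool:
        return (self.x_min <= p.x <= self.x_max
                and self.y_min <= p.y <= self.y_max
                and self.t_min <= p.t <= self.t_max)


def range_query(points: List[Point], box: Box) -> List[Point]:
    """Return every point that falls inside the spatiotemporal box."""
    return [p for p in points if box.contains(p)]


def knn_query(points: List[Point], qx: float, qy: float,
              t_min: float, t_max: float, k: int) -> List[Point]:
    """Return the k points closest to (qx, qy) within a time window."""
    candidates = [p for p in points if t_min <= p.t <= t_max]
    return sorted(candidates, key=lambda p: hypot(p.x - qx, p.y - qy))[:k]


# Hypothetical position reports from two objects.
reports = [
    Point("a", 1.0, 1.0, 5.0),
    Point("a", 2.0, 2.0, 15.0),
    Point("b", 9.0, 9.0, 6.0),
    Point("b", 1.5, 1.0, 7.0),
]
in_box = range_query(reports, Box(0.0, 3.0, 0.0, 3.0, 0.0, 10.0))
nearest = knn_query(reports, 1.0, 1.0, 0.0, 10.0, k=2)
```

    Both functions scan every point. An index over partition extents, such as the one the abstract describes, would let a system skip partitions that cannot intersect the query box before any such scan takes place.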

    Toward Spatiotemporal Patterns

    Existing spatiotemporal data models and query languages offer only basic support for querying changes in data. In particular, although these systems often allow the formulation of queries that ask for changes at particular time points, they fall short of expressing queries for sequences of such changes. In this chapter we propose spatiotemporal patterns as a systematic and scalable concept for querying the developments of objects and their relationships. Based on our previous work on spatiotemporal predicates, we outline the design of spatiotemporal patterns as a query mechanism to characterize complex object behaviors in space and time. We will not present a fully fledged design. Instead, we will focus on deriving constraints that will allow spatiotemporal patterns to become well-designed, composable abstractions that can be smoothly integrated into spatiotemporal query languages. Spatiotemporal patterns can be applied in many different areas of science, for example, in geosciences, geophysics, meteorology, ecology, and environmental studies. Since users in these areas typically do not have extended formal computer training, it is often difficult for them to use advanced query languages. A visual notation for spatiotemporal patterns can help solve this problem. In particular, since spatial objects and their relationships have a natural graphical representation, a visual notation can in many cases express relationships implicitly where textual notations require the explicit application of operations and predicates. Based on our work on the visualization of spatiotemporal predicates, we will sketch the design of a visual language to formulate spatiotemporal patterns.