
623 research outputs found

Speed Partitioning for Indexing Moving Objects

Author: CS Jensen; D Sidlauskas; J Dittrich; J Schiller; M Zhang; MA Nascimento; ML Yiu; RA Finkel; S Chen; T Brinkhoff; T Nguyen; Y Zhu; YN Silva
Publication date: 22/04/2015

Indexing moving objects has been extensively studied in the past decades. Moving objects, such as vehicles and mobile device users, usually exhibit patterns in their velocities, which can be exploited by velocity-based partitioning to improve index performance. Existing velocity-based partitioning techniques rely on heuristics rather than analytically computing the optimal solution. In this paper, we propose a novel speed partitioning technique based on a formal analysis of the speed values of the moving objects. We first show that speed partitioning significantly reduces search space expansion, which directly affects the query performance of the indexes. Next, we formulate the optimal speed partitioning problem based on search space expansion analysis and compute the optimal solution using dynamic programming. We then build the partitioned indexing system, in which queries are duplicated and processed in each index partition. Extensive experiments demonstrate that our method dramatically improves the performance of indexes for moving objects and outperforms other state-of-the-art velocity-based partitioning approaches.
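
The dynamic programming step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's cost model, so `partition_cost` below is a hypothetical stand-in (group size times the square of the group's maximum speed), and the function names are assumptions made for illustration only. The sketch shows the general shape of the technique: sort the speeds, then choose contiguous cut points that minimise the total cost over `k` groups.

```python
def partition_cost(speeds, i, j):
    """Hypothetical cost of indexing sorted speeds[i:j] together.

    Assumption: cost = group size * (maximum speed in the group) ** 2.
    This is NOT the paper's search space expansion formula.
    """
    return (j - i) * speeds[j - 1] ** 2


def optimal_speed_partitions(speeds, k):
    """Split speeds into k contiguous groups (after sorting) with minimal total cost."""
    speeds = sorted(speeds)
    n = len(speeds)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of objects")
    inf = float("inf")
    # best[p][j]: minimal cost of splitting the first j speeds into p groups
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    # cut[p][j]: start position of the last group in that optimal split
    cut = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for p in range(1, k + 1):
        for j in range(p, n + 1):
            for i in range(p - 1, j):
                candidate = best[p - 1][i] + partition_cost(speeds, i, j)
                if candidate < best[p][j]:
                    best[p][j] = candidate
                    cut[p][j] = i
    # Walk the cut table backwards to recover the groups
    groups = []
    j = n
    for p in range(k, 0, -1):
        i = cut[p][j]
        groups.append(speeds[i:j])
        j = i
    groups.reverse()
    return best[k][n], groups
```

With speeds `[10, 1, 1, 10, 1]` and `k = 2`, the sketch separates the slow objects from the fast ones, since any other cut charges the slow objects at the high speed.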

arXiv.org e-Print Archive

CiteSeerX

Storage Solutions for Big Data Systems: A Qualitative Study and Comparison

Author: Alam Mansaf; Ali Syed Arshad; Khan Samiya; Liu Xiufeng
Publication date: 01/01/2019

Big data systems development is full of challenges in view of the variety of application areas and domains that this technology promises to serve. Typically, fundamental design decisions in big data systems design include choosing appropriate storage and computing infrastructures. In this age of heterogeneous systems that integrate different technologies to produce an optimized solution to a specific real-world problem, big data systems are no exception. As far as the storage aspect of a big data system is concerned, the primary facet is the storage infrastructure, and NoSQL seems to be the technology that fulfills its requirements. However, every big data application has different data characteristics, and thus the corresponding data fits a different data model. This paper presents a feature and use case analysis and comparison of the four main data models, namely document-oriented, key-value, graph and wide-column. Moreover, a feature analysis of 80 NoSQL solutions is provided, elaborating on the criteria and points that a developer must consider when making a choice. Typically, big data storage needs to communicate with the execution engine and other processing and visualization technologies to create a comprehensive solution. This brings the second facet of big data storage, big data file formats, into the picture. The second half of the paper compares the advantages, shortcomings and possible use cases of available big data file formats for Hadoop, which is the foundation for most big data computing technologies. Decentralized storage and blockchain are seen as the next generation of big data storage, and their challenges and future prospects are also discussed.
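
As a rough illustration of the four data models named in the abstract, the sketch below shapes one small record four ways using plain Python structures. The record, the function names and the key layouts are all assumptions chosen for illustration; they do not correspond to any particular NoSQL product or to the paper's own examples.

```python
import json


def to_key_value(order):
    # Key-value: one opaque value per key; the store cannot query inside it.
    return {f"order:{order['order_id']}": json.dumps(order, sort_keys=True)}


def to_document(order):
    # Document-oriented: nested fields stay visible and queryable.
    return {
        "_id": order["order_id"],
        "customer": order["customer"],
        "items": list(order["items"]),
    }


def to_wide_column(order):
    # Wide-column: row key -> column family -> column -> value.
    return {
        order["order_id"]: {
            "meta": {"customer": order["customer"]},
            "items": {str(i): item for i, item in enumerate(order["items"])},
        }
    }


def to_graph(order):
    # Graph: entities become nodes, relationships become labelled edges.
    order_node = f"order:{order['order_id']}"
    customer_node = f"customer:{order['customer']}"
    edges = [(customer_node, "PLACED", order_node)]
    edges += [(order_node, "CONTAINS", f"item:{item}") for item in order["items"]]
    nodes = {node for src, _, dst in edges for node in (src, dst)}
    return nodes, edges
```

The same order therefore ends up as a single serialized blob, a nested document, a sparse row with column families, or a set of nodes and edges, which is one way to see why data characteristics drive the choice of model.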

arXiv.org e-Print Archive

A java framework for object detection and tracking, 2007


Object detection and tracking is an important problem in the automated analysis of video, and numerous approaches and technological advances have addressed it. Because it is one of the most challenging and active research areas, more algorithms will be proposed in the future. Consequently, there will be demand for a system that can effectively collect, organize, group, document and implement these approaches. The purpose of this thesis is to develop a uniform object detection and tracking framework capable of detecting and tracking multiple objects in the presence of occlusion. The object detection and tracking algorithms are classified into categories and incorporated into the framework, which is implemented in Java. The framework can adapt to different types and different application domains, and is easy and convenient for developers to reuse. It also provides comprehensive descriptions of representative methods in each category, along with examples, so that developers or users who require a tracker for a certain application can select the most suitable tracking algorithm for their particular needs.

DigitalCommons@Robert W. Woodruff Library

IMPROVING EFFICIENCY AND SCALABILITY IN VISUAL SURVEILLANCE APPLICATIONS

Author: Dondera Radu
Publication date: 01/01/2013

We present four contributions to visual surveillance: (a) an action recognition method based on the characteristics of human motion in image space; (b) a study of the strengths of five regression techniques for monocular pose estimation that highlights the advantages of kernel PLS; (c) a learning-based method for detecting objects carried by humans requiring minimal annotation; (d) an interactive video segmentation system that reduces supervision by using occlusion and long term spatio-temporal structure information. We propose a representation for human actions that is based solely on motion information and that leverages the characteristics of human movement in the image space. The representation is best suited to visual surveillance settings in which the actions of interest are highly constrained, but also works on more general problems if the actions are ballistic in nature. Our computationally efficient representation achieves good recognition performance on both a commonly used action recognition dataset and on a dataset we collected to simulate a checkout counter. We study discriminative methods for 3D human pose estimation from single images, which build a map from image features to pose. The main difficulty with these methods is the insufficiency of training data due to the high dimensionality of the pose space. However, real datasets can be augmented with data from character animation software, so the scalability of existing approaches becomes important. We argue that Kernel Partial Least Squares approximates Gaussian Process regression robustly, enabling the use of larger datasets, and we show in experiments that kPLS outperforms two state-of-the-art methods based on GP. The high variability in the appearance of carried objects suggests using their relation to the human silhouette to detect them. 
We adopt a generate-and-test approach that produces candidate regions from protrusion, color contrast and occlusion boundary cues and then filters them with a kernel SVM classifier on context features. Our method exceeds state of the art accuracy and has good generalization capability. We also propose a Multiple Instance Learning framework for the classifier that reduces annotation effort by two orders of magnitude while maintaining comparable accuracy. Finally, we present an interactive video segmentation system that trades off a small amount of segmentation quality for significantly less supervision than necessary in systems in the literature. While applications like video editing could not directly use the output of our system, reasoning about the trajectories of objects in a scene or learning coarse appearance models is still possible. The unsupervised segmentation component at the base of our system effectively employs occlusion boundary cues and achieves competitive results on an unsupervised segmentation dataset. On videos used to evaluate interactive methods, our system requires less interaction time than others, does not rely on appearance information and can extract multiple objects at the same time

Automatic object classification for surveillance videos.

Author: Fernandez Arguedas Virginia
Publication venue: 'Queen Mary University of London'
Publication date: 01/01/2012

PhD thesis. The recent popularity of surveillance video systems, especially in urban scenarios, demands the development of visual techniques for monitoring purposes. A primary step towards intelligent surveillance video systems is automatic object classification, which remains an open research problem and is the keystone for the development of more specific applications. Typically, object representation is based on inherent visual features. However, psychological studies have demonstrated that human beings routinely categorise objects according to their behaviour. The gap between the features automatically extracted by a computer, such as appearance-based features, and the concepts unconsciously perceived by human beings but unattainable for machines, such as behaviour features, is commonly known as the semantic gap. Consequently, this thesis proposes to narrow the semantic gap and bring together machine and human understanding for object classification. A Surveillance Media Management framework is proposed to automatically detect and classify objects by analysing the physical properties inherent in their appearance (machine understanding) and the behaviour patterns that require a higher level of understanding (human understanding). A probabilistic multimodal fusion algorithm then bridges the gap by performing an automatic classification that considers both machine and human understanding. The performance of the proposed Surveillance Media Management framework has been thoroughly evaluated on outdoor surveillance datasets. The experiments demonstrated that combining machine and human understanding substantially enhanced object classification performance. Finally, the inclusion of human reasoning and understanding provides the essential information to bridge the semantic gap towards smart surveillance video systems.
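
The abstract does not say which rule the probabilistic multimodal fusion algorithm uses. As a minimal sketch only, the code below substitutes a standard product-rule fusion, which assumes the appearance and behaviour classifiers are conditionally independent given the class. The function name `fuse_posteriors`, the class labels and the probabilities are hypothetical.

```python
def fuse_posteriors(appearance, behaviour, prior=None):
    """Fuse two per-class posterior estimates with the product rule.

    Assumption: both modalities are conditionally independent given the
    class, so P(c | a, b) is proportional to P(c | a) * P(c | b) / P(c).
    """
    classes = sorted(appearance)
    if sorted(behaviour) != classes:
        raise ValueError("both modalities must score the same classes")
    if prior is None:
        # Uniform class prior unless the caller supplies one
        prior = {c: 1.0 / len(classes) for c in classes}
    scores = {c: appearance[c] * behaviour[c] / prior[c] for c in classes}
    total = sum(scores.values())
    if total == 0:
        raise ValueError("the modalities assign zero probability to every class")
    return {c: score / total for c, score in scores.items()}
```

For instance, an appearance score of 0.6 for "person" combined with a behaviour score of 0.9 yields a fused score above either input, reflecting agreement between the two modalities.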

BigSQLTraj: A SQL-extended framework for storing & querying big mobility data

Author: Petrou Petros (Πέτρου Πέτρος)
Publication date: 01/01/2019

In recent years, the widespread use of sensors and smart devices has led to an exponential growth in mobility data, which fall into the category of big data. For example, routing applications, traffic flow monitoring, fleet management, and even hazard prediction or avoidance rely on processing spatial and spatio-temporal data. These data must be stored and processed appropriately so that they can subsequently become knowledge for organizations. Clearly, this process requires systems and technologies suited to the large volume of input data. In this thesis we used data from vessel movements, specifically data produced by the automatic identification system (AIS). For the purposes of this thesis we developed BigSQLTraj: a SQL-based framework for storing and querying big data from moving objects. Big data applications comprise the layers of data management, processing, analytics and visualization over heterogeneous sources, on either historical or streaming data. In this thesis we examine the management and processing layers for big historical data. The goal of the system is to enable users to efficiently store and process large geospatial and spatio-temporal data on top of a distributed system, extending or reproducing methods and algorithms from existing systems. The first goal of the work is to select tools that can communicate with each other and present a unified view to external users.
The innovations the system provides are methods for partitioning the data across the nodes of the computer cluster that are load-balanced yet similarity-based, and a SQL interface to the distributed system that provides advanced methods for processing the stored data and allows systems that already interact with SQL-based systems to migrate to big data technologies with the fewest possible changes. The first goal of this thesis is the integration of various technologies. The implementation is based on open-source libraries for big data processing: Apache Hadoop, Apache Spark, Apache Hive and Apache Tez. The main functions provided by Apache Hadoop are the distributed file system (Hadoop Distributed File System), where the data are written and read, and the resource manager (YARN), which monitors the workload of the cluster's machines and assigns the tasks to be executed. These two tools form the pillar of the integration among the cluster's machines and among the libraries running on the cluster. Apache Spark, through the MapReduce programming framework, provides processing of either historical data or data streams and their storage in Hadoop's distributed file system. Apache Hive then lets us execute queries on files residing in Hadoop's distributed file system through the HiveQL language, which is equivalent to traditional SQL, while Apache Spark and Apache Tez serve as the execution engine of a HiveQL query and translate the query into a MapReduce process.
None of the above systems can process geospatial or mobility data in its base version. The additions made include: 1) functions for cleaning spatio-temporal points and building trajectories of moving objects from these points with Apache Spark; 2) spatio-temporal partitioning of the trajectories across the cluster's machines and creation of indexes, which comprise the spatio-temporal extent of the partitioned information and an encoding based on three-dimensional local indexes built from the information held by each machine, using Apache Spark and Apache Hadoop; 3) suitable methods that exploit the storage of the previous step for range queries and similarity (kNN) queries. The comparison we performed concerns the time performance of range queries and kNN queries, depending on how the data are stored as described above. First, we compared the completion time of these queries for the available storage methods and the available execution engines as a function of the number of machines running in the distributed system (scalability). Then we compared the completion time of these queries for the available storage methods and execution engines as a function of data volume (speed-up), increasing the data volume at each step.
Our results showed that the most efficient way to execute the queries is to use an index over the partitioned information and then an encoding based on local indexes to retrieve the final result, with Apache Spark as the execution engine.
In recent decades, the need to perform advanced queries over massively produced data, such as mobility traces, in efficient and scalable ways has become particularly important. This thesis describes BigSQLTraj, a framework that supports efficient storing, partitioning, indexing and querying of spatial and spatio-temporal (i.e. mobility) data over a distributed engine. Every end-to-end big data application consists of four layers: data management, data processing, data analytics and data visualization, over heterogeneous data sources and for batch or streaming data. This thesis focuses on data management and data processing for historical data. The first goal is to find systems that offer ready-to-use integration pipelines, so as to take advantage of the best operation of each tool. For our implementation we chose open-source big data frameworks: Apache Hadoop, Apache Spark, Apache Hive and Apache Tez. Apache Hadoop, and especially its distributed file system (HDFS), gives all the other libraries a common read and write layer, while Hadoop's resource manager (YARN) exploits all the available computing resources. BigSQLTraj extends the functionality of existing spatial or spatio-temporal systems, centralized or distributed, to create two core, independent components. The first component is responsible for storing, spatio-temporally partitioning and indexing the data in a distributed file system, and it is implemented on top of Apache Spark.
Many spatio-temporal partitioners and a 3D-STRtree index are implemented to support a collection of operators, in addition to existing partitioners and indexing methods inherited from state-of-the-art distributed spatial and spatio-temporal systems. The second component is a distributed SQL engine. We extend the functionality of HiveQL to achieve rapid access to, and storage of, this kind of data (i.e. geospatial and mobility data). Our final goal is to optimize Hive's join procedure, which is required for both query types, using the data structures from the first component. We demonstrate the functionality of our approach and conduct an extensive experimental study based on state-of-the-art benchmarks for mobility data. Our benchmark focuses on the total execution time of range queries and kNN queries as a function of the data storage model. First, we compare the time performance of different storage alternatives and execution engines on the entire dataset, varying the number of workers in order to assess the system's scalability. Furthermore, we vary the size of the dataset and measure the execution time of the queries. To study the effect of dataset size, we split the original dataset into 5 chunks (20%, 40%, 60%, 80%, 100%). Based on the results, we conclude that the best workflow includes a global index structure for worker metadata and a local index-based encoding that stores the entire trajectories of a partition in a single column, and that the execution time appears to grow linearly.
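
The two-level workflow described above (a global index over partitions, then a local lookup inside the surviving partitions) can be sketched for a range query as follows. This is an illustration under stated assumptions, not BigSQLTraj's implementation: the global index is a plain dictionary of spatio-temporal bounding boxes, the local 3D-STRtree is replaced by a linear scan, and all names (`Box`, `build_global_index`, `range_query`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned spatio-temporal box over (x, y, t)."""
    lo: tuple
    hi: tuple

    def intersects(self, other):
        # Boxes overlap only if their intervals overlap on every axis
        return all(
            a_lo <= b_hi and b_lo <= a_hi
            for a_lo, a_hi, b_lo, b_hi in zip(self.lo, self.hi, other.lo, other.hi)
        )

    def contains(self, point):
        return all(lo <= v <= hi for lo, v, hi in zip(self.lo, point, self.hi))


def bounding_box(points):
    """Smallest box enclosing a non-empty list of (x, y, t) points."""
    axes = list(zip(*points))
    return Box(tuple(min(a) for a in axes), tuple(max(a) for a in axes))


def build_global_index(partitions):
    """Global index: partition id -> spatio-temporal extent of its data."""
    return {pid: bounding_box(points) for pid, points in partitions.items()}


def range_query(partitions, global_index, query):
    """Return matching points and the ids of the partitions actually scanned."""
    hits, scanned = [], []
    for pid, extent in global_index.items():
        if not extent.intersects(query):
            continue  # pruned by the global index, never read
        scanned.append(pid)
        # Assumption: linear scan stands in for the local index lookup
        hits.extend(p for p in partitions[pid] if query.contains(p))
    return hits, scanned
```

In a distributed setting the pruning step is what saves work: partitions whose extent misses the query box are never read, so only the remaining workers perform a local lookup.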