Efficient Large-scale Distance-Based Join Queries in SpatialHadoop
Efficient processing of Distance-Based Join Queries (DBJQs) in spatial databases is of paramount importance in many application domains. The best-known and most representative DBJQs are the K Closest Pairs Query (KCPQ) and the ε Distance Join Query (εDJQ). Over two spatial datasets, these join queries are characterized by either a desired number of pairs (K) or a distance threshold (ε) between the components of the pairs in the final result. Both are expensive operations, since two spatial datasets are combined under additional constraints. Given the increasing volume of spatial data originating from multiple sources and stored on distributed servers, it is not always efficient to perform DBJQs on a centralized server. For this reason, this paper addresses the problem of computing DBJQs on big spatial datasets in SpatialHadoop, an extension of Hadoop that supports efficient processing of spatial queries in a cloud-based setting. We propose novel plane-sweep-based algorithms to perform efficient parallel DBJQs on large-scale spatial datasets in SpatialHadoop. We evaluate the performance of the proposed algorithms in several situations with large real-world and synthetic datasets. The experiments demonstrate the efficiency and scalability of our proposed methodologies.
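To make the two query types concrete, here is a minimal single-machine sketch of plane-sweep εDJQ and KCPQ in Python. It assumes 2-D points given as `(x, y)` tuples and Euclidean distance, and the function names `eps_distance_join` and `k_closest_pairs` are hypothetical. It does not model SpatialHadoop's partitioning or MapReduce jobs; it only shows the per-partition sweep idea of sorting both inputs on x and skipping candidates whose x-gap already exceeds the current distance bound.

```python
import heapq
import math
from typing import List, Tuple

Point = Tuple[float, float]


def eps_distance_join(P: List[Point], Q: List[Point], eps: float) -> List[Tuple[Point, Point]]:
    """εDJQ: every pair (p, q) with dist(p, q) <= eps, found by sweeping along x."""
    P, Q = sorted(P), sorted(Q)
    result = []
    start = 0  # first index of Q that may still lie within eps of the current p on x
    for p in P:
        # P is sorted, so Q points left of p.x - eps can never match this or any later p.
        while start < len(Q) and Q[start][0] < p[0] - eps:
            start += 1
        j = start
        # Only Q points in the x-band [p.x - eps, p.x + eps] need an exact distance check.
        while j < len(Q) and Q[j][0] <= p[0] + eps:
            if math.dist(p, Q[j]) <= eps:
                result.append((p, Q[j]))
            j += 1
    return result


def k_closest_pairs(P: List[Point], Q: List[Point], k: int) -> List[Tuple[float, Point, Point]]:
    """KCPQ: the k pairs with the smallest distances, as (distance, p, q), nearest first."""
    if k <= 0:
        return []
    P, Q = sorted(P), sorted(Q)
    heap: List[Tuple[float, Point, Point]] = []  # max-heap of best pairs via negated distances

    def bound() -> float:
        # Distance of the current k-th best pair; infinite until k pairs have been seen.
        return -heap[0][0] if len(heap) == k else math.inf

    start = 0
    for p in P:
        # The bound only shrinks and p.x only grows, so skipped Q points stay irrelevant.
        while start < len(Q) and Q[start][0] < p[0] - bound():
            start += 1
        j = start
        while j < len(Q) and Q[j][0] <= p[0] + bound():
            d = math.dist(p, Q[j])
            if len(heap) < k:
                heapq.heappush(heap, (-d, p, Q[j]))
            elif d < bound():
                heapq.heapreplace(heap, (-d, p, Q[j]))
            j += 1
    return sorted((-neg_d, p, q) for neg_d, p, q in heap)
```

For εDJQ the sweep window has a fixed half-width ε; for KCPQ the window starts unbounded and narrows as the heap of the K best pairs fills, which is what makes the sweep cheaper than comparing all pairs.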
DRSP: Dimension Reduction For Similarity Matching And Pruning Of Time Series Data Streams
Similarity matching and joining of time series data streams has gained considerable
relevance given today's large volumes of streaming data. This process finds
wide-scale application in areas such as location tracking, sensor networks,
and object positioning and monitoring. However, as the size of the
data stream increases, so does the cost of retaining all the data needed
for similarity matching. We develop a novel framework
that addresses the following objectives. First, dimension reduction is
performed in the preprocessing stage: large stream data is segmented and
reduced into a compact representation that retains all the crucial
information, using a technique called Multi-level Segment Means (MSM). This reduces
the space complexity associated with storing large time-series data
streams. Second, the framework incorporates an effective similarity matching technique to
determine whether new data objects are similar to the existing data stream.
Finally, a pruning technique filters out pseudo data-object pairs
and joins only the relevant pairs. The computational cost of MSM is O(l*ni), and
the cost of pruning is O(DRF*wsize*d), where DRF is the Dimension Reduction
Factor. We have performed exhaustive experimental trials to show that the
proposed framework is both efficient and competitive in comparison with earlier
works. Comment: 20 pages, 8 figures, 6 tables
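The abstract does not define MSM's levels or the pruning bound, so the following Python sketch fills them in with assumptions. Level j (starting at 0) splits a window of length w into 2**j equal segments and stores their means, a piecewise-aggregate-style summary. Pruning uses the standard fact that, for a segment of length s, s * (mean_a - mean_b)**2 never exceeds that segment's summed squared differences, so the scaled distance between mean vectors lower-bounds the true Euclidean distance. The helper names (`segment_means`, `msm`, `lower_bound`, `is_match`) are hypothetical, and this is not necessarily the authors' MSM or cost model.

```python
import math
from typing import List, Sequence


def segment_means(x: Sequence[float], n_segments: int) -> List[float]:
    """Split x into n_segments equal-length pieces and return the mean of each."""
    if len(x) % n_segments:
        raise ValueError("window length must be divisible by the number of segments")
    s = len(x) // n_segments
    return [sum(x[i * s:(i + 1) * s]) / s for i in range(n_segments)]


def msm(x: Sequence[float], levels: int) -> List[List[float]]:
    """Hypothetical multi-level summary: level j holds 2**j segment means (coarse to fine)."""
    return [segment_means(x, 2 ** j) for j in range(levels)]


def lower_bound(ma: Sequence[float], mb: Sequence[float], seg_len: int) -> float:
    """Lower bound on the Euclidean distance of two windows, from their segment means."""
    # Per segment, seg_len * (mean_a - mean_b)**2 <= sum of squared point differences.
    return math.sqrt(seg_len * sum((a - b) ** 2 for a, b in zip(ma, mb)))


def is_match(x: Sequence[float], y: Sequence[float], eps: float, levels: int) -> bool:
    """Coarse-to-fine filter, then an exact check on the surviving candidate pair."""
    rx, ry = msm(x, levels), msm(y, levels)  # in a stream these would be precomputed and stored
    for j in range(levels):
        if lower_bound(rx[j], ry[j], len(x) // 2 ** j) > eps:
            return False  # pruned using only the compact summaries
    return math.dist(x, y) <= eps
```

Because the segments at each level nest inside those of the previous level, each finer level can only tighten the bound, so most dissimilar pairs are discarded at the coarsest levels and the exact O(w) distance is computed only for pairs that survive every level.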
RFID-Based Indoor Spatial Query Evaluation with Bayesian Filtering Techniques
People spend a significant amount of time in indoor spaces (e.g., office
buildings and subway systems) in their daily lives. Therefore, it is
important to develop efficient indoor spatial query algorithms for supporting
various location-based applications. However, indoor spaces differ from outdoor
spaces because users have to follow the indoor floor plan for their movements.
In addition, positioning in indoor environments is mainly based on sensing
devices (e.g., RFID readers) rather than GPS devices. Consequently, we cannot
apply existing spatial query evaluation techniques devised for outdoor
environments to this new challenge. Because Bayesian filtering techniques can
be employed to estimate the state of a system that changes over time from a
sequence of noisy measurements made on the system, in this research we propose
Bayesian filtering-based location inference methods as the basis for
evaluating indoor spatial queries with noisy raw RFID data. Furthermore, two
novel models, the indoor walking graph model and the anchor point indexing model, are
created for tracking object locations in indoor environments. Based on the
inference method and tracking models, we develop innovative indoor range and
k-nearest-neighbor (kNN) query algorithms. We validate our solution using
both synthetic and real-world data. Our experimental results show that
the proposed algorithms can evaluate indoor spatial queries effectively and
efficiently. We open-source the code, data, and floor plan at
https://github.com/DataScienceLab18/IndoorToolKit
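As a rough illustration of Bayesian location inference over a walking graph, the Python sketch below runs a discrete (histogram) Bayes filter over graph nodes and then answers a probabilistic range query by summing posterior mass inside the query region. Everything concrete here is an assumption rather than the paper's models: the toy graph, the motion model (stay with probability `p_stay`, otherwise move uniformly to a neighbour), the reader model (detection probability `p_hit` inside a reader's coverage, false-positive rate `p_false` elsewhere), and the function names `predict`, `update`, and `range_query`. The paper's filters, walking graph model, and anchor point index may differ substantially.

```python
from typing import Dict, List, Set

Belief = Dict[str, float]  # node id -> probability that the object is at that node


def predict(belief: Belief, graph: Dict[str, List[str]], p_stay: float = 0.5) -> Belief:
    """Motion step: stay put with p_stay, otherwise move to a uniformly chosen neighbour."""
    new = {n: 0.0 for n in graph}
    for n, p in belief.items():
        nbrs = graph[n]
        if not nbrs:
            new[n] += p  # isolated node: the object cannot leave
            continue
        new[n] += p * p_stay
        for m in nbrs:
            new[m] += p * (1 - p_stay) / len(nbrs)
    return new


def update(belief: Belief, detected_by: Set[str], coverage: Dict[str, Set[str]],
           p_hit: float = 0.9, p_false: float = 0.05) -> Belief:
    """Measurement step: weight each node by the likelihood of this set of RFID detections."""
    post = {}
    for n, p in belief.items():
        like = 1.0
        for reader, covered in coverage.items():
            p_detect = p_hit if n in covered else p_false
            like *= p_detect if reader in detected_by else 1 - p_detect
        post[n] = p * like
    z = sum(post.values())
    # If the reading has zero likelihood everywhere, keep the prior instead of dividing by 0.
    return {n: v / z for n, v in post.items()} if z > 0 else belief


def range_query(beliefs: Dict[str, Belief], region: Set[str], threshold: float) -> Dict[str, float]:
    """Objects whose probability of being inside region is at least threshold."""
    result = {}
    for obj, belief in beliefs.items():
        prob = sum(belief.get(n, 0.0) for n in region)
        if prob >= threshold:
            result[obj] = prob
    return result
```

For example, on a three-node corridor with nodes A, B, and C, where reader r1 covers A and reader r2 covers C, a uniform prior followed by one `predict` and an `update` with `detected_by={"r2"}` concentrates roughly 0.9 of the probability mass at C.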