LogBase: A Scalable Log-structured Database System in the Cloud
Numerous applications such as financial transactions (e.g., stock trading)
are write-heavy in nature. The shift from reads to writes in web applications
has also been accelerating in recent years. Write-ahead-logging is a common
approach for providing recovery capability while improving performance in most
storage systems. However, maintaining the log separately from the application
data incurs additional write overhead, which in write-heavy environments
adversely affects both the write throughput and the recovery time of the
system. In this paper, we
introduce LogBase - a scalable log-structured database system that adopts
log-only storage for removing the write bottleneck and supporting fast system
recovery. LogBase is designed to be dynamically deployed on commodity clusters
to take advantage of the elastic scaling property of cloud environments. LogBase
provides in-memory multiversion indexes for supporting efficient access to data
maintained in the log. LogBase also supports transactions that bundle read and
write operations spanning across multiple records. We implemented the proposed
system and compared it with HBase and a disk-based log-structured
record-oriented system modeled after RAMCloud. The experimental results show
that LogBase is able to provide sustained write throughput, efficient data
access out of the cache, and effective system recovery.
Comment: VLDB201
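The abstract does not include code, but the core idea of log-only storage with an in-memory multiversion index can be sketched roughly as follows. This is a minimal illustrative sketch, assuming a length-prefixed pickle record format; the class name and encoding are my own and not LogBase's actual design:

```python
import pickle
from collections import defaultdict

class LogOnlyStore:
    """Minimal sketch of log-only storage: the append-only log is the
    only copy of the data, and an in-memory multiversion index maps
    each key to all of its versions as (timestamp, log offset) pairs."""

    def __init__(self, path):
        self.index = defaultdict(list)   # key -> [(ts, offset), ...]
        self.log = open(path, "ab+")

    def put(self, key, value, ts):
        self.log.seek(0, 2)              # append at the end of the log
        offset = self.log.tell()
        rec = pickle.dumps((key, ts, value))
        # Length-prefix each record so the log can be rescanned
        # sequentially to rebuild the index on recovery.
        self.log.write(len(rec).to_bytes(4, "big") + rec)
        self.log.flush()
        self.index[key].append((ts, offset))

    def get(self, key, ts=None):
        versions = self.index.get(key)
        if not versions:
            return None
        # Newest version overall, or the newest no later than ts.
        chosen = versions[-1] if ts is None else max(
            (v for v in versions if v[0] <= ts), default=None)
        if chosen is None:
            return None
        self.log.seek(chosen[1])
        size = int.from_bytes(self.log.read(4), "big")
        _, _, value = pickle.loads(self.log.read(size))
        return value
```

Writes touch only the log, with no separate data store to update, which is the write bottleneck the paper targets; recovery would replay the log once to rebuild the in-memory index.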
Benchmarking and improving point cloud data management in MonetDB
The popularity, availability and sizes of point cloud data sets are increasing, thus raising interesting data
management and processing challenges. Various software solutions are available for the management of
point cloud data. A benchmark for point cloud data management systems was defined and it was executed
for several solutions. In this paper we focus on the solutions based on the column-store MonetDB, the
generic out-of-the-box approach is compared with two alternative approaches that exploit the spatial
coherence of the data to improve the data access and to minimize the storage requirement
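The two alternative MonetDB approaches are not detailed in the abstract. One standard way to exploit spatial coherence in point cloud storage, shown here purely as an illustrative sketch rather than the paper's method, is to sort points by a Morton (Z-order) code so that spatially nearby points end up adjacent on disk:

```python
def interleave_bits(x, y, bits=16):
    """Interleave the bits of two non-negative integer coordinates into
    one Morton (Z-order) code: x occupies the even bit positions and y
    the odd ones, so nearby points tend to get nearby codes."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code

def morton_sort(points, scale=100):
    """Quantize float (x, y) coordinates to a grid, then order the
    points by their Morton code to cluster spatially coherent data."""
    return sorted(points,
                  key=lambda p: interleave_bits(int(p[0] * scale),
                                                int(p[1] * scale)))
```

Storing points in this order improves the locality of range queries and makes the data more compressible, which is the kind of benefit the spatially-aware approaches in the paper aim for.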
Large spatial datasets: present challenges, future opportunities
A key advantage of a well-designed multidimensional database is its ability to let as many users as possible across an organisation simultaneously access and view the same data. Large spatial datasets arise from recent scientific activities that tend to generate databases approaching terabyte scale, and which in most cases are multidimensional. In this paper, we look at the issues pertaining to large spatial datasets: their features (for example, views), architecture, access methods and, most importantly, design technologies. We also look at some ways of improving the performance of existing algorithms for managing large spatial datasets. The study reveals that the major challenges militating against effective management of large spatial datasets are storage utilisation and computational complexity, both of which are driven by the size of spatial big data, which now tends to exceed the capacity of commonly used spatial computing systems owing to its volume, variety and velocity. These problems can fortunately be combated by employing functional programming methods or parallelization techniques.
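As a rough illustration of the parallelization techniques the study recommends (the grid partitioning and the per-cell centroid computation are hypothetical examples, not taken from the paper), a large spatial dataset can be split into independent grid cells that are processed concurrently:

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def partition_by_grid(points, cell_size):
    """Bucket 2-D points into square grid cells; each cell can then be
    processed independently, which is what makes parallelization easy."""
    cells = defaultdict(list)
    for x, y in points:
        cells[(int(x // cell_size), int(y // cell_size))].append((x, y))
    return cells

def cell_centroid(pts):
    # Example per-cell aggregate: the mean position of the cell's points.
    n = len(pts)
    return (sum(p[0] for p in pts) / n, sum(p[1] for p in pts) / n)

def parallel_centroids(points, cell_size, workers=4):
    """Partition the dataset spatially, then aggregate each partition
    concurrently and collect the per-cell results."""
    cells = partition_by_grid(points, cell_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(cell_centroid, cells.values())
    return dict(zip(cells.keys(), results))
```

The same partition-then-aggregate pattern scales to process pools or cluster frameworks when a single machine's capacity is exceeded.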
Trajectory-Based Spatiotemporal Entity Linking
Trajectory-based spatiotemporal entity linking is the task of matching the
same moving object across different datasets based on its movement traces. It is a
fundamental step to support spatiotemporal data integration and analysis. In
this paper, we study the problem of spatiotemporal entity linking using
effective and concise signatures extracted from their trajectories. This
linking problem is formalized as a k-nearest neighbor (k-NN) query on the
signatures. Four representation strategies (sequential, temporal, spatial, and
spatiotemporal) and two quantitative criteria (commonality and unicity) are
investigated for signature construction. A simple yet effective dimension
reduction strategy is developed together with a novel indexing structure called
the WR-tree to speed up the search. A number of optimization methods are
proposed to improve the accuracy and robustness of the linking. Our extensive
experiments on real-world datasets verify the superiority of our approach over
the state-of-the-art solutions in terms of both accuracy and efficiency.
Comment: 15 pages, 3 figures, 15 tables
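The abstract's formulation, a k-NN query over trajectory signatures, can be illustrated with a toy sketch. The grid-cell frequency signature and cosine similarity below are hypothetical stand-ins for the paper's four representation strategies, and the linear scan stands in for the WR-tree index:

```python
import math
from collections import Counter

def spatial_signature(traj, cell=1.0):
    """A toy spatial signature: the relative frequency with which a
    trajectory of (x, y, t) points visits each grid cell."""
    counts = Counter((int(x // cell), int(y // cell)) for x, y, _t in traj)
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()}

def cosine(a, b):
    """Cosine similarity between two sparse signatures (dicts)."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def knn(query_sig, candidate_sigs, k=1):
    """Brute-force k-NN over signatures: rank candidates by similarity
    to the query. An index such as the paper's WR-tree would replace
    this linear scan to speed up the search."""
    ranked = sorted(candidate_sigs.items(),
                    key=lambda kv: cosine(query_sig, kv[1]), reverse=True)
    return [name for name, _ in ranked[:k]]
```

Linking then amounts to extracting a signature for the query object and returning its k nearest signatures in the other dataset.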