303 research outputs found
A distributed multi-threaded data partitioner with space-filling curve orders
The problem discussed in this thesis is distributed data partitioning and data re-ordering on many-core architectures. We present an extensive literature survey, with examples from various application domains: scientific computing, databases, and large-scale graph processing. We propose a low-overhead, geometry-based partitioning framework that can partition multi-dimensional data with two or more dimensions. The partitioner linearly orders items with good spatial locality, and partial output is stored on each process in the communication group. Space-filling curves are used to permute the data; Morton order is the default curve, and for three or fewer dimensions the framework can also generate Hilbert-like curves. Partitioning overhead is assessed with two interdependent metrics: memory consumption and execution time. The focus of this thesis is to reduce partitioning overheads as much as possible, and we describe several optimizations to this end: incremental adjustments to partitions, careful dynamic memory management, and exploiting multi-threading and multi-processing. The quality of partitions is an important criterion for evaluating a partitioner. We use graph partitioners as baseline implementations against which our partitions are compared. The degree and edge-cuts of our partitions are comparable to those of graph partitions for regular grids; for irregular meshes, there is still room for improvement. No comparisons were made for partitions of datasets without edges. We have deployed these partitions in two large applications: atmosphere simulation in 2D and adaptive mesh refinement in 3D. An adaptive mesh refinement benchmark was built as part of the framework and later became a test case for evaluating partitions and load-balancing schemes. The performance of this benchmark is discussed in detail in the last chapter.
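Morton order, the framework's default space-filling curve, linearizes multi-dimensional coordinates by interleaving their bits. A minimal 2D sketch (illustrative only, not the thesis's implementation; the function name and 32-bit-per-axis assumption are mine):

```python
def morton2d(x: int, y: int) -> int:
    """Interleave the bits of x and y to form the 2D Morton (Z-order) index.

    Assumes non-negative coordinates that fit in 32 bits per axis.
    """
    code = 0
    for i in range(32):
        code |= ((x >> i) & 1) << (2 * i)      # x occupies the even bits
        code |= ((y >> i) & 1) << (2 * i + 1)  # y occupies the odd bits
    return code
```

Sorting items by their Morton code places spatially nearby items close together in the linear order, which is the locality property the partitioner relies on.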
Extending functional databases for use in text-intensive applications
This thesis continues research exploring the benefits of using functional
databases based around the functional data model for advanced database
applications, particularly those supporting investigative systems. This is a
growing generic application domain covering areas such as criminal and military
intelligence, which are characterised by significant data complexity, large data
sets, and the need for high-performance, interactive use. An experimental
functional database language was developed to provide the requisite semantic
richness. However, heavy use in a practical context has shown that language
extensions and implementation improvements are required, especially in the
crucial areas of string matching and graph traversal. In addition, an
implementation on multiprocessor, parallel architectures is essential to meet the
performance needs arising from existing and projected database sizes in the
chosen application area. [Continues.]
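The intelligence-analysis workloads described above lean heavily on graph traversal. As background only (the thesis's functional database language is not shown in this abstract, and all names below are mine), bounded breadth-first reachability can be sketched as:

```python
from collections import deque

def reachable(graph, start, max_depth):
    """Return all nodes reachable from `start` within `max_depth` hops.

    `graph` maps each node to an iterable of its neighbours.
    """
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_depth:
            continue  # do not expand past the hop limit
        for nb in graph.get(node, ()):
            if nb not in seen:
                seen.add(nb)
                frontier.append((nb, depth + 1))
    return seen
```

Queries of this shape ("everyone within two links of suspect X") are the kind of traversal an investigative database must execute interactively over large data sets.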
Weiterentwicklung analytischer Datenbanksysteme
This thesis contributes to the state of the art in analytical database systems. First, we identify and explore extensions to better support analytics on event streams. Second, we propose a novel polygon index to enable efficient geospatial data processing in main memory. Third, we contribute a new deep-learning approach to cardinality estimation, which is the core problem in cost-based query optimization.
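The abstract does not describe the polygon index itself; as background, the baseline operation such an index accelerates is the point-in-polygon test. A standard ray-casting sketch (all names are mine, and this is not the thesis's index structure):

```python
def point_in_polygon(pt, poly):
    """Ray-casting test: cast a horizontal ray from `pt` to the right and
    count how many polygon edges it crosses; an odd count means inside.

    `poly` is a list of (x, y) vertices in order, without a repeated endpoint.
    """
    x, y = pt
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # The edge straddles the ray's y-coordinate...
        if (y1 > y) != (y2 > y):
            # ...so compute where it crosses, and toggle if that is right of pt.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x_cross > x:
                inside = not inside
    return inside
```

An in-memory polygon index avoids running this linear-time test against every candidate polygon by pruning most of them first.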
Polylidar3D -- Fast Polygon Extraction from 3D Data
Flat surfaces captured by 3D point clouds are often used for localization,
mapping, and modeling. Dense point cloud processing has high computation and
memory costs making low-dimensional representations of flat surfaces such as
polygons desirable. We present Polylidar3D, a non-convex polygon extraction
algorithm which takes as input unorganized 3D point clouds (e.g., LiDAR data),
organized point clouds (e.g., range images), or user-provided meshes.
Non-convex polygons represent flat surfaces in an environment with interior
cutouts representing obstacles or holes. The Polylidar3D front-end transforms
input data into a half-edge triangular mesh. This representation provides a
common level of input data abstraction for subsequent back-end processing. The
Polylidar3D back-end is composed of four core algorithms: mesh smoothing,
dominant plane normal estimation, planar segment extraction, and finally
polygon extraction. Polylidar3D is shown to be quite fast, making use of CPU
multi-threading and GPU acceleration when available. We demonstrate
Polylidar3D's versatility and speed with real-world datasets including aerial
LiDAR point clouds for rooftop mapping, autonomous driving LiDAR point clouds
for road surface detection, and RGBD cameras for indoor floor/wall detection.
We also evaluate Polylidar3D on a challenging planar segmentation benchmark
dataset. Results consistently show excellent speed and accuracy.
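Polylidar3D's planar segment extraction operates on a half-edge triangular mesh, which is not reproduced here. As a generic illustration of the normal-based region growing that this stage performs (all names, the adjacency representation, and the 10-degree tolerance are my assumptions):

```python
import numpy as np

def segment_by_normal(normals, adjacency, seed, dominant, angle_tol_deg=10.0):
    """Greedy region grow: collect triangles connected to `seed` whose unit
    normals lie within `angle_tol_deg` of the dominant plane normal.

    `normals` is an (n, 3) array of triangle normals; `adjacency` maps a
    triangle index to the indices of its edge-sharing neighbours.
    """
    cos_tol = np.cos(np.radians(angle_tol_deg))
    dominant = dominant / np.linalg.norm(dominant)
    segment, stack, seen = [], [seed], {seed}
    while stack:
        t = stack.pop()
        n = normals[t] / np.linalg.norm(normals[t])
        if abs(np.dot(n, dominant)) >= cos_tol:
            segment.append(t)
            for nb in adjacency[t]:  # only expand through accepted triangles
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
    return sorted(segment)
```

Growing only through accepted triangles keeps each extracted segment connected, so a wall and a parallel wall across the room end up in separate segments even though their normals match.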
Efficient Algorithms for Large-Scale Image Analysis
This work develops highly efficient algorithms for analyzing large images. Applications include object-based change detection and screening. The algorithms are 10-100 times as fast as existing software, sometimes even outperforming FPGA/GPU hardware, because they are designed to suit the computer architecture. This thesis describes the implementation details and the underlying algorithm-engineering methodology so that both may also be applied to other applications.
- …