A computationally efficient framework for large-scale distributed fingerprint matching
A dissertation submitted to the Faculty of Science, University of the Witwatersrand, Johannesburg, in fulfilment of the requirements for the degree of Master of Science, School of Computer Science and Applied Mathematics. May 2017.
Biometric features are widely used in forensic and civil applications. Among the many kinds of biometric characteristics, the fingerprint is globally accepted and remains the most widely used by commercial and industrial communities due to its easy acquisition, uniqueness, stability and reliability.
Various effective solutions are currently available; however, fingerprint identification is still not considered a fully solved problem, mainly due to accuracy and computational time requirements. Although many minutiae-based fingerprint recognition systems provide good accuracy, systems with very large databases require fast, real-time comparison of fingerprints, and they often either fail to meet the speed requirements or compromise accuracy.
For fingerprint matching involving databases that contain millions of fingerprints, real-time identification can only be achieved by implementing optimal algorithms that use the given hardware as robustly and efficiently as possible. There is currently no known distributed database and computing framework that provides a real-time solution to the fingerprint recognition problem for databases containing as many as sixty million fingerprints, a size close to that of the South African population.
This research intends to serve two main purposes: 1) exploit and scale the best known minutiae matching algorithm to a minimum of sixty million fingerprints; and 2) design a distributed database framework for large fingerprint databases, based on the results obtained in the former item.
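To make the minutiae-matching setting concrete, the following Python sketch scores two fingerprints by greedily pairing minutiae under distance and angle thresholds. It is an illustrative toy, not the dissertation's algorithm: the (x, y, angle) representation, the thresholds, and the greedy pairing are all assumptions, and a deployed system would additionally align the prints and distribute the gallery across nodes.

```python
# Illustrative sketch only: a naive minutiae pairing score.
# Minutiae are (x, y, angle) tuples; thresholds are assumed values.
import math

DIST_THRESH = 15.0                 # max positional distance in pixels (assumed)
ANGLE_THRESH = math.radians(20)    # max orientation difference (assumed)

def angle_diff(a, b):
    """Smallest absolute difference between two angles in radians."""
    d = abs(a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)

def match_score(probe, candidate):
    """Greedily pair probe minutiae with compatible candidate minutiae.

    Returns the fraction of probe minutiae that found a mate,
    a crude similarity score in [0, 1].
    """
    unused = list(candidate)
    matched = 0
    for (x, y, th) in probe:
        best, best_d = None, None
        for m in unused:
            d = math.hypot(x - m[0], y - m[1])
            if d <= DIST_THRESH and angle_diff(th, m[2]) <= ANGLE_THRESH:
                if best_d is None or d < best_d:
                    best, best_d = m, d
        if best is not None:
            unused.remove(best)      # enforce one-to-one pairing
            matched += 1
    return matched / max(len(probe), 1)
```

In a distributed setting, the same score would be computed in parallel against partitions of the gallery spread over many nodes, with only the per-partition best matches sent back for a final ranking.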
Doctor of Philosophy
As the base of the software stack, system-level software is expected to provide efficient and scalable storage, communication, security and resource management functionalities. However, there are many computationally expensive functionalities at the system level, such as encryption, packet inspection, and error correction, all of which require substantial computing power. Moreover, today's application workloads have entered gigabyte and terabyte scales, which demand even more computing power. To meet the rapidly growing demand for computing power at the system level, this dissertation proposes using parallel graphics processing units (GPUs) in system software. GPUs excel at parallel computing and also show a much faster growth trend in parallel performance than central processing units (CPUs). However, system-level software was originally designed to be latency-oriented, whereas GPUs are designed for long-running computation and large-scale data processing and are therefore throughput-oriented. This mismatch makes it difficult to fit system-level software to GPUs. This dissertation presents generic principles of system-level GPU computing developed while creating our two general frameworks for integrating GPU computing into storage and network packet processing. The principles are generic design techniques and abstractions that deal with common system-level GPU computing challenges. They have been evaluated in concrete cases, including storage and network packet processing applications that have been augmented with GPU computing. The significant performance improvement found in the evaluation shows the effectiveness and efficiency of the proposed techniques and abstractions. This dissertation also presents a literature survey of the relatively young system-level GPU computing area, introducing the state of the art in both applications and techniques, as well as their future potential.
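One generic principle this framing suggests is bridging latency-oriented callers and a throughput-oriented device by batching requests before offloading them. The sketch below illustrates only that idea; the queue, the thresholds, and the stand-in gpu_process_batch function are assumptions, not the dissertation's actual storage or packet-processing frameworks.

```python
# A minimal sketch of request batching for GPU offload, under assumed names
# and thresholds. gpu_process_batch() stands in for a kernel launch that is
# fastest on large inputs (e.g. batched encryption or packet inspection).
import queue
import time

BATCH_SIZE = 256      # flush when this many requests are queued (assumed)
MAX_WAIT_S = 0.002    # or after this latency budget expires (assumed)

def gpu_process_batch(items):
    # Placeholder for a GPU kernel over the whole batch; here it just
    # simulates batched work on each payload.
    return [item[::-1] for item in items]

def batching_worker(requests: queue.Queue, results: dict):
    """Worker loop: accumulate (request_id, payload) pairs, flush in batches."""
    pending, deadline = [], None
    while True:
        try:
            rid, payload = requests.get(timeout=MAX_WAIT_S)
            pending.append((rid, payload))
            deadline = deadline or time.monotonic() + MAX_WAIT_S
        except queue.Empty:
            pass
        flush = len(pending) >= BATCH_SIZE or (
            pending and deadline and time.monotonic() >= deadline)
        if flush:
            ids, payloads = zip(*pending)
            for rid, out in zip(ids, gpu_process_batch(list(payloads))):
                results[rid] = out      # hand results back to the callers
            pending, deadline = [], None
```

The trade-off is explicit: a small amount of added per-request latency (the batching window) buys the large batch sizes a throughput-oriented device needs.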
Advancing Analytical Database Systems (Weiterentwicklung analytischer Datenbanksysteme)
This thesis contributes to the state of the art in analytical database systems. First, we identify and explore extensions to better support analytics on event streams. Second, we propose a novel polygon index to enable efficient geospatial data processing in main memory. Third, we contribute a new deep learning approach to cardinality estimation, which is the core problem in cost-based query optimization.
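As a rough illustration of the learned cardinality estimation idea (not the thesis's model), the sketch below encodes range predicates as a feature vector and regresses the logarithm of the result cardinality with a small neural network; the featurization, the model choice, and the placeholder training data are assumptions.

```python
# Illustrative sketch of learned cardinality estimation with assumed
# featurization: each column contributes its normalized (lo, hi) range bounds.
import numpy as np
from sklearn.neural_network import MLPRegressor

def featurize(predicates, n_columns):
    """predicates: {col_index: (lo, hi)} with bounds normalized to [0, 1]."""
    feats = np.zeros(2 * n_columns)
    for col, (lo, hi) in predicates.items():
        feats[2 * col] = lo
        feats[2 * col + 1] = hi
    return feats

# Training pairs (query features, true cardinality) would come from an
# executed workload; these two rows are placeholders only.
X = np.array([featurize({0: (0.1, 0.5)}, 3), featurize({1: (0.0, 0.9)}, 3)])
y = np.log1p(np.array([1200, 85000]))   # learn in log space to tame skew

model = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000).fit(X, y)
estimate = np.expm1(model.predict([featurize({0: (0.2, 0.4)}, 3)]))[0]
```

The optimizer would then feed such estimates into its cost model in place of histogram-based guesses.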
QCLab: a framework for query compilation on modern hardware platforms
As modern in-memory database systems achieve higher and higher processing speeds, the performance of memory becomes an increasingly limiting factor. Although there has been significant progress, the bottleneck has only shifted: while earlier systems were optimized for memory latencies, current systems are instead limited by memory bandwidth.
Query compilation is a proven technique for addressing bandwidth limitations. It translates queries via just-in-time compilation into native programs for the target hardware. The compiled queries execute with very high efficiency and with only a bare minimum of communication via memory. Despite these important improvements, the benefit of query compilation is limited in certain scenarios. On the one hand, query compilers typically use standard compiler technology with relatively long compilation times, so the overall execution time can be prolonged by the additional compilation time. On the other hand, not all emerging database technology is compatible with the approach. Query compilation uses a tuple-at-a-time processing style that departs from the column-at-a-time or vector-at-a-time approaches that in-memory systems typically use. Data-parallel processing techniques in particular, e.g. SIMD or co-processing techniques, are challenging to combine with the approach.
This work presents QCLab, a framework for query compilation on modern hardware platforms. The framework contains several new query compilation techniques that allow us to address these shortcomings and ultimately to extend the benefit of query compilation to new workloads and platforms. The techniques cover three aspects: compilation, communication, and processing. Together they serve as the basis for building highly efficient query compilers. They make efficient use of communication channels and of the large processing capacities of modern systems, and they were designed for practical use, enabling efficient processing even when workload characteristics are challenging.
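The following toy sketch conveys the code-generating, tuple-at-a-time style that query compilation relies on: a filter-plus-aggregation query is translated into a specialized function at runtime instead of being interpreted operator by operator. It is only a Python analogy under assumed names; QCLab's actual techniques (JIT compilation to native code, SIMD and co-processing integration) are not represented.

```python
# Toy query compiler: generate specialized source for one query, compile it
# once, then execute it many times. Column names and the query shape are
# illustrative assumptions.

def compile_query(filter_col, threshold, agg_col):
    src = f"""
def q(table):
    total = 0
    for row in table:                          # tuple-at-a-time loop
        if row[{filter_col!r}] > {threshold}:  # predicate inlined as a constant
            total += row[{agg_col!r}]
    return total
"""
    namespace = {}
    exec(compile(src, "<generated-query>", "exec"), namespace)
    return namespace["q"]

# SELECT SUM(b) FROM t WHERE a > 10, compiled once, reused for each batch.
q = compile_query("a", 10, "b")
print(q([{"a": 12, "b": 3}, {"a": 5, "b": 7}, {"a": 20, "b": 1}]))  # -> 4
```

The point of the specialization is that per-query constants and column accesses are baked into the generated loop, so no interpretation overhead or intermediate materialization remains at execution time.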
Survey of Vector Database Management Systems
There are now over 20 commercial vector database management systems (VDBMSs), all produced within the past five years. But embedding-based retrieval has been studied for over ten years, and similarity search for a staggering half century and more. Driving this shift from algorithms to systems are new data-intensive applications, notably large language models, that demand vast stores of unstructured data coupled with reliable, secure, fast, and scalable query processing capability. A variety of new data management techniques now exist to address these needs; however, there is no comprehensive survey that thoroughly reviews these techniques and systems. We start by identifying five main obstacles to vector data management, namely the vagueness of semantic similarity, the large size of vectors, the high cost of similarity comparison, the lack of a natural partitioning that can be used for indexing, and the difficulty of efficiently answering hybrid queries that involve both attributes and vectors. Overcoming these obstacles has led to new approaches to query processing, storage and indexing, and query optimization and execution. For query processing, a variety of similarity scores and query types are now well understood; for storage and indexing, techniques include vector compression, namely quantization, and partitioning based on randomization, learned partitioning, and navigable partitioning; for query optimization and execution, we describe new operators for hybrid queries, as well as techniques for plan enumeration, plan selection, and hardware-accelerated execution. These techniques lead to a variety of VDBMSs across a spectrum of design and runtime characteristics, including native systems specialized for vectors and extended systems that incorporate vector capabilities into existing systems. We then discuss benchmarks, and finally we outline research challenges and point the direction for future work.
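To make the hybrid-query obstacle concrete, the sketch below answers a top-k similarity query with an attribute pre-filter by brute-force scanning. Real VDBMSs replace the scan with quantized and partitioned indexes; the cosine score, the pre-filtering strategy, and the synthetic data here are illustrative assumptions.

```python
# Minimal hybrid query sketch: attribute pre-filter, then brute-force top-k
# by cosine similarity. Purely illustrative; no index structure is used.
import numpy as np

def hybrid_top_k(vectors, attrs, query_vec, predicate, k=5):
    """vectors: (n, d) array; attrs: list of dicts; predicate: attr filter."""
    keep = [i for i, a in enumerate(attrs) if predicate(a)]   # pre-filtering
    if not keep:
        return []
    sub = vectors[keep]
    # cosine similarity as the similarity score (one of several common choices)
    sims = sub @ query_vec / (
        np.linalg.norm(sub, axis=1) * np.linalg.norm(query_vec) + 1e-12)
    order = np.argsort(-sims)[:k]
    return [(keep[i], float(sims[i])) for i in order]

rng = np.random.default_rng(0)
vecs = rng.normal(size=(1000, 64))
attrs = [{"lang": "en" if i % 2 else "de"} for i in range(1000)]
print(hybrid_top_k(vecs, attrs, rng.normal(size=64),
                   lambda a: a["lang"] == "en", k=3))
```

Whether to filter before, after, or during the vector search is exactly the kind of plan-selection question the survey covers under query optimization and execution.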
Just-in-time Analytics Over Heterogeneous Data and Hardware
Industry and academia are continuously becoming more data-driven and data-intensive, relying on the analysis of a wide variety of datasets to gain insights. At the same time, data variety increases continuously across multiple axes. First, data comes in multiple formats, such as the binary tabular data of a DBMS, raw textual files, and domain-specific formats. Second, different datasets follow different data models, such as the relational and the hierarchical one. Data location also varies: Some datasets reside in a central "data lake", whereas others lie in remote data sources. In addition, users execute widely different analysis tasks over all these data types. Finally, the process of gathering and integrating diverse datasets introduces several inconsistencies and redundancies in the data, such as duplicate entries for the same real-world concept. In summary, heterogeneity significantly affects the way data analysis is performed. In this thesis, we aim for data virtualization: Abstracting data out of its original form and manipulating it regardless of the way it is stored or structured, without a performance penalty. To achieve data virtualization, we design and implement systems that i) mask heterogeneity through the use of heterogeneity-aware, high-level building blocks and ii) offer fast responses through on-demand adaptation techniques. Regarding the high-level building blocks, we use a query language and algebra to handle multiple collection types, such as relations and hierarchies, express transformations between these collection types, as well as express complex data cleaning tasks over them. In addition, we design a location-aware compiler and optimizer that masks away the complexity of accessing multiple remote data sources. Regarding on-demand adaptation, we present a design to produce a new system per query. The design uses customization mechanisms that trigger runtime code generation to mimic the system most appropriate to answer a query fast: Query operators are thus created based on the query workload and the underlying data models; the data access layer is created based on the underlying data formats. In addition, we exploit emerging hardware by customizing the system implementation based on the available heterogeneous processors (CPUs and GPGPUs). We thus pair each workload with its ideal processor type. The end result is a just-in-time database system that is specific to the query, data, workload, and hardware instance. This thesis redesigns the data management stack to natively cater for data heterogeneity and exploit hardware heterogeneity. Instead of centralizing all relevant datasets, converting them to a single representation, and loading them in a monolithic, static, suboptimal system, our design embraces heterogeneity. Overall, our design decouples the type of performed analysis from the original data layout; users can perform their analysis across data stores, data models, and data formats, but at the same time experience the performance offered by a custom system that has been built on demand to serve their specific use case.
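A minimal sketch of the on-demand adaptation idea: choose a format-specific access layer at query time so the same analysis runs over CSV or line-delimited JSON without a prior loading step. The scan interface, the supported formats, and the example query are assumptions for illustration, not the thesis's actual API.

```python
# Illustrative data-virtualization sketch: the access layer is selected per
# file format at query time, and the query logic is format-agnostic.
import csv
import json

def build_scan(path):
    """Return a row iterator specialized to the file's format (assumed formats)."""
    if path.endswith(".csv"):
        def scan():
            with open(path, newline="") as f:
                for row in csv.DictReader(f):
                    yield row
    elif path.endswith(".json"):          # one JSON object per line
        def scan():
            with open(path) as f:
                for line in f:
                    yield json.loads(line)
    else:
        raise ValueError(f"unsupported format: {path}")
    return scan

def count_where(path, column, value):
    """The same query runs unchanged over either underlying format."""
    scan = build_scan(path)
    return sum(1 for row in scan() if row.get(column) == value)
```

A full just-in-time system would generate and compile such access code (and the operators above it) per query and per processor type, rather than dispatching through Python functions as done here.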