856 research outputs found

    Efficient bulk-loading methods for temporal and multidimensional index structures

    Nearly all areas of the natural sciences benefit from the latest methods for analysing and processing large volumes of data. These methods require efficient processing of spatial and temporal data, since time and position are important attributes of many data sets. Efficient query processing is made possible in particular by index structures. This thesis focuses on two index structures: the Multiversion B-tree (MVBT) and the R-tree. The former is used to manage temporal data, the latter to index multidimensional rectangle data. Continuously and rapidly growing data volumes pose a major challenge for computer science. Building and updating indexes with conventional methods (record by record) is no longer efficient. To enable timely and cost-efficient data processing, methods for fast loading of index structures are urgently needed. In the first part of the thesis we address the question of whether there is a method for loading the MVBT that has the same I/O complexity as external sorting. Until now this question had remained unanswered. We developed a new construction method and showed that it has the same time complexity as external sorting. To do so we employed two algorithmic techniques: weight balancing and buffer trees. Our experiments show that the result is not only of theoretical importance. In the second part of the thesis we examine whether and how statistical information about spatial queries can be exploited to improve the query performance of R-trees. Our new method uses information such as the aspect ratio and side lengths of a representative query rectangle to build a good R-tree with respect to a commonly used cost model. If this information is not available, we optimise R-trees with respect to the sum of the volumes of the minimum bounding rectangles of the leaf nodes. Since building optimal R-trees with respect to this cost measure is NP-hard, we first reduce the problem to a one-dimensional partitioning problem by sorting the data along optimised space-filling curves. We then solve this problem using dynamic programming. The I/O complexity of the method equals that of external sorting, since its I/O running time is dominated by the sorting step. In the last part of the thesis we applied the partitioning methods we developed to the construction of spatial histograms, since these, like R-trees, produce a disjoint partitioning of space. Results of extensive experiments show that the new partitioning techniques yield both R-trees with better query performance and spatial histograms with better estimation quality than competing methods.
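    The one-dimensional partitioning step described in this abstract can be illustrated with a small sketch. It is not the thesis's algorithm: it assumes two-dimensional rectangles that are already sorted along some space-filling curve, uses area in place of volume, and the names `mbr_area`, `partition_leaves`, `min_fill` and `max_fill` are hypothetical. The dynamic program picks the cut points in the sorted sequence that minimise the summed bounding-box area of the resulting leaves.

```python
from typing import List, Sequence, Tuple

# A rectangle is (xmin, ymin, xmax, ymax). Hypothetical representation.
Rect = Tuple[float, float, float, float]


def mbr_area(rects: Sequence[Rect]) -> float:
    """Area of the minimum bounding rectangle of a non-empty group."""
    xmin = min(r[0] for r in rects)
    ymin = min(r[1] for r in rects)
    xmax = max(r[2] for r in rects)
    ymax = max(r[3] for r in rects)
    return (xmax - xmin) * (ymax - ymin)


def partition_leaves(
    rects: Sequence[Rect], min_fill: int, max_fill: int
) -> Tuple[float, List[List[Rect]]]:
    """Split an already-sorted sequence into contiguous leaves.

    Each leaf holds between min_fill and max_fill rectangles. The split
    minimises the sum of leaf MBR areas. Assumes the input order comes
    from a space-filling curve; this function does not sort.
    """
    if min_fill < 1 or max_fill < min_fill:
        raise ValueError("need 1 <= min_fill <= max_fill")
    n = len(rects)
    inf = float("inf")
    best = [inf] * (n + 1)   # best[i]: minimal cost for the first i rectangles
    cut = [-1] * (n + 1)     # cut[i]: start index of the last leaf in that solution
    best[0] = 0.0
    for end in range(1, n + 1):
        for size in range(min_fill, max_fill + 1):
            start = end - size
            if start < 0:
                break
            if best[start] == inf:
                continue  # the prefix before this leaf cannot be partitioned
            cost = best[start] + mbr_area(rects[start:end])
            if cost < best[end]:
                best[end] = cost
                cut[end] = start
    if best[n] == inf:
        raise ValueError("no partition satisfies the fill constraints")
    # Walk the cut points backwards to recover the leaves.
    leaves: List[List[Rect]] = []
    end = n
    while end > 0:
        start = cut[end]
        leaves.append(list(rects[start:end]))
        end = start
    leaves.reverse()
    return best[n], leaves
```

    For example, `partition_leaves([(0, 0, 1, 1), (1, 0, 2, 1), (10, 10, 11, 11), (11, 10, 12, 11)], 2, 3)` groups the two nearby pairs into separate leaves. The sketch recomputes each bounding box from scratch, so it is a naive in-memory illustration and says nothing about the I/O behaviour claimed in the abstract.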

    Star-ND (Multi-Dimensional Star-Identification)

    To perform star identification with lower processing requirements, this research applies multi-dimensional techniques both to the database search and to the creation of star pattern parameters. New star pattern parameters are presented which produce a well-distributed database, as required by the database search algorithm to achieve the fastest performance. To mitigate problems introduced by the star pattern selection, incorrect entries are added to the database, which reduces the number of iterations of the run-time algorithm. The associated algorithms, star pattern parameters, and database preparation are collectively referred to as Multi-dimensional Star-Identification (Star-ND). The star pattern parameters developed may also be extended to star patterns with an arbitrarily large number of stars while retaining the well-distributed property. The algorithm is contrasted with the current state-of-the-art star-ID algorithm, Pyramid. The Star-ND database is found to grow linearly with the size of the star catalog, while Pyramid's database grows quadratically. Star-ND is found to run, on average, 25 times faster than Pyramid.

    Digital photo album management techniques: from one dimension to multi-dimension.

    Lu Yang. Thesis submitted in: November 2004. Thesis (M.Phil.)--Chinese University of Hong Kong, 2005. Includes bibliographical references (leaves 96-103). Abstracts in English and Chinese.
    Abstract
    Acknowledgement
    Chapter 1 --- Introduction
        Chapter 1.1 --- Motivation
        Chapter 1.2 --- Our Contributions
        Chapter 1.3 --- Thesis Outline
    Chapter 2 --- Background Study
        Chapter 2.1 --- MPEG-7 Introduction
        Chapter 2.2 --- Image Analysis in CBIR Systems
            Chapter 2.2.1 --- Color Information
            Chapter 2.2.2 --- Color Layout
            Chapter 2.2.3 --- Texture Information
            Chapter 2.2.4 --- Shape Information
            Chapter 2.2.5 --- CBIR Systems
        Chapter 2.3 --- Image Processing in JPEG Frequency Domain
        Chapter 2.4 --- Photo Album Clustering
    Chapter 3 --- Feature Extraction and Similarity Analysis
        Chapter 3.1 --- Feature Set in Frequency Domain
            Chapter 3.1.1 --- JPEG Frequency Data
            Chapter 3.1.2 --- Our Feature Set
        Chapter 3.2 --- Digital Photo Similarity Analysis
            Chapter 3.2.1 --- Energy Histogram
            Chapter 3.2.2 --- Photo Distance
    Chapter 4 --- 1-Dimensional Photo Album Management Techniques
        Chapter 4.1 --- Photo Album Sorting
        Chapter 4.2 --- Photo Album Clustering
        Chapter 4.3 --- Photo Album Compression
            Chapter 4.3.1 --- Variable IBP frames
            Chapter 4.3.2 --- Adaptive Search Window
            Chapter 4.3.3 --- Compression Flow
        Chapter 4.4 --- Experiments and Performance Evaluations
    Chapter 5 --- High Dimensional Photo Clustering
        Chapter 5.1 --- Traditional Clustering Techniques
            Chapter 5.1.1 --- Hierarchical Clustering
            Chapter 5.1.2 --- Traditional K-means
        Chapter 5.2 --- Multidimensional Scaling
            Chapter 5.2.1 --- Introduction
            Chapter 5.2.2 --- Classical Scaling
        Chapter 5.3 --- Our Interactive MDS-based Clustering
            Chapter 5.3.1 --- Principal Coordinates from MDS
            Chapter 5.3.2 --- Clustering Scheme
            Chapter 5.3.3 --- Layout Scheme
        Chapter 5.4 --- Experiments and Results
    Chapter 6 --- Conclusions
    Bibliography

    The Plant Propagation Algorithm for Discrete Optimisation

    The thesis is concerned with novel nature-inspired heuristics for so-called NP-hard optimisation problems. A particular algorithm which has recently been introduced and shown to be effective in continuous optimisation is the Plant Propagation Algorithm (PPA). Here, we extend it to cope with combinatorial optimisation. To show that our extension is viable and effective, we consider three types of problems which are good representatives of the whole topic: the Travelling Salesman Problem (TSP), the Knapsack Problem (KP), and the Berth Allocation Problem (BAP), a scheduling problem that arises in container ports. Because PPA is a population-based search heuristic, we devote a chapter to the important issue of generating good yet computationally light initial populations of solutions to kick-start the search process. In the case of the TSP, we revisit and extend the Strip Algorithm (SA). We introduce the 2-Part SA and show that it is better than the classical SA. We also introduce new variants, the Adaptive SA and the Spiral SA, which cope with clustered cities and with instances whose cities are concentrated around the center of the unit square, respectively. In the case of the KP, we adapt the Roulette Wheel selection approach to generate starting solutions for PPA. In the case of the BAP, we introduce a number of simple heuristics which consider a schedule as a flat box, with one side being the processing time and the other the position of vessels on the wharf. The heuristics try to generate schedules by avoiding overlap as much as possible. All approaches and algorithms are implemented and tested against well-established algorithms. The results are recorded and discussed extensively. The thesis ends with a conclusion and ideas for further research.
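    The abstract does not describe the Strip Algorithm or its new variants in detail. As a rough illustration only, the following sketch shows the classical strip heuristic as it is commonly described: cities in the unit square are bucketed into vertical strips, and the strips are traversed alternately upwards and downwards. The strip count of about sqrt(n/2) is a common textbook choice and is an assumption here, as are the function names `strip_tour` and `tour_length`.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) with both coordinates in [0, 1]


def strip_tour(points: List[Point]) -> List[Point]:
    """Build a TSP tour with a classical strip heuristic (sketch).

    Assumes all points lie in the unit square. The number of strips,
    ceil(sqrt(n / 2)), is an assumed, commonly used setting.
    """
    n = len(points)
    if n == 0:
        return []
    strips = max(1, math.ceil(math.sqrt(n / 2)))
    buckets: List[List[Point]] = [[] for _ in range(strips)]
    for p in points:
        # Clamp so that x == 1.0 falls into the last strip.
        idx = min(int(p[0] * strips), strips - 1)
        buckets[idx].append(p)
    tour: List[Point] = []
    for i, bucket in enumerate(buckets):
        # Even strips go bottom to top, odd strips top to bottom.
        bucket.sort(key=lambda p: p[1], reverse=(i % 2 == 1))
        tour.extend(bucket)
    return tour


def tour_length(tour: List[Point]) -> float:
    """Length of the closed tour, returning to the first city."""
    return sum(
        math.dist(tour[i], tour[(i + 1) % len(tour)])
        for i in range(len(tour))
    )
```

    A tour built this way is cheap to compute, which is why such heuristics are attractive for seeding a population-based search; the 2-Part, Adaptive and Spiral variants named in the abstract are not reproduced here.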

    An Effective Approach to Predicting Large Dataset in Spatial Data Mining Area

    Due to the enormous quantities of spatial satellite images, telecommunication images, health-related tools, etc., it is often impractical for users to examine spatial data (S) thoroughly and in detail. Large datasets are common and pervasive in a number of application areas, and discovering or predicting patterns in them is vital. This research focused on developing new methods, models and techniques for accomplishing advanced spatial data mining (ASDM) tasks. The algorithms were designed to challenge state-of-the-art data technologies and were tested with randomly generated and real-world data. Two main approaches were adopted to achieve the objectives: (1) identifying the actual data types (DTs), data structures and spatial content of a given dataset (to make our model versatile and robust) and (2) integrating these data types into an appropriate database management system (DBMS) framework for easy management and manipulation. These two approaches helped to discover the general and varying types of patterns that exist within any given dataset, whether non-spatial, spatial or even temporal (because spatial data are always influenced by temporal agents). An iterative system development methodology was adopted in this study as a strategy to combat the irregularity that often exists within spatial datasets. In the course of this study, some of the challenges we encountered, which are also current challenges facing spatial data mining, include: (a) time complexity in making useful data available for analysis, (b) time complexity in loading data to storage and (c) difficulties in discovering spatial, non-spatial and temporal correlations between different data objects. Despite these challenges, there are opportunities that spatial data can benefit from, including cloud computing, Spark technology, parallelisation, and bulk-loading methods. Techniques and application areas of spatial data mining (SDM) were identified, and their strengths and limitations were documented. Finally, new methods and algorithms for mining very large data of spatial/non-spatial bias were created. The proposed models/systems are documented in the sections as follows: (a) development of a new technique for parallel indexing of large datasets (PaX-DBSCAN), (b) development of new techniques for clustering (X-DBSCAN) in a learning process, (c) development of a new technique for detecting human skin in an image, (d) development of a new technique for finding a face in an image, (e) development of a novel technique for management of large spatial and non-spatial datasets (aX-tree). The most prominent among our methods is the new structure used in (c) above -- packed maintained k-dimensional tree (Pmkd-tree), for fast spatial indexing and querying. The structure is a combined system that brings together all the proposed algorithms to produce one solid, standard, useful and quality system. The intention of the final algorithm (system) is to combine all of the initially proposed algorithms into one strong, generic and effective tool for prediction on large datasets in the SDM area, capable of finding patterns that exist among spatial or non-spatial objects in a DBMS. In addition to the Pmkd-tree, we also implemented a novel spatial structure, the packed quad-tree (Pquad-Tree), to balance and speed up the performance of the regular quad-tree. Our systems have so far shown efficiency in terms of performance, storage and speed. The final systems (Pmkd-tree and Pquad-Tree) are generic systems that are flexible, robust, light and stable. They are explicit spatial models for analysing any given problem and for predicting objects as spatially distributed events, using basic SDM algorithms. They can be applied to pattern matching, image processing, computer vision, bioinformatics, information retrieval, machine learning (classification and clustering) and many other computational tasks.

    Hardware acceleration of similarity queries using graphic processor units

    Ankara : The Department of Computer Engineering and the Institute of Engineering and Science of Bilkent University, 2009. Thesis (Master's) -- Bilkent University, 2009. Includes bibliographical references leaves 93-103. Author: Genç, Atilla (M.S.). A Graphics Processing Unit (GPU) is primarily designed for real-time rendering. In contrast to a Central Processing Unit (CPU), which has complex instructions and a limited number of pipelines, a GPU has simpler instructions and many execution pipelines to process vector data in a massively parallel fashion. In addition to its regular tasks, the GPU instruction set can also be used for other types of general-purpose computation. Several frameworks such as Brook+, ATI CAL, OpenCL, and Nvidia CUDA have been proposed to utilize the computational power of the GPU in general computing. This has created interest in, and opportunities for, accelerating different types of applications. This thesis explores ways of taking advantage of the GPU in the field of metric space-based similarity searching. The KVP index structure has a simple organization that lends itself to parallel processing, in contrast to tree-based structures that require frequent "pointer chasing" operations. Several implementations using the general-purpose GPU programming frameworks (Brook+, ATI CAL and OpenCL) on the ATI platform are provided. Experimental results show that the GPU versions presented in this work are several times faster than the CPU versions.

    Introduction to Facial Micro Expressions Analysis Using Color and Depth Images: A Matlab Coding Approach (Second Edition, 2023)

    This book offers a gentle introduction to the field of Facial Micro Expressions Recognition (FMER) using color and depth images, with the aid of the MATLAB programming environment. FMER is a subset of image processing and a multidisciplinary topic to analyse. It therefore requires familiarity with other topics in and around Artificial Intelligence (AI), such as machine learning, digital image processing, psychology and more. This makes it a good opportunity to write a book which covers all of these topics for readers from beginner to professional level in the field of AI, and even for those without a background in AI. Our goal is to provide a standalone introduction to FMER analysis in the form of theoretical descriptions for readers with no background in image processing, together with reproducible MATLAB practical examples. We also describe the basic definitions for FMER analysis and the MATLAB libraries used in the text, which helps the reader to apply the experiments in real-world applications. We believe that this book is suitable for students, researchers, and professionals alike who need to develop practical skills along with a basic understanding of the field. We expect that, after reading this book, the reader will feel comfortable with key stages such as color and depth image processing, color and depth image representation, classification, machine learning, facial micro-expression recognition, feature extraction and dimensionality reduction. Comment: This is the second edition of the boo