1,237 research outputs found

    Continuous Probabilistic Nearest-Neighbor Queries for Uncertain Trajectories

    This work addresses the problem of processing continuous nearest-neighbor (NN) queries over moving-object trajectories when the exact position of a given object at a particular time instant is not known, but is bounded by an uncertainty region. As has already been observed in the literature, the answers to continuous NN queries in spatio-temporal settings are time-parameterized, in the sense that the objects in the answer vary over time. Incorporating uncertainty in the model yields additional attributes that affect the semantics of the answer to this type of query. In this work, we formalize the impact of uncertainty on the answers to continuous probabilistic NN queries, and provide a compact structure for their representation together with efficient algorithms for constructing that structure. We also identify syntactic constructs for several qualitative variants of continuous probabilistic NN queries for uncertain trajectories and present efficient algorithms for their processing.
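
    To make the notion of a probabilistic NN answer concrete, the following is a minimal sketch for a single time instant. It is not the paper's algorithm or its compact answer structure. It assumes, purely for illustration, that each object's uncertainty region is a disk of equal radius with a uniform location distribution, and it estimates by sampling the probability that each object is the nearest neighbor of a query point. The names `sample_in_disk` and `nn_probabilities` are hypothetical.

```python
import math
import random


def sample_in_disk(center, radius, rng):
    """Draw a point uniformly from a disk (the assumed uncertainty region)."""
    r = radius * math.sqrt(rng.random())  # sqrt gives uniform density over area
    theta = 2.0 * math.pi * rng.random()
    return (center[0] + r * math.cos(theta), center[1] + r * math.sin(theta))


def nn_probabilities(query, objects, radius, samples=20000, seed=0):
    """Estimate P(object is the nearest neighbor of `query`) by Monte Carlo.

    `objects` maps a name to the expected (x, y) location at one time instant.
    The uniform-disk model is an assumption made for this sketch.
    """
    rng = random.Random(seed)
    wins = {name: 0 for name in objects}
    for _ in range(samples):
        best_name, best_dist = None, float("inf")
        for name, center in objects.items():
            x, y = sample_in_disk(center, radius, rng)
            dist = math.hypot(x - query[0], y - query[1])
            if dist < best_dist:
                best_name, best_dist = name, dist
        wins[best_name] += 1
    return {name: count / samples for name, count in wins.items()}


if __name__ == "__main__":
    objects = {"a": (1.0, 0.0), "b": (1.5, 0.0), "c": (10.0, 0.0)}
    print(nn_probabilities((0.0, 0.0), objects, radius=0.5))
```

    Repeating the estimate at successive time instants would show how the set of objects with non-zero probability changes over time, which is the time-parameterized behavior the abstract describes.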

    Data and knowledge engineering for medical image and sensor data

    Advanced data structures for the interpretation of image and cartographic data in geo-based information systems

    A growing need to use geographic information systems (GIS) to improve the flexibility and overall performance of very large, heterogeneous data bases was examined. The Vaster structure and the Topological Grid structure were compared to test whether such hybrid structures represent an improvement in performance. The use of artificial intelligence in a geographic/earth sciences data base context is being explored. The architecture of the Knowledge Based GIS (KBGIS) has a dual object/spatial data base and a three-tier hierarchical search subsystem. Quadtree Spatial Spectra (QTSS) are derived, based on the quadtree data structure, to generate and represent spatial distribution information for large volumes of spatial data.

    Diversity based Relevance Feedback for Time Series Search

    We propose a diversity based relevance feedback approach for time series data to improve the accuracy of search results. We first develop the concept of relevance feedback for time series based on dual-tree complex wavelet (CWT) and SAX based approaches. We aim to enhance the search quality by incorporating diversity in the results presented to the user for feedback. We then propose a method which utilizes the representation type as part of the feedback, as opposed to a human choosing based on a preprocessing or training phase. The proposed methods utilize a weighting to handle the relevance feedback of important properties for both single and multiple representation cases. Our experiments on a large variety of time series data sets show that the proposed diversity based relevance feedback improves the retrieval performance. Results confirm that representation feedback incorporates item diversity implicitly and achieves good performance even when using simple nearest neighbor as the retrieval method. To the best of our knowledge, this is the first study on diversification of time series search to improve retrieval accuracy and representation feedback. © 2013 VLDB Endowment

    Similarity processing in multi-observation data

    Many real-world application domains, such as sensor-monitoring systems for environmental research or medical diagnostic systems, deal with data that is represented by multiple observations. In contrast to single-observation data, where each object is assigned to exactly one occurrence, multi-observation data is based on several occurrences that are subject to two key properties: temporal variability and uncertainty. When defining similarity between data objects, these properties play a significant role. In general, methods designed for single-observation data hardly apply to multi-observation data, as they are either not supported by the data models or do not provide sufficiently efficient or effective solutions. Prominent directions incorporating the key properties are the fields of time series, where data is created by temporally successive observations, and uncertain data, where observations are mutually exclusive. This thesis provides research contributions for similarity processing (similarity search and data mining) on time series and uncertain data. The first part of this thesis focuses on similarity processing in time series databases. A variety of similarity measures have recently been proposed that support similarity processing w.r.t. various aspects. In particular, this part deals with time series that consist of periodic occurrences of patterns. Examining an application scenario from the medical domain, a solution for activity recognition is presented. Finally, the extraction of feature vectors allows the application of spatial index structures, which support the acceleration of search and mining tasks, resulting in a significant efficiency gain. As feature vectors are potentially of high dimensionality, this part introduces indexing approaches for the high-dimensional space for the full-dimensional case as well as for arbitrary subspaces. The second part of this thesis focuses on similarity processing in probabilistic databases. 
The presence of uncertainty is inherent in many applications dealing with data collected by sensing devices. Often, the collected information is noisy or incomplete due to measurement or transmission errors. Furthermore, data may be rendered uncertain for privacy-preservation reasons when confidential information is present. This creates a number of challenges in terms of effectively and efficiently querying and mining uncertain data. Existing work in this field either neglects the presence of dependencies or provides only approximate results while applying methods designed for certain data. Other approaches dealing with uncertain data are not able to provide efficient solutions. This part presents query processing approaches that outperform existing solutions of probabilistic similarity ranking. This part finally leads to the application of the introduced techniques to data mining tasks, such as the prominent problem of probabilistic frequent itemset mining.
Many application domains, such as environmental research or medical diagnostics, use sensor-monitoring systems. Such systems must often be able to handle data that is represented by multiple observations. In contrast to data with only one observation (single-observation data), data from multiple observations (multi-observation data) is based on a multitude of observations that are subject to two key properties: temporal variability and data uncertainty. In similarity search and data mining, these properties play an important role. Common solutions in these areas that were developed for single-observation data are generally not applicable to multiple observations per object. The reason is that these approaches are either incompatible with the data models or do not offer solutions that meet current demands for solution quality or efficiency. Well-known research directions that deal with multi-observation data and its key properties are time series analysis and similarity search in probabilistic databases. While the former assumes a temporal ordering of an object's observations, uncertain data objects are based on observations that are mutually dependent or mutually exclusive. This dissertation comprises recent research contributions from both areas, presenting methods for similarity search and for application in data mining. The first part of this work deals with similarity search and data mining in time series databases. In particular, it considers time series that consist of periodically occurring patterns. In the context of a medical application scenario, an approach to activity recognition is presented. By means of feature extraction, this approach permits efficient storage and analysis with the help of spatial index structures. For the case of high-dimensional feature vectors, this part presents two indexing methods for accelerating similarity queries. The first method takes all attributes of the feature vectors into account, while the second allows the query to be projected onto a user-defined subspace of the vector space. The second part of this work addresses similarity search in the context of probabilistic databases. Data from sensor measurements often has properties that are subject to a certain degree of uncertainty. Because of measurement or transmission errors, measured values are often incomplete or noisy. In various scenarios, such as those involving personal or medically confidential data, data may also be manually perturbed with noise after the fact, so that the original information cannot be reconstructed exactly.
These circumstances pose several challenges for query techniques and data mining methods. Existing research on uncertain databases often disregards various problems. Either the presence of dependencies is ignored, or only approximate solutions are offered that permit the application of methods for certain data. Other approaches compute exact solutions but do not return the answers in acceptable running time. This part of the work presents efficient methods for answering similarity queries that return the results in descending order of relevance, that is, as a ranking of the results. Finally, the techniques applied are transferred to problems in probabilistic data mining, for example to solve the problem of frequent itemset mining while taking the full content of the uncertainty information into account.

    6 Access Methods and Query Processing Techniques

    The performance of a database management system (DBMS) is fundamentally dependent on the access methods and query processing techniques available to the system. Traditionally, relational DBMSs have relied on well-known access methods, such as the ubiquitous B+-tree, hashing with chaining, and, in som

    Similarity search applications in medical images

    DCMS: A data analytics and management system for molecular simulation

    Molecular Simulation (MS) is a powerful tool for studying physical/chemical features of large systems and has seen applications in many scientific and engineering domains. During the simulation process, the experiments generate a very large number of atoms and aim to observe their spatial and temporal relationships for scientific analysis. The sheer data volumes and their intensive interactions impose significant challenges for data access, management, and analysis. To date, existing MS software systems fall short on storage and handling of MS data, mainly because they lack a platform to support applications that involve intensive data access and analytical processing. In this paper, we present the database-centric molecular simulation (DCMS) system our team has developed over the past few years. The main idea behind DCMS is to store MS data in a relational database management system (DBMS) to take advantage of the declarative query interface (i.e., SQL), data access methods, query processing, and optimization mechanisms of modern DBMSs. A unique challenge is to handle the analytical queries, which are often compute-intensive. For that, we developed novel indexing and query processing strategies (including algorithms running on modern co-processors) as integrated components of the DBMS. As a result, researchers can upload and analyze their data using efficient functions implemented inside the DBMS. Index structures are generated to store analysis results that may be interesting to other users, so that the results are readily available without duplicating the analysis. We have developed a prototype of DCMS based on the PostgreSQL system, and experiments using real MS data and workloads show that DCMS significantly outperforms existing MS software systems. We also used it as a platform to test other data management issues such as security and compression.
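
    The idea of keeping MS data in a relational DBMS and analyzing it through SQL can be illustrated with a small sketch. It does not reproduce the DCMS schema, its custom indexes, or its in-database functions, none of which are given in the abstract. It uses Python's built-in `sqlite3` module in place of PostgreSQL so that it is self-contained, and the table `atom`, its columns, and the helper names `build_db` and `close_pairs` are hypothetical.

```python
import sqlite3


def build_db():
    """Create an in-memory database with a toy atom table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE atom (
            frame   INTEGER NOT NULL,  -- simulation time step
            atom_id INTEGER NOT NULL,
            x REAL, y REAL, z REAL,
            PRIMARY KEY (frame, atom_id)
        )
        """
    )
    rows = [
        (0, 1, 0.0, 0.0, 0.0), (0, 2, 1.0, 0.0, 0.0), (0, 3, 5.0, 5.0, 5.0),
        (1, 1, 0.0, 0.0, 0.0), (1, 2, 4.0, 0.0, 0.0), (1, 3, 4.0, 1.0, 0.0),
    ]
    conn.executemany("INSERT INTO atom VALUES (?, ?, ?, ?, ?)", rows)
    return conn


def close_pairs(conn, frame, cutoff):
    """Return the atom pairs in one frame whose distance is at most `cutoff`."""
    query = """
        SELECT a.atom_id, b.atom_id
        FROM atom AS a
        JOIN atom AS b
          ON a.frame = b.frame AND a.atom_id < b.atom_id
        WHERE a.frame = ?
          AND (a.x - b.x) * (a.x - b.x)
            + (a.y - b.y) * (a.y - b.y)
            + (a.z - b.z) * (a.z - b.z) <= ? * ?
        ORDER BY a.atom_id, b.atom_id
    """
    return conn.execute(query, (frame, cutoff, cutoff)).fetchall()


if __name__ == "__main__":
    conn = build_db()
    for frame in (0, 1):
        print(frame, close_pairs(conn, frame, cutoff=1.5))
```

    A self-join like this is quadratic in the number of atoms per frame, which suggests why the abstract calls such analytical queries compute-intensive and motivates dedicated indexing and query processing strategies.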

    Design, implementation, and evaluation of scalable content-based image retrieval techniques.

    Wong, Yuk Man.
    Thesis (M.Phil.)--Chinese University of Hong Kong, 2007.
    Includes bibliographical references (leaves 95-100).
    Abstracts in English and Chinese.
    Abstract
    Acknowledgement
    Chapter 1: Introduction
      1.1 Overview
      1.2 Contribution
      1.3 Organization of This Work
    Chapter 2: Literature Review
      2.1 Content-based Image Retrieval
        2.1.1 Query Technique
        2.1.2 Relevance Feedback
        2.1.3 Previously Proposed CBIR systems
      2.2 Invariant Local Feature
      2.3 Invariant Local Feature Detector
        2.3.1 Harris Corner Detector
        2.3.2 DOG Extrema Detector
        2.3.3 Harris-Laplacian Corner Detector
        2.3.4 Harris-Affine Covariant Detector
      2.4 Invariant Local Feature Descriptor
        2.4.1 Scale Invariant Feature Transform (SIFT)
        2.4.2 Shape Context
        2.4.3 PCA-SIFT
        2.4.4 Gradient Location and Orientation Histogram (GLOH)
        2.4.5 Geodesic-Intensity Histogram (GIH)
        2.4.6 Experiment
      2.5 Feature Matching
        2.5.1 Matching Criteria
        2.5.2 Distance Measures
        2.5.3 Searching Techniques
    Chapter 3: A Distributed Scheme for Large-Scale CBIR
      3.1 Overview
      3.2 Related Work
      3.3 Scalable Content-Based Image Retrieval Scheme
        3.3.1 Overview of Our Solution
        3.3.2 Locality-Sensitive Hashing
        3.3.3 Scalable Indexing Solutions
        3.3.4 Disk-Based Multi-Partition Indexing
        3.3.5 Parallel Multi-Partition Indexing
      3.4 Feature Representation
      3.5 Empirical Evaluation
        3.5.1 Experimental Testbed
        3.5.2 Performance Evaluation Metrics
        3.5.3 Experimental Setup
        3.5.4 Experiment I: Disk-Based Multi-Partition Indexing Approach
        3.5.5 Experiment II: Parallel-Based Multi-Partition Indexing Approach
      3.6 Application to WWW Image Retrieval
      3.7 Summary
    Chapter 4: Image Retrieval System for IND Detection
      4.1 Overview
        4.1.1 Motivation
        4.1.2 Related Work
        4.1.3 Objective
        4.1.4 Contribution
      4.2 Database Construction
        4.2.1 Image Representations
        4.2.2 Index Construction
        4.2.3 Keypoint and Image Lookup Tables
      4.3 Database Query
        4.3.1 Matching Strategies
        4.3.2 Verification Processes
        4.3.3 Image Voting
      4.4 Performance Evaluation
        4.4.1 Evaluation Metrics
        4.4.2 Results
        4.4.3 Summary
    Chapter 5: Shape-SIFT Feature Descriptor
      5.1 Overview
      5.2 Related Work
      5.3 SHAPE-SIFT Descriptors
        5.3.1 Orientation assignment
        5.3.2 Canonical orientation determination
        5.3.3 Keypoint descriptor
      5.4 Performance Evaluation
      5.5 Summary
    Chapter 6: Conclusions and Future Work
      6.1 Conclusions
      6.2 Future Work
    Chapter A: Publication
    Bibliography