22 research outputs found

    New generic indexing technology

    There has been no fundamental change in the dynamic indexing methods supporting database systems since the invention of the B-tree twenty-five years ago. Yet the whole classical approach to dynamic database indexing has long since become inappropriate and increasingly inadequate. We are moving rapidly from the conventional one-dimensional world of fixed-structure text and numbers to a multi-dimensional world of variable structures, objects and images, in space and time. But even before leaving the confines of conventional database indexing, the situation is highly unsatisfactory; in fact, our research has led us to question the basic assumptions of conventional database indexing. We have spent the past ten years studying the properties of multi-dimensional indexing methods, and in this paper we draw together the strands of a number of developments, some quite old, some very new, to show how we now have the basis for a new generic indexing technology for the next generation of database systems.

    A Survey on Spatial Indexing

    Spatial information processing has been a centre of attention of research in the previous decade. In spatial databases, data associated with spatial coordinates and extents are retrieved based on spatial proximity. A large number of spatial indexes have been proposed to enable efficient indexing of spatial objects in large databases and efficient spatial data retrieval. The goal of this paper is to review advanced techniques for such access methods. The paper classifies the existing multidimensional access methods according to the type of indexing and their performance over spatial queries. K-d trees outperform quadtrees without requiring additional memory.
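
    Since the comparison between k-d trees and quadtrees is the survey's headline result, a minimal k-d tree sketch may help fix ideas: it builds a 2-D tree by median splits on alternating axes and answers an orthogonal range query by pruning half-spaces that cannot intersect the query box. The code below is an illustrative assumption of ours, not taken from the survey.

        # Minimal k-d tree sketch: build on 2-D points and answer an orthogonal
        # range query. Illustrative only; names and structure are assumptions,
        # not code from any of the surveyed access methods.

        class KdNode:
            def __init__(self, point, axis, left=None, right=None):
                self.point, self.axis = point, axis
                self.left, self.right = left, right

        def build(points, depth=0):
            """Recursively split points on alternating axes (median split)."""
            if not points:
                return None
            axis = depth % 2                      # cycle through x (0) and y (1)
            points = sorted(points, key=lambda p: p[axis])
            mid = len(points) // 2
            return KdNode(points[mid], axis,
                          build(points[:mid], depth + 1),
                          build(points[mid + 1:], depth + 1))

        def range_query(node, lo, hi, out):
            """Collect all points p with lo[d] <= p[d] <= hi[d] in every dimension d."""
            if node is None:
                return
            if all(lo[d] <= node.point[d] <= hi[d] for d in range(2)):
                out.append(node.point)
            axis = node.axis
            if lo[axis] <= node.point[axis]:      # query box overlaps left half-space
                range_query(node.left, lo, hi, out)
            if hi[axis] >= node.point[axis]:      # query box overlaps right half-space
                range_query(node.right, lo, hi, out)

        tree = build([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)])
        hits = []
        range_query(tree, lo=(3, 1), hi=(8, 5), out=hits)
        print(hits)                               # points inside the query rectangle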

    Extending functional databases for use in text-intensive applications

    This thesis continues research exploring the benefits of using functional databases based around the functional data model for advanced database applications, particularly those supporting investigative systems. This is a growing generic application domain covering areas such as criminal and military intelligence, which are characterised by significant data complexity, large data sets and the need for high-performance, interactive use. An experimental functional database language was developed to provide the requisite semantic richness. However, heavy use in a practical context has shown that language extensions and implementation improvements are required, especially in the crucial areas of string matching and graph traversal. In addition, an implementation on multiprocessor, parallel architectures is essential to meet the performance needs arising from existing and projected database sizes in the chosen application area. [Continues.]

    Qualitative Spatial Configuration Queries: Towards Next Generation Access Methods for GIS

    For a long time, the survey, management, and provision of geographic information in Geographic Information Systems (GIS) have mainly had an authoritative nature. Today the trend is changing, and such authoritative geographic information sources are now accompanied by public and freely available ones, usually referred to as Volunteered Geographic Information (VGI). Actually, the term VGI does not refer only to the mere geographic information but, more generally, to the whole process which assumes the engagement of volunteers to collect and maintain such information in freely accessible GIS. The quick spread of VGI gives new relevance to a well-known challenge: developing new methods and techniques to ease the interaction between users and GIS. Indeed, in spite of continuous improvements, GIS mainly provide interfaces tailored for experts, denying the casual user, usually a non-expert, the possibility to access VGI information. One main obstacle resides in the different ways GIS and humans deal with spatial information: GIS mainly encode spatial information in a quantitative format, whereas human beings typically prefer a qualitative and relational approach. For example, we use expressions like "the lake is to the right-hand side of the wood" or "is there a supermarket close to the university?", which qualitatively locate a spatial entity with respect to another. Nowadays, such a gap in representation has to be bridged by the user, who has to learn about the system structure and to encode his requests in a form suitable to the system. Conversely, enabling GIS to explicitly deal with qualitative spatial information allows for shifting the translation effort to the system side. Thus, to facilitate the interaction with human beings, GIS have to be enhanced with tools for efficiently handling qualitative spatial information. The work presented in this thesis addresses the problem of enabling Qualitative Spatial Configuration Queries (QSCQs) in GIS. A QSCQ is a spatial database query which allows for an automatic mapping of spatial descriptions produced by humans: a user naturally expresses his request for spatial information by drawing a sketch map or producing a verbal description; the qualitative information conveyed by such descriptions is automatically extracted and encoded into a QSCQ. The focus of this work is on two main challenges: first, the development of a framework that allows for managing in a spatial database the variety of spatial aspects that might be enclosed in a spatial description produced by a human; second, the conception of Qualitative Spatial Access Methods (QSAMs): algorithms and data structures tailored for efficiently solving QSCQs. The main objective of a QSAM is to counter the exponential explosion in storage space that occurs when switching from a quantitative to a qualitative spatial representation, while keeping query response time acceptable.
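
    To make the notion of a qualitative spatial relation concrete, the following sketch derives a cardinal-direction relation and a proximity relation from quantitative coordinates and answers the example question above. The toy gazetteer, all names, and the distance threshold are our own illustrative assumptions; this is not the thesis' QSAM.

        # Illustrative sketch (not the thesis' QSAM): derive qualitative relations
        # from quantitative coordinates, then answer a small qualitative query such
        # as "is there a supermarket close to the university?".
        import math

        objects = {                       # hypothetical gazetteer: name -> (x, y, type)
            "university":  (10.0, 20.0, "university"),
            "supermarket": (10.4, 20.2, "supermarket"),
            "lake":        (50.0,  5.0, "water"),
        }

        def direction(a, b):
            """Qualitative cardinal direction of b as seen from a."""
            ax, ay, _ = objects[a]
            bx, by, _ = objects[b]
            dx, dy = bx - ax, by - ay
            ns = "north" if dy > 0 else "south" if dy < 0 else ""
            ew = "east" if dx > 0 else "west" if dx < 0 else ""
            return (ns + ew) or "same place"

        def close_to(a, b, threshold=1.0):
            """Qualitative proximity: Euclidean distance below a chosen threshold."""
            ax, ay, _ = objects[a]
            bx, by, _ = objects[b]
            return math.hypot(bx - ax, by - ay) <= threshold

        # "Is there a supermarket close to the university?"
        answer = [name for name, (_, _, kind) in objects.items()
                  if kind == "supermarket" and close_to(name, "university")]
        print(answer)                                    # -> ['supermarket']
        print(direction("university", "lake"))           # -> 'southeast'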

    Scene Understanding For Real Time Processing Of Queries Over Big Data Streaming Video

    With heightened security concerns across the globe and the increasing need to monitor, preserve and protect infrastructure and public spaces to ensure proper operation, quality assurance and safety, numerous video cameras have been deployed. Accordingly, they also need to be monitored effectively and efficiently. However, relying on human operators to constantly monitor all the video streams is not scalable or cost effective. Humans can become subjective and fatigued, can even exhibit bias, and it is difficult to maintain high levels of vigilance when capturing, searching and recognizing events that occur infrequently or in isolation. These limitations are addressed in the Live Video Database Management System (LVDBMS), a framework for managing and processing live motion imagery data. It enables rapid development of video surveillance software, much like traditional database applications are developed today. Video stream processing applications and ad hoc queries developed in this way are able to reuse advanced image processing techniques that have already been developed, which results in lower software development and maintenance costs. Furthermore, the LVDBMS can be intensively tested to ensure consistent quality across all associated video database applications. Its intrinsic privacy framework facilitates a formalized approach to the specification and enforcement of verifiable privacy policies. This is an important step towards enabling a general privacy certification for video surveillance systems by leveraging a standardized privacy specification language. With the potential to impact many important fields ranging from security and assembly-line monitoring to wildlife studies and the environment, the broader impact of this work is clear. The privacy framework protects the general public from abusive use of surveillance technology; success in addressing the trust issue will enable many new surveillance-related applications. Although this research focuses on video surveillance, the proposed framework has the potential to support many video-based analytical applications.

    Efficient Processing of Range Queries in Main Memory

    Database systems employ index structures as a means to accelerate search queries. Over the last years, the research community has proposed many different in-memory approaches that optimize cache misses instead of disk I/O, as opposed to disk-based systems, and make use of the grown parallel capabilities of modern CPUs. However, these techniques mainly focus on single-key lookups and neglect equally important range queries. Range queries are a ubiquitous operator in data management, commonly used in numerous domains such as genomic analysis, sensor networks, or online analytical processing. The main goal of this dissertation is thus to improve the capabilities of main-memory database systems with regard to executing range queries. To this end, we first propose a cache-optimized, updateable main-memory index structure, the Cache-Sensitive Skip List, which targets the execution of range queries on single database columns. Second, we study the performance of multidimensional range queries on modern hardware, where data are stored in main memory and processors support SIMD instructions and multi-threading. We re-evaluate a previous rule of thumb suggesting that, on disk-based systems, scans outperform index structures for selectivities of approximately 15-20% or more. To increase the practical relevance of our analysis, we also contribute a novel benchmark consisting of several realistic multidimensional range queries applied to real-world genomic data. Third, based on the outcomes of our experimental analysis, we devise a novel, fast and space-efficient main-memory index structure, the BB-Tree, which supports multidimensional range and point queries and provides a parallel search operator that leverages the multi-threading capabilities of modern CPUs.
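
    The scan-versus-index rule of thumb discussed above can be pictured with a deliberately simple sketch: a sequential scan reads every value regardless of selectivity, while a lookup on an ordered column pays only two binary searches plus the qualifying run, so the gap narrows as selectivity grows. This is an illustration of the trade-off only; it is neither the Cache-Sensitive Skip List nor the BB-Tree from the dissertation.

        # Minimal sketch of the scan-vs-index trade-off for single-column range
        # queries (illustrative only; not the dissertation's index structures).
        import bisect
        import random

        column = [random.randint(0, 1_000_000) for _ in range(1_000_000)]
        sorted_column = sorted(column)            # stands in for an ordered index

        def scan_range(col, lo, hi):
            """Sequential scan: reads every value, cost independent of selectivity."""
            return [v for v in col if lo <= v <= hi]

        def index_range(sorted_col, lo, hi):
            """Ordered lookup: two binary searches plus a copy of the matching run."""
            left = bisect.bisect_left(sorted_col, lo)
            right = bisect.bisect_right(sorted_col, hi)
            return sorted_col[left:right]

        lo, hi = 100_000, 120_000                 # roughly 2% selectivity
        assert sorted(scan_range(column, lo, hi)) == index_range(sorted_column, lo, hi)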

    Efficient Reorganisation of Hybrid Index Structures Supporting Multimedia Search Criteria

    This thesis describes the development and setup of hybrid index structures. They are access methods for retrieval techniques in hybrid data spaces, which are formed by one or more relational or normalised columns in conjunction with one non-relational or non-normalised column. Examples of such hybrid data spaces are, among others, textual data combined with geographical data, or data from enterprise content management systems; however, any non-relational data type may be stored as well, such as image feature vectors or comparable types. Hybrid index structures are known to perform efficiently with regard to retrieval operations. Unfortunately, little information is available about reorganisation operations which insert or update row tuples; the fundamental research has mainly been carried out in simulation-based environments. This work follows on from a previous thesis that implements hybrid access structures in realistic database surroundings. During that implementation it became obvious that retrieval works efficiently, yet the restructuring approaches require too much effort to be usable, e.g., in web search engine environments where several thousand documents are inserted or modified every day. These search engines rely on relational database systems as storage backends. Hence, these access methods for hybrid data spaces need to be set up in real-world database management systems. This thesis applies a systematic approach to the optimisation of the rearrangement algorithms in realistic scenarios: a measurement and evaluation scheme is created and repeatedly applied to an evolving state and a model of hybrid index structures in order to optimise the regrouping algorithms, so that hybrid index structures can be set up in real-world information systems. To this end, a set of input corpora is selected and applied to the test suite together with an evaluation scheme. To sum up, this thesis describes input sets, a test suite including an evaluation scheme, and optimisation iterations on reorganisation algorithms reflecting a theoretical model framework, in order to provide efficient reorganisation of hybrid index structures supporting multimedia search criteria.
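
    As a rough picture of what a query over such a hybrid data space looks like, the sketch below combines a relational predicate, a spatial box, and a keyword predicate over rows that pair normalised columns with one non-normalised text column. The schema and query are hypothetical assumptions of ours and do not reproduce the thesis' hybrid index structure; the filter is a plain sequential evaluation.

        # Illustrative sketch of a query over a hybrid data space: relational
        # columns (year, latitude/longitude) combined with one non-normalised
        # text column. A plain filter, not the hybrid index of the thesis.
        from dataclasses import dataclass

        @dataclass
        class Document:
            doc_id: int
            year: int          # relational / normalised column
            lat: float         # relational / normalised column
            lon: float
            text: str          # non-relational / non-normalised column

        docs = [
            Document(1, 2019, 52.52, 13.40, "museum island walking tour"),
            Document(2, 2021, 48.14, 11.58, "beer garden near the museum"),
            Document(3, 2021, 52.50, 13.39, "riverside museum cafe"),
        ]

        def hybrid_query(rows, year, bbox, keyword):
            """Combine a relational predicate, a spatial box, and a text predicate."""
            (lat_lo, lat_hi), (lon_lo, lon_hi) = bbox
            return [d.doc_id for d in rows
                    if d.year == year
                    and lat_lo <= d.lat <= lat_hi
                    and lon_lo <= d.lon <= lon_hi
                    and keyword in d.text]

        # Documents from 2021 inside a box around Berlin that mention "museum".
        print(hybrid_query(docs, 2021, ((52.3, 52.7), (13.0, 13.8)), "museum"))  # [3]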

    Multidimensional Partitioning Techniques Based on Multi-Attribute Indexes in Parallel Databases (Técnicas de particionamiento multidimensional basadas en índices multiatributo en bases de datos paralelas)

    The demanding requirements of modern database applications, such as GIS, CAD, CASE and others, call for new solutions to the problem of managing large quantities of information. The processing power of inexpensive parallel computers has focused the attention of many researchers who find in such computer systems the answer to the demands of these new applications. Specifically, parallelism technology seems an attractive way to solve the traditional bottleneck found in input/output operations. With the goal of minimizing the response time of a query, parallel database systems decluster data among a number of storage devices, favouring parallel access to the data and permitting the contribution of several processors to the execution of a query. Frequently, the partitioning of relations is made by a single attribute, sending tuples to different disks depending on the value of the tuple on the partitioning attribute. This way of fragmenting data is useful when the partitioning attribute is involved in the predicate of the query. However, in those situations where this is not the case, the query must be directed to every processing node in charge of some fragment of the relation or relations involved in the query. This approach negatively affects both the response time of the query and the throughput of the system. In the thesis we present, a multidimensional partitioning model is proposed. In short, the proposed technique partitions the tuple space on the basis of multiple attributes, sending the different fragments of the space to a specific number of disks in the system. In turn, the partitioning of the tuple space is done in a balanced way by means of a new multi-attribute indexing method, called the Q-tree. In this dissertation, we present the ideas which have guided the establishment of the Q-tree; we define the structures and algorithms for manipulating the Q-tree; we introduce several partitioning strategies based on this structure; and, finally, we include the performance results of the different proposals, based on the implementation work carried out during the execution of this doctoral thesis.
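
    The multi-attribute declustering idea can be pictured with a much simpler stand-in for the Q-tree: partition the tuple space with a regular grid over two partitioning attributes, assign the resulting fragments to disks round-robin, and route a query only to the disks whose fragments intersect its range predicates. The grid, the attribute domains, and the routing below are our own illustrative assumptions, not the Q-tree of the thesis.

        # Illustrative sketch of multi-attribute declustering: a regular 4x4 grid
        # over two partitioning attributes stands in for the Q-tree; grid cells
        # (fragments) are assigned to disks round-robin, and a range query is
        # routed only to the disks holding intersecting fragments.
        GRID = 4            # cells per dimension
        DISKS = 5           # number of storage devices
        A_MAX, B_MAX = 100.0, 100.0   # assumed domains of the two partitioning attributes

        def cell_of(a, b):
            """Map a tuple's two partitioning-attribute values to a grid cell."""
            ia = min(int(a / A_MAX * GRID), GRID - 1)
            ib = min(int(b / B_MAX * GRID), GRID - 1)
            return ia, ib

        def disk_of(cell):
            """Round-robin assignment of fragments (cells) to disks."""
            ia, ib = cell
            return (ia * GRID + ib) % DISKS

        def disks_for_query(a_range, b_range):
            """Disks that must participate in a query with range predicates on a and b."""
            (a_lo, a_hi), (b_lo, b_hi) = a_range, b_range
            lo_cell, hi_cell = cell_of(a_lo, b_lo), cell_of(a_hi, b_hi)
            return {disk_of((ia, ib))
                    for ia in range(lo_cell[0], hi_cell[0] + 1)
                    for ib in range(lo_cell[1], hi_cell[1] + 1)}

        print(disk_of(cell_of(12.0, 87.0)))                 # disk storing this tuple
        print(disks_for_query((0.0, 30.0), (0.0, 30.0)))    # only a subset of the disks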