5 research outputs found

    Dynamicity and Durability in Scalable Visual Instance Search.

    Visual instance search involves retrieving, from a collection of images, the ones that contain an instance of a visual query. Systems designed for visual instance search face the major challenge of scalability: a collection of a few million images used for instance search typically creates a few billion features that must be indexed. Furthermore, as real image collections grow rapidly, systems must also provide dynamicity, i.e., be able to handle on-line insertions while concurrently serving retrieval operations. Durability, which is the ability to recover correctly from software and hardware crashes, is the natural complement of dynamicity. Durability, however, has rarely been integrated within scalable and dynamic high-dimensional indexing solutions. This article addresses the issue of dynamicity and durability for scalable indexing of very large and rapidly growing collections of local features for instance retrieval. By extending the NV-tree, a scalable disk-based high-dimensional index, we show how to implement the ACID properties of transactions, which ensure both dynamicity and durability. We present a detailed performance evaluation of the transactional NV-tree: (i) we show that the insertion throughput is excellent despite the overhead for enforcing the ACID properties; (ii) we also show that this transactional index is truly scalable, using a standard image benchmark embedded in collections of up to 28.5 billion high-dimensional vectors, the largest single-server evaluations reported in the literature.

    Data Management for Dynamic Multimedia Analytics and Retrieval

    Multimedia data in its various manifestations poses a unique challenge from a data storage and data management perspective, especially if search, analysis and analytics in large data corpora are considered. The inherently unstructured nature of the data itself and the curse of dimensionality that afflicts the representations we typically work with in its stead cause a broad range of issues that require sophisticated solutions at different levels. This has given rise to a huge corpus of research that focuses on techniques that allow for effective and efficient multimedia search and exploration. Many of these contributions have led to an array of purpose-built multimedia search systems. However, recent progress in multimedia analytics and interactive multimedia retrieval has demonstrated that several of the assumptions usually made for such multimedia search workloads do not hold once a session has a human user in the loop. Firstly, many of the required query operations cannot be expressed by mere similarity search, and since the concrete requirement cannot always be anticipated, one needs a flexible and adaptable data management and query framework. Secondly, the widespread notion of staticity of data collections does not hold if one considers analytics workloads, whose purpose is to produce and store new insights and information. And finally, it is impossible even for an expert user to specify exactly how a data management system should produce and arrive at the desired outcomes of the potentially many different queries. Guided by these shortcomings and motivated by the fact that similar questions have once been answered for structured data in classical database research, this thesis presents three contributions that seek to mitigate the aforementioned issues. We present a query model that generalises the notion of proximity-based query operations and formalises the connection between those queries and high-dimensional indexing. We complement this with a cost model that makes the often implicit trade-off between query execution speed and result quality transparent to the system and the user. And we describe a model for the transactional and durable maintenance of high-dimensional index structures. All contributions are implemented in the open-source multimedia database system Cottontail DB, on top of which we present an evaluation that demonstrates the effectiveness of the proposed models. We conclude by discussing avenues for future research in the quest for converging the fields of databases on the one hand and (interactive) multimedia retrieval and analytics on the other.

    NV-tree: a scalable disk-based high-dimensional index

    This thesis presents the NV-tree (Nearest Vector tree), which addresses the specific problem of efficiently and effectively finding the approximate k-nearest neighbors within large collections of high-dimensional data points. The NV-tree is a very compact index, as only six bytes are kept in the index for each high-dimensional descriptor. It thus scales extremely well when indexing large collections of high-dimensional descriptors. The NV-tree efficiently produces results of good quality, even at such a large scale that the indices can no longer be kept entirely in main memory. We demonstrate this with extensive experiments presenting results from various collection sizes from 36 million up to nearly 30 billion SIFT (Scale Invariant Feature Transform) descriptors. We also study the conditions under which a nearest neighbor search provides meaningful results. Following this analysis we compare the NV-tree to LSH (Locality Sensitive Hashing), the most popular method for ε-distance search, showing that the NV-tree outperforms LSH when it comes to the problem of nearest neighbor retrieval. Beyond this analysis we also discuss how the NV-tree index can be used in practice in industrial applications and address two frequently overlooked requirements: dynamicity, the ability to cope with on-line insertions of new high-dimensional items into the indexed collection, and durability, the ability to recover from crashes and avoid losing the indexed data if a failure occurs. As far as we know, no other nearest neighbor algorithm published so far is able to cope with all three requirements: scale, dynamicity and durability.
    In this thesis we present the NV-tree index as a solution to a specific, well-defined problem: finding, quickly and effectively, an approximation of the k nearest neighbors in a large collection of high-dimensional data points. The NV-tree is a very compact index, as only six bytes are stored for each high-dimensional descriptor. The NV-tree therefore scales very well when applied to large collections of high-dimensional descriptors. The NV-tree returns good results in a short time, even when the indices do not fit in memory. We demonstrate this with experimental results on collections containing from 36 million up to nearly 30 billion SIFT (Scale Invariant Feature Transform) descriptors. We also investigate the conditions that must hold for a nearest neighbor search to return meaningful results. Following that analysis we compare the NV-tree with LSH (Locality Sensitive Hashing), which is the most popular method for ε-distance search, and show that the NV-tree is much faster than LSH. In addition to this analysis we discuss the practical use of the NV-tree in industry and meet two needs that are often overlooked: dynamicity, the ability to handle additions to the descriptor collection in real time, and durability, the ability to recover the index and avoid data loss in the event of a computer failure. To the best of our knowledge, no other known index meets all three of these needs: scalability, dynamicity and durability.

    Dynamic behavior of balanced NV-trees

    In recent years, some approximate high-dimensional indexing techniques have shown promising results by trading off quality guarantees for improved query performance. While the query performance and quality of these methods have been well studied, the performance of index maintenance has not yet been reported in any detail. Here, we focus on the dynamic behavior of the balanced NV-tree, which is a disk-based approximate index for very large collections. We report on an initial study of the effects of several implementation choices for the balanced NV-tree, and show that with appropriate implementation, significant performance improvements are possible. Overall, the proposed techniques not only reduce maintenance cost, but can also improve search performance significantly with minimal loss of search quality.