    NV-tree: a scalable disk-based high-dimensional index

    This thesis presents the NV-tree (Nearest Vector tree), which addresses the specific problem of efficiently and effectively finding the approximate k-nearest neighbors within large collections of high-dimensional data points. The NV-tree is a very compact index, as only six bytes are kept in the index for each high-dimensional descriptor. It thus scales extremely well when indexing large collections of high-dimensional descriptors. The NV-tree efficiently produces results of good quality, even at such a large scale that the indices can no longer be kept entirely in main memory. We demonstrate this with extensive experiments on collections ranging from 36 million up to nearly 30 billion SIFT (Scale Invariant Feature Transform) descriptors. We also study the conditions under which a nearest neighbor search provides meaningful results. Following this analysis, we compare the NV-tree to LSH (Locality Sensitive Hashing), the most popular method for ϵ-distance search, showing that the NV-tree outperforms LSH for nearest neighbor retrieval. Beyond this analysis, we also discuss how the NV-tree index can be used in practice in industrial applications and address two frequently overlooked requirements: dynamicity (the ability to cope with on-line insertions of new high-dimensional items into the indexed collection) and durability (the ability to recover from crashes and avoid losing the indexed data if a failure occurs). As far as we know, no other nearest neighbor algorithm published so far is able to cope with all three requirements: scale, dynamicity and durability.
    Icelandic abstract (English translation): In this thesis we present the NV-tree index as a solution to a specific, well-defined problem: finding, quickly and effectively, an approximation of the k nearest neighbors in a large collection of high-dimensional data points. The NV-tree is a very compact index, as only six bytes are stored for each high-dimensional descriptor. The NV-tree therefore scales very well when applied to large collections of high-dimensional descriptors. The NV-tree returns good results quickly, even when the indexes do not fit in memory. We demonstrate this with experimental results on collections containing from 36 million up to nearly 30 billion SIFT (Scale Invariant Feature Transform) descriptors. We also study the conditions that must hold for a nearest neighbor search to return meaningful results. Following that analysis, we compare the NV-tree with LSH (Locality Sensitive Hashing), the most popular method for ϵ-distance search, and show that the NV-tree is much faster than LSH. In addition, we discuss industrial applications of the NV-tree and meet two often-overlooked requirements: dynamicity (the ability to handle real-time additions to the descriptor collection) and durability (the ability to recover the index and avoid data loss in the event of a computer failure). To the best of our knowledge, no other known index satisfies all three requirements: scalability, dynamicity and durability.
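
    The abstract distinguishes k-nearest-neighbor retrieval from ϵ-distance search, the query type LSH is designed for. The following minimal sketch shows the difference using a brute-force sequential scan; it is neither the NV-tree nor LSH, and the function names are illustrative rather than taken from the thesis. A k-NN query always returns k answers however distant they are, while an ϵ-range query returns only points within distance ϵ and may return nothing at all.

```python
import heapq
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_scan(data: List[Vector], query: Vector, k: int) -> List[Tuple[float, int]]:
    """Exact k-nearest-neighbor search by sequential scan.

    Always returns min(k, len(data)) (distance, index) pairs, nearest first,
    no matter how far away those points are.
    """
    dists = ((euclidean(v, query), i) for i, v in enumerate(data))
    return heapq.nsmallest(k, dists)


def epsilon_scan(data: List[Vector], query: Vector, eps: float) -> List[Tuple[float, int]]:
    """Exact epsilon-distance (range) search by sequential scan.

    Returns every point within distance eps, nearest first; may be empty.
    """
    hits = [(euclidean(v, query), i) for i, v in enumerate(data)]
    return sorted(h for h in hits if h[0] <= eps)


if __name__ == "__main__":
    points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 5.0)]
    q = (0.2, 0.0)
    print([i for _, i in knn_scan(points, q, k=2)])          # [0, 1]
    print([i for _, i in epsilon_scan(points, q, eps=0.5)])  # [0]
    print(epsilon_scan(points, (10.0, 10.0), eps=1.0))       # []
```

    Note that the k-NN query for a far-away point still returns k results, which is one reason the meaningfulness of nearest neighbor answers deserves separate study.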

    Scalability of the NV-tree: Three Experiments

    The NV-tree is a scalable approximate high-dimensional indexing method specifically designed for large-scale visual instance search. In this paper, we report on three experiments designed to evaluate its performance. Two of these experiments embed standard benchmarks within collections of up to 28.5 billion features, the largest single-server collection ever reported in the literature. The results show that the NV-tree performs very well for visual instance search over large-scale collections.

    Dynamicity and Durability in Scalable Visual Instance Search

    Visual instance search involves retrieving, from a collection of images, those that contain an instance of a visual query. Systems designed for visual instance search face the major challenge of scalability: a collection of a few million images used for instance search typically yields a few billion features that must be indexed. Furthermore, as real image collections grow rapidly, systems must also provide dynamicity, i.e., be able to handle on-line insertions while concurrently serving retrieval operations. Durability, the ability to recover correctly from software and hardware crashes, is the natural complement of dynamicity. Durability, however, has rarely been integrated into scalable and dynamic high-dimensional indexing solutions. This article addresses dynamicity and durability for scalable indexing of very large and rapidly growing collections of local features for instance retrieval. By extending the NV-tree, a scalable disk-based high-dimensional index, we show how to implement the ACID properties of transactions, which ensure both dynamicity and durability. We present a detailed performance evaluation of the transactional NV-tree: (i) we show that the insertion throughput is excellent despite the overhead of enforcing the ACID properties; (ii) we also show that this transactional index is truly scalable, using a standard image benchmark embedded in collections of up to 28.5 billion high-dimensional vectors, the largest single-server evaluations reported in the literature.
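
    The abstract does not describe how the transactional NV-tree enforces the ACID properties. Write-ahead logging is one standard way to make insertions durable, and the sketch below shows that general technique on a toy in-memory index; it is not the article's implementation. The class `LoggedIndex` and its JSON-lines log format are hypothetical, and the sketch covers only durable, all-or-nothing single insertions, not isolation between concurrent readers and writers.

```python
import json
import os
import tempfile
from typing import Dict, List


class LoggedIndex:
    """Hypothetical toy index made durable by a write-ahead log (WAL).

    Each insertion is appended to the log and forced to stable storage
    (fsync) before it is applied in memory, so an insert that has returned
    survives a crash: recovery rebuilds the index by replaying the log.
    """

    def __init__(self, log_path: str) -> None:
        self.log_path = log_path
        self.vectors: Dict[int, List[float]] = {}
        self._recover()

    def _recover(self) -> None:
        if not os.path.exists(self.log_path):
            return
        valid_end = 0  # byte offset just past the last complete record
        with open(self.log_path, "rb") as log:
            for line in log:
                if not line.endswith(b"\n"):
                    break  # torn tail: a crash interrupted this append
                try:
                    rec = json.loads(line)
                except json.JSONDecodeError:
                    break
                self.vectors[rec["id"]] = rec["vec"]
                valid_end += len(line)
        # Drop any torn tail so later appends start on a clean line.
        with open(self.log_path, "r+b") as log:
            log.truncate(valid_end)

    def insert(self, item_id: int, vec: List[float]) -> None:
        record = json.dumps({"id": item_id, "vec": vec}) + "\n"
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(record)
            log.flush()
            os.fsync(log.fileno())  # commit point: the record is on disk
        self.vectors[item_id] = vec  # apply only after the log is durable


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "index.wal")
    idx = LoggedIndex(path)
    idx.insert(1, [0.1, 0.2])
    idx.insert(2, [0.3, 0.4])
    # Simulate a crash in the middle of writing a third record.
    with open(path, "a", encoding="utf-8") as f:
        f.write('{"id": 3, "vec": [0.5')
    recovered = LoggedIndex(path)
    print(sorted(recovered.vectors))  # [1, 2]
```

    Calling `fsync` on every insertion is the simplest correct policy but also the slowest; batching several insertions per flush (group commit) is a common way to reduce the overhead the abstract refers to.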

    The PvS-Index: An Indexing Method for Local Image Descriptors

    In the last twenty years, the problem of finding near neighbours of a specified data point in high-dimensional space has attracted increasing interest from the database community, especially in the context of time series and multimedia data. The first approaches, published in the 1980s and early 1990s, have been shown to work very well for up to 12 dimensions, and in selected cases up to 20. In higher-dimensional spaces, however, the performance of these indexing structures degrades drastically, so a sequential scan remains the best choice in terms of performance. In this thesis I present the PvS-Index, a new approximate indexing technique for this kind of high-dimensional data. Because of its low computational cost and fixed number of I/O operations, it provides good query response times that are independent of the size of the collection. When applying the PvS-Index to the problem of image copyright protection using local descriptors, my measurements showed a performance gain of several orders of magnitude compared with a sequential scan through the whole data set.
    Icelandic abstract (English translation): Over the last two decades, finding the nearest neighbours of query points in high-dimensional space has attracted the interest of database researchers, particularly in connection with time series and multimedia data. The first methods, published in the 1980s and early 1990s, have worked well up to 12 dimensions, and even 20 dimensions in special cases. With more dimensions, however, the performance of these methods deteriorates rapidly, so that a sequential scan through the entire database becomes the fastest search method. In this master's thesis I present the PvS-index, a new index for approximate queries on high-dimensional data. Because of its low computational cost and fixed disk cost, the PvS-index gives good response times that are independent of the size of the database. My results show that when the PvS-index is applied to copyright protection for digital images, using local descriptors, performance is several orders of magnitude better than a sequential scan through the entire database.
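
    The abstract attributes the PvS-Index's size-independent response time to a low computational cost and a fixed number of I/O operations. The sketch below does not reproduce the PvS-Index; it illustrates, under assumed design choices, one generic way to obtain a fixed per-query read cost: project every vector onto a random line, sort by projection value, cut the sorted order into segments of a fixed size (standing in for disk blocks), and answer a query by scanning only the one segment whose projection range covers it. The class `ProjectionPartitionIndex` and its parameters are hypothetical.

```python
import bisect
import math
import random
from typing import List, Sequence, Tuple


class ProjectionPartitionIndex:
    """Hypothetical approximate NN index: one random projection, fixed-size segments.

    A query finds its segment by binary search over in-memory segment
    boundaries and then scans that single segment, so at most segment_size
    vectors are read per query regardless of the collection size.
    """

    def __init__(self, data: List[Sequence[float]], segment_size: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        line = [rng.gauss(0.0, 1.0) for _ in range(len(data[0]))]
        norm = math.sqrt(sum(x * x for x in line))
        self.line = [x / norm for x in line]  # unit-length projection direction
        order = sorted(range(len(data)), key=lambda i: self._project(data[i]))
        # Each segment plays the role of one fixed-size disk block.
        self.segments = [
            [(i, data[i]) for i in order[s:s + segment_size]]
            for s in range(0, len(order), segment_size)
        ]
        # Smallest projection value of every segment except the first.
        self.bounds = [self._project(seg[0][1]) for seg in self.segments[1:]]

    def _project(self, v: Sequence[float]) -> float:
        return sum(a * b for a, b in zip(self.line, v))

    def query(self, q: Sequence[float], k: int = 1) -> List[Tuple[float, int]]:
        """Approximate k-NN: exact search restricted to one segment."""
        seg = self.segments[bisect.bisect_right(self.bounds, self._project(q))]
        return sorted((math.dist(q, v), i) for i, v in seg)[:k]


if __name__ == "__main__":
    rng = random.Random(42)
    data = [[rng.random() for _ in range(16)] for _ in range(1000)]
    index = ProjectionPartitionIndex(data, segment_size=32)
    print(index.query(data[7], k=3))
```

    The answer is approximate because a true neighbour can fall into an adjacent segment; practical schemes of this family typically mitigate that with several projections or overlapping partitions, at the cost of a few more (still fixed) reads per query.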

    NV-Tree: nearest neighbors at the billion scale

    This paper presents the NV-Tree (Nearest Vector Tree). It addresses the specific, yet important, problem of efficiently and effectively finding the approximate k-nearest neighbors within a collection of a few billion high-dimensional data points. The NV-Tree is a very compact index, as only six bytes are kept in the index for each high-dimensional descriptor. It thus scales extremely well when indexing large collections of high-dimensional descriptors. The NV-Tree efficiently produces results of good quality, even at such a large scale that the indices can no longer be kept entirely in main memory. We demonstrate this with extensive experiments using a collection of 2.5 billion SIFT (Scale Invariant Feature Transform) descriptors.
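
    To put the six-bytes-per-descriptor figure in perspective, the back-of-the-envelope calculation below compares it with storing the raw descriptors for the 2.5 billion-descriptor collection. It assumes a single index, ignores the tree's inner nodes, and assumes raw SIFT descriptors are stored as 128 one-byte components (a common encoding, but an assumption here).

```python
SIFT_DIMS = 128            # SIFT descriptors are 128-dimensional
RAW_BYTES_PER_DIM = 1      # assumption: one unsigned byte per component
INDEX_BYTES_PER_DESC = 6   # per-descriptor figure stated in the abstract


def footprint_gb(n_descriptors: int, bytes_per_descriptor: int) -> float:
    """Storage in decimal gigabytes (10**9 bytes)."""
    return n_descriptors * bytes_per_descriptor / 1e9


if __name__ == "__main__":
    n = 2_500_000_000  # collection size used in the paper
    index_gb = footprint_gb(n, INDEX_BYTES_PER_DESC)
    raw_gb = footprint_gb(n, SIFT_DIMS * RAW_BYTES_PER_DIM)
    print(f"index entries:   {index_gb:.0f} GB")   # 15 GB
    print(f"raw descriptors: {raw_gb:.0f} GB")     # 320 GB
    print(f"ratio: {raw_gb / index_gb:.1f}x")      # 21.3x
```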

    SSD Technology Enables Dynamic Maintenance of Persistent High-Dimensional Indexes

    In today's world of ever-increasing multimedia collections, dynamically and persistently maintaining high-dimensional indexes is imperative for industrial applications. Since HDD performance is the main bottleneck in index maintenance, we investigate the impact of SSD technology. We use the NV-tree to drive our analysis, as it is the only high-dimensional index in the literature that has seriously addressed updates. Our simulation model indicates that an index of 1.5 billion descriptors can be built dynamically on a high-end SSD in just over four hours of disk time, more than 500x faster than on a high-end HDD. A relatively small investment in SSD technology can thus make dynamic and persistent high-dimensional indexes very feasible.
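
    The SSD figures can be turned into rates with a quick calculation. Because the abstract states bounds ("just over four hours", "more than 500x"), plugging in exactly 4 h and 500x gives an upper bound on the average SSD insertion rate and a lower bound on the equivalent HDD disk time. The function names are illustrative.

```python
SECONDS_PER_HOUR = 3600


def ssd_rate_upper_bound(descriptors: int, ssd_hours: float) -> float:
    """Average insertions per second if the build took exactly ssd_hours.

    The real build took "just over" that long, so this is an upper bound.
    """
    return descriptors / (ssd_hours * SECONDS_PER_HOUR)


def hdd_hours_lower_bound(ssd_hours: float, speedup: float) -> float:
    """Lower bound on HDD disk time, given an SSD time of at least ssd_hours
    and an SSD speedup of more than `speedup`."""
    return ssd_hours * speedup


if __name__ == "__main__":
    rate = ssd_rate_upper_bound(1_500_000_000, 4.0)
    hdd = hdd_hours_lower_bound(4.0, 500)
    print(f"SSD: at most ~{rate:,.0f} descriptors/s")              # ~104,167
    print(f"HDD: more than {hdd:,.0f} h (~{hdd / 24:.0f} days)")   # 2,000 h, ~83 days
```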