10 research outputs found
Early burst detection for memory-efficient image retrieval
Recent works show that image comparison based on local descriptors is corrupted by visual bursts, which tend to dominate the image similarity. Existing strategies, such as power-law normalization, improve the results by discounting the contribution of visual bursts to the image similarity. In this paper, we propose to explicitly detect the visual bursts in an image at an early stage. We compare several detection strategies that jointly take into account feature similarity and geometrical quantities. The bursty groups are merged into meta-features, which are used as input to state-of-the-art image search systems such as VLAD or the selective match kernel. We then show the benefit of using this strategy in an asymmetric manner, with only the database features being aggregated but not those of the query. Extensive experiments performed on public benchmarks for visual retrieval show the benefits of our method, which achieves performance on par with the state of the art but with significantly reduced complexity, thanks to the lower number of features fed to the indexing system.
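The central step above, merging groups of mutually similar descriptors within one image into single meta-features, can be sketched as follows. This toy version groups greedily on descriptor similarity only; the paper's detection strategies also use geometrical quantities, and the similarity threshold is an illustrative value:

```python
import numpy as np

def merge_bursts(descriptors, sim_threshold=0.9):
    """Toy sketch: greedily group mutually similar local descriptors
    within one image and replace each group by its L2-normalised mean
    (a "meta-feature"). The greedy grouping and threshold are
    illustrative, not the paper's exact detection strategy."""
    # L2-normalise so dot products are cosine similarities
    d = descriptors / np.linalg.norm(descriptors, axis=1, keepdims=True)
    unassigned = list(range(len(d)))
    meta = []
    while unassigned:
        i = unassigned.pop(0)
        # collect remaining descriptors similar enough to the seed
        similar = [j for j in unassigned if d[i] @ d[j] >= sim_threshold]
        group = [i] + similar
        unassigned = [j for j in unassigned if j not in similar]
        m = d[group].mean(axis=0)
        meta.append(m / np.linalg.norm(m))
    return np.array(meta)
```

Because bursty groups collapse to one vector each, the indexing system receives fewer features, which is where the memory and complexity savings come from.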
Coplanar Repeats by Energy Minimization
This paper proposes an automated method to detect, group and rectify
arbitrarily-arranged coplanar repeated elements via energy minimization. The
proposed energy functional combines several features that model how planes with
coplanar repeats are projected into images and captures global interactions
between different coplanar repeat groups and scene planes. An inference
framework based on a recent variant of α-expansion is described and fast
convergence is demonstrated. We compare the proposed method to two widely-used
geometric multi-model fitting methods using a new dataset of annotated images
containing multiple scene planes with coplanar repeats in varied arrangements.
The evaluation shows a significant improvement in the accuracy of
rectifications computed from coplanar repeats detected with the proposed method
versus those detected with the baseline methods. Comment: 14 pages with supplemental materials attached.
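As a point of reference, the kind of labeling energy that α-expansion minimises combines a per-element data term with a pairwise smoothness term. Below is a minimal sketch using a Potts pairwise term; the paper's actual functional adds geometric terms specific to coplanar repeats that are not modelled here, and all names are illustrative:

```python
def labeling_energy(labels, data_cost, smooth_weight, edges):
    """Generic sketch of an energy of the form minimised by
    alpha-expansion: a per-element data term plus a Potts pairwise
    term over neighbouring elements (a fixed penalty whenever two
    neighbours receive different labels)."""
    # data term: cost of assigning labels[i] to element i
    energy = sum(data_cost[i][labels[i]] for i in range(len(labels)))
    # Potts smoothness: pay a fixed penalty when neighbours disagree
    energy += sum(smooth_weight for i, j in edges if labels[i] != labels[j])
    return energy
```

In a multi-model fitting setting, labels would correspond to candidate planes or repeat groups, and the pairwise term encourages spatially coherent group assignments.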
Dynamic match kernel with deep convolutional features for image retrieval
For image retrieval methods based on bags of visual words, much attention has been paid to enhancing the discriminative power of the local features. Although retrieved images are usually similar to a query in minutiae, they may be significantly different from a semantic perspective, which can be effectively distinguished by convolutional neural networks (CNN). Such images should not be considered relevant pairs. To tackle this problem, we propose to construct a dynamic match kernel by adaptively calculating the matching thresholds between query and candidate images based on the pairwise distance among deep CNN features. In contrast to the typical static match kernel, which is independent of the global appearance of retrieved images, the dynamic one leverages the semantic similarity as a constraint for determining the matches. Accordingly, we propose a semantic-constrained retrieval framework incorporating the dynamic match kernel, which focuses on matched patches between relevant images and filters out those of irrelevant pairs. Furthermore, we demonstrate that the proposed kernel complements recent methods, such as Hamming embedding, multiple assignment, local descriptor aggregation, and graph-based re-ranking, while it outperforms the static one under various settings on off-the-shelf evaluation metrics. We also propose to evaluate the matched patches both quantitatively and qualitatively. Extensive experiments on five benchmark data sets and large-scale distractors validate the merits of the proposed method against state-of-the-art methods for image retrieval.
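A minimal sketch of the idea of a dynamic match kernel: the local-feature matching threshold is not fixed but derived from the global (semantic) similarity of the two images' CNN features. The linear mapping and all constants below are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def dynamic_threshold(cnn_q, cnn_c, base_tau=24.0, scale=8.0):
    """Toy sketch of a dynamic Hamming threshold: the more semantically
    similar two images' global CNN features are, the more lenient the
    local-match threshold. base_tau and scale are illustrative."""
    q = cnn_q / np.linalg.norm(cnn_q)
    c = cnn_c / np.linalg.norm(cnn_c)
    return base_tau + scale * float(q @ c)  # cosine similarity in [-1, 1]

def match_kernel(hamming_dists, tau):
    """Binary match kernel: a local correspondence counts only if its
    Hamming distance is below the (possibly dynamic) threshold."""
    return int((np.asarray(hamming_dists) < tau).sum())
```

Under this sketch, semantically dissimilar image pairs receive a stricter threshold, so spurious local matches between irrelevant images are filtered out, which is the behaviour the abstract describes.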
DCT Inspired Feature Transform for Image Retrieval and Reconstruction
The scale invariant feature transform (SIFT) is effective for representing images in computer vision tasks, as one of the feature descriptions most resistant to common image deformations. However, two issues should be addressed: first, feature description based on gradient accumulation is not compact and contains redundancies; second, multiple orientations are often extracted from one local region and therefore produce multiple descriptions, which is not good for memory efficiency. To resolve these two issues, this paper introduces a novel method to determine the dominant orientation for multiple-orientation cases, named the discrete cosine transform (DCT) intrinsic orientation, and a new DCT inspired feature transform (DIFT). In each local region, it first computes a unique DCT intrinsic orientation via the DCT matrix and rotates the region accordingly, and then describes the rotated region with partial DCT matrix coefficients to produce an optimized low-dimensional descriptor. We test the accuracy and robustness of DIFT on real image matching. Afterward, extensive applications performed on public benchmarks for visual retrieval show that using the DCT intrinsic orientation achieves performance on a par with SIFT, but with only 60% of its features; replacing the SIFT description with DIFT reduces the dimensionality from 128 to 32 and improves precision. Image reconstruction resulting from DIFT is presented to show another of its advantages over SIFT. National Natural Science Foundation of China [NSFC 61375026, 2015BAF15B00].
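To illustrate the flavour of a DCT-based description, the sketch below describes a patch by a small block of low-frequency 2-D DCT coefficients, yielding a compact descriptor. It omits the DCT intrinsic orientation step, and the number of retained coefficients is an illustrative choice, not the paper's:

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    m = np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0] *= 1 / np.sqrt(2)
    return m * np.sqrt(2 / n)

def dift_like_descriptor(patch, keep=6):
    """Toy sketch in the spirit of DIFT: describe a square patch by a
    small block of low-frequency 2-D DCT coefficients, giving a compact
    low-dimensional descriptor."""
    n = patch.shape[0]
    D = dct_matrix(n)
    coeffs = D @ patch @ D.T            # separable 2-D DCT-II
    low = coeffs[:keep, :keep].ravel()  # keep low frequencies only
    return low / (np.linalg.norm(low) + 1e-12)
```

Because most of a natural patch's energy concentrates in the low-frequency DCT coefficients, truncating the transform is a standard way to obtain a compact yet informative description.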
Large Scale Pattern Detection in Videos and Images from the Wild
PhD thesis. Pattern detection is a well-studied area of computer vision, but current methods remain
unstable in images of poor quality. This thesis describes improvements over contemporary
methods in the fast detection of unseen patterns in a large corpus of videos that vary
tremendously in colour and texture definition, captured "in the wild" by mobile devices
and surveillance cameras.
We focus on three key areas of this broad subject:
First, we identify consistency weaknesses in existing techniques of processing an image
and its horizontally reflected (mirror) image. This is important in police investigations
where subjects change their appearance to try to avoid recognition, and we propose that
invariance to horizontal reflection should be more widely considered in image description
and recognition tasks too. We observe online Deep Learning system behaviours in
this respect, and provide a comprehensive assessment of 10 popular low-level feature
detectors.
Second, we develop simple and fast algorithms that combine to provide memory- and
processing-efficient feature matching. These involve static scene elimination in the presence
of noise and on-screen time indicators, a blur-sensitive feature detection that finds
a greater number of corresponding features in images of varying sharpness, and a combinatorial
texture and colour feature matching algorithm that matches features when
either attribute may be poorly defined. A comprehensive evaluation is given, showing
some improvements over existing feature correspondence methods.
Finally, we study random decision forests for pattern detection. A new method of
indexing patterns in video sequences is devised and evaluated. We automatically label
positive and negative image training data, reducing a task of unsupervised learning to
one of supervised learning, and devise a node split function that is invariant to mirror
reflection and rotation through 90 degree angles. A high dimensional vote accumulator
encodes the hypothesis support, yielding implicit back-projection for pattern detection.
Funded by the European Union's Seventh Framework Programme, specific topic "framework and tools for (semi-)automated exploitation of massive amounts of digital data for forensic purposes", under grant agreement number 607480 (LASIE IP project).
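The mirror- and 90-degree-rotation-invariant node split function mentioned above can be obtained, for example, by evaluating a base test under all eight dihedral transforms of a patch and aggregating with a symmetric function such as max. A toy sketch of that construction; the base two-pixel test and its parameters are hypothetical, not the thesis's actual split function:

```python
import numpy as np

def d4_orbit(patch):
    """All 8 dihedral-group transforms of a square patch: rotations by
    0/90/180/270 degrees, each with and without a horizontal mirror."""
    out = []
    p = patch
    for _ in range(4):
        out.append(p)
        out.append(np.fliplr(p))
        p = np.rot90(p)
    return out

def invariant_split(patch, a, b, threshold):
    """Toy mirror/rotation-invariant node test: evaluate a two-pixel
    difference test under every D4 transform and take the maximum.
    Since the orbit is the same set for the patch and any of its
    mirrored/rotated versions, the decision is identical for all."""
    best = max(float(q[a] - q[b]) for q in d4_orbit(patch))
    return best > threshold
```

Aggregating over the full symmetry orbit is a generic way to make any base test invariant by construction, at the cost of eight evaluations per node.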
Context models for local features for content-based image search in large image databases
Especially since smartphones have become a constant companion for many people, the number of captured images is growing rapidly. Often, images are shared with others via social networks immediately after capture. For later use of the photos, however, it becomes increasingly important to find the images relevant to a given purpose among the mass. For many known object classes, automatic tagging with corresponding detection methods is already a great help. In addition, the place or time of the desired photos can often be narrowed down via metadata. Nevertheless, in certain cases only a content-based image search leads to the goal, since it allows searching explicitly with a query image for individual objects or scenes.
Although research on content-based image search has led to many applications over the last decade, the scalability of the highly accurate variants is still limited. This means that the existing methods, which can robustly examine a pair of images for locally similar content, cannot readily be extended to searching many millions of images.
This dissertation is devoted to this kind of content-based image search, which indexes images by their local image features, and addresses two essential limitations of the popular bag-of-words model. First, the quantization and compression of the local features, which are essential for search speed in large image collections, entail a certain loss of detail information. Second, the indexed features of all images must always reside in main memory, since every query requires fast access to a considerable part of the index. Concretely, this work deals with representations that include in the index not only the quantized features but also their context. Departing from the usual approaches, the context, i.e. the larger neighbourhood of a local feature, is captured as a feature in its own right and likewise quantized, which extends the index by one dimension. First, a framework for evaluating such context representations is designed. Then two representations are proposed: one based on the neighbouring local features, aggregated by means of the Fisher vector, the other based on the outputs of convolutional layers of artificial neural networks. After comparing the two representations, and combinations thereof, within the evaluation framework, the benefits for a complete content-based image search system are assessed on four public datasets. For search in one million images, the proposed representations based on neural networks clearly improve the search results of the bag-of-words model.
Since the additional index dimension enables more effective access to the indexed features, a new realization of the complete system is also proposed. With respect to the index, the system no longer depends on main memory but can benefit from current non-volatile storage media such as SSD drives. Today's systems can already profit from combining the proposed context representation of local features with the realization on large, inexpensive SSD drives, as this makes even larger image databases accessible to content-based image search.
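The extra index dimension described above can be pictured as an inverted index keyed by pairs of (visual word, context word), so that a query only touches postings whose context word also matches. A minimal sketch with illustrative names:

```python
from collections import defaultdict

class ContextIndex:
    """Toy sketch of a two-dimensional inverted index: each local
    feature is stored under the pair (visual word, context word), so
    a query only reads the postings whose context word also matches,
    narrowing index access. All names are illustrative."""

    def __init__(self):
        self.postings = defaultdict(list)

    def add(self, image_id, visual_word, context_word):
        # index a feature of `image_id` under its (word, context) pair
        self.postings[(visual_word, context_word)].append(image_id)

    def query(self, visual_word, context_word):
        # only features agreeing in both dimensions are retrieved
        return self.postings.get((visual_word, context_word), [])
```

Because each lookup touches a much smaller postings list than a plain visual-word index, such a layout is also friendlier to block-oriented storage such as SSDs, in line with the realization proposed above.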
Early burst detection for memory-efficient image retrieval: Extended version
Recent works show that image comparison based on local descriptors is corrupted by visual bursts, which tend to dominate the image similarity. Existing strategies, such as power-law normalization, improve the results by discounting the contribution of visual bursts to the image similarity. In this paper, we propose to explicitly detect the visual bursts in an image at an early stage. We compare several detection strategies that jointly take into account feature similarity and geometrical quantities. The bursty groups are merged into meta-features, which are used as input to state-of-the-art image search systems such as VLAD or the selective match kernel. We then show the benefit of using this strategy in an asymmetric manner, with only the database features being aggregated but not those of the query. Extensive experiments performed on public benchmarks for visual retrieval show the benefits of our method, which achieves performance on par with the state of the art but with significantly reduced complexity, thanks to the lower number of features fed to the indexing system.