Particular object retrieval with integral max-pooling of CNN activations
Image representations built upon Convolutional Neural Networks (CNNs) have recently been shown to provide effective descriptors for image search, outperforming pre-CNN features as short-vector representations. Yet such models are not compatible with geometry-aware re-ranking methods and are still outperformed, on some particular-object retrieval benchmarks, by traditional image search systems relying on precise descriptor matching, geometric re-ranking, or query expansion. This work revisits both retrieval stages, namely initial search and re-ranking, by employing the same primitive information derived from the CNN. We build compact feature vectors that encode several image regions without the need to feed multiple inputs to the network. Furthermore, we extend integral images to handle max-pooling on convolutional layer activations, allowing us to localize matching objects efficiently. The resulting bounding box is finally used for image re-ranking. As a result, this paper significantly improves the existing CNN-based recognition pipeline: we report, for the first time, results competing with traditional methods on the challenging Oxford5k and Paris6k datasets.
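The integral max-pooling idea above can be illustrated with a small sketch: an integral image gives O(1) sums over any box, and raising activations to a power p before summing yields a generalized mean that approximates the box maximum. The function names are mine, and the exponent handling is simplified relative to the paper; this is a minimal sketch of the technique, not the authors' implementation.

```python
import numpy as np

def integral_image(fmap):
    """Cumulative-sum table; row 0 and column 0 are zero padding."""
    ii = np.zeros((fmap.shape[0] + 1, fmap.shape[1] + 1))
    ii[1:, 1:] = fmap.cumsum(axis=0).cumsum(axis=1)
    return ii

def box_sum(ii, y0, x0, y1, x1):
    """Sum of fmap[y0:y1, x0:x1] in O(1) from the integral image."""
    return ii[y1, x1] - ii[y0, x1] - ii[y1, x0] + ii[y0, x0]

def approx_max_pool(fmap, y0, x0, y1, x1, p=10.0):
    """Approximate the max over a box: (sum of x**p) ** (1/p).
    A single integral image over fmap**p then serves every box;
    the result slightly overestimates the true maximum."""
    ii = integral_image(fmap ** p)
    return box_sum(ii, y0, x0, y1, x1) ** (1.0 / p)
```

With one integral image per feature channel, scoring many candidate bounding boxes for object localization becomes cheap, which is what makes the re-ranking stage tractable.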
Towards Good Practices in Evaluating Transfer Adversarial Attacks
Transfer adversarial attacks raise critical security concerns in real-world,
black-box scenarios. However, the actual progress of this field is difficult to
assess due to two common limitations in existing evaluations. First, different
methods are often not systematically and fairly evaluated in a one-to-one
comparison. Second, only transferability is evaluated but another key attack
property, stealthiness, is largely overlooked. In this work, we design good
practices to address these limitations, and we present the first comprehensive
evaluation of transfer attacks, covering 23 representative attacks against 9
defenses on ImageNet. In particular, we propose to categorize existing attacks
into five categories, which enables our systematic category-wise analyses.
These analyses lead to new findings that even challenge existing knowledge and
also help determine the optimal attack hyperparameters for our attack-wise
comprehensive evaluation. We also pay particular attention to stealthiness, by
adopting diverse imperceptibility metrics and looking into new, finer-grained
characteristics. Overall, our new insights into transferability and
stealthiness lead to actionable good practices for future evaluations.
Comment: An extended version can be found at arXiv:2310.11850. Code and a list of categorized attacks are available at https://github.com/ZhengyuZhao/TransferAttackEva
Tampering detection and localization in images from social networks: A CBIR approach
Step 1: content-based image retrieval system. Goal: find the best image in our database for a comparison with the query. Method: (1) find the 10 most similar images with a dot product over the descriptors explained below, accelerating the search with a KD-tree; (2) reorder these 10 candidates to find the best one, estimating a homography between the query image and each candidate and keeping the best homography. Descriptors are based on VGG19 [1]. Two input sizes are used: the standard training image size, or a kernelization step as in [2]. Three vectors are analyzed, based on the outputs of three layers: the last convolutional layer C5, with an output length of 512, and the fully-connected layers C6 and C7, each with an output length of 4096. Mean or max pooling has to be applied to C5 when the standard training image size is used, and to all three outputs in the kernelized approach.
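The first stage of the pipeline above can be sketched as follows: index L2-normalised CNN descriptors in a KD-tree, shortlist the nearest neighbours, then re-score the shortlist by dot product. The function names are illustrative, and the second stage is only a placeholder; the actual system re-ranks by fitting a homography per candidate.

```python
import numpy as np
from scipy.spatial import cKDTree

def build_index(descriptors):
    """descriptors: (n, d) L2-normalised CNN descriptors (e.g. VGG19 fc7)."""
    return cKDTree(descriptors)

def retrieve(tree, descriptors, query, k=10):
    """Stage 1: shortlist the k nearest images. For unit vectors,
    Euclidean k-NN ranking coincides with dot-product ranking."""
    _, idx = tree.query(query, k=k)
    # Stage 2 placeholder: re-order the shortlist by dot product;
    # the described system instead fits a homography per candidate.
    scores = descriptors[idx] @ query
    order = np.argsort(-scores)
    return idx[order], scores[order]
```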
Complex Document Classification and Localization Application on Identity Document Images
This paper studies the problem of document image classification. More specifically, we address the classification of documents composed of little textual information and complex backgrounds (such as identity documents). Unlike most existing systems, the proposed approach simultaneously locates the document and recognizes its class. The latter is defined by the document nature (passport, ID, etc.), emission country, version, and the visible side (main or back). This task is very challenging due to unconstrained capturing conditions, sparse textual information, and varying components that are irrelevant to the classification, e.g. photo, names, address, etc. First, a base of document models is created from reference images. We show that training images are not necessary and that only one reference image is enough to create a document model. Then, the query image is matched against all models in the base. Unknown documents are rejected using a quality estimate based on the extracted document. The matching process is optimized to guarantee an execution time independent of the number of document models. Once the document model is found, a more accurate matching is performed to locate the document and facilitate information extraction. Our system is evaluated on several datasets with up to 3042 real documents (representing 64 classes), achieving an accuracy of 96.6%.
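The match-against-all-models-with-rejection idea can be sketched with global descriptors: one pass over the model base (cost depending only on its size, not on which class matches), with a score threshold standing in for the paper's quality-based rejection of unknown documents. The descriptors, names, and threshold are hypothetical; the actual system matches local features and verifies geometry.

```python
import numpy as np

def classify(models, names, query, reject_below=0.8):
    """models: (n_classes, d), one reference descriptor per document model.
    Returns (class_name, score), or (None, score) when the best match
    falls under the rejection threshold (an unknown document)."""
    models = models / np.linalg.norm(models, axis=1, keepdims=True)
    q = query / np.linalg.norm(query)
    scores = models @ q                 # single pass over the whole base
    best = int(np.argmax(scores))
    if scores[best] < reject_below:
        return None, float(scores[best])
    return names[best], float(scores[best])
```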
Revisiting Transferable Adversarial Image Examples: Attack Categorization, Evaluation Guidelines, and New Insights
Transferable adversarial examples raise critical security concerns in
real-world, black-box attack scenarios. However, in this work, we identify two
main problems in common evaluation practices: (1) For attack transferability,
lack of systematic, one-to-one attack comparison and fair hyperparameter
settings. (2) For attack stealthiness, simply no comparisons. To address these
problems, we establish new evaluation guidelines by (1) proposing a novel
attack categorization strategy and conducting systematic and fair
intra-category analyses on transferability, and (2) considering diverse
imperceptibility metrics and finer-grained stealthiness characteristics from
the perspective of attack traceback. To this end, we provide the first
large-scale evaluation of transferable adversarial examples on ImageNet,
involving 23 representative attacks against 9 representative defenses. Our
evaluation leads to a number of new insights, including consensus-challenging
ones: (1) Under a fair attack hyperparameter setting, one early attack method,
DI, actually outperforms all the follow-up methods. (2) A state-of-the-art
defense, DiffPure, actually gives a false sense of (white-box) security since
it is indeed largely bypassed by our (black-box) transferable attacks. (3) Even
when all attacks are bounded by the same norm, they lead to dramatically
different stealthiness performance, which negatively correlates with their
transferability performance. Overall, our work demonstrates that existing
problematic evaluations have indeed caused misleading conclusions and missing
points, and as a result, hindered the assessment of the actual progress in this
field.
Comment: Code is available at https://github.com/ZhengyuZhao/TransferAttackEva
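The core transferability measurement can be illustrated on a toy scale: craft an adversarial example on a surrogate model and test whether a different target model is fooled. Tiny linear scorers stand in for the ImageNet classifiers, and the FGSM gradient is written analytically; this is a didactic sketch, not the paper's evaluation code.

```python
import numpy as np

def fgsm(x, w, y, eps):
    """One FGSM step against a linear scorer f(x) = w @ x with label
    y in {-1, +1}: move x against the sign of the margin gradient."""
    grad = y * w                       # d(y * w @ x) / dx
    return x - eps * np.sign(grad)     # push the margin down -> attack

def transfer_success(x, y, w_src, w_tgt, eps):
    """Craft on the surrogate w_src, then check whether the *target*
    w_tgt misclassifies (margin pushed negative): a transfer success."""
    x_adv = fgsm(x, w_src, y, eps)
    return y * (w_tgt @ x_adv) < 0
```

Averaging `transfer_success` over a test set gives the transfer success rate; the paper's point is that such rates are only comparable across attacks when the perturbation budget and hyperparameters are fixed fairly, and when stealthiness is measured alongside.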
IRISA at TrecVid2015: Leveraging Multimodal LDA for Video Hyperlinking
This paper presents the runs that we submitted in the context of the TRECVid 2015 Video Hyperlinking task. The task aims at proposing a set of video segments, called targets, to complement a query video segment defined as an anchor. We used automatic transcripts and automatically extracted visual concepts as input data. Two out of four runs use cross-modal LDA as a means to jointly exploit visual and audio information in the videos. As a contrast, one run is based solely on visual information, and a combination of the cross-modal and visual runs is also considered. After presenting the approaches, we discuss the performance obtained by the respective runs, as well as some of the limitations of the evaluation process.
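The hyperlinking step can be sketched as topic-space ranking: fit LDA on the segments, embed each as a topic mixture, and rank targets by similarity to the anchor's mixture. This single-modality sketch uses transcripts only, standing in for the cross-modal variant; all names and parameters here are illustrative, not the submitted runs.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

def rank_targets(anchor, targets, n_topics=5, seed=0):
    """Embed the anchor and candidate target segments as LDA topic
    mixtures, then rank targets by dot product with the anchor."""
    vec = CountVectorizer()
    X = vec.fit_transform([anchor] + targets)   # row 0 is the anchor
    lda = LatentDirichletAllocation(n_components=n_topics,
                                    random_state=seed)
    Z = lda.fit_transform(X)                    # rows sum to ~1
    scores = Z[1:] @ Z[0]                       # target-vs-anchor similarity
    return np.argsort(-scores)                  # best target first
```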
Tag Propagation Approaches within Speaking Face Graphs for Multimodal Person Discovery
The indexing of broadcast TV archives is a current problem in multimedia research. As the size of these databases grows continuously, meaningful features are needed to describe and connect their elements efficiently, such as the identification of speaking faces. In this context, this paper focuses on two approaches for unsupervised person discovery. Initial tagging of speaking faces is provided by an OCR-based method, and these tags propagate through a graph model based on audiovisual relations between speaking faces. Two propagation methods are proposed, one based on random walks and the other on a hierarchical approach. To better evaluate their performance, these methods were compared with two graph-clustering baselines. We also study the impact of different modality fusions on the graph-based tag propagation scenario. A quantitative analysis shows that the graph propagation techniques always outperform the baselines. Among all compared strategies, the methods based on hierarchical propagation with late fusion and on random walks with score fusion obtained the highest MAP values. Finally, even though these two methods produce highly equivalent results according to the Kappa coefficient, the random-walk method performs better according to a paired t-test, while the computing time for the hierarchical propagation is more than 4 times lower than that for the random-walk propagation.
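The random-walk variant can be sketched as label propagation with restart on the speaking-face graph: OCR-tagged nodes act as clamped seeds, and tag mass diffuses along audiovisual affinities until every node can be assigned its strongest tag. This is a generic sketch under those assumptions, not the paper's exact formulation.

```python
import numpy as np

def propagate_tags(A, seeds, n_tags, alpha=0.85, iters=50):
    """Random-walk tag propagation on a speaking-face graph.
    A: (n, n) symmetric affinity matrix; seeds: {node: tag_index}
    from the OCR-based initial tagging. Returns one tag per node."""
    n = A.shape[0]
    P = A / A.sum(axis=1, keepdims=True)         # row-stochastic transitions
    F = np.zeros((n, n_tags))                    # per-node tag scores
    Y = np.zeros((n, n_tags))                    # seed indicator matrix
    for node, tag in seeds.items():
        Y[node, tag] = 1.0
    for _ in range(iters):
        F = alpha * (P @ F) + (1 - alpha) * Y    # diffuse + restart on seeds
        for node, tag in seeds.items():          # clamp the OCR seeds
            F[node] = 0.0
            F[node, tag] = 1.0
    return F.argmax(axis=1)
```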
Solving the Traffic and Flitter Challenges with Tulip
We present our visualization systems and findings for the Badge and Network Traffic as well as the Social Network and Geospatial challenges of the 2009 VAST contest. The summary starts by presenting an overview of our time series encoding of badge information and network traffic. Our findings suggest that employee 30 may be of interest. In the second part of the paper, we describe our system for finding subgraphs in the social network subject to degree constraints. Subsequently, we present our most likely candidate network, which is similar to scenario B.
Analyse vidéo de comportements humains dans les points de ventes en temps-réel (Real-time video analysis of human behavior in retail stores)
This thesis was carried out in collaboration between LaBRI (Laboratoire bordelais de recherche en informatique) and MIRANE S.A.S., the French leader in dynamic point-of-sale advertising. Our goal is to analyze human behavior in a retail store. Throughout this thesis, various subjects are studied, from the lowest to the highest level of video analysis. We first present motion detection and object tracking, which compose the low-level processing part of our system. Motion detection aims at detecting the moving areas of an image, which correspond to the foreground. The result of motion detection is a foreground mask that is used as input for the object tracking process. Tracking matches and identifies foreground regions across frames. Then, as the mid-level analysis, we analyze the behavior of the tracked objects: at each frame, we detect the current state of action of each tracked object in the scene. Finally, the high-level analysis consists of a semantic interpretation of these behaviors and the detection of high-level scenarios. These two processes analyze the series of states of each object. The semantic interpretation generates sentences when state changes occur. Scenario recognition detects three different scenarios by analyzing the temporal constraints between the states.
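The low-level stage (motion detection producing a foreground mask fed to the tracker) can be sketched as running-average background subtraction; the function name and parameters are illustrative, not the thesis's actual implementation:

```python
import numpy as np

def foreground_mask(frame, background, thresh=25, lr=0.05):
    """Threshold |frame - background| to obtain the foreground mask,
    then update the background model with a slow running average so
    it adapts to gradual lighting changes."""
    diff = np.abs(frame.astype(float) - background)
    mask = diff > thresh                              # True = foreground
    background = (1 - lr) * background + lr * frame   # slow adaptation
    return mask, background
```

The mask's connected foreground regions are what the tracking step then matches and identifies across frames.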