31 research outputs found

    Particular object retrieval with integral max-pooling of CNN activations

    Get PDF
    Recently, image representations built upon Convolutional Neural Networks (CNNs) have been shown to provide effective descriptors for image search, outperforming pre-CNN features as short-vector representations. Yet such models are not compatible with geometry-aware re-ranking methods and are still outperformed, on some particular-object retrieval benchmarks, by traditional image search systems relying on precise descriptor matching, geometric re-ranking, or query expansion. This work revisits both retrieval stages, namely initial search and re-ranking, by employing the same primitive information derived from the CNN. We build compact feature vectors that encode several image regions without the need to feed multiple inputs to the network. Furthermore, we extend integral images to handle max-pooling on convolutional layer activations, allowing us to efficiently localize matching objects. The resulting bounding box is finally used for image re-ranking. As a result, this paper significantly improves the existing CNN-based recognition pipeline: we report, for the first time, results competing with traditional methods on the challenging Oxford5k and Paris6k datasets.
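    The integral-image trick mentioned in this abstract is the standard constant-time way to pool over arbitrary rectangles. A minimal sketch on a single-channel activation map (array names and sizes are illustrative, not taken from the paper):

    ```python
    import numpy as np

    def integral_image(a):
        """Cumulative-sum table with a zero top row and left column,
        so any rectangle sum becomes four table lookups."""
        ii = np.zeros((a.shape[0] + 1, a.shape[1] + 1))
        ii[1:, 1:] = a.cumsum(axis=0).cumsum(axis=1)
        return ii

    def region_sum(ii, top, left, bottom, right):
        """Sum of a[top:bottom, left:right] in O(1)."""
        return ii[bottom, right] - ii[top, right] - ii[bottom, left] + ii[top, left]

    a = np.arange(16, dtype=float).reshape(4, 4)   # toy activation map
    ii = integral_image(a)
    assert region_sum(ii, 1, 1, 3, 3) == a[1:3, 1:3].sum()
    ```

    Integral images natively give region sums; the extension the paper describes makes region max-pooling efficient as well, which a sum table can approximate by pooling activations raised to a large power and taking the corresponding root (a generalized mean that tends to the max).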

    Towards Good Practices in Evaluating Transfer Adversarial Attacks

    Full text link
    Transfer adversarial attacks raise critical security concerns in real-world, black-box scenarios. However, the actual progress of this field is difficult to assess due to two common limitations in existing evaluations. First, different methods are often not systematically and fairly evaluated in a one-to-one comparison. Second, only transferability is evaluated but another key attack property, stealthiness, is largely overlooked. In this work, we design good practices to address these limitations, and we present the first comprehensive evaluation of transfer attacks, covering 23 representative attacks against 9 defenses on ImageNet. In particular, we propose to categorize existing attacks into five categories, which enables our systematic category-wise analyses. These analyses lead to new findings that even challenge existing knowledge and also help determine the optimal attack hyperparameters for our attack-wise comprehensive evaluation. We also pay particular attention to stealthiness, by adopting diverse imperceptibility metrics and looking into new, finer-grained characteristics. Overall, our new insights into transferability and stealthiness lead to actionable good practices for future evaluations. Comment: An extended version can be found at arXiv:2310.11850. Code and a list of categorized attacks are available at https://github.com/ZhengyuZhao/TransferAttackEva

    Tampering detection and localization in images from social networks : A CBIR approach

    Get PDF
    International audience. Step 1: content-based image retrieval system. Goal: find the best image in our database for a comparison with the query. Method: 1. Find the 10 most similar images with a dot product, using the descriptors explained below; retrieval is accelerated with a KD-tree approach. 2. Reorder these 10 candidates to find the best one, by estimating a homography between the query image and each candidate and keeping the best homography. The descriptors are based on VGG19 [1]. Two input sizes are used: the training image size, or a size based on a kernelization step as in [2]. Three vectors are analyzed, based on the outputs of three layers: the last convolutional layer C5, with an output length of 512, and the fully-connected layers C6 and C7, each with an output length of 4096. Mean or max pooling has to be applied to C5 when the standard training image size is used, and to all three outputs in the kernelized approach.
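    The first retrieval step above (dot-product scoring, then keeping the 10 best candidates) can be sketched as follows. Names and sizes are illustrative, the KD-tree acceleration and the homography-based re-ordering are omitted:

    ```python
    import numpy as np

    def top_candidates(query, database, k=10):
        """Rank database descriptors by dot product with the query
        and return the indices of the k best matches."""
        scores = database @ query            # one dot product per image
        return np.argsort(-scores)[:k]

    rng = np.random.default_rng(0)
    db = rng.normal(size=(100, 512))         # e.g. pooled C5 descriptors
    db /= np.linalg.norm(db, axis=1, keepdims=True)
    q = db[42] + 0.01 * rng.normal(size=512) # a near-duplicate query
    q /= np.linalg.norm(q)
    assert top_candidates(q, db)[0] == 42
    ```

    With L2-normalized descriptors, the dot product is cosine similarity, which is why a single matrix-vector product suffices for the coarse ranking before geometric verification.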

    Complex Document Classification and Localization Application on Identity Document Images

    Get PDF
    International audience. This paper studies the problem of document image classification. More specifically, we address the classification of documents containing little textual information and a complex background (such as identity documents). Unlike most existing systems, the proposed approach simultaneously locates the document and recognizes its class. The latter is defined by the document nature (passport, ID, etc.), issuing country, version, and the visible side (main or back). This task is very challenging due to unconstrained capturing conditions, sparse textual information, and varying components that are irrelevant to the classification, e.g. photo, names, address, etc. First, a base of document models is created from reference images. We show that training images are not necessary and only one reference image is enough to create a document model. Then, the query image is matched against all models in the base. Unknown documents are rejected using a quality estimate based on the extracted document. The matching process is optimized to guarantee an execution time independent of the number of document models. Once the document model is found, a more accurate matching is performed to locate the document and facilitate information extraction. Our system is evaluated on several datasets with up to 3042 real documents (representing 64 classes), achieving an accuracy of 96.6%.
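    The match-against-all-models step with rejection of unknown documents can be sketched as below. This is a naive nearest-model classifier under assumed descriptor names and an illustrative threshold; the paper's optimization that makes matching time independent of the number of models is not reproduced here:

    ```python
    import numpy as np

    def classify_document(query, models, reject_below=0.8):
        """Match a query descriptor against every document-model
        descriptor; reject the document as unknown when even the
        best match scores below a quality threshold."""
        scores = models @ query
        best = int(np.argmax(scores))
        return best if scores[best] >= reject_below else None

    models = np.eye(3)                           # three toy document models
    assert classify_document(np.array([0.0, 1.0, 0.0]), models) == 1
    assert classify_document(np.array([0.5, 0.5, 0.5]), models) is None
    ```

    The rejection threshold plays the role of the paper's estimated quality: a query that matches no model well enough is reported as unknown rather than forced into a class.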

    Revisiting Transferable Adversarial Image Examples: Attack Categorization, Evaluation Guidelines, and New Insights

    Full text link
    Transferable adversarial examples raise critical security concerns in real-world, black-box attack scenarios. However, in this work, we identify two main problems in common evaluation practices: (1) For attack transferability, lack of systematic, one-to-one attack comparison and fair hyperparameter settings. (2) For attack stealthiness, simply no comparisons. To address these problems, we establish new evaluation guidelines by (1) proposing a novel attack categorization strategy and conducting systematic and fair intra-category analyses on transferability, and (2) considering diverse imperceptibility metrics and finer-grained stealthiness characteristics from the perspective of attack traceback. To this end, we provide the first large-scale evaluation of transferable adversarial examples on ImageNet, involving 23 representative attacks against 9 representative defenses. Our evaluation leads to a number of new insights, including consensus-challenging ones: (1) Under a fair attack hyperparameter setting, one early attack method, DI, actually outperforms all the follow-up methods. (2) A state-of-the-art defense, DiffPure, actually gives a false sense of (white-box) security since it is indeed largely bypassed by our (black-box) transferable attacks. (3) Even when all attacks are bounded by the same L_p norm, they lead to dramatically different stealthiness performance, which negatively correlates with their transferability performance. Overall, our work demonstrates that existing problematic evaluations have indeed caused misleading conclusions and missing points, and as a result, hindered the assessment of the actual progress in this field. Comment: Code is available at https://github.com/ZhengyuZhao/TransferAttackEva
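    The L_p bound that all compared attacks share is typically enforced by projecting the perturbed image back into a norm ball around the original. A minimal sketch for the common L_inf case; the radius 8/255 is a conventional choice, not a value stated in this abstract:

    ```python
    import numpy as np

    def project_linf(x_adv, x_orig, eps=8/255):
        """Project an adversarial image back into the L_inf ball of
        radius eps around the original, then into the valid [0, 1]
        pixel range."""
        x = np.clip(x_adv, x_orig - eps, x_orig + eps)
        return np.clip(x, 0.0, 1.0)

    x = np.full(4, 0.5)
    adv = x + np.array([0.1, -0.1, 0.01, 0.0])   # an oversized perturbation
    p = project_linf(adv, x)
    assert np.max(np.abs(p - x)) <= 8/255 + 1e-12
    ```

    The paper's point (3) is that this shared bound does not equalize stealthiness: two attacks with identical L_inf budgets can still differ dramatically under imperceptibility metrics.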

    IRISA at TrecVid2015: Leveraging Multimodal LDA for Video Hyperlinking

    Get PDF
    International audience. This paper presents the runs that we submitted in the context of the TRECVid 2015 Video Hyperlinking task. The task aims at proposing a set of video segments, called targets, to complement a query video segment defined as an anchor. We used automatic transcripts and automatically extracted visual concepts as input data. Two out of four runs use cross-modal LDA as a means to jointly make use of visual and audio information in the videos. As a contrast, one is based solely on visual information, and a combination of the cross-modal and visual runs is also considered. After presenting the approaches, we discuss the performance obtained by the respective runs, as well as some of the limitations of the evaluation process.

    Tag Propagation Approaches within Speaking Face Graphs for Multimodal Person Discovery

    Get PDF
    International audience. The indexing of broadcast TV archives is a current problem in multimedia research. As the size of these databases grows continuously, meaningful features are needed to describe and connect their elements efficiently, such as the identification of speaking faces. In this context, this paper focuses on two approaches for unsupervised person discovery. Initial tagging of speaking faces is provided by an OCR-based method, and these tags propagate through a graph model based on audiovisual relations between speaking faces. Two propagation methods are proposed, one based on random walks and the other on a hierarchical approach. To better evaluate their performance, these methods were compared with two graph clustering baselines. We also study the impact of different modality fusions on the graph-based tag propagation scenario. From a quantitative analysis, we observed that the graph propagation techniques always outperform the baselines. Among all compared strategies, the methods based on hierarchical propagation with late fusion and on random walks with score fusion obtained the highest MAP values. Finally, even though these two methods produce highly equivalent results according to the Kappa coefficient, the random walk method performs better according to a paired t-test, and the computing time for hierarchical propagation is more than 4 times lower than that for random walk propagation.
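    Random-walk tag propagation of the kind described above can be sketched on a toy speaking-face graph. This is a generic random-walk-with-restart formulation under assumed names, not the paper's exact algorithm:

    ```python
    import numpy as np

    def propagate_tags(adj, seed_tags, alpha=0.85, iters=50):
        """Random-walk tag propagation: nodes repeatedly mix their
        neighbours' tag distributions, while the restart term keeps
        pulling the walk back to the initial (e.g. OCR-provided) tags."""
        deg = adj.sum(axis=1, keepdims=True)
        P = adj / np.where(deg == 0, 1, deg)     # row-stochastic transitions
        scores = seed_tags.astype(float).copy()
        for _ in range(iters):
            scores = alpha * P @ scores + (1 - alpha) * seed_tags
        return scores.argmax(axis=1)             # most likely tag per node

    # Toy graph: two triangles joined by one edge; one seeded node each.
    adj = np.array([[0, 1, 1, 0, 0, 0],
                    [1, 0, 1, 0, 0, 0],
                    [1, 1, 0, 1, 0, 0],
                    [0, 0, 1, 0, 1, 1],
                    [0, 0, 0, 1, 0, 1],
                    [0, 0, 0, 1, 1, 0]], dtype=float)
    seeds = np.zeros((6, 2))
    seeds[0, 0] = 1.0                            # node 0 tagged person A
    seeds[5, 1] = 1.0                            # node 5 tagged person B
    assert propagate_tags(adj, seeds).tolist() == [0, 0, 0, 1, 1, 1]
    ```

    Unseeded nodes inherit the tag of the seed they are best connected to, which is the intuition behind propagating OCR tags across the audiovisual graph.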

    Solving the Traffic and Flitter Challenges with Tulip

    Get PDF
    International audience. We present our visualization systems and findings for the Badge and Network Traffic as well as the Social Network and Geospatial challenges of the 2009 VAST contest. The summary starts by presenting an overview of our time-series encoding of badge information and network traffic. Our findings suggest that employee 30 may be of interest. In the second part of the paper, we describe our system for finding subgraphs in the social network subject to degree constraints. Subsequently, we present our most likely candidate network, which is similar to scenario B.

    Analyse vidéo de comportements humains dans les points de ventes en temps-réel

    No full text
    This thesis was carried out in collaboration between LaBRI (Laboratoire bordelais de recherche en informatique) and MIRANE S.A.S., the French leader in dynamic point-of-sale advertising. Our goal is to analyze human behavior in retail stores. Throughout this thesis, we present a video analysis system composed of several processes at different levels, from the lowest to the highest level of video analysis. We first present motion detection and object tracking, which compose the low-level part of our system. Motion detection aims at detecting the moving areas of an image, which correspond to the foreground; its result is a foreground mask that is used as input to the object tracking process, which matches and identifies foreground regions across frames. Then, as the mid-level analysis, we analyze the behavior of the tracked objects: at each frame, we detect the current action state of each tracked object in the scene. Finally, the high-level analysis consists of a semantic interpretation of these behaviors and the detection of high-level scenarios. These two processes analyze the series of states of each object: the semantic interpretation generates sentences when state changes occur, and scenario recognition detects three different scenarios by analyzing the temporal constraints between the states.
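    The low-level step described above (a foreground mask produced by motion detection) can be sketched with simple background subtraction. This is a generic illustration under an assumed static background model, not the thesis's specific detector:

    ```python
    import numpy as np

    def foreground_mask(frame, background, threshold=25):
        """Low-level motion detection: pixels whose absolute difference
        from the background model exceeds a threshold are foreground."""
        diff = np.abs(frame.astype(int) - background.astype(int))
        return diff > threshold

    bg = np.full((4, 4), 100, dtype=np.uint8)    # static background model
    frame = bg.copy()
    frame[1:3, 1:3] = 200                        # a moving object appears
    mask = foreground_mask(frame, bg)
    assert mask.sum() == 4 and mask[1, 1]
    ```

    The resulting boolean mask is exactly the kind of input the tracking stage consumes: connected foreground regions are matched and identified from frame to frame.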
