64 research outputs found

    TUCH: Turning Cross-view Hashing into Single-view Hashing via Generative Adversarial Nets

    Cross-view retrieval, which focuses on searching for images in response to text queries or vice versa, has received increasing attention recently. Cross-view hashing aims to solve the cross-view retrieval problem efficiently with binary hash codes. Most existing works on cross-view hashing exploit multi-view embedding methods to tackle this problem, which inevitably causes information loss in both the image and text domains. Inspired by Generative Adversarial Nets (GANs), this paper presents a new model that is able to Turn Cross-view Hashing into single-view hashing (TUCH), thus enabling image information to be preserved as much as possible. TUCH is a novel deep architecture that integrates a language model network T for text feature extraction, a generator network G that generates fake images from text features, and a hashing network H that learns hashing functions to generate compact binary codes. Our architecture effectively unifies joint generative adversarial learning and cross-view hashing. Extensive empirical evidence shows that our TUCH approach achieves state-of-the-art results, especially on text-to-image retrieval, on image-sentence datasets, i.e. the standard IAPRTC-12 and the large-scale Microsoft COCO.
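
    As a rough, hypothetical sketch (not the paper's implementation) of how the three networks named above could be wired, the PyTorch skeleton below defines a text network T, a generator G that maps text features to a fake image, and a hashing network H that maps images to relaxed binary codes; all layer choices, feature sizes, and the 32x32 resolution are illustrative assumptions.

```python
# Hypothetical PyTorch skeleton of the T / G / H decomposition described above.
import torch
import torch.nn as nn

class TextNet(nn.Module):                      # T: text feature extraction
    def __init__(self, vocab_size=10000, feat_dim=512):
        super().__init__()
        self.embed = nn.EmbeddingBag(vocab_size, feat_dim)

    def forward(self, tokens, offsets):
        return self.embed(tokens, offsets)     # one feature vector per caption

class Generator(nn.Module):                    # G: text features -> fake image
    def __init__(self, feat_dim=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, 128 * 8 * 8), nn.ReLU(),
            nn.Unflatten(1, (128, 8, 8)),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1), nn.Tanh(),
        )

    def forward(self, text_feat):
        return self.net(text_feat)             # 3x32x32 generated image

class HashNet(nn.Module):                      # H: image -> compact code
    def __init__(self, code_bits=64):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, code_bits), nn.Tanh(),   # relaxed codes in (-1, 1)
        )

    def forward(self, image):
        return self.cnn(image)                 # binarise with torch.sign at retrieval
```

    Hashing both the real images and the images G produces from text features is what reduces the cross-view problem to a single (image) view, as described in the abstract.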

    Image-based Authentication

    Mobile and wearable devices are popular platforms for accessing online services. However, the small form factor of such devices makes a secure and practical user authentication experience challenging. Further, online fraud, including phishing attacks, has revealed the importance of conversely providing solutions for usable authentication of remote services to online users. In this thesis, we introduce image-based solutions for mutual authentication between a user and a remote service provider. First, we propose and develop Pixie, a two-factor, object-based authentication solution for camera-equipped mobile and wearable devices. We further design ai.lock, a system that reliably extracts authentication credentials similar to biometrics from images. Second, we introduce CEAL, a system that generates visual key fingerprint representations of arbitrary binary strings, to be used to visually authenticate online entities and their cryptographic keys. CEAL leverages deep learning to capture the target style and domain of training images into a generator model learned from a large collection of sample images rather than hand-curated as a collection of rules, and hence provides a unique capacity for easy customizability. CEAL integrates a model of the visual discriminative ability of human perception, so the resulting fingerprint image generator avoids mapping distinct keys to images that are not distinguishable by humans. Further, CEAL deterministically generates visually pleasing fingerprint images from an input vector whose components are designated to represent visual properties that are either readily perceptible to the human eye, or imperceptible yet necessary for accurately modeling the target image domain. We show that image-based authentication using Pixie is usable and fast, while ai.lock extracts authentication credentials that exceed the entropy of biometrics. Further, we show that CEAL outperforms the state-of-the-art solution in terms of efficiency, usability, and resilience to powerful adversarial attacks.
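
    To make the deterministic generation step concrete, here is a minimal, hypothetical sketch of a key-to-vector mapping: a binary string is expanded into a fixed-length input vector whose components a trained generator would interpret as visual properties. The SHA-256 expansion, the 128-dimensional output, and the scaling are assumptions for illustration only, not CEAL's actual construction.

```python
# Hypothetical sketch: deterministically expand a binary key into the input
# vector of an image generator.
import hashlib
import numpy as np

def key_to_generator_input(key: bytes, dim: int = 128) -> np.ndarray:
    """Map a key to a fixed-length vector with components in [-1, 1]."""
    chunks = []
    counter = 0
    while len(chunks) * 32 < dim:              # each SHA-256 digest is 32 bytes
        digest = hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        chunks.append(np.frombuffer(digest, dtype=np.uint8))
        counter += 1
    raw = np.concatenate(chunks)[:dim].astype(np.float32)
    return raw / 255.0 * 2.0 - 1.0             # scale byte values into [-1, 1]

vec = key_to_generator_input(b"example key fingerprint")
print(vec.shape)                               # (128,): fed to a trained generator
```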

    User-centered design and evaluation of interactive segmentation methods for medical images

    Segmentation of medical images is a challenging task that aims to identify a particular structure present in the image. Among the existing methods involving the user at different levels, from a fully-manual to a fully-automated task, interactive segmentation methods provide assistance to the user during the task to reduce the variability in the results and allow occasional corrections of segmentation failures. Therefore, they offer a compromise between segmentation efficiency and the accuracy of the results. It is the user who judges whether the results are satisfactory and how to correct them during the segmentation, making the process subject to human factors. Despite the strong influence of the user on the outcomes of a segmentation task, the impact of such factors has received little attention, with the literature focusing its assessment of segmentation processes on computational performance. Yet, involving the user's performance in the analysis is more representative of a realistic scenario. Our goal is to explore user behaviour in order to improve the efficiency of interactive image segmentation processes. This is achieved through three contributions. First, we developed a method based on a new user interaction mechanism that provides hints as to where to concentrate the computations. This significantly improves computation efficiency without sacrificing the quality of the segmentation. The benefits of using such hints are twofold: (i) because our contribution is based on user interaction, it generalizes to a wide range of segmentation methods, and (ii) it gives comprehensive indications about where to focus the segmentation search. The latter advantage is used to achieve the second contribution. We developed an automated method based on a multi-scale strategy to (i) reduce the user's workload and (ii) improve the computational time by up to tenfold, allowing real-time segmentation feedback. Third, we investigated the effects of such improvements in computation on the user's performance. We report an experiment that manipulates the delay induced by the computation time while performing an interactive segmentation task. Results reveal that the influence of this delay can be significantly reduced with an appropriate interaction mechanism design. In conclusion, this project provides an effective image segmentation solution that has been developed in compliance with user performance requirements. We validated our approach through multiple user studies that provided a step forward in understanding user behaviour during interactive image segmentation.
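
    As a minimal sketch of the hint idea (assuming SciPy and an arbitrary band radius, not the method actually developed in the thesis), user strokes can be dilated into a region of interest so that the segmentation computation is restricted to pixels near the hints:

```python
# Minimal sketch: dilate user hint pixels into a region where segmentation
# computation is focused, instead of processing the whole image.
import numpy as np
from scipy.ndimage import binary_dilation

def roi_from_hints(hint_mask: np.ndarray, radius: int = 25) -> np.ndarray:
    """Dilate user hint pixels into a region of interest."""
    structure = np.ones((2 * radius + 1, 2 * radius + 1), dtype=bool)
    return binary_dilation(hint_mask, structure=structure)

hints = np.zeros((256, 256), dtype=bool)
hints[100:105, 120:125] = True            # a small user scribble
roi = roi_from_hints(hints)
print(roi.sum(), "pixels would be processed instead of", hints.size)
```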

    Segmentation of Color and Multispectral Skin Images

    Accurate border delineation of pigmented skin lesion (PSL) images is a vital first step in computer-aided diagnosis (CAD) of melanoma. This thesis presents a novel approach to automatic PSL border detection on color and multispectral skin images. We first introduce the concept of energy minimization by graph cuts in terms of maximum a posteriori estimation of a Markov random field (the MAP-MRF framework). After a brief state of the art of interactive graph-cut-based segmentation methods, we study the influence of the parameters of the segmentation algorithm on color images. Under this framework, we propose an energy function based on efficient classifiers (support vector machines and random forests) and a feature vector calculated on a local neighborhood. For the segmentation of melanoma, we estimate concentration maps of skin chromophores, which are discriminating indices of melanomas, from color and multispectral images, and integrate these features into the vector. Finally, we detail a global framework for automatic segmentation of melanoma, which comprises two main stages: automatic selection of the "seeds" useful for graph cuts and the selection of discriminating features. This tool compares favorably to classic graph-cut-based segmentation methods in terms of accuracy and robustness.
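
    A hypothetical sketch of such a classifier-driven graph-cut energy is shown below, using scikit-learn for the random forest and the PyMaxflow package for the min-cut; the smoothness weight, forest size, and feature handling are illustrative assumptions rather than the thesis's actual parameters.

```python
# Hypothetical sketch of classifier-driven graph-cut segmentation: unary costs
# come from a random forest trained on user seed pixels, pairwise costs are a
# constant 4-connected smoothness term.
import numpy as np
import maxflow                                    # PyMaxflow package
from sklearn.ensemble import RandomForestClassifier

def segment(features, seeds_fg, seeds_bg, shape, smoothness=2.0):
    """features: (H*W, D) per-pixel feature vectors; seeds_*: pixel indices."""
    clf = RandomForestClassifier(n_estimators=50)
    X = np.vstack([features[seeds_fg], features[seeds_bg]])
    y = np.hstack([np.ones(len(seeds_fg)), np.zeros(len(seeds_bg))])
    clf.fit(X, y)
    prob_fg = clf.predict_proba(features)[:, 1].reshape(shape)

    eps = 1e-6
    g = maxflow.Graph[float]()
    nodes = g.add_grid_nodes(shape)
    g.add_grid_edges(nodes, smoothness)           # pairwise smoothness term
    g.add_grid_tedges(nodes,                      # unary terms (negative log-likelihoods)
                      -np.log(prob_fg + eps),
                      -np.log(1.0 - prob_fg + eps))
    g.maxflow()
    return g.get_grid_segments(nodes)             # boolean label map
```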

    Automatic Image Captioning with Style

    This thesis connects two core topics in machine learning, vision and language. The problem of choice is image caption generation: automatically constructing natural language descriptions of image content. Previous research into image caption generation has focused on generating purely descriptive captions; I focus on generating visually relevant captions with a distinct linguistic style. Captions with style have the potential to ease communication and add a new layer of personalisation. First, I consider naming variations in image captions, and propose a method for predicting context-dependent names that takes into account visual and linguistic information. This method makes use of a large-scale image caption dataset, which I also use to explore naming conventions and report naming conventions for hundreds of animal classes. Next, I propose the SentiCap model, which relies on recent advances in artificial neural networks to generate visually relevant image captions with positive or negative sentiment. To balance descriptiveness and sentiment, the SentiCap model dynamically switches between two recurrent neural networks, one tuned for descriptive words and one for sentiment words. As the first published model for generating captions with sentiment, SentiCap has influenced a number of subsequent works. I then investigate the sub-task of modelling styled sentences without images. The specific task chosen is sentence simplification: rewriting news article sentences to make them easier to understand. For this task I design a neural sequence-to-sequence model that can work with limited training data, using novel adaptations for word copying and sharing word embeddings. Finally, I present SemStyle, a system for generating visually relevant image captions in the style of an arbitrary text corpus. A shared term space allows a neural network for vision and content planning to communicate with a network for styled language generation. SemStyle achieves competitive results in human and automatic evaluations of descriptiveness and style. As a whole, this thesis presents two complete systems for styled caption generation that are the first of their kind and demonstrate, for the first time, that automatic style transfer for image captions is achievable. Contributions also include novel ideas for object naming and sentence simplification. This thesis opens up inquiries into highly personalised image captions; large-scale visually grounded concept naming; and, more generally, styled text generation with content control.
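
    The word-level switching mechanism can be pictured with the following hypothetical sketch (the GRU cells, shapes, and sigmoid gate are illustrative assumptions, not the published SentiCap code): at each decoding step a gate mixes the next-word distributions of a descriptive stream and a sentiment stream.

```python
# Illustrative sketch of switching between a descriptive RNN and a sentiment
# RNN via a learned gate over their next-word distributions.
import torch
import torch.nn as nn

class SwitchingDecoder(nn.Module):
    def __init__(self, vocab=10000, dim=512):
        super().__init__()
        self.desc_rnn = nn.GRUCell(dim, dim)    # tuned on descriptive captions
        self.sent_rnn = nn.GRUCell(dim, dim)    # tuned on sentiment captions
        self.desc_out = nn.Linear(dim, vocab)
        self.sent_out = nn.Linear(dim, vocab)
        self.switch = nn.Linear(2 * dim, 1)     # probability of the sentiment stream

    def step(self, word_emb, h_desc, h_sent):
        h_desc = self.desc_rnn(word_emb, h_desc)
        h_sent = self.sent_rnn(word_emb, h_sent)
        gate = torch.sigmoid(self.switch(torch.cat([h_desc, h_sent], dim=-1)))
        p_desc = torch.softmax(self.desc_out(h_desc), dim=-1)
        p_sent = torch.softmax(self.sent_out(h_sent), dim=-1)
        p_word = (1 - gate) * p_desc + gate * p_sent   # mixture over the vocabulary
        return p_word, h_desc, h_sent
```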

    Video Motion: Finding Complete Motion Paths for Every Visible Point

    The problem of understanding motion in video has been an area of intense research in computer vision for decades. The traditional approach is to represent motion using optical flow fields, which describe the two-dimensional instantaneous velocity at every pixel in every frame. We present a new approach to describing motion in video in which each visible world point is associated with a sequence-length video motion path. A video motion path lists the location where a world point would appear if it were visible in every frame of the sequence. Each motion path is coupled with a vector of binary visibility flags for the associated point that identify the frames in which the tracked point is unoccluded.

    We represent paths for all visible points in a particular sequence using a single linear subspace. The key insight we exploit is that, for many sequences, this subspace is low-dimensional, scaling with the complexity of the deformations and the number of independent objects in the scene, rather than the number of frames in the sequence. Restricting all paths to lie within a single motion subspace provides strong regularization that allows us to extend paths through brief occlusions, relying on evidence from the visible frames to hallucinate the unseen locations.

    This thesis presents our mathematical model of video motion. We define a path objective function that optimizes a set of paths given estimates of visible intervals, under the assumption that motion is generally spatially smooth and that the appearance of a tracked point remains constant over time. We estimate visibility based on global properties of all paths, enforcing the physical requirement that at least one tracked point must be visible at every pixel in the video. The model assumes the existence of an appropriate path motion basis; we find a sequence-specific basis through analysis of point tracks from a frame-to-frame tracker. Tracking failures caused by image noise, non-rigid deformations, or occlusions complicate the problem by introducing missing data. We update standard trackers to aggressively reinitialize points lost in earlier frames. Finally, we improve on standard Principal Component Analysis with missing data by introducing a novel compaction step that associates these relocalized points, reducing the amount of missing data that must be overcome. The full system achieves state-of-the-art results, recovering dense, accurate, long-range point correspondences in the face of significant occlusions.
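
    The subspace idea can be illustrated with a small, hypothetical sketch: if every path is a linear combination of a few basis paths, then coefficients fit on the visible frames reconstruct the point's location in occluded frames as well. The basis, dimensions, and synthetic data below are assumptions for illustration, not the thesis's actual estimation procedure.

```python
# Sketch: complete an occluded point track from a low-dimensional path basis.
import numpy as np

def complete_path(basis, observed_xy, visible):
    """basis: (2F, K) path basis; observed_xy: (2F,) with NaNs where occluded;
    visible: boolean (2F,) mask. Returns the full-length reconstructed path."""
    B_vis = basis[visible]                       # rows for visible coordinates
    coeffs, *_ = np.linalg.lstsq(B_vis, observed_xy[visible], rcond=None)
    return basis @ coeffs                        # locations in every frame

F, K = 100, 10                                   # frames, subspace dimension
rng = np.random.default_rng(0)
basis = rng.standard_normal((2 * F, K))
true_path = basis @ rng.standard_normal(K)
vis_frames = rng.random(F) > 0.3                 # frame-level visibility flags
visible = np.repeat(vis_frames, 2)               # applies to both x and y rows
obs = np.where(visible, true_path, np.nan)
print(np.allclose(complete_path(basis, obs, visible), true_path))  # True
```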

    Proceedings of the Detection and Classification of Acoustic Scenes and Events 2016 Workshop (DCASE2016)
