11 research outputs found

    Learning Features by Watching Objects Move

    This paper presents a novel yet intuitive approach to unsupervised feature learning. Inspired by the human visual system, we explore whether low-level motion-based grouping cues can be used to learn an effective visual representation. Specifically, we use unsupervised motion-based segmentation on videos to obtain segments, which we use as 'pseudo ground truth' to train a convolutional network to segment objects from a single frame. Given the extensive evidence that motion plays a key role in the development of the human visual system, we hope that this straightforward approach to unsupervised learning will be more effective than the cleverly designed 'pretext' tasks studied in the literature. Indeed, our extensive experiments show that this is the case. When used for transfer learning on object detection, our representation significantly outperforms previous unsupervised approaches across multiple settings, especially when training data for the target task is scarce. (Comment: CVPR 2017)
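
    As a sketch of the training setup this abstract describes, the following toy PyTorch loop treats motion-derived masks as per-pixel labels for a single-frame segmentation network. The architecture, data, and loss below are illustrative stand-ins (random tensors in place of video frames and motion segments), not the paper's actual configuration.

        # Minimal sketch: train a segmentation net on motion-based pseudo ground truth.
        # Frames and masks are random stand-ins; in the paper the masks come from
        # unsupervised motion segmentation on video, which is not reproduced here.
        import torch
        import torch.nn as nn

        class TinySegNet(nn.Module):
            """Deliberately small fully convolutional stand-in for the paper's CNN."""
            def __init__(self):
                super().__init__()
                self.body = nn.Sequential(
                    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                    nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
                    nn.Conv2d(16, 1, 1),  # per-pixel foreground logit
                )

            def forward(self, x):
                return self.body(x)

        net = TinySegNet()
        opt = torch.optim.Adam(net.parameters(), lr=1e-3)
        loss_fn = nn.BCEWithLogitsLoss()

        for step in range(3):
            frames = torch.rand(4, 3, 64, 64)                         # single frames
            pseudo_masks = (torch.rand(4, 1, 64, 64) > 0.5).float()   # motion segments
            loss = loss_fn(net(frames), pseudo_masks)  # segments act as labels
            opt.zero_grad()
            loss.backward()
            opt.step()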

    Convex optimization method for quantifying image quality induced saliency variation

    Visual saliency plays a significant role in image quality assessment. Image distortions cause saliency to shift from its original places. Being able to measure such distortion-induced saliency variation (DSV) contributes towards the optimal use of saliency in automated image quality assessment. In our previous study, a benchmark for the measurement of DSV through subjective testing was built. However, existing saliency similarity measures are unhelpful for the quantification of DSV because DSV depends strongly on the dispersion degree of a saliency map. In this paper, we propose a novel similarity metric for the measurement of DSV, namely MDSV, based on a convex optimization method. The proposed MDSV metric integrates a local saliency similarity measure and a global saliency similarity measure, using a function of saliency dispersion as a modulator. We detail the parameter selection of the proposed metric and the interactions of sub-models for the convex optimization strategy. Statistical analyses show that our proposed MDSV outperforms existing metrics in quantifying image quality induced saliency variation.
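
    The abstract specifies the metric's structure (a dispersion-modulated combination of a local and a global similarity term) but not its exact form. The sketch below illustrates that structure with standard stand-ins: Pearson correlation as the global term, a histogram-intersection-style overlap as the local term, and a logistic function of spatial dispersion as the modulator. All functional forms and constants (d0, k) are assumptions for illustration, not the paper's definitions.

        import numpy as np

        def saliency_dispersion(s):
            # Spatial spread: mean distance of saliency mass from its centroid.
            # One plausible dispersion measure; the paper defines its own.
            s = s / (s.sum() + 1e-12)
            ys, xs = np.indices(s.shape)
            cy, cx = (s * ys).sum(), (s * xs).sum()
            return (s * np.hypot(ys - cy, xs - cx)).sum()

        def global_sim(ref, dist):
            # Global term: Pearson linear correlation between the two maps.
            return np.corrcoef(ref.ravel(), dist.ravel())[0, 1]

        def local_sim(ref, dist):
            # Local term: pixelwise-minimum overlap of the normalized maps.
            r = ref / (ref.sum() + 1e-12)
            d = dist / (dist.sum() + 1e-12)
            return np.minimum(r, d).sum()

        def mdsv_like(ref, dist, d0=15.0, k=0.3):
            # Dispersion-modulated convex combination of the two terms.
            w = 1.0 / (1.0 + np.exp(-k * (saliency_dispersion(dist) - d0)))
            return w * global_sim(ref, dist) + (1.0 - w) * local_sim(ref, dist)

        ref, dist = np.random.rand(60, 80), np.random.rand(60, 80)
        print(mdsv_like(ref, dist))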

    Visual saliency computation for image analysis

    Visual saliency computation is about detecting and understanding salient regions and elements in a visual scene. Algorithms for visual saliency computation can give clues to where people will look in images, what objects are visually prominent in a scene, etc. Such algorithms could be useful in a wide range of applications in computer vision and graphics. In this thesis, we study the following visual saliency computation problems.
    1) Eye Fixation Prediction. Eye fixation prediction aims to predict where people look in a visual scene. For this problem, we propose a Boolean Map Saliency (BMS) model which leverages the global surroundedness cue using a Boolean map representation. We draw a theoretical connection between BMS and the Minimum Barrier Distance (MBD) transform to provide insight into our algorithm. Experimental results show that BMS compares favorably with state-of-the-art methods on seven benchmark datasets.
    2) Salient Region Detection. Salient region detection entails computing a saliency map that highlights the regions of dominant objects in a scene. We propose a salient region detection method based on the MBD transform. We present a fast approximate MBD transform algorithm with an error bound analysis. Powered by this fast MBD transform algorithm, our method can run at about 80 FPS and achieve state-of-the-art performance on four benchmark datasets.
    3) Salient Object Detection. Salient object detection aims to localize each salient object instance in an image. We propose a method using a Convolutional Neural Network (CNN) model for proposal generation and a novel subset optimization formulation for bounding box filtering. In experiments, our subset optimization formulation consistently outperforms heuristic bounding box filtering baselines, such as non-maximum suppression, and our method substantially outperforms previous methods on three challenging datasets.
    4) Salient Object Subitizing. We propose a new visual saliency computation task, called Salient Object Subitizing, which is to predict the existence and the number of salient objects in an image using holistic cues. To this end, we present a dataset of about 14K everyday images annotated using an online crowdsourcing marketplace. We show that an end-to-end trained CNN subitizing model can achieve promising performance without requiring any localization process. We also propose a method to further improve the training of the CNN subitizing model by leveraging synthetic images.
    5) Top-down Saliency Detection. Unlike the aforementioned tasks, top-down saliency detection entails generating task-specific saliency maps. We propose a weakly supervised top-down saliency detection approach that models the top-down attention of a CNN image classifier. We propose Excitation Backprop and the concept of contrastive attention to generate highly discriminative top-down saliency maps. Our method achieves superior performance in weakly supervised localization tasks on challenging datasets, and its usefulness is further validated in the text-to-region association task, where it provides state-of-the-art performance using only weakly labeled web images for training.
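
    As a concrete illustration of the surroundedness cue behind BMS (item 1 above), the sketch below thresholds a grayscale image at several levels and, for each resulting Boolean map and its complement, keeps the connected regions that do not touch the image border, then averages the maps. This is a crude single-channel caricature of the published algorithm, which operates on color channels and includes dilation and normalization steps not shown here.

        import numpy as np
        from scipy import ndimage

        def bms_like(gray, n_thresh=8):
            """Toy variant of the Boolean Map Saliency idea: average the
            'surrounded' (border-disconnected) regions over many thresholds."""
            acc = np.zeros_like(gray, dtype=float)
            for t in np.linspace(gray.min(), gray.max(), n_thresh + 2)[1:-1]:
                for bmap in (gray > t, gray <= t):  # a Boolean map and its complement
                    labels, _ = ndimage.label(bmap)
                    # Labels of regions touching any image border are not surrounded.
                    border = np.unique(np.concatenate([labels[0], labels[-1],
                                                       labels[:, 0], labels[:, -1]]))
                    acc += bmap & ~np.isin(labels, border)
            return acc / acc.max() if acc.max() > 0 else acc

        saliency = bms_like(np.random.rand(64, 64))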

    Gaze-Based Human-Robot Interaction by the Brunswick Model

    We present a new paradigm for human-robot interaction based on social signal processing, and in particular on the Brunswick model. Originally, the Brunswick model deals with face-to-face dyadic interaction, assuming that the interactants communicate through a continuous exchange of non-verbal social signals in addition to the spoken messages. Social signals have to be interpreted through a proper recognition phase that considers visual and audio information. The Brunswick model allows the quality of the interaction to be evaluated quantitatively, using statistical tools that measure how effective the recognition phase is. In this paper we recast this theory for the case in which one of the interactants is a robot; here, the recognition phases performed by the robot and the human have to be revised with respect to the original model. The model is applied to Berrick, a recent open-source, low-cost robotic head platform, where gaze is the social signal to be considered.
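
    The "statistical tools" mentioned above are, in lens-model terms, correlations along the chain from an internal attribute, through an externalized cue, to the perceiver's judgment. The toy sketch below computes those correlations on synthetic data; the variable names, the gaze example, and the linear generative model are illustrative assumptions, not the paper's evaluation protocol.

        import numpy as np

        # Toy lens-model bookkeeping: how well an attribute is externalized
        # into a measurable cue, and how well the perceiver (here, the
        # robot's recognizer) picks it up. Synthetic data throughout.
        rng = np.random.default_rng(0)
        attribute = rng.normal(size=200)                     # e.g. attentiveness
        cue = 0.8 * attribute + 0.2 * rng.normal(size=200)   # e.g. measured gaze
        judgment = 0.7 * cue + 0.3 * rng.normal(size=200)    # robot's estimate

        corr = lambda a, b: np.corrcoef(a, b)[0, 1]
        print("externalization (attribute -> cue):", corr(attribute, cue))
        print("cue utilization (cue -> judgment):", corr(cue, judgment))
        print("achievement (attribute -> judgment):", corr(attribute, judgment))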

    Semi-supervised learning for image classification

    Object class recognition is an active topic in computer vision that still presents many challenges. In most approaches, this task is addressed by supervised learning algorithms that need a large quantity of labels to perform well. This leads either to small datasets (< 10,000 images) that capture only a subset of the real-world class distribution (but with a controlled and verified labeling procedure), or to large datasets that are more representative but also add more label noise. Therefore, semi-supervised learning is a promising direction. It requires only a few labels while simultaneously making use of the vast amount of images available today. We address object class recognition with semi-supervised learning. These algorithms depend on the underlying structure given by the data, the image description, and the similarity measure, as well as on the quality of the labels. This insight leads to the main research questions of this thesis: Is the structure given by labeled and unlabeled data more important than the algorithm itself? Can we improve this neighborhood structure with a better similarity metric or with more representative unlabeled data? Is there a connection between the quality of labels and the overall performance, and how can we get more representative labels? We answer all these questions: we provide an extensive evaluation, we propose several graph improvements, and we introduce a novel active learning framework to obtain more representative labels.
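
    To make the graph-based setting concrete, the sketch below runs an off-the-shelf graph method (scikit-learn's LabelSpreading) on synthetic two-moons data with only six labeled points; the dataset, kernel choice, and label budget are arbitrary illustrations, not the thesis' experimental setup. The RBF similarity graph is exactly the kind of "neighborhood structure" the research questions refer to.

        import numpy as np
        from sklearn.datasets import make_moons
        from sklearn.semi_supervised import LabelSpreading

        # Few labels, many unlabeled points; predictions propagate over a graph.
        X, y = make_moons(n_samples=300, noise=0.1, random_state=0)
        y_train = np.full_like(y, -1)   # -1 marks unlabeled samples
        labeled = np.random.RandomState(0).choice(len(y), size=6, replace=False)
        y_train[labeled] = y[labeled]   # keep only a handful of labels

        model = LabelSpreading(kernel="rbf", gamma=20)  # RBF graph = the "structure"
        model.fit(X, y_train)
        print("accuracy on all points:", (model.transduction_ == y).mean())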

    The sleeping brain and the neural basis of emotions

    In addition to active wake, emotions are generated and experienced in a variety of functionally different states, such as those of sleep, during which external stimulation and cognitive control are lacking. The neural basis of emotions can be specified by considering the multitude of emotion-related brain states, as well as the distinct neuro- and psychodynamic stages (generation and regulation) of emotional experience.

    The role of the amygdala in the appraising brain

    Lindquist et al. convincingly argue that the brain implements psychological operations that are constitutive of emotion, rather than modules subserving discrete emotions. However, the nature of such psychological operations is open to debate. I argue that considering appraisal theories may provide alternative interpretations of the neuroimaging data with respect to the psychological operations involved.