1,398 research outputs found

    デバイスの限界を超えた正確な撮像を可能にする深層学習

    Get PDF
    Tohoku University博士(情報科学)thesi

    Training methods for facial image comparison: a literature review

    Get PDF
    This literature review was commissioned to explore the psychological literature relating to facial image comparison with a particular emphasis on whether individuals can be trained to improve performance on this task. Surprisingly few studies have addressed this question directly. As a consequence, this review has been extended to cover training of face recognition and training of different kinds of perceptual comparisons where we are of the opinion that the methodologies or findings of such studies are informative. The majority of studies of face processing have examined face recognition, which relies heavily on memory. This may be memory for a face that was learned recently (e.g. minutes or hours previously) or for a face learned longer ago, perhaps after many exposures (e.g. friends, family members, celebrities). Successful face recognition, irrespective of the type of face, relies on the ability to retrieve the to-berecognised face from long-term memory. This memory is then compared to the physically present image to reach a recognition decision. In contrast, in face matching task two physical representations of a face (live, photographs, movies) are compared and so long-term memory is not involved. Because the comparison is between two present stimuli rather than between a present stimulus and a memory, one might expect that face matching, even if not an easy task, would be easier to do and easier to learn than face recognition. In support of this, there is evidence that judgment tasks where a presented stimulus must be judged by a remembered standard are generally more cognitively demanding than judgments that require comparing two presented stimuli Davies & Parasuraman, 1982; Parasuraman & Davies, 1977; Warm and Dember, 1998). Is there enough overlap between face recognition and matching that it is useful to look at the literature recognition? No study has directly compared face recognition and face matching, so we turn to research in which people decided whether two non-face stimuli were the same or different. In these studies, accuracy of comparison is not always better when the comparator is present than when it is remembered. Further, all perceptual factors that were found to affect comparisons of simultaneously presented objects also affected comparisons of successively presented objects in qualitatively the same way. Those studies involved judgments about colour (Newhall, Burnham & Clark, 1957; Romero, Hita & Del Barco, 1986), and shape (Larsen, McIlhagga & Bundesen, 1999; Lawson, Bülthoff & Dumbell, 2003; Quinlan, 1995). Although one must be cautious in generalising from studies of object processing to studies of face processing (see, e.g., section comparing face processing to object processing), from these kinds of studies there is no evidence to suggest that there are qualitative differences in the perceptual aspects of how recognition and matching are done. As a result, this review will include studies of face recognition skill as well as face matching skill. The distinction between face recognition involving memory and face matching not involving memory is clouded in many recognition studies which require observers to decide which of many presented faces matches a remembered face (e.g., eyewitness studies). And of course there are other forensic face-matching tasks that will require comparison to both presented and remembered comparators (e.g., deciding whether any person in a video showing a crowd is the target person). For this reason, too, we choose to include studies of face recognition as well as face matching in our revie

    Learning Dense Correspondences between Photos and Sketches

    Full text link
    Humans effortlessly grasp the connection between sketches and real-world objects, even when these sketches are far from realistic. Moreover, human sketch understanding goes beyond categorization -- critically, it also entails understanding how individual elements within a sketch correspond to parts of the physical world it represents. What are the computational ingredients needed to support this ability? Towards answering this question, we make two contributions: first, we introduce a new sketch-photo correspondence benchmark, PSC6k\textit{PSC6k}, containing 150K annotations of 6250 sketch-photo pairs across 125 object categories, augmenting the existing Sketchy dataset with fine-grained correspondence metadata. Second, we propose a self-supervised method for learning dense correspondences between sketch-photo pairs, building upon recent advances in correspondence learning for pairs of photos. Our model uses a spatial transformer network to estimate the warp flow between latent representations of a sketch and photo extracted by a contrastive learning-based ConvNet backbone. We found that this approach outperformed several strong baselines and produced predictions that were quantitatively consistent with other warp-based methods. However, our benchmark also revealed systematic differences between predictions of the suite of models we tested and those of humans. Taken together, our work suggests a promising path towards developing artificial systems that achieve more human-like understanding of visual images at different levels of abstraction. Project page: https://photo-sketch-correspondence.github.ioComment: Accepted to ICML 2023. Project page: https://photo-sketch-correspondence.github.i

    Sequential Optimization for Efficient High-Quality Object Proposal Generation

    Full text link
    We are motivated by the need for a generic object proposal generation algorithm which achieves good balance between object detection recall, proposal localization quality and computational efficiency. We propose a novel object proposal algorithm, BING++, which inherits the virtue of good computational efficiency of BING but significantly improves its proposal localization quality. At high level we formulate the problem of object proposal generation from a novel probabilistic perspective, based on which our BING++ manages to improve the localization quality by employing edges and segments to estimate object boundaries and update the proposals sequentially. We propose learning the parameters efficiently by searching for approximate solutions in a quantized parameter space for complexity reduction. We demonstrate the generalization of BING++ with the same fixed parameters across different object classes and datasets. Empirically our BING++ can run at half speed of BING on CPU, but significantly improve the localization quality by 18.5% and 16.7% on both VOC2007 and Microhsoft COCO datasets, respectively. Compared with other state-of-the-art approaches, BING++ can achieve comparable performance, but run significantly faster.Comment: Accepted by TPAM

    Sequential optimization for efficient high-quality object proposal generation

    Full text link
    We are motivated by the need for a generic object proposal generation algorithm which achieves good balance between object detection recall, proposal localization quality and computational efficiency. We propose a novel object proposal algorithm, BING ++, which inherits the virtue of good computational efficiency of BING [1] but significantly improves its proposal localization quality. At high level we formulate the problem of object proposal generation from a novel probabilistic perspective, based on which our BING++ manages to improve the localization quality by employing edges and segments to estimate object boundaries and update the proposals sequentially. We propose learning the parameters efficiently by searching for approximate solutions in a quantized parameter space for complexity reduction. We demonstrate the generalization of BING++ with the same fixed parameters across different object classes and datasets. Empirically our BING++ can run at half speed of BING on CPU, but significantly improve the localization quality by 18.5 and 16.7 percent on both VOC2007 and Microhsoft COCO datasets, respectively. Compared with other state-of-the-art approaches, BING++ can achieve comparable performance, but run significantly faster

    Training and Predicting Visual Error for Real-Time Applications

    Full text link
    Visual error metrics play a fundamental role in the quantification of perceived image similarity. Most recently, use cases for them in real-time applications have emerged, such as content-adaptive shading and shading reuse to increase performance and improve efficiency. A wide range of different metrics has been established, with the most sophisticated being capable of capturing the perceptual characteristics of the human visual system. However, their complexity, computational expense, and reliance on reference images to compare against prevent their generalized use in real-time, restricting such applications to using only the simplest available metrics. In this work, we explore the abilities of convolutional neural networks to predict a variety of visual metrics without requiring either reference or rendered images. Specifically, we train and deploy a neural network to estimate the visual error resulting from reusing shading or using reduced shading rates. The resulting models account for 70%-90% of the variance while achieving up to an order of magnitude faster computation times. Our solution combines image-space information that is readily available in most state-of-the-art deferred shading pipelines with reprojection from previous frames to enable an adequate estimate of visual errors, even in previously unseen regions. We describe a suitable convolutional network architecture and considerations for data preparation for training. We demonstrate the capability of our network to predict complex error metrics at interactive rates in a real-time application that implements content-adaptive shading in a deferred pipeline. Depending on the portion of unseen image regions, our approach can achieve up to 2×2\times performance compared to state-of-the-art methods.Comment: Published at Proceedings of the ACM in Computer Graphics and Interactive Techniques. 14 Pages, 16 Figures, 3 Tables. For paper website and higher quality figures, see https://jaliborc.github.io/rt-percept

    Noise-Enhanced and Human Visual System-Driven Image Processing: Algorithms and Performance Limits

    Get PDF
    This dissertation investigates the problem of image processing based on stochastic resonance (SR) noise and human visual system (HVS) properties, where several novel frameworks and algorithms for object detection in images, image enhancement and image segmentation as well as the method to estimate the performance limit of image segmentation algorithms are developed. Object detection in images is a fundamental problem whose goal is to make a decision if the object of interest is present or absent in a given image. We develop a framework and algorithm to enhance the detection performance of suboptimal detectors using SR noise, where we add a suitable dose of noise into the original image data and obtain the performance improvement. Micro-calcification detection is employed in this dissertation as an illustrative example. The comparative experiments with a large number of images verify the efficiency of the presented approach. Image enhancement plays an important role and is widely used in various vision tasks. We develop two image enhancement approaches. One is based on SR noise, HVS-driven image quality evaluation metrics and the constrained multi-objective optimization (MOO) technique, which aims at refining the existing suboptimal image enhancement methods. Another is based on the selective enhancement framework, under which we develop several image enhancement algorithms. The two approaches are applied to many low quality images, and they outperform many existing enhancement algorithms. Image segmentation is critical to image analysis. We present two segmentation algorithms driven by HVS properties, where we incorporate the human visual perception factors into the segmentation procedure and encode the prior expectation on the segmentation results into the objective functions through Markov random fields (MRF). Our experimental results show that the presented algorithms achieve higher segmentation accuracy than many representative segmentation and clustering algorithms available in the literature. Performance limit, or performance bound, is very useful to evaluate different image segmentation algorithms and to analyze the segmentability of the given image content. We formulate image segmentation as a parameter estimation problem and derive a lower bound on the segmentation error, i.e., the mean square error (MSE) of the pixel labels considered in our work, using a modified Cramér-Rao bound (CRB). The derivation is based on the biased estimator assumption, whose reasonability is verified in this dissertation. Experimental results demonstrate the validity of the derived bound

    A survey on generative adversarial networks for imbalance problems in computer vision tasks

    Get PDF
    Any computer vision application development starts off by acquiring images and data, then preprocessing and pattern recognition steps to perform a task. When the acquired images are highly imbalanced and not adequate, the desired task may not be achievable. Unfortunately, the occurrence of imbalance problems in acquired image datasets in certain complex real-world problems such as anomaly detection, emotion recognition, medical image analysis, fraud detection, metallic surface defect detection, disaster prediction, etc., are inevitable. The performance of computer vision algorithms can significantly deteriorate when the training dataset is imbalanced. In recent years, Generative Adversarial Neural Networks (GANs) have gained immense attention by researchers across a variety of application domains due to their capability to model complex real-world image data. It is particularly important that GANs can not only be used to generate synthetic images, but also its fascinating adversarial learning idea showed good potential in restoring balance in imbalanced datasets. In this paper, we examine the most recent developments of GANs based techniques for addressing imbalance problems in image data. The real-world challenges and implementations of synthetic image generation based on GANs are extensively covered in this survey. Our survey first introduces various imbalance problems in computer vision tasks and its existing solutions, and then examines key concepts such as deep generative image models and GANs. After that, we propose a taxonomy to summarize GANs based techniques for addressing imbalance problems in computer vision tasks into three major categories: 1. Image level imbalances in classification, 2. object level imbalances in object detection and 3. pixel level imbalances in segmentation tasks. We elaborate the imbalance problems of each group, and provide GANs based solutions in each group. Readers will understand how GANs based techniques can handle the problem of imbalances and boost performance of the computer vision algorithms
    corecore