
    Do We Train on Test Data? Purging CIFAR of Near-Duplicates

    The CIFAR-10 and CIFAR-100 datasets are two of the most heavily benchmarked datasets in computer vision and are often used to evaluate novel methods and model architectures in the field of deep learning. However, we find that 3.3% and 10% of the images from the test sets of these datasets have duplicates in the training set. These duplicates are easily recognizable by memorization and may, hence, bias the comparison of image recognition techniques regarding their generalization capability. To eliminate this bias, we provide the "fair CIFAR" (ciFAIR) dataset, where we replaced all duplicates in the test sets with new images sampled from the same domain. We then re-evaluate the classification performance of various popular state-of-the-art CNN architectures on these new test sets to investigate whether recent research has overfitted to memorizing data instead of learning abstract concepts. We find a significant drop in classification accuracy of between 9% and 14% relative to the original performance on the duplicate-free test set. The ciFAIR dataset and pre-trained models are available at https://cvjena.github.io/cifair/, where we also maintain a leaderboard.
    Comment: Journal of Imagin
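The duplicate screening described above can be sketched as a nearest-neighbor similarity check over pre-extracted image features. The snippet below is a minimal illustration assuming cosine similarity and a hypothetical threshold; the paper's actual procedure (feature choice, threshold, manual verification) may differ.

```python
def cosine(a, b):
    """Cosine similarity of two feature vectors given as lists."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb)

def find_near_duplicates(test_feats, train_feats, threshold=0.99):
    """Return indices of test items whose most similar training item
    exceeds the threshold (candidate near-duplicates to replace)."""
    dups = []
    for i, t in enumerate(test_feats):
        best = max(cosine(t, tr) for tr in train_feats)
        if best >= threshold:
            dups.append(i)
    return dups
```

In practice the features would come from a CNN and the flagged pairs would still need visual inspection, since feature similarity alone also fires on legitimately similar but distinct images.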

    Automatic Query Image Disambiguation for Content-Based Image Retrieval

    Query images presented to content-based image retrieval systems often have various different interpretations, making it difficult to identify the search objective pursued by the user. We propose a technique for overcoming this ambiguity, while keeping the amount of required user interaction at a minimum. To achieve this, the neighborhood of the query image is divided into coherent clusters from which the user may choose the relevant ones. A novel feedback integration technique is then employed to re-rank the entire database with regard to both the user feedback and the original query. We evaluate our approach on the publicly available MIRFLICKR-25K dataset, where it leads to a relative improvement of average precision by 23% over the baseline retrieval, which does not distinguish between different image senses.
    Comment: VISAPP 2018 paper, 8 pages, 5 figures. Source code: https://github.com/cvjena/ai
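The re-ranking step, combining the original query with the user-selected cluster, can be sketched as a weighted score fusion. The `alpha` weight and the precomputed per-item similarities below are illustrative assumptions; they are not the paper's actual feedback integration technique.

```python
def rerank(sims_query, sims_cluster, alpha=0.5):
    """Re-rank database items by a convex combination of their
    similarity to the original query and their similarity to the
    centroid of the user-selected cluster. Returns indices sorted
    from most to least relevant."""
    scores = [alpha * q + (1.0 - alpha) * c
              for q, c in zip(sims_query, sims_cluster)]
    return sorted(range(len(scores)), key=lambda i: -scores[i])
```

Setting `alpha = 1.0` recovers the baseline ranking that ignores the feedback, which makes the parameter a convenient knob for ablation.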

    Hierarchy-based Image Embeddings for Semantic Image Retrieval

    Deep neural networks trained for classification have been found to learn powerful image representations, which are also often used for other tasks such as comparing images w.r.t. their visual similarity. However, visual similarity does not imply semantic similarity. In order to learn semantically discriminative features, we propose to map images onto class embeddings whose pair-wise dot products correspond to a measure of semantic similarity between classes. Such an embedding does not only improve image retrieval results, but could also facilitate integrating semantics for other tasks, e.g., novelty detection or few-shot learning. We introduce a deterministic algorithm for computing the class centroids directly based on prior world-knowledge encoded in a hierarchy of classes such as WordNet. Experiments on CIFAR-100, NABirds, and ImageNet show that our learned semantic image embeddings improve the semantic consistency of image retrieval results by a large margin.
    Comment: Accepted at WACV 2019. Source code: https://github.com/cvjena/semantic-embedding
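A hierarchy-derived class similarity of this kind can be illustrated with a small tree. In the sketch below, similarity is taken as the depth of the lowest common ancestor normalized by the maximum class depth; this is one common choice, assumed here for illustration, and the paper defines its own height-based measure from which the embeddings are derived.

```python
def depth(node, parent):
    """Number of edges from node up to the root of the class tree."""
    d = 0
    while node in parent:
        node = parent[node]
        d += 1
    return d

def lca(a, b, parent):
    """Lowest common ancestor of two classes in a tree given as a
    child -> parent dictionary."""
    seen = set()
    while True:
        seen.add(a)
        if a not in parent:
            break
        a = parent[a]
    while b not in seen:
        b = parent[b]
    return b

def semantic_similarity(a, b, parent, max_depth):
    """Similarity in [0, 1]: deeper common ancestor = more similar."""
    return depth(lca(a, b, parent), parent) / max_depth
```

With such a measure, "dog" and "cat" (sharing "mammal") come out more similar than "dog" and "trout" (sharing only "animal"), which is exactly the structure the class embeddings are meant to preserve in their dot products.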

    Active Self Calibration of a Multi Sensor System

    The combination of a multi camera system with different sensor types like PMD cameras or motion sensors is called a multi sensor system. Such systems offer many different application scenarios, e.g. motion studies of animals and sportsmen, 3D reconstruction or object tracking tasks. In order to work properly, each of these applications needs an accurately calibrated multi sensor system. Calibration consists of estimating the intrinsic parameters of each camera and determining the relative poses (rotation and translation) between the sensors. The second step is known as extrinsic calibration and forms the focus of this work. Self-calibration of a multi sensor system is desirable, since manual calibration is a time-consuming and difficult task.
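The extrinsic calibration described above amounts to expressing one sensor's pose in another sensor's frame. Assuming each sensor's pose is given as a world-to-sensor transform x_s = R·x_w + t, the relative pose follows by composition; the snippet below is a generic sketch of that composition, not the thesis's estimation procedure.

```python
def matmul(A, B):
    """Product of two square matrices given as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transpose(A):
    return [list(row) for row in zip(*A)]

def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def relative_pose(RA, tA, RB, tB):
    """Pose mapping sensor A's frame into sensor B's frame, given
    world-to-sensor transforms x_s = R x_w + t for each sensor:
    R_rel = R_B R_A^T,  t_rel = t_B - R_rel t_A."""
    R_rel = matmul(RB, transpose(RA))
    Rt = matvec(R_rel, tA)
    t_rel = [b - r for b, r in zip(tB, Rt)]
    return R_rel, t_rel
```

A quick sanity check: a world point transformed into frame A and then through the relative pose must land on the same coordinates as transforming it into frame B directly.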

    Computer Vision in Camera Networks for Analyzing Complex Dynamic Natural Scenes

    Sensor or camera networks will play an important role in future applications, ranging from surveillance tasks for workplace safety or security in general, to driver assistance systems in the automotive domain, to intelligent homes and assisted living for the elderly. Computer vision in sensor or camera networks poses several currently unsolved problems. First of all, how can we calibrate cameras distributed arbitrarily in the scene without placing artificial or natural calibration patterns in the scene? Second, how do we select and fuse the information provided by different, also multimodal sensors to solve a given problem? Finally, can we handle reconstruction, recognition and tracking tasks in complex and highly dynamic natural scenes, which are in almost all cases the environment camera networks are designed for?

    3-D Reconstruction in Piecewise Planar Environments

    The structure-from-motion problem is central in applications like visual robot navigation and visual 3-D modeling. Typical solutions split the problem into feature tracking and geometric reconstruction steps. Instead, we present a combined solution, where the tracking step is implicitly supported by a feedback of 3-D information, and the geometric reconstruction is statistically optimal in the case of Gaussian noise on image intensities. Experiments confirm increased accuracy and reliability, and despite a significant computational overhead, the combined solution still runs at 5-10 fps.

    Impatient DNNs - Deep Neural Networks with Dynamic Time Budgets

    We propose Impatient Deep Neural Networks (DNNs) which deal with dynamic time budgets during application. They allow for individual budgets given a priori for each test example and for anytime prediction, i.e., a possible interruption at multiple stages during inference while still providing output estimates. Our approach can therefore tackle the computational costs and energy demands of DNNs in an adaptive manner, a property essential for real-time applications. Our Impatient DNNs are based on a new general framework of learning dynamic budget predictors using risk minimization, which can be applied to current DNN architectures by adding early prediction and additional loss layers. A key aspect of our method is that all of the intermediate predictors are learned jointly. In experiments, we evaluate our approach for different budget distributions, architectures, and datasets. Our results show a significant gain in expected accuracy compared to common baselines.
    Comment: British Machine Vision Conference (BMVC) 201
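The anytime-prediction behaviour described above can be sketched as a chain of network stages, each followed by an early-exit head: inference stops when the budget runs out and the latest completed head's output is returned. The sketch below measures the budget in whole stages, a simplification of the paper's learned budget predictors and loss layers.

```python
def anytime_predict(stages, heads, x, budget):
    """Run network stages until the stage budget is exhausted and
    return the prediction of the last completed early-exit head.
    `stages` and `heads` are parallel lists of callables."""
    pred = None
    for i, (stage, head) in enumerate(zip(stages, heads)):
        if i >= budget:
            break          # interrupted: fall back to latest estimate
        x = stage(x)       # forward through one block of the network
        pred = head(x)     # early-exit prediction at this depth
    return pred
```

In a real DNN the stages would be convolutional blocks and the heads small classifiers trained jointly, so that every prefix of the network is itself a usable predictor.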