12 research outputs found

    An Empirical Evaluation of Current Convolutional Architectures' Ability to Manage Nuisance Location and Scale Variability

    Full text link
    We conduct an empirical study to test the ability of Convolutional Neural Networks (CNNs) to reduce the effects of nuisance transformations of the input data, such as location, scale and aspect ratio. We isolate factors by adopting a common convolutional architecture either deployed globally on the image to compute class posterior distributions, or restricted locally to compute class conditional distributions given location, scale and aspect ratios of bounding boxes determined by proposal heuristics. In theory, averaging the latter should yield inferior performance compared to proper marginalization. Yet empirical evidence suggests the converse, leading us to conclude that - at the current level of complexity of convolutional architectures and scale of the data sets used to train them - CNNs are not very effective at marginalizing nuisance variability. We also quantify the effects of context on the overall classification task and its impact on the performance of CNNs, and propose improved sampling techniques for heuristic proposal schemes that improve end-to-end performance to state-of-the-art levels. We test our hypothesis on a classification task using the ImageNet Challenge benchmark and on a wide-baseline matching task using the Oxford and Fischer's datasets.
    Comment: 10 pages, 5 figures, 3 tables -- CVPR 2016, camera-ready version
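The global-versus-local comparison described in the abstract can be sketched as a toy in NumPy. Random logits stand in for CNN outputs and the proposals are averaged uniformly; all names here are illustrative assumptions, not the paper's pipeline:

```python
import numpy as np

def softmax(z, axis=-1):
    # numerically stable softmax
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
num_classes, num_proposals = 10, 5

# Global pathway: one class posterior computed from the whole image.
global_logits = rng.normal(size=num_classes)
p_global = softmax(global_logits)

# Local pathway: one class-conditional posterior per proposed bounding
# box (location / scale / aspect ratio), averaged uniformly -- the
# crude surrogate for marginalization that the study compares against.
local_logits = rng.normal(size=(num_proposals, num_classes))
p_local = softmax(local_logits, axis=-1).mean(axis=0)

pred_global = int(np.argmax(p_global))
pred_local = int(np.argmax(p_local))
```

Both pathways yield a proper distribution over classes; the empirical question in the paper is which prediction is more accurate in practice.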

    Multi-view feature engineering and learning

    Full text link
    We frame the problem of local representation of imaging data as the computation of minimal sufficient statistics that are invariant to nuisance variability induced by viewpoint and illumination. We show that, under very stringent conditions, these are related to “feature descriptors” commonly used in Computer Vision. Such conditions can be relaxed if multiple views of the same scene are available. We propose a sampling-based and a point-estimate based approximation of such a representation, compared empirically on image-to-(multiple)image matching, for which we introduce a multi-view wide-baseline matching benchmark, consisting of a mixture of real and synthetic objects with ground truth camera motion and dense three-dimensional geometry.
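The two approximations mentioned above can be illustrated with a small NumPy sketch. The function names and the use of plain Euclidean descriptor distance are assumptions for illustration, not the paper's actual construction:

```python
import numpy as np

def point_estimate_descriptor(view_descs):
    """Point-estimate approximation: collapse the descriptors observed
    across views of the same scene point into one averaged, unit-norm
    descriptor."""
    d = np.mean(view_descs, axis=0)
    return d / (np.linalg.norm(d) + 1e-12)

def sample_based_distance(query, view_descs):
    """Sampling-based approximation: keep one descriptor sample per
    view and match a query against the closest sample."""
    dists = np.linalg.norm(view_descs - query, axis=1)
    return dists.min()
```

The point estimate is compact but can blur multi-modal appearance; keeping samples preserves the modes at the cost of storage and matching time.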

    Sampling Algorithms to Handle Nuisances in Large-Scale Recognition

    No full text
    Convolutional neural networks (CNNs) have risen to be the de-facto paragon for detecting the presence of objects in a scene, as portrayed by an image. CNNs are described as being "approximately invariant" to nuisance transformations such as planar translation, both by virtue of their convolutional architecture and by virtue of their approximation properties that, given sufficient parameters and training data, could in principle yield discriminants that are insensitive to nuisance transformations of the data. The fact that contemporary deep convolutional architectures appear very effective in classifying images as containing a given object regardless of its position, scale, and aspect ratio in large-scale benchmarks suggests that the network can effectively manage such nuisance variability. We conduct an empirical study and show that, contrary to popular belief, at the current level of complexity of convolutional architectures and scale of the data sets used to train them, CNNs are not very effective at marginalizing nuisance variability. This discovery leaves researchers the choice of investing more effort in the design of models that are less sensitive to nuisances, or of designing better region proposal algorithms that predict where the objects of interest lie and center the model around these regions. This thesis takes steps in both directions. First, we introduce DSP-CNN, which deploys domain-size pooling to make the networks scale-invariant at the level of the convolutional operator. Second, motivated by our empirical analysis, we propose novel sampling and pruning techniques for region proposal schemes that improve end-to-end performance in large-scale classification, detection and wide-baseline correspondence to state-of-the-art levels.
Additionally, since a proposal algorithm involves the design of a classifier whose results are fed to another classifier (a Category CNN), it seems natural to leverage the latter to design the former. We therefore introduce a method that leverages filters learned in the lower layers of CNNs to design a binary boosting classifier for generating class-agnostic proposals. Finally, we extend sampling over time by designing a temporal hard-attention layer, trained with reinforcement learning, with application to person re-identification in video sequences.
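Domain-size pooling as described above can be sketched roughly as follows. This toy NumPy version, with nearest-neighbour resampling, a naive convolution, and uniform averaging over three assumed scales, is a simplification for illustration, not the DSP-CNN implementation:

```python
import numpy as np

def conv2d_valid(img, kernel):
    # naive 2-D valid correlation; enough for a sketch
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def nn_resample(a, new_h, new_w):
    # nearest-neighbour resampling to (new_h, new_w)
    ri = (np.arange(new_h) * a.shape[0] / new_h).astype(int)
    ci = (np.arange(new_w) * a.shape[1] / new_w).astype(int)
    return a[np.ix_(ri, ci)]

def domain_size_pool(img, kernel, scales=(0.75, 1.0, 1.25)):
    """Average one filter's response over several domain sizes:
    rescale the input, filter it, bring the response back to a common
    grid, and average -- a rough stand-in for pooling at the level of
    the convolutional operator."""
    responses, base = [], None
    for s in scales:
        h = max(int(img.shape[0] * s), kernel.shape[0])
        w = max(int(img.shape[1] * s), kernel.shape[1])
        r = conv2d_valid(nn_resample(img, h, w), kernel)
        if base is None:
            base = r.shape  # grid of the first scale's response
        responses.append(nn_resample(r, *base))
    return np.mean(responses, axis=0)
```

Pooling across domain sizes trades a little localization sharpness for responses that vary less when the object's scale changes.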

    Theran wall paintings Digital Restoration

    No full text
    We present a semi-automatic method for high-quality restoration of the Theran wall paintings of Akrotiri, closely approximating the work of a specialist restorer. Digital restoration begins with the automatic detection of damaged and missing areas, followed by their inpainting. We demonstrate experimental results of structure inpainting with the Total Variation model and of texture inpainting with the Efros-Leung algorithm. In the case of significant information loss, the user selects an area of similar semantics and geometry from a different location in the wall paintings, which is then stitched seamlessly into the missing area.
We present an analysis of recent image compositing algorithms and demonstrate the results of Tao et al. on our problem. An algorithm based on mathematical morphology is proposed for rough detection of damaged areas. We improve upon this approach by incorporating edge information, leading to more complete detection of damage, higher recall, and usually higher precision. This approach yields excellent identification of missing areas in the wall paintings. In some cases we further refine the extracted mask using iterated graph cuts, specifically the "GrabCut" algorithm.
Νικόλαος Η. Καριανάκη
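Total Variation inpainting, one of the structure-inpainting methods mentioned above, can be sketched as a gradient descent that updates only the masked pixels; this is a minimal illustrative NumPy version with assumed step size and boundary handling, not the thesis implementation:

```python
import numpy as np

def tv_inpaint(img, mask, iters=200, dt=0.2, eps=1e-3):
    """Fill masked pixels by (approximately) minimising total variation.

    Known pixels (mask == False) stay fixed; missing pixels
    (mask == True) are driven toward a TV-smooth completion by
    descending along the divergence of the normalised gradient field.
    """
    u = img.astype(float).copy()
    u[mask] = img[~mask].mean()          # crude initialisation
    for _ in range(iters):
        # forward differences (last row/column padded with zero diff)
        ux = np.diff(u, axis=1, append=u[:, -1:])
        uy = np.diff(u, axis=0, append=u[-1:, :])
        mag = np.sqrt(ux**2 + uy**2 + eps**2)  # eps avoids /0
        px, py = ux / mag, uy / mag
        # divergence of the normalised gradient field
        div = (px - np.roll(px, 1, axis=1)) + (py - np.roll(py, 1, axis=0))
        u[mask] += dt * div[mask]        # update only missing pixels
    return u
```

TV inpainting propagates structure (edges, flat regions) into the hole but does not reproduce texture, which is why the thesis pairs it with Efros-Leung texture synthesis.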