
    Handling variable shaped & high resolution images for multi-class classification problem

    Get PDF
    Convolutional Neural Networks (CNNs) are usually trained using a pre-determined, fixed spatial image size. While scale invariance is considered important for visual representations, CNNs are not scale invariant with respect to the spatial resolution of the input image, since a change in image dimensions may lead to a non-linear change in their output. At the same time, there are applications (e.g. in medicine) where images come in multiple scales and shapes, leaving no room for the common transformations that deform and shrink images at the cost of important information. Keeping high-resolution information can also be a big burden resource-wise, with high computational costs and memory and time requirements. Accordingly, research focus has shifted from parameter optimization and connection readjustment towards improved architectural design of the network; different state-of-the-art networks such as Xception, ResNeXt, PolyNet and others explore the effect of different transformations on CNNs' learning capacity. Instead of modifying the internals of CNNs, the METavlitó project focuses mainly on the pre-processing stage of the network in order to handle high-resolution images as well as the variability in their shape. METavlitó proposes two components: one for clustering images' resolutions into buckets, and a training component for scale-invariant learning that employs an input-agnostic architecture, decreasing the average GPU memory requirements. Compared to a classic approach that applies the common pre-processing transformations (resizing & cropping) before training, our solution, using the same architecture, controls overfitting better, increases accuracy by 3–5%, and decreases average GPU memory needs by approximately 43%, and thus the total duration of training and validation
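
    A minimal sketch of the resolution-bucketing idea follows. It is hypothetical: the function name, the use of k-means, and the rounding of bucket shapes to multiples of 32 are illustration choices, not details given by the abstract.

    # Hypothetical sketch: cluster (height, width) pairs so each bucket
    # can be batched at one shape close to the images' native sizes.
    import numpy as np
    from sklearn.cluster import KMeans

    def bucket_resolutions(shapes, n_buckets=4):
        """shapes: list of (height, width) pairs, one per image.
        Returns each image's bucket index plus one target shape per
        bucket (the centroid, rounded to a multiple of 32 so typical
        CNN strides divide it cleanly)."""
        X = np.asarray(shapes, dtype=float)
        km = KMeans(n_clusters=n_buckets, n_init=10, random_state=0).fit(X)
        targets = (np.round(km.cluster_centers_ / 32) * 32).astype(int)
        return km.labels_, targets

    # Each image is resized only to its bucket's centroid, which stays
    # much closer to its native shape than one global fixed size would.
    labels, targets = bucket_resolutions(
        [(480, 640), (1080, 1920), (500, 620), (1100, 1900)], n_buckets=2)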

    Nonmetric lens distortion calibration: closed-form solutions, robust estimation and model selection

    Full text link

    BLADE: Filter Learning for General Purpose Computational Photography

    Full text link
    The Rapid and Accurate Image Super Resolution (RAISR) method of Romano, Isidoro, and Milanfar is a computationally efficient image upscaling method using a trained set of filters. We describe a generalization of RAISR, which we name Best Linear Adaptive Enhancement (BLADE). This approach is a trainable edge-adaptive filtering framework that is general, simple, computationally efficient, and useful for a wide range of problems in computational photography. We show applications to operations that may appear in a camera pipeline, including denoising, demosaicing, and stylization.
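
    The core of RAISR/BLADE-style inference can be sketched as follows (a simplified illustration, not the paper's exact feature definitions; the thresholds and bucket counts here are assumptions): each pixel's structure-tensor statistics are hashed to an index into a bank of trained linear filters.

    # Sketch of edge-adaptive filter selection per pixel.
    import numpy as np
    from scipy.ndimage import sobel

    def structure_features(img):
        """Per-pixel structure-tensor orientation, strength, coherence."""
        gx, gy = sobel(img, axis=1), sobel(img, axis=0)
        j11, j22, j12 = gx * gx, gy * gy, gx * gy   # tensor entries (unsmoothed)
        tr = j11 + j22
        disc = np.sqrt(np.maximum(tr ** 2 / 4 - (j11 * j22 - j12 ** 2), 0))
        l1, l2 = tr / 2 + disc, np.maximum(tr / 2 - disc, 0)  # eigenvalues
        angle = 0.5 * np.arctan2(2 * j12, j11 - j22)
        strength = np.sqrt(l1)
        coherence = (np.sqrt(l1) - np.sqrt(l2)) / (np.sqrt(l1) + np.sqrt(l2) + 1e-8)
        return angle, strength, coherence

    def filter_index(angle, strength, coherence, n_angles=8):
        """Quantize features into an index selecting one trained filter."""
        a = ((angle % np.pi) / np.pi * n_angles).astype(int) % n_angles
        s = (strength > 10.0).astype(int)   # thresholds are illustrative
        c = (coherence > 0.5).astype(int)
        return a * 4 + s * 2 + c            # n_angles * 2 * 2 buckets

    Inference then applies the filter selected by this index around each pixel; training amounts to a least-squares fit of one filter per bucket.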

    Mobile Wound Assessment and 3D Modeling from a Single Image

    Get PDF
    The prevalence of camera-enabled mobile phones has made mobile wound assessment a viable treatment option for millions of previously difficult-to-reach patients. We have designed a complete mobile wound assessment platform to ameliorate the many challenges related to chronic wound care. Chronic wounds and infections are the most severe, costly and fatal types of wounds, placing them at the center of mobile wound assessment. Wound physicians assess thousands of single-view wound images from all over the world, and it may be difficult to determine the location of the wound on the body, for example, if the image is taken at close range. In our solution, end-users capture an image of the wound with their mobile camera. The wound image is segmented and classified using modern convolutional neural networks, and is stored securely in the cloud for remote tracking. We use an interactive, semi-automated approach to allow users to specify the location of the wound on the body. To accomplish this we have created, to the best of our knowledge, the first 3D human surface anatomy labeling system, based on the current NYU and Anatomy Mapper labeling systems. To interactively view wounds in 3D, we present an efficient projective texture mapping algorithm for texturing wounds onto a 3D human anatomy model. In so doing, we demonstrate an approach to 3D wound reconstruction that works even for a single wound image
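
    As a rough illustration of the projective texture mapping step (the function name, pinhole camera model, and clamping behavior are assumptions, not the paper's algorithm), per-vertex texture coordinates can be obtained by projecting the anatomy model's vertices through the camera that took the wound photo.

    # Sketch: project mesh vertices into the wound image to get UVs.
    import numpy as np

    def project_uvs(vertices, K, R, t, image_w, image_h):
        """vertices: (N, 3) model-space points; K: 3x3 intrinsics; R, t:
        camera pose. Returns (N, 2) texture coordinates in [0, 1]."""
        cam = vertices @ R.T + t            # model/world -> camera frame
        pix = cam @ K.T                     # camera -> homogeneous pixels
        pix = pix[:, :2] / pix[:, 2:3]      # perspective divide
        uv = pix / np.array([image_w, image_h], dtype=float)
        # Naive clamp; a real renderer would also cull back-facing and
        # occluded vertices before sampling the wound photo.
        return np.clip(uv, 0.0, 1.0)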

    Contextual cropping and scaling of TV productions

    Get PDF
    This is the author's accepted manuscript. The final publication is available at Springer via http://dx.doi.org/10.1007/s11042-011-0804-3. Copyright @ Springer Science+Business Media, LLC 2011.
    In this paper, an application is presented which automatically adapts SDTV (Standard Definition Television) sports productions to smaller displays through intelligent cropping and scaling. It crops regions of interest of sports productions based on a smart combination of production metadata and systematic video analysis methods. This approach allows a context-based composition of cropped images. It provides a differentiation between the original SD version of the production and the processed one adapted to the requirements of mobile TV. The system has been comprehensively evaluated by comparing the outcome of the proposed method with manually and statically cropped versions, as well as with non-cropped versions. Integration of the tool into post-production and live workflows is envisaged
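
    For intuition, a minimal crop-and-scale step of this kind might look as follows. This is a generic sketch; the paper's context-based composition driven by production metadata and video analysis is far richer than a single ROI crop.

    # Sketch: crop a region of interest to a mobile aspect ratio, then scale.
    import cv2

    def crop_and_scale(frame, roi, out_w=320, out_h=240):
        """frame: HxWx3 video frame; roi: (x, y, w, h) region of interest
        from metadata and/or video analysis. The ROI is expanded to the
        output aspect ratio before cropping so nothing is distorted."""
        x, y, w, h = roi
        target = out_w / out_h
        if w / h < target:                      # too narrow: widen the crop
            new_w = int(h * target)
            x = max(0, x - (new_w - w) // 2)
            w = min(new_w, frame.shape[1] - x)
        else:                                   # too wide: make it taller
            new_h = int(w / target)
            y = max(0, y - (new_h - h) // 2)
            h = min(new_h, frame.shape[0] - y)
        crop = frame[y:y + h, x:x + w]
        return cv2.resize(crop, (out_w, out_h), interpolation=cv2.INTER_AREA)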

    MooseNet: A Trainable Metric for Synthesized Speech with a PLDA Module

    Full text link
    We present MooseNet, a trainable speech metric that predicts the listeners' Mean Opinion Score (MOS). We propose a novel approach in which a Probabilistic Linear Discriminant Analysis (PLDA) generative model is used on top of an embedding obtained from a self-supervised learning (SSL) neural network (NN) model. We show that PLDA works well with a non-finetuned SSL model when trained on only 136 utterances (ca. one minute of training time) and that PLDA consistently improves various neural MOS prediction models, even state-of-the-art models with task-specific fine-tuning. Our ablation study shows the superiority of PLDA training over SSL model fine-tuning in a low-resource scenario. We also improve SSL model fine-tuning using a convenient optimizer choice and additional contrastive and multi-task training objectives. The fine-tuned MooseNet NN with the PLDA module achieves the best results, surpassing the SSL baseline on the VoiceMOS Challenge data.
    Comment: Accepted to SSW 12: https://openreview.net/forum?id=V6RZk6RzS
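
    The overall embed-then-score pipeline can be sketched as follows. This is not MooseNet itself: the SSL checkpoint, mean pooling, and especially the scorer (ridge regression standing in for the PLDA module, purely to keep the illustration short) are all assumptions.

    # Sketch: pool SSL embeddings, then fit a lightweight MOS scorer.
    import torch
    import torchaudio
    from sklearn.linear_model import Ridge

    bundle = torchaudio.pipelines.WAV2VEC2_BASE
    ssl_model = bundle.get_model().eval()

    def embed(wav_path):
        """Mean-pooled last-layer SSL features for one utterance."""
        wav, sr = torchaudio.load(wav_path)
        wav = torchaudio.functional.resample(wav, sr, bundle.sample_rate)
        with torch.no_grad():
            feats, _ = ssl_model.extract_features(wav)
        return feats[-1].mean(dim=1).squeeze(0).numpy()

    # Fit on a small rated set (paths and MOS labels are placeholders):
    # X = [embed(p) for p in train_paths]
    # scorer = Ridge().fit(X, train_mos)
    # mos_hat = scorer.predict([embed("synthesized_utterance.wav")])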

    Robust Detection of Non-overlapping Ellipses from Points with Applications to Circular Target Extraction in Images and Cylinder Detection in Point Clouds

    Full text link
    This manuscript provides a collection of new methods for the automated detection of non-overlapping ellipses from edge points. The methods introduce new developments in: (i) robust Monte Carlo-based ellipse fitting to 2-dimensional (2D) points in the presence of outliers; (ii) detection of non-overlapping ellipses from 2D edge points; and (iii) extraction of cylinders from 3D point clouds. The proposed methods were thoroughly compared with established state-of-the-art methods, using simulated and real-world datasets, through the design of four sets of original experiments. The proposed robust ellipse detection was found to be superior to four reliable robust methods, including the popular least median of squares, on both simulated and real-world datasets. The proposed process for detecting non-overlapping ellipses achieved an F-measure of 99.3% on real images, compared to F-measures of 42.4%, 65.6%, and 59.2% obtained using the methods of Fornaciari, Patraucean, and Panagiotakis, respectively. The proposed cylinder extraction method identified all detectable mechanical pipes in two real-world point clouds, obtained under laboratory and industrial construction site conditions. The results of this investigation show promise for the application of the proposed methods to the automatic extraction of circular targets from images and pipes from point clouds
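
    A generic RANSAC-flavored ellipse fit over edge points can be sketched as follows (illustrative only; the paper's robust Monte Carlo estimator and its model selection go well beyond this).

    # Sketch: sample minimal 5-point subsets, keep the ellipse with most inliers.
    import numpy as np
    import cv2

    def residuals(points, ellipse):
        """Approximate point-to-ellipse distances."""
        (cx, cy), (d1, d2), ang = ellipse     # OpenCV axes are diameters
        a, b, th = d1 / 2.0, d2 / 2.0, np.deg2rad(ang)
        d = points - np.array([cx, cy])
        x = d[:, 0] * np.cos(th) + d[:, 1] * np.sin(th)
        y = -d[:, 0] * np.sin(th) + d[:, 1] * np.cos(th)
        r = np.sqrt((x / a) ** 2 + (y / b) ** 2)
        return np.abs(r - 1.0) * min(a, b)

    def ransac_ellipse(points, n_iter=500, tol=2.0, seed=0):
        """points: (N, 2) edge points. Keeps the sampled ellipse with the
        most inliers; a refit on those inliers would normally follow."""
        rng = np.random.default_rng(seed)
        best, best_count = None, -1
        for _ in range(n_iter):
            sample = points[rng.choice(len(points), 5, replace=False)]
            ellipse = cv2.fitEllipse(sample.astype(np.float32))
            if min(ellipse[1]) < 1e-6:        # degenerate sample, skip
                continue
            count = int((residuals(points, ellipse) < tol).sum())
            if count > best_count:
                best, best_count = ellipse, count
        return best, best_count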

    Visual Odometry Estimation Using Selective Features

    Get PDF
    The rapid growth in computational power and technology has enabled the automotive industry to do extensive research into autonomous vehicles. So-called self-driving cars are seen everywhere, being developed by many companies, including Google, Mercedes-Benz, Delphi, Tesla, and Uber. One of the challenging tasks for these vehicles is to track incremental motion at runtime and to analyze the surroundings for accurate localization. This crucial information is used by many internal systems such as active suspension control, autonomous steering, and lane change assist. All these systems rely on incremental motion to infer logical conclusions. Measuring incremental changes in pose, in other words changes in motion, using visual information alone is called Visual Odometry. This thesis proposes an approach to the Visual Odometry problem that uses stereo-camera vision to incrementally estimate the pose of a vehicle by examining the changes that motion induces on the background in frames captured from the stereo cameras. The approach uses a selective feature-based motion tracking method to track the motion of the vehicle by analyzing the motion of its static surroundings and discarding the motion induced by the dynamic background (outliers). The proposed approach accounts for moving objects in the surroundings, such as a truck, a car or a pedestrian, each with its own motion relative to the vehicle. Use of a stereo camera adds depth information, which is crucial for detecting and rejecting outliers. Refining the interest point locations using sinusoidal interpolation further increases the accuracy of the motion estimation results. The results show that by choosing features only on the static background and tracking them accurately, robust semantic information can be obtained
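
    A single step of a generic stereo, feature-based visual odometry pipeline might look like the sketch below (OpenCV-based; the thesis's selective feature choice and sinusoidal interpolation refinement are not reproduced). The RANSAC inside the PnP solve is what rejects features on independently moving objects.

    # Sketch: triangulate stereo features at t-1, re-observe at t, solve pose.
    import numpy as np
    import cv2

    def vo_step(prev_left, prev_right, cur_left, K, P_left, P_right):
        """One pose increment from a stereo pair at t-1 and the left
        frame at t. P_left/P_right are the 3x4 projection matrices."""
        orb = cv2.ORB_create(2000)
        bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
        kpl, desl = orb.detectAndCompute(prev_left, None)
        kpr, desr = orb.detectAndCompute(prev_right, None)
        kpc, desc = orb.detectAndCompute(cur_left, None)

        # Stereo matches -> 3D landmarks at time t-1.
        stereo = bf.match(desl, desr)
        pl = np.float32([kpl[m.queryIdx].pt for m in stereo]).T
        pr = np.float32([kpr[m.trainIdx].pt for m in stereo]).T
        X = cv2.triangulatePoints(P_left, P_right, pl, pr)
        X = (X[:3] / X[3]).T

        # Re-observe the same features at time t.
        row = {m.queryIdx: i for i, m in enumerate(stereo)}
        temporal = [m for m in bf.match(desl, desc) if m.queryIdx in row]
        obj = np.float32([X[row[m.queryIdx]] for m in temporal])
        img = np.float32([kpc[m.trainIdx].pt for m in temporal])

        # PnP with RANSAC: features on independently moving objects
        # (trucks, pedestrians) fall out as outliers.
        ok, rvec, tvec, inliers = cv2.solvePnPRansac(obj, img, K, None)
        return rvec, tvec, inliers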

    Omnidirectional Stereo Vision for Autonomous Vehicles

    Get PDF
    Environment perception with cameras is an important requirement for many applications in autonomous vehicles and robots. This work presents a stereoscopic omnidirectional camera system for autonomous vehicles which resolves the problem of a limited field of view and provides a 360° panoramic view of the environment. We present a new projection model for these cameras and show that the camera setup overcomes major drawbacks of traditional perspective cameras in many applications.
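
    For intuition, a minimal spherical (equirectangular) projection model is sketched below. It is a generic stand-in, not the projection model proposed in the paper.

    # Sketch: map a 3D ray to 360-degree equirectangular panorama pixels.
    import numpy as np

    def spherical_project(xyz, pano_w, pano_h):
        """xyz: (N, 3) points in the camera rig frame. Returns (N, 2)
        pixel coordinates in an equirectangular panorama."""
        x, y, z = xyz[:, 0], xyz[:, 1], xyz[:, 2]
        lon = np.arctan2(x, z)                 # azimuth, 0 = straight ahead
        lat = np.arctan2(y, np.hypot(x, z))    # elevation
        u = (lon / (2 * np.pi) + 0.5) * pano_w
        v = (lat / np.pi + 0.5) * pano_h
        return np.stack([u, v], axis=1)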