Handling variable shaped & high resolution images for multi-class classification problem
Convolutional Neural Networks (CNNs) are usually trained using a pre-determined
fixed spatial image size. While scale invariance is considered important for visual
representations, CNNs are not scale invariant with respect to the spatial resolution
of the input image, since a change in image dimensions may lead to a non-linear
change in their output. At the same time, there are applications (e.g. in medicine)
where images come in multiple scales and shapes, leaving no room for the common
transformations that deform and shrink images at the cost of important information.
Retaining high-resolution information can also be a heavy burden resource-wise,
with high computational, memory, and time requirements. Accordingly, the focus of
research has shifted from parameter optimization and connection readjustment
towards improved architectural design of the network, as different state-of-the-art
networks such as Xception, ResNeXt, PolyNet and others explore the effect of
different transformations on CNNs' learning capacity.
Instead of modifying the internals of CNNs, the METavlitó project focuses mainly on
the pre-processing stage of the network in order to handle high-resolution images, as
well as the variability in their shape. METavlitó proposes two components: one for
clustering images' resolutions into buckets, and a training component for
scale-invariant learning that employs an input-agnostic architecture to decrease
average GPU memory requirements. Compared with a classic approach that applies the
common pre-processing transformations (resizing & cropping) before training, our
solution, using the same architecture, controls overfitting better, increases
accuracy by 3 to 5%, and decreases average GPU memory needs by approximately 43%,
and with it the total duration of training and validation.
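As an illustration of the first component, the bucketing step can be sketched as grouping images by aspect ratio so that each batch shares one target shape instead of forcing a single global resize. The bucket edges below are hypothetical, not the paper's actual values:

```python
from collections import defaultdict

def bucket_by_aspect(sizes, bucket_edges=(0.75, 1.0, 1.33)):
    """Group (width, height) pairs into aspect-ratio buckets."""
    buckets = defaultdict(list)
    for w, h in sizes:
        ratio = w / h
        # bucket index = number of edges the ratio exceeds
        idx = sum(ratio > e for e in bucket_edges)
        buckets[idx].append((w, h))
    return dict(buckets)

sizes = [(512, 512), (640, 480), (480, 640), (1024, 768)]
buckets = bucket_by_aspect(sizes)  # portrait, square and landscape groups
```

Images within one bucket can then be resized to a shared per-bucket shape, so no image is forced toward a global average aspect ratio.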
BLADE: Filter Learning for General Purpose Computational Photography
The Rapid and Accurate Image Super Resolution (RAISR) method of Romano,
Isidoro, and Milanfar is a computationally efficient image upscaling method
using a trained set of filters. We describe a generalization of RAISR, which we
name Best Linear Adaptive Enhancement (BLADE). This approach is a trainable
edge-adaptive filtering framework that is general, simple, computationally
efficient, and useful for a wide range of problems in computational
photography. We show applications to operations which may appear in a camera
pipeline including denoising, demosaicing, and stylization
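The filter-selection idea behind RAISR/BLADE can be sketched as follows: each output pixel's local gradient orientation indexes into a bank of linear filters, and the selected filter is applied to the surrounding patch. In BLADE proper the bank is learned from data; the single identity filter in the usage line below is only a placeholder:

```python
import numpy as np

def blade_apply(img, bank, n_orient=4):
    """Filter each pixel with the bank entry chosen by its local gradient angle."""
    H, W = img.shape
    out = np.zeros((H, W))
    gy, gx = np.gradient(img.astype(float))
    angle = np.mod(np.arctan2(gy, gx), np.pi)             # orientation in [0, pi)
    key = np.minimum((angle / np.pi * n_orient).astype(int), n_orient - 1)
    pad = np.pad(img.astype(float), 1, mode="edge")
    for y in range(H):
        for x in range(W):
            patch = pad[y:y + 3, x:x + 3]
            out[y, x] = np.sum(patch * bank[key[y, x] % len(bank)])
    return out

# Trivial one-filter bank (identity): output equals input.
identity = np.zeros((3, 3)); identity[1, 1] = 1.0
img = np.arange(25, dtype=float).reshape(5, 5)
out = blade_apply(img, [identity])
```

With a learned bank, each orientation bucket would hold a differently oriented edge-adaptive filter instead of the identity.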
Mobile Wound Assessment and 3D Modeling from a Single Image
The prevalence of camera-enabled mobile phones has made mobile wound assessment a viable treatment option for millions of previously difficult-to-reach patients. We have designed a complete mobile wound assessment platform to ameliorate the many challenges related to chronic wound care. Chronic wounds and infections are the most severe, costly and fatal types of wounds, placing them at the center of mobile wound assessment. Wound physicians assess thousands of single-view wound images from all over the world, and it may be difficult to determine the location of the wound on the body, for example, if the image is taken at close range. In our solution, end-users capture an image of the wound by taking a picture with their mobile camera. The wound image is segmented and classified using modern convolutional neural networks, and is stored securely in the cloud for remote tracking. We use an interactive semi-automated approach to allow users to specify the location of the wound on the body. To accomplish this we have created, to the best of our knowledge, the first 3D human surface anatomy labeling system, based on the current NYU and Anatomy Mapper labeling systems. To interactively view wounds in 3D, we have presented an efficient projective texture mapping algorithm for texturing wounds onto a 3D human anatomy model. In so doing, we have demonstrated an approach to 3D wound reconstruction that works even for a single wound image.
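The projective texture mapping step can be sketched as treating the wound photo as a projector: each 3D vertex of the body model is assigned the texture coordinate obtained by pushing it through the camera's projection matrix. The 3x4 matrix used below is a hypothetical identity camera, not the paper's calibrated one:

```python
import numpy as np

def project_uv(vertices, P, tex_w, tex_h):
    """Project (N, 3) model vertices through 3x4 matrix P to (N, 2) texture pixels."""
    homo = np.hstack([vertices, np.ones((len(vertices), 1))])
    proj = homo @ P.T                   # homogeneous image coordinates
    uv = proj[:, :2] / proj[:, 2:3]     # perspective divide
    return np.clip(uv, [0, 0], [tex_w - 1, tex_h - 1])

# Hypothetical identity camera: a vertex at depth z lands at (x/z, y/z).
P = np.array([[1.0, 0, 0, 0], [0, 1.0, 0, 0], [0, 0, 1.0, 0]])
uv = project_uv(np.array([[1.0, 2.0, 2.0]]), P, 640, 480)
```

A renderer would interpolate these per-vertex coordinates across triangles, with visibility checks so the wound texture is not painted onto occluded surfaces.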
Contextual cropping and scaling of TV productions
This is the author's accepted manuscript. The final publication is available at Springer via http://dx.doi.org/10.1007/s11042-011-0804-3. Copyright © Springer Science+Business Media, LLC 2011. In this paper, an application is presented which automatically adapts SDTV (Standard Definition Television) sports productions to smaller displays through intelligent cropping and scaling. It crops regions of interest of sports productions based on a smart combination of production metadata and systematic video analysis methods. This approach allows a context-based composition of cropped images. It provides a differentiation between the original SD version of the production and the processed one adapted to the requirements of mobile TV. The system has been comprehensively evaluated by comparing the outcome of the proposed method with manually and statically cropped versions, as well as with non-cropped versions. Integration of the tool into post-production and live workflows is envisaged.
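The crop-then-scale step can be sketched as choosing the largest window with the target display's aspect ratio that fits inside the SD frame and is centred on the detected region of interest. The QCIF-style 11:9 aspect ratio in the usage line is an assumed example, not a value from the paper:

```python
def crop_for_display(frame_w, frame_h, roi_cx, roi_cy, target_aspect):
    """Largest crop with the target aspect ratio, centred on the ROI, inside the frame."""
    crop_h = frame_h
    crop_w = round(crop_h * target_aspect)
    if crop_w > frame_w:                      # too wide: constrain by width instead
        crop_w = frame_w
        crop_h = round(crop_w / target_aspect)
    # clamp the ROI-centred window so it stays inside the frame
    x = min(max(roi_cx - crop_w // 2, 0), frame_w - crop_w)
    y = min(max(roi_cy - crop_h // 2, 0), frame_h - crop_h)
    return x, y, crop_w, crop_h

# 720x576 SD frame, ROI near the left edge, QCIF-like 11:9 target aspect.
rect = crop_for_display(720, 576, 100, 288, 11 / 9)
```

The returned rectangle would then be scaled down to the mobile display resolution; the paper's contribution is in choosing the ROI from production metadata and video analysis, which this sketch takes as given.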
MooseNet: A Trainable Metric for Synthesized Speech with a PLDA Module
We present MooseNet, a trainable speech metric that predicts the listeners'
Mean Opinion Score (MOS). We propose a novel approach where the Probabilistic
Linear Discriminant Analysis (PLDA) generative model is used on top of an
embedding obtained from a self-supervised learning (SSL) neural network (NN)
model. We show that PLDA works well with a non-finetuned SSL model when trained
only on 136 utterances (ca. one minute training time) and that PLDA
consistently improves various neural MOS prediction models, even
state-of-the-art models with task-specific fine-tuning. Our ablation study
shows PLDA training superiority over SSL model fine-tuning in a low-resource
scenario. We also improve SSL model fine-tuning using a convenient optimizer
choice and additional contrastive and multi-task training objectives. The
fine-tuned MooseNet NN with the PLDA module achieves the best results,
surpassing the SSL baseline on the VoiceMOS Challenge data. Comment: Accepted to SSW 12: https://openreview.net/forum?id=V6RZk6RzS
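PLDA scoring itself is involved; as a deliberately simplified stand-in for the idea of a lightweight closed-form model fitted on top of frozen SSL embeddings, the sketch below fits a ridge-regression head mapping pooled utterance embeddings to MOS labels. This is not the paper's PLDA module, and the data in the test is synthetic:

```python
import numpy as np

def fit_mos_head(embeddings, mos, lam=1e-2):
    """Closed-form ridge regression from utterance embeddings to MOS labels."""
    X = np.hstack([embeddings, np.ones((len(embeddings), 1))])  # bias column
    w = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ mos)
    return w

def predict_mos(embeddings, w):
    X = np.hstack([embeddings, np.ones((len(embeddings), 1))])
    return X @ w
```

Like PLDA on a frozen SSL model, such a head trains in well under a minute on a hundred-odd utterances, since only a small linear system is solved.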
Robust Detection of Non-overlapping Ellipses from Points with Applications to Circular Target Extraction in Images and Cylinder Detection in Point Clouds
This manuscript provides a collection of new methods for the automated
detection of non-overlapping ellipses from edge points. The methods introduce
new developments in: (i) robust Monte Carlo-based ellipse fitting to
2-dimensional (2D) points in the presence of outliers; (ii) detection of
non-overlapping ellipses from 2D edge points; and (iii) extraction of cylinders
from 3D point clouds. The proposed methods were thoroughly compared with
established state-of-the-art methods, using simulated and real-world datasets,
through the design of four sets of original experiments. It was found that the
proposed robust ellipse detection was superior to four reliable robust methods,
including the popular least median of squares, in both simulated and real-world
datasets. The proposed process for detecting non-overlapping ellipses achieved
an F-measure of 99.3% on real images, compared with F-measures of 42.4%, 65.6%, and
59.2%, obtained using the methods of Fornaciari, Patraucean, and Panagiotakis,
respectively. The proposed cylinder extraction method identified all detectable
mechanical pipes in two real-world point clouds, obtained under laboratory and
industrial construction site conditions. The results of this investigation show
promise for the application of the proposed methods for automatic extraction of
circular targets from images and pipes from point clouds.
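The robust Monte Carlo fitting idea can be sketched as a RANSAC-style loop: repeatedly fit an algebraic conic to a random minimal sample of edge points and keep the model with the most inliers, so that a fraction of outlier points cannot corrupt the fit. The residual threshold below is an assumed tuning constant, not a value from the paper:

```python
import numpy as np

def conic_design(pts):
    """Design matrix rows [x^2, xy, y^2, x, y, 1] for the algebraic conic equation."""
    x, y = pts[:, 0], pts[:, 1]
    return np.column_stack([x * x, x * y, y * y, x, y, np.ones_like(x)])

def ransac_conic(pts, iters=200, thresh=0.05, seed=0):
    rng = np.random.default_rng(seed)
    best, best_inliers = None, -1
    for _ in range(iters):
        sample = pts[rng.choice(len(pts), 6, replace=False)]
        # null space of the 6x6 design matrix gives the conic coefficients
        _, _, vt = np.linalg.svd(conic_design(sample))
        coef = vt[-1]
        resid = np.abs(conic_design(pts) @ coef) / np.linalg.norm(coef)
        inliers = np.sum(resid < thresh)
        if inliers > best_inliers:
            best, best_inliers = coef, inliers
    return best, best_inliers
```

A full pipeline would add the geometric (rather than algebraic) residual, a constraint that the conic is an ellipse, and a final least-squares refit on the inlier set.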
Visual Odometry Estimation Using Selective Features
The rapid growth in computational power and technology has enabled the automotive industry to do extensive research into autonomous vehicles. So-called self-driving cars are seen everywhere, being developed by many companies such as Google, Mercedes-Benz, Delphi, Tesla, and Uber. One of the challenging tasks for these vehicles is to track incremental motion at runtime and to analyze the surroundings for accurate localization. This crucial information is used by many internal systems such as active suspension control, autonomous steering, and lane change assist. All these systems rely on incremental motion to infer logical conclusions. Measurement of incremental change in pose or perspective, in other words change in motion, measured using visual information only, is called Visual Odometry. This thesis proposes an approach to solve the Visual Odometry problem by using stereo-camera vision to incrementally estimate the pose of a vehicle by examining the changes that motion induces on the background in the frames captured from the stereo cameras.
The approach in this thesis uses a selective feature-based motion tracking method to track the motion of the vehicle by analyzing the motion of its static surroundings and discarding the motion induced by the dynamic background (outliers). The proposed approach considers that the surroundings may contain moving objects such as a truck, a car, or a pedestrian, whose motion may differ from that of the vehicle. Use of a stereo camera adds depth information, which is crucial for detecting and rejecting outliers. Refining the interest point locations using sinusoidal interpolation further increases the accuracy of the motion estimation results. The results show that by choosing features only on the static background and by tracking these features accurately, robust semantic information can be obtained.
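The outlier-rejection step can be sketched with a closed-form Kabsch fit between matched 3D feature points (e.g. triangulated from the stereo pair) in consecutive frames: fit once on all matches, discard high-residual matches as stand-ins for features on independently moving objects, and refit on the static background. The median-based threshold is an assumed heuristic, not the thesis's exact criterion:

```python
import numpy as np

def estimate_motion(src, dst):
    """Fit dst ~= R @ src + t, reject high-residual matches, then refit."""
    def kabsch(a, b):
        ac, bc = a.mean(axis=0), b.mean(axis=0)
        H = (a - ac).T @ (b - bc)                 # cross-covariance of centred sets
        U, _, Vt = np.linalg.svd(H)
        d = np.sign(np.linalg.det(Vt.T @ U.T))    # guard against reflections
        R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
        return R, bc - R @ ac
    R, t = kabsch(src, dst)                       # initial fit on all matches
    resid = np.linalg.norm(src @ R.T + t - dst, axis=1)
    static = resid < 3 * np.median(resid)         # drop moving-object matches
    return kabsch(src[static], dst[static])
```

In a real pipeline this would sit inside a RANSAC loop with many refinement iterations; one reject-and-refit pass is enough to show the idea.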
Omnidirectional Stereo Vision for Autonomous Vehicles
Environment perception with cameras is an important requirement for many applications in autonomous vehicles and robotics. This work presents a stereoscopic omnidirectional camera system for autonomous vehicles which resolves the problem of a limited field of view and provides a 360° panoramic view of the environment. We present a new projection model for these cameras and show that the camera setup overcomes major drawbacks of traditional perspective cameras in many applications.
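As a sketch of what replacing the pinhole model with a full-sphere mapping looks like, the function below maps a 3D viewing direction to a pixel of an equirectangular 360° panorama; the paper's actual projection model differs:

```python
import math

def direction_to_equirect(dx, dy, dz, pano_w, pano_h):
    """Map a 3D viewing direction to equirectangular panorama pixel coordinates."""
    norm = math.sqrt(dx * dx + dy * dy + dz * dz)
    lon = math.atan2(dx, dz)                  # azimuth, 0 = straight ahead (+z)
    lat = math.asin(dy / norm)                # elevation in [-pi/2, pi/2]
    u = (lon / math.pi + 1.0) * 0.5 * pano_w  # full 360 deg span across the width
    v = (0.5 - lat / math.pi) * pano_h        # 180 deg vertical span down the height
    return u, v
```

Unlike a perspective projection, every direction on the sphere maps to a valid pixel, which is what removes the limited-field-of-view problem.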