Deep CNN and MLP-based vision systems for algae detection in automatic inspection of underwater pipelines
Artificial neural networks, such as the multilayer perceptron (MLP), have been increasingly employed in a variety of applications. Recently, deep neural networks, especially convolutional neural networks (CNNs), have received considerable attention due to their ability to extract and represent high-level abstractions in data sets. This work describes a vision inspection system based on deep learning and computer vision algorithms for the detection of algae in underwater pipelines. The proposed algorithm comprises a CNN or an MLP network, followed by a post-processing stage operating in the spatial and temporal domains, employing clustering of neighboring detection positions and a frame buffer of intersecting regions across frames. The performances of MLP classifiers, employing different descriptors, and CNN classifiers are compared in real-world scenarios. It is shown that the post-processing stage considerably decreases the number of false positives, resulting in an accuracy rate of 99.39%.
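As a rough illustration of the kind of post-processing described above (spatial clustering of neighboring detection positions, plus a temporal frame buffer that confirms detections recurring across frames), the following sketch can be considered. The greedy clustering scheme and all thresholds are illustrative assumptions, not the paper's implementation.

```python
from collections import deque

def cluster_detections(points, radius=20):
    """Greedy spatial clustering: merge detections whose centers lie
    within `radius` pixels of a cluster centroid; return one centroid
    per cluster."""
    clusters = []
    for x, y in points:
        for c in clusters:
            cx, cy = c["sum_x"] / c["n"], c["sum_y"] / c["n"]
            if (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2:
                c["sum_x"] += x; c["sum_y"] += y; c["n"] += 1
                break
        else:
            clusters.append({"sum_x": x, "sum_y": y, "n": 1})
    return [(c["sum_x"] / c["n"], c["sum_y"] / c["n"]) for c in clusters]

class TemporalFilter:
    """Confirm a detection only if it intersects detections in at least
    `min_hits` of the last `window` frames (a frame-buffer vote), which
    suppresses one-frame false positives."""
    def __init__(self, window=5, min_hits=3, radius=20):
        self.buffer = deque(maxlen=window)
        self.min_hits = min_hits
        self.radius = radius

    def update(self, centroids):
        self.buffer.append(centroids)
        confirmed = []
        for x, y in centroids:
            hits = sum(
                any((x - px) ** 2 + (y - py) ** 2 <= self.radius ** 2
                    for px, py in frame)
                for frame in self.buffer)
            if hits >= self.min_hits:
                confirmed.append((x, y))
        return confirmed
```

In this sketch, a spurious classifier firing that appears in a single frame never reaches `min_hits` and is discarded, which mirrors how temporal post-processing can cut false positives.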
Optimal Transport for Domain Adaptation
Domain adaptation from one data space (or domain) to another is one of the most challenging tasks of modern data analytics. If the adaptation is done correctly, models built on a specific data space become more robust when confronted with data depicting the same semantic concepts (the classes) but observed by another observation system with its own specificities. Among the many strategies proposed to adapt one domain to another, finding a common representation has shown excellent properties: with a common representation for both domains, a single classifier can be effective in both and can use labeled samples from the source domain to predict the unlabeled samples of the target domain. In this paper, we propose a regularized unsupervised optimal transportation model to perform the alignment of the representations in the source and target domains. We learn a transportation plan matching both probability density functions, which constrains labeled samples in the source domain to remain close during transport. In this way, we simultaneously exploit the limited label information in the source domain and the unlabeled distributions observed in both domains. Experiments on toy examples and on challenging real-world visual adaptation tasks demonstrate the effectiveness of the method, which consistently outperforms state-of-the-art approaches.
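The unregularized entropic baseline of such a transport problem can be solved with Sinkhorn iterations. The sketch below shows that baseline in NumPy under simplifying assumptions (uniform sample weights, squared-Euclidean cost normalized for numerical stability); it omits the paper's class-based regularization term.

```python
import numpy as np

def sinkhorn_transport(Xs, Xt, reg=0.1, n_iter=200):
    """Entropic-regularized optimal transport (Sinkhorn iterations)
    between empirical source/target samples with uniform weights.
    Returns the coupling matrix and the barycentric mapping of the
    source points into the target domain."""
    ns, nt = len(Xs), len(Xt)
    a = np.full(ns, 1.0 / ns)              # uniform source weights
    b = np.full(nt, 1.0 / nt)              # uniform target weights
    # squared-Euclidean cost, normalized so exp(-C/reg) stays stable
    C = ((Xs[:, None, :] - Xt[None, :, :]) ** 2).sum(-1)
    C = C / C.max()
    K = np.exp(-C / reg)
    u = np.ones(ns)
    for _ in range(n_iter):                # alternating scaling updates
        v = b / (K.T @ u)
        u = a / (K @ v)
    G = u[:, None] * K * v[None, :]        # transport plan (rows sum to a)
    Xs_mapped = (G @ Xt) / G.sum(1, keepdims=True)  # barycentric map
    return G, Xs_mapped
```

After transport, a classifier trained on `Xs_mapped` with the source labels can be applied directly to target samples, which is the adaptation strategy the abstract describes.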
Using Simulation and Domain Adaptation to Improve Efficiency of Deep Robotic Grasping
Instrumenting and collecting annotated visual grasping datasets to train modern machine learning algorithms can be extremely time-consuming and expensive. An appealing alternative is to use off-the-shelf simulators to render synthetic data for which ground-truth annotations are generated automatically. Unfortunately, models trained purely on simulated data often fail to generalize to the real world. We study how randomized simulated environments and domain adaptation methods can be extended to train a grasping system to grasp novel objects from raw monocular RGB images. We extensively evaluate our approaches with a total of more than 25,000 physical test grasps, studying a range of simulation conditions and domain adaptation methods, including a novel extension of pixel-level domain adaptation that we term the GraspGAN. We show that, by using synthetic data and domain adaptation, we are able to reduce the number of real-world samples needed to achieve a given level of performance by up to 50 times, using only randomly generated simulated objects. We also show that, by using only unlabeled real-world data and our GraspGAN methodology, we obtain real-world grasping performance without any real-world labels that is similar to that achieved with 939,777 labeled real-world samples.

Comment: 9 pages, 5 figures, 3 tables
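The "randomized simulated environments" mentioned above amount to sampling a fresh simulator configuration per rendered scene so that a model never overfits to one appearance. The sampler below is purely illustrative: every parameter name and range is an assumption for the sake of the sketch, not taken from the paper.

```python
import random

def sample_sim_config(rng=random):
    """Draw one randomized simulator configuration (domain
    randomization). All names and ranges here are hypothetical."""
    return {
        "light_intensity": rng.uniform(0.2, 2.0),    # scene brightness
        "light_azimuth_deg": rng.uniform(0.0, 360.0),
        "camera_height_m": rng.uniform(0.4, 0.7),    # viewpoint jitter
        "table_texture_id": rng.randrange(100),      # random texture pool
        "object_count": rng.randint(3, 10),          # clutter level
        "object_scale": rng.uniform(0.8, 1.2),
    }

# one config per synthetic scene to be rendered
configs = [sample_sim_config() for _ in range(1000)]
```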
Interpreting Deep Visual Representations via Network Dissection
The success of recent deep convolutional neural networks (CNNs) depends on learning hidden representations that can summarize the important factors of variation behind the data. However, CNNs are often criticized as being black boxes that lack interpretability, since they have millions of unexplained model parameters. In this work, we describe Network Dissection, a method that interprets networks by providing labels for the units of their deep visual representations. The proposed method quantifies the interpretability of CNN representations by evaluating the alignment between individual hidden units and a set of visual semantic concepts. By identifying the best alignments, units are given human-interpretable labels across a range of objects, parts, scenes, textures, materials, and colors. The method reveals that deep representations are more transparent and interpretable than expected: we find that representations are significantly more interpretable than they would be under a random equivalently powerful basis. We apply the method to interpret and compare the latent representations of various network architectures trained to solve different supervised and self-supervised training tasks. We then examine factors affecting network interpretability, such as the number of training iterations, regularizations, different initializations, and the network depth and width. Finally, we show that the interpreted units can be used to provide explicit explanations of a prediction given by a CNN for an image. Our results highlight that interpretability is an important property of deep neural networks that provides new insights into their hierarchical structure.

Comment: *B. Zhou and D. Bau contributed equally to this work. 15 pages, 27 figures
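The unit–concept alignment described above can be scored as an intersection-over-union (IoU) between a thresholded activation map and a binary concept segmentation mask. The sketch below is a single-image simplification: in Network Dissection the activation threshold is computed over the whole dataset and each unit is labeled with its best-matching concept, and the quantile value here is an assumption.

```python
import numpy as np

def unit_concept_iou(activation, concept_mask, quantile=0.9):
    """Score one unit against one concept: threshold the unit's
    activation map at a high quantile of its own values, then compute
    IoU with the binary concept segmentation mask."""
    thresh = np.quantile(activation, quantile)
    unit_mask = activation > thresh
    inter = np.logical_and(unit_mask, concept_mask).sum()
    union = np.logical_or(unit_mask, concept_mask).sum()
    return inter / union if union else 0.0
```

A unit would then be labeled with the concept attaining the highest IoU across the concept set, provided the score clears a minimum threshold.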
Automatic Monitoring Cheese Ripeness Using Computer Vision and Artificial Intelligence
Ripening is a very important process that contributes to cheese quality, as its characteristics are determined by the biochemical changes that occur during this period. Therefore, monitoring ripening time is a fundamental task for marketing a quality product in a timely manner. However, it is difficult to accurately determine the degree of cheese ripeness. Although some scientific methods have been proposed in the literature, the conventional methods adopted in dairy industries are typically based on visual and weight control. This study proposes a novel approach aimed at automatically monitoring cheese ripening based on the analysis of cheese images acquired by a photo camera. Both computer vision and machine learning techniques have been used to deal with this task. The study is based on a dataset of 195 images (specifically collected from an Italian dairy industry), which represent Pecorino cheese forms at four degrees of ripeness. All stages consist of 50 images, except the one labeled as 'day 18', which has 45. These images have been handled with image processing techniques and then classified according to the degree of ripening, i.e., 18, 22, 24, and 30 days. A 5-fold cross-validation strategy was used to empirically evaluate the performance of the models. During this phase, each training fold was augmented online. This strategy allowed the use of 624 images for training in each fold, leaving the 39 original images of the held-out fold for testing. Experimental results have demonstrated the validity of the approach, showing good performance for most of the trained models.
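The 5-fold protocol above can be sketched as follows. The ×4 augmentation factor is inferred from the reported counts (195 images → 156 training originals per fold → 624 training images), and the split logic is a generic sketch rather than the authors' code.

```python
import random

def five_fold_splits(n_images=195, k=5, seed=0):
    """Shuffle image indices and split them into k equal folds; each
    fold serves once as the held-out test set (39 originals when
    n_images=195 and k=5)."""
    idx = list(range(n_images))
    random.Random(seed).shuffle(idx)
    fold_size = n_images // k
    folds = [idx[i * fold_size:(i + 1) * fold_size] for i in range(k)]
    splits = []
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        splits.append((train, test))
    return splits

splits = five_fold_splits()
# per fold: 156 training originals; online augmentation (x4, as the
# reported 624 training images suggest) happens during training only
```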