
    Deep CNN and MLP-based vision systems for algae detection in automatic inspection of underwater pipelines

    Artificial neural networks, such as the multilayer perceptron (MLP), have been increasingly employed in various applications. Recently, deep neural networks, especially convolutional neural networks (CNNs), have received considerable attention due to their ability to extract and represent high-level abstractions in data sets. This work describes a vision inspection system based on deep learning and computer vision algorithms for the detection of algae in underwater pipelines. The proposed algorithm comprises a CNN or an MLP network, followed by a post-processing stage operating in the spatial and temporal domains, which employs clustering of neighboring detection positions and a region-intersection frame buffer. The performances of MLP classifiers, employing different descriptors, and of CNN classifiers are compared in real-world scenarios. It is shown that the post-processing stage considerably decreases the number of false positives, resulting in an accuracy rate of 99.39%.
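
    The paper's pipeline (CNN/MLP classification followed by spatial clustering and a temporal region-intersection buffer) is not published as code; below is a minimal Python sketch of the post-processing stage only, assuming the classifier outputs a boolean grid of per-window detections per frame. The 8-connectivity, cluster-size threshold, and 3-frame persistence window are illustrative assumptions.

    ```python
    import numpy as np
    from scipy.ndimage import label

    def cluster_detections(det_mask, min_cluster_size=3):
        """Group 8-connected detection cells; drop clusters below a size threshold."""
        labels, n = label(det_mask, structure=np.ones((3, 3)))
        keep = np.zeros_like(det_mask, dtype=bool)
        for i in range(1, n + 1):
            cluster = labels == i
            if cluster.sum() >= min_cluster_size:
                keep |= cluster
        return keep

    class TemporalBuffer:
        """Report a region only if it persists (intersects) across k frames."""
        def __init__(self, k=3):
            self.k = k
            self.history = []

        def update(self, det_mask):
            self.history.append(cluster_detections(det_mask))
            self.history = self.history[-self.k:]
            if len(self.history) < self.k:
                return np.zeros_like(det_mask, dtype=bool)
            out = self.history[0].copy()
            for m in self.history[1:]:
                out &= m
            return out
    ```

    Both filters only ever discard detections, which is how a post-processing stage of this kind can lower the false-positive count without touching the classifiers themselves.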

    Optimal Transport for Domain Adaptation

    Domain adaptation from one data space (or domain) to another is one of the most challenging tasks of modern data analytics. If the adaptation is done correctly, models built on a specific data space become more robust when confronted with data depicting the same semantic concepts (the classes) but observed by another observation system with its own specificities. Among the many strategies proposed to adapt one domain to another, finding a common representation has shown excellent properties: once both domains share a common representation, a single classifier can be effective in both, using labeled samples from the source domain to predict the unlabeled samples of the target domain. In this paper, we propose a regularized unsupervised optimal transportation model to perform the alignment of the representations in the source and target domains. We learn a transportation plan matching both probability density functions, which constrains labeled samples of the same class in the source domain to remain close during transport. This way, we exploit at the same time the scarce labeled information in the source domain and the unlabeled distributions observed in both domains. Experiments on toy and challenging real visual adaptation examples show the merit of the method, which consistently outperforms state-of-the-art approaches.
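
    The method is built on entropic-regularized optimal transport. As a minimal, self-contained Python sketch (assuming squared-Euclidean costs and uniform sample weights), the plan can be computed with Sinkhorn iterations and the source samples mapped onto the target domain by barycentric projection; the paper's additional class-based regularizer, which discourages mass from different source classes landing on the same target point, is omitted here for brevity.

    ```python
    import numpy as np

    def sinkhorn(a, b, M, reg, n_iter=200):
        """Entropic-regularized OT plan between histograms a, b with cost matrix M."""
        K = np.exp(-M / reg)
        u = np.ones_like(a)
        for _ in range(n_iter):
            v = b / (K.T @ u)
            u = a / (K @ v)
        return u[:, None] * K * v[None, :]

    # Toy domains: labelled source samples and a shifted, unlabelled target.
    rng = np.random.default_rng(0)
    Xs = rng.normal(size=(100, 2))          # source (labelled)
    Xt = rng.normal(size=(80, 2)) + 2.0     # target (unlabelled)

    M = ((Xs[:, None, :] - Xt[None, :, :]) ** 2).sum(-1)
    M /= M.max()                            # normalize cost for numerical stability
    G = sinkhorn(np.full(100, 1 / 100), np.full(80, 1 / 80), M, reg=1e-2)

    # Barycentric mapping: each source point moves to its mass-weighted target
    # average; a classifier trained on (Xs_adapted, source labels) then
    # operates directly in the target domain.
    Xs_adapted = (G / G.sum(axis=1, keepdims=True)) @ Xt
    ```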

    Using Simulation and Domain Adaptation to Improve Efficiency of Deep Robotic Grasping

    Instrumenting and collecting annotated visual grasping datasets to train modern machine learning algorithms can be extremely time-consuming and expensive. An appealing alternative is to use off-the-shelf simulators to render synthetic data for which ground-truth annotations are generated automatically. Unfortunately, models trained purely on simulated data often fail to generalize to the real world. We study how randomized simulated environments and domain adaptation methods can be extended to train a grasping system to grasp novel objects from raw monocular RGB images. We extensively evaluate our approaches with a total of more than 25,000 physical test grasps, studying a range of simulation conditions and domain adaptation methods, including a novel extension of pixel-level domain adaptation that we term the GraspGAN. We show that, by using synthetic data and domain adaptation, we are able to reduce the number of real-world samples needed to achieve a given level of performance by up to 50 times, using only randomly generated simulated objects. We also show that, by using only unlabeled real-world data and our GraspGAN methodology, we obtain real-world grasping performance without any real-world labels that is similar to that achieved with 939,777 labeled real-world samples.
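
    GraspGAN itself is a substantial model; the sketch below only illustrates the pixel-level domain adaptation pattern it extends, under assumed toy architectures and 64×64 inputs (none of this reflects the paper's actual networks). A generator refines simulated images toward the real-image distribution, a discriminator tries to tell real from adapted images, and the grasp-prediction task loss is computed on the adapted images so that simulation labels remain usable.

    ```python
    import torch
    import torch.nn as nn

    G = nn.Sequential(  # simulated image -> adapted ("realistic") image
        nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
        nn.Conv2d(32, 3, 3, padding=1), nn.Tanh())
    D = nn.Sequential(  # real vs adapted discriminator (assumes 64x64 inputs)
        nn.Conv2d(3, 32, 3, stride=2), nn.ReLU(),
        nn.Flatten(), nn.Linear(32 * 31 * 31, 1))
    task = nn.Sequential(  # grasp-success head trained on adapted images
        nn.Conv2d(3, 32, 3, stride=2), nn.ReLU(),
        nn.Flatten(), nn.Linear(32 * 31 * 31, 1))

    bce = nn.BCEWithLogitsLoss()
    opt_g = torch.optim.Adam(list(G.parameters()) + list(task.parameters()), 1e-4)
    opt_d = torch.optim.Adam(D.parameters(), 1e-4)

    def train_step(sim_img, sim_label, real_img):
        adapted = G(sim_img)
        # Discriminator: push real images toward 1, adapted images toward 0.
        d_loss = (bce(D(real_img), torch.ones(len(real_img), 1)) +
                  bce(D(adapted.detach()), torch.zeros(len(sim_img), 1)))
        opt_d.zero_grad(); d_loss.backward(); opt_d.step()
        # Generator: fool the discriminator while keeping grasp labels predictable.
        g_loss = (bce(D(adapted), torch.ones(len(sim_img), 1)) +
                  bce(task(adapted), sim_label))
        opt_g.zero_grad(); g_loss.backward(); opt_g.step()

    # Example step with random stand-in batches:
    train_step(torch.rand(8, 3, 64, 64) * 2 - 1,
               torch.randint(0, 2, (8, 1)).float(),
               torch.rand(8, 3, 64, 64) * 2 - 1)
    ```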

    Interpreting Deep Visual Representations via Network Dissection

    The success of recent deep convolutional neural networks (CNNs) depends on learning hidden representations that can summarize the important factors of variation behind the data. However, CNNs are often criticized as being black boxes that lack interpretability, since they have millions of unexplained model parameters. In this work, we describe Network Dissection, a method that interprets networks by providing labels for the units of their deep visual representations. The proposed method quantifies the interpretability of CNN representations by evaluating the alignment between individual hidden units and a set of visual semantic concepts. By identifying the best alignments, units are given human-interpretable labels across a range of objects, parts, scenes, textures, materials, and colors. The method reveals that deep representations are more transparent and interpretable than expected: we find that representations are significantly more interpretable than they would be under a random equivalently powerful basis. We apply the method to interpret and compare the latent representations of various network architectures trained to solve different supervised and self-supervised training tasks. We then examine factors affecting network interpretability, such as the number of training iterations, regularization, different initializations, and the network depth and width. Finally, we show that the interpreted units can be used to provide explicit explanations of a prediction given by a CNN for an image. Our results highlight that interpretability is an important property of deep neural networks that provides new insights into their hierarchical structure.
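
    The alignment score at the heart of Network Dissection is an intersection-over-union between a unit's thresholded activation maps and a concept's segmentation masks. A toy Python version of that score might look as follows (the 0.995 activation quantile matches the paper's top-0.5% threshold; the nearest-neighbour upsampling and the assumption that mask resolution is an integer multiple of the activation resolution are simplifications).

    ```python
    import numpy as np

    def unit_concept_iou(act_maps, concept_masks, quantile=0.995):
        """IoU between one unit and one concept over a dataset.

        act_maps:      (n_images, h, w) activations of a single unit
        concept_masks: (n_images, H, W) boolean concept segmentations
        """
        t = np.quantile(act_maps, quantile)   # per-unit threshold over all images
        n, H, W = concept_masks.shape
        inter = union = 0
        for a, m in zip(act_maps, concept_masks):
            # Binarize, then nearest-neighbour upsample to the mask resolution.
            up = np.kron(a > t, np.ones((H // a.shape[0], W // a.shape[1]))) > 0
            inter += np.logical_and(up, m).sum()
            union += np.logical_or(up, m).sum()
        return inter / max(union, 1)
    ```

    A unit is then labeled with whichever concept maximizes this score, and a layer's interpretability can be summarized by how many of its units align well with some concept.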

    Automatic Monitoring Cheese Ripeness Using Computer Vision and Artificial Intelligence

    Ripening is a very important process that contributes to cheese quality, as its characteristics are determined by the biochemical changes that occur during this period. Monitoring ripening time is therefore a fundamental task for marketing a quality product in a timely manner. However, it is difficult to accurately determine the degree of cheese ripeness. Although some scientific methods have been proposed in the literature, the conventional methods adopted in dairy industries are typically based on visual and weight control. This study proposes a novel approach aimed at automatically monitoring cheese ripening based on the analysis of cheese images acquired by a photo camera. Both computer vision and machine learning techniques have been used for this task. The study is based on a dataset of 195 images (specifically collected from an Italian dairy industry) that represent Pecorino cheese forms at four degrees of ripeness. All stages consist of 50 images, except the one labeled 'day 18', which has 45. These images have been processed with image-processing techniques and then classified according to the degree of ripening, i.e., 18, 22, 24, and 30 days. A 5-fold cross-validation strategy was used to empirically evaluate the performance of the models, with each training fold augmented online. This strategy allowed 624 images to be used for training, leaving 39 original images per fold for testing. Experimental results have demonstrated the validity of the approach, showing good performance for most of the trained models.
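
    The evaluation protocol described above (195 images, 5 folds, fourfold expansion of each training split) can be reproduced in outline as follows. The specific augmentation operations are assumptions, and for simplicity the augmentation here is precomputed rather than applied online during training as in the paper.

    ```python
    import numpy as np
    from sklearn.model_selection import StratifiedKFold

    def augment(img):
        """Original plus three flipped/rotated variants (assumed ops)."""
        return [img, np.fliplr(img), np.flipud(img), np.rot90(img)]

    X = np.random.rand(195, 128, 128, 3)            # stand-in for the cheese images
    y = np.repeat([0, 1, 2, 3], [45, 50, 50, 50])   # ripeness: 18, 22, 24, 30 days

    skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    for train_idx, test_idx in skf.split(X, y):
        # 156 training originals expanded 4x -> 624 images, as in the paper.
        X_train = np.stack([v for i in train_idx for v in augment(X[i])])
        y_train = np.repeat(y[train_idx], 4)
        X_test, y_test = X[test_idx], y[test_idx]   # 39 untouched originals
        # model.fit(X_train, y_train); model.evaluate(X_test, y_test)
    ```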