Mobile machine vision for railway surveillance system using deep learning algorithm
Trains are a popular mode of transportation in daily life. However, railways often lack a proper surveillance system for obstacle detection, which leads to avoidable accidents. To address this issue, machine vision embedded with a deep learning algorithm can be applied. Obstacle detection is achieved through vision-based object detection, where a classification model measures an image's similarity to its respective classes and thereby assesses its potential as an obstacle. In this paper, an object detection model is developed and implemented with a deep learning algorithm. The object classification model is produced by training Deep Neural Networks (DNN). The detection model used in this paper is the Single-Shot multibox Detection (SSD) MobileNet model, which can be deployed on a Raspberry Pi to simulate the object detection algorithm. During simulation, the object recognition algorithm detects and classifies various objects into their respective classes. Following past research approaches, the developed model is able to analyze both still images and real-time video feeds to identify multiple objects. Any object detected within the Region of Interest (ROI) is characterized as an obstacle.
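Although the abstract gives no code, the pipeline it describes reduces to running SSD MobileNet inference on each frame and flagging any detection that overlaps the track ROI as an obstacle. A minimal sketch using OpenCV's dnn module is below; the model file names, ROI coordinates, and confidence threshold are placeholder assumptions, not the paper's actual configuration.

```python
import cv2

# Placeholder file names; any SSD MobileNet frozen graph exported for
# OpenCV's dnn module would be loaded the same way.
net = cv2.dnn.readNetFromTensorflow("frozen_inference_graph.pb",
                                    "ssd_mobilenet.pbtxt")

ROI = (200, 300, 440, 480)  # x1, y1, x2, y2 of the track region (assumed)

def detect_obstacles(frame, conf_thresh=0.5):
    h, w = frame.shape[:2]
    # SSD MobileNet expects a 300x300 RGB input
    blob = cv2.dnn.blobFromImage(frame, size=(300, 300), swapRB=True)
    net.setInput(blob)
    detections = net.forward()  # shape: (1, 1, N, 7)
    obstacles = []
    for det in detections[0, 0]:
        score = float(det[2])
        if score < conf_thresh:
            continue
        # Box coordinates are normalized to [0, 1]
        x1, y1 = det[3] * w, det[4] * h
        x2, y2 = det[5] * w, det[6] * h
        # Characterize the detection as an obstacle if it overlaps the ROI
        if x1 < ROI[2] and x2 > ROI[0] and y1 < ROI[3] and y2 > ROI[1]:
            obstacles.append((int(det[1]), score, (x1, y1, x2, y2)))
    return obstacles
```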
Domain generalization in deep learning-based mass detection in mammography: A large-scale multi-center study
Computer-aided detection systems based on deep learning have shown great
potential in breast cancer detection. However, the lack of domain
generalization of artificial neural networks is an important obstacle to their
deployment in changing clinical environments. In this work, we explore the
domain generalization of deep learning methods for mass detection in digital
mammography and analyze in-depth the sources of domain shift in a large-scale
multi-center setting. To this end, we compare the performance of eight
state-of-the-art detection methods, including Transformer-based models, trained
in a single domain and tested in five unseen domains. Moreover, a single-source
mass detection training pipeline is designed to improve the domain
generalization without requiring images from the new domain. The results show
that our workflow generalizes better than state-of-the-art transfer
learning-based approaches in four out of five domains while reducing the domain
shift caused by the different acquisition protocols and scanner manufacturers.
Subsequently, an extensive analysis is performed to identify the covariate
shifts with the largest effects on detection performance, such as those due
to differences in patient age, breast density, mass size, and mass
malignancy. Ultimately, this comprehensive study provides key insights and
best practices for future research on domain generalization in deep
learning-based breast cancer detection.
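For readers reproducing a similar single-source, multi-domain evaluation, the protocol reduces to training once on the source domain and scoring detections separately on each unseen domain. A minimal sketch follows; the box-recall metric is a simplification of the FROC-style analysis typically used in mammography, and the `domains` mapping is an assumed stand-in for per-domain predictions and annotations.

```python
def iou(a, b):
    # Boxes as (x1, y1, x2, y2) in pixels
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1]) +
             (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / (union + 1e-9)

def dataset_recall(preds_per_image, gts_per_image, iou_thresh=0.5):
    """Fraction of ground-truth masses matched by at least one prediction."""
    hits = total = 0
    for preds, gts in zip(preds_per_image, gts_per_image):
        for g in gts:
            total += 1
            hits += any(iou(g, p) >= iou_thresh for p in preds)
    return hits / max(total, 1)

def evaluate_domains(domains):
    """`domains` maps a domain name to (preds_per_image, gts_per_image),
    with the detector trained once on the single source domain."""
    for name, (preds, gts) in domains.items():
        print(f"{name}: recall@IoU0.5 = {dataset_recall(preds, gts):.3f}")
```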
LIDAR-Camera Fusion for Road Detection Using Fully Convolutional Neural Networks
In this work, a deep learning approach has been developed to carry out road
detection by fusing LIDAR point clouds and camera images. An unstructured and
sparse point cloud is first projected onto the camera image plane and then
upsampled to obtain a set of dense 2D images encoding spatial information.
Several fully convolutional neural networks (FCNs) are then trained to carry
out road detection, either by using data from a single sensor, or by using
three fusion strategies: early, late, and the newly proposed cross fusion.
Whereas in the former two fusion approaches, the integration of multimodal
information is carried out at a predefined depth level, the cross fusion FCN is
designed to directly learn from data where to integrate information; this is
accomplished by using trainable cross connections between the LIDAR and the
camera processing branches.
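As a rough illustration of the cross fusion idea, the PyTorch sketch below runs a camera branch and a LIDAR branch in parallel and mixes them at every depth level through trainable scalar cross connections, so gradient descent decides where integration happens. Channel widths, network depth, and the zero initialization are assumptions for illustration, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class CrossFusionBlock(nn.Module):
    """One depth level: parallel camera/LIDAR convolutions whose inputs
    are mixed through trainable scalar cross connections."""
    def __init__(self, ch_in, ch_out):
        super().__init__()
        self.cam_conv = nn.Sequential(nn.Conv2d(ch_in, ch_out, 3, padding=1),
                                      nn.ReLU())
        self.lid_conv = nn.Sequential(nn.Conv2d(ch_in, ch_out, 3, padding=1),
                                      nn.ReLU())
        # Scalars initialized to zero: each branch starts independent, and
        # training learns how much each stream should feed the other.
        self.lid_to_cam = nn.Parameter(torch.zeros(1))
        self.cam_to_lid = nn.Parameter(torch.zeros(1))

    def forward(self, cam, lid):
        cam_out = self.cam_conv(cam + self.lid_to_cam * lid)
        lid_out = self.lid_conv(lid + self.cam_to_lid * cam)
        return cam_out, lid_out

class CrossFusionFCN(nn.Module):
    def __init__(self, cam_ch=3, lid_ch=3, width=32, depth=4):
        super().__init__()
        self.stem_cam = nn.Conv2d(cam_ch, width, 3, padding=1)
        self.stem_lid = nn.Conv2d(lid_ch, width, 3, padding=1)
        self.blocks = nn.ModuleList(
            CrossFusionBlock(width, width) for _ in range(depth))
        self.head = nn.Conv2d(2 * width, 1, 1)  # per-pixel road logits

    def forward(self, cam_img, lid_img):
        cam, lid = self.stem_cam(cam_img), self.stem_lid(lid_img)
        for block in self.blocks:
            cam, lid = block(cam, lid)
        return self.head(torch.cat([cam, lid], dim=1))
```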
To further highlight the benefits of using a multimodal system for road
detection, a data set consisting of visually challenging scenes was extracted
from driving sequences of the KITTI raw data set. It was then demonstrated
that, as expected, a purely camera-based FCN severely underperforms on this
data set. A multimodal system, on the other hand, is still able to provide high
accuracy. Finally, the proposed cross fusion FCN was evaluated on the KITTI
road benchmark, where it achieved excellent performance with a MaxF score of
96.03%, ranking it among the top-performing approaches.
Detecting the Unexpected via Image Resynthesis
Classical semantic segmentation methods, including the recent deep learning
ones, assume that all classes observed at test time have been seen during
training. In this paper, we tackle the more realistic scenario where unexpected
objects of unknown classes can appear at test time. The main trends in this
area either leverage the notion of prediction uncertainty to flag the regions
with low confidence as unknown, or rely on autoencoders and highlight
poorly-decoded regions. Having observed that, in both cases, the detected
regions typically do not correspond to unexpected objects, we introduce a
drastically different strategy: it relies on the intuition that the network
will produce spurious labels in regions depicting unexpected objects.
Therefore, resynthesizing the image from the resulting semantic map will yield
significant appearance differences with respect to the input image. In other
words, we translate the problem of detecting unknown classes to one of
identifying poorly-resynthesized image regions. We show that this outperforms
both uncertainty- and autoencoder-based methods.
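A skeletal version of the pipeline is sketched below, assuming pretrained `seg_net` (semantic segmentation) and `resynth_net` (semantic-map-to-image generator) modules that this snippet does not provide. Note the paper additionally learns a discrepancy network to compare input and resynthesis; the plain per-pixel difference here is only a crude approximation of that comparison.

```python
import torch
import torch.nn.functional as F

def unexpected_object_map(image, seg_net, resynth_net, window=7):
    """Score each pixel by how poorly it is resynthesized from the
    predicted semantic map; high scores suggest unknown objects.

    image: (B, 3, H, W) tensor; seg_net and resynth_net are assumed
    pretrained modules.
    """
    with torch.no_grad():
        sem = seg_net(image).argmax(dim=1)        # (B, H, W) class labels
        recon = resynth_net(sem)                  # (B, 3, H, W) resynthesis
        diff = (image - recon).abs().mean(dim=1)  # (B, H, W) discrepancy
        # Light smoothing so single-pixel resynthesis noise does not fire
        diff = F.avg_pool2d(diff.unsqueeze(1), window,
                            stride=1, padding=window // 2).squeeze(1)
    return diff
```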
How do neural networks see depth in single images?
Deep neural networks have led to a breakthrough in depth estimation from
single images. Recent work often focuses on the accuracy of the depth map,
where an evaluation on a publicly available test set such as the KITTI vision
benchmark is often the main result of the article. While such an evaluation
shows how well neural networks can estimate depth, it does not show how they do
this. To the best of our knowledge, no work currently exists that analyzes what
these networks have learned.
In this work we take the MonoDepth network by Godard et al. and investigate
what visual cues it exploits for depth estimation. We find that the network
ignores the apparent size of known obstacles in favor of their vertical
position in the image. Using the vertical position requires the camera pose to
be known; however, we find that MonoDepth only partially corrects for changes in
camera pitch and roll and that these influence the estimated depth towards
obstacles. We further show that MonoDepth's use of the vertical image position
allows it to estimate the distance towards arbitrary obstacles, even those not
appearing in the training set, but that it requires a strong edge at the ground
contact point of the object to do so. In future work we will investigate
whether these observations also apply to other neural networks for monocular
depth estimation.
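The "vertical image position" cue the authors identify has a simple geometric reading: for a camera at height H above a flat ground plane, an obstacle whose ground contact point falls on image row y maps to depth Z = f*H / (y - y_horizon). The sketch below works through that relation (textbook pinhole geometry illustrating the cue, not MonoDepth's code) and shows why an uncorrected pitch error, which shifts the horizon row, biases the estimated depth toward obstacles.

```python
def depth_from_vertical_position(y_contact, y_horizon, focal_px, cam_height_m):
    """Depth of an obstacle standing on a flat ground plane, from the image
    row of its ground contact point: Z = focal_px * cam_height_m / (y - y0)."""
    dy = y_contact - y_horizon
    if dy <= 0:
        raise ValueError("ground contact point must lie below the horizon")
    return focal_px * cam_height_m / dy

# KITTI-like numbers (focal ~721 px, camera ~1.65 m above the road):
z = depth_from_vertical_position(y_contact=245, y_horizon=185,
                                 focal_px=721.0, cam_height_m=1.65)
print(f"estimated depth: {z:.1f} m")  # ~19.8 m

# A 1-degree pitch error shifts the horizon by about focal_px * tan(1 deg)
# ~= 12.6 px, turning the estimate into 721*1.65/(60-12.6) ~= 25.1 m, which
# is why an uncorrected pitch change biases the depth toward obstacles.
```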