174 research outputs found
Iteratively Optimized Patch Label Inference Network for Automatic Pavement Disease Detection
We present a novel deep learning framework named the Iteratively Optimized
Patch Label Inference Network (IOPLIN) for automatically detecting various
pavement diseases that are not solely limited to specific ones, such as cracks
and potholes. IOPLIN can be iteratively trained with only the image label via
the Expectation-Maximization Inspired Patch Label Distillation (EMIPLD)
strategy, and accomplish this task well by inferring the labels of patches from
the pavement images. IOPLIN enjoys many desirable properties over the
state-of-the-art single branch CNN models such as GoogLeNet and EfficientNet.
It is able to handle images in different resolutions, and sufficiently utilize
image information particularly for the high-resolution ones, since IOPLIN
extracts the visual features from unrevised image patches instead of the
resized entire image. Moreover, it can roughly localize the pavement distress
without using any prior localization information in the training phase. In
order to better evaluate the effectiveness of our method in practice, we
construct a large-scale Bituminous Pavement Disease Detection dataset named
CQU-BPDD consisting of 60,059 high-resolution pavement images, which are
acquired from different areas at different times. Extensive results on this
dataset demonstrate the superiority of IOPLIN over the state-of-the-art image
classification approaches in automatic pavement disease detection. The source
code of IOPLIN is released at https://github.com/DearCaat/ioplin.
Comment: Revision on IEEE Trans on IT
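The patch-based idea in the abstract above (score patches cut at native resolution rather than a resized whole image, then derive an image-level decision and a rough location) can be illustrated with a minimal sketch. This is not the IOPLIN implementation: the function names are hypothetical, the image is a plain list of rows, and taking the maximum patch score as the image score is an assumed aggregation rule, not the EMIPLD strategy.

```python
from typing import List, Sequence


def split_into_patches(image: Sequence[Sequence[float]],
                       patch_h: int, patch_w: int) -> List[List[List[float]]]:
    """Cut a 2D grid into non-overlapping patches at native resolution.

    Edge remainders that do not fill a whole patch are dropped (an assumption
    made to keep the sketch short).
    """
    rows = len(image)
    cols = len(image[0]) if rows else 0
    patches = []
    for top in range(0, rows - patch_h + 1, patch_h):
        for left in range(0, cols - patch_w + 1, patch_w):
            patches.append([list(row[left:left + patch_w])
                            for row in image[top:top + patch_h]])
    return patches


def image_score_from_patches(patch_scores: Sequence[float]) -> float:
    """Assumed aggregation: an image is as 'diseased' as its worst patch."""
    return max(patch_scores)


def most_suspicious_patch(patch_scores: Sequence[float]) -> int:
    """Index of the highest-scoring patch, used as a rough localization."""
    return max(range(len(patch_scores)), key=lambda i: patch_scores[i])


# Hypothetical 4x6 "image" and made-up per-patch disease scores.
image = [[r * 6 + c for c in range(6)] for r in range(4)]
patches = split_into_patches(image, patch_h=2, patch_w=3)
scores = [0.1, 0.7, 0.2, 0.05]  # stand-in for a patch classifier's outputs
```

Here the scores are placeholders for the output of a trained patch classifier; in a real system they would come from a CNN applied to each patch.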
Deep Domain Adaptation for Pavement Crack Detection
Deep learning-based pavement crack detection methods often require
large-scale labels with detailed crack location information to learn accurate
predictions. In practice, however, crack locations are very difficult to
annotate manually because of the varied visual patterns of pavement cracks. In this
paper, we propose a Deep Domain Adaptation-based Crack Detection Network
(DDACDN), which learns to take advantage of the source domain knowledge to
predict the multi-category crack location information in the target domain,
where only image-level labels are available. Specifically, DDACDN first
extracts crack features from both the source and target domains with a two-branch
weight-sharing backbone network. To achieve cross-domain
adaptation, an intermediate domain is constructed by aggregating the
three-scale features from the feature space of each domain to adapt the crack
features from the source domain to the target domain. Finally, the network
involves the knowledge of both domains and is trained to recognize and localize
pavement cracks. To facilitate accurate training and validation for domain
adaptation, we use two challenging pavement crack datasets, CQU-BPDD and
RDD2020. Furthermore, we construct a new large-scale Bituminous Pavement
Multi-label Disease Dataset named CQU-BPMDD, which contains 38,994
high-resolution pavement disease images to further evaluate the robustness of
our model. Extensive experiments demonstrate that DDACDN outperforms
state-of-the-art pavement crack detection methods in predicting the crack
location on the target domain.
Comment: 12 pages, 10 figures
Deep convolutional generative adversarial network-based synthesis of datasets for road pavement distress segmentation
This work examines the set of tasks involved in detecting various road pavement defects and the modern methods for solving them. The presented comparison of publicly available datasets leads to the conclusion that segmenting pavement defects from general-view road images is a difficult and little-studied task. To address this problem, algorithms were developed for generating a synthetic dataset for segmenting crack and pothole defects, based on computer graphics methods and generative adversarial networks. The accuracy of pavement defect segmentation by the fully convolutional neural network U-Net is compared on the real dataset and on combined datasets.
Coping with Data Scarcity in Deep Learning and Applications for Social Good
Recent years have seen an extremely fast evolution of the Computer Vision and
Machine Learning fields: several application domains benefit from the newly developed
technologies and industries are investing a growing amount of money in Artificial Intelligence.
Convolutional Neural Networks and Deep Learning substantially contributed to the rise and
the diffusion of AI-based solutions, creating the potential for many disruptive new businesses.
The effectiveness of Deep Learning models is grounded in the availability of a huge
amount of training data. Unfortunately, data collection and labeling is an extremely expensive
task in terms of both time and costs; moreover, it frequently requires the collaboration of
domain experts.
In the first part of the thesis, I will investigate some methods for reducing the cost
of data acquisition for Deep Learning applications in the relatively constrained industrial
scenarios related to visual inspection. I will primarily assess the effectiveness of Deep Neural
Networks in comparison with several classical Machine Learning algorithms requiring a
smaller amount of training data. Next, I will introduce a hardware-based data
augmentation approach, which leads to a considerable performance boost taking advantage of
a novel illumination setup designed for this purpose. Finally, I will investigate the situation in
which acquiring a sufficient number of training samples is not possible, in particular the most
extreme situation: zero-shot learning (ZSL), which is the problem of multi-class classification
when no training data is available for some of the classes. Visual features designed for image
classification and trained offline have been shown to be useful for ZSL to generalize towards
classes not seen during training. Nevertheless, I will show that recognition performance
on unseen classes can be sharply improved by learning ad hoc semantic embedding (the
pre-defined list of present and absent attributes that represent a class) and visual features, to
increase the correlation between the two geometrical spaces and ease the metric learning
process for ZSL.
In the second part of the thesis, I will present some successful applications of state-of-the-art
Computer Vision, Data Analysis and Artificial Intelligence methods. I will illustrate
some solutions developed during the 2020 Coronavirus Pandemic for controlling the disease
evolution and for reducing virus spreading. I will describe the first publicly available
dataset for the analysis of face-touching behavior that we annotated and distributed, and
I will illustrate an extensive evaluation of several computer vision methods applied to the
produced dataset. Moreover, I will describe the privacy-preserving solution we developed
for estimating the “Social Distance” and its violations, given a single uncalibrated image
in unconstrained scenarios. I will conclude the thesis with a Computer Vision solution
developed in collaboration with the Egyptian Museum of Turin for digitally unwrapping
mummies by analyzing their CT scans, supporting archaeologists during mummy analysis
and avoiding the devastating and irreversible process of physically unwrapping the bandages
to remove amulets and jewels from the body.
Applications
Volume 3 describes how resource-aware machine learning methods and techniques are used to successfully solve real-world problems. The book provides numerous specific application examples: in health and medicine, for risk modelling, diagnosis, and treatment selection for diseases; in electronics, steel production, and milling, for quality control during manufacturing processes; in traffic and logistics, for smart cities; and for mobile communications.
Rich probabilistic models for semantic labeling
The aim of this monograph is to explore the methods and applications of semantic labeling. Our contributions to this rapidly developing topic concern particular aspects of modeling and inference in probabilistic models and their applications in the interdisciplinary fields of computer vision, medical image processing, and remote sensing.
Deep Learning Methods for 3D Aerial and Satellite Data
Recent advances in digital electronics have led to an overabundance of observations from electro-optical (EO) imaging sensors spanning high spatial, spectral and temporal resolution. This unprecedented volume, variety, and velocity is overwhelming our capacity to manage and translate that data into actionable information. Although decades of image processing research have taken the human out of the loop for many important tasks, the human analyst is still an irreplaceable link in the image exploitation chain, especially for more complex tasks requiring contextual understanding, memory, discernment, and learning. If knowledge discovery is to keep pace with the growing availability of data, new processing paradigms are needed in order to automate the analysis of earth observation imagery and ease the burden of manual interpretation.
To address this gap, this dissertation advances fundamental and applied research in deep learning for aerial and satellite imagery. We show how deep learning, a computational model inspired by the human brain, can be used for (1) tracking, (2) classifying, and (3) modeling from a variety of data sources including full-motion video (FMV), Light Detection and Ranging (LiDAR), and stereo photogrammetry. First, we assess the ability of a bio-inspired tracking method to track small targets using aerial videos. The tracker uses three kinds of saliency maps: appearance, location, and motion. Our approach achieves the best overall performance, including being the only method capable of handling long-term occlusions.
Second, we evaluate the classification accuracy of a multi-scale fully convolutional network to label individual points in LiDAR data. Our method uses only the 3D coordinates and corresponding low-dimensional spectral features for each point. Evaluated using the ISPRS 3D Semantic Labeling Contest, our method scored second place with an overall accuracy of 81.6%. Finally, we validate the prediction capability of our neighborhood-aware network to model the bare-earth surface of LiDAR and stereo photogrammetry point clouds. The network bypasses traditionally used ground classifications and seamlessly integrates neighborhood features with point-wise and global features to predict a per-point Digital Terrain Model (DTM). We compare our results with two widely used software packages for DTM extraction, ENVI and LAStools. Together, these efforts have the potential to alleviate the manual burden associated with some of the most challenging and time-consuming geospatial processing tasks, with implications for improving our response to issues of global security, emergency management, and disaster response.
Optimization for Deep Learning Systems Applied to Computer Vision
149 p.
Since the DL revolution and especially over the last years (2010-2022), DNNs have become an essential part of the CV field, and they are present in all its sub-fields (video-surveillance, industrial manufacturing, autonomous driving, ...) and in almost every new state-of-the-art application that is developed. However, DNNs are very complex and the architecture needs to be carefully selected and adapted in order to maximize its efficiency. In many cases, networks are not specifically designed for the considered use case; they are simply recycled from other applications and slightly adapted, without taking into account the particularities of the use case or the interaction with the rest of the system components, which usually results in a performance drop. This research work aims at providing knowledge and tools for the optimization of systems based on Deep Learning applied to different real use cases within the field of Computer Vision, in order to maximize their effectiveness and efficiency.
A Review on Deep Learning in UAV Remote Sensing
Deep Neural Networks (DNNs) learn representations from data with impressive
capability and have brought important breakthroughs in processing images,
time series, natural language, audio, video, and many other kinds of data. In the remote
sensing field, surveys and literature reviews specifically covering applications of DNN
algorithms have been conducted in an attempt to summarize the
amount of information produced in its subfields. Recently, Unmanned Aerial
Vehicle (UAV) based applications have dominated aerial sensing research.
However, a literature review that combines both the "deep learning" and "UAV
remote sensing" themes has not yet been conducted. The motivation for our
work was to present a comprehensive review of the fundamentals of Deep Learning
(DL) applied in UAV-based imagery. We focused mainly on describing
classification and regression techniques used in recent applications with
UAV-acquired data. For that, a total of 232 papers published in international
scientific journal databases were examined. We gathered the published material
and evaluated its characteristics regarding application, sensor, and
technique used. We relate how DL presents promising results and has the
potential for processing tasks associated with UAV-based image data. Lastly, we
project future perspectives, commenting on prominent DL paths to be explored
in the UAV remote sensing field. Our review offers an accessible approach
to introducing, commenting on, and summarizing the state of the art in UAV-based image
applications with DNN algorithms in diverse subfields of remote sensing,
grouping them into environmental, urban, and agricultural contexts.
Comment: 38 pages, 10 figures