Contributions and applications around low resource deep learning modeling
Deep learning is the state of the art for several machine learning tasks. Many of these tasks require a large amount of computational resources, which limits their adoption in embedded devices. The main goal of this dissertation is to study methods and algorithms that make it possible to approach problems using deep learning with restricted computational resources. This work also aims at presenting applications of deep learning in industry.
The first contribution is a new activation function for deep learning networks: the modulus function. The experiments show that the proposed activation function achieves superior results in computer vision tasks when compared with the alternatives found in the literature.
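The abstract does not restate the definition, but reading "the modulus function" at face value gives the activation f(x) = |x|. A minimal PyTorch sketch of such an activation as a drop-in replacement for ReLU (illustrative, not the thesis' exact implementation):

```python
import torch
import torch.nn as nn

class Modulus(nn.Module):
    """Modulus activation: f(x) = |x|.

    Unlike ReLU, which zeroes negative inputs, the modulus keeps their
    magnitude, so no unit is ever completely switched off.
    """
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.abs(x)

# Used exactly like ReLU inside a small vision block:
block = nn.Sequential(nn.Conv2d(3, 16, kernel_size=3, padding=1), Modulus())
```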
The second contribution is a new strategy to combine pre-trained models using knowledge distillation. The results of this chapter show that it is possible to significantly increase the accuracy of the smallest pre-trained models, allowing high performance at a lower computational cost.
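The chapter's exact combination strategy is not reproduced here, but the standard knowledge-distillation objective it builds on blends softened teacher predictions with the ground-truth labels. A sketch assuming a single teacher; several pre-trained teachers could be combined by, for example, averaging their logits before this step:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      labels: torch.Tensor,
                      T: float = 4.0, alpha: float = 0.5) -> torch.Tensor:
    """Classic KD loss: KL between softened distributions plus cross-entropy.

    T softens both distributions; alpha balances soft and hard targets.
    """
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # rescale to compensate for the temperature softening
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```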
The next contribution of this thesis tackles the problem of sales forecasting in the field of logistics. Two end-to-end systems based on two different deep learning techniques (sequence-to-sequence models and transformers) are proposed. The results of this chapter show that it is possible to build end-to-end systems that predict the sales of multiple individual products, at multiple points of sale and at different times, with a single machine learning model. The proposed model outperforms the alternatives found in the literature.
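To make "a single machine learning model" concrete: one common design lets product and store identifiers enter the network as learned embeddings, so one set of weights serves every series. The sketch below is a hypothetical miniature (all names and dimensions are assumptions, and positional encoding is omitted for brevity), not the thesis' architecture:

```python
import torch
import torch.nn as nn

class GlobalSalesForecaster(nn.Module):
    """One transformer for all product/store series via ID embeddings."""
    def __init__(self, n_products: int, n_stores: int, d_model: int = 64):
        super().__init__()
        self.prod_emb = nn.Embedding(n_products, d_model)
        self.store_emb = nn.Embedding(n_stores, d_model)
        self.in_proj = nn.Linear(1, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, 1)  # next-step sales

    def forward(self, sales_hist, product_id, store_id):
        # sales_hist: (batch, history_length, 1); IDs: (batch,)
        x = self.in_proj(sales_hist)
        ids = self.prod_emb(product_id) + self.store_emb(store_id)
        x = x + ids.unsqueeze(1)  # condition every time step on the series ID
        return self.head(self.encoder(x)[:, -1])
```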
Finally, the last two contributions belong to the speech technology field. The former studies how to build a Keyword Spotting speech recognition system using an efficient version of a convolutional neural network. In this study, the proposed system beats the performance of all the benchmarks found in the literature when tested on the most complex subtasks.
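The published network is not reproduced here; depthwise-separable convolutions are one standard way to make a keyword-spotting CNN efficient, so the following block should be read as an assumption about what "efficient version" could mean, not as the paper's model:

```python
import torch.nn as nn

class DSConv(nn.Module):
    """Depthwise-separable conv: per-channel filtering then 1x1 mixing,
    roughly k*k times cheaper than a dense k x k convolution."""
    def __init__(self, c_in: int, c_out: int, k: int = 3):
        super().__init__()
        self.depthwise = nn.Conv2d(c_in, c_in, k, padding=k // 2, groups=c_in)
        self.pointwise = nn.Conv2d(c_in, c_out, kernel_size=1)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.ReLU()

    def forward(self, x):  # x: (batch, c_in, mel_bins, frames)
        return self.act(self.bn(self.pointwise(self.depthwise(x))))
```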
The latter study proposes a standalone state-of-the-art text-to-speech model capable of synthesizing intelligible speech in thousands of voice profiles while generating speech with meaningful and expressive prosody variations. The proposed approach removes the dependency of previous models on an additional voice system, which makes the proposed system more efficient at training and inference time and enables offline, on-device operation.
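One plausible reading of removing "an additional voice system" is replacing an external speaker encoder with a per-speaker embedding table learned jointly with the TTS model. The sketch below is illustrative only, with hypothetical sizes:

```python
import torch.nn as nn

class SpeakerConditioning(nn.Module):
    """Learned speaker table: no separate speaker-verification network."""
    def __init__(self, n_speakers: int = 10_000, d_speaker: int = 256,
                 d_model: int = 512):
        super().__init__()
        self.table = nn.Embedding(n_speakers, d_speaker)
        self.proj = nn.Linear(d_speaker, d_model)

    def forward(self, encoder_states, speaker_id):
        # encoder_states: (batch, time, d_model); speaker_id: (batch,)
        s = self.proj(self.table(speaker_id)).unsqueeze(1)
        return encoder_states + s  # broadcast the voice profile over time
```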
A comprehensive review of 3D convolutional neural network-based classification techniques of diseased and defective crops using non-UAV-based hyperspectral images
Hyperspectral imaging (HSI) is a non-destructive and contactless technology that provides valuable information about the structure and composition of an object. It can capture detailed information about the chemical and physical properties of agricultural crops. Due to its wide spectral range, compared with multispectral- or RGB-based imaging methods, HSI can be a more effective tool for monitoring crop health and productivity. With the advent of this imaging tool in agrotechnology, researchers can more accurately address issues related to the detection of diseased and defective crops in the agriculture industry. This makes it possible to implement the most suitable and accurate farming solutions, such as irrigation and fertilization, before crops enter a damaged and difficult-to-recover phase of growth in the field. While HSI provides valuable insights into the object under investigation, the limited number of HSI datasets for crop evaluation presently poses a bottleneck. Dealing with the curse of dimensionality presents another challenge due to the abundance of spectral and spatial information in each hyperspectral cube. State-of-the-art methods based on 1D and 2D convolutional neural networks (CNNs) struggle to efficiently extract spectral and spatial information. On the other hand, 3D-CNN-based models have shown significant promise in achieving better classification and detection results by leveraging spectral and spatial features simultaneously. Despite the apparent benefits of 3D-CNN-based models, their usage for classification purposes in this area of research has remained limited. This paper seeks to address this gap by reviewing 3D-CNN-based architectures and the typical deep learning pipeline, including preprocessing and visualization of results, for the classification of hyperspectral images of diseased and defective crops. Furthermore, we discuss open research areas and challenges when utilizing 3D-CNNs with HSI data.
"This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors."
https://www.sciencedirect.com/science/article/pii/S277237552300145
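To make the spectral-spatial point concrete: a 3D convolution slides its kernel along the band axis and both pixel axes at once, whereas 1D convolutions see only bands and 2D convolutions only pixels. A minimal PyTorch sketch with assumed cube dimensions:

```python
import torch
import torch.nn as nn

# Hyperspectral cube as (batch, channels=1, bands, height, width); the
# 100-band, 64x64 size is an assumption for illustration.
cube = torch.randn(2, 1, 100, 64, 64)
stem = nn.Sequential(
    nn.Conv3d(1, 8, kernel_size=(7, 3, 3), padding=(3, 1, 1)),  # spectral and spatial at once
    nn.ReLU(),
    nn.MaxPool3d(kernel_size=2),
)
print(stem(cube).shape)  # torch.Size([2, 8, 50, 32, 32])
```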
Exploring Hyperspectral Imaging and 3D Convolutional Neural Network for Stress Classification in Plants
Hyperspectral imaging (HSI) has emerged as a transformative imaging technology, characterized by its ability to capture a wide spectrum of light, including wavelengths beyond the visible range. This approach differs significantly from traditional imaging methods such as RGB imaging, which uses three color channels, and multispectral imaging, which captures several discrete spectral bands. Through this approach, HSI offers detailed spectral signatures for each pixel, facilitating a more nuanced analysis of the imaged subjects. This capability is particularly beneficial in applications like agricultural practices, where it can detect changes in the physiological and structural characteristics of crops. Moreover, the ability of HSI to monitor these changes over time is advantageous for observing how subjects respond to different environmental conditions or treatments. However, the high-dimensional nature of hyperspectral data presents challenges in data processing and feature extraction. Traditional machine learning algorithms often struggle to handle such complexity. This is where 3D Convolutional Neural Networks (CNNs) become valuable. Unlike 1D-CNNs, which extract features from the spectral dimension, and 2D-CNNs, which focus on spatial dimensions, 3D-CNNs can process data across both spectral and spatial dimensions, making them adept at extracting complex features from hyperspectral data. In this thesis, we explored the potential of HSI combined with 3D-CNNs in the agriculture domain, where plant health and vitality are paramount. To evaluate this, we subjected lettuce plants to varying stress levels and assessed the method's ability to classify the stressed lettuce, at early stages of growth, into the respective stress-level groups. For this study, we created a dataset comprising 88 hyperspectral image samples of stressed lettuce. Utilizing Bayesian optimization, we developed 350 distinct 3D-CNN models to assess the method. The top-performing model achieved a 75.00% test accuracy. Additionally, we addressed the challenge of generating valid 3D-CNN models with the Keras Tuner library through meticulous hyperparameter configuration. Our investigation also extends to the role of individual channels and channel groups within the color and near-infrared spectrum in predicting results for each stress-level group. We observed that the red and green spectra have a higher influence on the prediction results. Furthermore, we conducted a comprehensive review of 3D-CNN-based classification techniques for diseased and defective crops using non-UAV-based hyperspectral images.
MITACS
Master of Science in Applied Computer Science
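A hedged sketch of the kind of Keras Tuner search described above; the input shape, the three stress classes, and the search ranges are assumptions, and constraining kernel and pooling sizes is what keeps every sampled 3D-CNN valid:

```python
import keras_tuner
from tensorflow import keras
from tensorflow.keras import layers

def build_model(hp):
    """Bayesian-optimization search space for a small 3D-CNN."""
    inputs = keras.Input(shape=(100, 64, 64, 1))  # assumed cube shape
    x = inputs
    for i in range(hp.Int("conv_blocks", 1, 3)):
        x = layers.Conv3D(hp.Int(f"filters_{i}", 8, 32, step=8),
                          kernel_size=3, padding="same", activation="relu")(x)
        x = layers.MaxPooling3D(pool_size=2)(x)  # halving keeps shapes valid
    x = layers.GlobalAveragePooling3D()(x)
    outputs = layers.Dense(3, activation="softmax")(x)  # assumed 3 stress levels
    model = keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

tuner = keras_tuner.BayesianOptimization(build_model,
                                         objective="val_accuracy",
                                         max_trials=50)
```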
Robotic Burst Imaging for Light-Constrained 3D Reconstruction
This thesis proposes a novel input scheme, the robotic burst, to improve vision-based 3D reconstruction for robots operating in low-light conditions, where existing state-of-the-art robotic vision algorithms struggle due to the low signal-to-noise ratio of low-light images. We aim to improve the correspondence-search stage of feature-based reconstruction using robotic burst imaging, including burst-merged images, a burst feature finder, and an end-to-end learning-based feature extractor. Firstly, we establish the use of robotic burst imaging to compute burst-merged images for feature-based reconstruction. We then develop a burst feature finder that locates features with well-defined scale and apparent motion on a burst, to deal with limitations of burst-merged images such as misalignment under strong noise. To improve feature matches in burst-based reconstruction, we also present an end-to-end learning-based feature extractor that finds well-defined scale features directly on light-constrained bursts.
We evaluate our methods against state-of-the-art reconstruction methods for conventional imaging that use both classical and learning-based feature extractors. We validate our novel input scheme using burst imagery captured on a robotic arm and on drones. We demonstrate progressive improvements in low-light reconstruction using our burst-based methods over conventional approaches; overall, 90% of all scenes captured in millilux conditions converge with our methods, compared with a 10% success rate using conventional methods. This work opens up new avenues for applications including autonomous driving and drone delivery at night, mining, and behavioral studies on nocturnal animals.
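The simplest burst-merge step, averaging pre-aligned frames, already shows why bursts help in millilux conditions: for independent zero-mean noise, averaging N frames reduces the noise standard deviation by a factor of sqrt(N). The thesis' alignment and feature-finding contributions are not reproduced in this sketch:

```python
import numpy as np

def merge_burst(frames: np.ndarray) -> np.ndarray:
    """Average an aligned burst (N, H, W, C) into one higher-SNR image.

    Alignment under robot motion is the hard part and is assumed done.
    """
    assert frames.ndim == 4, "expected a stack of pre-aligned frames"
    return frames.mean(axis=0)
```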
ContraNeRF: Generalizable Neural Radiance Fields for Synthetic-to-real Novel View Synthesis via Contrastive Learning
Although many recent works have investigated generalizable NeRF-based novel view synthesis for unseen scenes, they seldom consider synthetic-to-real generalization, which is desired in many practical applications. In this work, we first investigate the effects of synthetic data in synthetic-to-real novel view synthesis and, surprisingly, observe that models trained with synthetic data tend to produce sharper but less accurate volume densities. For pixels where the volume densities are correct, fine-grained details are obtained; otherwise, severe artifacts are produced. To maintain the advantages of using synthetic data while avoiding its negative effects, we propose geometry-aware contrastive learning to learn multi-view consistent features with geometric constraints. Meanwhile, we adopt cross-view attention to further enhance the geometry perception of features by querying features across input views. Experiments demonstrate that, under the synthetic-to-real setting, our method renders images with higher quality and better fine-grained details, outperforming existing generalizable novel view synthesis methods in terms of PSNR, SSIM, and LPIPS. When trained on real data, our method also achieves state-of-the-art results.
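For reference, the generic contrastive ingredient (an InfoNCE-style loss) looks as follows; ContraNeRF's geometry-aware selection of positives and negatives across views is the paper's actual contribution and is not reproduced here:

```python
import torch
import torch.nn.functional as F

def info_nce(anchor: torch.Tensor, positive: torch.Tensor,
             temperature: float = 0.07) -> torch.Tensor:
    """InfoNCE over paired features: row i of `anchor` matches row i of
    `positive`; all other rows serve as negatives."""
    a = F.normalize(anchor, dim=-1)
    p = F.normalize(positive, dim=-1)
    logits = a @ p.t() / temperature                         # (N, N) similarities
    targets = torch.arange(a.size(0), device=logits.device)  # diagonal pairs
    return F.cross_entropy(logits, targets)
```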
A Machine Vision Method for Correction of Eccentric Error: Based on Adaptive Enhancement Algorithm
In the procedure of surface defect detection for large-aperture aspherical optical elements, it is of vital importance to adjust the optical axis of the element to be accurately coaxial with the mechanical spin axis. Therefore, a machine vision method for eccentric error correction is proposed in this paper. Focusing on the severe defocus blur of the reference crosshair image caused by the imaging characteristics of the aspherical optical element, which may lead to the failure of correction, an Adaptive Enhancement Algorithm (AEA) is proposed to strengthen the crosshair image. AEA consists of the existing Guided Filter Dark Channel Dehazing Algorithm (GFA) and the proposed lightweight Multi-scale Densely Connected Network (MDC-Net). The enhancement effect of GFA is excellent but time-consuming, while the enhancement effect of MDC-Net is slightly inferior but much faster. As AEA is executed dozens of times during each correction procedure, its real-time performance is very important. Therefore, by setting an empirical threshold on the definition evaluation function SMD2, GFA and MDC-Net are applied to highly and slightly blurred crosshair images, respectively, so as to ensure the enhancement effect while saving as much time as possible. AEA shows robust time performance, taking an average of 0.2721 s for GFA and 0.0963 s for MDC-Net on ten 200 × 200 pixel Region of Interest (ROI) images with different degrees of blur. The eccentric error can be reduced to within 10 µm by our method.
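SMD2 is a standard definition (sharpness) measure: the sum over pixels of the product of absolute horizontal and vertical gray-level differences. A sketch of the measure and the dispatch described above, with the empirical threshold and the two enhancers left as placeholders:

```python
import numpy as np

def smd2(gray: np.ndarray) -> float:
    """SMD2 sharpness; higher means sharper, lower means more blurred."""
    g = gray.astype(np.float64)
    dx = np.abs(np.diff(g, axis=1))[:-1, :]  # horizontal differences
    dy = np.abs(np.diff(g, axis=0))[:, :-1]  # vertical differences
    return float((dx * dy).sum())

def enhance(roi, threshold, gfa, mdc_net):
    """Heavy blur -> slow-but-strong GFA; light blur -> fast MDC-Net."""
    return gfa(roi) if smd2(roi) < threshold else mdc_net(roi)
```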
A Comparison of Image Denoising Methods
The advancement of imaging devices and the countless images generated every day pose an increasingly high demand on image denoising, which remains a challenging task in terms of both effectiveness and efficiency. To improve denoising quality, numerous denoising techniques and approaches have been proposed in the past decades, including different transforms, regularization terms, algebraic representations and, especially, advanced deep neural network (DNN) architectures. Despite their sophistication, many methods fail to achieve desirable results for simultaneous noise removal and fine detail preservation. In this paper, to investigate the applicability of existing denoising techniques, we compare a variety of denoising methods on both synthetic and real-world datasets for different applications. We also introduce a new dataset for benchmarking, and the evaluations are performed from four different perspectives: quantitative metrics, visual effects, human ratings and computational cost. Our experiments demonstrate: (i) the effectiveness and efficiency of representative traditional denoisers for various denoising tasks, (ii) that a simple matrix-based algorithm may produce results similar to those of its tensor counterparts, and (iii) the notable achievements of DNN models, which exhibit impressive generalization ability and show state-of-the-art performance on various datasets. In spite of the progress in recent years, we discuss shortcomings and possible extensions of existing techniques. Datasets, code and results are made publicly available and will be continuously updated at https://github.com/ZhaomingKong/Denoising-Comparison.
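Of the four evaluation perspectives, the quantitative metrics are the easiest to make concrete; for example, PSNR in decibels (assuming 8-bit images; the paper's own evaluation code lives at the repository above):

```python
import numpy as np

def psnr(clean: np.ndarray, denoised: np.ndarray, peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between a clean and a denoised image."""
    mse = np.mean((clean.astype(np.float64) - denoised.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)
```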
Real-World Image Restoration Using Degradation Adaptive Transformer-Based Adversarial Network
Most existing learning-based image restoration methods heavily rely on paired degraded/non-degraded training datasets that are based on simplistic handcrafted degradation assumptions. These assumptions often involve a limited set of degradations, such as Gaussian blurs, noises, and bicubic downsampling. However, when these methods are applied to real-world images, there is a significant decrease in performance due to the discrepancy between synthetic and realistic degradation. Additionally, they lack the flexibility to adapt to unknown degradations in practical scenarios, which limits their generalizability to complex and unconstrained scenes.
To address the absence of image pairs, recent studies have proposed Generative Adversarial Network (GAN)-based unpaired methods. Nevertheless, unpaired learning models based on convolution operations encounter challenges in capturing long-range pixel dependencies in real-world images. This limitation stems from their reliance on convolution operations, which offer local connectivity and translation equivariance but struggle to capture global dependencies due to their limited receptive field.
To address these challenges, this dissertation proposes a basic unpaired image restoration model along with an advanced model. The basic model is DA-CycleGAN, which is based on the CycleGAN [1] neural network and specifically designed for blind real-world Single Image Super-Resolution (SISR). DA-CycleGAN incorporates a degradation adaptive (DA) module to learn various real-world degradations (such as noise and blur patterns) in an unpaired manner, enabling strong and flexible adaptation. Additionally, an advanced model called Trans-CycleGAN integrates the Transformer architecture into CycleGAN to leverage its global connectivity, allowing image-to-image translation with CycleGAN [1] while the Transformer models dependencies across long-range pixels. Extensive experiments conducted on realistic images demonstrate the superior performance of the proposed method in solving real-world image restoration problems, resulting in clearer and finer details.
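For concreteness, the unpaired training signal that CycleGAN-style restorers share is the cycle-consistency loss; the sketch below omits the dissertation's DA module and the adversarial terms:

```python
import torch
import torch.nn as nn

l1 = nn.L1Loss()

def cycle_consistency(real_degraded: torch.Tensor, real_clean: torch.Tensor,
                      G: nn.Module, F_: nn.Module,
                      lam: float = 10.0) -> torch.Tensor:
    """G: degraded -> clean, F_: clean -> degraded (generic placeholders).

    Each image must survive a round trip through both generators, which is
    what allows training without paired degraded/clean examples.
    """
    return lam * (l1(F_(G(real_degraded)), real_degraded)
                  + l1(G(F_(real_clean)), real_clean))
```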
Overall, this dissertation presents a novel unpaired image restoration basic model and an advanced model that effectively address the limitations of existing approaches. The proposed approach achieves significant advancements in handling real-world degradations and modeling long-range pixel dependencies, thereby offering substantial improvements in image restoration tasks.
Index Terms— Cross-domain translation, generative adversarial network, image restoration, super-resolution, transformer, unpaired training
Automated UAV and Satellite Image Analysis For Wildlife Monitoring.
Very high resolution satellites and unmanned aerial vehicles (UAVs) are revolutionising our ability to monitor wildlife, especially species in remote and inaccessible regions. However, given the rapid increase in data acquisition, computer-automated approaches are urgently needed to count wildlife in the resultant imagery. In this thesis, we investigate the application of convolutional neural networks (CNNs) to the task of detecting vulnerable seabird populations in satellite and UAV imagery. In our first application we train a U-Net CNN to detect wandering albatrosses in 31-cm resolution WorldView-3 satellite imagery. We compare results across four different island colonies using a leave-one-island-out cross validation, achieving a mean average precision (mAP) score of 0.669. By collecting new data on inter-observer variation in albatross counts, we show that our U-Net results fall within the range of human accuracy for two islands, with misclassifications at other sites being simple to filter manually. In our second application we detect Abbott’s boobies nesting in forest canopy, using UAV Structure from Motion (SfM) imagery. We focus on overcoming occlusion from branches by implementing a multi-view detection method. We first train a Faster R-CNN model to detect Abbott’s booby nest sites (mAP=0.518) and guano (mAP=0.472) in the 2D UAV images. We then project Faster R-CNN detections onto the 3D SfM model, cluster multi-view detections of the same objects using DBSCAN, and use cluster features to classify proposals into true and false positives (comparing logistic regression, support vector machine, and multilayer perceptron models). Our best-performing multi-view model successfully detects nest sites (mAP=0.604) and guano (mAP=0.574), and can be incorporated with expert review to greatly expedite analysis time. Both methods have immediate real-world application
for future surveys of the target species, allowing for more frequent, expansive, and lower-cost monitoring, which is vital for safeguarding populations in the long term.
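The clustering step of the multi-view method is straightforward to sketch: per-image detections projected into the 3D SfM model land close together when they see the same nest, so DBSCAN groups them into object proposals (the radius below is a hypothetical value in model units):

```python
import numpy as np
from sklearn.cluster import DBSCAN

def cluster_multiview_detections(points_3d: np.ndarray,
                                 eps: float = 0.5) -> np.ndarray:
    """Label projected detections; -1 marks noise (likely false positives).

    points_3d: (N, 3) detection centers in SfM model coordinates.
    """
    return DBSCAN(eps=eps, min_samples=2).fit_predict(points_3d)
```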