232 research outputs found
Lip-Reading with Visual Form Classification using Residual Networks and Bidirectional Gated Recurrent Units
Lip-reading is the observation and interpretation of lip movements to understand spoken language. Previous studies have concentrated exclusively on a single variant of residual networks (ResNets). This study primarily aimed to compare several types of ResNets. It additionally calculates metrics for the word categories in the GRID dataset: verbs, colors, prepositions, letters, and numerals. This aspect has not been investigated in other studies. The proposed approach comprises three stages: pre-processing (face detection and mouth localization), feature extraction, and classification. The feature extraction architecture is a 3-dimensional convolutional neural network (3D-CNN) integrated with ResNets. Temporal sequences are handled in the classification stage by a bidirectional gated recurrent unit (Bi-GRU) model. The experiments yielded a character error rate (CER) of 14.09% and a word error rate (WER) of 28.51%. The combination of 3D-CNN ResNet-34 and Bi-GRU outperformed ResNet-18 and ResNet-50. Increased network depth did not consistently improve lip-reading performance. Nevertheless, additional trained parameters offer certain benefits. Moreover, the model was more precise than human professionals at distinguishing the different word structures. DOI: 10.28991/HIJ-2023-04-02-010
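The CER and WER figures quoted in this abstract are conventionally computed as an edit distance divided by the reference length. The following is a minimal sketch of that standard definition, not the authors' evaluation code; the function names and the GRID-style example sentence are assumptions made for illustration.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,           # deletion
                            curr[j - 1] + 1,       # insertion
                            prev[j - 1] + cost))   # substitution or match
        prev = curr
    return prev[-1]


def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def char_error_rate(reference, hypothesis):
    """Character-level edit distance divided by the reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Hypothetical example sentence: one letter is misread.
reference = "bin blue at f two now"
hypothesis = "bin blue at s two now"
print(word_error_rate(reference, hypothesis))  # 1 wrong word out of 6
print(char_error_rate(reference, hypothesis))  # 1 wrong character out of 21
```

Because a single wrong character invalidates a whole word, WER is typically higher than CER on the same output, which is consistent with the 28.51% versus 14.09% reported above.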
A Comprehensive Survey of Deep Learning in Remote Sensing: Theories, Tools and Challenges for the Community
In recent years, deep learning (DL), a re-branding of neural networks (NNs),
has risen to the top in numerous areas, namely computer vision (CV), speech
recognition, natural language processing, etc. Whereas remote sensing (RS)
possesses a number of unique challenges, primarily related to sensors and
applications, inevitably RS draws from many of the same theories as CV; e.g.,
statistics, fusion, and machine learning, to name a few. This means that the RS
community should be aware of, if not at the leading edge of, advancements
like DL. Herein, we provide the most comprehensive survey of state-of-the-art
RS DL research. We also review recent new developments in the DL field that can
be used in DL for RS. Namely, we focus on theories, tools and challenges for
the RS community. Specifically, we focus on unsolved challenges and
opportunities as they relate to (i) inadequate data sets, (ii)
human-understandable solutions for modelling physical phenomena, (iii) Big
Data, (iv) non-traditional heterogeneous data sources, (v) DL architectures and
learning algorithms for spectral, spatial and temporal data, (vi) transfer
learning, (vii) an improved theoretical understanding of DL systems, (viii)
high barriers to entry, and (ix) training and optimizing the DL.
Comment: 64 pages, 411 references. To appear in Journal of Applied Remote
Sensing
3D Medical Image Segmentation based on multi-scale MPU-Net
The high cure rate of cancer is inextricably linked to physicians' accuracy
in diagnosis and treatment; therefore, a model that can accomplish
high-precision tumor segmentation has become a necessity in many applications
of the medical industry. It can effectively lower the rate of misdiagnosis
while considerably lessening the burden on clinicians. However, fully automated
target organ segmentation is problematic due to the irregular stereo structure
of 3D volume organs. As a basic model for this class of real applications,
U-Net excels. It can learn certain global and local features, but still lacks
the capacity to grasp spatial long-range relationships and contextual
information at multiple scales. This paper proposes a tumor segmentation model
MPU-Net for patient volume CT images, which is inspired by Transformer with a
global attention mechanism. By combining image serialization with the Position
Attention Module, the model attempts to comprehend deeper contextual
dependencies and accomplish precise positioning. Each layer of the decoder is
also equipped with a multi-scale module and a cross-attention mechanism. The
capability of feature extraction and integration at different levels has been
enhanced, and the hybrid loss function developed in this study can better
exploit high-resolution characteristic information. Moreover, the suggested
architecture is tested and evaluated on the Liver Tumor Segmentation Challenge
2017 (LiTS 2017) dataset. Compared with the benchmark model U-Net, MPU-Net
shows excellent segmentation results. The dice, accuracy, precision,
specificity, IOU, and MCC metrics for the best model segmentation results are
92.17%, 99.08%, 91.91%, 99.52%, 85.91%, and 91.74%, respectively. Outstanding
indicators in various aspects illustrate the exceptional performance of this
framework in automatic medical image segmentation.
Comment: 37 pages
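The six metrics listed in this abstract can all be derived from the voxel-wise confusion counts of a binary segmentation. The sketch below uses the standard textbook definitions and is not taken from the paper; the function name, the flat 0/1 mask representation, and the choice to return 0.0 on an empty denominator are assumptions.

```python
import math


def segmentation_metrics(pred, target):
    """Compute overlap metrics for two flat binary masks of equal length."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")

    tp = fp = tn = fn = 0
    for p, t in zip(pred, target):
        if p and t:
            tp += 1
        elif p and not t:
            fp += 1
        elif not p and t:
            fn += 1
        else:
            tn += 1

    def ratio(num, den):
        # Assumption: report 0.0 when a metric is undefined (zero denominator).
        return num / den if den else 0.0

    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "dice": ratio(2 * tp, 2 * tp + fp + fn),
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "precision": ratio(tp, tp + fp),
        "specificity": ratio(tn, tn + fp),
        "iou": ratio(tp, tp + fp + fn),
        "mcc": ratio(tp * tn - fp * fn, mcc_den),
    }


# Toy masks: 2 true positives, 1 false positive, 1 false negative, 4 true negatives.
pred = [1, 1, 0, 0, 1, 0, 0, 0]
target = [1, 0, 0, 0, 1, 1, 0, 0]
for name, value in segmentation_metrics(pred, target).items():
    print(f"{name}: {value:.4f}")
```

Accuracy and specificity are dominated by the background class when the tumor occupies few voxels, which is one reason Dice, IoU, and MCC are usually reported alongside them.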
Collision Avoidance on Unmanned Aerial Vehicles using Deep Neural Networks
Unmanned Aerial Vehicles (UAVs), although hardly a new technology, have recently
gained a prominent role in many industries, being widely used not only among enthusiastic
consumers but also in highly demanding professional situations, and will have a
massive societal impact over the coming years. However, the operation of UAVs is full
of serious safety risks, such as collisions with dynamic obstacles (birds, other UAVs, or
randomly thrown objects). These collision scenarios are complex to analyze in real-time,
sometimes being computationally impossible to solve with existing State of the Art (SoA)
algorithms, making the use of UAVs an operational hazard and therefore significantly reducing
their commercial applicability in urban environments. In this work, a conceptual
framework for both stand-alone and swarm (networked) UAVs is introduced, focusing on
the architectural requirements of the collision avoidance subsystem to achieve acceptable
levels of safety and reliability. First, the SoA principles for collision avoidance against
stationary objects are reviewed. Afterward, a novel image processing approach that uses
deep learning and optical flow is presented. This approach is capable of detecting and
generating escape trajectories against potential collisions with dynamic objects. Finally,
novel combinations of models and algorithms were tested, providing a new approach for
the collision avoidance of UAVs using Deep Neural Networks. The feasibility of the proposed
approach was demonstrated through experimental tests using a UAV, created from
scratch using the framework developed.
Unmanned aerial vehicles (UAVs), although hardly considered a new technology,
have recently gained a prominent role in many industries, being widely used
not only by hobbyists but also in highly demanding professional situations,
and a massive social impact is expected in the coming years. However, the
operation of UAVs is fraught with serious safety risks, such as collisions
with dynamic obstacles (birds, other UAVs, or thrown objects). These collision
scenarios are complex to analyze in real time and are sometimes computationally
impossible to solve with existing algorithms, making the use of UAVs an
operational risk and therefore significantly reducing their commercial
applicability in urban environments. In this work, a conceptual architecture
for autonomous and networked UAVs is presented, focusing on the architectural
requirements of the collision avoidance subsystem to achieve acceptable levels
of safety and reliability. The studies in the literature on collision avoidance
against stationary objects are reviewed and a new approach is described. This
technique uses deep learning and image processing to perform real-time
collision avoidance with moving objects. Finally, new models and combinations
of algorithms are proposed, providing a new approach to UAV collision avoidance
using Deep Neural Networks. The feasibility of the approach was demonstrated
through experimental tests using a UAV developed from the presented
architecture.
Machine Learning Models for High-dimensional Biomedical Data
Recent technological advances enable the collection of various complex, heterogeneous, and high-dimensional data in biomedical domains. The increasing availability of high-dimensional biomedical data creates a need for new machine learning models for effective data analysis and knowledge discovery. This dissertation introduces several unsupervised and supervised methods to help understand the data, discover patterns, and improve decision making. All the proposed methods can generalize to other industrial fields.
The first topic of this dissertation focuses on data clustering. Data clustering is often the first step in analyzing a dataset without label information. Clustering high-dimensional data with mixed categorical and numeric attributes remains a challenging yet important task. A clustering algorithm based on tree ensembles, CRAFTER, is proposed to tackle this task in a scalable manner.
The second part of this dissertation aims to develop data representation methods for genome sequencing data, a special type of high-dimensional data in the biomedical domain. The proposed data representation method, Bag-of-Segments, can summarize the key characteristics of the genome sequence into a small number of features with good interpretability.
The third part of this dissertation introduces an end-to-end deep neural network model, GCRNN, for time series classification with emphasis on both the accuracy and the interpretation. GCRNN contains a convolutional network component to extract high-level features, and a recurrent network component to enhance the modeling of the temporal characteristics. A feed-forward fully connected network with the sparse group lasso regularization is used to generate the final classification and provide good interpretability.
The last topic centers on dimensionality reduction methods for time series data. A good dimensionality reduction method is important for the storage, decision making, and pattern visualization of time series data. The CRNN autoencoder is proposed to not only achieve low reconstruction error, but also generate discriminative features. A variational version of this autoencoder has great potential for applications such as anomaly detection and process control.
Dissertation/Thesis: Doctoral Dissertation, Industrial Engineering, 201
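The sparse group lasso regularization mentioned for GCRNN's fully connected layer combines an elementwise L1 term with a per-group L2 term, so that whole groups of weights as well as individual weights can be driven to zero. The sketch below follows the common textbook form of the penalty; how the dissertation defines its groups and weights the two terms is not stated in the abstract, so the grouping, `lam`, and `alpha` are assumptions.

```python
import math


def sparse_group_lasso(groups, lam=0.01, alpha=0.5):
    """Sparse group lasso penalty for weights organized into groups.

    groups: list of lists of floats, one inner list per weight group
            (hypothetical grouping, e.g. all weights leaving one input feature).
    lam:    overall regularization strength (assumed value).
    alpha:  mix between the L1 term (alpha) and the group term (1 - alpha).
    """
    l1_term = 0.0
    group_term = 0.0
    for weights in groups:
        l1_term += sum(abs(w) for w in weights)
        l2_norm = math.sqrt(sum(w * w for w in weights))
        # Scale by sqrt(group size) so larger groups are not favored.
        group_term += math.sqrt(len(weights)) * l2_norm
    return lam * (alpha * l1_term + (1.0 - alpha) * group_term)


# One active group and one group already shrunk to zero.
weights = [[3.0, 4.0], [0.0, 0.0]]
print(sparse_group_lasso(weights, lam=1.0, alpha=0.5))
```

A group whose weights are all zero contributes nothing to either term, which is what makes the surviving groups readable as the features the classifier relies on.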
Combining Shape and Learning for Medical Image Analysis
Automatic methods that can make accurate, fast, and robust assessments of medical images are in high demand in medical research and clinical care. Excellent automatic algorithms are characterized by speed, allowing for scalability, and an accuracy comparable to that of an expert radiologist. They should produce morphologically and physiologically plausible results while generalizing well to unseen and rare anatomies. Still, there are few, if any, applications where today's automatic methods succeed in meeting these requirements. The focus of this thesis is two tasks essential for enabling automatic medical image assessment: medical image segmentation and medical image registration. Medical image registration, i.e. aligning two separate medical images, is used as an important sub-routine in many image analysis tools as well as in image fusion, disease progress tracking and population statistics. Medical image segmentation, i.e. delineating anatomically or physiologically meaningful boundaries, is used for both diagnostic and visualization purposes in a wide range of applications, e.g. in computer-aided diagnosis and surgery. The thesis comprises five papers addressing medical image registration and/or segmentation for a diverse set of applications and modalities, i.e. pericardium segmentation in cardiac CTA, brain region parcellation in MRI, multi-organ segmentation in CT, heart ventricle segmentation in cardiac ultrasound and tau PET registration. The five papers propose competitive registration and segmentation methods enabled by machine learning techniques, e.g. random decision forests and convolutional neural networks, as well as by shape modelling, e.g. multi-atlas segmentation and conditional random fields.
Review of the state of the art of deep learning for plant diseases: a broad analysis and discussion
Deep learning (DL) represents the golden era in the machine learning (ML) domain, and it has gradually become the leading approach in many fields. It is currently playing a vital role in the early detection and classification of plant diseases. The use of ML techniques in this field is viewed as having brought considerable improvement in cultivation productivity sectors, particularly with the recent emergence of DL, which seems to have increased accuracy levels. Recently, many DL architectures have been implemented accompanying visualisation techniques that are essential for determining symptoms and classifying plant diseases. This review investigates and analyses the most recent methods, developed over three years leading up to 2020, for training, augmentation, feature fusion and extraction, recognising and counting crops, and detecting plant diseases, including how these methods can be harnessed to feed deep classifiers and their effects on classifier accuracy