1,217 research outputs found
Deep Quaternion Networks
The field of deep learning has seen significant advancement in recent years.
However, much of the existing work has been focused on real-valued numbers.
Recent work has shown that a deep learning system using the complex numbers can
be deeper for a fixed parameter budget compared to its real-valued counterpart.
In this work, we explore the benefits of generalizing one step further into the
hyper-complex numbers, quaternions specifically, and provide the architecture
components needed to build deep quaternion networks. We develop the theoretical
basis by reviewing quaternion convolutions, developing a novel quaternion
weight initialization scheme, and developing novel algorithms for quaternion
batch-normalization. These pieces are tested in a classification model by
end-to-end training on the CIFAR-10 and CIFAR-100 data sets and a segmentation
model by end-to-end training on the KITTI Road Segmentation data set. These
quaternion networks show improved convergence compared to real-valued and
complex-valued networks, especially on the segmentation task, while having
fewer parametersComment: IJCNN 2018, 8 pages, 1 figur
A deep residual architecture for skin lesion segmentation
In this paper, we propose an automatic approach to skin lesion region segmentation based on a deep learning architecture with multi-scale residual connections. The architecture of the proposed model is based on UNet [22] with residual connections to maximise the learning capability and performance of the network. The information lost in the encoder stages due to the max-pooling layer at each level is preserved through the multi-scale residual connections. To corroborate the efficacy of the proposed model, extensive experiments are conducted on the ISIC 2017 challenge dataset without using any external dermatologic image set. An extensive comparative analysis is presented with contemporary methodologies to highlight the promising performance of the proposed methodology
Cancer diagnosis using deep learning: A bibliographic review
In this paper, we first describe the basics of the field of cancer diagnosis, which includes steps of cancer diagnosis followed by the typical classification methods used by doctors, providing a historical idea of cancer classification techniques to the readers. These methods include Asymmetry, Border, Color and Diameter (ABCD) method, seven-point detection method, Menzies method, and pattern analysis. They are used regularly by doctors for cancer diagnosis, although they are not considered very efficient for obtaining better performance. Moreover, considering all types of audience, the basic evaluation criteria are also discussed. The criteria include the receiver operating characteristic curve (ROC curve), Area under the ROC curve (AUC), F1 score, accuracy, specificity, sensitivity, precision, dice-coefficient, average accuracy, and Jaccard index. Previously used methods are considered inefficient, asking for better and smarter methods for cancer diagnosis. Artificial intelligence and cancer diagnosis are gaining attention as a way to define better diagnostic tools. In particular, deep neural networks can be successfully used for intelligent image analysis. The basic framework of how this machine learning works on medical imaging is provided in this study, i.e., pre-processing, image segmentation and post-processing. The second part of this manuscript describes the different deep learning techniques, such as convolutional neural networks (CNNs), generative adversarial models (GANs), deep autoencoders (DANs), restricted Boltzmann’s machine (RBM), stacked autoencoders (SAE), convolutional autoencoders (CAE), recurrent neural networks (RNNs), long short-term memory (LTSM), multi-scale convolutional neural network (M-CNN), multi-instance learning convolutional neural network (MIL-CNN). For each technique, we provide Python codes, to allow interested readers to experiment with the cited algorithms on their own diagnostic problems. The third part of this manuscript compiles the successfully applied deep learning models for different types of cancers. Considering the length of the manuscript, we restrict ourselves to the discussion of breast cancer, lung cancer, brain cancer, and skin cancer. The purpose of this bibliographic review is to provide researchers opting to work in implementing deep learning and artificial neural networks for cancer diagnosis a knowledge from scratch of the state-of-the-art achievements
End-to-end Audiovisual Speech Activity Detection with Bimodal Recurrent Neural Models
Speech activity detection (SAD) plays an important role in current speech
processing systems, including automatic speech recognition (ASR). SAD is
particularly difficult in environments with acoustic noise. A practical
solution is to incorporate visual information, increasing the robustness of the
SAD approach. An audiovisual system has the advantage of being robust to
different speech modes (e.g., whisper speech) or background noise. Recent
advances in audiovisual speech processing using deep learning have opened
opportunities to capture in a principled way the temporal relationships between
acoustic and visual features. This study explores this idea proposing a
\emph{bimodal recurrent neural network} (BRNN) framework for SAD. The approach
models the temporal dynamic of the sequential audiovisual data, improving the
accuracy and robustness of the proposed SAD system. Instead of estimating
hand-crafted features, the study investigates an end-to-end training approach,
where acoustic and visual features are directly learned from the raw data
during training. The experimental evaluation considers a large audiovisual
corpus with over 60.8 hours of recordings, collected from 105 speakers. The
results demonstrate that the proposed framework leads to absolute improvements
up to 1.2% under practical scenarios over a VAD baseline using only audio
implemented with deep neural network (DNN). The proposed approach achieves
92.7% F1-score when it is evaluated using the sensors from a portable tablet
under noisy acoustic environment, which is only 1.0% lower than the performance
obtained under ideal conditions (e.g., clean speech obtained with a high
definition camera and a close-talking microphone).Comment: Submitted to Speech Communicatio
- …