A novel augmented deep transfer learning for classification of COVID-19 and other thoracic diseases from X-rays
Deep learning has provided numerous breakthroughs in natural imaging tasks. However, its successful application to medical images is severely handicapped by the limited amount of annotated training data. Transfer learning is commonly adopted for medical imaging tasks. However, a large covariate shift between the source domain of natural images and the target domain of medical images results in poor transfer learning. Moreover, the scarcity of annotated data for medical imaging tasks causes further problems for effective transfer learning. To address these problems, we develop an augmented ensemble transfer learning technique that leads to a significant performance gain over conventional transfer learning. Our technique uses an ensemble of deep learning models, where the architecture of each network is modified with extra layers to account for the dimensionality change between the images of the source and target data domains. Moreover, the model is hierarchically tuned to the target domain with augmented training data. Along with the network ensemble, we also utilize an ensemble of dictionaries based on features extracted from the augmented models. The dictionary ensemble provides an additional performance boost to our method. We first establish the effectiveness of our technique on the challenging ChestXray-14 radiography data set. Our experimental results show more than a 50% reduction in the error rate with our method compared to the baseline transfer learning technique. We then apply our technique to a recent COVID-19 data set for binary and multi-class classification tasks. Our technique achieves 99.49% accuracy for binary classification, and 99.24% for multi-class classification
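As an illustration of the general idea only (a minimal sketch, not the authors' code), the following assumes PyTorch/torchvision and shows a pretrained backbone extended with extra adaptation layers before the classifier, plus simple ensemble averaging; the choice of ResNet-50 and the layer sizes are assumptions.

```python
# Minimal sketch: a pretrained backbone with extra adaptation layers for the
# target (medical) domain, plus ensemble averaging. Assumed details: ResNet-50,
# 512-unit adapter, dropout rate.
import torch
import torch.nn as nn
from torchvision import models

class AugmentedTransferNet(nn.Module):
    def __init__(self, num_classes: int):
        super().__init__()
        backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
        in_features = backbone.fc.in_features
        backbone.fc = nn.Identity()          # keep the pretrained representation
        self.backbone = backbone
        # Extra layers inserted to account for the domain/dimensionality change
        self.adapter = nn.Sequential(
            nn.Linear(in_features, 512), nn.ReLU(), nn.Dropout(0.5),
        )
        self.classifier = nn.Linear(512, num_classes)

    def forward(self, x):
        return self.classifier(self.adapter(self.backbone(x)))

def ensemble_predict(models_list, x):
    """Average the softmax outputs of several fine-tuned models."""
    probs = [torch.softmax(m(x), dim=1) for m in models_list]
    return torch.stack(probs).mean(dim=0)
```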
Structure-aware image translation-based long future prediction for enhancement of ground robotic vehicle teleoperation
Predicting future frames through image-to-image translation and using these synthetically generated frames for high-speed ground vehicle teleoperation is a new concept for addressing latency and enhancing operational performance. In our immediately preceding work, the image quality of the predicted frames was low and much of the scene detail was lost. To preserve the structural details of objects and improve overall image quality in the predicted frames, several novel ideas are proposed herein. A filter has been designed to remove noise from dense optical flow components resulting from frame rate inconsistencies. The Pix2Pix base network has been modified and a structure-aware SSIM-based perceptual loss function has been implemented. A new dataset of 20,000 training input images and 2,000 test input images with a 500 ms delay between the target and input frames has been created. Without any additional video transformation steps, the proposed improved model achieved a PSNR of 23.1, an SSIM of 0.65, and an MS-SSIM of 0.80, a substantial improvement over our previous work. A Fleiss' kappa score of > 0.40 (0.48 for the modified network and 0.46 for the perceptual loss function) supports the reliability of the model
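For a concrete picture of an SSIM-based perceptual objective, the sketch below (an illustration under our own assumptions, not the paper's implementation) mixes an SSIM term with an L1 term in PyTorch; the window size, stability constants, and weighting factor are assumed values.

```python
# Illustrative sketch of a structure-aware loss combining (1 - SSIM) with L1,
# one common way to realise an "SSIM-based perceptual loss" for Pix2Pix-style
# generators. Inputs are batches of images scaled to [0, 1].
import torch
import torch.nn.functional as F

def ssim(x, y, window_size=11, c1=0.01**2, c2=0.03**2):
    """Mean SSIM over a batch, computed with a uniform (box) window."""
    pad = window_size // 2
    channels = x.shape[1]
    kernel = torch.ones(channels, 1, window_size, window_size,
                        device=x.device) / window_size**2
    mu_x = F.conv2d(x, kernel, padding=pad, groups=channels)
    mu_y = F.conv2d(y, kernel, padding=pad, groups=channels)
    var_x = F.conv2d(x * x, kernel, padding=pad, groups=channels) - mu_x**2
    var_y = F.conv2d(y * y, kernel, padding=pad, groups=channels) - mu_y**2
    cov = F.conv2d(x * y, kernel, padding=pad, groups=channels) - mu_x * mu_y
    num = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    den = (mu_x**2 + mu_y**2 + c1) * (var_x + var_y + c2)
    return (num / den).mean()

def structure_aware_loss(pred, target, alpha=0.84):
    """Weighted mix of (1 - SSIM) and L1; alpha is an assumed weighting."""
    return alpha * (1 - ssim(pred, target)) + (1 - alpha) * F.l1_loss(pred, target)
```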
Long future frame prediction using optical flow informed deep neural networks for enhancement of robotic teleoperation in high latency environments
High latency in teleoperation has a significant negative impact on operator performance. While deep learning has revolutionized many domains recently, it has not previously been applied to teleoperation enhancement. We propose a novel approach to predict video frames deep into the future using neural networks informed by synthetically generated optical flow information. This can be employed in teleoperated robotic systems that rely on video feeds for operator situational awareness. We have used the image-to-image translation technique as a basis for the prediction of future frames. The Pix2Pix conditional generative adversarial network (cGAN) has been selected as the base network. Optical flow components reflecting real-time control inputs are added to the standard RGB channels of the input image. We have experimented with three data sets of 20,000 input images each, generated using our custom-designed teleoperation simulator with a 500 ms delay added between the input and target frames. Structural Similarity Index Measures (SSIMs) of 0.60 and MS-SSIMs of 0.68 were achieved when training the cGAN with three-channel RGB image data. With the five-channel input data (incorporating optical flow), these values improved to 0.67 and 0.74, respectively. Applying Fleiss' κ gave a score of 0.40 for the three-channel RGB data, and 0.55 for the five-channel optical-flow-added data. We are confident the predicted synthetic frames are of sufficient quality and reliability to be presented to teleoperators as a video feed that will enhance teleoperation. To the best of our knowledge, we are the first to attempt to reduce the impacts of latency through future frame prediction using deep neural networks
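A minimal sketch of how a five-channel input can be assembled (RGB plus two dense optical flow channels), assuming OpenCV and NumPy. Note the assumptions: the paper's flow is synthetically generated from control inputs, whereas this stand-in computes Farneback flow between consecutive frames, and the Farneback parameters are placeholders. The cGAN's first convolution would then take in_channels=5.

```python
# Sketch only: stack dense optical flow (u, v) onto the RGB frame to form a
# five-channel conditional input for an image-to-image translation network.
import cv2
import numpy as np

def five_channel_input(prev_bgr: np.ndarray, curr_bgr: np.ndarray) -> np.ndarray:
    """Return an HxWx5 array: current RGB frame plus (u, v) dense optical flow."""
    prev_gray = cv2.cvtColor(prev_bgr, cv2.COLOR_BGR2GRAY)
    curr_gray = cv2.cvtColor(curr_bgr, cv2.COLOR_BGR2GRAY)
    flow = cv2.calcOpticalFlowFarneback(prev_gray, curr_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)  # HxWx2
    rgb = cv2.cvtColor(curr_bgr, cv2.COLOR_BGR2RGB).astype(np.float32) / 255.0
    return np.concatenate([rgb, flow.astype(np.float32)], axis=2)
```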
Pre-text Representation Transfer for Deep Learning with Limited Imbalanced Data: Application to CT-based COVID-19 Detection
Annotating medical images for disease detection is often tedious and expensive. Moreover, the available training samples for a given task are generally scarce and imbalanced. These conditions are not conducive to learning effective deep neural models. Hence, it is common to 'transfer' neural networks trained on natural images to the medical image domain. However, this paradigm suffers in performance due to the large domain gap between natural and medical image data. To address that, we propose the novel concept of Pre-text Representation Transfer (PRT). In contrast to conventional transfer learning, which fine-tunes a source model after replacing its classification layers, PRT retains the original classification layers and updates the representation layers through an unsupervised pre-text task. The task is performed with original (not synthetic) medical images, without utilizing any annotations. This enables representation transfer with a large amount of training data. This high-fidelity representation transfer allows us to use the resulting model as a more effective feature extractor. Moreover, we can also subsequently perform traditional transfer learning with this model. We devise a collaborative-representation-based classification layer for the case where we leverage the model as a feature extractor. We fuse the output of this layer with the predictions of a model induced with traditional transfer learning performed over our pre-text transferred model. The utility of our technique for the limited and imbalanced data classification problem is demonstrated with an extensive five-fold evaluation of three large-scale models, tested for five different class-imbalance ratios for CT-based COVID-19 detection. Our results show a consistent gain over conventional transfer learning with the proposed method. Comment: Best paper at IVCN
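To make the PRT idea concrete, here is a minimal sketch (assuming PyTorch and a rotation-prediction pretext task; the specific pretext task is our assumption, not necessarily the paper's): the original classification head is kept frozen while the representation layers are updated on unlabeled medical images.

```python
# Sketch: keep the classification layers, update the representation layers via
# an unsupervised pretext task (here: predicting which of 4 rotations was applied).
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
for p in model.fc.parameters():          # retain the original classification layers
    p.requires_grad = False

pretext_head = nn.Linear(model.fc.in_features, 4)
backbone = nn.Sequential(*list(model.children())[:-1], nn.Flatten())

optimizer = torch.optim.Adam(
    list(backbone.parameters()) + list(pretext_head.parameters()), lr=1e-4)
criterion = nn.CrossEntropyLoss()

def pretext_step(images):
    """images: unlabeled CT slices resized to a square, shape (B, 3, S, S)."""
    k = torch.randint(0, 4, (images.size(0),), device=images.device)
    rotated = torch.stack([torch.rot90(img, int(r), dims=(1, 2))
                           for img, r in zip(images, k)])
    loss = criterion(pretext_head(backbone(rotated)), k)
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return loss.item()
```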
Multimodal fusion for audio-image and video action recognition
Multimodal Human Action Recognition (MHAR) is an important research topic in the computer vision and event recognition fields. In this work, we address the problem of MHAR by developing a novel audio-image and video fusion-based deep learning framework that we call the Multimodal Audio-Image and Video Action Recognizer (MAiVAR). We extract temporal information using image representations of audio signals and spatial information from the video modality with the help of Convolutional Neural Network (CNN)-based feature extractors, and fuse these features to recognize the respective action classes. We apply a high-level weight-assignment algorithm to improve audio-visual interaction and convergence. The proposed fusion-based framework utilizes the influence of the audio and video feature maps and uses them to classify an action. Compared with state-of-the-art audio-visual MHAR techniques, the proposed approach features a simpler yet more accurate and more generalizable architecture, one that performs better with different audio-image representations. The system achieves accuracies of 87.9% and 79.0% on the UCF51 and Kinetics Sounds datasets, respectively. All code and models for this paper will be available at https://tinyurl.com/4ps2ux6n
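A minimal sketch of weighted audio-visual feature fusion, assuming PyTorch; the feature dimensions, projection sizes, and softmax weighting scheme are illustrative assumptions rather than the MAiVAR architecture itself.

```python
# Sketch: fuse audio-image and video features with learned modality weights
# before classification.
import torch
import torch.nn as nn

class WeightedFusionClassifier(nn.Module):
    def __init__(self, audio_dim=2048, video_dim=2048, num_classes=51):
        super().__init__()
        self.w = nn.Parameter(torch.ones(2))      # learnable modality weights
        self.audio_proj = nn.Linear(audio_dim, 512)
        self.video_proj = nn.Linear(video_dim, 512)
        self.classifier = nn.Linear(512, num_classes)

    def forward(self, audio_feat, video_feat):
        weights = torch.softmax(self.w, dim=0)
        fused = weights[0] * self.audio_proj(audio_feat) \
              + weights[1] * self.video_proj(video_feat)
        return self.classifier(torch.relu(fused))
```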
Going Deep in Medical Image Analysis: Concepts, Methods, Challenges and Future Directions
Medical Image Analysis is currently experiencing a paradigm shift due to Deep Learning. This technology has recently attracted so much interest from the Medical Imaging community that it led to a specialized conference, 'Medical Imaging with Deep Learning', in 2018. This article surveys the recent developments in this direction and provides a critical review of the related major aspects. We organize the reviewed literature according to the underlying Pattern Recognition tasks, and further sub-categorize it following a taxonomy based on human anatomy. This article does not assume prior knowledge of Deep Learning and makes a significant contribution in explaining the core Deep Learning concepts to non-experts in the Medical community. Unique to this study is the Computer Vision/Machine Learning perspective taken on the advances of Deep Learning in Medical Imaging. This enables us to single out the 'lack of appropriately annotated large-scale datasets' as the core challenge (among other challenges) in this research direction. We draw on insights from the sister research fields of Computer Vision, Pattern Recognition and Machine Learning, where techniques for dealing with such challenges have already matured, to provide promising directions for the Medical Imaging community to fully harness Deep Learning in the future
PyMAiVAR: An open-source Python suite for audio-image representation in human action recognition
We present PyMAiVAR, a versatile toolbox for generating image representations of audio data, including Wave plots, Spectral Centroids, Spectral Roll-offs, Mel Frequency Cepstral Coefficients (MFCC), MFCC Feature Scaling, and Chromagrams. This wide-ranging toolkit produces rich audio-image representations, playing a pivotal role in reshaping human action recognition. By fully exploiting the latent potential of audio data, PyMAiVAR stands as a significant advancement in the field. The package is implemented in Python and can be used across different operating systems
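As a rough sketch of one such representation (not PyMAiVAR's actual code), the following uses librosa and matplotlib to render the MFCCs of an audio clip as an image; the figure size and number of coefficients are arbitrary choices.

```python
# Sketch: turn an audio file into an MFCC heatmap image for use as a CNN input.
import librosa
import librosa.display
import matplotlib.pyplot as plt

def mfcc_image(wav_path: str, out_path: str, n_mfcc: int = 40) -> None:
    y, sr = librosa.load(wav_path, sr=None)          # keep native sample rate
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    fig, ax = plt.subplots(figsize=(4, 4))
    librosa.display.specshow(mfcc, sr=sr, x_axis="time", ax=ax)
    ax.set_axis_off()                                 # keep only the heatmap
    fig.savefig(out_path, bbox_inches="tight", pad_inches=0)
    plt.close(fig)
```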
Conjoint utilization of structured and unstructured information for planning interleaving deliberation in supply chains
Effective business planning requires seamless access to and intelligent analysis of information in its totality, allowing the business planner to gain enhanced critical business insights for decision support. Current business planning tools provide insights from structured business data only (i.e. sales forecasts, customer and product data, inventory details) and fail to take into account unstructured complementary information residing in contracts, reports, users' comments, emails, etc. In this article, a planning support system is designed and developed that empowers business planners to develop and revise business plans utilizing both structured data and unstructured information conjointly. The planning system's activity model comprises two steps. First, a business planner develops a candidate plan using a planning template. Second, the candidate plan is put forward to collaborating partners for revision interleaving deliberation. The planning-interleaving-deliberation activity in the proposed framework enables collaborating planners to challenge both a decision in the candidate plan and the thinking that underpins it. The planning system is modeled using situation calculus and is validated through a prototype development
SAM-SoS: A stochastic software architecture modeling and verification approach for complex System-of-Systems
A System-of-Systems (SoS) is a complex, dynamic system whose Constituent Systems (CSs) are not known precisely at design time and whose operating environment is uncertain. SoS behavior is unpredictable due to underlying architectural characteristics such as autonomy and independence. Although the stochastic composition of CSs is vital to achieving SoS missions, their unknown behaviors and their impact on system properties are unavoidable. Moreover, unknown conditions and volatility have significant effects on crucial Quality Attributes (QAs) such as performance, reliability and security. Hence, the structure and behavior of a SoS must be modeled and validated quantitatively to foresee any potential impact on the properties critical for achieving its missions. Current modeling approaches lack the essential syntax and semantics required to model and verify SoS behaviors at design time and cannot offer alternative design choices for better design decisions. Consequently, the majority of existing techniques fail to provide qualitative and quantitative verification of SoS architecture models. We therefore propose an approach to model and verify Non-Deterministic (ND) SoS in advance by extending current algebraic notations for formal models into a hybrid stochastic formalism to specify and reason about architectural elements with the required semantics. A formal stochastic model is developed using a hybrid approach for architectural descriptions of SoS with behavioral constraints. Through a model-driven approach, the stochastic models are then translated into PRISM using formal verification rules. The effectiveness of the approach has been tested with an end-to-end case study of an emergency response SoS for dealing with a fire situation. Architectural analysis is conducted on the stochastic model using various qualitative and quantitative measures for SoS missions. Experimental results reveal critical aspects of the SoS architecture model that facilitate better achievement of missions and QAs with improved design, using the proposed approach
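To illustrate the kind of quantitative question such a stochastic architecture model answers, here is a toy sketch (our own example, unrelated to the paper's PRISM models): the probability of eventually reaching a mission-success state in a small discrete-time Markov chain, computed by fixed-point iteration with NumPy. The states and transition probabilities are invented.

```python
# Toy example: reachability probability in a small discrete-time Markov chain.
import numpy as np

# States: 0 = idle, 1 = responding, 2 = mission accomplished, 3 = failed
P = np.array([
    [0.2, 0.8, 0.0, 0.0],
    [0.0, 0.3, 0.6, 0.1],
    [0.0, 0.0, 1.0, 0.0],   # absorbing success state
    [0.0, 0.0, 0.0, 1.0],   # absorbing failure state
])

def prob_reach(P, target, iters=1000):
    """P(eventually reach `target`) from every state, by value iteration."""
    p = np.zeros(P.shape[0])
    p[target] = 1.0
    for _ in range(iters):
        p_new = P @ p
        p_new[target] = 1.0
        p = p_new
    return p

print(prob_reach(P, target=2))   # probability of mission success from each state
```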
ECU-MSS-2 dataset, a new multi-species seagrass dataset
The ECU-MSS-2 dataset contains four classes: 'Amphibolis' spp. (hereafter 'Amphibolis'), 'Halophila' spp. (hereafter 'Halophila'), 'Posidonia' spp. (hereafter 'Posidonia') and 'Background'. We compiled this image dataset from different sources. The 'Halophila' images were collected by the Centre for Marine Ecosystems Research, Edith Cowan University, Western Australia.
The 'Amphibolis', 'Background' and 'Posidonia' images were collected by the Department of Biodiversity, Conservation and Attractions (DBCA), Australia. The 'Amphibolis' class includes the seagrass species Amphibolis griffithii and Amphibolis antarctica. The 'Halophila' class includes Halophila ovalis, while the 'Posidonia' class includes Posidonia sinuosa, Posidonia coriacea, and Posidonia australis. The 'Background' class includes coral, sand, sponge, seaweeds, fish and other benthic debris.
The dataset contains a total of 5,201 images: the 'Amphibolis' class has 1,304 images, the 'Background' class has 1,237 images, the 'Halophila' class has 1,315 images and the 'Posidonia' class has 1,345 images. The images were divided into training and test sets. The training set has 4,161 images and the test set has 1,040 images. Each of the four classes has 260 images in the test set
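A minimal loading sketch for a dataset split like this, assuming the images are arranged in a class-per-folder layout (a hypothetical directory structure, not necessarily how ECU-MSS-2 is distributed), using torchvision's ImageFolder.

```python
# Sketch: load a four-class train/test split arranged as <root>/<class>/<image>.jpg
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

tfm = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

train_set = datasets.ImageFolder("ECU-MSS-2/train", transform=tfm)
test_set = datasets.ImageFolder("ECU-MSS-2/test", transform=tfm)

train_loader = DataLoader(train_set, batch_size=32, shuffle=True)
test_loader = DataLoader(test_set, batch_size=32)

print(train_set.classes)   # e.g. ['Amphibolis', 'Background', 'Halophila', 'Posidonia']
```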