1,140 research outputs found

    A Study on Visual Attention Model Design for Salient Region Detection

    Visual attention is an important mechanism in the human visual system. When humans observe images and videos, they usually do not describe all of the content. Instead, they tend to talk about the semantically important regions and objects. The human eye is usually attracted by regions of interest rather than by the entire scene. These regions of interest, which carry most of the meaningful or semantic content, are called salient regions. Visual saliency detection refers to the use of intelligent algorithms to simulate the human visual attention mechanism, extract both low-level features and high-level semantic information, and localize the salient object regions in images and videos. The generated saliency map indicates the regions that are likely to attract human attention. As a fundamental problem in image processing and computer vision, visual saliency detection has been studied extensively to solve practical tasks such as image and video compression, image retargeting, and object detection. The visual attention mechanisms adopted for saliency detection are generally divided into two categories: bottom-up models and top-down models. Bottom-up attention algorithms use low-level visual features such as colour and edges to locate salient objects, while top-down attention uses supervised learning to detect saliency. In recent years, a growing body of research has designed deep neural networks with attention mechanisms to improve the accuracy of saliency detection. The design of deep attention neural networks is inspired by human visual attention. The main goal is to enable the network to automatically capture the information that is critical to the target task and suppress irrelevant information, shifting attention from the whole input to local regions.
Attention mechanisms for various domains have been developed for saliency detection and semantic segmentation. For example, the spatial attention module in a convolutional network generates a spatial attention map from the inter-spatial relationships of features, and the channel attention module produces an attention map by exploring the inter-channel relationships of features. These well-designed attention modules have proven effective in improving the accuracy of saliency detection. This work investigates the visual attention mechanism of salient object detection and applies it to digital histopathology image analysis for the detection and classification of breast cancer metastases. The research consists of three main parts. First, we studied the semantic attention mechanism and proposed a semantic attention approach to accurately localize salient objects in complex scenes. The proposed semantic attention uses Faster-RCNN to capture high-level deep features and replaces the last layer of Faster-RCNN with a fully connected (FC) layer and a sigmoid function for visual saliency detection; it computes each proposal's attention probability by comparing its feature distance to the likely salient object. The method introduces a re-weighting mechanism to reduce the influence of complex backgrounds, and a proposal selection mechanism to remove background noise and obtain objects with accurate shape and contour. The simulation results show that the semantic attention mechanism is robust to images with complex backgrounds because it considers high-level object concepts; the algorithm achieved outstanding performance among salient object detection algorithms of the same period. Second, we designed a deep segmentation network (DSNet) for salient object prediction. We explored a Pyramidal Attentional ASPP (PA-ASPP) module that provides pixel-level attention.
DSNet extracts multi-level features with a dilated ResNet-101, and the multi-scale contextual information is locally weighted by the proposed PA-ASPP. The pyramid feature aggregation encodes the multi-level features at three different scales. This feature fusion incorporates neighbouring scales of context features more precisely to produce better pixel-level attention. In the last stage of the network, we use a scale-aware selection (SAS) module to locally weight multi-scale contextual features and capture the important contexts of ASPP for accurate and consistent dense prediction. The simulation results demonstrate that the proposed PA-ASPP is effective and generates more coherent results. In addition, with the SAS module the model adaptively captures regions of different scales. Third, building on this research on attention mechanisms, we proposed a novel Deep Regional Metastases Segmentation (DRMS) framework for the detection and classification of breast cancer metastases. A digitized whole slide image (WSI) has very high resolution, usually at the gigapixel scale, but the abnormal region is often relatively small and most of the slide is normal. Highly trained pathologists usually first localize the regions of interest in the whole slide and then examine the selected regions precisely. Even so, the process is time-consuming and prone to missed diagnoses. Through observation and analysis, we believe that visual attention is well suited to digital pathology image analysis. The integrated framework for WSI analysis captures the granularity and variability of WSIs and the rich information in multi-grained pathological images. We first use the proposed attention-based DSNet to detect regional metastases at the patch level. We then adopt Density-Based Spatial Clustering of Applications with Noise (DBSCAN) to predict the whole metastases from individual slides.
Finally, we determine patient-level pN-stages by aggregating the individual slide-level predictions. By combining these techniques, the framework makes better use of the multi-grained information in histological lymph node sections of whole-slide images. Experiments on large-scale clinical datasets (e.g., CAMELYON17) demonstrate that our method delivers advanced performance and provides consistent and accurate metastasis detection.
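
The clustering step described in this abstract, in which positive patch-level detections are grouped into whole metastases with DBSCAN, might look roughly like the sketch below. It is a minimal NumPy implementation of the generic DBSCAN algorithm, not the authors' code. The patch coordinates, `eps`, and `min_samples` are hypothetical values chosen for illustration, since the abstract gives no parameters.

```python
import numpy as np


def dbscan(points, eps, min_samples):
    """Minimal DBSCAN. Returns one label per point; -1 marks noise."""
    points = np.asarray(points, dtype=float)
    n = len(points)
    # Pairwise Euclidean distances (acceptable for a small illustrative set).
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    # Each neighbourhood includes the point itself.
    neighbors = [np.flatnonzero(dists[i] <= eps) for i in range(n)]
    labels = np.full(n, -1)
    visited = np.zeros(n, dtype=bool)
    cluster = 0
    for i in range(n):
        if visited[i]:
            continue
        visited[i] = True
        if len(neighbors[i]) < min_samples:
            continue  # not a core point; stays noise unless a cluster claims it
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border or core point joins the cluster
            if visited[j]:
                continue
            visited[j] = True
            if len(neighbors[j]) >= min_samples:
                queue.extend(neighbors[j])  # expand only through core points
        cluster += 1
    return labels


# Hypothetical centres (in patch-grid units) of patches flagged as metastatic.
patch_centres = [(0, 0), (0, 1), (1, 0), (1, 1),
                 (10, 10), (10, 11), (11, 10),
                 (30, 30)]
labels = dbscan(patch_centres, eps=1.5, min_samples=3)
for cluster_id in sorted(set(labels.tolist()) - {-1}):
    size = int(np.sum(labels == cluster_id))
    print(f"candidate region {cluster_id}: {size} patches")
```

Isolated detections are labelled as noise, so single false-positive patches do not form a region of their own. How cluster sizes are mapped to slide-level labels and pN-stages is not stated in the abstract and is left out here.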

    Underwater target detection based on improved YOLOv7

    Underwater target detection is a crucial aspect of ocean exploration. However, conventional underwater target detection methods face several challenges, such as inaccurate feature extraction, slow detection speed, and lack of robustness in complex underwater environments. To address these limitations, this study proposes an improved YOLOv7 network (YOLOv7-AC) for underwater target detection. The proposed network uses an ACmixBlock module to replace the 3x3 convolution block in the E-ELAN structure, and incorporates jump connections and a 1x1 convolution architecture between ACmixBlock modules to improve feature extraction and network inference speed. Additionally, a ResNet-ACmix module is designed to avoid feature information loss and reduce computation, while a Global Attention Mechanism (GAM) is inserted in the backbone and head parts of the model to improve feature extraction. Furthermore, the K-means++ algorithm is used instead of K-means to obtain anchor boxes and enhance model accuracy. Experimental results show that the improved YOLOv7 network outperforms the original YOLOv7 model and other popular underwater target detection methods. The proposed network achieved mean average precision (mAP) values of 89.6% and 97.4% on the URPC dataset and the Brackish dataset, respectively, and ran at a higher frame rate (frames per second, FPS) than the original YOLOv7 model. The source code for this study is publicly available at https://github.com/NZWANG/YOLOV7-AC. In conclusion, the improved YOLOv7 network proposed in this study represents a promising solution for underwater target detection and holds great potential for practical applications in various underwater tasks.
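
As a rough illustration of the anchor-box step mentioned in this abstract, the sketch below clusters ground-truth box sizes with K-means++ seeding. It is not taken from the linked repository. The `1 - IoU` distance between (width, height) pairs is a common choice in YOLO-style anchor clustering and is assumed here, and the function names and example box sizes are hypothetical.

```python
import numpy as np


def wh_iou(boxes, centers):
    """IoU between (w, h) pairs, treating all boxes as sharing one corner."""
    inter = (np.minimum(boxes[:, None, 0], centers[None, :, 0])
             * np.minimum(boxes[:, None, 1], centers[None, :, 1]))
    area_b = boxes[:, 0] * boxes[:, 1]
    area_c = centers[:, 0] * centers[:, 1]
    return inter / (area_b[:, None] + area_c[None, :] - inter)


def kmeanspp_anchors(boxes, k, iters=50, seed=0):
    """Cluster (w, h) pairs into k anchors; distance is assumed to be 1 - IoU."""
    rng = np.random.default_rng(seed)
    boxes = np.asarray(boxes, dtype=float)
    # K-means++ seeding: first centre uniformly at random, each further centre
    # sampled with probability proportional to squared distance to the nearest
    # centre chosen so far.
    centers = [boxes[rng.integers(len(boxes))]]
    for _ in range(k - 1):
        d2 = (1.0 - wh_iou(boxes, np.array(centers)).max(axis=1)) ** 2
        total = d2.sum()
        probs = d2 / total if total > 0 else np.full(len(boxes), 1.0 / len(boxes))
        centers.append(boxes[rng.choice(len(boxes), p=probs)])
    centers = np.array(centers)
    # Standard Lloyd iterations after seeding.
    for _ in range(iters):
        assign = wh_iou(boxes, centers).argmax(axis=1)
        updated = np.array([
            boxes[assign == j].mean(axis=0) if np.any(assign == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(updated, centers):
            break
        centers = updated
    return centers[np.argsort(centers.prod(axis=1))]  # sorted by area


# Hypothetical ground-truth box sizes in pixels.
example_boxes = [[12, 10], [10, 11], [11, 12],
                 [48, 30], [52, 33], [50, 28],
                 [100, 118], [96, 125], [104, 120]]
print(kmeanspp_anchors(example_boxes, k=3))
```

Compared with plain K-means, only the seeding differs: spreading the initial centres apart makes the result less sensitive to an unlucky random start.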

    Synthetic Aperture Radar (SAR) Meets Deep Learning

    This reprint focuses on applications that combine synthetic aperture radar with deep learning technology. It aims to further promote the development of intelligent SAR image interpretation. A synthetic aperture radar (SAR) is an important active microwave imaging sensor whose all-day, all-weather working capacity gives it an important place in the remote sensing community. Since the United States launched the first SAR satellite, SAR has received much attention in the remote sensing community, e.g., in geological exploration, topographic mapping, disaster forecasting, and traffic monitoring. It is valuable and meaningful, therefore, to study SAR-based remote sensing applications. In recent years, deep learning, represented by convolutional neural networks, has driven significant progress in the computer vision community, e.g., in face recognition, driverless vehicles, and the Internet of Things (IoT). Deep learning enables computational models with multiple processing layers to learn data representations at multiple levels of abstraction. This can greatly improve the performance of various applications. This reprint provides a platform for researchers to address these challenges and present their innovative and cutting-edge research results on applying deep learning to SAR in various manuscript types, e.g., articles, letters, reviews, and technical reports.

    Collaborative Artificial Intelligence Algorithms for Medical Imaging Applications

    In this dissertation, we propose novel machine learning algorithms for high-risk medical imaging applications. Specifically, we tackle current challenges in the radiology screening process and introduce cutting-edge methods for image-based diagnosis, detection, and segmentation. We incorporate expert knowledge through eye-tracking, making the whole process human-centered. This dissertation contributes to machine learning, computer vision, and medical imaging research by: 1) introducing a mathematical formulation of radiologists' level of attention, and sparsifying their gaze data for better extraction and comparison of search patterns; 2) proposing novel local and global image analysis algorithms. Imaging-based diagnosis and pattern analysis are high-risk Artificial Intelligence applications. A standard radiology screening procedure includes detection, diagnosis, and measurement (often done with segmentation) of abnormalities. We hypothesize that true collaboration is essential for a better control mechanism in such applications. In this regard, we propose to form a collaboration medium between radiologists and machine learning algorithms through eye-tracking. Further, we build a generic platform consisting of novel machine learning algorithms for each of these tasks. Our collaborative algorithm uses eye-tracking and includes an attention model and gaze-pattern analysis, based on data clustering and graph sparsification. Then, we present a semi-supervised multi-task network for local analysis of images in radiologists' ROIs, extracted in the previous step. To address missed tumors and analyze regions that radiologists miss completely during screening, we introduce a detection framework, S4ND: Single Shot Single Scale Lung Nodule Detection. Our proposed detection algorithm is specifically designed to handle tiny abnormalities in the lungs, which are easy for radiologists to miss.
Finally, we introduce a novel projective adversarial framework, PAN: Projective Adversarial Network for Medical Image Segmentation, for segmenting complex 3D structures/organs, which can benefit the screening process by guiding radiologists' search areas through segmentation of the desired structure/organ.

    Machine Learning Algorithms for Robotic Navigation and Perception and Embedded Implementation Techniques

    The abstract is in the attachment.

    Medical Image Segmentation with Deep Convolutional Neural Networks

    Medical imaging is the technique and process of creating visual representations of a patient's body for clinical analysis and medical intervention. Healthcare professionals rely heavily on medical images and image documentation for proper diagnosis and treatment. However, manual interpretation and analysis of medical images is time-consuming, and inaccurate when the interpreter is not well trained. Fully automatic segmentation of the region of interest from medical images has been researched for years to improve the efficiency and accuracy of understanding such images. With the advance of deep learning, various neural network models have achieved great success in semantic segmentation and sparked research interest in medical image segmentation using deep learning. We propose three convolutional frameworks to segment tissues from different types of medical images. Comprehensive experiments and analyses are conducted on various segmentation neural networks to demonstrate the effectiveness of our methods. Furthermore, the datasets built for training our networks and the full implementations are published.

    A review of technical factors to consider when designing neural networks for semantic segmentation of Earth Observation imagery

    Semantic segmentation (classification) of Earth Observation imagery is a crucial task in remote sensing. This paper presents a comprehensive review of technical factors to consider when designing neural networks for this purpose. The review focuses on Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), Generative Adversarial Networks (GANs), and transformer models, discussing prominent design patterns for these ANN families and their implications for semantic segmentation. Common pre-processing techniques for ensuring optimal data preparation are also covered. These include methods for image normalization and chipping, as well as strategies for addressing data imbalance in training samples, and techniques for overcoming limited data, including augmentation techniques, transfer learning, and domain adaptation. By encompassing both the technical aspects of neural network design and the data-related considerations, this review provides researchers and practitioners with a comprehensive and up-to-date understanding of the factors involved in designing effective neural networks for semantic segmentation of Earth Observation imagery. Comment: 145 pages with 32 figures
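
The normalization and chipping steps named in this abstract can be illustrated with the short sketch below. It is one common way to do both, not a procedure taken from the review. Per-band standardization, the chip size, the stride, and the choice to drop partial edge chips are all assumptions made for the example.

```python
import numpy as np


def normalize_per_band(image, eps=1e-8):
    """Standardise each band of an (H, W, C) array to zero mean, unit variance."""
    image = image.astype(np.float32)
    mean = image.mean(axis=(0, 1), keepdims=True)
    std = image.std(axis=(0, 1), keepdims=True)
    return (image - mean) / (std + eps)


def chip(image, size, stride):
    """Cut an (H, W, C) array into size x size chips.

    Chips overlap when stride < size; partial chips at the edges are dropped.
    """
    height, width = image.shape[:2]
    chips = [image[r:r + size, c:c + size]
             for r in range(0, height - size + 1, stride)
             for c in range(0, width - size + 1, stride)]
    return np.stack(chips)


# Hypothetical 4-band scene filled with random values.
rng = np.random.default_rng(0)
scene = rng.integers(0, 10000, size=(256, 384, 4)).astype(np.uint16)
chips = chip(normalize_per_band(scene), size=128, stride=64)
print(chips.shape)
```

Normalizing before chipping applies the same statistics to every chip of a scene. Computing statistics over the whole training set instead is an equally common alternative.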

    Improving Classification in Single and Multi-View Images

    Image classification is a sub-field of computer vision that focuses on identifying objects within digital images. To improve image classification, we must address the following areas: 1) single- and multi-view data quality, using data pre-processing techniques; 2) deep feature learning, to extract alternative representations of the data; 3) the decision or prediction of labels. This dissertation presents a series of four published papers that explore different improvements to image classification. In our first paper, we explore the Siamese network architecture to create a similarity metric based on a Convolutional Neural Network. We learn the priority features that differentiate two given input images. The proposed metric achieves a state-of-the-art Fβ measure. In our second paper, we explore multi-view data classification. We investigate the application of Generative Adversarial Networks (GANs) to multi-view image classification and few-shot learning. Experimental results show that our method outperforms state-of-the-art research. In our third paper, we take on the challenge of improving the ResNet backbone model. For this task, we focus on improving channel attention mechanisms. We use Discrete Wavelet Transform compression to address the channel representation problem. Experimental results on ImageNet show that our method outperforms the baseline SENet-34 and the SOTA FcaNet-34 at no extra computational cost. In our fourth paper, we further investigate the potential of orthogonal filters for extracting diverse information for channel attention. We show that random constant orthogonal filters alone are sufficient to achieve good channel attention. We test our proposed method on the ImageNet, Places365, and Birds datasets for image classification, and on MS-COCO for object detection and instance segmentation. Our method outperforms FcaNet and WaveNet and achieves state-of-the-art results.
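
For context on the channel attention mechanisms this abstract sets out to improve, the sketch below shows the baseline squeeze-and-excitation pattern in plain NumPy. It is the generic SE formulation with a global-average-pooling squeeze, not the dissertation's wavelet-based or orthogonal-filter method. The weight matrices are random placeholders standing in for learned parameters, and the reduction ratio is an assumed value.

```python
import numpy as np


def se_channel_attention(x, w1, w2):
    """Squeeze-and-excitation style channel attention.

    x:  feature map of shape (C, H, W)
    w1: reduction weights of shape (C // r, C)
    w2: expansion weights of shape (C, C // r)
    Returns the re-weighted feature map and the per-channel scale.
    """
    squeeze = x.mean(axis=(1, 2))                   # global average pooling, (C,)
    hidden = np.maximum(w1 @ squeeze, 0.0)          # ReLU bottleneck, (C // r,)
    scale = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))    # sigmoid gate, (C,)
    return x * scale[:, None, None], scale


# Random placeholders in place of learned weights (C = 8, reduction r = 4).
rng = np.random.default_rng(0)
features = rng.normal(size=(8, 5, 5))
w_reduce = rng.normal(size=(2, 8))
w_expand = rng.normal(size=(8, 2))
weighted, channel_scale = se_channel_attention(features, w_reduce, w_expand)
print(channel_scale.round(3))
```

The squeeze step is the part that methods such as FcaNet, and the wavelet-based approach described above, replace with a richer per-channel descriptor than a single average.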