
    A Study on the Design of Visual Attention Models for Salient Region Detection

    Visual attention is an important mechanism in the human visual system. When humans observe images and videos, they usually do not describe all of the content. Instead, they tend to talk about the semantically important regions and objects. The human eye is usually attracted by certain regions of interest rather than the entire scene. These regions, which carry the main meaningful or semantic content, are called salient regions. Visual saliency detection refers to the use of intelligent algorithms to simulate the human visual attention mechanism, extract both low-level features and high-level semantic information, and localize the salient object regions in images and videos. The generated saliency map indicates the regions that are likely to attract human attention. As a fundamental problem of image processing and computer vision, visual saliency detection has been extensively studied to solve practical tasks such as image and video compression, image retargeting, and object detection. The visual attention mechanisms adopted by saliency detection are generally divided into two categories: bottom-up models and top-down models. Bottom-up attention algorithms use low-level visual features such as colour and edges to locate salient objects, while top-down attention uses supervised learning to detect saliency. In recent years, a growing body of research has designed deep neural networks with attention mechanisms to improve the accuracy of saliency detection. The design of deep attention neural networks is inspired by human visual attention. The main goal is to enable the network to automatically capture the information that is critical to the target task and suppress irrelevant information, shifting attention from the whole input to local regions.
Attention over various domains has been developed for saliency detection and semantic segmentation. For example, the spatial attention module in a convolutional network generates a spatial attention map by exploiting the inter-spatial relationships of features, while the channel attention module produces an attention map by exploring the inter-channel relationships of features. These well-designed attention modules have been proven effective in improving the accuracy of saliency detection. This paper investigates the visual attention mechanism of salient object detection and applies it to digital histopathology image analysis for the detection and classification of breast cancer metastases. The research consists of three main parts. First, we studied the semantic attention mechanism and proposed a semantic attention approach to accurately localize salient objects in complex scenarios. The proposed semantic attention uses Faster-RCNN to capture high-level deep features and replaces the last layer of Faster-RCNN with an FC layer and a sigmoid function for visual saliency detection; it calculates the attention probabilities of proposals by comparing their feature distances with the possible salient object. The method introduces a re-weighting mechanism to reduce the influence of complex backgrounds, and a proposal selection mechanism to remove background noise and obtain objects with accurate shape and contour. The simulation results show that the semantic attention mechanism is robust to images with complex backgrounds because it takes high-level object concepts into account; the algorithm achieved outstanding performance among the salient object detection algorithms of the same period. Second, we designed a deep segmentation network (DSNet) for salient object prediction. We explored a Pyramidal Attentional ASPP (PA-ASPP) module, which provides pixel-level attention.
DSNet extracts multi-level features with a dilated ResNet-101, and the multi-scale contextual information is locally weighted with the proposed PA-ASPP. The pyramid feature aggregation encodes the multi-level features at three different scales. This feature fusion incorporates neighboring scales of context features more precisely to produce better pixel-level attention. Finally, we use a scale-aware selection (SAS) module to locally weight multi-scale contextual features and capture the important contexts of ASPP for accurate and consistent dense prediction. The simulation results demonstrate that the proposed PA-ASPP is effective and generates more coherent results. In addition, with SAS the model can adaptively capture regions at different scales. Finally, building on this research on attention mechanisms, we proposed a novel Deep Regional Metastases Segmentation (DRMS) framework for the detection and classification of breast cancer metastases. A digitized whole slide image (WSI) has a high resolution, usually at the gigapixel level, yet the abnormal region is often relatively small and most of the slide is normal. Highly trained pathologists usually first localize the regions of interest in the whole slide and then examine the selected regions precisely. Even so, the process is time-consuming and prone to missed diagnoses. Based on observation and analysis, we believe that visual attention is well suited to digital pathology image analysis. The integrated framework for WSI analysis can capture the granularity and variability of WSIs and the rich information in multi-grained pathological images. We first use the proposed attention-based DSNet to detect regional metastases at the patch level. We then adopt Density-Based Spatial Clustering of Applications with Noise (DBSCAN) to predict the whole metastases from individual slides.
Finally, we determine patient-level pN-stages by aggregating the individual slide-level predictions. By combining these techniques, the framework can make better use of the multi-grained information in histological lymph node sections of whole slide images. Experiments on large-scale clinical datasets (e.g., CAMELYON17) demonstrate that our method delivers advanced performance and provides consistent and accurate metastasis detection.
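
The clustering step mentioned in this abstract can be illustrated with a minimal, pure-Python DBSCAN sketch. It is not the authors' implementation: the function name `dbscan`, the representation of positive patches as 2-D grid coordinates, and the values of `eps` and `min_pts` are all assumptions made for illustration only.

```python
from math import hypot

def dbscan(points, eps, min_pts):
    """Label each 2-D point with a cluster id (0, 1, ...) or -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbours(i):
        # Indices of all points within eps of point i (including i itself).
        xi, yi = points[i]
        return [j for j, (xj, yj) in enumerate(points)
                if hypot(xi - xj, yi - yj) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE          # may later be relabelled as a border point
            continue
        cluster += 1                   # point i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # noise reachable from a core point is a border point
            if labels[j] is not UNVISITED:
                continue               # already assigned to a cluster
            labels[j] = cluster
            nbrs = neighbours(j)
            if len(nbrs) >= min_pts:   # only core points expand the cluster
                queue.extend(nbrs)
    return labels

# Hypothetical grid coordinates of patches flagged as metastatic.
patches = [(0, 0), (0, 1), (1, 0), (1, 1),
           (10, 10), (10, 11), (11, 10),
           (30, 30)]
print(dbscan(patches, eps=1.5, min_pts=3))
```

In this toy input the two dense groups of patches become two clusters and the isolated patch is marked as noise, which is the behaviour that makes a density-based method attractive for merging patch-level detections into candidate lesions.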

    Deep Networks Based Energy Models for Object Recognition from Multimodality Images

    Object recognition has been extensively investigated in computer vision, since it is a fundamental and essential technique in many important applications, such as robotics, autonomous driving, automated manufacturing, and security surveillance. According to the selection criteria, object recognition mechanisms can be broadly categorized into object proposal and classification, eye fixation prediction, and salient object detection. Object proposal methods aim to capture all potential objects in natural images and then classify them into predefined groups for image description and interpretation. For a given natural image, human perception is normally attracted to the most visually important regions or objects. Eye fixation prediction therefore attempts to localize interesting points or small regions according to the human visual system (HVS). Based on these points and regions, salient object detection algorithms propagate the extracted information to achieve a refined segmentation of the whole salient objects. In addition to natural images, object recognition also plays a critical role in clinical practice. Insights into the anatomy and function of the human body obtained from multimodality biomedical images, such as magnetic resonance imaging (MRI), transrectal ultrasound (TRUS), computed tomography (CT), and positron emission tomography (PET), facilitate precision medicine. Automated object recognition from biomedical images enables non-invasive diagnosis and treatment via automated tissue segmentation, tumor detection, and cancer staging. Conventional recognition methods normally use handcrafted features (such as oriented gradients, curvature, Haar features, Haralick texture features, and Laws energy features), chosen according to the image modalities and object characteristics. It is therefore challenging to build a general model for object recognition.
In contrast to handcrafted features, deep neural networks (DNNs) can extract self-adaptive features for a specific task and hence can be employed in general object recognition models. These DNN features are adjusted semantically and cognitively by tens of millions of parameters, analogous to the mechanism of the human brain, and therefore lead to more accurate and robust results. Motivated by this, in this thesis we proposed DNN-based energy models to recognize objects in multimodality images. The major contributions of this thesis are summarized below: 1. We first proposed a new comprehensive autoencoder model to recognize the position and shape of the prostate from magnetic resonance images. Unlike most autoencoder-based methods, we focused on positive samples when training the model, so that the extracted features all come from the prostate. An image energy minimization scheme was then applied to further improve the recognition accuracy. The proposed model was compared with three classic classifiers (i.e., support vector machine with radial basis function kernel, random forest, and naive Bayes) and demonstrated significant superiority for prostate recognition on magnetic resonance images. We further extended the proposed autoencoder model to salient object detection on natural images, and the experimental validation showed accurate and robust detection results. 2. A general multi-contexts combined deep neural networks (MCDN) model was then proposed for object recognition from natural images and biomedical images. Under one uniform framework, our model operates in a multi-scale manner. It was applied to salient object detection from natural images as well as prostate recognition from magnetic resonance images. Our experimental validation demonstrated that the proposed model is competitive with current state-of-the-art methods. 3. 
We designed a novel saliency image energy to finely segment salient objects on the basis of our MCDN model. Region priors were taken into account in the energy function to avoid trivial errors. Our method outperformed state-of-the-art algorithms on five benchmark datasets. In the experiments, we also demonstrated that the proposed saliency image energy can boost the results of other conventional saliency detection methods.

    Automated Detection of Vessel Abnormalities on Fluorescein Angiogram in Malarial Retinopathy

    The detection and assessment of intravascular filling defects is important, because they may represent a process central to cerebral malaria pathogenesis: neurovascular sequestration. We have developed and validated a framework that can automatically detect intravascular filling defects in fluorescein angiogram images. It first employs a state-of-the-art segmentation approach to extract the vessels from images and then divides them into individual segments by geometrical analysis. A feature vector based on the intensity and shape of saliency maps is generated to represent the level of abnormality of each vessel segment. An AdaBoost classifier with a weighted cost coefficient is trained to classify the vessel segments into normal and abnormal categories. To demonstrate its effectiveness, we apply this framework to 6,358 vessel segments in images from 10 patients with malarial retinopathy. The test sensitivity, specificity, accuracy, and area under the curve (AUC) are 74.7%, 73.5%, 74.1%, and 74.2%, respectively, when compared to the reference standard of human expert manual annotations. This performance is comparable to the agreement that we find between human observers of intravascular filling defects. Our method will be a powerful new tool for studying malarial retinopathy.
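
For readers less familiar with the reported metrics, the following short sketch shows how sensitivity, specificity, and accuracy follow from a binary confusion matrix. The function name `binary_metrics` and the counts in the example are hypothetical; they are not the confusion matrix behind the figures quoted in the abstract, which the abstract does not report.

```python
def binary_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)                 # fraction of abnormal segments detected
    specificity = tn / (tn + fp)                 # fraction of normal segments kept as normal
    accuracy = (tp + tn) / (tp + fp + tn + fn)   # fraction of all segments classified correctly
    return sensitivity, specificity, accuracy

# Made-up counts for illustration only.
sens, spec, acc = binary_metrics(tp=30, fp=15, tn=45, fn=10)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} accuracy={acc:.3f}")
```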

    Utilising Visual Attention Cues for Vehicle Detection and Tracking

    Advanced Driver-Assistance Systems (ADAS) have been attracting attention from many researchers. Vision-based sensors are the closest way to emulate human driver visual behavior while driving. In this paper, we explore possible ways to use visual attention (saliency) for object detection and tracking. We investigate: 1) how a visual attention map, such as a subjectness attention or saliency map and an objectness attention map, can facilitate region proposal generation in a two-stage object detector; 2) how a visual attention map can be used for tracking multiple objects. We propose a neural network that can simultaneously detect objects and generate objectness and subjectness maps to save computational power. We further exploit the visual attention map during tracking using a sequential Monte Carlo probability hypothesis density (PHD) filter. The experiments are conducted on the KITTI and DETRAC datasets. The use of visual attention and hierarchical features has shown a considerable improvement of approximately 8% in object detection, which effectively increased tracking performance by approximately 4% on the KITTI dataset. Comment: Accepted in ICPR202

    Attention and Anticipation in Fast Visual-Inertial Navigation

    We study a Visual-Inertial Navigation (VIN) problem in which a robot needs to estimate its state using an on-board camera and an inertial sensor, without any prior knowledge of the external environment. We consider the case in which the robot can allocate only limited resources to VIN, due to tight computational constraints. We therefore answer the following question: under limited resources, what are the most relevant visual cues to maximize the performance of visual-inertial navigation? Our approach has four key ingredients. First, it is task-driven, in that the selection of the visual cues is guided by a metric quantifying the VIN performance. Second, it exploits the notion of anticipation, since it uses a simplified model for forward-simulation of robot dynamics, predicting the utility of a set of visual cues over a future time horizon. Third, it is efficient and easy to implement, since it leads to a greedy algorithm for the selection of the most relevant visual cues. Fourth, it provides formal performance guarantees: we leverage submodularity to prove that the greedy selection cannot be far from the optimal (combinatorial) selection. Simulations and real experiments on agile drones show that our approach ensures state-of-the-art VIN performance while maintaining a lean processing time. In the easy scenarios, our approach outperforms appearance-based feature selection in terms of localization errors. In the most challenging scenarios, it enables accurate visual-inertial navigation while appearance-based feature selection fails to track the robot's motion during aggressive maneuvers. Comment: 20 pages, 7 figures, 2 table
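
The greedy selection described in this abstract can be sketched generically as follows. This is only an illustration of greedy maximization of a set function under a budget: the set-coverage utility is a stand-in assumption (a simple monotone submodular function), not the VIN performance metric used by the authors, and the names `greedy_select`, `coverage`, and `cues` are hypothetical.

```python
def greedy_select(candidates, utility, budget):
    """Pick up to `budget` items, each time adding the one with the largest marginal gain."""
    selected = []
    remaining = list(candidates)
    for _ in range(min(budget, len(remaining))):
        base = utility(selected)
        # Marginal gain of adding candidate c to the current selection.
        best = max(remaining, key=lambda c: utility(selected + [c]) - base)
        selected.append(best)
        remaining.remove(best)
    return selected

# Hypothetical cues: each one "covers" some abstract pieces of information.
cues = {
    "a": {1, 2, 3},
    "b": {3, 4},
    "c": {4, 5, 6, 7},
    "d": {1, 7},
}

def coverage(selection):
    """Stand-in utility: number of distinct pieces covered by the selected cues."""
    covered = set()
    for name in selection:
        covered |= cues[name]
    return len(covered)

chosen = greedy_select(cues, coverage, budget=2)
print(chosen, coverage(chosen))
```

For monotone submodular utilities under a cardinality budget, the classical result is that this kind of greedy choice achieves at least a (1 - 1/e) fraction of the optimal value; the exact guarantee proved in the paper may be stated differently.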

    Low-cost, high-resolution, fault-robust position and speed estimation for PMSM drives operating in safety-critical systems

    This paper shows how to obtain a low-cost, high-resolution, and fault-robust position sensing system for permanent magnet synchronous motor drives operating in safety-critical systems, by combining high-frequency signal injection with binary Hall-effect sensors. It is shown that the position error signal obtained via high-frequency signal injection can be merged easily into the quantization-harmonic-decoupling vector tracking observer used to process the Hall-effect sensor signals. The resulting algorithm provides accurate, high-resolution estimates of speed and position throughout the entire speed range; compared to state-of-the-art drives using Hall-effect sensors alone, the low-speed performance is greatly improved both in healthy conditions and following position sensor faults. It is envisaged that such a sensing system can be successfully used in applications requiring IEC 61508 SIL 3 or ISO 26262 ASIL D compliance, due to its extremely high mean time to failure and the very fast recovery of the drive following Hall-effect sensor faults at low speeds. Extensive simulation and experimental results are provided on a 3.7 kW permanent magnet drive.