2,404 research outputs found

    Studies on deep learning approach in breast lesions detection and cancer diagnosis in mammograms

    Get PDF
    Breast cancer accounts for the largest proportion of newly diagnosed cancers in women recently. Early diagnosis of breast cancer can improve treatment outcomes and reduce mortality. Mammography is convenient and reliable, which is the most commonly used method for breast cancer screening. However, manual examinations are limited by the cost and experience of radiologists, which introduce a high false positive rate and false examination. Therefore, a high-performance computer-aided diagnosis (CAD) system is significant for lesions detection and cancer diagnosis. Traditional CADs for cancer diagnosis require a large number of features selected manually and remain a high false positive rate. The methods based on deep learning can automatically extract image features through the network, but their performance is limited by the problems of multicenter data biases, the complexity of lesion features, and the high cost of annotations. Therefore, it is necessary to propose a CAD system to improve the ability of lesion detection and cancer diagnosis, which is optimized for the above problems. This thesis aims to utilize deep learning methods to improve the CADs' performance and effectiveness of lesion detection and cancer diagnosis. Starting from the detection of multi-type lesions using deep learning methods based on full consideration of characteristics of mammography, this thesis explores the detection method of microcalcification based on multiscale feature fusion and the detection method of mass based on multi-view enhancing. Then, a classification method based on multi-instance learning is developed, which integrates the detection results from the above methods, to realize the precise lesions detection and cancer diagnosis in mammography. For the detection of microcalcification, a microcalcification detection network named MCDNet is proposed to overcome the problems of multicenter data biases, the low resolution of network inputs, and scale differences between microcalcifications. In MCDNet, Adaptive Image Adjustment mitigates the impact of multicenter biases and maximizes the input effective pixels. Then, the proposed pyramid network with shortcut connections ensures that the feature maps for detection contain more precise localization and classification information about multiscale objects. In the structure, trainable Weighted Feature Fusion is proposed to improve the detection performance of both scale objects by learning the contribution of feature maps in different stages. The experiments show that MCDNet outperforms other methods on robustness and precision. In case the average number of false positives per image is 1, the recall rates of benign and malignant microcalcification are 96.8% and 98.9%, respectively. MCDNet can effectively help radiologists detect microcalcifications in clinical applications. For the detection of breast masses, a weakly supervised multi-view enhancing mass detection network named MVMDNet is proposed to solve the lack of lesion-level labels. MVMDNet can be trained on the image-level labeled dataset and extract the extra localization information by exploring the geometric relation between multi-view mammograms. In Multi-view Enhancing, Spatial Correlation Attention is proposed to extract correspondent location information between different views while Sigmoid Weighted Fusion module fuse diagnostic and auxiliary features to improve the precision of localization. CAM-based Detection module is proposed to provide detections for mass through the classification labels. The results of experiments on both in-house dataset and public dataset, [email protected] and [email protected] (recall rate@average number of false positive per image), demonstrate MVMDNet achieves state-of-art performances among weakly supervised methods and has robust generalization ability to alleviate the multicenter biases. In the study of cancer diagnosis, a breast cancer classification network named CancerDNet based on Multi-instance Learning is proposed. CancerDNet successfully solves the problem that the features of lesions are complex in whole image classification utilizing the lesion detection results from the previous chapters. Whole Case Bag Learning is proposed to combined the features extracted from four-view, which works like a radiologist to realize the classification of each case. Low-capacity Instance Learning and High-capacity Instance Learning successfully integrate the detections of multi-type lesions into the CancerDNet, so that the model can fully consider lesions with complex features in the classification task. CancerDNet achieves the AUC of 0.907 and AUC of 0.925 on the in-house and the public datasets, respectively, which is better than current methods. The results show that CancerDNet achieves a high-performance cancer diagnosis. In the works of the above three parts, this thesis fully considers the characteristics of mammograms and proposes methods based on deep learning for lesions detection and cancer diagnosis. The results of experiments on in-house and public datasets show that the methods proposed in this thesis achieve the state-of-the-art in the microcalcifications detection, masses detection, and the case-level classification of cancer and have a strong ability of multicenter generalization. The results also prove that the methods proposed in this thesis can effectively assist radiologists in making the diagnosis while saving labor costs

    Weakly Supervised 3D Open-vocabulary Segmentation

    Full text link
    Open-vocabulary segmentation of 3D scenes is a fundamental function of human perception and thus a crucial objective in computer vision research. However, this task is heavily impeded by the lack of large-scale and diverse 3D open-vocabulary segmentation datasets for training robust and generalizable models. Distilling knowledge from pre-trained 2D open-vocabulary segmentation models helps but it compromises the open-vocabulary feature as the 2D models are mostly finetuned with close-vocabulary datasets. We tackle the challenges in 3D open-vocabulary segmentation by exploiting pre-trained foundation models CLIP and DINO in a weakly supervised manner. Specifically, given only the open-vocabulary text descriptions of the objects in a scene, we distill the open-vocabulary multimodal knowledge and object reasoning capability of CLIP and DINO into a neural radiance field (NeRF), which effectively lifts 2D features into view-consistent 3D segmentation. A notable aspect of our approach is that it does not require any manual segmentation annotations for either the foundation models or the distillation process. Extensive experiments show that our method even outperforms fully supervised models trained with segmentation annotations in certain scenes, suggesting that 3D open-vocabulary segmentation can be effectively learned from 2D images and text-image pairs. Code is available at \url{https://github.com/Kunhao-Liu/3D-OVS}.Comment: Accepted to NeurIPS 202

    MuRAL: Multi-Scale Region-based Active Learning for Object Detection

    Full text link
    Obtaining large-scale labeled object detection dataset can be costly and time-consuming, as it involves annotating images with bounding boxes and class labels. Thus, some specialized active learning methods have been proposed to reduce the cost by selecting either coarse-grained samples or fine-grained instances from unlabeled data for labeling. However, the former approaches suffer from redundant labeling, while the latter methods generally lead to training instability and sampling bias. To address these challenges, we propose a novel approach called Multi-scale Region-based Active Learning (MuRAL) for object detection. MuRAL identifies informative regions of various scales to reduce annotation costs for well-learned objects and improve training performance. The informative region score is designed to consider both the predicted confidence of instances and the distribution of each object category, enabling our method to focus more on difficult-to-detect classes. Moreover, MuRAL employs a scale-aware selection strategy that ensures diverse regions are selected from different scales for labeling and downstream finetuning, which enhances training stability. Our proposed method surpasses all existing coarse-grained and fine-grained baselines on Cityscapes and MS COCO datasets, and demonstrates significant improvement in difficult category performance
    • …
    corecore