
    The Whole Pathological Slide Classification via Weakly Supervised Learning

    Due to its superior efficiency in utilizing annotations and addressing gigapixel-sized images, multiple instance learning (MIL) has shown great promise as a framework for whole slide image (WSI) classification in digital pathology diagnosis. However, existing methods tend to focus on advanced aggregators with different structures, often overlooking the intrinsic features of H&E pathological slides. To address this limitation, we introduced two pathological priors: nuclear heterogeneity of diseased cells and spatial correlation of pathological tiles. Leveraging the former, we proposed a data augmentation method that utilizes stain separation during extractor training via a contrastive learning strategy to obtain instance-level representations. We then described the spatial relationships between the tiles using an adjacency matrix. By integrating these two views, we designed a multi-instance framework for analyzing H&E-stained tissue images based on pathological inductive bias, encompassing feature extraction, filtering, and aggregation. Extensive experiments on the Camelyon16 breast dataset and TCGA-NSCLC lung dataset demonstrate that our proposed framework can effectively handle tasks related to cancer detection and differentiation of subtypes, outperforming state-of-the-art medical image classification methods based on MIL. The code will be released later.
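
    As a minimal sketch of the spatial-correlation prior, the snippet below (the `tile_adjacency` helper, its grid-coordinate input, and the 8-neighborhood choice are assumptions for illustration, not the authors' released code) builds the kind of tile adjacency matrix such a framework could hand to its aggregator:

```python
import numpy as np

def tile_adjacency(coords, connectivity=8):
    """Binary adjacency matrix over WSI tiles.

    coords: (N, 2) integer array of (row, col) grid positions of the
    tiles kept after filtering.  With connectivity=8, tiles are
    adjacent when their grid positions differ by at most one step in
    each axis; with connectivity=4, only edge-sharing neighbors count.
    """
    coords = np.asarray(coords)
    # Pairwise per-axis distances between all tile positions.
    d = np.abs(coords[:, None, :] - coords[None, :, :])
    if connectivity == 8:
        adj = d.max(axis=-1) == 1      # includes diagonal neighbors
    else:
        adj = d.sum(axis=-1) == 1      # edge-sharing neighbors only
    return adj.astype(np.float32)      # diagonal is 0 by construction

# Example: four tiles forming a 2x2 block of the slide grid.
coords = [(0, 0), (0, 1), (1, 0), (1, 1)]
print(tile_adjacency(coords))
```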

    Methods for Detecting Floodwater on Roadways from Ground Level Images

    Recent research and statistics show that the frequency of flooding in the world has been increasing, severely impacting flood-prone communities. This natural disaster causes significant damage to human life and property, inundates roads, overwhelms drainage systems, and disrupts essential services and economic activities. The focus of this dissertation is to use machine learning methods to automatically detect floodwater in ground-level images in support of the frequently impacted communities. The ground-level images can be retrieved from multiple sources, including those taken by mobile phone cameras as communities record the state of their flooded streets. The model developed in this research processes these images at multiple levels. The first detection model investigates the presence of flood in images by developing and comparing image classifiers with various feature extractors. Local Binary Patterns (LBP), Histogram of Oriented Gradients (HOG), and pretrained convolutional neural networks are used as feature extractors. Then, decision trees, logistic regression, and K-Nearest Neighbors (K-NN) models are trained and tested for making predictions on floodwater presence in the image. Once the model detects flood in an image, it moves to the second layer to detect the presence of floodwater at the pixel level in each image. This pixel-level identification is achieved by semantic segmentation, using a super-pixel based prediction method and Fully Convolutional Neural Networks (FCNs). First, the SLIC super-pixel method is used to create the super-pixels; then the same types of classifiers as in the initial classification method are trained to predict the class of each super-pixel. Later, the FCN is trained end-to-end without any additional classifiers. Once these processes are done, images are segmented into regions of floodwater at the pixel level. In both the classification and semantic segmentation tasks, deep learning-based methods showed the best results. Once the model receives confirmation of flood detection at the image and pixel layers, it moves to the final task of finding the floodwater depth in images. This third and final layer of the model is critical, as it can help officials deduce the severity of the flood in a given area. In order to detect the depth of the water and the severity of the flooding, the model processes the cars on streets that are in water and calculates the percentage of tires that are under water. This calculation is achieved with a mixture of deep learning and classical computer vision techniques. There are four main processes in this task: (i) semantic segmentation of the image into pixels that belong to the background, floodwater, and wheels of vehicles, performed by multiple FCN models trained with various base models; (ii) object detection for locating tires, identified by a You Only Look Once (YOLO) object detector; (iii) improvement of the initial segmentation results with a proposed U-Net-like semantic segmentation network, which takes the tire patches from the object detector and the corresponding initial segmentation results and learns to fix the errors of the initial segmentation; and (iv) calculation of water depth as the ratio of the tire wheel under the water.
    This final task uses the improved segmentation results to identify the ellipses that correspond to the wheel parts of vehicles and combines two approaches in a hybrid method: (i) using the improved segmentation results directly, since they return the pixels belonging to the wheels, from which the wheel boundaries are extracted; and (ii) finding arcs that belong to elliptical objects by applying a series of image processing methods, then connecting the arcs into larger structures such as two-piece (half ellipse), three-piece, or four-piece (full) ellipses. Once the ellipse boundary is calculated with both methods, the ratio of the ellipse under floodwater can be computed. This novel multi-model system allows us to attribute potential prediction errors to the different parts of the model, such as the semantic segmentation of the image or the calculation of the elliptical boundary. To verify the applicability of the proposed methods and to train the models, extensive hand-labeled datasets were created as part of this dissertation. The initial images were collected from the web; the datasets were then enriched with images rendered from virtual environments, simulations of neighborhoods under flood built with the Unity software. In conclusion, the proposed methods in this dissertation, as validated on the labeled datasets, can successfully classify images as flood scenes, semantically segment the regions of flood, and predict the depth of water to indicate severity.
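
    As a rough illustration of the final step, the sketch below (the `submerged_wheel_ratio` helper, the binary wheel mask, and the horizontal waterline are hypothetical assumptions, not the dissertation's code) fits an ellipse to a wheel mask with OpenCV and estimates the fraction of its boundary lying below the waterline:

```python
import cv2
import numpy as np

def submerged_wheel_ratio(wheel_mask, waterline_y):
    """Fit an ellipse to a binary wheel mask and return the fraction
    of the ellipse boundary lying below a horizontal waterline row.

    wheel_mask: uint8 image where wheel pixels are 255.
    waterline_y: row index of the water surface (larger y is lower).
    """
    contours, _ = cv2.findContours(wheel_mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_NONE)
    contour = max(contours, key=cv2.contourArea)
    if len(contour) < 5:           # fitEllipse needs at least 5 points
        raise ValueError("contour too small to fit an ellipse")
    (cx, cy), (w, h), angle = cv2.fitEllipse(contour)
    # Sample the ellipse boundary and count points under the waterline.
    pts = cv2.ellipse2Poly((int(cx), int(cy)), (int(w / 2), int(h / 2)),
                           int(angle), 0, 360, 1)
    return float(np.mean(pts[:, 1] > waterline_y))

# Toy example: a synthetic circular wheel half-covered by water.
mask = np.zeros((200, 200), np.uint8)
cv2.ellipse(mask, (100, 100), (40, 40), 0, 0, 360, 255, -1)
print(submerged_wheel_ratio(mask, waterline_y=100))  # approx. 0.5
```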

    The model of an anomaly detector for HiLumi LHC magnets based on Recurrent Neural Networks and adaptive quantization

    This paper examines the applicability of Recurrent Neural Network models for detecting anomalous behavior of the CERN superconducting magnets. In order to conduct the experiments, the authors designed and implemented an adaptive signal quantization algorithm and a custom GRU-based detector, and developed a method for selecting the detector's parameters. Three different datasets were used for testing the detector. Two artificially generated datasets were used to assess the raw performance of the system, whereas a 231 MB dataset composed of signals acquired from HiLumi magnets was intended for real-life experiments and model training. Several different setups of the developed anomaly detection system were evaluated and compared with a state-of-the-art OC-SVM reference model operating on the same data. The OC-SVM model was equipped with a rich set of feature extractors accounting for a range of input signal properties. The experiments determined that the detector, along with its supporting design methodology, reaches an F1 score equal or very close to 1 for almost all test sets. Due to the profile of the data, the best_length setup of the detector turned out to perform the best among all five tested configuration schemes of the detection system. The quantization parameters have the biggest impact on the overall performance of the detector, with the best values of the input/output grid equal to 16 and 8, respectively. The proposed detection solution significantly outperformed the OC-SVM-based detector in most of the cases, with much more stable performance across all the datasets.
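
    The adaptive quantization step can be sketched roughly as follows; this minimal example (quantile-based level placement and the `adaptive_quantize` helper are assumptions for illustration, not the authors' algorithm) maps a raw signal onto a 16-symbol input grid whose boundaries adapt to the signal's empirical distribution:

```python
import numpy as np

def adaptive_quantize(signal, n_levels=16):
    """Quantize a 1-D signal into n_levels symbols using quantile-based
    bin edges, so the levels adapt to the signal's distribution rather
    than being spaced uniformly over its range."""
    signal = np.asarray(signal, dtype=float)
    # Interior bin edges at equally spaced quantiles of the data.
    edges = np.quantile(signal, np.linspace(0, 1, n_levels + 1)[1:-1])
    return np.digitize(signal, edges)  # integer symbols in [0, n_levels-1]

# Example: a noisy sine wave quantized onto a 16-symbol input grid,
# as could be fed to a GRU-based detector one symbol per time step.
t = np.linspace(0, 4 * np.pi, 1000)
x = np.sin(t) + 0.1 * np.random.randn(t.size)
symbols = adaptive_quantize(x, n_levels=16)
print(symbols.min(), symbols.max())  # 0 ... 15
```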

    Camera Based Object Detection for Indoor Scenes

    This master's thesis describes a practical implementation of a deep learning framework for object detection on a self-collected multiclass dataset. The work presents multiple perspectives on data collection, labelling, preprocessing, and the training of popular object detection architectures. The challenges in collecting a multiclass object detection dataset from indoor premises, and in the annotation process, are presented together with possible solutions. The performance of the trained object detectors is measured in terms of precision, recall, F1-score, mAP, and processing speed. We experimented with multiple object detection architectures available in the TensorFlow object detection model zoo. The multiclass dataset collected from the indoor premises was used to train and evaluate the performance of modern convolutional object detection models. We studied two scenarios: (a) pretrained object detection models and (b) detection models fine-tuned on the self-collected multiclass dataset. The performance of the fine-tuned object detectors was better than that of the pretrained detectors. From our experiments, we found that region-based convolutional neural network architectures have superior detection accuracy on our dataset. The Faster region-based convolutional neural network (RCNN) architecture with a residual network feature extractor has the best detection accuracy. Single shot multi-box detector (SSD) models are comparatively less precise in detection; however, they are faster in computation and easier to deploy on mobile and embedded devices. We found that the region-based fully convolutional network (RFCN) is a suitable alternative for multiclass object detection considering the speed/accuracy trade-offs.
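
    For context on how such detectors are typically scored, here is a minimal sketch (the IoU threshold of 0.5, the greedy matching, and the corner-format boxes are common conventions assumed here, not the thesis's evaluation code):

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def prf1(detections, ground_truths, iou_thr=0.5):
    """Greedy one-to-one matching of detections to ground truths;
    returns (precision, recall, F1)."""
    unmatched = list(ground_truths)
    tp = 0
    for det in detections:
        best = max(unmatched, key=lambda gt: iou(det, gt), default=None)
        if best is not None and iou(det, best) >= iou_thr:
            tp += 1
            unmatched.remove(best)
    fp = len(detections) - tp
    fn = len(unmatched)
    p = tp / (tp + fp) if detections else 0.0
    r = tp / (tp + fn) if ground_truths else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

print(prf1([(0, 0, 10, 10)], [(1, 1, 10, 10)]))  # (1.0, 1.0, 1.0)
```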

    Temporal Sentence Grounding in Videos: A Survey and Future Directions

    Temporal sentence grounding in videos (TSGV), also known as natural language video localization (NLVL) or video moment retrieval (VMR), aims to retrieve a temporal moment that semantically corresponds to a language query from an untrimmed video. Connecting computer vision and natural language, TSGV has drawn significant attention from researchers in both communities. This survey attempts to provide a summary of fundamental concepts in TSGV and the current research status, as well as future research directions. As background, we present a common structure of functional components in TSGV in a tutorial style: from feature extraction from the raw video and language query to answer prediction of the target moment. We then review the techniques for multimodal understanding and interaction, which is the key focus of TSGV for effective alignment between the two modalities. We construct a taxonomy of TSGV techniques and elaborate the methods in different categories with their strengths and weaknesses. Lastly, we discuss issues with the current TSGV research and share our insights about promising research directions.
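
    To make that common pipeline concrete, here is a deliberately simplified sketch (the cosine-similarity scoring, the pre-extracted clip and query features, and the exhaustive span enumeration are illustrative assumptions, not any surveyed method):

```python
import numpy as np

def ground_query(clip_feats, query_feat, max_len=8):
    """Toy moment retrieval: score every candidate span by the mean
    cosine similarity between its clips and the query feature, and
    return the best-scoring (start, end) clip indices (inclusive)."""
    c = clip_feats / np.linalg.norm(clip_feats, axis=1, keepdims=True)
    q = query_feat / np.linalg.norm(query_feat)
    sim = c @ q                                  # (num_clips,)
    best, best_span = -np.inf, (0, 0)
    n = len(sim)
    for s in range(n):
        for e in range(s, min(s + max_len, n)):
            score = sim[s:e + 1].mean()
            if score > best:
                best, best_span = score, (s, e)
    return best_span, best

# Example with random features: 20 clips with 512-d features, and a
# query built to resemble clips 5-8.
rng = np.random.default_rng(0)
clips = rng.normal(size=(20, 512))
query = clips[5:9].mean(axis=0)
print(ground_query(clips, query))
```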

    Classification of Leukocytes Using Meta-Learning and Color Constancy Methods

    In human healthcare, leukocytes are very important blood cells for the diagnosis of different pathologies, such as leukemia. Recent technology and image-processing methods have contributed to the image classification of leukocytes. In particular, machine learning paradigms have been used for the classification of leukocyte images. However, reported models do not leverage the knowledge produced by the classification of leukocytes to solve similar tasks. For example, the knowledge can be reused to classify images collected with different types of microscopes and image-processing techniques. Therefore, we propose a meta-learning methodology for the classification of leukocyte images using different color constancy methods that involves previous knowledge. Our methodology is trained with a specific task at the meta-level, and the knowledge produced is used to solve a different task at the base-level. For the meta-level, we implemented meta-models based on Xception, and for the base-level, we used support vector machine classifiers. In addition, we analyzed the Shades of Gray color constancy method, commonly used in skin lesion diagnosis and now implemented for leukocyte images. Our methodology, at the meta-level, achieved 89.28% for precision, 95.65% for sensitivity, 91.78% for F1-score, and 94.40% for accuracy. These scores are competitive with the reported state-of-the-art models, especially the sensitivity, which is very important for imbalanced datasets, and our meta-model outperforms previous works by +2.25%. Additionally, for the basophil images that were acquired from a chronic myeloid leukemia-positive sample, our meta-model obtained 100% for sensitivity. Moreover, we present an algorithm that generates a new conditioned output at the base-level, obtaining highly competitive scores of 91.56% for sensitivity and F1-score, 95.61% for precision, and 96.47% for accuracy. The findings indicate that our proposed meta-learning methodology can be applied to other medical image classification tasks and achieve high performance by reusing knowledge and reducing the training time for new, similar tasks.
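
    The Shades of Gray step is standard enough to sketch: it estimates the illuminant of each channel with a Minkowski p-norm and rescales the image so that the estimate becomes achromatic (the p = 6 value and the normalization details below are common defaults, assumed here rather than taken from the paper):

```python
import numpy as np

def shades_of_gray(image, p=6):
    """Shades of Gray color constancy.

    image: float array in [0, 1] with shape (H, W, 3).
    p=1 reduces to Gray World; large p approaches White Patch.
    """
    img = image.astype(np.float64)
    # Per-channel illuminant estimate e_c = (mean(I_c^p))^(1/p).
    illum = np.power(np.mean(np.power(img, p), axis=(0, 1)), 1.0 / p)
    illum /= np.linalg.norm(illum)          # keep only its direction
    corrected = img / (illum * np.sqrt(3))  # map illuminant to gray
    return np.clip(corrected, 0, 1)

# Example: neutralize a bluish cast on a random image; the corrected
# channel means come out roughly equal.
rng = np.random.default_rng(0)
img = rng.random((64, 64, 3)) * np.array([0.7, 0.8, 1.0])
print(shades_of_gray(img).mean(axis=(0, 1)))
```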