The Whole Pathological Slide Classification via Weakly Supervised Learning
Due to its superior efficiency in utilizing annotations and addressing
gigapixel-sized images, multiple instance learning (MIL) has shown great
promise as a framework for whole slide image (WSI) classification in digital
pathology diagnosis. However, existing methods tend to focus on advanced
aggregators with different structures, often overlooking the intrinsic features
of H&E pathological slides. To address this limitation, we introduced two
pathological priors: nuclear heterogeneity of diseased cells and spatial
correlation of pathological tiles. Leveraging the former, we proposed a data
augmentation method that utilizes stain separation during extractor training
via a contrastive learning strategy to obtain instance-level representations.
We then described the spatial relationships between the tiles using an
adjacency matrix. By integrating these two views, we designed a multi-instance
framework for analyzing H&E-stained tissue images based on pathological
inductive bias, encompassing feature extraction, filtering, and aggregation.
Extensive experiments on the Camelyon16 breast dataset and TCGA-NSCLC Lung
dataset demonstrate that our proposed framework can effectively handle tasks
related to cancer detection and differentiation of subtypes, outperforming
state-of-the-art medical image classification methods based on MIL. The code
will be released later.
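The abstract's aggregation step combines instance features with a spatial adjacency matrix over tiles. As a generic, hypothetical sketch of attention-based MIL pooling with neighbourhood smoothing (the function names, gating form, and normalisation here are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

def build_adjacency(coords, radius=1):
    """Row-normalised binary adjacency over tile grid coordinates:
    two tiles are neighbours if they lie within `radius` grid steps."""
    coords = np.asarray(coords)
    dist = np.abs(coords[:, None, :] - coords[None, :, :]).max(axis=-1)
    adj = (dist <= radius).astype(float)
    return adj / adj.sum(axis=1, keepdims=True)

def attention_mil_pool(feats, w, adj=None):
    """Attention-weighted bag embedding. `feats` is (n_tiles, d); `w` is (d,).
    If `adj` is given, each tile feature is first mixed with its neighbours."""
    if adj is not None:
        feats = adj @ feats
    scores = np.tanh(feats @ w)                    # per-tile attention logits
    alpha = np.exp(scores) / np.exp(scores).sum()  # softmax over tiles
    return alpha @ feats                           # slide-level representation
```

Distant tiles get zero adjacency weight, so smoothing only pools information within each tile's spatial neighbourhood before attention pooling.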
Methods for Detecting Floodwater on Roadways from Ground Level Images
Recent research and statistics show that the frequency of flooding in the world has been increasing, severely impacting flood-prone communities. This natural disaster causes significant damage to human life and property, inundates roads, overwhelms drainage systems, and disrupts essential services and economic activities. The focus of this dissertation is to use machine learning methods to automatically detect floodwater in ground-level images in support of the frequently impacted communities. The ground-level images can be retrieved from multiple sources, including those taken by mobile phone cameras as communities record the state of their flooded streets. The model developed in this research processes these images at multiple levels. The first detection model investigates the presence of flood in images by developing and comparing image classifiers with various feature extractors. Local Binary Patterns (LBP), Histogram of Oriented Gradients (HOG), and pretrained convolutional neural networks are used as feature extractors. Then, decision trees, logistic regression, and K-Nearest Neighbors (K-NN) models are trained and tested for making predictions on floodwater presence in the image. Once the model detects flood in an image, it moves to the second layer to detect the presence of floodwater at a pixel level in each image. This pixel-level identification is achieved through semantic segmentation, using a super-pixel-based prediction method and Fully Convolutional Neural Networks (FCNs). First, the SLIC super-pixel method is used to create the super-pixels; then the same types of classifiers as in the initial classification step are trained to predict the class of each super-pixel. Later, the FCN is trained end-to-end without any additional classifiers. Once these processes are done, images are segmented into regions of floodwater at the pixel level. 
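Of the handcrafted feature extractors mentioned above, Local Binary Patterns are simple enough to sketch directly. The following is a minimal 8-neighbour LBP with a histogram feature vector, not necessarily the exact variant or parameters used in the dissertation:

```python
import numpy as np

def lbp_8(img):
    """Basic 8-neighbour Local Binary Pattern for a 2-D grayscale image.
    Each interior pixel gets an 8-bit code: one bit per neighbour,
    set when the neighbour value is >= the centre value."""
    c = img[1:-1, 1:-1]
    # neighbours in clockwise order starting at the top-left
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros_like(c, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(offsets):
        nb = img[1 + dy:img.shape[0] - 1 + dy, 1 + dx:img.shape[1] - 1 + dx]
        code |= (nb >= c).astype(np.uint8) << bit
    return code

def lbp_histogram(img, bins=256):
    """Normalised LBP-code histogram, usable as a texture feature vector
    for a downstream classifier (decision tree, logistic regression, K-NN)."""
    codes = lbp_8(img)
    h = np.bincount(codes.ravel(), minlength=bins).astype(float)
    return h / h.sum()
```

On a flat image every neighbour ties with the centre, so all codes are 255 and the histogram collapses to a single bin.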
In both the classification and semantic segmentation tasks, deep learning-based methods showed the best results. Once the model receives confirmation of flood detection at the image and pixel layers, it moves to the final task of estimating the floodwater depth in images. This third and final layer of the model is critical, as it can help officials deduce the severity of the flood in a given area. In order to detect the depth of the water and the severity of the flooding, the model processes the cars on streets that are in water and calculates the percentage of the tires that is under water. This calculation is achieved with a mixture of deep learning and classical computer vision techniques. There are four main processes in this task: (i) semantic segmentation of the image into pixels that belong to background, floodwater, and wheels of vehicles; the segmentation is done by multiple FCN models that are trained with various base models. (ii) Object detection for tires; the tires are identified by a You Only Look Once (YOLO) object detector. (iii) Improvement of the initial segmentation results; a U-Net-like semantic segmentation network is proposed that takes the tire patches from the object detector and the corresponding initial segmentation results and learns to fix the errors of the initial segmentation. (iv) Calculation of water depth as the ratio of the tire wheel under the water. This final task uses the improved segmentation results to identify the ellipses that correspond to the wheel parts of vehicles and utilizes two approaches as part of a hybrid method: (a) using the improved segmentation results, as they return the pixels belonging to the wheels, whose boundaries are then extracted and used; (b) finding arcs that belong to elliptical objects by applying a series of image processing methods. 
The second method connects the arcs it finds into larger structures such as two-piece (half ellipse), three-piece, or four-piece (full) ellipses. Once the ellipse boundary is calculated using both methods, the ratio of the ellipse under floodwater can be calculated. This novel multi-model system allows us to attribute potential prediction errors to the different parts of the model, such as the semantic segmentation of the image or the calculation of the elliptical boundary. To verify the applicability of the proposed methods and to train the models, extensive hand-labeled datasets were created as part of this dissertation. The initial images were collected from the web, and the datasets were then enriched with images created from virtual environments (simulations of neighborhoods under flood) using the Unity software.
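Once the wheel ellipse and the waterline row are known, the submerged ratio reduces to a closed-form area calculation. A hypothetical sketch of that final step (the dissertation's actual procedure may differ):

```python
import math

def submerged_fraction(center_row, semi_axis_v, waterline_row):
    """Fraction of an ellipse's area lying below a horizontal waterline,
    with image rows increasing downward. A horizontal cut scales affinely,
    so the fraction equals that of a unit circle cut at t = (y - cy) / b."""
    t = (waterline_row - center_row) / semi_axis_v
    t = max(-1.0, min(1.0, t))          # clamp: fully above / fully submerged
    return (t * math.sqrt(1.0 - t * t) + math.asin(t)) / math.pi + 0.5
```

A waterline through the ellipse centre gives 0.5; at or below the bottom of the wheel the fraction saturates at 1.0.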
In conclusion, the proposed methods in this dissertation, as validated on the labeled datasets, can successfully classify images as a flood scene, semantically segment the regions of flood, and predict the depth of water to indicate severity.
The model of an anomaly detector for HiLumi LHC magnets based on Recurrent Neural Networks and adaptive quantization
This paper focuses on an examination of the applicability of Recurrent Neural
Network models for detecting anomalous behavior of the CERN superconducting
magnets. In order to conduct the experiments, the authors designed and
implemented an adaptive signal quantization algorithm and a custom GRU-based
detector and developed a method for the detector parameters selection. Three
different datasets were used for testing the detector. Two artificially
generated datasets were used to assess the raw performance of the system
whereas the 231 MB dataset composed of the signals acquired from HiLumi magnets
was intended for real-life experiments and model training. Several different
setups of the developed anomaly detection system were evaluated and compared
with state-of-the-art OC-SVM reference model operating on the same data. The
OC-SVM model was equipped with a rich set of feature extractors accounting for
a range of the input signal properties. It was determined in the course of the
experiments that the detector, along with its supporting design methodology,
reaches an F1 score equal to or very close to 1 for almost all test sets. Due to the
profile of the data, the best_length setup of the detector turned out to
perform the best among all five tested configuration schemes of the detection
system. The quantization parameters have the biggest impact on the overall
performance of the detector with the best values of input/output grid equal to
16 and 8, respectively. The proposed detection solution significantly
outperformed the OC-SVM-based detector in most cases, with much more stable
performance across all the datasets.
Comment: Related to arXiv:1702.0083
Camera Based Object Detection for Indoor Scenes
This master thesis describes a practical implementation of a deep learning framework for object detection on a self-collected multiclass dataset. The research work presents multiple perspectives on data collection, labelling, preprocessing, and the training of popular object detection architectures. The challenges in collecting a multiclass object detection dataset from indoor premises and in the annotation process are presented with possible solutions. The performance of the trained object detectors is measured in terms of precision, recall, F1-score, mAP, and processing speed.
We experimented with multiple object detection architectures available in the TensorFlow object detection model zoo. The multiclass dataset collected from the indoor premises was used to train and evaluate the performance of modern convolutional object detection models. We studied two scenarios: (a) pretrained object detection models and (b) models fine-tuned on the self-collected multiclass dataset. The performance of the fine-tuned object detectors was better than that of the pretrained detectors. From our experiments, we found that region-based convolutional neural network architectures have superior detection accuracy on our dataset. The Faster region-based convolutional neural network (RCNN) architecture with a residual network feature extractor has the best detection accuracy. Single shot multi-box detector (SSD) models are comparatively less precise in detection; however, they are faster in computation and easier to deploy on mobile and embedded devices. We found that the region-based fully convolutional network (RFCN) is a suitable alternative for multi-class object detection considering the speed/accuracy trade-offs.
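The detection metrics quoted above (precision, recall, F1) all rest on IoU-based matching of predicted boxes to ground truth. A minimal generic sketch of that computation (the thesis presumably used the TensorFlow evaluation tooling rather than code like this):

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0

def precision_recall_f1(preds, gts, thr=0.5):
    """Greedy one-to-one matching of predictions to ground-truth boxes:
    a prediction counts as a true positive if its best unmatched IoU >= thr."""
    matched, tp = set(), 0
    for p in preds:
        best = max(((iou(p, g), i) for i, g in enumerate(gts)
                    if i not in matched), default=(0.0, -1))
        if best[0] >= thr:
            matched.add(best[1])
            tp += 1
    prec = tp / len(preds) if preds else 0.0
    rec = tp / len(gts) if gts else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return prec, rec, f1
```

Averaging precision over recall levels and IoU thresholds, per class, is what then yields mAP.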
Automatic sound synthesizer programming: techniques and applications
The aim of this thesis is to investigate techniques for, and applications of automatic sound synthesizer programming. An automatic sound synthesizer programmer is a system which removes the requirement to explicitly specify parameter settings for a sound synthesis algorithm from the user. Two forms of these systems are discussed in this thesis:
tone matching programmers and synthesis space explorers. A tone matching programmer takes at its input a sound synthesis algorithm and a desired target sound. At its output it produces a configuration for the sound synthesis algorithm which causes it to emit a
similar sound to the target. The techniques for achieving this that are investigated are
genetic algorithms, neural networks, hill climbers and data driven approaches. A synthesis
space explorer provides a user with a representation of the space of possible sounds
that a synthesizer can produce and allows them to interactively explore this space. The
applications of automatic sound synthesizer programming that are investigated include
studio tools, an autonomous musical agent and a self-reprogramming drum machine. The
research employs several methodologies: the development of novel software frameworks
and tools, the examination of existing software at the source code and performance levels
and user trials of the tools and software. The main contributions made are: a method
for visualisation of sound synthesis space and low dimensional control of sound synthesizers; a general purpose framework for the deployment and testing of sound synthesis and optimisation algorithms in the SuperCollider language sclang; a comparison of a variety of optimisation techniques for sound synthesizer programming; an analysis of sound synthesizer error surfaces; a general purpose sound synthesizer programmer compatible with industry standard tools; and an automatic improviser which passes a loose equivalent of the Turing test for Jazz musicians, i.e. being half of a man-machine duet which was rated as one of the best sessions of 2009 on the BBC's 'Jazz on 3' programme.
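Of the optimisation techniques compared above, the hill climber is the easiest to sketch. Below is a toy, hypothetical version that programs a two-parameter "synthesizer" (amplitude and harmonic number of a sine) to match a target sound; the thesis's actual synthesizers and error measures are far richer:

```python
import math
import random

def synth(params, n=64):
    """Toy 'synthesizer': a sine with amplitude amp at integer harmonic fidx."""
    amp, fidx = params
    return [amp * math.sin(2 * math.pi * fidx * t / n) for t in range(n)]

def matching_error(params, target):
    """Sum-of-squares distance between the rendered and target waveforms."""
    return sum((a - b) ** 2 for a, b in zip(synth(params), target))

def hill_climb(target, steps=3000, seed=0):
    """Greedy hill climber: perturb the amplitude, resample the harmonic,
    and keep a candidate only if it strictly lowers the matching error."""
    rng = random.Random(seed)
    best = [0.5, 1]
    best_err = matching_error(best, target)
    for _ in range(steps):
        cand = [best[0] + rng.gauss(0.0, 0.05), rng.randint(1, 8)]
        cand_err = matching_error(cand, target)
        if cand_err < best_err:
            best, best_err = cand, cand_err
    return best, best_err
```

The same target/error framing underlies the genetic algorithm and data-driven approaches; only the search strategy over the parameter space changes.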
Temporal Sentence Grounding in Videos: A Survey and Future Directions
Temporal sentence grounding in videos (TSGV), a.k.a. natural language video
localization (NLVL) or video moment retrieval (VMR), aims to retrieve a
temporal moment that semantically corresponds to a language query from an
untrimmed video. Connecting computer vision and natural language, TSGV has
drawn significant attention from researchers in both communities. This survey
attempts to provide a summary of fundamental concepts in TSGV and current
research status, as well as future research directions. As the background, we
present a common structure of functional components in TSGV, in a tutorial
style: from feature extraction from raw video and language query, to answer
prediction of the target moment. Then we review the techniques for multimodal
understanding and interaction, which is the key focus of TSGV for effective
alignment between the two modalities. We construct a taxonomy of TSGV
techniques and elaborate the methods in different categories with their
strengths and weaknesses. Lastly, we discuss issues with the current TSGV
research and share our insights about promising research directions.
Comment: 29 pages, 32 figures, 9 tables
Classification of Leukocytes Using Meta-Learning and Color Constancy Methods
In the human healthcare area, leukocytes are very important blood cells for the diagnosis of different pathologies, such as leukemia. Recent technology and image-processing methods have contributed to the image classification of leukocytes. In particular, machine learning paradigms have been used for the classification of leukocyte images. However, reported models do not leverage the knowledge produced by the classification of leukocytes to solve similar tasks. For example, the knowledge can be reused to classify images collected with different types of microscopes and image-processing techniques. Therefore, we propose a meta-learning methodology for the classification of leukocyte images using different color constancy methods involving previous knowledge. Our methodology is trained with a specific task at the meta-level, and the knowledge produced is used to solve a different task at the base-level. For the meta-level, we implemented meta-models based on Xception, and for the base-level, we used support vector machine classifiers. In addition, we analyzed the Shades of Gray color constancy method, commonly used in skin lesion diagnosis and now implemented for leukocyte images. Our methodology, at the meta-level, achieved 89.28% for precision, 95.65% for sensitivity, 91.78% for F1-score, and 94.40% for accuracy. These scores are competitive with the reported state-of-the-art models, especially the sensitivity, which is very important for imbalanced datasets, and our meta-model outperforms previous works by +2.25%. Additionally, for the basophil images that were acquired from a chronic myeloid leukemia-positive sample, our meta-model obtained 100% for sensitivity. Moreover, we present an algorithm that generates a new conditioned output at the base-level, obtaining highly competitive scores of 91.56% for sensitivity and F1-score, 95.61% for precision, and 96.47% for accuracy. 
The findings indicate that our proposed meta-learning methodology can be applied to other medical image classification tasks and achieve high performances by reusing knowledge and reducing the training time for new similar tasks
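The Shades of Gray method discussed above has a compact standard form: the illuminant is estimated from the Minkowski p-norm of each channel and divided out. A sketch (p=6 is a common choice in the literature; the paper's exact normalisation may differ):

```python
import numpy as np

def shades_of_gray(img, p=6):
    """Shades of Gray colour constancy (Minkowski p-norm illuminant estimate).
    img: float array (H, W, 3) in [0, 1]. p=1 recovers Grey-World and
    p -> infinity approaches White-Patch."""
    illum = np.power(np.mean(np.power(img, p), axis=(0, 1)), 1.0 / p)
    illum = illum / (np.linalg.norm(illum) + 1e-12)   # unit-norm estimate
    corrected = img / (illum * np.sqrt(3) + 1e-12)    # divide out the cast
    return np.clip(corrected, 0.0, 1.0)
```

Applied to an image with a uniform colour cast, the correction restores balanced channel means before features are extracted for classification.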
Discovering multi-purpose modules through deep multitask learning
Machine learning scientists aim to discover techniques that can be applied across diverse sets of problems. Such techniques need to exploit regularities that are shared across tasks. This begs the question: What shared regularity is not yet being exploited? Complex tasks may share structure that is difficult for humans to discover. The goal of deep multitask learning is to discover and exploit this structure automatically by training a joint model across tasks. To this end, this dissertation introduces a deep multitask learning framework for collecting generic functional modules that are used in different ways to solve different problems. Within this framework, a progression of systems is developed based on assembling shared modules into task models and leveraging the complementary advantages of gradient descent and evolutionary optimization. In experiments, these systems confirm that modular sharing improves performance across a range of application areas, including general video game playing, computer vision, natural language processing, and genomics, yielding state-of-the-art results in several cases. The conclusion is that multi-purpose modules discovered by deep multitask learning can exceed those developed by humans in performance and generality.