44 research outputs found
Learning to Segment Breast Biopsy Whole Slide Images
We trained and applied an encoder-decoder model to semantically segment
breast biopsy images into biologically meaningful tissue labels. Since
conventional encoder-decoder networks cannot be applied directly to large
biopsy images and the different sized structures in biopsies present novel
challenges, we propose four modifications: (1) an input-aware encoding block to
compensate for information loss, (2) a new dense connection pattern between
encoder and decoder, (3) dense and sparse decoders to combine multi-level
features, and (4) a multi-resolution network that fuses the results of
encoder-decoders run on different resolutions. Our model outperforms a
feature-based approach and conventional encoder-decoders from the literature.
We use semantic segmentations produced with our model in an automated diagnosis
task and obtain higher accuracies than a baseline approach that employs an SVM
for feature-based segmentation, both using the same segmentation-based
diagnostic features.
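As a rough illustration of the fourth modification, fusing the outputs of encoder-decoders run at different resolutions can be sketched as follows. This is a minimal numpy sketch assuming nearest-neighbour upsampling and simple averaging; the paper's multi-resolution network learns this combination rather than averaging.

```python
import numpy as np

def fuse_multiresolution(logits_full, logits_half):
    """Combine per-pixel class logits from encoder-decoders run at two
    resolutions: upsample the half-resolution output (nearest neighbour),
    average it with the full-resolution output, and take the per-pixel
    argmax as the tissue label."""
    # logits_full: (H, W, C); logits_half: (H/2, W/2, C)
    up = logits_half.repeat(2, axis=0).repeat(2, axis=1)
    fused = (logits_full + up) / 2.0
    return fused.argmax(axis=-1)
```

A learned fusion layer would replace the fixed average with trainable weights, but the data flow is the same.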
PU-NET Deep Learning Architecture for Gliomas Brain Tumor Segmentation in Magnetic Resonance Images
Automatic medical image segmentation is one of the main tasks in delineating many organs and pathological structures. It is also a crucial technique in the subsequent clinical examination of brain tumors, such as planning radiotherapy or tumor resection. Various image segmentation techniques have been proposed and applied to different image types. Recently, it has been shown that the deep learning approach segments images accurately, and its implementation is usually straightforward. In this paper, we propose a novel approach, called PU-NET, for automatic brain tumor segmentation in multi-modal magnetic resonance images (MRI). We introduce an input processing block into a customized fully convolutional network derived from the U-Net network to handle the multi-modal inputs. We performed experiments on the Brain Tumor Segmentation (BRATS) dataset collected in 2018 and achieved Dice scores of 90.5%, 82.7%, and 80.3% for the whole tumor, tumor core, and enhancing tumor classes, respectively. This study provides promising results compared to the deep learning methods used in this context.
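The Dice scores reported above measure the overlap between predicted and reference masks. For binary masks it can be computed as below (a small numpy sketch, not the authors' evaluation code):

```python
import numpy as np

def dice_score(pred, target):
    """Dice coefficient 2|A∩B| / (|A| + |B|) between two binary masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    denom = pred.sum() + target.sum()
    if denom == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return 2.0 * np.logical_and(pred, target).sum() / denom
```

For example, a prediction [[1, 1], [0, 0]] against a reference [[1, 0], [0, 0]] has intersection 1 and mask sums 2 + 1, giving 2/3 ≈ 0.667.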
Magnetic resonance image-based brain tumour segmentation methods : a systematic review
Background:
Image segmentation is an essential step in the analysis and subsequent characterisation of brain tumours through magnetic resonance imaging. In the literature, segmentation methods are empowered by open-access magnetic resonance imaging datasets, such as the brain tumour segmentation dataset. Moreover, with the increased use of artificial intelligence methods in medical imaging, access to larger data repositories has become vital in method development.
Purpose:
To determine which automated brain tumour segmentation techniques medical imaging specialists and clinicians can use to identify tumour components, compared with manual segmentation.
Methods:
We conducted a systematic review of 572 brain tumour segmentation studies during 2015–2020. We reviewed segmentation techniques using T1-weighted, T2-weighted, gadolinium-enhanced T1-weighted, fluid-attenuated inversion recovery, diffusion-weighted and perfusion-weighted magnetic resonance imaging sequences. Moreover, we assessed physics- or mathematics-based methods, deep learning methods, and software-based or semi-automatic methods, as applied to magnetic resonance imaging techniques. In particular, we synthesised each method according to the magnetic resonance imaging sequences utilised, study population, technical approach (such as deep learning) and performance score measures (such as Dice score).
Statistical tests:
We compared median Dice scores for segmenting the whole tumour, tumour core and enhancing tumour.
Results:
We found that T1-weighted, gadolinium-enhanced T1-weighted, T2-weighted and fluid-attenuated inversion recovery magnetic resonance imaging are used the most in various segmentation algorithms. However, there is limited use of perfusion-weighted and diffusion-weighted magnetic resonance imaging. Moreover, we found that the U-Net deep learning technology is cited the most, and has high accuracy (Dice score 0.9) for magnetic resonance imaging-based brain tumour segmentation.
Conclusion:
U-Net is a promising deep learning technology for magnetic resonance imaging-based brain tumour segmentation. The community should be encouraged to contribute open-access datasets so that training, testing and validation of deep learning algorithms can be improved, particularly for diffusion- and perfusion-weighted magnetic resonance imaging, where few datasets are available.
Research progress on deep learning in magnetic resonance imaging–based diagnosis and treatment of prostate cancer: a review on the current status and perspectives
Multiparametric magnetic resonance imaging (mpMRI) has emerged as a first-line screening and diagnostic tool for prostate cancer, aiding in treatment selection and noninvasive radiotherapy guidance. However, the manual interpretation of MRI data is challenging and time-consuming, which may impact sensitivity and specificity. With recent technological advances, artificial intelligence (AI) in the form of computer-aided diagnosis (CAD) based on MRI data has been applied to prostate cancer diagnosis and treatment. Among AI techniques, deep learning involving convolutional neural networks contributes to the detection, segmentation, scoring, grading, and prognostic evaluation of prostate cancer. CAD systems offer automatic operation, rapid processing, and high accuracy, incorporating multiple sequences of multiparametric MRI data of the prostate gland into the deep learning model. Thus, they have become a research direction of great interest, especially in smart healthcare. This review highlights the current progress of deep learning technology in MRI-based diagnosis and treatment of prostate cancer. The key elements of deep learning-based MRI image processing in CAD systems and radiotherapy of prostate cancer are briefly described, making them understandable not only for radiologists but also for general physicians without specialized training in imaging interpretation. Deep learning technology enables lesion identification, detection, and segmentation, grading and scoring of prostate cancer, and prediction of postoperative recurrence and prognostic outcomes. The diagnostic accuracy of deep learning can be improved by optimizing models and algorithms, expanding medical database resources, and combining multi-omics data with comprehensive analysis of various morphological data. Deep learning has the potential to become the key diagnostic method in prostate cancer diagnosis and treatment in the future.
A survey, review, and future trends of skin lesion segmentation and classification
The Computer-aided Diagnosis or Detection (CAD) approach for skin lesion analysis is an emerging field of research that has the potential to alleviate the burden and cost of skin cancer screening. Researchers have recently indicated increasing interest in developing such CAD systems, with the intention of providing a user-friendly tool to dermatologists to reduce the challenges associated with manual inspection. This article provides a comprehensive literature survey and review of a total of 594 publications (356 for skin lesion segmentation and 238 for skin lesion classification) published between 2011 and 2022. These articles are analyzed and summarized in a number of different ways to contribute vital information regarding the methods for the development of CAD systems. These ways include: relevant and essential definitions and theories, input data (dataset utilization, preprocessing, augmentations, and fixing imbalance problems), method configuration (techniques, architectures, module frameworks, and losses), training tactics (hyperparameter settings), and evaluation criteria. We also investigate a variety of performance-enhancing approaches, including ensemble and post-processing, and discuss these dimensions to reveal their current trends based on utilization frequencies. In addition, we highlight the primary difficulties associated with evaluating skin lesion segmentation and classification systems using minimal datasets, as well as the potential solutions to these difficulties. Findings, recommendations, and trends are disclosed to inform future research on developing an automated and robust CAD system for skin lesion analysis.
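Among the performance-enhancing approaches the survey covers, ensembling is the simplest to illustrate. A soft-voting sketch (hypothetical, not tied to any surveyed system) averages the class probabilities output by several classifiers:

```python
import numpy as np

def soft_vote(prob_list, weights=None):
    """Soft-voting ensemble: average the class-probability outputs of
    several models and predict the class with the highest mean score."""
    probs = np.asarray(prob_list)  # (n_models, n_samples, n_classes)
    avg = np.average(probs, axis=0, weights=weights)
    return avg.argmax(axis=-1)
```

Weighting members by validation performance is a common refinement; post-processing (e.g. morphological cleanup of segmentation masks) is applied after this step.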
Effective Approaches for Improving the Efficiency of Deep Convolutional Neural Networks for Image Classification
This thesis presents two methods for reducing the number of parameters and floating-point computations in existing DCNN architectures used for image classification. The first method is a modification of the first layers of a DCNN that splits the channels of an image encoded in the CIE Lab color space into two separate paths, one for the achromatic channel and another for the remaining chromatic channels. We modified an Inception V3 architecture to include one branch specific to achromatic data (the L channel) and another branch specific to chromatic data (the AB channels). This modification takes advantage of the decoupling of chromatic and achromatic information. Moreover, splitting branches reduces the number of trainable parameters and the computation load by up to 50% of the original figures in the modified layers. We achieved a state-of-the-art classification accuracy of 99.48% on the Plant Village dataset. This two-branch method also improves image classification reliability when the input images contain noise. In DCNNs, the parameter count in pointwise convolutions grows quickly due to the multiplication of the filters and input channels from the preceding layer. To handle this growth, the second optimization method makes pointwise convolutions parameter-efficient via parallel branching, where each branch contains a group of filters and processes a fraction of the input channels. To avoid degrading the learning capability of DCNNs, we propose interleaving the filters' outputs from separate branches at intermediate layers of successive pointwise convolutions.
We tested our optimization on an EfficientNet-B0 as a baseline architecture and ran classification tests on the CIFAR-10, Colorectal Cancer Histology, and Malaria datasets. For each dataset, our optimization saves 76%, 89%, and 91% of EfficientNet-B0's trainable parameters, respectively, while keeping its test classification accuracy.
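The parameter savings quoted above come from making each pointwise (1x1) convolution grouped: with g parallel branches, each filter sees only c_in/g input channels, so the weight count drops by a factor of g, and interleaving the branch outputs lets later layers mix information across groups. A back-of-the-envelope sketch with hypothetical layer sizes (not the thesis' exact configuration):

```python
def pointwise_params(c_in, c_out, groups=1):
    """Trainable weights of a 1x1 convolution mapping c_in -> c_out
    channels; with `groups` branches each filter sees c_in/groups inputs."""
    assert c_in % groups == 0 and c_out % groups == 0
    return (c_in // groups) * c_out

def interleave(channels, groups):
    """Interleave channels across groups between successive grouped
    pointwise convolutions so later branches receive inputs from all
    groups (the interleaving idea described above, akin to channel shuffle)."""
    per_group = len(channels) // groups
    return [channels[g * per_group + i]
            for i in range(per_group) for g in range(groups)]

full = pointwise_params(192, 320)         # 61440 weights
grouped = pointwise_params(192, 320, 16)  # 3840 weights, a 93.75% saving
```

Without interleaving, stacking grouped pointwise convolutions would keep each group's channels isolated end to end, which is what degrades learning capability.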
Human pose and action recognition
This thesis focuses on detection of persons and pose recognition using neural networks.
The goal is to detect human body poses in a visual scene with multiple
persons and to use this information in order to recognize human activity. This is
achieved by first detecting persons in a scene and then by estimating their body
joints in order to infer articulated poses.
The work developed in this thesis explored neural networks and deep learning
methods. Deep learning makes it possible to employ computational models composed
of multiple processing layers to learn representations of data with multiple levels
of abstraction. These methods have greatly improved the state-of-the-art in many
domains such as speech recognition and visual object detection and classification.
Deep learning discovers intricate structure in data by using the backpropagation
algorithm to indicate how a machine should change its internal parameters that are
used to compute the representation in each layer from the representation provided
by the previous one.
Person detection, in general, is a difficult task due to the large variability in
appearance caused by different factors such as scale, viewpoint and occlusion. An object
detection framework based on multi-stage convolutional features for pedestrian detection
is proposed in this thesis. This framework extends the Fast R-CNN framework
for the combination of several convolutional features from different stages of
a CNN (Convolutional Neural Network) to improve the detector's accuracy. This
provides high quality detections of persons in a visual scene, which are then used
as input in conjunction with a human pose estimation model in order to estimate
human body joint locations of multiple persons in an image.
Human pose estimation is done by a deep convolutional neural network composed
of a series of residual auto-encoders. These produce multiple predictions which are
later combined to provide a heatmap prediction of human body joints. In this network
topology, features are processed across all scales capturing the various spatial
relationships associated with the body. Repeated bottom-up and top-down processing
with intermediate supervision for each auto-encoder network is applied. This
results in very accurate 2D heatmaps of body joint predictions.
The methods presented in this thesis were benchmarked against other top-performing
methods on popular datasets for pedestrian detection and human pose estimation,
achieving good results compared with other state-of-the-art algorithms.
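The 2D joint heatmaps described above are typically decoded into coordinates by taking each map's peak location. A minimal numpy sketch of that read-out (the thesis network itself is not reproduced here):

```python
import numpy as np

def joints_from_heatmaps(heatmaps):
    """Decode (x, y) body-joint locations from per-joint heatmaps by
    taking the peak of each map; also return the peak confidence."""
    n_joints, height, width = heatmaps.shape
    flat = heatmaps.reshape(n_joints, -1)
    peaks = flat.argmax(axis=1)           # flattened index of each peak
    ys, xs = np.divmod(peaks, width)      # row = y, column = x
    return np.stack([xs, ys], axis=1), flat.max(axis=1)
```

In practice the peak confidence is thresholded to reject joints that are occluded or outside the image.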