Deep Learning for Task-Based Image Quality Assessment in Medical Imaging
The use of objective measures of image quality (IQ) has been advocated for assessing and optimizing medical imaging systems. Objective measures of IQ quantify the performance of an observer at a specific diagnostic task. Binary signal detection tasks and joint signal detection and localization (detection-localization) tasks are commonly considered in medical imaging. When optimizing imaging systems for binary signal detection tasks, the performance of the Bayesian Ideal Observer (IO) has been advocated for use as a figure-of-merit (FOM). The IO maximizes observer performance as summarized by the receiver operating characteristic (ROC) curve. When signal detection-localization tasks are considered, the IO that implements a modified generalized likelihood ratio test (MGLRT) maximizes observer performance as measured by the localization ROC (LROC) curve. However, the IO test statistic is generally analytically intractable to compute. To address this difficulty, sampling-based methods that employ Markov chain Monte Carlo (MCMC) techniques have been proposed. However, current applications of MCMC methods have been limited to relatively simple stochastic object models (SOMs). When the IO is difficult or intractable to compute, the optimal linear observer, known as the Hotelling Observer (HO), can be employed to evaluate objective measures of IQ. Although the HO is easier to compute than the IO, its computation can still be challenging or even intractable because a potentially large covariance matrix must be estimated and subsequently inverted. In the first part of the dissertation, we introduce supervised learning-based methods for approximating the IO and the HO for binary signal detection tasks. We demonstrate the use of convolutional neural networks (CNNs) to approximate the IO and the use of single-layer neural networks (SLNNs) to estimate the Hotelling template directly, without computing and inverting covariance matrices.
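For reference, the conventional HO computation that the SLNN approach is designed to avoid can be sketched in NumPy. The Hotelling template is w = K⁻¹Δḡ, where Δḡ is the difference between the signal-present and signal-absent mean images and K is the class-averaged covariance matrix; the squared Hotelling SNR is Δḡᵀ K⁻¹ Δḡ. The toy data (correlated Gaussian backgrounds plus white noise, 16 pixels) and the helper names are hypothetical. With realistically sized images, the n_pixels × n_pixels matrix K is exactly what becomes hard to estimate and invert.

```python
import numpy as np


def hotelling_template(g0, g1):
    """Return (w, snr) for the Hotelling observer.

    g0, g1: arrays of shape (n_images, n_pixels) holding flattened
    signal-absent (H0) and signal-present (H1) images.
    """
    delta_g = g1.mean(axis=0) - g0.mean(axis=0)            # mean difference image
    # Class-averaged covariance matrix (n_pixels x n_pixels).
    k = 0.5 * (np.cov(g0, rowvar=False) + np.cov(g1, rowvar=False))
    w = np.linalg.solve(k, delta_g)                         # w = K^{-1} delta_g without forming K^{-1}
    snr = float(np.sqrt(w @ delta_g))                       # SNR^2 = delta_g^T K^{-1} delta_g
    return w, snr


# Toy example: correlated Gaussian backgrounds plus white noise (hypothetical).
rng = np.random.default_rng(0)
n_pix, n_img = 16, 5000
mix = rng.normal(size=(n_pix, n_pix)) / np.sqrt(n_pix)     # induces background correlations
signal = np.zeros(n_pix)
signal[6:10] = 0.5


def simulate(n, with_signal):
    background = rng.normal(size=(n, n_pix)) @ mix.T
    noise = 0.5 * rng.normal(size=(n, n_pix))
    return background + noise + (signal if with_signal else 0.0)


g0, g1 = simulate(n_img, False), simulate(n_img, True)
w, snr = hotelling_template(g0, g1)
t0, t1 = g0 @ w, g1 @ w                                     # linear test statistic t(g) = w^T g
print(f"Hotelling SNR: {snr:.2f}")
```

Solving K w = Δḡ with `np.linalg.solve` avoids forming K⁻¹ explicitly, which is cheaper and numerically more stable than calling an inverse.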
In the second part, a supervised learning method that employs CNNs to approximate the IO for signal detection-localization tasks is presented. This method represents a deep-learning-based implementation of an MGLRT that defines the IO decision strategy for signal detection-localization tasks. When evaluating observer performance for assessing and optimizing imaging systems by use of objective measures of IQ, all sources of variability in the measured image data should be accounted for. One important source of variability that can significantly affect observer performance is the variation in the ensemble of objects to be imaged. To describe this variability, a SOM can be established. A SOM is a generative model that can produce an ensemble of simulated objects with prescribed statistical properties. To establish a realistic SOM, it is desirable to use experimental data. Generative adversarial networks (GANs) hold great potential for establishing SOMs. However, images produced by imaging systems are affected by measurement noise and, potentially, by a reconstruction process. Therefore, GANs trained on these images cannot represent SOMs, because they learn the variability of the measured images rather than the object variability alone. An augmented GAN architecture named AmbientGAN, which includes a measurement operator, was proposed to address this issue. However, AmbientGANs cannot be directly combined with advanced GAN training strategies such as progressive growing of GANs (ProGANs). Therefore, the ability of AmbientGANs to establish realistic and sophisticated SOMs is limited. In the third part of this dissertation, we propose a novel deep learning method named progressively growing AmbientGANs (ProAmGANs), which incorporates the progressive growing training procedure and therefore enables AmbientGANs to be applied to realistically sized medical image data.
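As a simplified illustration of the detection-localization decision rule, consider a signal that, if present, appears at one of J candidate locations, with a known background b and white Gaussian noise of standard deviation σ. A minimal sketch, assuming this known-background setting (the CNN approach described above targets random backgrounds, where these likelihood ratios have no closed form): for each location j, log Λ_j(g) = [s_jᵀ(g - b) - ½‖s_j‖²]/σ²; the test statistic is the maximum over j of log pr(j) + log Λ_j(g), and the location estimate is the maximizing j. The function name and the 1D toy phantom are hypothetical.

```python
import numpy as np


def mglrt(g, background, signals, sigma, log_prior=None):
    """Scan candidate signal locations; return (test statistic, location index).

    g, background: flattened images, shape (n_pixels,)
    signals: shape (n_locations, n_pixels), the signal image s_j for each location j
    sigma: standard deviation of i.i.d. Gaussian measurement noise
    log_prior: optional log pr(j); a uniform prior is assumed if None
    """
    residual = g - background
    # log Lambda_j = (s_j^T (g - b) - 0.5 ||s_j||^2) / sigma^2 for white Gaussian noise
    log_lr = (signals @ residual - 0.5 * np.sum(signals ** 2, axis=1)) / sigma ** 2
    if log_prior is not None:
        log_lr = log_lr + log_prior
    j_hat = int(np.argmax(log_lr))                  # location estimate
    return float(log_lr[j_hat]), j_hat              # max over locations is the test statistic


# Toy 1D example with 8 candidate locations (hypothetical).
rng = np.random.default_rng(1)
n_pix, sigma = 64, 0.5
x = np.arange(n_pix)
centers = np.arange(4, n_pix, 8)
signals = 3.0 * np.exp(-((x[None, :] - centers[:, None]) ** 2) / (2 * 2.0 ** 2))
background = np.full(n_pix, 10.0)                  # known, flat background (simplification)

g_present = background + signals[5] + sigma * rng.normal(size=n_pix)
g_absent = background + sigma * rng.normal(size=n_pix)
t_present, j_present = mglrt(g_present, background, signals, sigma)
t_absent, _ = mglrt(g_absent, background, signals, sigma)
print(t_present, j_present, t_absent)
```

Sweeping a threshold over the test statistic, and scoring a signal-present image as a true positive only when the estimated location is also correct, traces out the LROC curve.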
Stylized numerical studies involving a variety of object ensembles and common medical imaging modalities are presented. Finally, a novel sampling-based method named MCMC-GAN is developed to approximate the IO. This method applies MCMC algorithms to SOMs that are established by use of GAN techniques. Because GANs are general and not limited to specific types of images, the proposed method can be implemented with sophisticated object models and therefore extends the domain of applicability of MCMC techniques. Numerical studies involving clinical brain positron emission tomography (PET) images and brain magnetic resonance (MR) images are presented.
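For a signal-known-exactly task, the likelihood ratio that MCMC-based methods estimate can be written as Λ(g) = E[Λ_BKE(g | b)], where the expectation is over backgrounds b drawn from the posterior p(b | g, H0) and Λ_BKE is the likelihood ratio when b is known. When the SOM is a generator b = G(z), the sampling can be carried out in the latent space of z. The sketch below assumes a hypothetical linear "generator" G(z) = Az + μ in place of a trained GAN, an identity imaging operator, white Gaussian noise, and a plain random-walk Metropolis sampler; it is not the dissertation's MCMC-GAN implementation. Because this toy model is linear-Gaussian, the exact log-likelihood ratio is also computed for comparison.

```python
import numpy as np

rng = np.random.default_rng(2)
n_pix, n_latent, sigma = 8, 2, 1.0
A = 0.5 * rng.normal(size=(n_pix, n_latent))        # hypothetical linear "generator" weights
mu = np.full(n_pix, 1.0)
signal = np.zeros(n_pix)
signal[3:5] = 0.5


def generator(z):
    """Stand-in for a trained GAN generator G(z); here G(z) = A z + mu."""
    return A @ z + mu


def log_posterior_h0(z, g):
    """log p(z | g, H0) up to a constant: Gaussian likelihood times N(0, I) prior on z."""
    r = g - generator(z)
    return -0.5 * (r @ r) / sigma ** 2 - 0.5 * (z @ z)


def log_lr_bke(g, b):
    """Log likelihood ratio when the background b is known (white Gaussian noise)."""
    return (signal @ (g - b) - 0.5 * (signal @ signal)) / sigma ** 2


def mcmc_likelihood_ratio(g, n_steps=20000, burn_in=2000, step=0.5):
    """Estimate Lambda(g) = E_{z ~ p(z|g,H0)}[Lambda_BKE(g | G(z))] by random-walk Metropolis."""
    z = np.zeros(n_latent)
    lp = log_posterior_h0(z, g)
    total, count = 0.0, 0
    for i in range(n_steps):
        z_prop = z + step * rng.normal(size=n_latent)
        lp_prop = log_posterior_h0(z_prop, g)
        if np.log(rng.uniform()) < lp_prop - lp:     # Metropolis acceptance test
            z, lp = z_prop, lp_prop
        if i >= burn_in:
            total += np.exp(log_lr_bke(g, generator(z)))
            count += 1
    return total / count


# One signal-present measurement g = G(z) + s + n.
g = generator(rng.normal(size=n_latent)) + signal + sigma * rng.normal(size=n_pix)
lr_mcmc = mcmc_likelihood_ratio(g)

# Exact value for this linear-Gaussian toy model: g ~ N(mu (+ s), A A^T + sigma^2 I).
cov = A @ A.T + sigma ** 2 * np.eye(n_pix)
c_inv_s = np.linalg.solve(cov, signal)
log_lr_exact = c_inv_s @ (g - mu) - 0.5 * (signal @ c_inv_s)
print(np.log(lr_mcmc), log_lr_exact)
```

With a nonlinear generator, only `generator` changes; the sampler needs nothing but evaluations of G(z), which is what makes a GAN-based SOM usable here.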
On the impact of incorporating task-information in learning-based image denoising
A variety of deep neural network (DNN)-based image denoising methods have
been proposed for use with medical images. These methods are typically trained
by minimizing loss functions that quantify a distance between the denoised
image, or a transformed version of it, and the defined target image (e.g., a
noise-free or low-noise image). They have demonstrated high performance in
terms of traditional image quality metrics such as root mean square error
(RMSE), structural similarity index measure (SSIM), or peak signal-to-noise
ratio (PSNR). However, it has been reported recently that such denoising
methods may not always improve objective measures of image quality. In this
work, a task-informed DNN-based image denoising method was established and
systematically evaluated. A transfer learning approach was employed, in which
the DNN is first pre-trained by use of a conventional (non-task-informed) loss
function and subsequently fine-tuned by use of a hybrid loss that includes a
task component. The task component was designed to measure the performance of a
numerical observer (NO) on a signal detection task. The impact of network depth
and of constraining the fine-tuning to specific layers of the DNN was explored.
The task-informed training method was investigated in a stylized low-dose X-ray
computed tomography (CT) denoising study in which binary signal detection
tasks under signal-known-statistically (SKS) and
background-known-statistically (BKS) conditions were considered. The impact of
changing the specified task at inference time to be different from that
employed for model training, a phenomenon we refer to as "task-shift", was also
investigated. The presented results indicate that the task-informed training
method can improve observer performance while providing control over the
trade-off between traditional and task-based measures of image quality.
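A minimal sketch of how a hybrid loss can combine a conventional term with a task component, assuming for illustration that the numerical observer is a Hotelling observer and that the task component is its squared SNR weighted by a hypothetical parameter `gamma` (the observer and weighting actually used in the study may differ). In the transfer learning scheme described above, pre-training corresponds to gamma = 0 and fine-tuning to gamma > 0. In practice the loss would be evaluated on mini-batches inside an automatic-differentiation framework; NumPy is used here only to show the computation, and the random arrays stand in for denoiser outputs and targets.

```python
import numpy as np


def hotelling_snr2(d0, d1):
    """Squared Hotelling SNR of signal-absent (d0) and signal-present (d1) images, shape (n, n_pix)."""
    delta = d1.mean(axis=0) - d0.mean(axis=0)
    k = 0.5 * (np.cov(d0, rowvar=False) + np.cov(d1, rowvar=False))
    return float(delta @ np.linalg.solve(k, delta))


def hybrid_loss(d0, d1, t0, t1, gamma):
    """Hypothetical hybrid loss: pixel MSE minus gamma times the squared Hotelling SNR.

    d0, d1: denoised signal-absent / signal-present images
    t0, t1: corresponding target (e.g. noise-free or low-noise) images
    gamma: weight of the task component; gamma = 0 recovers the conventional loss
    """
    mse = float(np.mean((np.concatenate([d0, d1]) - np.concatenate([t0, t1])) ** 2))
    return mse - gamma * hotelling_snr2(d0, d1)


# Toy data: 9-pixel images with a one-pixel signal (hypothetical).
rng = np.random.default_rng(3)
n, n_pix = 500, 9
signal = np.zeros(n_pix)
signal[4] = 1.0
t0 = rng.normal(size=(n, n_pix))
t1 = rng.normal(size=(n, n_pix)) + signal
d0 = t0 + 0.3 * rng.normal(size=t0.shape)          # stand-in for imperfect denoiser output
d1 = t1 + 0.3 * rng.normal(size=t1.shape)
for gamma in (0.0, 0.01, 0.1):
    print(gamma, hybrid_loss(d0, d1, t0, t1, gamma))
```

Increasing `gamma` shifts the optimum toward images on which the observer performs better, which is one way to expose the trade-off between traditional and task-based measures described in the abstract.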
Multi-sensor data fusion techniques for RPAS detect, track and avoid
Accurate and robust tracking of objects is of growing interest in the computer vision community. The ability of a multi-sensor system to detect and track objects, and to accurately predict their future trajectories, is critical in mission- and safety-critical applications. Remotely Piloted Aircraft Systems (RPAS) are currently not equipped to routinely access all classes of airspace, since certified Detect-and-Avoid (DAA) systems are yet to be developed. Such capabilities can be achieved by incorporating both cooperative and non-cooperative DAA functions, as well as by providing enhanced communications, navigation and surveillance (CNS) services. DAA is highly dependent on the performance of CNS systems for Detection, Tracking and Avoidance (DTA) tasks and maneuvers. To detect objects effectively, a number of high-performance, reliable and accurate avionics sensors and systems are adopted, including non-cooperative sensors (visual and thermal cameras, laser radar (LIDAR) and acoustic sensors) and cooperative systems (Automatic Dependent Surveillance-Broadcast (ADS-B) and Traffic Collision Avoidance System (TCAS)). In this paper, the candidate sensors and systems are fully exploited in a Multi-Sensor Data Fusion (MSDF) architecture. An Unscented Kalman Filter (UKF) and a more advanced Particle Filter (PF) are adopted to estimate the state vector of the objects for maneuvering and non-maneuvering DTA tasks. Furthermore, an artificial neural network is conceptualised and adopted to exploit statistical learning methods; it combines the information obtained from the UKF and the PF. After the MSDF architecture is described, the key mathematical models for data fusion are presented. Conceptual studies are carried out on visual and thermal image fusion architectures.
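To illustrate the filtering side of an MSDF architecture, the following is a minimal bootstrap particle filter that fuses position measurements from several sensors. It assumes a 1D constant-velocity target, independent Gaussian sensor noise (so the per-sensor log-likelihoods add), and hypothetical noise levels; it is a generic sketch, not the paper's UKF/PF design or its neural-network fusion stage.

```python
import numpy as np


def fused_particle_filter(z, r, dt=1.0, q=0.05, n_particles=1000, seed=0):
    """Bootstrap particle filter for a 1D constant-velocity target.

    z: (n_steps, n_sensors) position measurements, one column per sensor
    r: (n_sensors,) measurement-noise standard deviations
    q: standard deviation of the random acceleration (process noise)
    Returns the weighted-mean state [position, velocity] per step, shape (n_steps, 2).
    """
    rng = np.random.default_rng(seed)
    z, r = np.asarray(z, dtype=float), np.asarray(r, dtype=float)
    f = np.array([[1.0, dt], [0.0, 1.0]])                      # constant-velocity motion model
    particles = np.column_stack([
        z[0].mean() + r.max() * rng.normal(size=n_particles),  # initial positions
        rng.normal(size=n_particles),                          # initial velocities
    ])
    estimates = np.empty((len(z), 2))
    for k in range(len(z)):
        if k > 0:
            # Predict: move every particle and add random-acceleration noise.
            a = q * rng.normal(size=n_particles)
            particles = particles @ f.T + np.column_stack([0.5 * dt ** 2 * a, dt * a])
        # Update: independent Gaussian sensors, so per-sensor log-likelihoods add.
        log_w = -0.5 * np.sum(((z[k][None, :] - particles[:, [0]]) / r) ** 2, axis=1)
        w = np.exp(log_w - log_w.max())
        w /= w.sum()
        estimates[k] = w @ particles
        # Systematic resampling to counter weight degeneracy.
        u = (rng.uniform() + np.arange(n_particles)) / n_particles
        idx = np.minimum(np.searchsorted(np.cumsum(w), u), n_particles - 1)
        particles = particles[idx]
    return estimates


# Toy scenario: two sensors with different (hypothetical) noise levels.
rng = np.random.default_rng(42)
n_steps = 50
r = np.array([1.0, 2.0])
true_pos = 1.0 * np.arange(n_steps)                            # target moving 1 unit per step
meas = true_pos[:, None] + r * rng.normal(size=(n_steps, 2))
est = fused_particle_filter(meas, r)
rmse = float(np.sqrt(np.mean((est[:, 0] - true_pos) ** 2)))
print(f"fused PF position RMSE: {rmse:.2f}")
```

Adding a sensor only adds a column to `z` and an entry to `r`; a UKF would replace the particle cloud with sigma points but keep the same predict/update structure.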
UNet and MobileNet CNN-based model observers for CT protocol optimization: comparative performance evaluation by means of phantom CT images
Purpose: The aim of this work is the development and characterization of a model observer (MO) based on convolutional neural networks (CNNs), trained to mimic human observers in image evaluation in terms of detection and localization of low-contrast objects in CT scans acquired on a reference phantom. The final goal is automatic image quality evaluation and CT protocol optimization to fulfill the ALARA principle. Approach: Preliminary work was carried out to collect human observers' localization confidence ratings for signal presence/absence on a dataset of 30,000 CT images acquired on a polymethyl methacrylate (PMMA) phantom containing inserts filled with iodinated contrast media at different concentrations. The collected data were used to generate the labels for training the artificial neural networks. We developed and compared two CNN architectures, based on UNet and MobileNetV2 respectively, specifically adapted to perform the dual tasks of classification and localization. The CNNs were evaluated by computing the area under the localization-ROC curve (LAUC) and accuracy metrics on the test dataset. Results: The mean absolute percentage error between the LAUC of the human observer and that of the MO was below 5% for the most significant test data subsets. High inter-rater agreement was achieved in terms of S-statistics and other common statistical indices. Conclusions: Very good agreement was measured between the human observer and the MO, as well as between the performance of the two algorithms. Therefore, this work strongly supports the feasibility of employing CNN-based MOs combined with a specifically designed phantom in CT protocol optimization programs.
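The LAUC can be estimated nonparametrically from observer ratings. One common form, assumed here, counts a signal-present image as beating a signal-absent image only if its rating is higher and its reported location is correct, with tied ratings counted as one half; averaging over all present/absent pairs gives the estimate. The function name and toy ratings are hypothetical.

```python
import numpy as np


def empirical_lauc(t_present, correct_loc, t_absent):
    """Nonparametric estimate of the area under the LROC curve.

    t_present: confidence ratings for signal-present images
    correct_loc: booleans, True where the reported signal location was correct
    t_absent: confidence ratings for signal-absent images
    """
    tp = np.asarray(t_present, dtype=float)[:, None]
    ta = np.asarray(t_absent, dtype=float)[None, :]
    c = np.asarray(correct_loc, dtype=float)[:, None]
    # Heaviside step with ties counted as 1/2, for every (present, absent) pair.
    wins = (tp > ta) + 0.5 * (tp == ta)
    return float(np.mean(c * wins))


lauc = empirical_lauc([0.9, 0.8, 0.4], [True, False, True], [0.5, 0.3])
print(lauc)  # 0.5
```

For the toy ratings, the first image beats both absent images and is correctly localized (2 pairs), the second is mislocalized (0 pairs), and the third beats only the 0.3 rating (1 pair), giving 3/6 = 0.5.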
A Machine Learning Approach to Indoor Localization Data Mining
Indoor positioning systems are increasingly commonplace in various environments and
produce large quantities of data. They are used in industrial applications, robotics,
asset and employee tracking just to name a few use cases. The growing amount of data
and the accelerating progress of machine learning open up many new possibilities for
analyzing this data in ways that were not conceivable or relevant before. This paper
introduces connected concepts and implementations to answer the question of how this data
can be utilized. Data gathered in this thesis originates from an indoor positioning system
deployed in a retail environment, but the discussed methods can be applied generally.
The issue will be approached by first introducing the concept of machine learning
and more generally, artificial intelligence, and how they work on a general level. A
deeper dive is then taken into the subfields and algorithms that are relevant to the data mining task
at hand. Indoor positioning system basics are also briefly discussed to build a basic understanding
of the realistic capabilities and constraints of these kinds of systems.
These methods and previous knowledge from the literature are put to the test with the
freshly gathered data. An algorithm based on an existing example from the literature was tested
and improved upon with the new data. A novel method to cluster and classify movement
patterns was introduced, utilizing deep learning to create embedded representations of the
trajectories in a more complex learning pipeline. This type of learning is often referred
to as deep clustering.
The results are promising: both methods produce useful high-level representations
of the complex dataset that can help a human operator discern the
relevant patterns in raw data and can be used as input for subsequent supervised and
unsupervised learning steps. Several factors related to optimizing the learning pipeline,
such as regularization, were also investigated, and the results are presented as visualizations.
The research found that a pipeline consisting of a CNN autoencoder followed by a classic
clustering algorithm such as DBSCAN produces useful results in the form of trajectory
clusters. Regularization, such as L1 regularization, improves this performance.
The research done in this paper presents useful algorithms for processing raw, noisy
localization data from indoor environments that can be used for further implementations
in both industrial applications and academia.
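The clustering stage of such a pipeline can be sketched as follows. In the thesis, trajectory embeddings come from a CNN autoencoder; here, as a stand-in (an assumption for illustration only), each trajectory is resampled to a fixed number of points and flattened into a vector, and a minimal NumPy implementation of DBSCAN groups the vectors. In practice a library implementation such as scikit-learn's `DBSCAN` would normally be used; the helper names and synthetic trajectories are hypothetical.

```python
import numpy as np


def resample_trajectory(points, n=20):
    """Linearly resample an (m, 2) trajectory to n points (by index) and flatten to length 2n."""
    points = np.asarray(points, dtype=float)
    s = np.linspace(0.0, 1.0, len(points))
    s_new = np.linspace(0.0, 1.0, n)
    return np.concatenate([np.interp(s_new, s, points[:, 0]), np.interp(s_new, s, points[:, 1])])


def dbscan(x, eps, min_samples):
    """Minimal DBSCAN: returns cluster labels, with -1 marking noise points."""
    n = len(x)
    dist = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    neighbors = [np.flatnonzero(dist[i] <= eps) for i in range(n)]   # includes the point itself
    core = np.array([len(nb) >= min_samples for nb in neighbors])
    labels = np.full(n, -1)
    cluster = 0
    for i in range(n):
        if labels[i] != -1 or not core[i]:
            continue
        labels[i] = cluster
        stack = [i]
        while stack:                       # expand the cluster from core points only
            j = stack.pop()
            if not core[j]:
                continue                   # border points join a cluster but do not expand it
            for nb in neighbors[j]:
                if labels[nb] == -1:
                    labels[nb] = cluster
                    stack.append(nb)
        cluster += 1
    return labels


# Two synthetic movement patterns: walking along x and walking along y.
rng = np.random.default_rng(4)
trajectories = []
for direction in (np.array([1.0, 0.0]), np.array([0.0, 1.0])):
    for _ in range(10):
        m = int(rng.integers(15, 31))                      # varying number of position fixes
        t = np.linspace(0.0, 10.0, m)[:, None]
        trajectories.append(t * direction + 0.05 * rng.normal(size=(m, 2)))

embeddings = np.stack([resample_trajectory(tr) for tr in trajectories])
labels = dbscan(embeddings, eps=2.0, min_samples=3)
print(labels)
```

DBSCAN suits this step because it does not need the number of movement patterns in advance and leaves atypical trajectories unlabeled (-1) instead of forcing them into a cluster.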
Lidar-based scene understanding for autonomous driving using deep learning
With over 1.35 million fatalities related to traffic accidents worldwide, autonomous driving was foreseen at the beginning of this century as a feasible solution to improve safety on our roads. Moreover, it is meant to disrupt our transportation paradigm, making it possible to reduce congestion, pollution, and costs while increasing the accessibility, efficiency, and reliability of transportation for both people and goods. Although some advances, such as adaptive cruise control, blind spot detection or automatic parking, have gradually been transferred into commercial vehicles in the form of Advanced Driver Assistance Systems (ADAS), the technology is far from mature. A full understanding of the scene is needed so that vehicles are aware of their surroundings, knowing the elements present in the scene as well as their motion, intentions and interactions.
In this PhD dissertation, we explore new approaches for understanding driving scenes from 3D LiDAR point clouds by using deep learning methods. To this end, in Part I we analyze the scene from a static perspective, using independent frames to detect neighboring vehicles. Next, in Part II we develop new ways of understanding the dynamics of the scene. Finally, in Part III we apply all the developed methods to higher-level challenges such as segmenting moving obstacles while obtaining their rigid motion vector over the ground.
More specifically, in Chapter 2 we develop a 3D vehicle detection pipeline based on a multi-branch deep learning architecture and propose a front view (FR-V) and a bird's eye view (BE-V) as 2D representations of the 3D point cloud to serve as input for training our models. Later on, in Chapter 3 we apply and further test this method in two real use cases: pre-filtering moving obstacles while creating maps to better localize ourselves on subsequent days, and vehicle tracking. From the dynamic perspective, in Chapter 4 we learn from the 3D point cloud a novel dynamic feature that resembles optical flow from RGB images. For that, we develop a new approach that leverages RGB optical flow as pseudo ground truth for training while requiring only 3D LiDAR data at inference time. Additionally, in Chapter 5 we explore the benefits of combining classification and regression learning problems to address the optical flow estimation task in a joint coarse-and-fine manner. Lastly, in Chapter 6 we bring the previous methods together and demonstrate that these independent tasks can guide the learning of more challenging, higher-level problems such as segmentation and motion estimation of moving vehicles from our own moving perspective.
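A bird's eye view representation can be obtained by rasterizing the point cloud onto a horizontal grid. The sketch below assumes a vehicle frame with x pointing forward and y to the left, a hypothetical grid extent and cell size, and two per-cell channels (point count and maximum height); it shows the general idea only and is not the specific BE-V encoding used in the dissertation.

```python
import numpy as np


def bird_eye_view(points, x_range=(0.0, 40.0), y_range=(-20.0, 20.0), cell=0.5):
    """Project an (n, 3) LiDAR point cloud onto a 2D bird's eye view grid.

    Returns two (H, W) maps: point count per cell and maximum height per cell
    (empty cells are set to 0).
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    keep = (x >= x_range[0]) & (x < x_range[1]) & (y >= y_range[0]) & (y < y_range[1])
    x, y, z = x[keep], y[keep], z[keep]
    h = int(round((x_range[1] - x_range[0]) / cell))
    w = int(round((y_range[1] - y_range[0]) / cell))
    # Cell indices, clipped to guard against floating-point round-off at the far edge.
    rows = np.minimum(((x - x_range[0]) / cell).astype(int), h - 1)
    cols = np.minimum(((y - y_range[0]) / cell).astype(int), w - 1)
    density = np.zeros((h, w))
    np.add.at(density, (rows, cols), 1.0)          # unbuffered add handles repeated cells
    height = np.full((h, w), -np.inf)
    np.maximum.at(height, (rows, cols), z)         # per-cell maximum height
    height[np.isinf(height)] = 0.0
    return density, height


points = np.array([
    [10.0, 0.0, 1.5],
    [10.1, 0.1, 0.5],    # falls in the same cell as the first point
    [30.0, -10.0, 2.0],
    [50.0, 0.0, 1.0],    # outside the grid, discarded
])
density, height = bird_eye_view(points)
print(density.shape, density[20, 40], height[20, 40])
```

`np.add.at` and `np.maximum.at` are used instead of fancy-index assignment because several points can map to the same cell, and plain assignment would keep only one of them.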
- …