
    Doctor of Philosophy

    Image segmentation entails partitioning an image domain, usually in two or three dimensions, so that each partition or segment has some meaning relevant to the application at hand. Accurate image segmentation is a crucial challenge in many disciplines, including medicine, computer vision, and geology. In some applications, heterogeneous pixel intensities; noisy, ill-defined, or diffusive boundaries; and irregular shapes with high variability can make it difficult to meet accuracy requirements. Various segmentation approaches tackle such challenges by casting segmentation as an energy-minimization problem and solving it with efficient optimization algorithms. These approaches are broadly classified as either region-based or edge (surface)-based, depending on the features on which they operate. The focus of this dissertation is the development of a surface-based energy model, the design of efficient optimization frameworks that incorporate such an energy, and the solution of the energy-minimization problem using graph cuts.

This dissertation comprises a set of four papers motivated by the efficient extraction of the left atrium wall from late gadolinium enhancement magnetic resonance imaging (LGE-MRI) volumes. It also applies these energy formulations to other applications, including contact lens segmentation in optical coherence tomography (OCT) data and the extraction of geologic features in seismic data.

Chapters 2 through 5 (papers 1 through 4) build a surface-based image segmentation model by progressively adding components that improve its accuracy and robustness. The first paper defines a parametric search space and its discrete formulation as a multilayer three-dimensional mesh model within which the segmentation takes place; it includes a generative intensity model, and optimization uses a graph formulation of the surface-net problem. The second paper proposes a Bayesian framework with a Markov random field (MRF) prior that gives rise to another class of surface nets, providing better segmentation with smooth boundaries. The third paper presents a maximum a posteriori (MAP) surface estimation framework that relies on a generative image model and incorporates global shape priors, in addition to the MRF, within the Bayesian formulation. The resulting surface thus not only depends on the learned model of shapes, but also accommodates irregularities in the test data through smooth deviations from these priors. Further, the paper proposes a new closed-form shape-parameter estimation scheme as part of the optimization process. Finally, the fourth paper (under review at the time of writing) presents an extensive analysis of the MAP framework together with improved mesh-generation and generative intensity models, along with a thorough qualitative, quantitative, and clinical evaluation that demonstrates the effectiveness of the proposed method.

Chapter 6, consisting of unpublished work, demonstrates the application of an MRF-based Bayesian framework to segment the coupled surfaces of contact lenses in optical coherence tomography images. This chapter also shows an application to the extraction of geological structures in seismic volumes. Due to the large size of seismic volume datasets, we also present fast, approximate surface-based energy-minimization strategies that achieve substantial speed-ups and reduced memory consumption.
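As a concrete illustration of the graph-cut machinery behind this line of work, the sketch below implements the classic single-surface graph search: a minimum-cost closed set solved as a min s-t cut, which is the construction underlying surface-net-style segmentation. It is a minimal toy for a 1-D chain of image columns, with assumed column costs and a hard smoothness bound, not the dissertation's multilayer mesh model or its Bayesian extensions.

```python
import networkx as nx
import numpy as np

def extract_surface(cost, delta=1):
    """Minimal single-surface graph search: minimum-cost closed set
    solved as a min s-t cut.

    cost[i, k] -- unary cost of placing the surface at height k in column i
    delta      -- hard smoothness bound |height[i] - height[i+1]| <= delta
    """
    n, K = cost.shape
    BIG = 1e9  # effectively infinite capacity

    # Weight transform: a closed set containing nodes (i, 0..h_i) in each
    # column has total weight sum_i cost[i, h_i] (the terms telescope).
    w = cost.astype(float).copy()
    w[:, 1:] = cost[:, 1:] - cost[:, :-1]

    G = nx.DiGraph()
    for i in range(n):
        for k in range(K):
            v = (i, k)
            if k == 0:
                # Force the bottom node into the closed set: a surface
                # must exist in every column.
                G.add_edge("s", v, capacity=BIG)
                if w[i, k] > 0:
                    G.add_edge(v, "t", capacity=w[i, k])
            elif w[i, k] < 0:
                G.add_edge("s", v, capacity=-w[i, k])
            else:
                G.add_edge(v, "t", capacity=w[i, k])
            if k > 0:
                # Intra-column closure: (i, k) in the set => (i, k-1) too.
                G.add_edge(v, (i, k - 1), capacity=BIG)
            for j in (i - 1, i + 1):
                # Inter-column smoothness: height[j] >= height[i] - delta.
                if 0 <= j < n:
                    G.add_edge(v, (j, max(0, k - delta)), capacity=BIG)

    _, (source_side, _) = nx.minimum_cut(G, "s", "t")
    return [max(k for k in range(K) if (i, k) in source_side)
            for i in range(n)]

# Toy usage: a noisy cost volume with a low-cost band drifting in depth.
rng = np.random.default_rng(0)
truth = np.clip(10 + np.cumsum(rng.integers(-1, 2, size=40)), 2, 17)
cost = rng.random((40, 20))
cost[np.arange(40), truth] -= 2.0  # make the true surface cheap
print(extract_surface(cost, delta=1))
```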

    Editing faces in videos

    Editing faces in movies is of interest in the special effects industry. We aim to produce effects such as the addition of accessories that interact correctly with the face, or the replacement of a stuntman's face with that of the main actor. The system introduced in this thesis is based on a 3D generative face model. Using a 3D model makes it possible to edit the face in the semantic space of pose, expression, and identity rather than in pixel space, and its 3D nature allows the interaction of light with the face to be modelled. In our system we first reconstruct, in all frames of a monocular input video, the 3D face (which deforms due to expressions and speech), the lighting, and the camera. The face is then edited by substituting expressions or identities with those of another video sequence, or by adding virtual objects into the scene. The manipulated 3D scene is rendered back into the original video, correctly simulating the interaction of the light with the deformed face and the virtual objects. We describe all steps necessary to build and apply the system: registration of training faces to learn a generative face model, semi-automatic annotation of the input video, fitting of the face model to the input video, editing of the fit, and rendering of the resulting scene. While describing the application we introduce a host of new methods, each of which is of interest in its own right. We start with a new method to register 3D face scans for use as training data for the face model. For video preprocessing, a new interest-point tracking and 2D Active Appearance Model fitting technique is proposed. For robust fitting, we introduce background modelling, model-based stereo techniques, and a more accurate light model.
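To make the model-fitting idea concrete: with a linear (morphable) face model, pose and model coefficients can be estimated alternately, solving a rigid alignment for fixed coefficients and a linear least-squares problem for fixed pose. The sketch below is an illustrative toy that fits to known 3D correspondences with a Kabsch alignment step; the thesis' actual pipeline fits to video with lighting, which this does not attempt, and all array shapes and names here are assumptions.

```python
import numpy as np

def fit_face(mean, basis, target, iters=10):
    """Alternate pose (rigid Kabsch alignment) and linear coefficients
    for a linear face model: shape(c) = mean + basis @ c.

    mean:   (3N,)    mean face vertices, flattened
    basis:  (3N, M)  linear deformation/identity basis
    target: (N, 3)   observed vertex positions (known correspondences)
    """
    N = target.shape[0]
    c = np.zeros(basis.shape[1])
    R, t = np.eye(3), np.zeros(3)
    for _ in range(iters):
        # Pose step: rigid Kabsch alignment of the current model to target.
        model = (mean + basis @ c).reshape(N, 3)
        mu_m, mu_t = model.mean(0), target.mean(0)
        U, _, Vt = np.linalg.svd((model - mu_m).T @ (target - mu_t))
        D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
        R = Vt.T @ D @ U.T
        t = mu_t - R @ mu_m
        # Coefficient step: map the target back into model coordinates
        # and solve the linear least-squares problem for c.
        local = ((target - t) @ R).reshape(-1)  # rows are R^T (target_i - t)
        c, *_ = np.linalg.lstsq(basis, local - mean, rcond=None)
    return R, t, c

# Toy usage: recover known pose and coefficients from synthetic data.
rng = np.random.default_rng(1)
mean, basis = rng.normal(size=300), 0.1 * rng.normal(size=(300, 5))
c_true = rng.normal(size=5)
pts = (mean + basis @ c_true).reshape(100, 3)
th = 0.3
R_true = np.array([[np.cos(th), -np.sin(th), 0],
                   [np.sin(th),  np.cos(th), 0],
                   [0, 0, 1]])
target = pts @ R_true.T + np.array([0.1, -0.2, 0.5])
R, t, c = fit_face(mean, basis, target)
print("max coefficient error:", np.abs(c - c_true).max())
```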

    Event-based Vision: A Survey

    Event cameras are bio-inspired sensors that differ from conventional frame cameras: instead of capturing images at a fixed rate, they asynchronously measure per-pixel brightness changes and output a stream of events that encode the time, location, and sign of those changes. Event cameras offer attractive properties compared to traditional cameras: high temporal resolution (on the order of microseconds), very high dynamic range (140 dB vs. 60 dB), low power consumption, and high pixel bandwidth (on the order of kHz), resulting in reduced motion blur. Hence, event cameras have a large potential for robotics and computer vision in scenarios that are challenging for traditional cameras, such as those demanding low latency, high speed, and high dynamic range. However, novel methods are required to process the unconventional output of these sensors in order to unlock their potential. This paper provides a comprehensive overview of the emerging field of event-based vision, with a focus on the applications and the algorithms developed to unlock the outstanding properties of event cameras. We present event cameras from their working principle, the actual sensors that are available, and the tasks they have been used for, from low-level vision (feature detection and tracking, optic flow, etc.) to high-level vision (reconstruction, segmentation, recognition). We also discuss the techniques developed to process events, including learning-based techniques, as well as specialized processors for these novel sensors, such as spiking neural networks. Additionally, we highlight the challenges that remain to be tackled and the opportunities that lie ahead in the search for a more efficient, bio-inspired way for machines to perceive and interact with the world.
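The working principle described above, per-pixel log-brightness change detection, can be illustrated in a few lines: a pixel fires an event whenever its log intensity drifts from the reference level of its last event by more than a contrast threshold C. The simulator sketch below is a minimal, idealized illustration (no per-pixel noise, refractory period, or timestamp interpolation), and every parameter value in it is an assumption.

```python
import numpy as np

def simulate_events(frames, timestamps, C=0.2, eps=1e-6):
    """Idealized event-camera simulation from a sequence of frames.

    Emits (t, x, y, polarity) whenever the per-pixel log intensity moves
    more than C away from the reference level of the pixel's last event.
    """
    ref = np.log(frames[0] + eps)          # per-pixel reference log intensity
    events = []
    for t, frame in zip(timestamps[1:], frames[1:]):
        logI = np.log(frame + eps)
        diff = logI - ref
        ys, xs = np.nonzero(np.abs(diff) >= C)
        for y, x in zip(ys, xs):
            pol = 1 if diff[y, x] > 0 else -1
            # A pixel may fire several events for a large change.
            n = int(abs(diff[y, x]) // C)
            events.extend((t, x, y, pol) for _ in range(n))
            ref[y, x] += pol * n * C       # advance reference to fired level
    return events

# Toy usage: a bright square moving one pixel per frame triggers events
# only along its leading (positive) and trailing (negative) edges.
frames = np.full((5, 32, 32), 10.0)
for i in range(5):
    frames[i, 8:16, 4 + i:12 + i] = 100.0
print(len(simulate_events(frames, timestamps=np.arange(5) * 1e-3)))
```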

    Automatic detection of drusen associated with age-related macular degeneration in optical coherence tomography: a graph-based approach

    Tese de Doutoramento em Líderes para Indústrias Tecnológicas (Doctoral Thesis in Leaders for Technical Industries). Age-related macular degeneration (AMD) first manifests with the appearance of drusen. The drusen progressively increase in size and number without causing alterations to vision. Nonetheless, their quantification is important because it correlates with the evolution of the disease to an advanced stage, which can lead to the loss of central vision. Manual quantification of drusen is impractical, since it is time-consuming and requires specialized knowledge. Therefore, this work proposes methods for quantifying drusen automatically: one method for segmenting the boundaries that limit drusen and another for locating drusen through classification.

The segmentation method is based on a multiple-surface framework adapted to segment the limiting boundaries of drusen: the inner boundary of the retinal pigment epithelium + drusen complex (IRPEDC) and Bruch's membrane (BM). Several methods have been considerably successful in segmenting the layers of healthy retinas in optical coherence tomography (OCT) images, largely because they incorporate prior information and regularization. However, these factors have the side effect of hindering segmentation in the regions of altered morphology that often occur in diseased retinas. The proposed segmentation method takes into account the presence of AMD-related lesions, i.e., drusen and geographic atrophies (GAs), through a segmentation scheme that excludes prior information and regularization that are valid only for healthy regions. Even with this scheme, the prior information and regularization can still oversmooth some drusen. To address this problem, we also propose integrating local shape priors, in the form of sparse high-order potentials (SHOPs), into the multiple-surface framework.

Drusen are commonly detected by thresholding the distance between the boundaries that limit them (a toy version of this thresholding step is sketched after this record). This approach misses drusen, or portions of drusen, whose height falls below the threshold. To improve detection, Dufour et al. [1] proposed a classification method that detects drusen using textural information. In this work, the method of Dufour et al. [1] is extended by adding new features and performing multi-label classification, which allows the individual detection of drusen when they occur in clusters. Furthermore, local information is incorporated into the classification by combining the classifier with a hidden Markov model (HMM). Both the segmentation and detection methods were evaluated on a database of patients with intermediate AMD. The results suggest that both methods frequently perform better than some methods from the literature. Furthermore, the two methods together produce drusen delimitations that are closer to expert delimitations than those of two literature methods.
This work was supported by FCT under the reference project UID/EEA/04436/2013 and by FEDER funds through COMPETE 2020 – Programa Operacional Competitividade e Internacionalização (POCI) under the reference project POCI-01-0145-FEDER-006941. Furthermore, the Portuguese funding institution Fundação Calouste Gulbenkian awarded me a Ph.D. grant for this work, for which I wish to acknowledge the institution. Additionally, I want to thank one of its members, Teresa Burnay, for all her assistance with grant-related issues, for believing that my work was worth supporting, and for encouraging me to apply for the grant.
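As promised above, here is a toy version of the baseline thresholding detector that the classification approach improves upon. It assumes the two limiting boundaries (IRPEDC and BM) have already been segmented, one height value per A-scan of a B-scan, and the threshold value is an assumption.

```python
import numpy as np

def detect_drusen(irpedc, bm, threshold=5):
    """Baseline drusen detection by thresholding boundary distance.

    irpedc, bm: per-A-scan boundary heights (pixels), with depth
                increasing downwards so that normally bm >= irpedc.
    Returns a list of (start, end) A-scan index ranges flagged as drusen.
    """
    height = bm - irpedc           # RPE + drusen complex thickness
    elevated = height > threshold  # drusen lift the IRPEDC away from BM
    # Group consecutive elevated A-scans into individual detections.
    regions, start = [], None
    for i, flag in enumerate(elevated):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            regions.append((start, i))
            start = None
    if start is not None:
        regions.append((start, len(elevated)))
    return regions

# Toy usage: a flat complex with two bumps; the shallow one is missed,
# which is exactly the failure mode the classification method targets.
x = np.arange(200)
bm = np.full(200, 120.0)
irpedc = bm - 2.0
irpedc[40:60] -= 12.0 * np.exp(-((x[40:60] - 50) / 6.0) ** 2)    # tall druse
irpedc[120:140] -= 2.0 * np.exp(-((x[120:140] - 130) / 6.0) ** 2)  # shallow
print(detect_drusen(irpedc, bm, threshold=5))
```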

    Real Time Sequential Non Rigid Structure from motion using a single camera

    Applications that rely on accurate localization and reconstruction within a real 3D environment have attracted great interest in recent years, both from the research community and from industry. These applications range from augmented reality to robotics, simulation, video games, and more. Depending on the application and the level of detail of the reconstruction, different devices are used, some of them specific, more complex, and expensive, such as stereo cameras, colour-and-depth (RGBD) cameras based on structured light or Time of Flight (ToF), as well as laser scanners and other more advanced sensors. For simple applications, everyday devices such as smartphones suffice: by applying computer vision techniques, 3D models of the environment can be obtained and used, in the case of augmented reality, to display augmented information at a selected location.

In robotics, simultaneously localizing and generating a 3D map of the environment is a fundamental task for achieving autonomous navigation. This problem is known in the state of the art as Simultaneous Localization And Mapping (SLAM) or Structure from Motion (SfM). For these techniques to apply, the object must not change its shape over time; the reconstruction is then unique, up to a scale factor, for monocular capture without a reference. If the rigidity condition does not hold, the shape of the object changes over time, and the problem becomes equivalent to performing one reconstruction per frame. This cannot be done directly, since different shapes combined with different camera poses can produce similar projections. For this reason, the reconstruction of deformable objects is still a developing area. SfM methods have been adapted by applying physical models and temporal, spatial, geometric, and other constraints to reduce the ambiguity of the solutions, giving rise to the techniques known as Non-Rigid SfM (NRSfM).

This thesis starts from a rigid reconstruction technique that is well known in the state of the art, PTAM (Parallel Tracking and Mapping), and adapts it to include NRSfM techniques based on a linear basis model, in order to estimate the deformations of the modelled object dynamically, to apply temporal and spatial constraints that improve the reconstructions, and to adapt to the changes of deformation that appear in the sequence. This requires modifying each of PTAM's execution threads so that they process non-rigid data.

The tracking thread already performed tracking against a 3D point map provided a priori. The most important modification here is the integration of a linear deformation model so that the deformation of the object is computed in real time, assuming the basic deformation shapes are fixed. The camera pose computation is based on the rigid estimation system, so the pose and the deformation coefficients are estimated alternately using the Expectation-Maximization (E-M) algorithm. Temporal and shape constraints are also imposed to restrict the ambiguities inherent in the solutions and to improve the quality of the 3D estimation.

As for the thread that manages the map, it is updated over time so that the deformation bases can be improved when they can no longer explain the shapes observed in the current images. To this end, the rigid-model optimization included in this thread is replaced by an exhaustive NRSfM processing method that improves the bases according to the images flagged with a large reconstruction error by the tracking thread. In this way, the model adapts to new deformations, allowing the system to evolve and remain stable in the long term.

Unlike a large portion of the methods in the literature, the proposed system handles perspective projection natively, minimizing the ambiguity and object-distance problems that exist under orthographic projection. The proposed system handles hundreds of points and is designed to meet real-time constraints for use on systems with limited hardware resources.
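One way to make the per-frame estimation concrete: with a fixed linear deformation basis, tracking amounts to minimizing the perspective reprojection error over camera pose and deformation coefficients. The sketch below refines both jointly with a generic least-squares solver on a single frame; it is an illustrative stand-in for the alternating E-M scheme described above, and all shapes, names, and the camera intrinsics are assumptions.

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def reproject(params, mean, basis, K):
    """Perspective projection of the deformed shape S = mean + basis @ c."""
    rvec, tvec, c = params[:3], params[3:6], params[6:]
    pts = (mean + basis @ c).reshape(-1, 3)
    cam = pts @ Rotation.from_rotvec(rvec).as_matrix().T + tvec
    uv = cam[:, :2] / cam[:, 2:3]        # normalized image coordinates
    return uv @ K[:2, :2].T + K[:2, 2]   # apply focal lengths and centre

def track_frame(obs, mean, basis, K, params0):
    """Refine pose (rvec, tvec) and deformation coefficients c jointly
    by minimizing the perspective reprojection error on one frame."""
    def residual(p):
        return (reproject(p, mean, basis, K) - obs).ravel()
    return least_squares(residual, params0, method="lm").x

# Toy usage: recover small deformations of a random point cloud.
rng = np.random.default_rng(2)
mean = rng.normal(size=3 * 50)
basis = 0.05 * rng.normal(size=(3 * 50, 3))
K = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1]])
true = np.concatenate([[0.02, -0.01, 0.03],   # rotation vector
                       [0.1, -0.1, 8.0],      # translation (object ahead)
                       [0.5, -0.3, 0.2]])     # deformation coefficients
obs = reproject(true, mean, basis, K)
x0 = np.zeros(9)
x0[5] = 8.0                                   # rough initial depth
print(np.round(track_frame(obs, mean, basis, K, x0) - true, 4))
```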

    Deep Learning Approaches to Grasp Synthesis: A Review

    Grasping is the process of picking up an object by applying forces and torques at a set of contacts. Recent advances in deep learning methods have allowed rapid progress in robotic object grasping. In this systematic review, we surveyed the publications of the last decade, with a particular interest in grasping an object using all six degrees of freedom of the end-effector pose. Our review found four common methodologies for robotic grasping: sampling-based approaches, direct regression, reinforcement learning, and exemplar approaches. In addition, we found two "supporting methods" that use deep learning to support the grasping process: shape approximation and affordances. We have distilled the publications found in this systematic review (85 papers) into ten key takeaways that we consider crucial for future robotic grasping and manipulation research.
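To illustrate the most common of the four methodologies named above, a sampling-based approach reduces to a sample-score-rank loop over candidate 6-DoF end-effector poses. The sketch below uses a placeholder scoring function in place of a trained network; every name and heuristic in it is an assumption for illustration, not a method from the surveyed papers.

```python
import numpy as np

def sample_grasps(points, n=500, rng=None):
    """Sample candidate 6-DoF grasp poses near an object point cloud:
    a position offset from a random surface point plus a random
    orientation (rotation vector). Purely illustrative."""
    if rng is None:
        rng = np.random.default_rng()
    idx = rng.integers(0, len(points), n)
    positions = points[idx] + rng.normal(scale=0.01, size=(n, 3))
    orientations = rng.normal(size=(n, 3))  # random rotation vectors
    return np.hstack([positions, orientations])

def score_grasp(grasp, points):
    """Placeholder for a learned grasp-quality model: prefers poses close
    to the cloud's centroid. A real system would evaluate a trained
    network on the local geometry here."""
    return -np.linalg.norm(grasp[:3] - points.mean(0))

def best_grasp(points, n=500):
    """Sample-score-rank: the core loop of sampling-based grasp synthesis."""
    candidates = sample_grasps(points, n)
    scores = [score_grasp(g, points) for g in candidates]
    return candidates[int(np.argmax(scores))]

# Toy usage on a random cloud standing in for a segmented object.
cloud = np.random.default_rng(3).normal(size=(1000, 3))
print(best_grasp(cloud))
```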