26 research outputs found

    Machine Learning in Medical Image Analysis

    Machine learning plays a pivotal role in medical image analysis. Many machine learning algorithms have been applied in medical imaging to solve classification, detection, and segmentation problems. In particular, the wide adoption of deep learning approaches has significantly improved the performance of medical image analysis. In this thesis, we investigate machine learning methods for two key challenges in medical image analysis: the segmentation of medical images, and learning with weak supervision in the context of medical imaging. The first main contribution of the thesis is a series of novel approaches for image segmentation. First, we propose a framework based on multi-scale image patches and random forests to segment small vessel disease (SVD) lesions on computed tomography (CT) images. This framework was validated in terms of spatial similarity, estimated lesion volumes, and visual score ratings, and was compared with human experts. The results showed that the proposed framework performs as well as human experts. Second, we propose a generic convolutional neural network (CNN) architecture called DRINet for medical image segmentation. The DRINet approach is robust across three different types of segmentation tasks: multi-class cerebrospinal fluid (CSF) segmentation on brain CT images, multi-organ segmentation on abdominal CT images, and multi-class tumour segmentation on brain magnetic resonance (MR) images. Finally, we propose a CNN-based framework to segment acute ischemic lesions on diffusion-weighted (DW) MR images, where the lesions are highly variable in position, shape, and size. Promising results were achieved on a large clinical dataset. The second main contribution of the thesis is two novel strategies for learning with weak supervision. First, we propose a novel strategy called context restoration to make use of images without annotations. 
The context restoration strategy is a CNN-based proxy learning process that extracts semantic features from images without using annotations. It was validated on classification, localization, and segmentation problems and was superior to existing strategies. Second, we propose a patch-based framework using multi-instance learning to distinguish normal and abnormal SVD on CT images when only coarse-grained labels are available. Our framework was observed to work better than classic methods and clinical practice. Open Access

    Bridging the gap between reconstruction and synthesis

    Embargo applied from the date of the defense until 15 January 2022. 3D reconstruction and image synthesis are two of the main pillars of computer vision. Early works focused on simple tasks such as multi-view reconstruction and texture synthesis. With the rise of Deep Learning, the field has progressed rapidly, making it possible to tackle more complex, higher-level tasks. For example, 3D reconstruction results that once required traditional multi-view approaches can now be obtained with single-view methods. Similarly, early pattern-based texture synthesis works have led to techniques that can generate novel high-resolution images. In this thesis we have developed a hierarchy of tools that covers this whole range of problems, lying at the intersection of computer vision, graphics and machine learning. We tackle the problem of 3D reconstruction and synthesis in the wild. Importantly, we advocate a paradigm in which not everything should be learned. Instead of applying Deep Learning naively, we propose novel representations, layers and architectures that directly embed prior 3D geometric knowledge for the task of 3D reconstruction and synthesis. We apply these techniques to problems including scene/person reconstruction and photo-realistic rendering. We first address methods to reconstruct a scene and the clothed people in it while estimating the camera position. Then, we tackle image and video synthesis for clothed people in the wild. Finally, we bridge the gap between reconstruction and synthesis under the umbrella of a single novel formulation. Extensive experiments conducted throughout this thesis show that the proposed techniques improve the performance of Deep Learning models in terms of the quality of the reconstructed 3D shapes and synthesised images, while reducing the amount of supervision and training data required to train them. 
In summary, we provide a variety of low-, mid- and high-level algorithms that can be used to incorporate prior knowledge into different stages of the Deep Learning pipeline and improve performance in tasks of 3D reconstruction and image synthesis. Postprint (published version)

    Leveraging audio-visual speech effectively via deep learning

    The rising popularity of neural networks, combined with the recent proliferation of online audio-visual media, has led to a revolution in the way machines encode, recognize, and generate acoustic and visual speech. Despite the ubiquity of naturally paired audio-visual data, only a limited number of works have applied recent advances in deep learning to leverage the duality between audio and video within this domain. This thesis considers the use of neural networks to learn from large unlabelled datasets of audio-visual speech to enable new practical applications. We begin by training a visual speech encoder that predicts latent features extracted from the corresponding audio on a large unlabelled audio-visual corpus. We apply the trained visual encoder to improve performance on lip reading in real-world scenarios. Following this, we extend the idea of video learning from audio by training a model to synthesize raw speech directly from raw video, without the need for text transcriptions. Remarkably, we find that this framework is capable of reconstructing intelligible audio from videos of new, previously unseen speakers. We also experiment with a separate speech reconstruction framework, which leverages recent advances in sequence modeling and spectrogram inversion to improve the realism of the generated speech. We then apply our research in video-to-speech synthesis to advance the state-of-the-art in audio-visual speech enhancement, by proposing a new vocoder-based model that performs particularly well under extremely noisy scenarios. Lastly, we aim to fully realize the potential of paired audio-visual data by proposing two novel frameworks that leverage acoustic and visual speech to train two encoders that learn from each other simultaneously. We leverage these pre-trained encoders for deepfake detection, speech recognition, and lip reading, and find that they consistently yield improvements over training from scratch. Open Access

    Neural Radiance Fields: Past, Present, and Future

    Modeling and interpreting 3D environments and surroundings has long motivated research in 3D Computer Vision, Computer Graphics, and Machine Learning. The paper by Mildenhall et al. on NeRFs (Neural Radiance Fields) led to a boom in Computer Graphics, Robotics, and Computer Vision, and the prospect of high-resolution, low-storage 3D models for Augmented Reality and Virtual Reality has gained traction from researchers, with more than 1000 preprints related to NeRFs published. This paper serves as a bridge for people starting to study these fields, building from the basics of Mathematics, Geometry, Computer Vision, and Computer Graphics up to the difficulties encountered in Implicit Representations at the intersection of all these disciplines. This survey covers the history of rendering, Implicit Learning, and NeRFs, the progression of research on NeRFs, and the potential applications and implications of NeRFs in today's world. In doing so, it categorizes NeRF-related research in terms of the datasets used, objective functions, applications solved, and evaluation criteria for these applications. Comment: 413 pages, 9 figures, 277 citations

    3D Human Face Reconstruction and 2D Appearance Synthesis

    3D human face reconstruction has been researched extensively for decades because of its wide range of applications, such as animation, recognition and 3D-driven appearance synthesis. Although commodity depth sensors have become widely available in recent years, image-based face reconstruction remains highly valuable because images are much easier to access and store. In this dissertation, we first propose three image-based face reconstruction approaches that differ in their assumptions about the input. In the first approach, face geometry is extracted from multiple key frames of a video sequence with different head poses. The camera must be calibrated under this assumption. As the first approach is limited to videos, we propose a second approach that focuses on a single image. This approach also improves the geometry by adding fine detail using shading cues. We propose a novel albedo estimation and linear optimization algorithm in this approach. In the third approach, we further loosen the constraint on the input image to allow arbitrary in-the-wild images. Our proposed approach can robustly reconstruct a high-quality model even with extreme expressions and large poses. We then explore the applicability of our face reconstructions to four applications: video face beautification, generating personalized facial blendshapes from image sequences, face video stylization and video face replacement. We demonstrate the great potential of our reconstruction approaches in these real-world applications. In particular, with the recent surge of interest in VR/AR, it is increasingly common to see people wearing head-mounted displays (HMDs). However, the large occlusion of the face is a major obstacle to face-to-face communication. In a further application, we explore hardware/software solutions for synthesizing the face image in the presence of HMDs. We design two setups (experimental and mobile) which integrate two near-IR cameras and one color camera to solve this problem. 
With our algorithm and prototype, we can achieve photo-realistic results. We further propose a deep neural network to solve the HMD removal problem by treating it as a face inpainting problem. This approach doesn't need special hardware and runs in real time with satisfying results.

    Biomedical signal analysis in automatic classification problems

    Over the last decade we have witnessed an unprecedented development of health technologies. Advances in computerization, networking, imaging techniques, robotics, micro/nano technologies, and genomics have significantly increased the amount and diversity of information available to clinical staff for the diagnosis, prognosis, treatment, and follow-up of patients. This increase in the amount and diversity of clinical data requires the continuous development of techniques and methodologies capable of integrating these data, processing them, and supporting their interpretation robustly and efficiently. In this context, this thesis focuses on the analysis and processing of biomedical signals and their use in automatic classification problems. Specifically, it focuses on: the design and integration of algorithms for the automatic processing of biomedical signals, the development of new feature extraction methods for signals, the evaluation of compatibility between biomedical signals, and the design of classification models for specific clinical problems. In most of the cases covered in this thesis, these problems fall within the scope of clinical decision support systems, that is, computational systems that provide expert knowledge for decisions on the diagnosis, prognosis, and treatment of patients. One of the main contributions of this thesis is the evaluation of the compatibility between magnetic resonance spectra (MRS) obtained with two magnetic resonance scanner technologies that currently coexist (1.5T and 3T scanners). This compatibility is evaluated in the context of automatic classification of brain tumours. 
The results obtained in this work suggest that existing classifiers based on 1.5T MRS data may be applicable to cases obtained with the new technology… Fuster García, E. (2012). Biomedical signal analysis in automatic classification problems [Doctoral thesis]. Editorial Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/17176

    Discrete Wavelet Transforms

    Discrete wavelet transform (DWT) algorithms have a firm position in signal processing across several areas of research and industry. Because the DWT provides both octave-scale frequency information and the timing or spatial location of features in the analyzed signal, it is used to solve increasingly advanced problems. The present book, Discrete Wavelet Transforms: Algorithms and Applications, reviews recent progress in discrete wavelet transform algorithms and applications. The book covers a wide range of methods (e.g. lifting, shift invariance, multi-scale analysis) for constructing DWTs. The book chapters are organized into four major parts. Part I describes progress in hardware implementations of DWT algorithms. Applications include multitone modulation for ADSL and equalization techniques, a scalable architecture for FPGA implementation, a lifting-based algorithm for VLSI implementation, a comparison between DWT- and FFT-based OFDM, and a modified SPIHT codec. Part II addresses image processing algorithms such as a multiresolution approach for edge detection, low bit rate image compression, low-complexity implementation of CQF wavelets, and compression of multi-component images. Part III focuses on watermarking DWT algorithms. Finally, Part IV describes shift-invariant DWTs, the DC lossless property, DWT-based analysis and estimation of colored noise, and an application of the wavelet Galerkin method. The chapters of the present book consist of both tutorial and highly advanced material. The book is therefore intended as a reference text for graduate students and researchers seeking state-of-the-art knowledge on specific applications.
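    The lifting construction mentioned in the abstract can be illustrated with a minimal sketch, which is not taken from the book. It assumes the simplest case, one level of an unnormalized Haar transform on an even-length list, and the function names are hypothetical. Lifting splits the signal into even and odd samples, predicts the odd samples from the even ones (giving detail coefficients), and then updates the even samples so they carry the pairwise means (giving approximation coefficients).

```python
def haar_lifting_forward(signal):
    """One level of an unnormalized Haar DWT via lifting (split, predict, update)."""
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    # Split: separate even-indexed and odd-indexed samples.
    even = signal[0::2]
    odd = signal[1::2]
    # Predict: each odd sample is predicted by its even neighbour;
    # the prediction error is the detail coefficient.
    detail = [o - e for o, e in zip(odd, even)]
    # Update: adjust the even samples so they hold the pairwise mean.
    approx = [e + d / 2 for e, d in zip(even, detail)]
    return approx, detail


def haar_lifting_inverse(approx, detail):
    """Invert the lifting steps in reverse order to recover the signal."""
    even = [a - d / 2 for a, d in zip(approx, detail)]
    odd = [d + e for d, e in zip(detail, even)]
    signal = []
    for e, o in zip(even, odd):
        signal.extend([e, o])
    return signal


if __name__ == "__main__":
    samples = [2, 4, 6, 8, 10, 14, 1, 1]
    approx, detail = haar_lifting_forward(samples)
    print(approx, detail)
    print(haar_lifting_inverse(approx, detail))
```

    Because each lifting step is undone by simply reversing its sign, the inverse transform reconstructs the input from the approximation and detail coefficients, which is one reason lifting is attractive for hardware implementations.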

    Determinants of supply chain structure

    This dissertation is a contribution to the study of manufacturing subcontracting, with particular reference to the European automotive industrial sector. It takes as its central theme the structure of supply chains: the way in which value addition is split amongst members of the chain. The thesis addresses a central question: What factors determine optimum structure and practice in modern-day industrial supply chains? This devolves into a number of derivative questions, which are addressed in various parts of the study. With reference to 24 case study supply chains, the investigation first tests whether existing theory can fully explain the changing structures. From the results of these tests a new model is postulated, and further work is then carried out to validate the model. It was found that because existing theory concentrates primarily on dyadic relationships, current theory taken alone was insufficient to explain the changes in supply chain structure in the European automotive industry in the mid to late 1990s. It is felt that the work is novel in that it addresses the whole supply chain and demonstrates the clear link between the physical structure and other determining success factors. Two methods for recording and systematically comparing both the structure and management practices in supply chains were developed, termed 'Fixed Reference Benchmark' and 'Hierarchical Structure Mapping'. These two models were tested and used in the comparison of 24 European automotive supply chains. 
The results of this analysis showed the dominant factors that most heavily influenced the structure of supply chains in the European automotive industry to be: the criticality of the component (which in turn affects the acceptability of risk); the level and pace of development of technology for the component or system of the supply chain (which is strongly linked to bargaining power); the desire to reduce the complexity of logistics (which is also linked to acceptability of risk); the desire to reduce the cost of demand fluctuations; and the capital intensity of the production process. It is felt that this study of supply chain structures contributes new knowledge on three levels. At a theoretical level, it analyses the current theory, exposing gaps and anomalies. At an empirical level, it presents contemporary data that in some parts simply substantiates and in others adds to the current theory. At a practical level, it aims to present a picture that is of use to practitioners making decisions on the future of individual supply chains.