
    An efficient hybrid method for 3D to 2D medical image registration

    PURPOSE: The purpose of this paper is to present a method for registration of 3D computed tomography to 2D single-plane fluoroscopy knee images to provide 3D motion information for knee joints. This 3D kinematic information has unique utility for examining joint kinematics in conditions such as ligament injury, osteoarthritis and after joint replacement. METHODS: We proposed a non-invasive rigid body image registration method which is based on two different multimodal similarity measures. This hybrid registration method helps to achieve a trade-off among different challenges, including time complexity and accuracy. RESULTS: We performed a number of experiments to evaluate the performance of the proposed method. The experimental results show that the proposed method is as accurate as one of the most recent registration methods while being several times faster than that method. CONCLUSION: The proposed method is a non-invasive, fast and accurate registration method, which can provide 3D information for knee joint kinematic measurements. This information can be very helpful in improving the accuracy of diagnosis and providing targeted treatment

    Multi-modal Image Registration

    In different areas, particularly medical image analysis, there is a vital need to access and analyse dynamic three dimensional (3D) images of the anatomical structures of the human body. This can enable specialists to track events as well as clinically conduct and evaluate surgical and radiotherapeutic procedures. For example, measuring the 3D kinematics of knee joints in a dynamic manner is essential for understanding their normal functions and diagnosing any pathology, such as ligament injury and osteoarthritis. For evaluations of subsequent treatments, such as surgery and rehabilitation, and designs of joint replacements, having knowledge of the movements of knee joints is necessary. Image registration is increasingly being applied to medical image analysis. Whereas in mono-modal registration, the images to be registered are acquired by the same sensor, in multi-modal image registration, they can be taken from different devices or imaging protocols, which makes this registration process much more challenging. The invasive or non-invasive nature of the registration method used, the computational time it requires as well as its accuracy and robustness against a large range of initial displacements are the most important features used for its evaluation. As currently available approaches have limited capabilities to register images with large initial displacements and are either not sufficiently accurate or very computationally expensive, the objective of this research is to propose new registration methods that provide dynamic 3D images to address these issues. In the first part of this study, I conducted research on registering an individual's natural knee bones, which can provide 3D information on knee joint kinematics that can be very helpful for improving the accuracy of diagnosis and enabling targeted treatments. 
A fast, accurate and robust hybrid rigid body registration method based on two different multi-modal similarity measures, the edge position difference (EPD) and sum-of-conditional variance (SCV), is proposed. It uses a gradient descent optimisation technique to register multi-modal images and determine the best transformation parameters. It helps to achieve a trade-off among different challenges, including time complexity, accuracy and robustness against a large range of initial displacements. To evaluate it, several experiments were performed on two different databases: one collected from the knee bones of four patients and the other from three knee cadavers installed on a mechanical positioning system, with the results showing that this method is accurate, fast and robust against large initial displacement. Then, I conducted research on registering implanted human knee joints and proposed a non-invasive, robust 3D-to-2D registration method which can be used for 3D evaluations of the status of knee implants after joint replacement surgeries. In this method, 3D models of the implants for an individual with the relevant post-operative fluoroscopy frames are able to be used in the registration process. As a result, it is possible to perform 3D analysis at any time after a surgery by simply taking single-plane radiographs. This approach uses the EPD multi-modal similarity measure together with a steepest descent optimisation method. It applies coarse-to-fine registration steps to determine the transformation parameters that lead to the best alignment between the model used and X-ray images to be registered. The experimental results showed that not only does the proposed registration method have a high success rate but that it is also much faster than the most relevant competitive approach. Although the experiments were designed for a 3D analysis of total knee arthroplasty (TKA) components, this proposed method can be applied to other joints such as the ankle or hip. 
In the final part of my research, I developed a multi-frame 2D fluoroscopy to 3D model registration method for measuring the kinematics of post-operative knee joints. It uses a coarse-to-fine approach and applies the normalised EPD (NEPD) and SCV similarity measures together with a gradient descent optimisation method and an interpolation-based estimation method. In order to measure the kinematics of post-operative knee joints after a TKA surgery, a 3D knee implant model can be registered with a number of single-plane fluoroscopy frames of the patient’s knee. Generally, when this number is quite high, the computational cost of registering the frames and a 3D model is high. Therefore, in order to speed up the registration process, a cubic spline interpolation prediction method is applied to initialise and estimate the 3D positions of the 3D model in each fluoroscopy frame instead of applying a registration algorithm on all the frames, one after the other. The estimated 3D positions are then tuned using a registration improvement step. The experimental results demonstrated that the proposed registration method is much faster than the best existing one and achieves almost the same accuracy. It also provides smooth registration results which can lead to more natural 3D modelling of joint movements
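
The interpolation idea in the final part of the abstract above (register only some frames, then predict the pose in the remaining frames with a cubic spline before refining) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names `natural_cubic_spline` and `interpolate_poses` are hypothetical, the spline uses natural boundary conditions, and each pose parameter is interpolated independently. The thesis may use a different spline variant and parameterisation, and the refinement step is not shown.

```python
from bisect import bisect_right


def natural_cubic_spline(xs, ys):
    """Return a function evaluating the natural cubic spline through (xs, ys).

    xs must be strictly increasing. Natural boundary conditions (zero second
    derivative at both ends) are an assumption of this sketch.
    """
    n = len(xs) - 1
    if n < 1 or len(ys) != len(xs):
        raise ValueError("need at least two knots and matching lengths")
    h = [xs[i + 1] - xs[i] for i in range(n)]
    alpha = [0.0] * (n + 1)
    for i in range(1, n):
        alpha[i] = 3.0 * (ys[i + 1] - ys[i]) / h[i] - 3.0 * (ys[i] - ys[i - 1]) / h[i - 1]
    # Forward sweep of the tridiagonal system for the quadratic coefficients.
    l = [1.0] * (n + 1)
    mu = [0.0] * (n + 1)
    z = [0.0] * (n + 1)
    for i in range(1, n):
        l[i] = 2.0 * (xs[i + 1] - xs[i - 1]) - h[i - 1] * mu[i - 1]
        mu[i] = h[i] / l[i]
        z[i] = (alpha[i] - h[i - 1] * z[i - 1]) / l[i]
    # Back substitution gives the per-interval polynomial coefficients.
    b = [0.0] * n
    c = [0.0] * (n + 1)
    d = [0.0] * n
    for j in range(n - 1, -1, -1):
        c[j] = z[j] - mu[j] * c[j + 1]
        b[j] = (ys[j + 1] - ys[j]) / h[j] - h[j] * (c[j + 1] + 2.0 * c[j]) / 3.0
        d[j] = (c[j + 1] - c[j]) / (3.0 * h[j])

    def evaluate(x):
        # Pick the interval containing x (clamped to the first/last interval).
        j = min(max(bisect_right(xs, x) - 1, 0), n - 1)
        t = x - xs[j]
        return ys[j] + b[j] * t + c[j] * t * t + d[j] * t * t * t

    return evaluate


def interpolate_poses(key_frames, key_poses, query_frames):
    """Predict pose parameters at query_frames from poses at registered key frames.

    key_poses[i] is the list of pose parameters found by registration at
    key_frames[i]; each parameter is interpolated independently.
    """
    splines = [natural_cubic_spline(key_frames, list(column)) for column in zip(*key_poses)]
    return [[spline(frame) for spline in splines] for frame in query_frames]
```

In this sketch the predicted poses would only serve as starting points; a registration step such as the one described in the abstract would still be needed to tune them.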

    3D-3D Deformable Registration and Deep Learning Segmentation based Neck Diseases Analysis in MRI

    Whiplash, cervical dystonia (CD), neck pain and work-related upper limb disorder (WRULD) are the most common diseases in the cervical region. Headaches, stiffness, sensory disturbance to the legs and arms, optical problems, aching in the back and shoulder, and auditory and visual problems are common symptoms seen in patients with these diseases. CD patients may also suffer tormenting spasticity in some neck muscles, with the symptoms possibly being acute and persisting for a long time, sometimes a lifetime. Whiplash-associated disorders (WADs) may occur due to sudden forward and backward movements of the head and neck occurring during a sporting activity or vehicle or domestic accident. These diseases affect private industries, insurance companies and governments, with the socio-economic costs significantly related to work absences, long-term sick leave, early disability and disability support pensions, health care expenses, reduced productivity and insurance claims. Therefore, diagnosing and treating neck-related diseases are important issues in clinical practice. The reason for these afflictions resulting from accident is the impairment of the cervical muscles which undergo atrophy or pseudo-hypertrophy due to fat infiltrating into them. These morphological changes have to be determined by identifying and quantifying their bio-markers before applying any medical intervention. Volumetric studies of neck muscles are reliable indicators of the proper treatments to apply. Radiation therapy, chemotherapy, injection of a toxin or surgery could be possible ways of treating these diseases. However, the dosages required should be precise because the neck region contains some sensitive organs, such as nerves, blood vessels and the trachea and spinal cord. Image registration and deep learning-based segmentation can help to determine appropriate treatments by analyzing the neck muscles. 
However, this is a challenging task for medical images due to complexities such as many muscles crossing multiple joints and attaching to many bones. Also, their shapes and sizes vary greatly across populations whereas their cross-sectional areas (CSAs) do not change in proportion to the heights and weights of individuals, with their sizes varying more significantly between males and females than ages. Therefore, the neck's anatomical variabilities are much greater than those of other parts of the human body. Some other challenges which make analyzing neck muscles very difficult are their compactness, similar gray-level appearances, intra-muscular fat, sliding due to cardiac and respiratory motions, false boundaries created by intramuscular fat, low resolution and contrast in medical images, noise, inhomogeneity and background clutter with the same composition and intensity. Furthermore, a patient's mode, position and neck movements during the capture of an image create variability. However, very little significant research work has been conducted on analyzing neck muscles. Although previous image registration efforts form a strong basis for many medical applications, none can satisfy the requirements of all of them because of the challenges associated with their implementation and low accuracy which could be due to anatomical complexities and variabilities or the artefacts of imaging devices. In existing methods, multi-resolution- and heuristic-based methods are popular. However, the above issues cause conventional multi-resolution-based registration methods to be trapped in local minima due to their low degrees of freedom in their geometrical transforms. Although heuristic-based methods are good at handling large mismatches, they require pre-segmentation and are computationally expensive. Also, current deformable methods often face statistical instability problems and many local optima when dealing with small mismatches. 
On the other hand, deep learning-based methods have achieved significant success over the last few years. Although a deeper network can learn more complex features and yields better performances, its depth cannot be increased as this would cause the gradient to vanish during training and result in training difficulties. Recently, researchers have focused on attention mechanisms for deep learning but current attention models face a challenge in the case of an application with compact and similar small multiple classes, large variability, low contrast and noise. The focus of this dissertation is on the design of 3D-3D image registration approaches as well as deep learning-based semantic segmentation methods for analyzing neck muscles. In the first part of this thesis, a novel object-constrained hierarchical registration framework for aligning inter-subject neck muscles is proposed. Firstly, to handle large-scale local minima, it uses a coarse registration technique which optimizes a new edge position difference (EPD) similarity measure to align large mismatches. Also, a new transformation based on the discrete periodic spline wavelet (DPSW), affine and free-form-deformation (FFD) transformations is exploited. Secondly, to avoid the monotonous nature of using transformations in multiple stages, a fine registration technique, which uses a double-pushing system by changing the edges in the EPD and switching the transformation's resolutions, is designed to align small mismatches. The EPD helps in both the coarse and fine techniques to implement object-constrained registration via controlling edges, which is not possible using traditional similarity measures. Experiments are performed on clinical 3D magnetic resonance imaging (MRI) scans of the neck, with the results showing that the EPD is more effective than the mutual information (MI) and the sum of squared difference (SSD) measures in terms of the volumetric Dice similarity coefficient (DSC). 
Also, the proposed method is compared with two state-of-the-art approaches with ablation studies of inter-subject deformable registration and achieves better accuracy, robustness and consistency. However, as this method is computationally complex and has a problem handling large-scale anatomical variabilities, another 3D-3D registration framework with two novel contributions is proposed in the second part of this thesis. Firstly, a two-stage heuristic search optimization technique for handling large mismatches, which uses a minimal user hypothesis regarding these mismatches and is computationally fast, is introduced. It brings a moving image hierarchically closer to a fixed one using MI and EPD similarity measures in the coarse and fine stages, respectively, while the images do not require pre-segmentation as is necessary in traditional heuristic optimization-based techniques. Secondly, a region of interest (ROI) EPD-based registration framework for handling small mismatches using salient anatomical information (AI), in which a convex objective function is formed through a unique shape created from the desired objects in the ROI, is proposed. It is compared with two state-of-the-art methods on a neck dataset, with the results showing that it is superior in terms of accuracy and is computationally fast. In the last part of this thesis, an evaluation study of recent U-Net-based convolutional neural networks (CNNs) is performed on a neck dataset. It comprises 6 recent models, the U-Net, U-Net with a conditional random field (CRF-Unet), attention U-Net (A-Unet), nested U-Net or U-Net++, multi-feature pyramid (MFP)-Unet and recurrent residual U-Net (R2Unet) and 4 with more comprehensive modifications, the multi-scale U-Net (MS-Unet), parallel multi-scale U-Net (PMSUnet), recurrent residual attention U-Net (R2A-Unet) and R2A-Unet++ in neck muscles segmentation, with analyses of the numerical results indicating that the R2Unet architecture achieves the best accuracy. 
Also, two deep learning-based semantic segmentation approaches are proposed. In the first, a new two-stage U-Net++ (TS-UNet++) uses two different types of deep CNNs (DCNNs) rather than one similar to the traditional multi-stage method, with the U-Net++ in the first stage and the U-Net in the second. More convolutional blocks are added after the input and before the output layers of the multi-stage approach to better extract the low- and high-level features. A new concatenation-based fusion structure, which is incorporated in the architecture to allow deep supervision, helps to increase the depth of the network without accelerating the gradient-vanishing problem. Then, more convolutional layers are added after each concatenation of the fusion structure to extract more representative features. The proposed network is compared with the U-Net, U-Net++ and two-stage U-Net (TS-UNet) on the neck dataset, with the results indicating that it outperforms the others. In the second approach, an explicit attention method, in which the attention is performed through a ROI evolved from ground truth via dilation, is proposed. It does not require any additional CNN, as does a cascaded approach, to localize the ROI. Attention in a CNN is sensitive with respect to the area of the ROI. This dilated ROI is more capable of capturing relevant regions and suppressing irrelevant ones than a bounding box and region-level coarse annotation, and is used during training of any CNN. Coarse annotation, which does not require any detailed pixel wise delineation that can be performed by any novice person, is used during testing. This proposed ROI-based attention method, which can handle compact and similar small multiple classes with objects with large variabilities, is compared with the automatic A-Unet and U-Net, and performs best
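
The volumetric Dice similarity coefficient (DSC) that the abstract above uses to compare similarity measures has the standard definition DSC = 2|A ∩ B| / (|A| + |B|) for two label masks A and B. A minimal sketch follows; the function name `dice_coefficient`, the flattened-list input and the convention of returning 1.0 when both masks are empty are assumptions of this example, not details taken from the thesis.

```python
def dice_coefficient(mask_a, mask_b, label=1):
    """Dice similarity coefficient between two flattened label volumes.

    mask_a and mask_b are equal-length sequences of voxel labels; only voxels
    equal to `label` count as foreground.
    """
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same number of voxels")
    size_a = sum(1 for value in mask_a if value == label)
    size_b = sum(1 for value in mask_b if value == label)
    overlap = sum(1 for a, b in zip(mask_a, mask_b) if a == label and b == label)
    if size_a + size_b == 0:
        # Assumed convention: two empty masks agree perfectly.
        return 1.0
    return 2.0 * overlap / (size_a + size_b)
```

A value of 1 means the two masks coincide and 0 means they do not overlap at all; for multi-muscle segmentations the score would typically be computed per label and then averaged.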

    Localization of Autonomous Vehicles in Urban Environments

    The future of applications such as last-mile delivery, infrastructure inspection and surveillance bets big on employing small autonomous drones and ground robots in cluttered urban settings where precise positioning is critical. However, when navigating close to buildings, GPS-based localisation of robotic platforms is noisy due to obscured reception and multi-path reflection. Localisation methods using introspective sensors like monocular and stereo cameras mounted on the platforms offer a better alternative as they are suitable for both indoor and outdoor operations. However, the inherent drift in the estimated trajectory is often evident in the 7 degrees of freedom that capture scaling, rotation and translation motion, and needs to be corrected. The theme of the thesis is to use a pre-existing 3D model to supplement the pose estimation from a visual navigation system, reducing incremental drift and thereby improving localisation accuracy. The novel framework developed for the monocular camera first extracts the geometric relationship between the pixels of the calibrated camera and the 3D points on the model. These geometric constraints, when used in addition to the relative pose constraints typically used in Simultaneous Localisation and Mapping (SLAM) algorithms, provide superior trajectory estimation. Further, scale drift correction is proposed using a novel $SIM_3$ optimisation procedure and successfully demonstrated using a unique dataset that embodies many urban localisation challenges. Techniques developed for stereo camera localisation align the textured 3D stereo scans with respect to a 3D model and estimate the associated camera pose. The idea is to solve the image registration problem between the projection of the 3D scan and images whose poses are accurately known with respect to the 3D model. The 2D motion parameters are then mapped to the 3D space for camera pose estimation. 
Novel image registration techniques are developed which use image edge information combined with traditional approaches to show successful results
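
The 7 degrees of freedom mentioned in the abstract above (one for scale, three for rotation, three for translation) correspond to a 3D similarity transform, which maps a point p to s·R·p + t. The sketch below only illustrates that mapping and its inverse; the function names are hypothetical, the `rotation_z` helper covers just one rotation axis for brevity, and nothing here reproduces the thesis's optimisation procedure.

```python
import math


def rotation_z(angle):
    """3x3 rotation matrix for a rotation of `angle` radians about the z axis."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def sim3_apply(scale, rotation, translation, point):
    """Apply the similarity transform p -> scale * R * p + t to a 3D point."""
    rotated = [sum(rotation[r][k] * point[k] for k in range(3)) for r in range(3)]
    return [scale * rotated[r] + translation[r] for r in range(3)]


def sim3_inverse_apply(scale, rotation, translation, point):
    """Undo sim3_apply: p = R^T * (p' - t) / scale."""
    shifted = [(point[r] - translation[r]) / scale for r in range(3)]
    # Multiplying by the transpose inverts a rotation matrix.
    return [sum(rotation[k][r] * shifted[k] for k in range(3)) for r in range(3)]
```

Scale drift in a monocular trajectory shows up as an error in the `scale` term, which a rigid (6 degree of freedom) transform cannot represent; that is why the scale is carried as a seventh parameter.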

    HUMAN FACE RECOGNITION BASED ON FRACTAL IMAGE CODING

    Human face recognition is an important area in the field of biometrics. It has been an active area of research for several decades, but still remains a challenging problem because of the complexity of the human face. In this thesis we describe fully automatic solutions that can locate faces and then perform identification and verification. We present a solution for face localisation using eye locations. We derive an efficient representation for the decision hyperplane of linear and nonlinear Support Vector Machines (SVMs). For this we introduce the novel concept of $\rho$ and $\eta$ prototypes. The standard formulation for the decision hyperplane is reformulated and expressed in terms of the two prototypes. Different kernels are treated separately to achieve further classification efficiency and to facilitate its adaptation to operate with the fast Fourier transform to achieve fast eye detection. Using the eye locations, we extract and normalise the face for size and in-plane rotations. Our method produces a more efficient representation of the SVM decision hyperplane than the well-known reduced set methods. As a result, our eye detection subsystem is faster and more accurate. The use of fractals and fractal image coding for object recognition has been proposed and used by others. Fractal codes have been used as features for recognition, but we need to take into account the distance between codes, and to ensure the continuity of the parameters of the code. We use a method based on fractal image coding for recognition, which we call the Fractal Neighbour Distance (FND). The FND relies on the Euclidean metric and the uniqueness of the attractor of a fractal code. An advantage of using the FND over fractal codes as features is that we do not have to worry about the uniqueness of, and distance between, codes. We only require the uniqueness of the attractor, which is already an implied property of a properly generated fractal code. 
Similar methods to the FND have been proposed by others, but what distinguishes our work from the rest is that we investigate the FND in greater detail and use our findings to improve the recognition rate. Our investigations reveal that the FND has some inherent invariance to translation, scale, rotation and changes to illumination. These invariances are image dependent and are affected by fractal encoding parameters. The parameters that have the greatest effect on recognition accuracy are the contrast scaling factor, luminance shift factor and the type of range block partitioning. The contrast scaling factor affects the convergence and eventual convergence rate of a fractal decoding process. We propose a novel method of controlling the convergence rate by altering the contrast scaling factor in a controlled manner, which has not been possible before. This helped us improve the recognition rate because under certain conditions better results are achievable from using a slower rate of convergence. We also investigate the effects of varying the luminance shift factor, and examine three different types of range block partitioning schemes. They are Quad-tree, HV and uniform partitioning. We performed experiments using various face datasets, and the results show that our method indeed performs better than many accepted methods such as eigenfaces. The experiments also show that the FND based classifier increases the separation between classes. The standard FND is further improved by incorporating the use of localised weights. A local search algorithm is introduced to find a best matching local feature using this locally weighted FND. The scores from a set of these locally weighted FND operations are then combined to obtain a global score, which is used as a measure of the similarity between two face images. Each local FND operation possesses the distortion invariant properties described above. 
Combined with the search procedure, the method has the potential to be invariant to a larger class of non-linear distortions. We also present a set of locally weighted FNDs that concentrate around the upper part of the face encompassing the eyes and nose. This design was motivated by the fact that the region around the eyes has more information for discrimination. Better performance is achieved by using different sets of weights for identification and verification. For facial verification, performance is further improved by using normalised scores and client specific thresholding. In this case, our results are competitive with current state-of-the-art methods, and in some cases outperform all those to which they were compared. For facial identification, under some conditions the weighted FND performs better than the standard FND. However, the weighted FND still has its shortcomings when some datasets are used, where its performance is not much better than the standard FND. To alleviate this problem we introduce a voting scheme that operates with normalised versions of the weighted FND. Although there are no improvements at lower matching ranks using this method, there are significant improvements for larger matching ranks. Our methods offer advantages over some well-accepted approaches such as eigenfaces, neural networks and those that use statistical learning theory. Some of the advantages are: new faces can be enrolled without re-training involving the whole database; faces can be removed from the database without the need for re-training; there are inherent invariances to face distortions; it is relatively simple to implement; and it is not model-based so there are no model parameters that need to be tweaked

    Tools for discovering and characterizing extrasolar planets

    Among the group of extrasolar planets, transiting planets provide a great opportunity to obtain direct measurements for the basic physical properties, such as mass and radius of these objects. These planets are therefore highly important in the understanding of the evolution and formation of planetary systems: from the observations of photometric transits, the interior structure of the planet and atmospheric properties can also be constrained. The most efficient way to search for transiting extrasolar planets is based on wide-field surveys by hunting for short and shallow periodic dips in light curves covering quite long time intervals. These surveys monitor fields with several degrees in diameter and tens or hundreds of thousands of objects simultaneously. In the practice of astronomical observations, surveys of large field-of-view are rather new and therefore require special methods for photometric data reduction that have not been used before. In this PhD thesis, I summarize my efforts related to the development of a complete software solution for high precision photometric reduction of astronomical images. I also demonstrate the role of this newly developed package and the related algorithms in the case of particular discoveries of the HATNet project. [abridged] Comment: PhD thesis, Eotvos Lorand University (June 18, 2009), 68 pages in journal style, 41 figures, 18 tables

    Representation Learning with Adversarial Latent Autoencoders

    A large number of deep learning methods applied to computer vision problems require encoder-decoder maps. These methods include, but are not limited to, self-representation learning, generalization, few-shot learning, and novelty detection. Encoder-decoder maps are also useful for photo manipulation, photo editing, superresolution, etc. Encoder-decoder maps are typically learned using autoencoder networks. Traditionally, autoencoder reciprocity is achieved in the image-space using pixel-wise similarity loss, which has a widely known flaw of producing non-realistic reconstructions. This flaw is typical for the Variational Autoencoder (VAE) family and is not only limited to pixel-wise similarity losses, but is common to all methods relying upon the explicit maximum likelihood training paradigm, as opposed to an implicit one. Likelihood maximization, coupled with poor decoder distribution, leads to poor or blurry reconstructions at best. Generative Adversarial Networks (GANs), on the other hand, perform an implicit maximization of the likelihood by solving a minimax game, thus bypassing the issues derived from the explicit maximization. This provides GAN architectures with remarkable generative power, enabling the generation of high-resolution images of humans, which are indistinguishable from real photos to the naked eye. However, GAN architectures lack inference capabilities, which makes them unsuitable for training encoder-decoder maps, effectively limiting their application space. We introduce an autoencoder architecture that (a) is free from the consequences of maximizing the likelihood directly, (b) produces reconstructions competitive in quality with state-of-the-art GAN architectures, and (c) allows learning disentangled representations, which makes it useful in a variety of problems. 
We show that the proposed architecture and training paradigm significantly improve the state of the art in novelty and anomaly detection methods, enable novel kinds of image manipulations, and have significant potential for other applications