60 research outputs found

    Measurment of spatial orientation using a biologically plausible gradient model

    Get PDF
    A Thesis submitted for the degree of Doctor of Philosophy

    Towards accurate multi-person pose estimation in the wild

    Get PDF
    In this thesis we are concerned with the problem of articulated human pose estimation and pose tracking in images and video sequences. Human pose estimation is a task of localising major joints of a human skeleton in natural images and is one of the most important visual recognition tasks in the scenes containing humans with numerous applications in robotics, virtual and augmented reality, gaming and healthcare among others. Articulated human pose tracking requires tracking multiple persons in the video sequence while simultaneously estimating full body poses. This task is important for analysing surveillance footage, activity recognition, sports analytics, etc. Most of the prior work focused on the pose estimation of single pre-localised humans whereas here we address a case with multiple people in real world images which entails several challenges such as person-person overlaps in highly crowded scenes, unknown number of people or people entering and leaving video sequences. The first contribution is a multi-person pose estimation algorithm based on the bottom-up detection-by-grouping paradigm. Unlike the widespread top-down approaches our method detects body joints and pairwise relations between them in a single forward pass of a convolutional neural network. Multi-person parsing is performed by optimizing a joint objective based on a multicut graph partitioning framework. Secondly, we extend our pose estimation approach to articulated multi-person pose tracking in videos. Our approach performs multi-target tracking and pose estimation in a holistic manner by optimising a single objective. We further simplify and refine the formulation which allows us to reach close to the real-time performance. Thirdly, we propose a large scale dataset and a benchmark for articulated multi-person tracking. It is the first dataset of video sequences comprising complex multi-person scenes and fully annotated tracks with 2D keypoints. Our fourth contribution is a method for estimating 3D body pose using on-body wearable cameras. Our approach uses a pair of downward facing, head-mounted cameras and captures an entire body. This egocentric approach is free of limitations of traditional setups with external cameras and can estimate body poses in very crowded environments. Our final contribution goes beyond human pose estimation and is in the field of deep learning of 3D object shapes. In particular, we address the case of reconstructing 3D objects from weak supervision. Our approach represents objects as 3D point clouds and is able to learn them with 2D supervision only and without requiring camera pose information at training time. We design a differentiable renderer of point clouds as well as a novel loss formulation for dealing with camera pose ambiguity.In dieser Arbeit behandeln wir das Problem der Schätzung und Verfolgung artikulierter menschlicher Posen in Bildern und Video-Sequenzen. Die Schätzung menschlicher Posen besteht darin die Hauptgelenke des menschlichen Skeletts in natürlichen Bildern zu lokalisieren und ist eine der wichtigsten Aufgaben der visuellen Erkennung in Szenen, die Menschen beinhalten. Sie hat zahlreiche Anwendungen in der Robotik, virtueller und erweiterter Realität, in Videospielen, in der Medizin und weiteren Bereichen. Die Verfolgung artikulierter menschlicher Posen erfordert die Verfolgung mehrerer Personen in einer Videosequenz bei gleichzeitiger Schätzung vollständiger Körperhaltungen. Diese Aufgabe ist besonders wichtig für die Analyse von Video-Überwachungsaufnahmen, Aktivitätenerkennung, digitale Sportanalyse etc. Die meisten vorherigen Arbeiten sind auf die Schätzung einzelner Posen vorlokalisierter Menschen fokussiert, wohingegen wir den Fall mehrerer Personen in natürlichen Aufnahmen betrachten. Dies bringt einige Herausforderungen mit sich, wie die Überlappung verschiedener Personen in dicht gedrängten Szenen, eine unbekannte Anzahl an Personen oder Personen die das Sichtfeld der Video-Sequenz verlassen oder betreten. Der erste Beitrag ist ein Algorithmus zur Schätzung der Posen mehrerer Personen, welcher auf dem Paradigma der Erkennung durch Gruppierung aufbaut. Im Gegensatz zu den verbreiteten Verfeinerungs-Ansätzen erkennt unsere Methode Körpergelenke and paarweise Beziehungen zwischen ihnen in einer einzelnen Vorwärtsrechnung eines faltenden neuronalen Netzwerkes. Die Gliederung in mehrere Personen erfolgt durch Optimierung einer gemeinsamen Zielfunktion, die auf dem Mehrfachschnitt-Problem in der Graphenzerlegung basiert. Zweitens erweitern wir unseren Ansatz zur Posen-Bestimmung auf das Verfolgen mehrerer Personen und deren Artikulation in Videos. Unser Ansatz führt eine Verfolgung mehrerer Ziele und die Schätzung der zugehörigen Posen in ganzheitlicher Weise durch, indem eine einzelne Zielfunktion optimiert wird. Desweiteren vereinfachen und verfeinern wir die Formulierung, was unsere Methode nah an Echtzeit-Leistung bringt. Drittens schlagen wir einen großen Datensatz und einen Bewertungsmaßstab für die Verfolgung mehrerer artikulierter Personen vor. Dies ist der erste Datensatz der Video-Sequenzen von komplexen Szenen mit mehreren Personen beinhaltet und deren Spuren komplett mit zwei-dimensionalen Markierungen der Schlüsselpunkte versehen sind. Unser vierter Beitrag ist eine Methode zur Schätzung von drei-dimensionalen Körperhaltungen mittels am Körper tragbarer Kameras. Unser Ansatz verwendet ein Paar nach unten gerichteter, am Kopf befestigter Kameras und erfasst den gesamten Körper. Dieser egozentrische Ansatz ist frei von jeglichen Limitierungen traditioneller Konfigurationen mit externen Kameras und kann Körperhaltungen in sehr dicht gedrängten Umgebungen bestimmen. Unser letzter Beitrag geht über die Schätzung menschlicher Posen hinaus in den Bereich des tiefen Lernens der Gestalt von drei-dimensionalen Objekten. Insbesondere befassen wir uns mit dem Fall drei-dimensionale Objekte unter schwacher Überwachung zu rekonstruieren. Unser Ansatz repräsentiert Objekte als drei-dimensionale Punktwolken and ist im Stande diese nur mittels zwei-dimensionaler Überwachung und ohne Informationen über die Kamera-Ausrichtung zur Trainingszeit zu lernen. Wir entwerfen einen differenzierbaren Renderer für Punktwolken sowie eine neue Formulierung um mit uneindeutigen Kamera-Ausrichtungen umzugehen

    On Improving Generalization of CNN-Based Image Classification with Delineation Maps Using the CORF Push-Pull Inhibition Operator

    Get PDF
    Deployed image classification pipelines are typically dependent on the images captured in real-world environments. This means that images might be affected by different sources of perturbations (e.g. sensor noise in low-light environments). The main challenge arises by the fact that image quality directly impacts the reliability and consistency of classification tasks. This challenge has, hence, attracted wide interest within the computer vision communities. We propose a transformation step that attempts to enhance the generalization ability of CNN models in the presence of unseen noise in the test set. Concretely, the delineation maps of given images are determined using the CORF push-pull inhibition operator. Such an operation transforms an input image into a space that is more robust to noise before being processed by a CNN. We evaluated our approach on the Fashion MNIST data set with an AlexNet model. It turned out that the proposed CORF-augmented pipeline achieved comparable results on noise-free images to those of a conventional AlexNet classification model without CORF delineation maps, but it consistently achieved significantly superior performance on test images perturbed with different levels of Gaussian and uniform noise

    Digital Image Processing

    Get PDF
    Newspapers and the popular scientific press today publish many examples of highly impressive images. These images range, for example, from those showing regions of star birth in the distant Universe to the extent of the stratospheric ozone depletion over Antarctica in springtime, and to those regions of the human brain affected by Alzheimer’s disease. Processed digitally to generate spectacular images, often in false colour, they all make an immediate and deep impact on the viewer’s imagination and understanding. Professor Jonathan Blackledge’s erudite but very useful new treatise Digital Image Processing: Mathematical and Computational Methods explains both the underlying theory and the techniques used to produce such images in considerable detail. It also provides many valuable example problems - and their solutions - so that the reader can test his/her grasp of the physical, mathematical and numerical aspects of the particular topics and methods discussed. As such, this magnum opus complements the author’s earlier work Digital Signal Processing. Both books are a wonderful resource for students who wish to make their careers in this fascinating and rapidly developing field which has an ever increasing number of areas of application. The strengths of this large book lie in: • excellent explanatory introduction to the subject; • thorough treatment of the theoretical foundations, dealing with both electromagnetic and acoustic wave scattering and allied techniques; • comprehensive discussion of all the basic principles, the mathematical transforms (e.g. the Fourier and Radon transforms), their interrelationships and, in particular, Born scattering theory and its application to imaging systems modelling; discussion in detail - including the assumptions and limitations - of optical imaging, seismic imaging, medical imaging (using ultrasound), X-ray computer aided tomography, tomography when the wavelength of the probing radiation is of the same order as the dimensions of the scatterer, Synthetic Aperture Radar (airborne or spaceborne), digital watermarking and holography; detail devoted to the methods of implementation of the analytical schemes in various case studies and also as numerical packages (especially in C/C++); • coverage of deconvolution, de-blurring (or sharpening) an image, maximum entropy techniques, Bayesian estimators, techniques for enhancing the dynamic range of an image, methods of filtering images and techniques for noise reduction; • discussion of thresholding, techniques for detecting edges in an image and for contrast stretching, stochastic scattering (random walk models) and models for characterizing an image statistically; • investigation of fractal images, fractal dimension segmentation, image texture, the coding and storing of large quantities of data, and image compression such as JPEG; • valuable summary of the important results obtained in each Chapter given at its end; • suggestions for further reading at the end of each Chapter. I warmly commend this text to all readers, and trust that they will find it to be invaluable. Professor Michael J Rycroft Visiting Professor at the International Space University, Strasbourg, France, and at Cranfield University, England

    ADVANCED MOTION MODELS FOR RIGID AND DEFORMABLE REGISTRATION IN IMAGE-GUIDED INTERVENTIONS

    Get PDF
    Image-guided surgery (IGS) has been a major area of interest in recent decades that continues to transform surgical interventions and enable safer, less invasive procedures. In the preoperative contexts, diagnostic imaging, including computed tomography (CT) and magnetic resonance (MR) imaging, offers a basis for surgical planning (e.g., definition of target, adjacent anatomy, and the surgical path or trajectory to the target). At the intraoperative stage, such preoperative images and the associated planning information are registered to intraoperative coordinates via a navigation system to enable visualization of (tracked) instrumentation relative to preoperative images. A major limitation to such an approach is that motions during surgery, either rigid motions of bones manipulated during orthopaedic surgery or brain soft-tissue deformation in neurosurgery, are not captured, diminishing the accuracy of navigation systems. This dissertation seeks to use intraoperative images (e.g., x-ray fluoroscopy and cone-beam CT) to provide more up-to-date anatomical context that properly reflects the state of the patient during interventions to improve the performance of IGS. Advanced motion models for inter-modality image registration are developed to improve the accuracy of both preoperative planning and intraoperative guidance for applications in orthopaedic pelvic trauma surgery and minimally invasive intracranial neurosurgery. Image registration algorithms are developed with increasing complexity of motion that can be accommodated (single-body rigid, multi-body rigid, and deformable) and increasing complexity of registration models (statistical models, physics-based models, and deep learning-based models). For orthopaedic pelvic trauma surgery, the dissertation includes work encompassing: (i) a series of statistical models to model shape and pose variations of one or more pelvic bones and an atlas of trajectory annotations; (ii) frameworks for automatic segmentation via registration of the statistical models to preoperative CT and planning of fixation trajectories and dislocation / fracture reduction; and (iii) 3D-2D guidance using intraoperative fluoroscopy. For intracranial neurosurgery, the dissertation includes three inter-modality deformable registrations using physic-based Demons and deep learning models for CT-guided and CBCT-guided procedures

    Surface Remeshing and Applications

    Get PDF
    Due to the focus of popular graphic accelerators, triangle meshes remain the primary representation for 3D surfaces. They are the simplest form of interpolation between surface samples, which may have been acquired with a laser scanner, computed from a 3D scalar field resolved on a regular grid, or identified on slices of medical data. Typical methods for the generation of triangle meshes from raw data attempt to lose as less information as possible, so that the resulting surface models can be used in the widest range of scenarios. When such a general-purpose model has to be used in a particular application context, however, a pre-processing is often worth to be considered. In some cases, it is convenient to slightly modify the geometry and/or the connectivity of the mesh, so that further processing can take place more easily. Other applications may require the mesh to have a pre-defined structure, which is often different from the one of the original general-purpose mesh. The central focus of this thesis is the automatic remeshing of highly detailed surface triangulations. Besides a thorough discussion of state-of-the-art applications such as real-time rendering and simulation, new approaches are proposed which use remeshing for topological analysis, flexible mesh generation and 3D compression. Furthermore, innovative methods are introduced to post-process polygonal models in order to recover information which was lost, or hidden, by a prior remeshing process. Besides the technical contributions, this thesis aims at showing that surface remeshing is much more useful than it may seem at a first sight, as it represents a nearly fundamental step for making several applications feasible in practice

    Director's Discretionary Fund Report for Fiscal Year 1996

    Get PDF
    Topics covered include: Waterproofing the Space Shuttle tiles, thermal protection system for Reusable Launch Vehicles, computer modeling of the thermal conductivity of cometary ice, effects of ozone depletion and ultraviolet radiation on plants, a novel telemetric biosensor to monitor blood pH on-line, ion mobility in polymer electrolytes for lithium-polymer batteries, a microwave-pumped far infrared photoconductor, and a new method for measuring cloud liquid vapor using near infrared remote sensing. Also included: laser-spectroscopic instrument for turbulence measurement, remote sensing of aircraft contrails using a field portable imaging interferometer, development of a silicon-micromachined gas chromatography system for determination of planetary surface composition, planar Doppler velocimetry, chaos in interstellar chemistry, and a limited pressure cycle engine for high-speed output

    Mobile Robots Navigation

    Get PDF
    Mobile robots navigation includes different interrelated activities: (i) perception, as obtaining and interpreting sensory information; (ii) exploration, as the strategy that guides the robot to select the next direction to go; (iii) mapping, involving the construction of a spatial representation by using the sensory information perceived; (iv) localization, as the strategy to estimate the robot position within the spatial map; (v) path planning, as the strategy to find a path towards a goal location being optimal or not; and (vi) path execution, where motor actions are determined and adapted to environmental changes. The book addresses those activities by integrating results from the research work of several authors all over the world. Research cases are documented in 32 chapters organized within 7 categories next described

    Satellite Communications

    Get PDF
    This study is motivated by the need to give the reader a broad view of the developments, key concepts, and technologies related to information society evolution, with a focus on the wireless communications and geoinformation technologies and their role in the environment. Giving perspective, it aims at assisting people active in the industry, the public sector, and Earth science fields as well, by providing a base for their continued work and thinking
    corecore