194 research outputs found
Optimised Calibration, Registration and Tracking for Image Enhanced Surgical Navigation in ENT Operations
EThOS - Electronic Theses Online ServiceGBUnited Kingdo
Reconstructing while registering: a novel approach for markerless augmented reality
Colloque avec actes et comité de lecture. internationale.International audienceThis paper addresses the registration problem for unprepared multi-planar scenes. An interactive process is proposed to get accurate results using nothing else than the texture information of the planes. In particular, the classical preparation steps (camera calibration, scene acquisition) are greatly simplified, since included in the on-line registration process. Some results are shown on indoor and outdoor scenes. Videos available at url : http://www.loria.fr/~gsimon/Ismar
Recommended from our members
Spatially Augmented Reality on Dynamic, Deformable Surfaces and its Applications
Spatially Augmented Reality (SAR), also known as projection mapping, uses multiple projectors to illuminate surfaces of arbitrary shape and size and create seamless, large-scale displays. Traditional SAR assumes that the projection surface is static and rigid. This restriction was partially addressed by Dynamic-SAR, where the surface is rigid and of known shape but can be moved around. However, no prior work has addressed SAR using multiple projectors on deformable surfaces, where the shape is unknown and constantly changing. Thus, multi-projector SAR on deformable surfaces introduces several challenges, including projector-camera calibration on a deformable surface, real-time surface shape recovery and real-time multi-projector warping and blending. My thesis is the first attempt to develop a comprehensive framework for achieving seamless multi-projector displays on deformable surfaces. Furthermore, I will also be presenting its applications in the medical domain to enable remote surgical guidance by using SAR to illuminate surgical stencils on a physical surgical site precisely
Implicit Meshes for Effective Silhouette Handling
Using silhouettes in uncontrolled environments typically requires handling occlusions as well as changing or cluttered backgrounds, which limits the applicability of most silhouette based methods. For the purpose of 3-D shape modeling, we show that representing generic 3-D surfaces as implicit surfaces lets us effectively address these issues. This desirable behavior is completely independent from the way the surface deformations are parame-trized. To show this, we demonstrate our technique in three very different cases: Modeling the deformations of a piece of paper represented by an ordinary triangulated mesh; reconstruction and tracking a person's shoulders whose deformations are expressed in terms of Dirichlet Free Form Deformations; reconstructing the shape of a human face parametrized in terms of a Principal Component Analysis mode
The development of a hybrid virtual reality/video view-morphing display system for teleoperation and teleconferencing
Thesis (S.M.)--Massachusetts Institute of Technology, System Design & Management Program, 2000.This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.Includes bibliographical references (p. 84-89).The goal of this study is to extend the desktop panoramic static image viewer concept (e.g., Apple QuickTime VR; IPIX) to support immersive real time viewing, so that an observer wearing a head-mounted display can make free head movements while viewing dynamic scenes rendered in real time stereo using video data obtained from a set of fixed cameras. Computational experiments by Seitz and others have demonstrated the feasibility of morphing image pairs to render stereo scenes from novel, virtual viewpoints. The user can interact both with morphed real world video images, and supplementary artificial virtual objects (“Augmented Reality”). The inherent congruence of the real and artificial coordinate frames of this system reduces registration errors commonly found in Augmented Reality applications. In addition, the user’s eyepoint is computed locally so that any scene lag resulting from head movement will be less than those from alternative technologies using remotely controlled ground cameras. For space applications, this can significantly reduce the apparent lag due to satellite communication delay. This hybrid VR/view-morphing display (“Virtual Video”) has many important NASA applications including remote teleoperation, crew onboard training, private family and medical teleconferencing, and telemedicine. The technical objective of this study developed a proof-of-concept system using a 3D graphics PC workstation of one of the component technologies, Immersive Omnidirectional Video, of Virtual Video. The management goal identified a system process for planning, managing, and tracking the integration, test and validation of this phased, 3-year multi-university research and development program.by William E. Hutchison.S.M
Pose estimation system based on monocular cameras
Our world is full of wonders. It is filled with mysteries and challenges, which through
the ages inspired and called for the human civilization to grow itself, either philosophically
or sociologically. In time, humans reached their own physical limitations;
nevertheless, we created technology to help us overcome it. Like the ancient uncovered
land, we are pulled into the discovery and innovation of our time. All of this is
possible due to a very human characteristic - our imagination.
The world that surrounds us is mostly already discovered, but with the power of
computer vision (CV) and augmented reality (AR), we are able to live in multiple hidden
universes alongside our own. With the increasing performance and capabilities of
the current mobile devices, AR is what we dream it can be. There are still many obstacles,
but this future is already our reality, and with the evolving technologies closing
the gap between the real and the virtual world, soon it will be possible for us to surround
ourselves into other dimensions, or fuse them with our own.
This thesis focuses on the development of a system to predict the camera’s pose
estimation in the real-world regarding to the virtual world axis. The work was developed
as a sub-module integrated on the M5SAR project: Mobile Five Senses Augmented
Reality System for Museums, aiming to a more immerse experience with the
total or partial replacement of the environments’ surroundings. It is based mainly on
man-made buildings indoors and their typical rectangular cuboid shape. With the possibility
of knowing the user’s camera direction, we can then superimpose dynamic AR content, inviting the user to explore the hidden worlds.
The M5SAR project introduced a new way to explore the existent historical museums
by exploring the human’s five senses: hearing, smell, taste, touch, vision. With
this innovative technology, the user is able to enhance their visitation and immerse
themselves into a virtual world blended with our reality. A mobile device application
was built containing an innovating framework: MIRAR - Mobile Image Recognition
based Augmented Reality - containing object recognition, navigation, and additional
AR information projection in order to enrich the users’ visit, providing an intuitive
and compelling information regarding the available artworks, exploring the hearing
and vision senses. A device specially designed was built to explore the additional
three senses: smell, taste and touch which, when attached to a mobile device, either
smartphone or tablet, would pair with it and automatically react in with the offered
narrative related to the artwork, immersing the user with a sensorial experience.
As mentioned above, the work presented on this thesis is relative to a sub-module
of the MIRAR regarding environment detection and the superimposition of AR content.
With the main goal being the full replacement of the walls’ contents, and with the
possibility of keeping the artwork visible or not, it presented an additional challenge
with the limitation of using only monocular cameras. Without the depth information,
any 2D image of an environment, to a computer doesn’t represent the tridimensional
layout of the real-world dimensions. Nevertheless, man-based building tends to follow
a rectangular approach to divisions’ constructions, which allows for a prediction
to where the vanishing point on any environment image may point, allowing the reconstruction
of an environment’s layout from a 2D image. Furthermore, combining
this information with an initial localization through an improved image recognition
to retrieve the camera’s spatial position regarding to the real-world coordinates and
the virtual-world, alas, pose estimation, allowed for the possibility of superimposing
specific localized AR content over the user’s mobile device frame, in order to immerse,
i.e., a museum’s visitor into another era correlated to the present artworks’ historical
period. Through the work developed for this thesis, it was also presented a better planar surface in space rectification and retrieval, a hybrid and scalable multiple images
matching system, a more stabilized outlier filtration applied to the camera’s axis,
and a continuous tracking system that works with uncalibrated cameras and is able to
achieve particularly obtuse angles and still maintain the surface superimposition.
Furthermore, a novelty method using deep learning models for semantic segmentation
was introduced for indoor layout estimation based on monocular images. Contrary
to the previous developed methods, there is no need to perform geometric calculations
to achieve a near state of the art performance with a fraction of the parameters
required by similar methods. Contrary to the previous work presented on this thesis,
this method performs well even in unseen and cluttered rooms if they follow the Manhattan
assumption. An additional lightweight application to retrieve the camera pose
estimation is presented using the proposed method.O nosso mundo está repleto de maravilhas. Está cheio de mistérios e desafios, os quais,
ao longo das eras, inspiraram e impulsionaram a civilização humana a evoluir, seja
filosófica ou sociologicamente. Eventualmente, os humanos foram confrontados com
os seus limites físicos; desta forma, criaram tecnologias que permitiram superá-los.
Assim como as terras antigas por descobrir, somos impulsionados à descoberta e inovação
da nossa era, e tudo isso é possível graças a uma característica marcadamente
humana: a nossa imaginação.
O mundo que nos rodeia está praticamente todo descoberto, mas com o poder da
visão computacional (VC) e da realidade aumentada (RA), podemos viver em múltiplos
universos ocultos dentro do nosso. Com o aumento da performance e das capacidades
dos dispositivos móveis da atualidade, a RA pode ser exatamente aquilo que
sonhamos. Continuam a existir muitos obstáculos, mas este futuro já é o nosso presente,
e com a evolução das tecnologias a fechar o fosso entre o mundo real e o mundo
virtual, em breve será possível cercarmo-nos de outras dimensões, ou fundi-las dentro
da nossa.
Esta tese foca-se no desenvolvimento de um sistema de predição para a estimação
da pose da câmara no mundo real em relação ao eixo virtual do mundo. Este trabalho
foi desenvolvido como um sub-módulo integrado no projeto M5SAR: Mobile
Five Senses Augmented Reality System for Museums, com o objetivo de alcançar uma
experiência mais imersiva com a substituição total ou parcial dos limites do ambiente. Dedica-se ao interior de edifícios de arquitetura humana e a sua típica forma
de retângulo cuboide. Com a possibilidade de saber a direção da câmara do dispositivo,
podemos então sobrepor conteúdo dinâmico de RA, num convite ao utilizador
para explorar os mundos ocultos.
O projeto M5SAR introduziu uma nova forma de explorar os museus históricos existentes
através da exploração dos cinco sentidos humanos: a audição, o cheiro, o paladar,
o toque e a visão. Com essa tecnologia inovadora, o utilizador pode engrandecer
a sua visita e mergulhar num mundo virtual mesclado com a nossa realidade. Uma
aplicação para dispositivo móvel foi criada, contendo uma estrutura inovadora: MIRAR
- Mobile Image Recognition based Augmented Reality - a possuir o reconhecimento
de objetos, navegação e projeção de informação de RA adicional, de forma a
enriquecer a visita do utilizador, a fornecer informação intuitiva e interessante em relação
às obras de arte disponíveis, a explorar os sentidos da audição e da visão. Foi
também desenhado um dispositivo para exploração em particular dos três outros sentidos
adicionais: o cheiro, o toque e o sabor. Este dispositivo, quando afixado a um
dispositivo móvel, como um smartphone ou tablet, emparelha e reage com este automaticamente
com a narrativa relacionada à obra de arte, a imergir o utilizador numa
experiência sensorial.
Como já referido, o trabalho apresentado nesta tese é relativo a um sub-módulo
do MIRAR, relativamente à deteção do ambiente e a sobreposição de conteúdo de RA.
Sendo o objetivo principal a substituição completa dos conteúdos das paredes, e com
a possibilidade de manter as obras de arte visíveis ou não, foi apresentado um desafio
adicional com a limitação do uso de apenas câmaras monoculares. Sem a informação
relativa à profundidade, qualquer imagem bidimensional de um ambiente, para um
computador isso não se traduz na dimensão tridimensional das dimensões do mundo
real. No entanto, as construções de origem humana tendem a seguir uma abordagem
retangular às divisões dos edifícios, o que permite uma predição de onde poderá apontar
o ponto de fuga de qualquer ambiente, a permitir a reconstrução da disposição de
uma divisão através de uma imagem bidimensional. Adicionalmente, ao combinar esta informação com uma localização inicial através de um reconhecimento por imagem
refinado, para obter a posição espacial da câmara em relação às coordenadas
do mundo real e do mundo virtual, ou seja, uma estimativa da pose, foi possível alcançar
a possibilidade de sobrepor conteúdo de RA especificamente localizado sobre
a moldura do dispositivo móvel, de maneira a imergir, ou seja, colocar o visitante do
museu dentro de outra era, relativa ao período histórico da obra de arte em questão.
Ao longo do trabalho desenvolvido para esta tese, também foi apresentada uma melhor
superfície planar na recolha e retificação espacial, um sistema de comparação de
múltiplas imagens híbrido e escalável, um filtro de outliers mais estabilizado, aplicado
ao eixo da câmara, e um sistema de tracking contínuo que funciona com câmaras não
calibradas e que consegue obter ângulos particularmente obtusos, continuando a manter
a sobreposição da superfície.
Adicionalmente, um algoritmo inovador baseado num modelo de deep learning
para a segmentação semântica foi introduzido na estimativa do traçado com base em
imagens monoculares. Ao contrário de métodos previamente desenvolvidos, não é
necessário realizar cálculos geométricos para obter um desempenho próximo ao state
of the art e ao mesmo tempo usar uma fração dos parâmetros requeridos para métodos
semelhantes. Inversamente ao trabalho previamente apresentado nesta tese, este
método apresenta um bom desempenho mesmo em divisões sem vista ou obstruídas,
caso sigam a mesma premissa Manhattan. Uma leve aplicação adicional para obter a
posição da câmara é apresentada usando o método proposto
Facade Proposals for Urban Augmented Reality
International audienceWe introduce a novel object proposals method specific to building facades. We define new image cues that measure typical facadecharacteristics such as semantic, symmetry and repetitions. They are combined to generate a few facade candidates in urban environments fast. We show that our method outperforms state-of-the-art object proposals techniques for this task on the 1000 images of the Zurich Building Database. We demonstrate the interest of this procedure for augmented reality through facade recognition and camera pose initialization. In a very time-efficient pipeline we classify the candidates and match them to a facade references database using CNN-based descriptors. We prove that this approach is more robust to severe changes of viewpoint and occlusions than standard object recognition methods
Markerless Tracking using Planar Structures in the Scene
Colloque avec actes et comité de lecture. internationale.International audienceWe describe a markerless camera tracking system for augmented reality that operates in environments which contain one or more planes. This is a common special case, which we show significantly simplifies tracking. The result is a practical, reliable, vision-based tracker. Furthermore, the tracked plane imposes a natural reference frame, so that the alignment of the real and virtual coordinate systems is rather simpler than would be the case with a general structure-and-motion system. Multiple planes can be tracked, and additional data such as 2D point tracks are easily incorporated
Towards markerless orthopaedic navigation with intuitive Optical See-through Head-mounted displays
The potential of image-guided orthopaedic navigation to improve surgical outcomes has been well-recognised during the last two decades. According to the tracked pose of target bone, the anatomical information and preoperative plans are updated and displayed to surgeons, so that they can follow the guidance to reach the goal with higher accuracy, efficiency and reproducibility. Despite their success, current orthopaedic navigation systems have two main limitations: for target tracking, artificial markers have to be drilled into the bone and calibrated manually to the bone, which introduces the risk of additional harm to patients and increases operating complexity; for guidance visualisation, surgeons have to shift their attention from the patient to an external 2D monitor, which is disruptive and can be mentally stressful.
Motivated by these limitations, this thesis explores the development of an intuitive, compact and reliable navigation system for orthopaedic surgery. To this end, conventional marker-based tracking is replaced by a novel markerless tracking algorithm, and the 2D display is replaced by a 3D holographic Optical see-through (OST) Head-mounted display (HMD) precisely calibrated to a user's perspective.
Our markerless tracking, facilitated by a commercial RGBD camera, is achieved through deep learning-based bone segmentation followed by real-time pose registration. For robust segmentation, a new network is designed and efficiently augmented by a synthetic dataset. Our segmentation network outperforms the state-of-the-art regarding occlusion-robustness, device-agnostic behaviour, and target generalisability. For reliable pose registration, a novel Bounded Iterative Closest Point (BICP) workflow is proposed. The improved markerless tracking can achieve a clinically acceptable error of 0.95 deg and 2.17 mm according to a phantom test.
OST displays allow ubiquitous enrichment of perceived real world with contextually blended virtual aids through semi-transparent glasses. They have been recognised as a suitable visual tool for surgical assistance, since they do not hinder the surgeon's natural eyesight and require no attention shift or perspective conversion. The OST calibration is crucial to ensure locational-coherent surgical guidance.
Current calibration methods are either human error-prone or hardly applicable to commercial devices. To this end, we propose an offline camera-based calibration method that is highly accurate yet easy to implement in commercial products, and an online alignment-based refinement that is user-centric and robust against user error. The proposed methods are proven to be superior to other similar State-of-
the-art (SOTA)s regarding calibration convenience and display accuracy.
Motivated by the ambition to develop the world's first markerless OST navigation system, we integrated the developed markerless tracking and calibration scheme into a complete navigation workflow designed for femur drilling tasks during knee replacement surgery. We verify the usability of our designed OST system with an experienced orthopaedic surgeon by a cadaver study. Our test validates the potential of the proposed markerless navigation system for surgical assistance, although further improvement is required for clinical acceptance.Open Acces
Bio-Inspired Multi-Spectral Image Sensor and Augmented Reality Display for Near-Infrared Fluorescence Image-Guided Surgery
Background: Cancer remains a major public health problem worldwide and poses a huge economic burden. Near-infrared (NIR) fluorescence image-guided surgery (IGS) utilizes molecular markers and imaging instruments to identify and locate tumors during surgical resection. Unfortunately, current state-of-the-art NIR fluorescence imaging systems are bulky, costly, and lack both fluorescence sensitivity under surgical illumination and co-registration accuracy between multimodal images. Additionally, the monitor-based display units are disruptive to the surgical workflow and are suboptimal at indicating the 3-dimensional position of labeled tumors. These major obstacles have prevented the wide acceptance of NIR fluorescence imaging as the standard of care for cancer surgery. The goal of this dissertation is to enhance cancer treatment by developing novel image sensors and presenting the information using holographic augmented reality (AR) display to the physician in intraoperative settings.
Method: By mimicking the visual system of the Morpho butterfly, several single-chip, color-NIR fluorescence image sensors and systems were developed with CMOS technologies and pixelated interference filters. Using a holographic AR goggle platform, an NIR fluorescence IGS display system was developed. Optoelectronic evaluation was performed on the prototypes to evaluate the performance of each component, and small animal models and large animal models were used to verify the overall effectiveness of the integrated systems at cancer detection.
Result: The single-chip bio-inspired multispectral logarithmic image sensor I developed has better main performance indicators than the state-of-the-art NIR fluorescence imaging instruments. The image sensors achieve up to 140 dB dynamic range. The sensitivity under surgical illumination achieves 6108 V/(mW/cm2), which is up to 25 times higher. The signal-to-noise ratio is up to 56 dB, which is 11 dB greater. These enable high sensitivity fluorescence imaging under surgical illumination. The pixelated interference filters enable temperature-independent co-registration accuracy between multimodal images. Pre-clinical trials with small animal model demonstrate that the sensor can achieve up to 95% sensitivity and 94% specificity with tumor-targeted NIR molecular probes. The holographic AR goggle provides the physician with a non-disruptive 3-dimensional display in the clinical setup. This is the first display system that co-registers a virtual image with human eyes and allows video rate image transmission. The imaging system is tested in the veterinary science operating room on canine patients with naturally occurring cancers. In addition, a time domain pulse-width-modulation address-event-representation multispectral image sensor and a handheld multispectral camera prototype are developed.
Conclusion: The major problems of current state-of-the-art NIR fluorescence imaging systems are successfully solved. Due to enhanced performance and user experience, the bio-inspired sensors and augmented reality display system will give medical care providers much needed technology to enable more accurate value-based healthcare
- …