2,505 research outputs found

    Deep Selection: A Fully Supervised Camera Selection Network for Surgery Recordings

    Full text link
    Recording surgery in operating rooms is an essential task for education and for the evaluation of medical treatment. However, recording the desired targets, such as the surgical field, surgical tools, or the doctor's hands, is difficult because these targets are heavily occluded during surgery. We use a recording system in which multiple cameras are embedded in the surgical lamp, and we assume that at least one camera records the target without occlusion at any given time. Since the embedded cameras produce multiple video sequences, we address the task of selecting the camera with the best view of the surgery. Unlike the conventional method, which selects the camera based on the visible area of the surgical field, we propose a deep neural network that predicts the camera selection probability from the multiple video sequences, trained on expert annotations. We created a dataset in which six different types of plastic surgery are recorded, and we provide annotations of the camera switching. Our experiments show that our approach successfully switches between cameras and outperforms three baseline methods.
    Comment: MICCAI 2020
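
    As a rough illustration of the idea above, the sketch below is a minimal, hypothetical PyTorch model, not the authors' architecture: each camera's frame is encoded by a shared CNN backbone, a per-camera score is computed, and a softmax across cameras yields selection probabilities. All layer sizes and names are assumptions.

    import torch
    import torch.nn as nn

    class CameraSelector(nn.Module):
        """Toy multi-camera selector: shared encoder + softmax over cameras."""
        def __init__(self, feat_dim=64):
            super().__init__()
            # Per-camera frame encoder; weights are shared across all cameras.
            self.encoder = nn.Sequential(
                nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(32, feat_dim), nn.ReLU(),
            )
            self.score = nn.Linear(feat_dim, 1)  # one "view quality" score per camera

        def forward(self, frames):
            # frames: (batch, num_cams, 3, H, W)
            b, c, ch, h, w = frames.shape
            feats = self.encoder(frames.view(b * c, ch, h, w))
            scores = self.score(feats).view(b, c)
            return scores.softmax(dim=1)  # camera selection probabilities

    # Supervision from expert switching annotations would apply cross-entropy
    # to the raw scores; here we just check shapes on random input.
    model = CameraSelector()
    probs = model(torch.randn(2, 5, 3, 64, 64))  # 2 clips, 5 lamp cameras
    print(probs.shape, probs.sum(dim=1))         # (2, 5), each row sums to 1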

    3D Medical Collaboration Technology to Enhance Emergency Healthcare

    Get PDF
    Two-dimensional (2D) videoconferencing has been explored widely in the past 15–20 years to support collaboration in healthcare. Two issues that arise in most evaluations of 2D videoconferencing in telemedicine are the difficulty of obtaining optimal camera views and poor depth perception. To address these problems, we are exploring the use of a small array of cameras to reconstruct dynamic three-dimensional (3D) views of a remote environment and of events taking place within it. The 3D views could be sent across wired or wireless networks to remote healthcare professionals equipped with fixed displays or with mobile devices such as personal digital assistants (PDAs). The remote professionals’ viewpoints could be specified manually or automatically (continuously) via user head or PDA tracking, giving the remote viewers head-slaved or hand-slaved virtual cameras for monoscopic or stereoscopic viewing of the dynamic reconstructions. We call this idea remote 3D medical collaboration. In this article we motivate and explain the vision for 3D medical collaboration technology; we describe the relevant computer vision, computer graphics, display, and networking research; we present a proof-of-concept prototype system; and we present evaluation results supporting the general hypothesis that 3D remote medical collaboration technology could offer benefits over conventional 2D videoconferencing in emergency healthcare.
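
    To make the head-slaved virtual camera concrete, here is a minimal sketch of the underlying math (standard look-at geometry under assumed coordinate conventions, not the paper's implementation): each tracker update re-aims a virtual camera at the reconstruction from the viewer's current head position.

    import numpy as np

    def look_at(eye, target, up=np.array([0.0, 1.0, 0.0])):
        """Right-handed look-at view matrix, as in standard graphics pipelines."""
        f = target - eye
        f = f / np.linalg.norm(f)                       # forward axis
        s = np.cross(f, up); s = s / np.linalg.norm(s)  # right axis
        u = np.cross(s, f)                              # corrected up axis
        view = np.eye(4)
        view[0, :3], view[1, :3], view[2, :3] = s, u, -f
        view[:3, 3] = -view[:3, :3] @ eye               # move world into eye space
        return view

    # Hypothetical tracker sample: viewer's head at 2 m, looking at the patient.
    head_pos = np.array([0.2, 1.6, 2.0])
    target = np.array([0.0, 1.0, 0.0])
    print(look_at(head_pos, target))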

    Human-robot interaction and computer-vision-based services for autonomous robots

    Get PDF
    Imitation Learning (IL), or robot Programming by Demonstration (PbD), covers methods by which a robot learns new skills through human guidance and imitation. PbD takes its inspiration from the way humans learn new skills by imitation in order to develop methods by which new tasks can be transmitted to robots. This thesis is motivated by the generic question of “what to imitate?”, which concerns the problem of how to extract the essential features of a task. To this end, we adopt an Action Recognition (AR) perspective in order to allow the robot to decide what has to be imitated or inferred when interacting with a human. The proposed approach is based on a well-known method from natural language processing: namely, Bag of Words (BoW). This method is applied to large databases in order to obtain a trained model. Although BoW is a machine learning technique used in various fields of research, in action classification for robot learning it is far from accurate. Moreover, it focuses on the classification of objects and gestures rather than actions. Thus, in this thesis we show that the method is suitable in action classification scenarios for merging information from different sources or different trials. This thesis makes three contributions: (1) it proposes a general method for dealing with action recognition and thus contributes to imitation learning; (2) the methodology can be applied to large databases which include different modes of action capture; and (3) the method is applied specifically in a real international innovation project called Vinbot.
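
    As a rough sketch of the BoW pipeline described above (generic code on made-up data, not the thesis implementation): local descriptors from all training sequences are clustered into a visual vocabulary, each sequence becomes a normalized histogram of visual-word counts, and an off-the-shelf classifier predicts the action label. Histograms from different sources or trials can then be fused by averaging or concatenation.

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.svm import LinearSVC

    rng = np.random.default_rng(0)

    # Stand-in for local spatio-temporal descriptors of 40 action sequences.
    sequences = [rng.normal(size=(rng.integers(20, 40), 16)) for _ in range(40)]
    labels = rng.integers(0, 3, size=40)  # three action classes

    # 1) Build the visual vocabulary by clustering all descriptors.
    vocab = KMeans(n_clusters=32, n_init=10, random_state=0)
    vocab.fit(np.vstack(sequences))

    # 2) Encode each sequence as a normalized bag-of-words histogram.
    def bow_histogram(descriptors):
        words = vocab.predict(descriptors)
        hist = np.bincount(words, minlength=32).astype(float)
        return hist / hist.sum()

    X = np.array([bow_histogram(s) for s in sequences])

    # 3) Train a linear classifier on the BoW vectors; fusing two trials
    #    amounts to averaging (or concatenating) their histograms first.
    clf = LinearSVC().fit(X, labels)
    print(clf.predict(X[:5]))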

    User experience study of 360° music videos on computer monitor and virtual reality goggles

    Get PDF
    In the past few years, user experience has become a trend in the field of Human-Computer Interaction (HCI). The ultimate success of a product or service depends on delivering the user experience that users prefer. 360° video is an immersive technology that offers a new kind of visual experience. People use 360° videos in several areas of their everyday lives, including media consumption such as music videos. Nowadays, Virtual Reality (VR) and 360° camera hardware are becoming more usable, and 360° videos are being produced to deliver a realistic experience through both VR goggles and traditional displays. As 360° videos are produced, the need to measure the user experience also arises. This study explores the user experience of 360° music videos and how users perceive multi-camera 360° music videos on a computer monitor and on VR goggles. This empirical research was conducted as a laboratory experiment with 20 test participants. In the within-subject study, participants watched and then evaluated four 360° music videos produced with different cutting rates and shots. Quantitative and qualitative data were collected through user evaluations and interviews, and the data were analysed both quantitatively and qualitatively. The results indicated that the music video produced by integrating eight shots (average length of 26 s per shot) captured by four 360° cameras delivered the highest-quality user experience on both the computer monitor and the VR goggles. The video with the highest cutting rate (average length of 11 s per shot) delivered the lowest-quality user experience. The results also showed that the 360° music video produced with a single camera induced boredom among users because of its static view. The thesis concludes by presenting the findings on 360° music video user experience based on the user evaluation and interview data.

    Systematic Parameterization, Storage, and Representation of Volumetric DICOM Data

    Get PDF
    Tomographic medical imaging systems produce hundreds to thousands of slices, enabling three-dimensional (3D) analysis. Radiologists process these images with various tools and techniques to generate 3D renderings for applications such as surgical planning, medical education, and volumetric measurements. To save and store these visualizations, current systems use snapshots or video export, which prevents further optimization and requires the storage of significant additional data. The Grayscale Softcopy Presentation State extension of the Digital Imaging and Communications in Medicine (DICOM) standard resolves this issue for two-dimensional (2D) data by introducing an extensive set of parameters, namely 2D Presentation States (2DPR), that describe how an image should be displayed. 2DPR allows these parameters to be stored instead of parameter-applied images, which would unnecessarily duplicate the image data. Since there is currently no corresponding extension for 3D data, this study proposes a DICOM-compliant object called 3D Presentation States (3DPR) for the parameterization and storage of 3D medical volumes. To accomplish this, the 3D medical visualization process is divided into four tasks, namely pre-processing, segmentation, post-processing, and rendering, and the important parameters of each task are determined. Special focus is given to the compression of segmented data, the parameterization of the rendering process, and a DICOM-compliant implementation of the 3DPR object. The use of 3DPR was tested in a radiology department on three clinical cases that require multiple segmentations and visualizations during the radiologists' workflow. The results show that 3DPR can effectively simplify the workload of physicians by directly regenerating 3D renderings without repeating intermediate tasks, increase efficiency by preserving all user interactions, and provide efficient storage as well as transfer of visualized data.
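
    For a rough sense of what storing presentation parameters (rather than rendered images) looks like in code, the sketch below uses pydicom to pack a few illustrative rendering parameters into a DICOM dataset. The private tags, values, and group number here are placeholders invented for illustration, not the actual 3DPR encoding proposed in the paper.

    from pydicom.dataset import Dataset

    ds = Dataset()
    ds.ContentLabel = "RENDER_STATE"  # labeling attribute also used by 2D GSPS
    ds.WindowCenter = 40              # display window, carried over from 2DPR
    ds.WindowWidth = 400

    # Hypothetical private block for 3D-specific parameters (placeholder tags).
    block = ds.private_block(0x00DB, "EXAMPLE 3DPR", create=True)
    block.add_new(0x01, "DS", ["0.0", "-350.0", "120.0"])  # assumed camera position (mm)
    block.add_new(0x02, "LO", "soft-tissue preset")        # assumed transfer-function name

    # A real object would also carry file meta information and be written
    # out with ds.save_as(...); here we just inspect the encoded elements.
    print(ds)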

    Neural Radiance Fields: Past, Present, and Future

    Full text link
    Modeling and interpreting 3D environments and surroundings has driven research in 3D Computer Vision, Computer Graphics, and Machine Learning. The paper by Mildenhall et al. on NeRFs (Neural Radiance Fields) led to a boom in Computer Graphics, Robotics, and Computer Vision, and the prospect of high-resolution, low-storage Augmented Reality and Virtual Reality 3D models has gained traction among researchers, with more than 1,000 NeRF-related preprints published. This paper serves as a bridge for people starting to study these fields, building from the basics of Mathematics, Geometry, Computer Vision, and Computer Graphics up to the difficulties encountered in implicit representations at the intersection of all these disciplines. The survey provides the history of rendering, implicit learning, and NeRFs; the progression of research on NeRFs; and the potential applications and implications of NeRFs in today's world. In doing so, it categorizes all NeRF-related research in terms of the datasets used, objective functions, applications solved, and evaluation criteria for these applications.
    Comment: 413 pages, 9 figures, 277 citations
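
    For readers new to the area, the heart of NeRF image formation is the discretized volume-rendering quadrature from the original paper: a ray's color is C = Σ_i T_i (1 − exp(−σ_i δ_i)) c_i with transmittance T_i = exp(−Σ_{j<i} σ_j δ_j). The NumPy sketch below composites one ray from made-up samples; a real NeRF would query a trained MLP for each sample's density σ and color c.

    import numpy as np

    def composite_ray(sigmas, colors, deltas):
        """Alpha-composite density/color samples along a single ray."""
        alphas = 1.0 - np.exp(-sigmas * deltas)  # per-sample opacity
        # Transmittance T_i: probability the ray reaches sample i unoccluded.
        trans = np.cumprod(np.concatenate(([1.0], 1.0 - alphas[:-1])))
        weights = trans * alphas
        return weights @ colors  # final RGB for the ray

    # Made-up samples along one ray; a trained MLP would supply these values.
    n = 64
    deltas = np.full(n, 4.0 / n)               # spacing between ray samples
    sigmas = np.linspace(0.0, 3.0, n)          # volume densities
    colors = np.tile([0.8, 0.3, 0.2], (n, 1))  # per-sample RGB
    print(composite_ray(sigmas, colors, deltas))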