
    Fourteenth Biennial Status Report: March 2017 - February 2019


    Efficient image-based rendering

    Recent advancements in real-time ray tracing and deep learning have significantly enhanced the realism of computer-generated images. However, conventional 3D computer graphics (CG) can still be time-consuming and resource-intensive, particularly when creating photo-realistic simulations of complex or animated scenes. Image-based rendering (IBR) has emerged as an alternative approach that utilizes pre-captured images from the real world to generate realistic images in real time, eliminating the need for extensive modeling. Although IBR has its advantages, it faces challenges in providing the same level of control over scene attributes as traditional CG pipelines and in accurately reproducing complex scenes and objects with different materials, such as transparent objects. This thesis addresses these issues by harnessing the power of deep learning and incorporating the fundamental principles of graphics and physically-based rendering. It offers an efficient solution that enables interactive manipulation of real-world dynamic scenes captured from sparse views, lighting positions, and times, as well as a physically-based approach that facilitates accurate reproduction of the view-dependent effects resulting from the interaction between transparent objects and their surrounding environment. Additionally, this thesis develops a visibility metric that can identify artifacts in reconstructed IBR images without observing the reference image, thereby contributing to the design of an effective IBR acquisition pipeline. Lastly, a perception-driven rendering technique is developed to provide high-fidelity visual content in virtual reality displays while retaining computational efficiency.
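
    As an illustration of the view-dependent blending at the heart of classical IBR, here is a minimal sketch in the spirit of unstructured-lumigraph weighting; it is not the thesis's method, and `point`, `novel_cam`, and `src_cams` are assumed inputs.

        # Angular-penalty blending weights for IBR source views (a sketch).
        import numpy as np

        def blend_weights(point, novel_cam, src_cams, k=4):
            """Weight each source camera by how closely its viewing ray at
            `point` matches the novel view's ray; keep the k best views."""
            d_novel = novel_cam - point
            d_novel /= np.linalg.norm(d_novel)
            penalties = []
            for c in src_cams:
                d_src = c - point
                d_src /= np.linalg.norm(d_src)
                # Smaller angle between the rays => better source view.
                penalties.append(np.arccos(np.clip(d_novel @ d_src, -1.0, 1.0)))
            penalties = np.array(penalties)
            idx = np.argsort(penalties)[:k]          # keep the k best views
            worst = penalties[idx].max()
            w = np.zeros(len(src_cams))
            w[idx] = worst + 1e-6 - penalties[idx]   # penalty relative to worst kept
            return w / w.sum()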

    Hand eye coordination in surgery

    The coordination of the hand in response to visual target selection has always been regarded as an essential quality in a range of professional activities. This quality has thus far been elusive to objective scientific measurement, and is usually engulfed in the overall performance of the individual. Parallels can be drawn to surgery, especially Minimally Invasive Surgery (MIS), where the physical constraints imposed by the arrangement of the instruments and visualisation methods require coordination skills that are unprecedented. With the current paradigm shift towards early specialisation in surgical training and shortened, focused training time, the selection process should identify trainees with the highest potential in specific skills. Although significant effort has been made in the objective assessment of surgical skills, it is currently only possible to measure surgeons' abilities at the time of assessment. It has been particularly difficult to quantify specific details of hand-eye coordination and to assess innate ability for future skills development. The purpose of this thesis is to examine hand-eye coordination in laboratory-based simulations, with a particular emphasis on details that are important to MIS. In order to understand the challenges of visuomotor coordination, movement trajectory errors have been used to provide an insight into the innate coordinate mapping of the brain. In MIS, novel spatial transformations, due to a combination of distorted endoscopic image projections and the "fulcrum" effect of the instruments, accentuate movement generation errors. Obvious differences in the quality of movement trajectories have been observed between novices and experts in MIS; however, these are difficult to measure quantitatively. A Hidden Markov Model (HMM) is used in this thesis to reveal the underlying characteristic movement details of a particular MIS manoeuvre and how such features are exaggerated by the introduction of rotation in the endoscopic camera. The proposed method demonstrates the feasibility of measuring movement trajectory quality by machine learning techniques without prior arbitrary classification of expertise. Experimental results have highlighted these changes in novice laparoscopic surgeons, even after a short period of training. How the intricate relationship between the hands and the eyes changes when learning a skilled visuomotor task has been studied previously. Reactive eye movement, when visual input is used primarily as a feedback mechanism for error correction, implies difficulties in hand-eye coordination. As the brain learns to adapt to this new coordinate map, eye movements become predictive of the action generated. The concept of measuring this spatiotemporal relationship is introduced as a measure of hand-eye coordination in MIS, by comparing the Target Distance Function (TDF) between the eye fixation and the instrument tip position on the laparoscopic screen. Further validation of this concept using high-fidelity experimental tasks is presented, where higher cognitive influence and multiple target selection increase the complexity of the data analysis. To this end, Granger-causality is presented as a measure of the predictability of the instrument movement from the eye fixation pattern. Partial Directed Coherence (PDC), a frequency-domain variation of Granger-causality, is used for the first time to measure hand-eye coordination. Experimental results are used to establish the strengths and potential pitfalls of the technique.
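
    To make the Granger-causality idea concrete, the sketch below tests whether gaze position predicts instrument position using statsmodels; the signals are synthetic and the lag structure is an assumption. PDC itself goes further, fitting a multivariate autoregressive model and evaluating directed influence in the frequency domain.

        # Does gaze Granger-cause tool movement? A sketch on synthetic data.
        import numpy as np
        from statsmodels.tsa.stattools import grangercausalitytests

        rng = np.random.default_rng(0)
        gaze = rng.standard_normal(500).cumsum()                   # fixation x-coordinate
        tool = np.roll(gaze, 5) + 0.1 * rng.standard_normal(500)   # tool lags the eye

        # Tests whether the 2nd column (gaze) Granger-causes the 1st (tool).
        res = grangercausalitytests(np.column_stack([tool, gaze]), maxlag=10)
        f_stat, p_value, _, _ = res[5][0]["ssr_ftest"]
        print(f"lag 5: F={f_stat:.1f}, p={p_value:.3g}")  # small p => predictive gaze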
    To further enhance the accuracy of this measurement, a modified Jensen-Shannon Divergence (JSD) measure has been developed to enhance the signal-matching algorithm and trajectory segmentation. The proposed framework incorporates filtering of high-frequency noise, which represents non-purposeful hand and eye movements. The accuracy of the technique has been demonstrated by quantitative measurement of multiple laparoscopic tasks performed by expert and novice surgeons. Experimental results supporting visual search behavioural theory are presented, as this underpins the target selection process immediately prior to visuomotor action generation. The effects of specialisation and experience on visual search patterns are also examined. Finally, pilot results from functional brain imaging are presented, where Posterior Parietal Cortical (PPC) activation is measured using optical spectroscopy techniques. The PPC has been demonstrated to be involved in the calculation of coordinate transformations between the visual and motor systems, which opens up the possibility of exciting future studies in hand-eye coordination.
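
    For reference, the standard (unmodified) Jensen-Shannon divergence underlying the measure mentioned above can be written in a few lines; the histogram binning and synthetic data here are assumptions, not the thesis's modified formulation.

        # Jensen-Shannon divergence between two normalized histograms (a sketch).
        import numpy as np

        def jsd(p, q, eps=1e-12):
            p = np.asarray(p, float) / np.sum(p)
            q = np.asarray(q, float) / np.sum(q)
            m = 0.5 * (p + q)                       # mixture distribution
            kl = lambda a, b: np.sum(a * np.log2((a + eps) / (b + eps)))
            return 0.5 * kl(p, m) + 0.5 * kl(q, m)  # in bits; 0 = identical, 1 = disjoint

        eye_hist, _ = np.histogram(np.random.randn(1000), bins=32, range=(-4, 4))
        tool_hist, _ = np.histogram(np.random.randn(1000) + 0.5, bins=32, range=(-4, 4))
        print(jsd(eye_hist, tool_hist))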

    Representing and Inferring Visual Perceptual Skills in Dermatological Image Understanding

    Experts have a remarkable capability of locating, perceptually organizing, identifying, and categorizing objects in images specific to their domains of expertise. Eliciting and representing their visual strategies and some aspects of domain knowledge will benefit a wide range of studies and applications. For example, image understanding may be improved through active learning frameworks by transferring human domain knowledge into image-based computational procedures, intelligent user interfaces may be enhanced by inferring dynamic informational needs in real time, and cognitive processing may be analyzed by unveiling the underlying cognitive processes engaged. An eye tracking experiment was conducted to collect both eye movement and verbal narrative data from three groups of subjects with different levels of medical training, or none, in order to study perceptual skill. Each subject examined and described 50 photographic dermatological images. One group comprised 11 board-certified dermatologists (attendings), another 4 dermatologists in training (residents), and the third 13 novices (undergraduate students with no medical training). We develop a novel hierarchical probabilistic framework to discover the stereotypical and idiosyncratic viewing behaviors exhibited by the three expertise-specific groups. A hidden Markov model is used to describe each subject's eye movement sequence, combined with hierarchical stochastic processes to capture and differentiate the discovered eye movement patterns shared by multiple subjects' eye movement sequences within and among the three expertise-specific groups. Through these patterned eye movement behaviors we are able to elicit some aspects of the domain-specific knowledge and perceptual skill from the subjects whose eye movements are recorded during diagnostic reasoning on medical images. Analyzing experts' eye movement patterns gives us insight into the cognitive strategies exploited to solve complex perceptual reasoning tasks. Independent experts' annotations of diagnostic conceptual units of thought in the transcribed verbal narratives are time-aligned with the discovered eye movement patterns to help interpret the patterns' meanings. By mapping eye movement patterns to thought units, we uncover the relationships between the visual and linguistic elements of the subjects' reasoning and perceptual processes, and show the manner in which these subjects varied their behaviors while parsing the images.
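
    A minimal sketch of the HMM building block, using the hmmlearn library, is shown below; the three hidden states, the (x, y, duration) fixation features, and the random data are assumptions, and the thesis's hierarchical extension across subjects is not reproduced.

        # Fit a Gaussian HMM to concatenated fixation sequences (a sketch).
        import numpy as np
        from hmmlearn import hmm

        # One row per fixation: screen x, screen y, duration; several scanpaths
        # are concatenated, with `lengths` marking where each sequence ends.
        seqs = [np.random.rand(n, 3) for n in (40, 55, 37)]
        X = np.concatenate(seqs)
        lengths = [len(s) for s in seqs]

        model = hmm.GaussianHMM(n_components=3, covariance_type="full", n_iter=100)
        model.fit(X, lengths)
        states = model.predict(seqs[0])   # most likely hidden state per fixation
        print(model.transmat_.round(2), states[:10])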

    Gaze-Contingent Computer Graphics

    Contemporary digital displays feature multi-million pixels at ever-increasing refresh rates. Reality, on the other hand, provides us with a view of the world that is continuous in space and time. The discrepancy between viewing the physical world and its sampled depiction on digital displays gives rise to perceptual quality degradations. By measuring or estimating where we look, gaze-contingent algorithms aim to exploit the way we visually perceive in order to remedy visible artifacts. This dissertation presents a variety of novel gaze-contingent algorithms and respective perceptual studies. Chapters 4 and 5 present methods to boost the perceived visual quality of conventional video footage when viewed on commodity monitors or projectors. In Chapter 6 a novel head-mounted display with real-time gaze tracking is described. The device enables a large variety of applications in the context of Virtual Reality and Augmented Reality. Using the gaze-tracking VR headset, a novel gaze-contingent rendering method is described in Chapter 7. The gaze-aware approach greatly reduces the computational effort of shading virtual worlds. The described methods and studies show that gaze-contingent algorithms are able to improve the quality of displayed images and videos, or to reduce the computational effort of image generation, without changing the display quality perceived by the user.
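
    As an illustration of the general idea behind such gaze-aware shading (not the dissertation's specific perceptual model), the sketch below maps per-pixel eccentricity to a coarser shading level via a linear acuity-falloff model; the pixels-per-degree value, the slope, and the level cutoffs are assumptions.

        # Foveated shading level from gaze eccentricity (a sketch).
        import numpy as np

        def shading_level(px, py, gaze, ppd=40.0):
            """Return 0 (full rate) .. 2 (coarsest) per pixel.
            ppd: display pixels per degree of visual angle (assumed)."""
            ecc_deg = np.hypot(px - gaze[0], py - gaze[1]) / ppd
            # Minimum angle of resolution grows roughly linearly with eccentricity.
            mar = 1.0 + 0.3 * ecc_deg                  # arcmin; slope is assumed
            return np.digitize(mar, bins=[2.0, 4.0])   # coarser where MAR is large

        yy, xx = np.mgrid[0:1080, 0:1920]
        levels = shading_level(xx, yy, gaze=(960, 540))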

    Applications of ray tracing to a pseudophakic eye model

    The calculation of IOL power using keratometry is adversely affected by recent corneal reshaping surgeries. This thesis investigates the application of ray tracing and general anterior corneal surface modeling for the purpose of improving ophthalmic measurements and, in particular, the estimation of IOL power. A new algorithm (based on a multi-step approach) for the recovery of the corneal height using videokeratography is presented. The method ensures a cubic recovery with continuous curvature; skew rays are treated in post-processing. The RMS height error is measured for three simulated corneas (two of them skewed). The total errors are 6.2 x 10⁻⁎ mm ignoring the skew ray error and 1.7 x 10⁻⁎ mm accounting for it. The individual height errors are submicron in the latter case. The algorithm gives average errors of 2.5 x 10⁻⁎ mm for a set of calibration balls. The completion time is 2.3 s over all cases, using a standard desktop PC. A new method for the recovery of the internal ocular radii of curvature is investigated. The method is used to recover the posterior corneal radii (PII) and the anterior lens radii (PIII) given several anterior cornea models (PI) in simulation. The recovered surface powers are no more than 0.1 D (PII) and 0.006 D (PIII) in error of the true surface powers. A framework is then presented for modeling the effect of lens decenter and tilt on perceived image quality. The SQRI image quality metric is determined for a range of lens tilt and decenter values, and compared with the statistical moments of the spot diagrams. The SQRI shows asymmetric degradation of imaging with tilt, for a particular decenter value, for a plane displaced -0.1 mm from best focus. For a plane displaced +0.1 mm from best focus, the SQRI is symmetric and improves regardless of the sign of tilt. The statistical moments suggest that skew does not necessarily imply poor imaging. Finally, the modeling methods developed are tested on two clinically measured eyes. Minimizing the spot size predicts the spectacle prescription to within 0.0 D (OS) and 0.1 D (OD) of the mean spherical equivalent. Adding the prescribed lenses to the model eye estimates best focus to within 0.03 mm and 0.02 mm of the retinal plane, consistent with the better than 6/6 VA measured for OS/OD. A VisTech VCTS 6500 contrast sensitivity chart is used to verify the eye model. A 75% match with theory is found for OS and a 50% match for OD.
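
    For context, the textbook thin-lens vergence formula for theoretical IOL power, a far simpler model than the ray tracing used in the thesis, can be sketched as follows; the refractive index and the example biometry values are assumptions.

        # Thin-lens vergence estimate of IOL power (a sketch, not the thesis's method).
        def iol_power(axial_len_mm, corneal_power_d, elp_mm, n=1.336):
            """n: assumed aqueous/vitreous refractive index; ELP: effective lens position."""
            al = axial_len_mm / 1000.0
            elp = elp_mm / 1000.0
            # Vergence required at the IOL plane minus vergence supplied by the cornea.
            return n / (al - elp) - n / (n / corneal_power_d - elp)

        print(f"{iol_power(23.5, 43.5, 5.25):.2f} D")  # ~20.7 D for an average eye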

    Applied Cognitive Sciences

    Cognitive science is an interdisciplinary field studying the mind and intelligence. The term cognition refers to a variety of mental processes, including perception, problem solving, learning, decision making, language use, and emotional experience. The cognitive sciences are grounded in the contributions of philosophy and computing to the study of cognition. Computing is central to the study of cognition because computational models help to formalize mental processes, and computers are used to test scientific hypotheses about mental organization and functioning. This book provides a platform for reviewing these disciplines and presenting cognitive research as a discipline in its own right.

    Change blindness: eradication of gestalt strategies

    Arrays of eight texture-defined rectangles were used as stimuli in a one-shot change blindness (CB) task in which there was a 50% chance that one rectangle would change orientation between two successive presentations separated by an interval. CB was eliminated by cueing the target rectangle in the first stimulus, reduced by cueing in the interval, and unaffected by cueing in the second presentation. This supports the idea that a representation was formed that persisted through the interval before being 'overwritten' by the second presentation (Landman et al., 2003, Vision Research 43, 149–164). Another possibility is that participants used some kind of grouping or Gestalt strategy. To test this we changed the spatial positions of the rectangles in the second presentation by shifting them along imaginary spokes (by ±1 degree) emanating from the central fixation point. There was no significant difference in performance between this and the standard task [F(1,4) = 2.565, p = 0.185]. This may suggest two things: (i) that Gestalt grouping is not used as a strategy in these tasks, and (ii) that objects may be stored in, and retrieved from, a pre-attentional store during this task, giving further weight to that argument.
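
    A minimal sketch of the spoke manipulation (an illustration of the geometry only, not the authors' stimulus code): each rectangle centre, expressed in degrees of visual angle relative to fixation, is pushed ±1° along the line from fixation through the rectangle.

        # Shift a stimulus radially along its "spoke" from fixation (a sketch).
        import numpy as np

        def shift_along_spoke(pos_deg, delta_deg=1.0):
            """pos_deg: (x, y) in degrees of visual angle relative to fixation;
            assumes the stimulus is not at fixation itself (r > 0)."""
            r = np.hypot(*pos_deg)
            unit = np.asarray(pos_deg) / r            # direction of the spoke
            sign = np.random.choice([-1.0, 1.0])      # ±1 deg, randomized per item
            return tuple(unit * (r + sign * delta_deg))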

    Emotional body language synthesis for humanoid robots

    Some of the chapters of this thesis are based on research published by the author. Chapter 4 is based on Marmpena, M., Lim, A., and Dahl, T. S. (2018). How does the robot feel? Perception of valence and arousal in emotional body language. Paladyn, Journal of Behavioral Robotics, 9(1), 168-182. DOI: https://doi.org/10.1515/pjbr-2018-0012. Chapter 6 is based on Marmpena, M., Lim, A., Dahl, T. S., and Hemion, N. (2019). Generating robotic emotional body language with Variational Autoencoders. In Proceedings of the 8th International Conference on Affective Computing and Intelligent Interaction (ACII), pages 545–551. DOI: 10.1109/ACII.2019.8925459. Chapter 7 extends Marmpena, M., Garcia, F., and Lim, A. (2020). Generating robotic emotional body language of targeted valence and arousal with Conditional Variational Autoencoders. In Companion of the 2020 ACM/IEEE International Conference on Human-Robot Interaction, HRI '20, pages 357–359. DOI: https://doi.org/10.1145/3371382.3378360. The designed or generated robotic emotional body language expression data presented in this thesis are publicly available at https://github.com/minamar/rebl-pepper-data.

    In the next decade, societies will witness a rise in service robots deployed in social environments, such as schools, homes, or shops, where they will operate as assistants, public relations agents, or companions. People are expected to willingly engage and collaborate with these robots to accomplish positive outcomes. To facilitate collaboration, robots need to comply with the behavioural and social norms used by humans in their daily interactions. One such behavioural norm is the expression of emotion through body language. Previous work on emotional body language synthesis for humanoid robots has mainly focused on hand-coded design methods, often employing features extracted from human body language. However, hand-coded design is cumbersome and results in a limited number of expressions with low variability. This limitation can come at the expense of user engagement, since the robotic behaviours will appear repetitive and predictable, especially in long-term interaction. Furthermore, design approaches strictly based on human emotional body language might not transfer effectively to robots because of their simpler morphology. Finally, most previous work uses six or fewer basic emotion categories in the design and evaluation of emotional expressions, which can result in lossy compression of the granularity of emotion expression. The current thesis presents a methodology for developing a complete framework of emotional body language generation for a humanoid robot, intending to address these three limitations. Our starting point is a small set of animations designed by professional animators with the robot morphology in mind. We conducted an initial user study to acquire reliable dimensional labels of valence and arousal for each animation. In the next step, we used the motion sequences from these animations to train a Variational Autoencoder, a deep learning model, to generate numerous new animations in an unsupervised setting. Finally, we extended the model to condition the generative process on valence and arousal attributes, and we conducted a user study to evaluate the interpretability of the animations in terms of valence, arousal, and dominance. The results indicate moderate to strong interpretability.
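
    The conditioning step can be illustrated with a compact PyTorch sketch of a conditional VAE over flattened joint-angle sequences; the sequence and latent dimensions, the network sizes, and the affect values are assumptions, not the thesis's architecture.

        # Conditional VAE conditioned on (valence, arousal) (a sketch).
        import torch
        import torch.nn as nn

        class CVAE(nn.Module):
            def __init__(self, seq_dim=17 * 30, cond_dim=2, latent_dim=8):
                super().__init__()  # seq_dim: assumed 17 joints x 30 frames, flattened
                self.enc = nn.Sequential(nn.Linear(seq_dim + cond_dim, 256), nn.ReLU())
                self.mu = nn.Linear(256, latent_dim)
                self.logvar = nn.Linear(256, latent_dim)
                self.dec = nn.Sequential(
                    nn.Linear(latent_dim + cond_dim, 256), nn.ReLU(),
                    nn.Linear(256, seq_dim),
                )

            def forward(self, x, c):
                h = self.enc(torch.cat([x, c], dim=-1))
                mu, logvar = self.mu(h), self.logvar(h)
                z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterize
                return self.dec(torch.cat([z, c], dim=-1)), mu, logvar

        def loss_fn(recon, x, mu, logvar):
            rec = nn.functional.mse_loss(recon, x, reduction="sum")
            kld = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
            return rec + kld

        # Sample a new motion for a target affect, e.g. high valence / mild calm:
        model = CVAE()
        z = torch.randn(1, 8)
        cond = torch.tensor([[0.8, -0.3]])  # (valence, arousal), assumed in [-1, 1]
        motion = model.dec(torch.cat([z, cond], dim=-1))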
    • 
