47 research outputs found

    Saliency prediction in 360° architectural scenes:Performance and impact of daylight variations

    Get PDF
    Saliency models are image-based prediction models that estimate human visual attention. Such models, when applied to architectural spaces, could pave the way for design decisions where visual attention is taken into account. In this study, we tested the performance of eleven commonly used saliency models that combine traditional and deep learning methods on 126 rendered interior scenes with associated head tracking data. The data was extracted from three experiments conducted in virtual reality between 2016 and 2018. Two of these datasets pertain to the perceptual effects of daylight and include variations of daylighting conditions for a limited set of interior spaces, thereby allowing to test the influence of light conditions on human head movement. Ground truth maps were extracted from the collected head tracking logs, and the prediction accuracy of the models was tested via the correlation coefficient between ground truth and prediction maps. To address the possible inflation of results due to the equator bias, we conducted complementary analyses by restricting the area of investigation to the equatorial image regions. Although limited to immersive virtual environments, the promising performance of some traditional models such as GBVS360eq and BMS360eq for colored and textured architectural rendered spaces offers us the prospect of their possible integration into design tools. We also observed a strong correlation in head movements for the same space lit by different types of sky, a finding whose generalization requires further investigations based on datasets more specifically developed to address this question.</p

    Saliency prediction in 360° architectural scenes: Performance and impact of daylight variations

    Get PDF
    Saliency models are image-based prediction models that estimate human visual attention. Such models, when applied to architectural spaces, could pave the way for design decisions where visual attention is taken into account. In this study, we tested the performance of eleven commonly used saliency models that combine traditional and deep learning methods on 126 rendered interior scenes with associated head tracking data. The data was extracted from three experiments conducted in virtual reality between 2016 and 2018. Two of these datasets pertain to the perceptual effects of daylight and include variations of daylighting conditions for a limited set of interior spaces, thereby allowing to test the influence of light conditions on human head movement. Ground truth maps were extracted from the collected head tracking logs, and the prediction accuracy of the models was tested via the correlation coefficient between ground truth and prediction maps. To address the possible inflation of results due to the equator bias, we conducted complementary analyses by restricting the area of investigation to the equatorial image regions. Although limited to immersive virtual environments, the promising performance of some traditional models such as GBVS360eq and BMS360eq for colored and textured architectural rendered spaces offers us the prospect of their possible integration into design tools. We also observed a strong correlation in head movements for the same space lit by different types of sky, a finding whose generalization requires further investigations based on datasets more specifically developed to address this question

    Saliency prediction in 360° architectural scenes:Performance and impact of daylight variations

    Get PDF
    Saliency models are image-based prediction models that estimate human visual attention. Such models, when applied to architectural spaces, could pave the way for design decisions where visual attention is taken into account. In this study, we tested the performance of eleven commonly used saliency models that combine traditional and deep learning methods on 126 rendered interior scenes with associated head tracking data. The data was extracted from three experiments conducted in virtual reality between 2016 and 2018. Two of these datasets pertain to the perceptual effects of daylight and include variations of daylighting conditions for a limited set of interior spaces, thereby allowing to test the influence of light conditions on human head movement. Ground truth maps were extracted from the collected head tracking logs, and the prediction accuracy of the models was tested via the correlation coefficient between ground truth and prediction maps. To address the possible inflation of results due to the equator bias, we conducted complementary analyses by restricting the area of investigation to the equatorial image regions. Although limited to immersive virtual environments, the promising performance of some traditional models such as GBVS360eq and BMS360eq for colored and textured architectural rendered spaces offers us the prospect of their possible integration into design tools. We also observed a strong correlation in head movements for the same space lit by different types of sky, a finding whose generalization requires further investigations based on datasets more specifically developed to address this question.</p

    Teaching Unknown Objects by Leveraging Human Gaze and Augmented Reality in Human-Robot Interaction

    Get PDF
    Roboter finden aufgrund ihrer außergewöhnlichen Arbeitsleistung, Präzision, Effizienz und Skalierbarkeit immer mehr Verwendung in den verschiedensten Anwendungsbereichen. Diese Entwicklung wurde zusätzlich begünstigt durch Fortschritte in der Künstlichen Intelligenz (KI), insbesondere im Maschinellem Lernen (ML). Mit Hilfe moderner neuronaler Netze sind Roboter in der Lage, Objekte in ihrer Umgebung zu erkennen und mit ihnen zu interagieren. Ein erhebliches Manko besteht jedoch darin, dass das Training dieser Objekterkennungsmodelle, in aller Regel mit einer zugrundeliegenden Abhängig von umfangreichen Datensätzen und der Verfügbarkeit großer Datenmengen einhergeht. Dies ist insbesondere dann problematisch, wenn der konkrete Einsatzort des Roboters und die Umgebung, einschließlich der darin befindlichen Objekte, nicht im Voraus bekannt sind. Die breite und ständig wachsende Palette von Objekten macht es dabei praktisch unmöglich, das gesamte Spektrum an existierenden Objekten allein mit bereits zuvor erstellten Datensätzen vollständig abzudecken. Das Ziel dieser Dissertation war es, einem Roboter unbekannte Objekte mit Hilfe von Human-Robot Interaction (HRI) beizubringen, um ihn von seiner Abhängigkeit von Daten sowie den Einschränkungen durch vordefinierte Szenarien zu befreien. Die Synergie von Eye Tracking und Augmented Reality (AR) ermöglichte es dem als Lehrer fungierenden Menschen, mit dem Roboter zu kommunizieren und ihn mittels des menschlichen Blickes auf Objekte hinzuweisen. Dieser holistische Ansatz ermöglichte die Konzeption eines multimodalen HRI-Systems, durch das der Roboter Objekte identifizieren und dreidimensional segmentieren konnte, obwohl sie ihm zu diesem Zeitpunkt noch unbekannt waren, um sie anschließend aus unterschiedlichen Blickwinkeln eigenständig zu inspizieren. Anhand der Klasseninformationen, die ihm der Mensch mitteilte, war der Roboter daraufhin in der Lage, die entsprechenden Objekte zu erlernen und später wiederzuerkennen. Mit dem Wissen, das dem Roboter durch diesen auf HRI basierenden Lehrvorgang beigebracht worden war, war dessen Fähigkeit Objekte zu erkennen vergleichbar mit den Fähigkeiten modernster Objektdetektoren, die auf umfangreichen Datensätzen trainiert worden waren. Dabei war der Roboter jedoch nicht auf vordefinierte Klassen beschränkt, was seine Vielseitigkeit und Anpassungsfähigkeit unter Beweis stellte. Die im Rahmen dieser Dissertation durchgeführte Forschung leistete bedeutende Beiträge an der Schnittstelle von Machine Learning (ML), AR, Eye Tracking und Robotik. Diese Erkenntnisse tragen nicht nur zum besseren Verständnis der genannten Felder bei, sondern ebnen auch den Weg für weitere interdisziplinäre Forschung. Die in dieser Dissertation enthalten wissenschaftlichen Artikel wurden auf hochrangigen Konferenzen in den Bereichen Robotik, Eye Tracking und HRI veröffentlicht.Robots are becoming increasingly popular in a wide range of environments due to their exceptional work capacity, precision, efficiency, and scalability. This development has been further encouraged by advances in Artificial Intelligence (AI), particularly Machine Learning (ML). By employing sophisticated neural networks, robots are given the ability to detect and interact with objects in their vicinity. However, a significant drawback arises from the underlying dependency on extensive datasets and the availability of substantial amounts of training data for these object detection models. This issue becomes particularly problematic when the specific deployment location of the robot and the surroundings, including the objects within it, are not known in advance. The vast and ever-expanding array of objects makes it virtually impossible to comprehensively cover the entire spectrum of existing objects using preexisting datasets alone. The goal of this dissertation was to teach a robot unknown objects in the context of Human-Robot Interaction (HRI) in order to liberate it from its data dependency, unleashing it from predefined scenarios. In this context, the combination of eye tracking and Augmented Reality (AR) created a powerful synergy that empowered the human teacher to seamlessly communicate with the robot and effortlessly point out objects by means of human gaze. This holistic approach led to the development of a multimodal HRI system that enabled the robot to identify and visually segment the Objects of Interest (OOIs) in three-dimensional space, even though they were initially unknown to it, and then examine them autonomously from different angles. Through the class information provided by the human, the robot was able to learn the objects and redetect them at a later stage. Due to the knowledge gained from this HRI based teaching process, the robot’s object detection capabilities exhibited comparable performance to state-of-the-art object detectors trained on extensive datasets, without being restricted to predefined classes, showcasing its versatility and adaptability. The research conducted within the scope of this dissertation made significant contributions at the intersection of ML, AR, eye tracking, and robotics. These findings not only enhance the understanding of these fields, but also pave the way for further interdisciplinary research. The scientific articles included in this dissertation have been published at high-impact conferences in the fields of robotics, eye tracking, and HRI

    Deep Networks Based Energy Models for Object Recognition from Multimodality Images

    Get PDF
    Object recognition has been extensively investigated in computer vision area, since it is a fundamental and essential technique in many important applications, such as robotics, auto-driving, automated manufacturing, and security surveillance. According to the selection criteria, object recognition mechanisms can be broadly categorized into object proposal and classification, eye fixation prediction and saliency object detection. Object proposal tends to capture all potential objects from natural images, and then classify them into predefined groups for image description and interpretation. For a given natural image, human perception is normally attracted to the most visually important regions/objects. Therefore, eye fixation prediction attempts to localize some interesting points or small regions according to human visual system (HVS). Based on these interesting points and small regions, saliency object detection algorithms propagate the important extracted information to achieve a refined segmentation of the whole salient objects. In addition to natural images, object recognition also plays a critical role in clinical practice. The informative insights of anatomy and function of human body obtained from multimodality biomedical images such as magnetic resonance imaging (MRI), transrectal ultrasound (TRUS), computed tomography (CT) and positron emission tomography (PET) facilitate the precision medicine. Automated object recognition from biomedical images empowers the non-invasive diagnosis and treatments via automated tissue segmentation, tumor detection and cancer staging. The conventional recognition methods normally utilize handcrafted features (such as oriented gradients, curvature, Haar features, Haralick texture features, Laws energy features, etc.) depending on the image modalities and object characteristics. It is challenging to have a general model for object recognition. Superior to handcrafted features, deep neural networks (DNN) can extract self-adaptive features corresponding with specific task, hence can be employed for general object recognition models. These DNN-features are adjusted semantically and cognitively by over tens of millions parameters corresponding to the mechanism of human brain, therefore leads to more accurate and robust results. Motivated by it, in this thesis, we proposed DNN-based energy models to recognize object on multimodality images. For the aim of object recognition, the major contributions of this thesis can be summarized below: 1. We firstly proposed a new comprehensive autoencoder model to recognize the position and shape of prostate from magnetic resonance images. Different from the most autoencoder-based methods, we focused on positive samples to train the model in which the extracted features all come from prostate. After that, an image energy minimization scheme was applied to further improve the recognition accuracy. The proposed model was compared with three classic classifiers (i.e. support vector machine with radial basis function kernel, random forest, and naive Bayes), and demonstrated significant superiority for prostate recognition on magnetic resonance images. We further extended the proposed autoencoder model for saliency object detection on natural images, and the experimental validation proved the accurate and robust saliency object detection results of our model. 2. A general multi-contexts combined deep neural networks (MCDN) model was then proposed for object recognition from natural images and biomedical images. Under one uniform framework, our model was performed in multi-scale manner. Our model was applied for saliency object detection from natural images as well as prostate recognition from magnetic resonance images. Our experimental validation demonstrated that the proposed model was competitive to current state-of-the-art methods. 3. We designed a novel saliency image energy to finely segment salient objects on basis of our MCDN model. The region priors were taken into account in the energy function to avoid trivial errors. Our method outperformed state-of-the-art algorithms on five benchmarking datasets. In the experiments, we also demonstrated that our proposed saliency image energy can boost the results of other conventional saliency detection methods

    A Conceptual Model of Exploration Wayfinding: An Integrated Theoretical Framework and Computational Methodology

    Get PDF
    This thesis is an attempt to integrate contending cognitive approaches to modeling wayfinding behavior. The primary goal is to create a plausible model for exploration tasks within indoor environments. This conceptual model can be extended for practical applications in the design, planning, and Social sciences. Using empirical evidence a cognitive schema is designed that accounts for perceptual and behavioral preferences in pedestrian navigation. Using this created schema, as a guiding framework, the use of network analysis and space syntax act as a computational methods to simulate human exploration wayfinding in unfamiliar indoor environments. The conceptual model provided is then implemented in two ways. First of which is by updating an existing agent-based modeling software directly. The second means of deploying the model is using a spatial interaction model that distributed visual attraction and movement permeability across a graph-representation of building floor plans

    Multimodal Computational Attention for Scene Understanding

    Get PDF
    Robotic systems have limited computational capacities. Hence, computational attention models are important to focus on specific stimuli and allow for complex cognitive processing. For this purpose, we developed auditory and visual attention models that enable robotic platforms to efficiently explore and analyze natural scenes. To allow for attention guidance in human-robot interaction, we use machine learning to integrate the influence of verbal and non-verbal social signals into our models

    A deep-learning based feature hybrid framework for spatiotemporal saliency detection inside videos

    Get PDF
    Although research on detection of saliency and visual attention has been active over recent years, most of the existing work focuses on still image rather than video based saliency. In this paper, a deep learning based hybrid spatiotemporal saliency feature extraction framework is proposed for saliency detection from video footages. The deep learning model is used for the extraction of high-level features from raw video data, and they are then integrated with other high-level features. The deep learning network has been found extremely effective for extracting hidden features than that of conventional handcrafted methodology. The effectiveness for using hybrid high-level features for saliency detection in video is demonstrated in this work. Rather than using only one static image, the proposed deep learning model take several consecutive frames as input and both the spatial and temporal characteristics are considered when computing saliency maps. The efficacy of the proposed hybrid feature framework is evaluated by five databases with human gaze complex scenes. Experimental results show that the proposed model outperforms five other state-of-the-art video saliency detection approaches. In addition, the proposed framework is found useful for other video content based applications such as video highlights. As a result, a large movie clip dataset together with labeled video highlights is generated

    Materialising data experience through textile thinking

    Get PDF
    In our digitally enabled lives, we are constantly entering into relationships and interacting with data. With little regard for their technical ability, consumers are obliged to accept and live with data experience. Digital literacy and information technology skills help to navigate these technologies, but little is known about the intimate practices of interacting with data from digital systems. The aim of this practice-based research is to identify how knowledge and experience of physical materials can offer novel processes and value to progress communications regarding digital use. This study draws on theory and practice from textile design and sets out to position textiles as a research discipline. Embodied methods are used to explore human relationships with textiles and materials to create physical representations which can be used to generate and share insights from people’s varied engagements with technology and data. Findings are presented which are valuable to the fields of both textile design and the field of data physicalisation. The methods employed engage the human body as a research tool using the senses to explore and create meaningful experiences with technology. Material handling, modelmaking, workshops and sensory ethnography, all captured on film, facilitate an embodied approach to explore the experience of data. Through this approach, alternative readings of everyday technology emerge. The theoretical contribution of this thesis is the paradigm of textile thinking as a research methodology. Textile thinking refers to the actions and mindset of textile designers. In this study the tacit knowledge employed by textile designers is presented as a challenge for reporting on design activity and is responded to through the use of embodied methods for engaging materials in research. Textile thinking approaches are embodied in practical experiments which invited people to express how they engage emotionally in relationships with technology and data. The field of textile design is interrogated to identify the unique characteristics of the discipline which are valuable research tools. Practical research shows that physical data representations enhanced through material choice can be used as opportunities for engagement to examine how people connect emotionally with information. The findings showed that this methodology increased the likelihood of sharing insights into the use of digital products and could be used to elicit emotional responses to technology and data experience. The types of responses included childhood memories, sensations, individual insights into the comfort of technology and engagement with information. This broader understanding shows how this approach can be used by designers, to stimulate an interest in using data and to improve engagement with the digital world through design. The outcomes offer new references and broader perspectives for textile design as a research discipline, supporting a paradigm shift to explore and accept new ways of approaching research and developing design theory
    corecore