8 research outputs found

    On the Distribution of Salient Objects in Web Images and its Influence on Salient Object Detection

    Get PDF
    It has become apparent that a Gaussian center bias can serve as an important prior for visual saliency detection, which has been demonstrated for predicting human eye fixations and salient object detection. Tseng et al. have shown that the photographer's tendency to place interesting objects in the center is a likely cause of the center bias of eye fixations. We investigate the influence of the photographer's center bias on salient object detection, extending our previous work. We show that the centroid locations of salient objects in photographs of Achanta and Liu's data set in fact correlate strongly with a Gaussian model. This is an important insight, because it provides an empirical motivation and justification for integrating such a center bias into salient object detection algorithms and helps to explain why Gaussian models are so effective. To assess the influence of the center bias on salient object detection, we integrate an explicit Gaussian center bias model into two state-of-the-art salient object detection algorithms. This way, first, we quantify the influence of the Gaussian center bias on pixel- and segment-based salient object detection. Second, we improve the performance in terms of F1 score, Fβ score, area under the recall-precision curve, area under the receiver operating characteristic curve, and hit rate on the well-known data set by Achanta and Liu. Third, by debiasing Cheng et al.'s region contrast model, we demonstrate by example that implicit center biases are partially responsible for the outstanding performance of state-of-the-art algorithms. Finally, as a result of debiasing Cheng et al.'s algorithm, we introduce a non-biased salient object detection method, which is of interest for applications in which the image data is unlikely to have a photographer's center bias (e.g., image data from surveillance cameras or autonomous robots).
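The explicit Gaussian center-bias model described above can be sketched as follows; this is a minimal illustration assuming an isotropic Gaussian centered on the image and a simple convex combination with the saliency map (the function names, the sigma value, and the 0.5 blending weight are illustrative choices, not the paper's parameters).

```python
import numpy as np

def center_bias_map(h, w, sigma=0.3):
    """Gaussian center-bias prior over image coordinates.

    sigma is given as a fraction of the image diagonal; this value is an
    illustrative choice, not taken from the paper.
    """
    ys, xs = np.mgrid[0:h, 0:w]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    d2 = (ys - cy) ** 2 + (xs - cx) ** 2
    s = sigma * np.hypot(h, w)
    return np.exp(-d2 / (2.0 * s ** 2))

def apply_center_bias(saliency, weight=0.5):
    """Blend a saliency map with the Gaussian prior (convex combination)."""
    prior = center_bias_map(*saliency.shape)
    out = (1.0 - weight) * saliency + weight * prior
    return out / out.max()
```

Debiasing an algorithm with an implicit center preference would go in the opposite direction, e.g. dividing (rather than blending) a model's output by such a prior.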

    Multimodal Computational Attention for Scene Understanding

    Robotic systems have limited computational capacities. Computational attention models are therefore important: they focus processing on specific stimuli and thereby make complex cognitive processing feasible. For this purpose, we developed auditory and visual attention models that enable robotic platforms to efficiently explore and analyze natural scenes. To allow for attention guidance in human-robot interaction, we use machine learning to integrate the influence of verbal and non-verbal social signals into our models.

    Computational Models of Perceptual Organization and Bottom-up Attention in Visual and Audio-Visual Environments

    Figure-Ground Organization (FGO) - inferring the spatial depth ordering of objects in a visual scene - involves determining which side of an occlusion boundary (OB) is figure (closer to the observer) and which is ground (further away from the observer). Attention, the process by which only part of the sensory information is selected for further analysis based on behavioral relevance, can be exogenous, driven by stimulus properties such as an abrupt sound or a bright flash and processed purely bottom-up, or endogenous (goal-driven or voluntary), where top-down factors such as familiarity and aesthetic quality determine attentional selection. The two main objectives of this thesis are developing computational models of (i) FGO in visual environments and (ii) bottom-up attention in audio-visual environments. In the visual domain, we first identify Spectral Anisotropy (SA), characterized by an anisotropic distribution of oriented high-frequency spectral power on the figure side and its absence on the ground side, as a novel FGO cue that can determine Figure/Ground (FG) relations at an OB with an accuracy exceeding 60%. Next, we show that a non-linear Support Vector Machine classifier trained on the SA features achieves an accuracy close to 70% in determining FG relations, the highest for a stand-alone local cue. We then show that SA can be computed in a biologically plausible manner by pooling complex cell responses at different scales in a specific orientation, which also achieves an accuracy greater than or equal to 60% in determining FG relations. Next, we present a biologically motivated, feed-forward model of FGO incorporating convexity, surroundedness, and parallelism as global cues and SA and T-junctions as local cues, where SA is computed in a biologically plausible manner. Each local cue, when added alone, gives a statistically significant improvement in the model's performance.
The model with both local cues achieves higher accuracy than the models with either cue alone in determining FG relations, indicating that SA and T-junctions are not mutually contradictory. Compared to the model with no local cues, the model with both local cues achieves an improvement of at least 8.78% in determining FG relations at every border location of images in the BSDS dataset. In the audio-visual domain, we first build a simple computational model to explain how visual search can be aided by providing concurrent, co-spatial auditory cues. Our model shows that adding a co-spatial, concurrent auditory cue can enhance the saliency of a weakly visible target among prominent visual distractors, the behavioral effect of which could be a faster reaction time and/or better search accuracy. Lastly, a bottom-up, feed-forward, proto-object-based audiovisual saliency map (AVSM) for the analysis of dynamic natural scenes is presented. We demonstrate that the performance of the proto-object-based AVSM in detecting and localizing salient objects/events agrees with human judgment. In addition, we show that the AVSM, computed as a linear combination of visual and auditory feature conspicuity maps, captures a higher number of valid salient events than unisensory saliency maps.
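The linear combination of unisensory conspicuity maps mentioned above can be sketched as follows; this is a minimal illustration assuming both maps are defined over the same spatial grid, with equal weights (the function name, per-map normalization, and weights are illustrative assumptions, not the thesis's exact formulation).

```python
import numpy as np

def audiovisual_saliency(visual, auditory, w_v=0.5, w_a=0.5):
    """Audiovisual saliency as a linear combination of conspicuity maps.

    `visual` and `auditory` are unisensory conspicuity maps over one
    spatial grid; each is rescaled to [0, 1] before weighting.
    """
    def norm(m):
        m = m - m.min()
        return m / m.max() if m.max() > 0 else m
    return w_v * norm(visual) + w_a * norm(auditory)
```

In this toy form, a co-spatial auditory cue raises a weak visual target above a stronger purely visual distractor, mirroring the behavioral effect described in the abstract.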

    Multi-modal and multi-camera attention in smart environments

    No full text
    This paper considers the problem of multi-modal saliency and attention. Saliency is a cue that is often used for directing the attention of a computer vision system, e.g., in smart environments or on robots. Unlike the majority of recent publications on visual/audio saliency, we aim at a well-grounded integration of several modalities. The proposed framework is based on fuzzy aggregations and offers a flexible, plausible, and efficient way of combining multi-modal saliency information. Besides incorporating different modalities, we extend classical 2D saliency maps to multi-camera and multi-modal 3D saliency spaces. For experimental validation, we realized the proposed system within a smart environment. The evaluation took place in a demanding setup under real-life conditions, including focus-of-attention selection for multiple subjects and concurrently active modalities.
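One plausible reading of the fuzzy-aggregation idea is an ordered weighted averaging (OWA) operator over per-modality saliency maps; the sketch below is illustrative only (the paper's exact operator, weights, and function names are assumptions here, chosen to show the general mechanism).

```python
import numpy as np

def fuzzy_aggregate(maps):
    """Aggregate per-modality saliency maps (values in [0, 1]) with an
    ordered weighted averaging (OWA) operator, one common family of fuzzy
    aggregations; weights decay by rank, leaning toward a fuzzy OR (max).
    """
    stack = np.stack(maps)       # (modalities, H, W)
    stack.sort(axis=0)           # ascending per location
    stack = stack[::-1]          # descending: strongest modality first
    weights = 0.5 ** np.arange(1, len(maps) + 1)
    weights = weights / weights.sum()
    # weighted sum over the rank axis -> aggregated (H, W) map
    return np.tensordot(weights, stack, axes=1)
```

Because the weights act on rank-ordered values rather than on fixed modalities, the operator interpolates between max (all weight on rank 1) and mean (uniform weights), which is one way such a framework can stay flexible across modality combinations.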

    Video-Based Gesture Recognition in an Intelligent Environment

    This dissertation presents the design of a touchless, user-independent visual classification of arm gestures based on their spatio-temporal motion patterns, using methods from computer vision, pattern recognition, and machine learning. The application scenario is an intelligent conference room equipped with several off-the-shelf cameras. This scenario is particularly challenging for three reasons. First, for interaction that is as intuitive as possible, recognition must work independently of the user's position and orientation in the room, which largely rules out simplifying assumptions about the relative positions of user and camera. Second, a realistic indoor scenario is considered in which the environmental conditions can change abruptly and the cameras' viewpoints differ widely; this requires adaptive methods that can quickly adjust to such changes or are robust against them over a wide range. Third, the use of an unsynchronized multi-camera system is a novelty: during the 3D reconstruction of hypotheses from different camera images, particular care must be taken to handle the resulting temporal offset. This also affects the classification task, because corresponding inaccuracies must be expected in the reconstructed 3D trajectories. An important criterion for the acceptance of a gesture-based human-machine interface is its responsiveness; the design therefore pays particular attention to the efficient realizability of the chosen methods. In particular, a parallel processing architecture is realized in which the individual camera data streams are processed separately and the partial results are then combined.
As part of the dissertation, the complete image processing pipeline was implemented as a prototype. It comprises, among other steps, person detection, person tracking, hand detection, 3D reconstruction of the hypotheses, and classification of the spatio-temporal gesture trajectories with semi-continuous hidden Markov models (HMMs). The implemented methods are evaluated extensively on realistic, demanding data sets, achieving very good results for both person and hand detection. Gesture classification reaches classification rates of almost 90% for nine different gestures.
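The final classification step, scoring gesture trajectories with hidden Markov models and picking the best-scoring gesture class, can be illustrated with the standard scaled forward algorithm; the sketch below uses discrete observation symbols for brevity, a simplification of the semi-continuous HMMs used in the thesis, and all names and toy parameters are illustrative assumptions.

```python
import numpy as np

def forward_loglik(obs, pi, A, B):
    """Log-likelihood of a discrete observation sequence under an HMM.

    pi: (N,) initial state probs, A: (N, N) transitions, B: (N, M) emissions.
    Scaled forward algorithm (rescaling alpha at each step avoids underflow
    on long trajectories).
    """
    alpha = pi * B[:, obs[0]]
    ll = np.log(alpha.sum())
    alpha /= alpha.sum()
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
        ll += np.log(alpha.sum())
        alpha /= alpha.sum()
    return ll

def classify(obs, models):
    """Pick the gesture whose HMM assigns the sequence the highest likelihood."""
    return max(models, key=lambda g: forward_loglik(obs, *models[g]))
```

In the semi-continuous case, the discrete emission matrix B would be replaced by state-specific mixture weights over a codebook of Gaussian densities shared by all models; the forward recursion itself is unchanged.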

    Organic User Interfaces for Interactive Interior Design

    PhD Thesis. Organic User Interfaces (OUIs) are flexible, actuated, digital interfaces characterized by being aesthetically pleasing, physically manipulated, and ubiquitously embedded within real-world environments. I postulate that OUIs have specific qualities that offer great potential to realize the vision of smart spaces and ubiquitous computing environments. This thesis makes the case for embedding OUI interaction into architectural spaces, interior elements and decorative artefacts using smart materials - a concept I term 'OUI Interiors'. Through this thesis, I investigate: 1) What interactive materials and making techniques can be used to design and build OUIs? 2) What OUI decorative artefacts and interior elements can we create? and 3) What can we learn for design by situating OUI interiors? These key research questions form the basis of this PhD and guide all stages of inquiry, analysis, and reporting. Grounded in the state of the art of Interactive Interiors in both research and practice, I developed new techniques for seamlessly embedding smart materials into interior finishing materials via research-through-design exploration (in the form of a Swatchbook). I also prototyped a number of interactive decorative objects that change shape and colour as a form of organic actuation, in response to seamless soft sensing (presented in a Product Catalogue). These inspirational artefacts include table runners, wall art, pattern-changing wall tiles, a furry throw, a vase, a cushion and matching painting, a rug, objets d'art, and a tasselled curtain. Moreover, my situated studies of how people interact idiosyncratically with interactive decorative objects provide insights and reflections on the overall material experience. Through multi-disciplinary collaboration, I have also put these materials in the hands of designers to realize the potentials and limitations of such a paradigm and to design three interactive spaces.
The results of my research are materialized in a tangible outcome (a Manifesto) that explores the design opportunities of OUI Interior Design and critically considers new aesthetic possibilities.