    On the Distribution of Salient Objects in Web Images and its Influence on Salient Object Detection

    It has become apparent that a Gaussian center bias can serve as an important prior for visual saliency detection, which has been demonstrated both for predicting human eye fixations and for salient object detection. Tseng et al. have shown that the photographer's tendency to place interesting objects in the center is a likely cause of the center bias of eye fixations. We investigate the influence of the photographer's center bias on salient object detection, extending our previous work. We show that the centroid locations of salient objects in photographs of Achanta and Liu's data set in fact correlate strongly with a Gaussian model. This is an important insight, because it provides an empirical motivation and justification for integrating such a center bias into salient object detection algorithms and helps to explain why Gaussian models are so effective. To assess the influence of the center bias on salient object detection, we integrate an explicit Gaussian center bias model into two state-of-the-art salient object detection algorithms. This way, first, we quantify the influence of the Gaussian center bias on pixel- and segment-based salient object detection. Second, we improve the performance in terms of F1 score, Fβ score, area under the recall-precision curve, area under the receiver operating characteristic curve, and hit rate on the well-known data set by Achanta and Liu. Third, by debiasing Cheng et al.'s region contrast model, we demonstrate by example that implicit center biases are partially responsible for the outstanding performance of state-of-the-art algorithms. Last but not least, as a result of debiasing Cheng et al.'s algorithm, we introduce a non-biased salient object detection method, which is of interest for applications in which the image data is unlikely to have a photographer's center bias (e.g., image data from surveillance cameras or autonomous robots).
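    To make the integration concrete, here is a minimal Python/NumPy sketch of a multiplicative Gaussian center-bias combination. The function names, the relative sigma, and the multiplicative form are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def gaussian_center_bias(height, width, sigma=0.3):
    """Gaussian prior centered on the image; sigma is relative to image size."""
    ys = (np.arange(height) - (height - 1) / 2.0) / height
    xs = (np.arange(width) - (width - 1) / 2.0) / width
    yy, xx = np.meshgrid(ys, xs, indexing="ij")
    return np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))

def apply_center_bias(saliency_map, sigma=0.3):
    """Multiplicatively combine a saliency map with the center-bias prior."""
    bias = gaussian_center_bias(*saliency_map.shape, sigma=sigma)
    biased = saliency_map * bias
    return biased / (biased.max() + 1e-12)  # renormalize to [0, 1]
```

    Dividing a model's output by such a prior, rather than multiplying, would be the corresponding debiasing step in the spirit of the paper's treatment of Cheng et al.'s region contrast model.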

    Domain general learning: Infants use social and non-social cues when learning object statistics.

    Previous research has shown that infants can learn from social cues. But is a social cue more effective at directing learning than a non-social cue? This study investigated whether 9-month-old infants (N = 55) could learn a visual statistical regularity in the presence of a distracting visual sequence when attention was directed by either a social cue (a person) or a non-social cue (a rectangle). The results show that both social and non-social cues can guide infants' attention to a visual shape sequence (and away from a distracting sequence). The social cue directed attention more effectively than the non-social cue during the familiarization phase, but it did not result in significantly stronger learning. The findings suggest that domain-general attention mechanisms allow for the comparable learning seen in both conditions.

    Multimodal Computational Attention for Scene Understanding

    Robotic systems have limited computational capacities. Computational attention models are therefore important for focusing processing on specific stimuli and allowing for complex cognitive processing. For this purpose, we developed auditory and visual attention models that enable robotic platforms to efficiently explore and analyze natural scenes. To allow for attention guidance in human-robot interaction, we use machine learning to integrate the influence of verbal and non-verbal social signals into our models.
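    As a rough sketch of how such multimodal integration could be structured, assuming the auditory saliency has been projected into the same coordinate frame as the visual map, and with fixed weights standing in for the influences the authors learn from data:

```python
import numpy as np

def fuse_modalities(visual_sal, auditory_sal, w_visual=0.5, w_auditory=0.5,
                    social_map=None, w_social=0.0):
    """Weighted fusion of visual and auditory saliency maps; an optional
    map derived from social signals (e.g., pointing or gaze) can shift
    attention toward cued regions."""
    fused = w_visual * visual_sal + w_auditory * auditory_sal
    if social_map is not None:
        fused = (1.0 - w_social) * fused + w_social * social_map
    return fused / (fused.max() + 1e-12)  # normalize for comparability
```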

    Gaze control for visually guided manipulation

    Human studies have shown that gaze shifts are mostly driven by the task. One explanation is that fixations gather information about task-relevant properties, where task relevance is signalled by reward. This thesis primarily pursues an engineering-science goal: to determine what mechanisms a rational decision maker could employ to select a gaze location optimally, or near optimally, given limited information and limited computation time. To do so, we formulate and characterise three computational models of gaze shifting (implemented on a simulated humanoid robot), which use lookahead to imagine the informational effects of possible gaze fixations. Our first model selects the gaze that most reduces uncertainty in the scene (Unc), the second maximises expected rewards by reducing uncertainty (Rew+Unc), and the third maximises the expected gain in cumulative reward by reducing uncertainty (Rew+Unc+Gain). We also present an integrated account of a visual search process within the Rew+Unc+Gain gaze scheme. Our secondary goal concerns the way in which humans might select the next gaze location. We compare the hand-eye coordination timings of our models to previously published human data, and we provide evidence that only the models that incorporate both uncertainty and reward (Rew+Unc and Rew+Unc+Gain) match the human data.
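    The following toy Python sketch illustrates the Unc-style selection rule with one-step lookahead over a discrete belief about target location. The binary detection observation model and all names here are simplifying assumptions, not the thesis's actual scene model.

```python
import numpy as np

def entropy(p):
    """Shannon entropy of a discrete distribution."""
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

def expected_entropy(belief, fixation, noise=0.1):
    """One-step lookahead: expected posterior entropy after fixating one cell.
    belief: 1-D float array over discretized locations, summing to 1.
    Toy observation model: the target is detected with prob 1 - noise
    when fixated, and falsely 'detected' with prob noise elsewhere."""
    h = 0.0
    for detected in (True, False):
        lik = np.full_like(belief, noise if detected else 1.0 - noise)
        lik[fixation] = (1.0 - noise) if detected else noise
        joint = lik * belief          # P(observation, target location)
        p_obs = joint.sum()           # marginal probability of the observation
        if p_obs > 0:
            h += p_obs * entropy(joint / p_obs)
    return h

def select_fixation_unc(belief, noise=0.1):
    """Unc rule: pick the fixation whose imagined observation most
    reduces expected uncertainty about the scene."""
    return min(range(belief.size),
               key=lambda f: expected_entropy(belief, f, noise))
```

    Rew+Unc and Rew+Unc+Gain would score candidate fixations by expected (cumulative) task reward rather than by entropy reduction alone.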

    Expressive social exchange between humans and robots

    Thesis (Sc.D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2000. Includes bibliographical references (p. 253-264). Sociable humanoid robots are natural and intuitive for people to communicate with and to teach. We present recent advances in building an autonomous humanoid robot, Kismet, that can engage humans in expressive social interaction. We outline a set of design issues and a framework that we have found to be of particular importance for sociable robots. Having a human-in-the-loop places significant social constraints on how the robot aesthetically appears, how its sensors are configured, its quality of movement, and its behavior. Inspired by infant social development, psychology, ethology, and evolutionary perspectives, this work integrates theories and concepts from these diverse viewpoints to enable Kismet to enter into natural and intuitive social interaction with a human caregiver, reminiscent of parent-infant exchanges. Kismet perceives a variety of natural social cues from visual and auditory channels, and delivers social signals to people through gaze direction, facial expression, body posture, and vocalizations. We present the implementation of Kismet's social competencies and evaluate each with respect to: 1) the ability of naive subjects to read and interpret the robot's social cues, 2) the robot's ability to perceive and appropriately respond to naturally offered social cues, 3) the robot's ability to elicit interaction scenarios that afford rich learning potential, and 4) how this produces a rich, flexible, dynamic interaction that is physical, affective, and social. Numerous studies with naive human subjects are described that provide the data upon which we base our evaluations. By Cynthia L. Breazeal, Sc.D.

    A Focus on Selection for Fixation

    A computational explanation of how visual attention, interpretation of visual stimuli, and eye movements combine to produce visual behavior seems elusive. Here, we focus on one component: how selection is accomplished for the next fixation. The popularity of saliency map models drives the inference that this is solved, but we argue otherwise. We provide arguments that a cluster of complementary conspicuity representations drives selection, modulated by task goals and history, leading to a hybrid process that encompasses early and late attentional selection. This design is also constrained by the architectural characteristics of the visual processing pathways. These elements combine into a new strategy for computing fixation targets, and a first simulation of its performance is presented.
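    A minimal sketch of the general shape of such a selection strategy, assuming precomputed conspicuity maps and an inhibition-of-return mask; the names and the linear combination are illustrative, not the paper's exact mechanism:

```python
import numpy as np

def next_fixation(conspicuity_maps, task_weights, ior_mask):
    """Combine a cluster of complementary conspicuity maps with task-set
    weights, suppress recently fixated locations (inhibition of return),
    and return the peak as the next fixation target."""
    combined = sum(w * m for w, m in zip(task_weights, conspicuity_maps))
    modulated = combined * ior_mask  # ior_mask in [0, 1], low where just fixated
    return np.unravel_index(int(np.argmax(modulated)), modulated.shape)
```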

    Evaluating the replicability and specificity of evidence for natural pedagogy theory

    Do infants understand that they are being communicated to? This thesis first outlines issues facing the field of infancy research that affect confidence in the literature on this (and any) topic to date. Following this, an introductory chapter evaluates the evidence for the three core claims of Natural Pedagogy (NP) and the compatibility of this evidence with alternative theories. This is followed by three experimental chapters. In Study 1, we attempted two replications of the study with the highest theoretical value for NP (Yoon et al., 2008). That study carries high theoretical stakes, as it is the only one providing evidence for NP's most specific claim, which is difficult to explain by low-level mechanisms. A replication of this result that reduced possible confounds and used a more sophisticated measure of attention throughout the task was therefore of great theoretical value. We were unable to replicate the original findings. In Study 2, we went beyond the evidence for the claims made in the outline of NP and instead generated a new, specific prediction that we believe NP would make. This is important, as theories are only useful if they can make clear, testable predictions. We pitted pedagogically demonstrated actions and simple actions against each other and evaluated infants’ transmission of these actions to someone else. We found no evidence for NP, finding evidence for preferential transmission of simple actions instead. In Study 3, we went beyond NP and tested a clear prediction stemming from an alternative low-level theory of how infants develop gaze-following ability. We found evidence that infants learn to gaze-follow through reinforcement. Overall, this thesis contributes to the vast literature on infants as recipients of communication, as well as highlighting methods for conducting open and reproducible infancy research.
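    Study 3's reinforcement account can be illustrated with a toy simulation (all parameter values and names hypothetical, not the thesis's model): if following the adult's gaze is rewarded more often than ignoring it, a simple incremental value update makes following increasingly likely.

```python
import math
import random

def simulate_gaze_following(trials=500, p_reward_follow=0.8,
                            p_reward_ignore=0.2, lr=0.05, temp=0.2):
    """Toy reinforcement account: 'follow' pays off more often than 'ignore'
    (something interesting usually appears where the adult looks), so its
    learned value, and the rate of following, rises over trials."""
    value = {"follow": 0.0, "ignore": 0.0}
    follow_count = 0
    for _ in range(trials):
        # softmax choice between the two actions
        exps = {a: math.exp(v / temp) for a, v in value.items()}
        p_follow = exps["follow"] / sum(exps.values())
        action = "follow" if random.random() < p_follow else "ignore"
        p = p_reward_follow if action == "follow" else p_reward_ignore
        reward = 1.0 if random.random() < p else 0.0
        value[action] += lr * (reward - value[action])  # incremental update
        follow_count += action == "follow"
    return follow_count / trials  # proportion of trials the cue was followed
```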