
    Towards an Embodied Developing Vision System

    Many cognitive scientists now agree that artificial cognition can probably be achieved developmentally, starting from a set of basic, immature capabilities and incrementally self-extending through discrete or continuous stages of experience. Although we are still far from seeing a full-fledged self-extending artificial cognitive system, the literature provides promising examples and demonstrations. Nonetheless, little thought has been given to modeling how an artificial vision system, an important part of a developing cognitive system, can develop itself in a similar manner. In this article, we examine the issue of a developing vision system, discussing the relevant problems and proposing solutions where possible.

    Learning affordances for categorizing objects and their properties

    In this paper, we demonstrate that simple interactions with objects in the environment lead to a manifestation of the perceptual properties of objects. This is achieved by deriving a condensed representation of the effects of actions (called effect prototypes in the paper) and investigating the relevance between the perceptual features extracted from objects and the actions that can be applied to them. With this in hand, we show that the agent can categorize (i.e., partition) the raw sensory perceptual feature vector it extracts from the environment, which is an important step towards the development of concepts and language. Moreover, after learning how to predict the effect prototypes of objects, the agent can categorize objects based on the predicted effects of the actions that can be applied to them.
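
    As a rough illustration of the idea, the sketch below clusters effect vectors (the difference between post- and pre-interaction features) into a few condensed prototypes. The feature dimensions, the number of prototypes and the use of k-means are our own assumptions for the sketch, not the paper's exact method.

        # Hedged sketch of effect-prototype learning; shapes, the number of
        # prototypes and the k-means clusterer are illustrative assumptions.
        import numpy as np
        from sklearn.cluster import KMeans

        def effect_prototypes(pre_feats, post_feats, n_prototypes=4):
            """Cluster effect vectors (post - pre) and return the cluster
            centers as condensed 'effect prototypes'."""
            effects = post_feats - pre_feats  # one effect vector per interaction
            km = KMeans(n_clusters=n_prototypes, n_init=10).fit(effects)
            return km.cluster_centers_, km.labels_

        # Toy data: 100 interactions described by 16-dimensional features.
        rng = np.random.default_rng(0)
        pre = rng.normal(size=(100, 16))
        post = pre + rng.normal(size=(100, 16))
        prototypes, labels = effect_prototypes(pre, post)
        print(prototypes.shape, np.bincount(labels))

    Objects whose interactions map to the same prototype can then be grouped into the same perceptual category, mirroring the categorization step described above.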

    Unsupervised Learning of Affordance Relations on a Humanoid Robot

    In this paper, we study how the concepts learned by a robot can be linked to the verbal concepts that humans use in language. Specifically, we develop a simple tapping behaviour on the iCub humanoid robot simulator and allow the robot to interact with a set of objects of different types and sizes to learn affordance relations in its environment. The robot records its perception, obtained from a range camera, as a feature vector before and after tapping an object, and we compute effect features by subtracting the initial features from the final features. We cluster the effect features using Kohonen self-organizing maps to generate a set of effect categories in an unsupervised fashion. We analyze the clusters using the types and sizes of the objects that fall into each effect cluster, as well as the success/fail labels manually attached to the interactions, and find that the hand labellings match the clusters formed by the robot. We conjecture that the robot and humans therefore share the same "effect concepts", which could be used in human-robot communication, for example as verbs. Furthermore, we use the ReliefF feature extraction method to determine the initial features that are related to the clustered effects and train a multi-class support vector machine (SVM) classifier to learn the mapping between the relevant initial features and the effect categories. The results show that: (1) despite the lack of supervision, the effect clusters tend to be homogeneous in terms of success/fail; (2) the relevant features consist mainly of shape, not size; (3) the number of relevant features remains approximately constant with respect to the number of effect clusters formed; and (4) the SVM classifier can successfully learn the effect categories from the relevant features.
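
    The pipeline above can be sketched roughly as follows. MiniSom stands in for the Kohonen self-organizing map and a univariate F-test selector stands in for ReliefF (which is not part of scikit-learn); all data shapes, counts and parameters are placeholders rather than the paper's settings.

        # Minimal sketch of the described pipeline, with stand-in components:
        # MiniSom for the Kohonen SOM, SelectKBest(f_classif) for ReliefF.
        import numpy as np
        from minisom import MiniSom
        from sklearn.feature_selection import SelectKBest, f_classif
        from sklearn.svm import SVC

        rng = np.random.default_rng(1)
        initial = rng.normal(size=(200, 32))          # features before tapping
        final = initial + rng.normal(size=(200, 32))  # features after tapping
        effects = final - initial

        # 1) Cluster effect features with a 2x2 SOM -> 4 effect categories.
        som = MiniSom(2, 2, effects.shape[1], random_seed=1)
        som.train_random(effects, num_iteration=500)
        categories = np.array([2 * som.winner(e)[0] + som.winner(e)[1]
                               for e in effects])

        # 2) Select the initial features most relevant to the categories.
        selector = SelectKBest(f_classif, k=8).fit(initial, categories)
        relevant = selector.transform(initial)

        # 3) Learn the mapping from relevant initial features to categories.
        clf = SVC().fit(relevant, categories)
        print("training accuracy:", clf.score(relevant, categories))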

    Detecting image communities

    In this work, we propose a novel community detection method that is specifically designed for image communities, where we define an image community as a coherent subgroup of images within a large set of images. To detect image communities, we construct an image graph using the visual affinity between each image pair and then prune most of the links. Instead of raw affinity values, we use the ranking of neighboring images, which eliminates the range mismatch between affinity values. The resulting directed graph is processed by the proposed deterministic method to detect the image communities. The proposed method is compared against state-of-the-art community detection methods that can operate on directed graphs, using various sets of images whose ground truths were determined manually. The results indicate that our method significantly outperforms the compared state-of-the-art methods and, unlike them, performs consistently across image sets. We believe that the proposed community detection method can be successfully utilized in many different applications.
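
    A sketch of the graph-construction step is given below: each image keeps directed links only to its top-k most similar neighbours, so the surviving links depend on neighbour rank rather than on raw affinity values. The cosine affinity, the value of k and the toy descriptors are assumptions, and the community detection step itself is omitted.

        # Hedged sketch: build a directed graph from neighbour rankings,
        # pruning all but the top-k links per image. The affinity measure
        # and k are illustrative choices, not the paper's exact parameters.
        import numpy as np
        import networkx as nx
        from sklearn.metrics.pairwise import cosine_similarity

        def ranked_knn_graph(features, k=5):
            affinity = cosine_similarity(features)
            np.fill_diagonal(affinity, -np.inf)      # forbid self-links
            graph = nx.DiGraph()
            for i, row in enumerate(affinity):
                for j in np.argsort(row)[::-1][:k]:  # top-k neighbours by rank
                    graph.add_edge(i, int(j))
            return graph

        feats = np.random.default_rng(2).normal(size=(50, 128))  # toy descriptors
        g = ranked_knn_graph(feats)
        print(g.number_of_nodes(), g.number_of_edges())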

    Multimodal concept detection in broadcast media: KavTan

    Concept detection is an important problem for efficient indexing and retrieval in large video archives. In this work, we present the KavTan system, which performs high-level semantic classification in one of the largest TV archives of Turkey. In this system, concept detection is performed by generalized visual and audio concept detection modules that are supported by video text detection, audio keyword spotting and specialized audio-visual semantic detection components. The performance of the framework was assessed objectively over a wide range of semantic concepts (5 high-level, 14 visual, 9 audio, 2 supplementary) using a significant amount of precisely labeled ground-truth data. The KavTan system achieves successful high-level concept detection in unconstrained TV broadcasts by efficiently exploiting multimodal information that is systematically extracted from both the spatial and temporal extents of the multimedia data.
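
    The abstract does not specify how KavTan combines its modality-specific detectors, so the weighted late-fusion sketch below is purely illustrative; the function name, the concept scores and the weights are all assumptions.

        # Illustrative late fusion of per-concept detector scores in [0, 1];
        # the weights and score dictionaries are made-up placeholders.
        def fuse_scores(visual, audio, text, weights=(0.5, 0.3, 0.2)):
            concepts = set(visual) | set(audio) | set(text)
            return {c: weights[0] * visual.get(c, 0.0)
                       + weights[1] * audio.get(c, 0.0)
                       + weights[2] * text.get(c, 0.0)
                    for c in concepts}

        print(fuse_scores({"crowd": 0.8},
                          {"crowd": 0.4, "applause": 0.9},
                          {"crowd": 0.6}))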