54,915 research outputs found

    Toward emergent representations for video

    Full text link

    EC^2: Emergent Communication for Embodied Control

    Full text link
    Embodied control requires agents to leverage multi-modal pre-training to quickly learn how to act in new environments, where video demonstrations contain visual and motion details needed for low-level perception and control, and language instructions support generalization with abstract, symbolic structures. While recent approaches apply contrastive learning to force alignment between the two modalities, we hypothesize that better modeling of their complementary differences can lead to more holistic representations for downstream adaptation. To this end, we propose Emergent Communication for Embodied Control (EC^2), a novel scheme to pre-train video-language representations for few-shot embodied control. The key idea is to learn an unsupervised "language" of videos via emergent communication, which bridges the semantics of video details and the structures of natural language. We learn embodied representations of video trajectories, emergent language, and natural language using a language model, which is then used to finetune a lightweight policy network for downstream control. Through extensive experiments on the Metaworld and Franka Kitchen embodied benchmarks, EC^2 is shown to consistently outperform previous contrastive learning methods with both videos and texts as task inputs. Further ablations confirm the importance of the emergent language, which is beneficial for both video and language learning and significantly superior to using pre-trained video captions. We also present a quantitative and qualitative analysis of the emergent language and discuss future directions toward better understanding and leveraging emergent communication in embodied tasks. Comment: Published in CVPR 2023.
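
    To make the emergent-communication idea concrete, the following minimal Python sketch trains a "speaker" to encode video features into discrete tokens (the emergent "language") and a "listener" to pick the matching video from distractors -- the standard referential-game setup. This is an illustrative assumption, not the authors' implementation; all module names, sizes, and the Gumbel-softmax choice are hypothetical.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class Speaker(nn.Module):
        """Encodes a video feature vector into a discrete token message."""
        def __init__(self, feat_dim=512, vocab=64, msg_len=8):
            super().__init__()
            self.proj = nn.Linear(feat_dim, msg_len * vocab)
            self.msg_len, self.vocab = msg_len, vocab

        def forward(self, video_feat):
            logits = self.proj(video_feat).view(-1, self.msg_len, self.vocab)
            # Gumbel-softmax keeps the discrete message differentiable.
            return F.gumbel_softmax(logits, tau=1.0, hard=True)

    class Listener(nn.Module):
        """Scores candidate videos against a received message."""
        def __init__(self, feat_dim=512, vocab=64, msg_len=8):
            super().__init__()
            self.embed = nn.Linear(msg_len * vocab, feat_dim)

        def forward(self, message, candidates):
            query = self.embed(message.flatten(1))         # (B, D)
            return torch.einsum('bd,bkd->bk', query, candidates)

    speaker, listener = Speaker(), Listener()
    video = torch.randn(4, 512)             # target video features
    cands = torch.randn(4, 5, 512)          # 5 candidates per sample
    cands[:, 0] = video                     # place the target at index 0
    scores = listener(speaker(video), cands)
    # Referential-game loss: the listener must identify the target video.
    loss = F.cross_entropy(scores, torch.zeros(4, dtype=torch.long))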

    Design Fiction Diegetic Prototyping: A Research Framework for Visualizing Service Innovations

    Get PDF
    Purpose: This paper presents a design fiction diegetic prototyping methodology and research framework for investigating service innovations that reflect future uses of new and emerging technologies. Design/methodology/approach: Drawing on speculative fiction, we propose a methodology that positions service innovations within a six-stage research development framework. We begin by reviewing and critiquing designerly approaches that have traditionally been associated with service innovations and futures literature. In presenting our framework, we provide an example of its application to the Internet of Things (IoT), illustrating the central tenets proposed and the key issues identified. Findings: The research framework advances a methodology for visualizing future experiential service innovations, considering how realism may be integrated into a designerly approach. Research limitations/implications: Design fiction diegetic prototyping enables researchers to express a range of ‘what if’ or ‘what can it be’ research questions within service innovation contexts. However, the process encompasses degrees of subjectivity and relies on knowledge, judgment and projection. Practical implications: The paper presents an approach to devising future service scenarios that incorporate new and emergent technologies in service contexts. The proposed framework may be used as part of a range of research designs, including qualitative, quantitative and mixed-method investigations. Originality: Operationalizing an approach that generates and visualizes service futures from an experiential perspective contributes to the advancement of techniques that enable the exploration of new possibilities for service innovation research.

    Emerging Linguistic Functions in Early Infancy

    Get PDF
    This paper presents results from experimental studies on early language acquisition in infants and attempts to interpret the experimental results within the framework of the Ecological Theory of Language Acquisition (ETLA) recently proposed by Lacerda et al. (2004a). From this perspective, the infant’s first steps in the acquisition of the ambient language are seen as a consequence of the infant’s general capacity to represent sensory input and the infant’s interaction with other actors in its immediate ecological environment. On the basis of the available experimental evidence, it will be argued that ETLA offers a productive alternative to traditional descriptive views of the language acquisition process by presenting an operative model of how early linguistic function may emerge through interaction.

    Supporting reinterpretation in computer-aided conceptual design

    Get PDF
    This paper presents research that aims to inform the development of computational tools that better support design exploration and idea transformation - key objectives in conceptual design. Analyses of experimental data from two fields - product design and architecture - suggest that the interactions of designers with their sketches can be formalised according to a finite number of generalised shape rules defined within a shape grammar. Such rules can provide a basis for generating alternative design concepts, and they have informed the development of a prototype shape synthesis system that supports the dynamic reinterpretation of shapes during design activity. The notion of 'sub-shapes' is introduced, and their significance for perception, recognition and the development of emergent structures is discussed. The paper concludes with some speculation on how such a system might find application in a range of design fields.
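
    As a concrete illustration of a shape-grammar rewrite, the toy Python sketch below represents a shape as a set of line segments and applies a rule that replaces a matched sub-shape with new segments. The representation and the example rule are hypothetical simplifications, not the prototype system described in the paper.

    def apply_rule(shape, lhs, rhs):
        """If lhs is a sub-shape of shape, substitute rhs for it."""
        if lhs <= shape:                    # subset test: lhs matches
            return (shape - lhs) | rhs
        return shape

    # A unit square as a set of line segments ((x1, y1), (x2, y2)).
    square = {((0, 0), (1, 0)), ((1, 0), (1, 1)),
              ((1, 1), (0, 1)), ((0, 1), (0, 0))}
    lhs = {((0, 0), (1, 0))}                # rule matches the bottom edge...
    rhs = {((0, 0), (0.5, -0.5)), ((0.5, -0.5), (1, 0))}  # ...and notches it
    print(apply_rule(square, lhs, rhs))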

    Deep Visual Foresight for Planning Robot Motion

    Full text link
    A key challenge in scaling up robot learning to many skills and environments is removing the need for human supervision, so that robots can collect their own data and improve their own performance without being limited by the cost of requesting human feedback. Model-based reinforcement learning holds the promise of enabling an agent to learn to predict the effects of its actions, which could provide flexible predictive models for a wide range of tasks and environments without detailed human supervision. We develop a method for combining deep action-conditioned video prediction models with model-predictive control that uses entirely unlabeled training data. Our approach requires neither a calibrated camera, an instrumented training set-up, nor precise sensing and actuation. Our results show that our method enables a real robot to perform nonprehensile manipulation -- pushing objects -- and can handle novel objects not seen during training. Comment: ICRA 2017. Supplementary video: https://sites.google.com/site/robotforesight
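
    As a rough illustration of how an action-conditioned prediction model can drive model-predictive control, the Python sketch below samples random action sequences, rolls each through a learned model, and executes the first action of the lowest-cost sequence. The model, cost function, and dimensions are placeholder assumptions, not the paper's video-prediction architecture.

    import numpy as np

    def rollout(state, actions, model):
        """Roll the learned dynamics model forward over an action sequence."""
        for a in actions:
            state = model(state, a)
        return state

    def mpc_step(state, goal, model, horizon=5, n_samples=100, act_dim=2):
        """Random-shooting MPC: sample sequences, keep the cheapest one."""
        seqs = np.random.uniform(-1, 1, size=(n_samples, horizon, act_dim))
        costs = [np.linalg.norm(rollout(state, seq, model) - goal)
                 for seq in seqs]
        return seqs[int(np.argmin(costs))][0]  # execute only the first action

    # Toy usage: a linear "model" stands in for the video-prediction network.
    model = lambda s, a: s + 0.1 * a
    state, goal = np.zeros(2), np.ones(2)
    for _ in range(50):
        state = model(state, mpc_step(state, goal, model))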
