
    Multi-Modal Human-Machine Communication for Instructing Robot Grasping Tasks

    A major challenge for the realization of intelligent robots is to supply them with cognitive abilities that allow ordinary users to program them easily and intuitively. One way of such programming is teaching work tasks by interactive demonstration. To make this effective and convenient for the user, the machine must be capable of establishing a common focus of attention and be able to use and integrate spoken instructions, visual perceptions, and non-verbal cues like gestural commands. We report progress in building a hybrid architecture that combines statistical methods, neural networks, and finite state machines into an integrated system for instructing grasping tasks by man-machine interaction. The system combines the GRAVIS robot for visual attention and gestural instruction with an intelligent interface for speech recognition and linguistic interpretation, and a modality fusion module to allow multi-modal task-oriented man-machine communication with respect to dextrous robot manipulation of objects.
    Comment: 7 pages, 8 figures
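    The finite-state component of such a hybrid architecture can be pictured with a minimal sketch. The states, event names, and transitions below are hypothetical illustrations of fusing speech and gesture events into a grasp command, not the actual GRAVIS design:

```python
# Minimal sketch of a finite state machine fusing speech and gesture
# events into a grasp instruction. States, events, and transitions are
# hypothetical placeholders, not the actual GRAVIS architecture.

class FusionFSM:
    # (current state, event) -> next state
    TRANSITIONS = {
        ("IDLE", "speech:object_named"): "AWAIT_POINTING",
        ("AWAIT_POINTING", "gesture:pointing"): "TARGET_LOCATED",
        ("TARGET_LOCATED", "speech:confirm"): "GRASPING",
        ("GRASPING", "robot:grasp_done"): "IDLE",
    }

    def __init__(self):
        self.state = "IDLE"

    def step(self, event):
        """Advance on a recognized event; ignore events with no transition."""
        self.state = self.TRANSITIONS.get((self.state, event), self.state)
        return self.state


fsm = FusionFSM()
for ev in ["speech:object_named", "gesture:pointing", "speech:confirm"]:
    fsm.step(ev)
print(fsm.state)  # GRASPING
```

    The point of the FSM layer is that each modality (speech recognizer, gesture detector) only has to emit discrete events; the fusion logic stays inspectable and deterministic.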

    Presenting the networked home: a content analysis of promotion material of Ambient Intelligence applications

    Ambient Intelligence (AmI) for the home uses information and communication technologies to make users’ everyday life more comfortable. AmI is still in its developmental phase and is headed towards the first stages of diffusion. Characteristics of AmI design can be observed, among others, in the promotion material of initial producers. A literature study revealed that the original vision of AmI placed the user in a central role, emphasized the convenience AmI offers, and called for attention to critical policy issues such as privacy and a potential loss of freedom. A content analysis of current promotion material of several high-tech companies revealed that these original ideas are not all reflected in the material. The attributes used most in the promotion material were ‘connectedness’, ‘control’, ‘easiness’ and ‘personalization’. An analysis of the pictures in the promotion material showed that almost half of the pictures contained no humans, only appliances. These results only partly correspond to the original vision of AmI, since the emphasis is now on technology. The results represent a serious problem, since both users and critical policy issues are underexposed in the current promotion material.

    Speech-Gesture Mapping and Engagement Evaluation in Human Robot Interaction

    A robot needs contextual awareness, effective speech production, and complementary non-verbal gestures for successful communication in society. In this paper, we present our end-to-end system that tries to enhance the effectiveness of non-verbal gestures. To achieve this, we identified prominently used gestures in performances by TED speakers, mapped them to their corresponding speech context, and modulated speech based on the attention of the listener. The proposed method utilized Convolutional Pose Machine [4] to detect human gestures. Dominant gestures of TED speakers were used to learn the gesture-to-speech mapping, and their speeches were used to train the model. We also evaluated the engagement of the robot with people by conducting a social survey. The robot monitored the effectiveness of the performance and self-improvised its speech pattern based on the attention level of the audience, which was calculated using visual feedback from the camera. The effectiveness of the interaction, as well as the decisions made during improvisation, was further evaluated based on head-pose detection and an interaction survey.
    Comment: 8 pages, 9 figures, under review in IRC 201
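    The attention-driven improvisation loop can be sketched roughly as follows. The keyword-to-gesture table, attention threshold, and speech-rate values below are made-up placeholders for illustration, not the learned TED-speaker mapping from the paper:

```python
# Hypothetical sketch of attention-driven speech/gesture planning.
# The keyword-to-gesture table, attention threshold, and rate values
# are illustrative placeholders, not the paper's learned mapping.

GESTURE_MAP = {
    "you": "point_at_audience",
    "big": "wide_arms",
    "here": "point_down",
}

def plan_utterance(text, attention, threshold=0.5):
    """Attach gestures to known keywords and modulate speech rate by
    the audience attention level (0.0 = distracted, 1.0 = engaged)."""
    gestures = [(w, GESTURE_MAP[w]) for w in text.lower().split() if w in GESTURE_MAP]
    # Slow down to re-engage the audience when attention drops below threshold
    rate = 1.0 if attention >= threshold else 0.8
    return {"gestures": gestures, "speech_rate": rate}

plan = plan_utterance("A big idea for you", attention=0.3)
print(plan["speech_rate"])  # 0.8
print(plan["gestures"])     # [('big', 'wide_arms'), ('you', 'point_at_audience')]
```

    In the actual system the attention signal would come from the camera-based head-pose estimate rather than a constant, and the gesture mapping would be learned rather than hand-written.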

    Hyperpresent avatars

    This paper discusses two student projects developed during a hybrid art/design and computer science course at Sabancı University, both of which involve the creation of avatars whose visual attributes are determined by data feeds from ‘Real Life’ sources. Following up on Biocca's concept of the Cyborg’s Dilemma, we describe the creative and technological processes that went into the materialization of these two avatars.

    Semi-Supervised Speech Emotion Recognition with Ladder Networks

    Speech emotion recognition (SER) systems find applications in various fields such as healthcare, education, and security and defense. A major drawback of these systems is their lack of generalization across different conditions. This problem can be solved by training models on large amounts of labeled data from the target domain, which is expensive and time-consuming. Another approach is to increase the generalization of the models. An effective way to achieve this goal is to regularize the models through multitask learning (MTL), where auxiliary tasks are learned along with the primary task. These methods often require labeled data for the auxiliary tasks (gender, speaker identity, age, or other emotional descriptors), which is expensive to collect for emotion recognition. This study proposes the use of ladder networks for emotion recognition, which utilize an unsupervised auxiliary task. The primary task is a regression problem to predict emotional attributes. The auxiliary task is the reconstruction of intermediate feature representations using a denoising autoencoder. This auxiliary task does not require labels, so it is possible to train the framework in a semi-supervised fashion with abundant unlabeled data from the target domain. This study shows that the proposed approach creates a powerful framework for SER, achieving superior performance to fully supervised single-task learning (STL) and MTL baselines. The approach is implemented with several acoustic features, showing that ladder networks generalize significantly better in cross-corpus settings. Compared to the STL baselines, the proposed approach achieves relative gains in concordance correlation coefficient (CCC) between 3.0% and 3.5% for within-corpus evaluations, and between 16.1% and 74.1% for cross-corpus evaluations, highlighting the power of the architecture.
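    The evaluation metric mentioned above, the concordance correlation coefficient, has a standard closed form: CCC = 2σ_xy / (σ_x² + σ_y² + (μ_x − μ_y)²), penalizing both poor correlation and systematic offset between predictions and labels. A small self-contained implementation (for illustration only, not the authors' code) looks like:

```python
# Concordance correlation coefficient (CCC), the agreement metric
# reported in the abstract. Standalone illustration, not the authors' code.

def ccc(x, y):
    """CCC = 2*cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))**2),
    using population (biased) variance and covariance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return 2 * cov / (vx + vy + (mx - my) ** 2)

print(ccc([1, 2, 3], [1, 2, 3]))  # 1.0  (perfect agreement)
print(ccc([1, 2, 3], [3, 2, 1]))  # -1.0 (perfect reversal)
```

    Unlike Pearson correlation, CCC drops below 1 for predictions that are correlated but shifted, e.g. `ccc([1, 2, 3], [2, 3, 4])` gives 4/7 rather than 1, which is why it is preferred for emotional-attribute regression.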