9,360 research outputs found

    A Voice and Pointing Gesture Interaction System for Supporting Human Spontaneous Decisions in Autonomous Cars

    Get PDF
    Autonomous cars are expected to improve road safety, traffic flow and mobility. It is projected that fully autonomous vehicles will reach the market within the next 20-30 years. Advances in the research and development of this technology will allow humans to disengage from the driving task, which will become the responsibility of the vehicle intelligence. In this scenario, new vehicle interior designs are proposed that enable more flexible human-vehicle interaction. In addition, as some important stakeholders propose, control elements such as the steering wheel and the accelerator and brake pedals may no longer be needed. However, this disengagement from control is one of the main issues affecting user acceptance of the technology: users do not seem comfortable with the idea of giving all decision power to the vehicle. There can also be location-aware situations in which the user makes a spontaneous decision and requires some type of vehicle control, such as stopping at a particular point of interest or taking a detour from the car's pre-calculated autonomous route. Vehicle manufacturers maintain the steering wheel as a control element, allowing the driver to take over the vehicle if needed or wanted, but this constrains the human-vehicle interaction flexibility mentioned above. There is thus an unsolved dilemma between giving users enough control over the autonomous vehicle and its route to make spontaneous decisions, and preserving interaction flexibility inside the car. This dissertation proposes a voice and pointing gesture human-vehicle interaction system to solve this dilemma. Voice and pointing gestures have been identified as natural interaction techniques for guiding and commanding mobile robots, potentially providing the needed user control over the car; at the same time, they can be executed anywhere inside the vehicle, enabling interaction flexibility.
The objective of this dissertation is to provide a strategy to support this system. To this end, a method based on pointing-ray intersections is developed to compute the point of interest (POI) that the user is pointing to. Simulation results show that this POI computation method outperforms the traditional ray-casting-based approach by 76.5% in cluttered environments and by 36.25% in combined cluttered and non-cluttered scenarios. The whole system is developed and demonstrated using a robotics simulator framework. The simulations show how voice and pointing commands performed by the user update the predefined autonomous path, based on the recognized command semantics. In addition, a dialog feedback strategy is proposed to resolve conflicting situations such as ambiguity in POI identification. This additional step is able to resolve all the previously mentioned POI computation inaccuracies, and it allows the user to confirm, correct or reject the performed commands in case the system misunderstands them.
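The pointing-ray intersection idea can be illustrated with a small geometric sketch: given several (possibly noisy) pointing rays in 3-D, the point minimizing the summed squared distance to all rays has a closed-form least-squares solution. This is a generic formulation for illustration only, not the dissertation's actual POI algorithm; the example rays below are invented.

```python
import numpy as np

def least_squares_ray_intersection(origins, directions):
    """Return the point closest, in the least-squares sense, to a set of
    3-D rays given by their origins and direction vectors."""
    A = np.zeros((3, 3))
    b = np.zeros(3)
    for o, d in zip(origins, directions):
        d = d / np.linalg.norm(d)          # normalize direction
        P = np.eye(3) - np.outer(d, d)     # projector onto plane normal to d
        A += P                             # accumulate normal equations
        b += P @ o
    return np.linalg.solve(A, b)

# Two rays that both pass exactly through (1, 1, 0):
origins = [np.array([0.0, 0.0, 0.0]), np.array([2.0, 0.0, 0.0])]
directions = [np.array([1.0, 1.0, 0.0]), np.array([-1.0, 1.0, 0.0])]
poi = least_squares_ray_intersection(origins, directions)  # -> [1, 1, 0]
```

With noisy rays (e.g. pointing gestures captured over several frames), the same solve returns the point of minimum total perpendicular distance rather than an exact intersection, which is why multi-ray methods can beat single-ray casting in cluttered scenes.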

    CHORUS Deliverable 2.1: State of the Art on Multimedia Search Engines

    Get PDF
    Based on information provided by European projects and national initiatives related to multimedia search, as well as by domain experts who participated in the CHORUS think-tanks and workshops, this document reports on the state of the art in multimedia content search from technical and socio-economic perspectives. The technical perspective includes an up-to-date view of content-based indexing and retrieval technologies, multimedia search in the context of mobile devices and peer-to-peer networks, and an overview of current evaluation and benchmark initiatives that measure the performance of multimedia search engines. From a socio-economic perspective, we inventory the impact and legal consequences of these technical advances and point out future directions of research.

    Speaker-independent emotion recognition exploiting a psychologically-inspired binary cascade classification schema

    No full text
    In this paper, a psychologically-inspired binary cascade classification schema is proposed for speech emotion recognition. Performance is enhanced because commonly confused pairs of emotions become distinguishable from one another. Extracted features are related to statistics of pitch, formants and energy contours, as well as spectrum, cepstrum, perceptual and temporal features, autocorrelation, MPEG-7 descriptors, Fujisaki's model parameters, voice quality, jitter and shimmer. Selected features are fed as input to a k-nearest neighbor classifier and to support vector machines. Two kernels are tested for the latter: linear and Gaussian radial basis function. The recently proposed speaker-independent experimental protocol is tested on the Berlin emotional speech database for each gender separately. The best emotion recognition accuracy, achieved by support vector machines with a linear kernel, equals 87.7%, outperforming state-of-the-art approaches. Statistical analysis is carried out first with respect to the classifiers' error rates and then to evaluate the information expressed by the classifiers' confusion matrices. © Springer Science+Business Media, LLC 2011
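A binary cascade of this kind can be sketched as a tree of two-class SVMs: a first linear SVM splits the emotions into two broad groups (here, by arousal), and a second-stage SVM refines each group into individual emotions. The arousal grouping, the synthetic 2-D "features" and the class labels below are illustrative assumptions, not the paper's acoustic features or experimental protocol.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-ins for acoustic feature vectors (e.g. pitch/energy
# statistics): each emotion is a well-separated Gaussian blob in 2-D.
def blob(center, n=50):
    return rng.normal(center, 0.3, size=(n, 2))

X = np.vstack([blob([0, 3]), blob([3, 3]), blob([0, 0]), blob([3, 0])])
y = np.array(["anger"] * 50 + ["happiness"] * 50 +
             ["sadness"] * 50 + ["boredom"] * 50)

high_arousal = {"anger", "happiness"}

# Stage 1: binary arousal split (high vs. low).
stage1 = SVC(kernel="linear").fit(X, [lbl in high_arousal for lbl in y])

# Stage 2: one binary SVM per arousal group.
hi_mask = np.isin(y, list(high_arousal))
stage2_hi = SVC(kernel="linear").fit(X[hi_mask], y[hi_mask])
stage2_lo = SVC(kernel="linear").fit(X[~hi_mask], y[~hi_mask])

def predict(x):
    """Route a sample down the cascade and return its emotion label."""
    x = np.asarray(x).reshape(1, -1)
    branch = stage2_hi if stage1.predict(x)[0] else stage2_lo
    return branch.predict(x)[0]
```

The appeal of the cascade is that each stage only has to solve an easier binary problem, so emotion pairs that a flat multi-class classifier tends to confuse are never pitted directly against each other in the same decision.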

    CHiME-6 Challenge: Tackling Multispeaker Speech Recognition for Unsegmented Recordings

    Get PDF
    Following the success of the 1st, 2nd, 3rd, 4th and 5th CHiME challenges, we organize the 6th CHiME Speech Separation and Recognition Challenge (CHiME-6). The new challenge revisits the previous CHiME-5 challenge and further considers the problem of distant multi-microphone conversational speech diarization and recognition in everyday home environments. The speech material is the same as the previous CHiME-5 recordings, except for accurate array synchronization. The material was elicited using a dinner-party scenario, with efforts taken to capture data representative of natural conversational speech. This paper provides a baseline description of the CHiME-6 challenge for both segmented multispeaker speech recognition (Track 1) and unsegmented multispeaker speech recognition (Track 2). Of note, Track 2 is the first challenge activity in the community to tackle an unsegmented multispeaker speech recognition scenario with a complete set of reproducible open-source baselines providing speech enhancement, speaker diarization and speech recognition modules.

    Multimodal MP3 Jukebox

    Get PDF
    This report, prepared for the Worcester Polytechnic Institute, describes the design, testing and analysis of a voice- and button-controlled MP3 jukebox for automotive applications. The project, sponsored by the Bose Corporation, incorporated modern software components and complete interaction features. An inexpensive, high-function driving simulation system was developed and used in conjunction with the Peripheral Detection Task (PDT) to measure driver distraction. As tested with four subjects, jukebox interaction increased the overall median reaction time by 133 milliseconds.
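The PDT metric reported above is a median reaction-time difference between baseline driving and driving while operating the jukebox. A minimal sketch of that computation, with invented reaction-time samples (the report's actual measurements are not reproduced here):

```python
import statistics

# Hypothetical PDT reaction times in milliseconds: baseline driving vs.
# driving while interacting with the jukebox. Values are illustrative only.
baseline_rt = [402, 388, 415, 430, 397, 410, 421, 405]
interaction_rt = [530, 545, 512, 560, 538, 525, 551, 540]

# Median-based difference is robust to the occasional very slow response.
increase = statistics.median(interaction_rt) - statistics.median(baseline_rt)
```

Using medians rather than means keeps a few outlier responses (e.g. a missed stimulus) from dominating the distraction estimate, which is why PDT studies typically report median reaction times.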