4,168 research outputs found

    Reinforcement Learning and Bandits for Speech and Language Processing: Tutorial, Review and Outlook

    Full text link
    In recent years, reinforcement learning and bandits have transformed a wide range of real-world applications including healthcare, finance, recommendation systems, robotics, and last but not least, the speech and natural language processing. While most speech and language applications of reinforcement learning algorithms are centered around improving the training of deep neural networks with its flexible optimization properties, there are still many grounds to explore to utilize the benefits of reinforcement learning, such as its reward-driven adaptability, state representations, temporal structures and generalizability. In this survey, we present an overview of recent advancements of reinforcement learning and bandits, and discuss how they can be effectively employed to solve speech and natural language processing problems with models that are adaptive, interactive and scalable.Comment: To appear in Expert Systems with Applications. Accompanying INTERSPEECH 2022 Tutorial on the same topic. Including latest advancements in large language models (LLMs

    View recommendation for multi-camera demonstration-based training

    Get PDF
    While humans can effortlessly pick a view from multiple streams, automatically choosing the best view is a challenge. Choosing the best view from multi-camera streams poses a problem regarding which objective metrics should be considered. Existing works on view selection lack consensus about which metrics should be considered to select the best view. The literature on view selection describes diverse possible metrics. And strategies such as information-theoretic, instructional design, or aesthetics-motivated fail to incorporate all approaches. In this work, we postulate a strategy incorporating information-theoretic and instructional design-based objective metrics to select the best view from a set of views. Traditionally, information-theoretic measures have been used to find the goodness of a view, such as in 3D rendering. We adapted a similar measure known as the viewpoint entropy for real-world 2D images. Additionally, we incorporated similarity penalization to get a more accurate measure of the entropy of a view, which is one of the metrics for the best view selection. Since the choice of the best view is domain-dependent, we chose demonstration-based training scenarios as our use case. The limitation of our chosen scenarios is that they do not include collaborative training and solely feature a single trainer. To incorporate instructional design considerations, we included the trainer’s body pose, face, face when instructing, and hands visibility as metrics. To incorporate domain knowledge we included predetermined regions’ visibility as another metric. All of those metrics are taken into account to produce a parameterized view recommendation approach for demonstration-based training. An online study using recorded multi-camera video streams from a simulation environment was used to validate those metrics. Furthermore, the responses from the online study were used to optimize the view recommendation performance with a normalized discounted cumulative gain (NDCG) value of 0.912, which shows good performance with respect to matching user choices

    Investigating Fluidity for Human-Robot Interaction with Real-Time, Real-World Grounding Strategies

    Get PDF
    Hough J, Schlangen D. Investigating Fluidity for Human-Robot Interaction with Real-Time, Real-World Grounding Strategies. In: Proceedings of the 17th Annual SIGdial Meeting on Discourse and Dialogue. 2016

    Symbiotic interaction between humans and robot swarms

    Get PDF
    Comprising of a potentially large team of autonomous cooperative robots locally interacting and communicating with each other, robot swarms provide a natural diversity of parallel and distributed functionalities, high flexibility, potential for redundancy, and fault-tolerance. The use of autonomous mobile robots is expected to increase in the future and swarm robotic systems are envisioned to play important roles in tasks such as: search and rescue (SAR) missions, transportation of objects, surveillance, and reconnaissance operations. To robustly deploy robot swarms on the field with humans, this research addresses the fundamental problems in the relatively new field of human-swarm interaction (HSI). Four groups of core classes of problems have been addressed for proximal interaction between humans and robot swarms: interaction and communication; swarm-level sensing and classification; swarm coordination; swarm-level learning. The primary contribution of this research aims to develop a bidirectional human-swarm communication system for non-verbal interaction between humans and heterogeneous robot swarms. The guiding field of application are SAR missions. The core challenges and issues in HSI include: How can human operators interact and communicate with robot swarms? Which interaction modalities can be used by humans? How can human operators instruct and command robots from a swarm? Which mechanisms can be used by robot swarms to convey feedback to human operators? Which type of feedback can swarms convey to humans? In this research, to start answering these questions, hand gestures have been chosen as the interaction modality for humans, since gestures are simple to use, easily recognized, and possess spatial-addressing properties. To facilitate bidirectional interaction and communication, a dialogue-based interaction system is introduced which consists of: (i) a grammar-based gesture language with a vocabulary of non-verbal commands that allows humans to efficiently provide mission instructions to swarms, and (ii) a swarm coordinated multi-modal feedback language that enables robot swarms to robustly convey swarm-level decisions, status, and intentions to humans using multiple individual and group modalities. The gesture language allows humans to: select and address single and multiple robots from a swarm, provide commands to perform tasks, specify spatial directions and application-specific parameters, and build iconic grammar-based sentences by combining individual gesture commands. Swarms convey different types of multi-modal feedback to humans using on-board lights, sounds, and locally coordinated robot movements. The swarm-to-human feedback: conveys to humans the swarm's understanding of the recognized commands, allows swarms to assess their decisions (i.e., to correct mistakes: made by humans in providing instructions, and errors made by swarms in recognizing commands), and guides humans through the interaction process. The second contribution of this research addresses swarm-level sensing and classification: How can robot swarms collectively sense and recognize hand gestures given as visual signals by humans? Distributed sensing, cooperative recognition, and decision-making mechanisms have been developed to allow robot swarms to collectively recognize visual instructions and commands given by humans in the form of gestures. These mechanisms rely on decentralized data fusion strategies and multi-hop messaging passing algorithms to robustly build swarm-level consensus decisions. Measures have been introduced in the cooperative recognition protocol which provide a trade-off between the accuracy of swarm-level consensus decisions and the time taken to build swarm decisions. The third contribution of this research addresses swarm-level cooperation: How can humans select spatially distributed robots from a swarm and the robots understand that they have been selected? How can robot swarms be spatially deployed for proximal interaction with humans? With the introduction of spatially-addressed instructions (pointing gestures) humans can robustly address and select spatially- situated individuals and groups of robots from a swarm. A cascaded classification scheme is adopted in which, first the robot swarm identifies the selection command (e.g., individual or group selection), and then the robots coordinate with each other to identify if they have been selected. To obtain better views of gestures issued by humans, distributed mobility strategies have been introduced for the coordinated deployment of heterogeneous robot swarms (i.e., ground and flying robots) and to reshape the spatial distribution of swarms. The fourth contribution of this research addresses the notion of collective learning in robot swarms. The questions that are answered include: How can robot swarms learn about the hand gestures given by human operators? How can humans be included in the loop of swarm learning? How can robot swarms cooperatively learn as a team? Online incremental learning algorithms have been developed which allow robot swarms to learn individual gestures and grammar-based gesture sentences supervised by human instructors in real-time. Humans provide different types of feedback (i.e., full or partial feedback) to swarms for improving swarm-level learning. To speed up the learning rate of robot swarms, cooperative learning strategies have been introduced which enable individual robots in a swarm to intelligently select locally sensed information and share (exchange) selected information with other robots in the swarm. The final contribution is a systemic one, it aims on building a complete HSI system towards potential use in real-world applications, by integrating the algorithms, techniques, mechanisms, and strategies discussed in the contributions above. The effectiveness of the global HSI system is demonstrated in the context of a number of interactive scenarios using emulation tests (i.e., performing simulations using gesture images acquired by a heterogeneous robotic swarm) and by performing experiments with real robots using both ground and flying robots

    The significance of silence. Long gaps attenuate the preference for ‘yes’ responses in conversation.

    Get PDF
    In conversation, negative responses to invitations, requests, offers and the like more often occur with a delay – conversation analysts talk of them as dispreferred. Here we examine the contrastive cognitive load ‘yes’ and ‘no’ responses make, either when given relatively fast (300 ms) or delayed (1000 ms). Participants heard minidialogues, with turns extracted from a spoken corpus, while having their EEG recorded. We find that a fast ‘no’ evokes an N400-effect relative to a fast ‘yes’, however this contrast is not present for delayed responses. This shows that an immediate response is expected to be positive – but this expectation disappears as the response time lengthens because now in ordinary conversation the probability of a ‘no’ has increased. Additionally, however, 'No' responses elicit a late frontal positivity both when they are fast and when they are delayed. Thus, regardless of the latency of response, a ‘no’ response is associated with a late positivity, since a negative response is always dispreferred and may require an account. Together these results show that negative responses to social actions exact a higher cognitive load, but especially when least expected, as an immediate response
    • …
    corecore