1,167 research outputs found

    Speaker segmentation and clustering

    Get PDF
    This survey focuses on two challenging speech processing topics, namely: speaker segmentation and speaker clustering. Speaker segmentation aims at finding speaker change points in an audio stream, whereas speaker clustering aims at grouping speech segments based on speaker characteristics. Model-based, metric-based, and hybrid speaker segmentation algorithms are reviewed. Concerning speaker clustering, deterministic and probabilistic algorithms are examined. A comparative assessment of the reviewed algorithms is undertaken, the algorithm advantages and disadvantages are indicated, insight to the algorithms is offered, and deductions as well as recommendations are given. Rich transcription and movie analysis are candidate applications that benefit from combined speaker segmentation and clustering. © 2007 Elsevier B.V. All rights reserved

    Tracking and modeling focus of attention in meetings [online]

    Get PDF
    Abstract This thesis addresses the problem of tracking the focus of attention of people. In particular, a system to track the focus of attention of participants in meetings is developed. Obtaining knowledge about a person\u27s focus of attention is an important step towards a better understanding of what people do, how and with what or whom they interact or to what they refer. In meetings, focus of attention can be used to disambiguate the addressees of speech acts, to analyze interaction and for indexing of meeting transcripts. Tracking a user\u27s focus of attention also greatly contributes to the improvement of human­computer interfaces since it can be used to build interfaces and environments that become aware of what the user is paying attention to or with what or whom he is interacting. The direction in which people look; i.e., their gaze, is closely related to their focus of attention. In this thesis, we estimate a subject\u27s focus of attention based on his or her head orientation. While the direction in which someone looks is determined by head orientation and eye gaze, relevant literature suggests that head orientation alone is a su#cient cue for the detection of someone\u27s direction of attention during social interaction. We present experimental results from a user study and from several recorded meetings that support this hypothesis. We have developed a Bayesian approach to model at whom or what someone is look­ ing based on his or her head orientation. To estimate head orientations in meetings, the participants\u27 faces are automatically tracked in the view of a panoramic camera and neural networks are used to estimate their head orientations from pre­processed images of their faces. Using this approach, the focus of attention target of subjects could be correctly identified during 73% of the time in a number of evaluation meet­ ings with four participants. In addition, we have investigated whether a person\u27s focus of attention can be pre­dicted from other cues. Our results show that focus of attention is correlated to who is speaking in a meeting and that it is possible to predict a person\u27s focus of attention based on the information of who is talking or was talking before a given moment. We have trained neural networks to predict at whom a person is looking, based on information about who was speaking. Using this approach we were able to predict who is looking at whom with 63% accuracy on the evaluation meetings using only information about who was speaking. We show that by using both head orientation and speaker information to estimate a person\u27s focus, the accuracy of focus detection can be improved compared to just using one of the modalities for focus estimation. To demonstrate the generality of our approach, we have built a prototype system to demonstrate focus­aware interaction with a household robot and other smart appliances in a room using the developed components for focus of attention tracking. In the demonstration environment, a subject could interact with a simulated household robot, a speech­enabled VCR or with other people in the room, and the recipient of the subject\u27s speech was disambiguated based on the user\u27s direction of attention. Zusammenfassung Die vorliegende Arbeit beschäftigt sich mit der automatischen Bestimmung und Ver­folgung des Aufmerksamkeitsfokus von Personen in Besprechungen. Die Bestimmung des Aufmerksamkeitsfokus von Personen ist zum Verständnis und zur automatischen Auswertung von Besprechungsprotokollen sehr wichtig. So kann damit beispielsweise herausgefunden werden, wer zu einem bestimmten Zeitpunkt wen angesprochen hat beziehungsweise wer wem zugehört hat. Die automatische Bestim­mung des Aufmerksamkeitsfokus kann desweiteren zur Verbesserung von Mensch-Maschine­Schnittstellen benutzt werden. Ein wichtiger Hinweis auf die Richtung, in welche eine Person ihre Aufmerksamkeit richtet, ist die Kopfstellung der Person. Daher wurde ein Verfahren zur Bestimmung der Kopfstellungen von Personen entwickelt. Hierzu wurden künstliche neuronale Netze benutzt, welche als Eingaben vorverarbeitete Bilder des Kopfes einer Person erhalten, und als Ausgabe eine Schätzung der Kopfstellung berechnen. Mit den trainierten Netzen wurde auf Bilddaten neuer Personen, also Personen, deren Bilder nicht in der Trainingsmenge enthalten waren, ein mittlerer Fehler von neun bis zehn Grad für die Bestimmung der horizontalen und vertikalen Kopfstellung erreicht. Desweiteren wird ein probabilistischer Ansatz zur Bestimmung von Aufmerksamkeits­zielen vorgestellt. Es wird hierbei ein Bayes\u27scher Ansatzes verwendet um die A­posterior iWahrscheinlichkeiten verschiedener Aufmerksamkteitsziele, gegeben beobachteter Kopfstellungen einer Person, zu bestimmen. Die entwickelten Ansätze wurden auf mehren Besprechungen mit vier bis fünf Teilnehmern evaluiert. Ein weiterer Beitrag dieser Arbeit ist die Untersuchung, inwieweit sich die Blickrich­tung der Besprechungsteilnehmer basierend darauf, wer gerade spricht, vorhersagen läßt. Es wurde ein Verfahren entwickelt um mit Hilfe von neuronalen Netzen den Fokus einer Person basierend auf einer kurzen Historie der Sprecherkonstellationen zu schätzen. Wir zeigen, dass durch Kombination der bildbasierten und der sprecherbasierten Schätzung des Aufmerksamkeitsfokus eine deutliche verbesserte Schätzung erreicht werden kann. Insgesamt wurde mit dieser Arbeit erstmals ein System vorgestellt um automatisch die Aufmerksamkeit von Personen in einem Besprechungsraum zu verfolgen. Die entwickelten Ansätze und Methoden können auch zur Bestimmung der Aufmerk­samkeit von Personen in anderen Bereichen, insbesondere zur Steuerung von comput­erisierten, interaktiven Umgebungen, verwendet werden. Dies wird an einer Beispielapplikation gezeigt

    A Survey of Available Corpora For Building Data-Driven Dialogue Systems: The Journal Version

    Get PDF
    During the past decade, several areas of speech and language understanding have witnessed substantial breakthroughs from the use of data-driven models. In the area of dialogue systems, the trend is less obvious, and most practical systems are still built through significant engineering and expert knowledge. Nevertheless, several recent results suggest that data-driven approaches are feasible and quite promising. To facilitate research in this area, we have carried out a wide survey of publicly available datasets suitable for data-driven learning of dialogue systems. We discuss important characteristics of these datasets, how they can be used to learn diverse dialogue strategies, and their other potential uses. We also examine methods for transfer learning between datasets and the use of external knowledge. Finally, we discuss appropriate choice of evaluation metrics for the learning objective

    Towards Integration of Cognitive Models in Dialogue Management: Designing the Virtual Negotiation Coach Application

    Get PDF
    This paper presents an approach to flexible and adaptive dialogue management driven by cognitive modelling of human dialogue behaviour. Artificial intelligent agents, based on the ACT-R cognitive architecture, together with human actors are participating in a (meta)cognitive skills training within a negotiation scenario. The agent  employs instance-based learning to decide about its own actions and to reflect on the behaviour of the opponent. We show that task-related actions can be handled by a cognitive agent who is a plausible dialogue partner.  Separating task-related and dialogue control actions enables the application of sophisticated models along with a flexible architecture  in which  various alternative modelling methods can be combined. We evaluated the proposed approach with users assessing  the relative contribution of various factors to the overall usability of a dialogue system. Subjective perception of effectiveness, efficiency and satisfaction were correlated with various objective performance metrics, e.g. number of (in)appropriate system responses, recovery strategies, and interaction pace. It was observed that the dialogue system usability is determined most by the quality of agreements reached in terms of estimated Pareto optimality, by the user's negotiation strategies selected, and by the quality of system recognition, interpretation and responses. We compared human-human and human-agent performance with respect to the number and quality of agreements reached, estimated cooperativeness level, and frequency of accepted negative outcomes. Evaluation experiments showed promising, consistently positive results throughout the range of the relevant scales

    Proceedings of the 1st joint workshop on Smart Connected and Wearable Things 2016

    Get PDF
    These are the Proceedings of the 1st joint workshop on Smart Connected and Wearable Things (SCWT'2016, Co-located with IUI 2016). The SCWT workshop integrates the SmartObjects and IoWT workshops. It focusses on the advanced interactions with smart objects in the context of the Internet-of-Things (IoT), and on the increasing popularity of wearables as advanced means to facilitate such interactions
    • …
    corecore