2,545 research outputs found
A comparison of addressee detection methods for multiparty conversations
Several algorithms have recently been proposed for recognizing addressees in a group conversational setting. These algorithms can rely on a variety of factors, including previous conversational roles, gaze, and type of dialogue act. Both statistical supervised machine learning algorithms and rule-based methods have been developed. In this paper, we compare several algorithms developed for different genres of multiparty dialogue, and propose a new synthesis algorithm that matches the performance of machine learning algorithms while maintaining the transparency of semantically meaningful rule-based algorithms.
Personalizing gesture recognition using hierarchical bayesian neural networks
Building robust classifiers trained on data susceptible to group or subject-specific variations is a challenging pattern recognition problem. We develop hierarchical Bayesian neural networks to capture subject-specific variations and share statistical strength across subjects. Leveraging recent work on learning Bayesian neural networks, we build fast, scalable algorithms for inferring the posterior distribution over all network weights in the hierarchy. We also develop methods for adapting our model to new subjects when a small amount of subject-specific personalization data is available. Finally, we investigate active learning algorithms for interactively labeling personalization data in resource-constrained scenarios. Focusing on the problem of gesture recognition, where inter-subject variations are commonplace, we demonstrate the effectiveness of our proposed techniques. We test our framework on three widely used gesture recognition datasets, achieving personalization performance competitive with the state of the art. http://openaccess.thecvf.com/content_cvpr_2017/html/Joshi_Personalizing_Gesture_Recognition_CVPR_2017_paper.html
Adaptive Bayesian networks for video processing
Due to their static nature, the inference capability of Bayesian Networks (BNs) often deteriorates when the basis of the input data varies, especially in video processing applications where the environment changes constantly. This paper presents an adaptive BN in which the network parameters are adjusted in accordance with input variations. An efficient re-training method is introduced for updating the parameters, and the proposed network is applied to shadow removal in video sequence processing, with quantitative results demonstrating the significance of adapting the network to environmental changes.
Symbol Emergence in Robotics: A Survey
Humans can learn the use of language through physical interaction with their
environment and semiotic communication with other people. It is very important
to obtain a computational understanding of how humans can form a symbol system
and obtain semiotic skills through their autonomous mental development.
Recently, many studies have been conducted on the construction of robotic
systems and machine-learning methods that can learn the use of language through
embodied multimodal interaction with their environment and other systems.
Understanding human social interactions and developing a robot that can
smoothly communicate with human users over the long term are crucially
important goals, and both require an understanding of the dynamics of symbol systems. The
embodied cognition and social interaction of participants gradually change a
symbol system in a constructive manner. In this paper, we introduce a field of
research called symbol emergence in robotics (SER). SER is a constructive
approach towards an emergent symbol system. The emergent symbol system is
socially self-organized through both semiotic communications and physical
interactions with autonomous cognitive developmental agents, i.e., humans and
developmental robots. Specifically, we describe some state-of-the-art research
topics concerning SER, e.g., multimodal categorization, word discovery, and
double articulation analysis, that enable a robot to obtain words and their
embodied meanings from raw sensory-motor information, including visual
information, haptic information, auditory information, and acoustic speech
signals, in a totally unsupervised manner. Finally, we suggest future
directions of research in SER. Comment: submitted to Advanced Robotic
Zweistufige kontextsensitive Sprecherklassifikation am Beispiel von Alter und Geschlecht (Two-layered context-sensitive speaker classification, using age and gender as examples)
This dissertation describes a two-layered approach to speaker classification, using age and gender as examples. First, the results of comprehensive corpus analyses are presented, which are suitable as a reference basis for further studies in the human sciences. It is shown that the models trained on these data can recognize the above-mentioned speaker characteristics with an accuracy that is, in some cases, up to five times the respective chance level. In addition, the presented approach is distinguished above all by the so-called Second Layer, on which multiple classification results are fused using Dynamic Bayesian Networks, taking the auditory context into account. The dissertation also describes a concrete speaker classification system that was developed for the application scenario of mobile spoken dialog systems.