29 research outputs found

    Improving Semantic Concept Detection And Retrieval Using Contextual Estimates

    No full text
    In this paper we introduce a novel contextual fusion method to improve the detection scores of semantic concepts in images and videos. Our method consists of three phases. For each individual concept, the prior probability of the concept is incorporated with the detection score of an individual SVM detector. Then probabilistic estimates of the target concept are computed using all of the individual SVM detectors. Finally, these estimates are linearly combined using weights learned from the training set. This procedure is applied to each target concept individually. We show significant improvements to our detection scores on the TRECVID 2005 development set and the LSCOM-Lite annotation set. We achieved an average improvement of +3.9% on 29 out of 39 concepts. © 2007 IEEE
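    The three-phase fusion described above can be illustrated with a short sketch. The code below is a minimal, hedged reconstruction in Python, not the authors' implementation: it assumes per-concept SVM scores already calibrated to [0, 1], uses a simple product to combine each score with its concept prior, and stands in logistic regression for the weight-learning step on the training set.

        import numpy as np
        from sklearn.linear_model import LogisticRegression

        def learn_fusion_weights(train_scores, train_priors, train_labels, target_idx):
            """Learn linear fusion weights for one target concept.
            train_scores : (n_samples, n_concepts) calibrated SVM scores.
            train_priors : (n_concepts,) concept priors from the annotations.
            train_labels : (n_samples, n_concepts) binary ground truth.
            Logistic regression is an illustrative stand-in for the
            weight-learning procedure."""
            contextual = train_scores * train_priors[np.newaxis, :]   # phase 1
            clf = LogisticRegression(max_iter=1000)
            clf.fit(contextual, train_labels[:, target_idx])          # phases 2-3
            return clf.coef_.ravel()

        def fuse_concept_scores(svm_scores, priors, weights):
            """Apply the learned weights: phase 1 combines each detector's
            score with its prior, phases 2 and 3 linearly combine all
            detectors' estimates into one score for the target concept."""
            contextual = svm_scores * priors[np.newaxis, :]
            return contextual @ weights

    In this sketch the procedure would be repeated independently for each of the 39 concepts, mirroring the per-concept application described in the abstract.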

    Plastic behaviour of Fe-Mn binary alloys

    No full text
    The plastic deformation behaviour of binary Fe-Mn alloys with Mn content in the range 0.42-1.21% has been investigated in tensile tests performed at different temperatures and strain rates. The results show that temperature and strain rate have opposite effects on the stress-strain curves. In addition, the stress-strain response of the Fe-Mn binary alloys is sensitive to the Mn content: the yield stress increases linearly with increasing Mn content due to solid solution hardening.
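    The reported linear dependence of the yield stress on Mn content corresponds to a solid-solution-hardening relation of the form sigma_y ≈ sigma_0 + k·c_Mn. The sketch below, which uses made-up illustrative numbers rather than the paper's measurements, shows how the friction stress sigma_0 and the hardening rate k could be extracted with a least-squares fit.

        import numpy as np

        # Hypothetical yield-stress values (illustrative only, not the
        # paper's data) for Fe-Mn alloys in the studied composition range.
        mn_content = np.array([0.42, 0.70, 0.95, 1.21])        # wt.% Mn
        yield_stress = np.array([165.0, 178.0, 190.0, 203.0])  # MPa

        # Linear model: sigma_y = sigma_0 + k * c_Mn
        k, sigma_0 = np.polyfit(mn_content, yield_stress, 1)
        print(f"sigma_0 ~ {sigma_0:.1f} MPa, k ~ {k:.1f} MPa per wt.% Mn")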

    Emotion recognition in speech using cross-modal transfer in the wild

    No full text
    Obtaining large, human labelled speech datasets to train models for emotion recognition is a notoriously challenging task, hindered by annotation cost and label ambiguity. In this work, we consider the task of learning embeddings for speech classification without access to any form of labelled audio. We base our approach on a simple hypothesis: that the emotional content of speech correlates with the facial expression of the speaker. By exploiting this relationship, we show that annotations of expression can be transferred from the visual domain (faces) to the speech domain (voices) through cross-modal distillation. We make the following contributions: (i) we develop a strong teacher network for facial emotion recognition that achieves the state of the art on a standard benchmark; (ii) we use the teacher to train a student, tabula rasa, to learn representations (embeddings) for speech emotion recognition without access to labelled audio data; and (iii) we show that the speech emotion embedding can be used for speech emotion recognition on external benchmark datasets. Code, models and data are available
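    The cross-modal distillation step can be sketched as follows. This is a minimal, hedged illustration in PyTorch rather than the paper's training recipe: the teacher and student architectures, the softmax temperature, and the KL-divergence objective are assumptions; the only point taken from the abstract is that the face-emotion teacher provides soft targets and the speech student learns from the paired audio without any labelled speech.

        import torch
        import torch.nn.functional as F

        def distillation_step(teacher, student, faces, audio, optimizer, temperature=2.0):
            """One cross-modal distillation step: the pretrained face-emotion
            teacher labels the video frames, and the speech student is trained
            to match those soft labels from the corresponding audio alone."""
            teacher.eval()
            with torch.no_grad():
                teacher_logits = teacher(faces)                  # visual modality
                targets = F.softmax(teacher_logits / temperature, dim=1)

            student_logits = student(audio)                      # audio modality only
            log_probs = F.log_softmax(student_logits / temperature, dim=1)

            # KL divergence between student and teacher emotion distributions.
            loss = F.kl_div(log_probs, targets, reduction="batchmean") * temperature ** 2

            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            return loss.item()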