333 research outputs found

    Corpus selection

    Deliverable of the project Collaborative Annotation of multi-MOdal, MultI-Lingual and multi-mEdia documents (Camomile). This document describes the corpora that will be used during the Camomile project. Peer reviewed. Preprint.

    QCompere @ REPERE 2013

    We describe the QCompere consortium's submissions to the REPERE 2013 evaluation campaign. The REPERE challenge aims to bring together four communities (face recognition, speaker identification, optical character recognition and named entity detection) around a common goal: multimodal person recognition in TV broadcasts. First, four mono-modal components are introduced (one for each of these communities); they constitute the elementary building blocks of our submissions. Then, depending on the target modality (speaker or face recognition) and on the task (supervised or unsupervised recognition), four different fusion techniques are introduced: they can be summarized as propagation-, classifier-, rule- or graph-based approaches. Finally, their performance is evaluated on the REPERE 2013 test set, and their advantages and limitations are discussed.

    Towards a complete Binary Key System for the Speaker Diarization Task

    Speaker diarization is the task of partitioning an audio stream into homogeneous segments according to speaker identity. Today, state-of-the-art speaker diarization systems achieve very competitive performance. However, any small improvement in Diarization Error Rate (DER) usually comes at the cost of very long processing times (real-time factor above one), which makes these systems unsuitable for some time-critical, real-life applications. Recently, a novel fast speaker diarization technique based on speaker modeling using binary keys was presented. The proposed technique runs up to ten times faster than real time with little increase in DER. Although the approach shows great potential, the presented results are still preliminary. The goal of this paper is to further investigate this technique, in order to move towards a complete binary-key-based system for the speaker diarization task. Preliminary experiments in Speech Activity Detection (SAD) based on binary keys show the feasibility of the binary key modeling approach for this task. Furthermore, the system has been tested on two different kinds of test data: meeting audio recordings and TV shows. The experiments carried out on the NIST RT05 and REPERE databases show promising results and indicate that there is still room for further improvement.
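
    The two quantities this abstract trades off can be made concrete with a minimal sketch. It assumes the commonly used definition of DER (missed speech, false-alarm speech and speaker-confusion time divided by total reference speech time) and the usual real-time factor (processing time divided by audio duration). The function names and all numbers are illustrative assumptions, not values from the paper.

```python
def diarization_error_rate(missed, false_alarm, confusion, total_speech):
    """DER as a fraction: (missed + false alarm + confusion) / reference speech time.

    All arguments are durations in seconds.
    """
    if total_speech <= 0:
        raise ValueError("total_speech must be positive")
    return (missed + false_alarm + confusion) / total_speech


def real_time_factor(processing_seconds, audio_seconds):
    """Real-time factor: values below 1.0 mean faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


# Hypothetical numbers chosen for illustration only.
der = diarization_error_rate(missed=30.0, false_alarm=20.0,
                             confusion=50.0, total_speech=1000.0)
rtf = real_time_factor(processing_seconds=60.0, audio_seconds=600.0)
print(f"DER = {der:.1%}, RTF = {rtf:.2f}")
```

    Under this definition, a system described as ten times faster than real time has a real-time factor of 0.1, while a factor above one means processing takes longer than the audio itself.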

    Ways of Forgetting and Remembering the Eloquence of the 19th Century: Editors of Romanian Political Speeches

    The paper presents a critical evaluation of the existing anthologies of Romanian oratory and analyzes the pertinence of a new line of research: how to trace back the foundations of Romania’s versatile political memory, from both a lexical and an ideological point of view. As I argue in the first part of the paper, collecting and editing the great speeches of Romanian orators seems crucial for today’s understanding of politics (politicians’ speech and actions as well as voters’ behavior and electoral habits). In the second part, I focus on the particularities generated by a dramatic change of medium (in the context of Romania’s high rates of illiteracy at the end of the 19th century): from “writing” information on the slippery surface of memory (declaimed political texts such as “proclamations,” “petitions,” and “appeals”) to “writing” as such (transcribed political speeches). The last part of the paper problematizes the making of a new canon of Romanian eloquence as well as the opportunity for a new collection of oratorical texts illustrative of 19th-century politics, and endeavors to settle a series of virtual editing principles.

    Study on the Students’ Perception of Knowledge Usefulness and Necessity Concerning Tourists’ Protection

    Protecting consumers’ rights and interests is a major concern today, because the market economy, through its mechanisms and the principles it promotes, is permanently associated with the notion of fairness. From the consumer’s perspective, fairness in a market economy means ensuring ample access to information and the possibility of choosing and buying products of appropriate quality at convenient prices. The market should be transparent, information should circulate freely, and prices should be known. Consequently, competition will be efficient, fair, and beneficial for the consumer. Given that one fundamental right of consumers is to be informed and educated, we intend to carry out a study to identify and address the problems encountered by students in the departments of Tourism Geography and Environmental Geography concerning the usefulness and necessity of knowledge about the protection of tourism consumers. In line with this objective, we consider that the academic environment in which students develop is an opportunity to transmit and accumulate information from different fields, and thus to contribute effectively to forming citizens who are aware of their rights as consumers. Keywords: consumers’ protection, tourism, problem identification, students.

    UPC multimodal speaker diarization system for the 2018 Albayzin challenge

    This paper presents the UPC system proposed for the Multimodal Speaker Diarization task of the 2018 Albayzin Challenge. The approach processes the speech and image signals separately. In the speech domain, speaker diarization is performed using identity embeddings created by a triplet-loss DNN that takes i-vectors as input. The triplet DNN is trained with an additional regularization loss that minimizes the variance of both positive and negative distances. A sliding window is then used to compare speech segments with enrollment speaker targets using the cosine distance between the embeddings. To detect identities from the face modality, a face detector followed by a face tracker is run on the videos. For each cropped face, a feature vector is obtained using a deep neural network based on the ResNet-34 architecture, trained with a metric-learning triplet loss (available from the dlib library). For each track, the face feature vector is obtained by averaging the features obtained for each frame of that track. This feature vector is then compared with the features extracted from the images of the enrollment identities. The proposed system is evaluated on the RTVE2018 database. Peer reviewed. Postprint (published version).
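
    The comparison steps described in this abstract (averaging per-frame features over a track, then scoring against enrollment identities) can be sketched as follows. This is a minimal illustration rather than the UPC implementation. The abstract names cosine distance only for the speech embeddings, so using it for the averaged track vector is an assumption, and the helper names, the toy vectors and the `max_distance` threshold are hypothetical.

```python
import math


def mean_vector(vectors):
    """Average a non-empty list of equal-length feature vectors."""
    if not vectors:
        raise ValueError("at least one vector is required")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine_distance(a, b):
    """1 - cosine similarity; 0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("zero-norm vector")
    return 1.0 - dot / (norm_a * norm_b)


def identify_track(frame_features, enrollment, max_distance=0.5):
    """Return (best identity or None, distance) for one track.

    max_distance is a hypothetical rejection threshold, not a value from the paper.
    """
    track_vector = mean_vector(frame_features)
    best_name, best_dist = None, float("inf")
    for name, reference in enrollment.items():
        dist = cosine_distance(track_vector, reference)
        if dist < best_dist:
            best_name, best_dist = name, dist
    if best_dist > max_distance:
        return None, best_dist
    return best_name, best_dist


# Toy 3-dimensional features; real embeddings have many more dimensions.
frames = [[0.9, 0.1, 0.0], [1.0, 0.0, 0.1], [0.8, 0.2, 0.0]]
enrollment = {"speaker_a": [1.0, 0.0, 0.0], "speaker_b": [0.0, 1.0, 0.0]}
name, dist = identify_track(frames, enrollment)
print(name, round(dist, 4))
```

    Averaging before comparison yields one decision per track instead of one per frame, which tends to smooth out noisy individual frames.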

    Speaker diarisation and longitudinal linking in multi-genre broadcast data

    This paper presents a multi-stage speaker diarisation system with longitudinal linking developed on BBC multi-genre data for the 2015 Multi-Genre Broadcast (MGB) challenge. The basic speaker diarisation system draws on techniques from the Cambridge March 2005 system, with a new deep neural network (DNN)-based speech/non-speech segmenter. A newly developed linking stage is then added to the basic diarisation output, aiming to identify speakers across multiple episodes of the same series. The longitudinal constraint imposes incremental processing of the episodes: speaker labels for each episode can be obtained using only material from the episode in question and from those broadcast earlier in time. The nature of the data, as well as the longitudinal linking constraint, positions this diarisation task as a new open research topic, and a particularly challenging one. Different linking clustering metrics are compared, and the lowest within-episode and cross-episode DER scores are achieved on the MGB challenge evaluation set. This work is in part supported by EPSRC Programme Grant EP/I031022/1 (Natural Speech Technology). C. Zhang is also supported by a Cambridge International Scholarship from the Cambridge Commonwealth, European & International Trust. This is the author accepted manuscript. The final version is available from IEEE via http://dx.doi.org/10.1109/ASRU.2015.740485
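
    The longitudinal constraint can be illustrated with a small sketch: episodes are processed in broadcast order, and each episode's speaker clusters are linked only to speakers seen in that episode or earlier ones. This is not the paper's linking method. The greedy nearest-match rule, the Euclidean distance, the `threshold` value and the cluster vectors are all stand-in assumptions chosen to keep the example short.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def link_episodes(episodes, threshold=1.0):
    """Assign series-level speaker IDs incrementally.

    episodes: list in broadcast order; each item maps a within-episode
    cluster label to a representative vector (hypothetical input format).
    Returns one dict per episode mapping local label -> series-level ID.
    """
    known = {}    # series-level ID -> representative vector from earlier episodes
    results = []
    for clusters in episodes:
        mapping = {}
        new_speakers = {}
        for label, vector in clusters.items():
            best_id, best_dist = None, float("inf")
            for speaker_id, reference in known.items():
                dist = euclidean(vector, reference)
                if dist < best_dist:
                    best_id, best_dist = speaker_id, dist
            if best_id is not None and best_dist <= threshold:
                mapping[label] = best_id          # link to an earlier speaker
            else:
                new_id = f"spk{len(known) + len(new_speakers)}"
                new_speakers[new_id] = vector     # first appearance in the series
                mapping[label] = new_id
        # Only now do this episode's new speakers become visible to later episodes.
        known.update(new_speakers)
        results.append(mapping)
    return results


# Toy two-episode series with 2-dimensional cluster vectors.
episodes = [
    {"A": [0.0, 0.0], "B": [5.0, 5.0]},
    {"A": [5.1, 4.9], "B": [10.0, 0.0]},
]
links = link_episodes(episodes)
print(links)
```

    Because later episodes are never consulted, the labels for an episode stay fixed once it has been processed, which is what makes the processing incremental.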