    UPC multimodal speaker diarization system for the 2018 Albayzin challenge

    This paper presents the UPC system proposed for the Multimodal Speaker Diarization task of the 2018 Albayzin Challenge. The approach processes the speech and image signals separately. In the speech domain, speaker diarization is performed using identity embeddings created by a triplet-loss DNN that takes i-vectors as input. The triplet DNN is trained with an additional regularization loss that minimizes the variance of both positive and negative distances. A sliding window is then used to compare speech segments with enrollment speaker targets using the cosine distance between embeddings. To detect identities in the face modality, a face detector followed by a face tracker is applied to the videos. For each cropped face, a feature vector is obtained using a deep neural network based on the ResNet-34 architecture, trained with a metric-learning triplet loss (available from the dlib library). For each track, the face feature vector is obtained by averaging the features of all frames in that track. This feature vector is then compared with the features extracted from the images of the enrollment identities. The proposed system is evaluated on the RTVE2018 database. Peer reviewed. Postprint (published version).
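
    The track-level comparison described in this abstract can be illustrated with a minimal sketch: per-frame feature vectors are averaged into one track embedding, which is then scored against enrollment embeddings with cosine distance. The function names, the toy 3-dimensional vectors, and the rejection threshold are assumptions made for illustration; the abstract does not state how the final decision is taken.

```python
import numpy as np


def cosine_distance(a, b):
    """Cosine distance 1 - cos(a, b) between two 1-D vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return 1.0 - float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def track_embedding(frame_features):
    """Average the per-frame feature vectors of one track into one vector."""
    return np.mean(np.asarray(frame_features, dtype=float), axis=0)


def identify(embedding, enrollment, threshold=0.5):
    """Return (name, distance) for the closest enrolled identity.

    The name is None when even the closest identity is farther than
    `threshold` (a hypothetical value, not taken from the paper).
    """
    best_name, best_dist = None, float("inf")
    for name, reference in enrollment.items():
        dist = cosine_distance(embedding, reference)
        if dist < best_dist:
            best_name, best_dist = name, dist
    if best_dist > threshold:
        return None, best_dist
    return best_name, best_dist


# Toy enrollment set and one face track made of two frames.
enrollment = {"alice": [1.0, 0.0, 0.0], "bob": [0.0, 1.0, 0.0]}
frames = [[0.9, 0.1, 0.0], [1.0, -0.1, 0.0]]
name, dist = identify(track_embedding(frames), enrollment)
```

    In a real system the vectors would come from the face or speaker embedding network, and the threshold would be tuned on development data.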

    TDMA massiu per a comunicacions satèl·lit (Massive TDMA for satellite communications)

    This project uses MATLAB to simulate a satellite uplink environment. The users access the satellite at random using CRDSA (Contention Resolution Diversity Slotted Aloha), a random access protocol based on TDMA (Time Division Multiple Access). The protocol handles the signals received by the satellite by applying a Successive Interference Cancellation (SIC) algorithm.
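
    The contention-resolution idea behind CRDSA can be sketched in a few lines. The sketch below is written in Python rather than the project's MATLAB and is heavily idealized: it assumes two replicas per user, perfect cancellation, no noise, and no capture effect. The function names are illustrative and do not come from the project.

```python
import random


def sic_decode(choices, n_slots):
    """Idealized successive interference cancellation over one frame.

    `choices` maps each user id to the slots holding its replicas.
    A packet is decoded when one of its replicas sits alone in a slot;
    all replicas of that packet are then cancelled, which may leave
    other packets alone in their slots.
    """
    slots = [set() for _ in range(n_slots)]
    for user, user_slots in choices.items():
        for s in user_slots:
            slots[s].add(user)

    decoded = set()
    progress = True
    while progress:
        progress = False
        for s in range(n_slots):
            if len(slots[s]) == 1:
                (user,) = slots[s]
                decoded.add(user)
                # Cancel every replica of the decoded packet.
                for t in choices[user]:
                    slots[t].discard(user)
                progress = True
    return decoded


def crdsa_frame(n_users, n_slots, n_replicas=2, rng=None):
    """Simulate one frame: each user picks distinct random slots."""
    rng = rng or random.Random()
    choices = {u: rng.sample(range(n_slots), n_replicas) for u in range(n_users)}
    return sic_decode(choices, n_slots)


# Chain of collisions that SIC resolves completely.
resolved = sic_decode({0: [0, 1], 1: [1, 2], 2: [2, 3]}, n_slots=4)
# Two users colliding in both slots: nothing can be decoded.
stuck = sic_decode({0: [0, 1], 1: [0, 1]}, n_slots=2)
```

    In the first example, user 0 is alone in slot 0, so cancelling its replica from slot 1 frees user 1, which in turn frees user 2. In the second, no slot ever holds a single packet and the loop stops without decoding anything.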
