    Open set object recognition

    Deep Learning is a widely used technique for classification tasks. In practice, the most common classifiers are unsuitable for certain tasks because they were developed for settings in which the set of classes is fixed during the training phase. In this thesis, we present an alternative classifier that can handle data belonging to classes not seen during training. This type of data, in which the number of classes is not defined in advance, is referred to as open data. Some Machine Learning classifiers, especially those based on SVMs, have been modified to work with open sets. In the context of Deep Learning, by contrast, open data is a relatively new area of research. In this thesis we work with the OpenMax classifier, showing that it improves performance on open data while achieving results similar to those of traditional classifiers on known data.
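The thesis summary gives no code. As a rough illustration of the OpenMax idea, here is a minimal pure-Python sketch assuming the commonly described recipe: a mean activation vector (MAV) per class computed from correctly classified training samples, a Weibull model fitted to the largest distances from each MAV, and recalibration of the top `alpha` activations so that the removed mass becomes an extra "unknown" score. The function names, the Euclidean distance, the maximum-likelihood Weibull fit and the defaults `tail_size=20` and `alpha=2` are assumptions for illustration, not the configuration used in the thesis.

```python
import math


def _mean(xs):
    return sum(xs) / len(xs)


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def argmax(v):
    return max(range(len(v)), key=v.__getitem__)


def fit_weibull(samples, iters=100):
    """Maximum-likelihood Weibull fit returning (shape k, scale lam).

    The shape solves sum(x^k ln x)/sum(x^k) - 1/k - mean(ln x) = 0, which is
    increasing in k, so plain bisection works. Samples are divided by their
    maximum first so that x**k stays in (0, 1] and cannot overflow.
    """
    m = max(samples)
    xs = [x / m for x in samples if x > 0]
    logs = [math.log(x) for x in xs]
    mean_log = _mean(logs)

    def g(k):
        w = [x ** k for x in xs]
        return sum(wi * li for wi, li in zip(w, logs)) / sum(w) - 1.0 / k - mean_log

    lo, hi = 1e-3, 1e3
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            hi = mid
        else:
            lo = mid
    k = 0.5 * (lo + hi)
    lam = m * _mean([x ** k for x in xs]) ** (1.0 / k)
    return k, lam


def weibull_cdf(x, k, lam):
    # 1 - exp(-(x/lam)^k), computed in log space to avoid overflow.
    if x <= 0:
        return 0.0
    t = k * math.log(x / lam)
    if t > 50:  # (x/lam)^k > e^50: the CDF is 1.0 in double precision
        return 1.0
    return 1.0 - math.exp(-math.exp(t))


def fit_openmax(train_avs, train_labels, n_classes, tail_size=20):
    """Per class: MAV of the correctly classified training AVs, plus a
    Weibull model of the tail_size largest distances to that MAV."""
    models = []
    for c in range(n_classes):
        avs = [av for av, y in zip(train_avs, train_labels)
               if y == c and argmax(av) == c]
        mav = [_mean(col) for col in zip(*avs)]
        tail = sorted(euclidean(av, mav) for av in avs)[-tail_size:]
        models.append((mav, fit_weibull(tail)))
    return models


def openmax_probs(av, models, alpha=2):
    """Probabilities over the known classes plus a final 'unknown' entry."""
    ranked = sorted(range(len(av)), key=lambda c: av[c], reverse=True)
    revised = list(av)
    for rank, c in enumerate(ranked[:alpha]):
        mav, (k, lam) = models[c]
        w = weibull_cdf(euclidean(av, mav), k, lam)
        # The top class gets weight 1, lower-ranked classes smaller weights.
        revised[c] = av[c] * (1.0 - (alpha - rank) / alpha * w)
    unknown = sum(a - r for a, r in zip(av, revised))
    scores = revised + [unknown]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # numerically stable softmax
    z = sum(exps)
    return [e / z for e in exps]
```

For example, with two classes whose training activation vectors cluster tightly around (5, 0) and (0, 5), the input (5, 0) keeps most of its probability on class 0, while (4, 3), which lies far from both MAVs, ends up with the largest probability on the appended unknown entry.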

    UPC multimodal speaker diarization system for the 2018 Albayzin challenge

    This paper presents the UPC system proposed for the Multimodal Speaker Diarization task of the 2018 Albayzin Challenge. The approach processes the speech and image signals independently. In the speech domain, speaker diarization is performed using identity embeddings produced by a triplet-loss DNN that takes i-vectors as input. The triplet DNN is trained with an additional regularization loss that minimizes the variance of both the positive and the negative distances. A sliding window is then used to compare speech segments with the enrolled target speakers, using the cosine distance between embeddings. To detect identities in the face modality, a face detector followed by a face tracker is applied to the videos. For each cropped face, a feature vector is obtained with a Deep Neural Network based on the ResNet-34 architecture, trained with a metric-learning triplet loss (available in the dlib library). For each track, the face feature vector is obtained by averaging the features of all frames in that track. This vector is then compared with the features extracted from the images of the enrolled identities. The proposed system is evaluated on the RTVE2018 database.
    Peer Reviewed. Postprint (published version).
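The paper summary gives neither code nor hyperparameters. The pure-Python sketch below illustrates three of the steps it describes: a triplet loss with an extra penalty on the variance of the positive and negative distances (forward value only, no training loop), sliding-window assignment of speech embeddings to enrolled speakers by cosine distance, and averaging of per-frame face descriptors over a track before matching the result to enrolled identities. The margin, regularization weight, window length and hop, both thresholds, the use of Euclidean distance on the face side, one reference vector per enrolled identity, and all function names are hypothetical; the 0.6 face threshold is borrowed from the rule of thumb in dlib's face recognition example, not from the paper.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    # 1 - cosine similarity; assumes neither vector is all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def _variance(xs):
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)


def regularized_triplet_loss(anchors, positives, negatives, margin=0.2, reg_weight=0.1):
    """Batch-mean hinge triplet loss plus a penalty on the variance of the
    anchor-positive and anchor-negative distances (hypothetical weights)."""
    d_pos = [euclidean(a, p) for a, p in zip(anchors, positives)]
    d_neg = [euclidean(a, n) for a, n in zip(anchors, negatives)]
    hinge = sum(max(0.0, dp - dn + margin) for dp, dn in zip(d_pos, d_neg)) / len(d_pos)
    return hinge + reg_weight * (_variance(d_pos) + _variance(d_neg))


def sliding_windows(duration, win=1.5, hop=0.75):
    """Yield (start, end) times in seconds; win and hop are hypothetical."""
    i = 0
    while i * hop < duration:
        yield i * hop, min(i * hop + win, duration)
        i += 1


def _closest(query, enrollment, dist_fn):
    # Return (identity, distance) of the nearest enrolled reference vector.
    return min(((name, dist_fn(query, ref)) for name, ref in enrollment.items()),
               key=lambda item: item[1])


def label_speech_windows(window_embeddings, enrollment, threshold=0.5):
    """Assign each window embedding to the nearest enrolled speaker by cosine
    distance, or None when even the nearest one is not within threshold."""
    labels = []
    for emb in window_embeddings:
        name, dist = _closest(emb, enrollment, cosine_distance)
        labels.append(name if dist < threshold else None)
    return labels


def identify_face_track(frame_descriptors, enrollment, threshold=0.6):
    """Average the per-frame face descriptors of one track, then match the
    mean vector to the nearest enrolled identity by Euclidean distance."""
    n = len(frame_descriptors)
    track_vec = [sum(col) / n for col in zip(*frame_descriptors)]
    name, dist = _closest(track_vec, enrollment, euclidean)
    return name if dist < threshold else None
```

Returning None when no enrolled identity is close enough is a design choice of this sketch, so that speakers or faces outside the enrollment list are not forced onto a target.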
