34 research outputs found

    What Do Self-Supervised Speech and Speaker Models Learn? New Findings From a Cross Model Layer-Wise Analysis

    Self-supervised learning (SSL) has attracted increased attention for learning meaningful speech representations. Speech SSL models, such as WavLM, employ masked prediction training to encode general-purpose representations. In contrast, speaker SSL models, exemplified by DINO-based models, adopt utterance-level training objectives primarily for speaker representation. Understanding how these models represent information is essential for refining model efficiency and effectiveness. Unlike the various analyses of speech SSL, there has been limited investigation into what information speaker SSL captures and how its representation differs from speech SSL or other fully-supervised speaker models. This paper addresses these fundamental questions. We explore the capacity to capture various speech properties by applying SUPERB evaluation probing tasks to speech and speaker SSL models. We also examine which layers are predominantly utilized for each task to identify differences in how speech is represented. Furthermore, we conduct direct comparisons to measure the similarities between layers within and across models. Our analysis reveals that 1) the capacity to represent content information is somewhat unrelated to enhanced speaker representation, 2) specific layers of speech SSL models are partly specialized in capturing linguistic information, and 3) speaker SSL models tend to disregard linguistic information but exhibit more sophisticated speaker representation. Comment: Accepted at ICASSP 202

    Hierarchical Latent Words Language Models for Robust Modeling to Out-Of Domain Tasks

    This paper focuses on language modeling with adequate robustness to support different domain tasks. To this end, we propose a hierarchical latent words language model (h-LWLM). The proposed model can be regarded as a generalized form of the standard LWLM. The key advance is the introduction of multiple latent variable spaces with a hierarchical structure. The structure can flexibly take account of linguistic phenomena not present in the training data. This paper details the definition, a training method based on layer-wise inference, and practical usage in natural language processing tasks with an approximation technique. Experiments on speech recognition show the effectiveness of h-LWLM in out-of-domain tasks.

    CRISPR/Cas9 mediated genome editing in ES cells and its application for chimeric analysis in mice

    Oji, A., Noda, T., Fujihara, Y. et al. CRISPR/Cas9 mediated genome editing in ES cells and its application for chimeric analysis in mice. Sci Rep 6, 31666 (2016). https://doi.org/10.1038/srep3166

    Spermatozoa lacking Fertilization Influencing Membrane Protein (FIMP) fail to fuse with oocytes in mice

    Fujihara, Y., Lu, Y., Noda, T., Oji, A., Larasati, T., Kojima-Kita, K., . . . Ikawa, M. (2020). Spermatozoa lacking fertilization influencing membrane protein (FIMP) fail to fuse with oocytes in mice. Proceedings of the National Academy of Sciences of the United States of America, 117(17), 9393-9400. doi:10.1073/pnas.191706011

    Identification of multiple male reproductive tract-specific proteins that regulate sperm migration through the oviduct in mice

    Fujihara, Y., Noda, T., Kobayashi, K., Oji, A., Kobayashi, S., Matsumura, T., . . . Ikawa, M. (2019). Identification of multiple male reproductive tract-specific proteins that regulate sperm migration through the oviduct in mice. Proceedings of the National Academy of Sciences of the United States of America, 116(37), 18498-18506. doi:10.1073/pnas.190873611

    Noise-Robust Speaker Verification Using F0 Features

    This paper proposes a noise-robust speaker verification method augmented by fundamental frequency (F0). The paper first describes a noise-robust F0 extraction method using the Hough transform. Then, it proposes a robust speaker verification method using multi-stream HMMs, which fuse the extracted F0 and cepstral features. Experiments are conducted using four-connected-digit utterances in Japanese by 37 male speakers, recorded at five sessions over a half-year period. The utterances are contaminated with white noise at various SNR levels. Experimental results show that the F0 features improve the verification performance in all SNR conditions.

    An Improved Approximation Algorithm for Wage Determination and Online Task Allocation in Crowd-Sourcing

    Crowd-sourcing has attracted much attention due to its growing importance to society, and numerous studies have been conducted on task allocation and wage determination. Recent works have focused on optimizing task allocation and workers' wages simultaneously. However, existing methods do not provide good solutions for real-world crowd-sourcing platforms due to their low approximation ratios or myopic problem settings. We tackle an optimization problem for wage determination and online task allocation in crowd-sourcing and propose a fast (1 - 1/(k+3)^(1/2))-approximation algorithm, where k is the minimum of tasks' budgets (numbers of possible assignments). This approximation ratio is greater than or equal to that of the existing method. The proposed method reduces the tackled problem to a non-convex multi-period continuous optimization problem by approximating the objective function. Then, the method transforms the reduced problem into a minimum convex cost flow problem, which is a well-known combinatorial optimization problem, and solves it by the capacity scaling algorithm. Synthetic experiments and simulation experiments using real crowd-sourcing data show that the proposed method solves the problem faster and outputs higher objective values than existing methods.
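
    To give a feel for the approximation ratio quoted in this abstract, the following minimal sketch evaluates 1 - 1/(k+3)^(1/2) for a few values of k. The function name `approximation_ratio` and the sample values of k are illustrative assumptions, not taken from the paper; the sketch only evaluates the stated formula and does not implement the algorithm itself.

```python
import math


def approximation_ratio(k: int) -> float:
    """Evaluate 1 - 1/sqrt(k + 3), the ratio stated in the abstract.

    k is the minimum of the tasks' budgets; k >= 1 is assumed here.
    """
    if k < 1:
        raise ValueError("k is assumed to be a positive integer")
    return 1.0 - 1.0 / math.sqrt(k + 3)


if __name__ == "__main__":
    # Sample budgets chosen so that k + 3 is a perfect square (hypothetical values).
    for k in (1, 6, 13, 97):
        print(f"k = {k:3d}  ratio = {approximation_ratio(k):.3f}")
```

    Under this formula the guarantee tightens as the smallest budget grows: the ratio is 0.5 at k = 1 and approaches 1 as k increases.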

    Latent Words Recurrent Neural Network Language Models for Automatic Speech Recognition
