19 research outputs found

    Neural approaches to spoken content embedding

    Comparing spoken segments is a central operation in speech processing. Traditional approaches in this area have favored frame-level dynamic programming algorithms, such as dynamic time warping, because they require no supervision, but they are limited in performance and efficiency. As an alternative, acoustic word embeddings -- fixed-dimensional vector representations of variable-length spoken word segments -- have begun to be considered for such tasks as well. However, the current space of such discriminative embedding models, training approaches, and their application to real-world downstream tasks is limited. We start by considering "single-view" training losses where the goal is to learn an acoustic word embedding model that separates same-word and different-word spoken segment pairs. Then, we consider "multi-view" contrastive losses. In this setting, acoustic word embeddings are learned jointly with embeddings of character sequences to generate acoustically grounded embeddings of written words, or acoustically grounded word embeddings. In this thesis, we contribute new discriminative acoustic word embedding (AWE) and acoustically grounded word embedding (AGWE) approaches based on recurrent neural networks (RNNs). We improve model training in terms of both efficiency and performance. We take these developments beyond English to several low-resource languages and show that multilingual training improves performance when labeled data is limited. We apply our embedding models, both monolingual and multilingual, to the downstream tasks of query-by-example speech search and automatic speech recognition. Finally, we show how our embedding approaches compare with and complement more recent self-supervised speech models. Comment: PhD thesis

    Visually grounded learning of keyword prediction from untranscribed speech

    During language acquisition, infants have the benefit of visual cues to ground spoken language. Robots similarly have access to audio and visual sensors. Recent work has shown that images and spoken captions can be mapped into a meaningful common space, allowing images to be retrieved using speech and vice versa. In this setting of images paired with untranscribed spoken captions, we consider whether computer vision systems can be used to obtain textual labels for the speech. Concretely, we use an image-to-words multi-label visual classifier to tag images with soft textual labels, and then train a neural network to map from the speech to these soft targets. We show that the resulting speech system is able to predict which words occur in an utterance, acting as a spoken bag-of-words classifier, without seeing any parallel speech and text. We find that the model often confuses semantically related words, e.g. "man" and "person", making it even more effective as a semantic keyword spotter. Comment: 5 pages, 3 figures, 5 tables; small updates, added link to code; accepted to Interspeech 201

    Exposure to the BPA-Substitute Bisphenol S Causes Unique Alterations of Germline Function

    Concerns about the safety of Bisphenol A, a chemical found in plastics, receipts, food packaging and more, have led to its replacement with substitutes now found in a multitude of consumer products. However, several popular BPA-free alternatives, such as Bisphenol S, share a high degree of structural similarity with BPA, suggesting that these substitutes may disrupt similar developmental and reproductive pathways. We compared the effects of BPA and BPS on germline and reproductive functions using the genetic model system Caenorhabditis elegans. We found that, similarly to BPA, BPS caused severe reproductive defects including germline apoptosis and embryonic lethality. However, meiotic recombination, targeted gene expression, whole transcriptome and ontology analyses as well as ToxCast data mining all indicate that these effects are partly achieved via mechanisms distinct from BPA's. These findings therefore raise new concerns about the safety of BPA alternatives and the risk associated with human exposure to mixtures.

    Bisphenols exposure induces DNA damage checkpoint kinase CHK-1 activation.

    (A) Immunostaining of phosphorylated CHK-1 on mid- to late-pachytene nuclei from dissected gonads of worms exposed to vehicle control (0.1% ethanol), 500 μM BPA, 500 μM BPS, or their mixture (scale bar, 10 μm). (B) Percentage of examined worms with elevated pCHK-1 in each group. Error bars represent SEM. N = 10 worms per trial, three repeats per treatment group. All tests are based on t statistics. **P<0.01.

    Distinct expression changes of genes implicated in DSBR and DNA damage checkpoints activation pathways.

    The expression levels of target genes were assayed from isolated germlines by quantitative RT-PCR. Error bars represent SEM for 3–4 biological replicates, each tested in duplicate. Two-tailed Student’s t-test between vehicle control (0.1% ethanol) and each treatment group (500 μM BPA or BPS). *P<0.05, **P<0.01 and ***P<0.001.