
    Speech Processing in Computer Vision Applications

    Deep learning has recently proven to be a viable asset for determining features in the field of speech analysis. Deep learning methods such as Convolutional Neural Networks facilitate the extraction of specific feature information from waveforms, allowing networks to create more feature-dense representations of data. Our work addresses the problems of re-creating a face from a speaker's voice and of speaker identification using deep learning methods. In this work, we first review the fundamental background in speech processing and its related applications. Then we introduce novel deep learning-based methods for speech feature analysis. Finally, we present our deep learning approaches to speaker identification and speech-to-face synthesis. The presented method can convert a speaker's audio sample into an image of their predicted face. This framework is composed of several networks chained together, each performing an essential step in the conversion process: audio embedding, encoding, and face generation networks, respectively. Our experiments show that certain audio features map to the face, that DNNs can generate a speaker's face from their voice, and that a GUI can be used in conjunction to display a speaker recognition network's data.
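The chained audio-embedding → encoder → face-generator pipeline described above can be sketched as a pure data-flow skeleton. This is a minimal illustration, not the paper's actual architecture: the function names, dimensions, and random-matrix "networks" below are all placeholder assumptions that only demonstrate how data moves through the three stages.

```python
import numpy as np

rng = np.random.default_rng(0)

def audio_embedding(waveform, dim=512):
    """Stand-in for the audio embedding network: waveform -> speaker vector."""
    W = rng.standard_normal((dim, waveform.shape[0]))
    return W @ waveform

def encoder(embedding, latent_dim=128):
    """Stand-in for the encoding network: speaker vector -> face latent code."""
    W = rng.standard_normal((latent_dim, embedding.shape[0]))
    return np.tanh(W @ embedding)

def face_generator(latent, size=64):
    """Stand-in for the face generation network: latent code -> image grid."""
    W = rng.standard_normal((size * size, latent.shape[0]))
    return (W @ latent).reshape(size, size)

waveform = rng.standard_normal(16000)   # roughly 1 s of 16 kHz audio
face = face_generator(encoder(audio_embedding(waveform)))
print(face.shape)                       # (64, 64)
```

In the real system each stage would be a trained deep network (the generator typically a GAN-style decoder); the point here is only that each stage consumes the previous stage's output, so the stages can be developed and swapped independently.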

    PANDA: Pose Aligned Networks for Deep Attribute Modeling

    We propose a method for inferring human attributes (such as gender, hair style, clothing style, expression, and action) from images of people under large variations of viewpoint, pose, appearance, articulation, and occlusion. Convolutional Neural Networks (CNNs) have been shown to perform very well on large-scale object recognition problems. In the context of attribute classification, however, the signal is often subtle and may cover only a small part of the image, while the image is dominated by the effects of pose and viewpoint. Discounting for pose variation would require training on very large labeled datasets, which are not presently available. Part-based models, such as poselets and DPM, have been shown to perform well for this problem, but they are limited by shallow low-level features. We propose a new method that combines part-based models and deep learning by training pose-normalized CNNs. We show substantial improvement over state-of-the-art methods on challenging attribute classification tasks in unconstrained settings. Experiments confirm that our method outperforms both the best part-based methods on this problem and conventional CNNs trained on the full bounding box of the person.
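The part-based idea above can be sketched as: extract pose-aligned part crops, featurize each part, and classify attributes on the concatenated part features. Everything below is an illustrative assumption, not PANDA's actual design: fixed crop boxes stand in for poselet detections, and a random projection stands in for the per-part CNN.

```python
import numpy as np

rng = np.random.default_rng(0)

def part_crops(image, boxes):
    """Crop pose-aligned parts (e.g. head, torso) from a person image."""
    return [image[y0:y1, x0:x1] for (y0, y1, x0, x1) in boxes]

def featurize(crop, dim=64):
    """Stand-in for a per-part CNN: project the flattened crop to a feature."""
    flat = crop.reshape(-1)
    W = rng.standard_normal((dim, flat.shape[0]))
    return W @ flat

image = rng.standard_normal((128, 64))        # grayscale person detection
boxes = [(0, 32, 0, 64), (32, 96, 0, 64)]     # hypothetical head and torso boxes
features = np.concatenate([featurize(c) for c in part_crops(image, boxes)])

# One linear attribute classifier on the pooled part features (random weights here).
score = float(features @ rng.standard_normal(features.shape[0]))
print(features.shape)                         # (128,)
```

The design point is that each part is normalized to a canonical pose before featurization, so the attribute classifier never has to model pose variation itself.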

    Conditional Similarity Networks

    What makes images similar? To measure the similarity between images, they are typically embedded in a feature-vector space in which their distances preserve the relative dissimilarity. However, when learning such similarity embeddings, the simplifying assumption is commonly made that images are only compared by one unique measure of similarity. A main reason for this is that contradicting notions of similarity cannot be captured in a single space. To address this shortcoming, we propose Conditional Similarity Networks (CSNs), which learn embeddings differentiated into semantically distinct subspaces that capture the different notions of similarity. CSNs jointly learn a disentangled embedding, where features for different similarities are encoded in separate dimensions, as well as masks that select and reweight relevant dimensions to induce a subspace encoding a specific similarity notion. We show that our approach learns interpretable image representations with visually relevant semantic subspaces. Further, when evaluating on triplet questions from multiple similarity notions, our model even outperforms the accuracy obtained by training individual specialized networks for each notion separately.
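The masked-subspace mechanism above is concrete enough to sketch: one shared embedding, plus a per-notion mask that reweights dimensions so distances are measured only within that notion's subspace. The notion names, hard 0/1 masks, and dimensions below are simplifying assumptions; in CSNs the masks are learned, nonnegative weights.

```python
import numpy as np

rng = np.random.default_rng(1)

dim = 8
embed = rng.standard_normal((3, dim))    # shared embeddings of 3 images

# Hypothetical masks carving the shared space into two notion subspaces.
masks = {
    "color": np.array([1, 1, 1, 1, 0, 0, 0, 0], dtype=float),
    "shape": np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float),
}

def conditional_distance(x, y, mask):
    """Euclidean distance restricted to the mask-selected subspace."""
    d = (x - y) * mask
    return float(np.sqrt(d @ d))

# The same pair of images can be near under one notion and far under another.
d_color = conditional_distance(embed[0], embed[1], masks["color"])
d_shape = conditional_distance(embed[0], embed[1], masks["shape"])
```

Because all notions share one embedding network and differ only in a cheap per-notion mask, adding a new similarity notion does not require training a whole new specialized network.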

    Computed tomographic morphometry of tympanic bulla shape and position in brachycephalic and mesaticephalic dog breeds

    Anatomic variations in skull morphology have been previously described for brachycephalic dogs; however, there is little published information on interbreed variations in tympanic bulla morphology. This retrospective observational study aimed to (1) provide detailed descriptions of the computed tomographic (CT) morphology of tympanic bullae in a sample of dogs representing four brachycephalic breeds (Pugs, French Bulldogs, English Bulldogs, and Cavalier King Charles Spaniels) versus two mesaticephalic breeds (Labrador Retrievers and Jack Russell Terriers); and (2) test associations between tympanic bulla morphology and presence of middle ear effusion. Archived head CT scans for the above dog breeds were retrieved, and a single observer measured tympanic bulla shape (width:height ratio), wall thickness, position relative to the temporomandibular joint, and relative volume (volume:body weight ratio). A total of 127 dogs were sampled. Cavalier King Charles Spaniels had significantly flatter tympanic bullae (greater width:height ratios) than Pugs, English Bulldogs, Labrador Retrievers, and Jack Russell Terriers. French Bulldogs and Pugs had significantly more overlap between tympanic bullae and temporomandibular joints than other breeds. All brachycephalic breeds had significantly lower tympanic bulla volume:weight ratios than Labrador Retrievers. Soft tissue attenuating material (middle ear effusion) was present in the middle ear of 48/100 (48%) of the brachycephalic dogs, but no significant association was found between tympanic bulla CT measurements and presence of this material. Findings indicated that there are significant interbreed variations in tympanic bulla morphology; however, no significant relationship between tympanic bulla morphology and presence of middle ear effusion could be identified.