426 research outputs found

    Fashion Style Generation: Evolutionary Search with Gaussian Mixture Models in the Latent Space

    Get PDF
    This paper presents a novel approach for guiding a Generative Adversarial Network trained on the FashionGen dataset to generate designs corresponding to target fashion styles. Finding the latent vectors in the generator's latent space that correspond to a style is approached as an evolutionary search problem. A Gaussian mixture model is applied to identify fashion styles based on the higher-layer representations of outfits in a clothing-specific attribute prediction model. Over generations, a genetic algorithm optimizes a population of designs to increase their probability of belonging to one of the Gaussian mixture components or styles. Showing that the developed system can generate images of maximum fitness visually resembling certain styles, our approach provides a promising direction to guide the search for style-coherent designs.Comment: - to be published at: International Conference on Computational Intelligence in Music, Sound, Art and Design : EvoMUSART 2022 - typo corrected in abstrac

    Sliced Wasserstein Distance for Learning Gaussian Mixture Models

    Full text link
    Gaussian mixture models (GMM) are powerful parametric tools with many applications in machine learning and computer vision. Expectation maximization (EM) is the most popular algorithm for estimating the GMM parameters. However, EM guarantees only convergence to a stationary point of the log-likelihood function, which could be arbitrarily worse than the optimal solution. Inspired by the relationship between the negative log-likelihood function and the Kullback-Leibler (KL) divergence, we propose an alternative formulation for estimating the GMM parameters using the sliced Wasserstein distance, which gives rise to a new algorithm. Specifically, we propose minimizing the sliced-Wasserstein distance between the mixture model and the data distribution with respect to the GMM parameters. In contrast to the KL-divergence, the energy landscape for the sliced-Wasserstein distance is more well-behaved and therefore more suitable for a stochastic gradient descent scheme to obtain the optimal GMM parameters. We show that our formulation results in parameter estimates that are more robust to random initializations and demonstrate that it can estimate high-dimensional data distributions more faithfully than the EM algorithm

    Adversarial Network Bottleneck Features for Noise Robust Speaker Verification

    Full text link
    In this paper, we propose a noise robust bottleneck feature representation which is generated by an adversarial network (AN). The AN includes two cascade connected networks, an encoding network (EN) and a discriminative network (DN). Mel-frequency cepstral coefficients (MFCCs) of clean and noisy speech are used as input to the EN and the output of the EN is used as the noise robust feature. The EN and DN are trained in turn, namely, when training the DN, noise types are selected as the training labels and when training the EN, all labels are set as the same, i.e., the clean speech label, which aims to make the AN features invariant to noise and thus achieve noise robustness. We evaluate the performance of the proposed feature on a Gaussian Mixture Model-Universal Background Model based speaker verification system, and make comparison to MFCC features of speech enhanced by short-time spectral amplitude minimum mean square error (STSA-MMSE) and deep neural network-based speech enhancement (DNN-SE) methods. Experimental results on the RSR2015 database show that the proposed AN bottleneck feature (AN-BN) dramatically outperforms the STSA-MMSE and DNN-SE based MFCCs for different noise types and signal-to-noise ratios. Furthermore, the AN-BN feature is able to improve the speaker verification performance under the clean condition

    Broadband Ground Motion Synthesis via Generative Adversarial Neural Operators: Development and Validation

    Full text link
    We present a data-driven framework for ground-motion synthesis that generates three-component acceleration time histories conditioned on moment magnitude, rupture distance , time-average shear-wave velocity at the top 30m30m (VS30V_{S30}), and style of faulting. We use a Generative Adversarial Neural Operator (GANO), a resolution invariant architecture that guarantees model training independent of the data sampling frequency. We first present the conditional ground-motion synthesis algorithm (cGM-GANO) and discuss its advantages compared to previous work. We next train cGM-GANO on simulated ground motions generated by the Southern California Earthquake Center Broadband Platform (BBP) and on recorded KiK-net data and show that the model can learn the overall magnitude, distance, and VS30V_{S30} scaling of effective amplitude spectra (EAS) ordinates and pseudo-spectral accelerations (PSA). Results specifically show that cGM-GANO produces consistent median scaling with the training data for the corresponding tectonic environments over a wide range of frequencies for scenarios with sufficient data coverage. For the BBP dataset, cGM-GANO cannot learn the ground motion scaling of the stochastic frequency components; for the KiK-net dataset, the largest misfit is observed at short distances and for soft soil conditions due to the scarcity of such data. Except for these conditions, the aleatory variability of EAS and PSA are captured reasonably well. Lastly, cGM-GANO produces similar median scaling to traditional GMMs for frequencies greater than 1Hz for both PSA and EAS but underestimates the aleatory variability of EAS. Discrepancies in the comparisons between the synthetic ground motions and GMMs are attributed to inconsistencies between the training dataset and the datasets used in GMM development. Our pilot study demonstrates GANO's potential for efficient synthesis of broad-band ground motion
    corecore