426 research outputs found
Fashion Style Generation: Evolutionary Search with Gaussian Mixture Models in the Latent Space
This paper presents a novel approach for guiding a Generative Adversarial
Network trained on the FashionGen dataset to generate designs corresponding to
target fashion styles. Finding the latent vectors in the generator's latent
space that correspond to a style is approached as an evolutionary search
problem. A Gaussian mixture model is applied to identify fashion styles based
on the higher-layer representations of outfits in a clothing-specific attribute
prediction model. Over generations, a genetic algorithm optimizes a population
of designs to increase their probability of belonging to one of the Gaussian
mixture components or styles. Showing that the developed system can generate
images of maximum fitness visually resembling certain styles, our approach
provides a promising direction to guide the search for style-coherent designs.
Comment: to be published at the International Conference on Computational
Intelligence in Music, Sound, Art and Design (EvoMUSART 2022); typo
corrected in abstract
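The evolutionary search described above can be sketched in a few lines. This is a minimal, self-contained illustration only: the GAN generator and the attribute-prediction network are replaced by fixed random linear maps (hypothetical stand-ins, not the paper's trained models), and the target style is a single spherical Gaussian component rather than a fitted mixture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for the paper's components (hypothetical): the GAN
# generator and the attribute-prediction network are replaced by
# fixed random linear maps so the sketch is self-contained.
G = rng.normal(size=(16, 8))   # "generator": latent (8-d) -> image features (16-d)
A = rng.normal(size=(4, 16))   # "attribute model": features -> style space (4-d)

# One Gaussian mixture component (the target style) in style space.
mu = rng.normal(size=4)

def fitness(z):
    # Negative squared distance to the component mean: proportional to
    # the log-density of a spherical Gaussian, up to constants.
    style = A @ (G @ z)
    return -np.sum((style - mu) ** 2)

# Truncation-selection genetic algorithm over latent vectors.
pop = rng.normal(size=(32, 8))
initial_best = max(fitness(z) for z in pop)
for gen in range(200):
    scores = np.array([fitness(z) for z in pop])
    parents = pop[np.argsort(scores)[-8:]]              # keep the 8 fittest
    children = parents[rng.integers(0, 8, size=24)]     # clone parents
    children += 0.1 * rng.normal(size=children.shape)   # Gaussian mutation
    pop = np.vstack([parents, children])                # elitist replacement
final_best = max(fitness(z) for z in pop)
# Elitism guarantees final_best >= initial_best.
```

Because the fittest individuals are carried over unchanged each generation, the best fitness is monotonically non-decreasing; the paper's version scores designs by their probability under a GMM component fitted to attribute-network representations instead of this toy distance.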
Sliced Wasserstein Distance for Learning Gaussian Mixture Models
Gaussian mixture models (GMMs) are powerful parametric tools with many
applications in machine learning and computer vision. Expectation maximization
(EM) is the most popular algorithm for estimating the GMM parameters. However,
EM guarantees only convergence to a stationary point of the log-likelihood
function, which could be arbitrarily worse than the optimal solution. Inspired
by the relationship between the negative log-likelihood function and the
Kullback-Leibler (KL) divergence, we propose an alternative formulation for
estimating the GMM parameters using the sliced Wasserstein distance, which
gives rise to a new algorithm. Specifically, we propose minimizing the
sliced-Wasserstein distance between the mixture model and the data distribution
with respect to the GMM parameters. In contrast to the KL-divergence, the
energy landscape for the sliced-Wasserstein distance is more well-behaved and
therefore more suitable for a stochastic gradient descent scheme to obtain the
optimal GMM parameters. We show that our formulation results in parameter
estimates that are more robust to random initializations and demonstrate that
it can estimate high-dimensional data distributions more faithfully than the EM
algorithm.
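The core quantity in this formulation, the sliced Wasserstein distance between samples, has a simple Monte-Carlo estimator: project both samples onto random directions and use the closed-form 1-D Wasserstein distance (sorted projections). A minimal numpy sketch, with a toy two-component mixture standing in for real data (the paper additionally differentiates this distance through the GMM parameters with SGD, which is not shown here):

```python
import numpy as np

rng = np.random.default_rng(1)

def sliced_wasserstein(X, Y, n_proj=100):
    """Monte-Carlo estimate of the sliced 2-Wasserstein distance between
    equal-size empirical samples X and Y: project onto random unit
    directions, then compare sorted 1-D projections."""
    d = X.shape[1]
    dirs = rng.normal(size=(n_proj, d))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    px = np.sort(X @ dirs.T, axis=0)   # (n, n_proj) sorted projections
    py = np.sort(Y @ dirs.T, axis=0)
    return np.sqrt(np.mean((px - py) ** 2))

def sample_gmm(means, n):
    # Equal-weight spherical mixture; a toy stand-in for a fitted GMM.
    comps = rng.integers(0, len(means), size=n)
    return np.asarray(means)[comps] + rng.normal(size=(n, 2))

data = sample_gmm([[-4, 0], [4, 0]], 500)
good = sample_gmm([[-4, 0], [4, 0]], 500)   # same parameters as the data
bad  = sample_gmm([[0, 0], [0, 0]], 500)    # collapsed, mismatched mixture

d_good = sliced_wasserstein(data, good)
d_bad = sliced_wasserstein(data, bad)
# The matching mixture is much closer to the data than the collapsed one.
```

In the paper's setting, `good`/`bad` would be samples drawn from candidate GMM parameters, and the distance would be minimized with stochastic gradient descent via a reparameterized sampler.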
Adversarial Network Bottleneck Features for Noise Robust Speaker Verification
In this paper, we propose a noise robust bottleneck feature representation
which is generated by an adversarial network (AN). The AN includes two cascade
connected networks, an encoding network (EN) and a discriminative network (DN).
Mel-frequency cepstral coefficients (MFCCs) of clean and noisy speech are used
as input to the EN and the output of the EN is used as the noise robust
feature. The EN and DN are trained in turn, namely, when training the DN, noise
types are selected as the training labels and when training the EN, all labels
are set as the same, i.e., the clean speech label, which aims to make the AN
features invariant to noise and thus achieve noise robustness. We evaluate the
performance of the proposed feature on a Gaussian Mixture Model-Universal
Background Model based speaker verification system, and make comparison to MFCC
features of speech enhanced by short-time spectral amplitude minimum mean
square error (STSA-MMSE) and deep neural network-based speech enhancement
(DNN-SE) methods. Experimental results on the RSR2015 database show that the
proposed AN bottleneck feature (AN-BN) dramatically outperforms the STSA-MMSE
and DNN-SE based MFCCs for different noise types and signal-to-noise ratios.
Furthermore, the AN-BN feature is able to improve the speaker verification
performance under the clean condition.
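The alternating label-flipping scheme above can be illustrated with linear stand-ins for the EN and DN (the paper uses deep networks; the frame dimensions, class counts, and additive-offset "noise types" here are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy stand-in for MFCC frames (hypothetical dimensions): class 0 is
# clean speech; classes 1-2 are noise types made by adding a
# type-specific offset to clean frames.
d, K, n = 20, 3, 300
clean = rng.normal(size=(n, d))
offsets = np.vstack([np.zeros(d)] + [rng.normal(size=d) for _ in range(K - 1)])
labels = rng.integers(0, K, size=n)
X = clean + offsets[labels]

b = 8                                  # bottleneck dimension
W1 = 0.1 * rng.normal(size=(b, d))     # encoding network (EN), linear
W2 = 0.1 * rng.normal(size=(K, b))     # discriminative network (DN), linear
lr = 0.05

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def ce_grad(P, y):
    # Gradient of mean cross-entropy loss w.r.t. the logits.
    G = P.copy()
    G[np.arange(len(y)), y] -= 1.0
    return G / len(y)

for step in range(200):
    E = X @ W1.T                           # bottleneck features
    # DN step: learn to predict the true noise type from the bottleneck.
    P = softmax(E @ W2.T)
    dZ = ce_grad(P, labels)
    W2 -= lr * (dZ.T @ E)
    # EN step: relabel every frame as clean (class 0) and update only
    # the encoder, pushing the features toward noise invariance.
    P = softmax(E @ W2.T)
    dZ = ce_grad(P, np.zeros(n, dtype=int))
    W1 -= lr * ((dZ @ W2).T @ X)

features = X @ W1.T                        # noise-robust bottleneck features
```

The two gradient steps use the same cross-entropy loss but opposite label assignments, which is what makes the training adversarial: the DN tries to recover the noise type while the EN tries to make every frame look clean.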
Broadband Ground Motion Synthesis via Generative Adversarial Neural Operators: Development and Validation
We present a data-driven framework for ground-motion synthesis that generates
three-component acceleration time histories conditioned on moment magnitude,
rupture distance (Rrup), time-averaged shear-wave velocity in the top 30 m
(Vs30), and style of faulting. We use a Generative Adversarial Neural
Operator (GANO), a resolution-invariant architecture that makes model
training independent of the data sampling frequency. We first present the
conditional ground-motion synthesis algorithm (cGM-GANO) and discuss its
advantages compared to previous work. We next train cGM-GANO on simulated
ground motions generated by the Southern California Earthquake Center Broadband
Platform (BBP) and on recorded KiK-net data and show that the model can learn
the overall magnitude, distance, and Vs30 scaling of effective amplitude
spectra (EAS) ordinates and pseudo-spectral accelerations (PSA). Results
specifically show that cGM-GANO produces consistent median scaling with the
training data for the corresponding tectonic environments over a wide range of
frequencies for scenarios with sufficient data coverage. For the BBP dataset,
cGM-GANO cannot learn the ground motion scaling of the stochastic frequency
components; for the KiK-net dataset, the largest misfit is observed at short
distances and for soft soil conditions due to the scarcity of such data. Except
for these conditions, the aleatory variability of EAS and PSA is captured
reasonably well. Lastly, cGM-GANO produces similar median scaling to
traditional GMMs for frequencies greater than 1 Hz for both PSA and EAS but
underestimates the aleatory variability of EAS. Discrepancies in the
comparisons between the synthetic ground motions and GMMs are attributed to
inconsistencies between the training dataset and the datasets used in GMM
development. Our pilot study demonstrates GANO's potential for efficient
synthesis of broadband ground motions.