28 research outputs found

    High Resolution Face Editing with Masked GAN Latent Code Optimization

    Full text link
    Face editing represents a popular research topic within the computer vision and image processing communities. While significant progress has been made recently in this area, existing solutions: (i) are still largely focused on low-resolution images, (ii) often generate editing results with visual artefacts, or (iii) lack fine-grained control and alter multiple (entangled) attributes at once when trying to generate the desired facial semantics. In this paper, we aim to address these issues through a novel attribute editing approach called MaskFaceGAN. The proposed approach is based on an optimization procedure that directly optimizes the latent code of a pre-trained (state-of-the-art) Generative Adversarial Network (i.e., StyleGAN2) with respect to several constraints that ensure: (i) preservation of relevant image content, (ii) generation of the targeted facial attributes, and (iii) spatially selective treatment of local image areas. The constraints are enforced with the help of a (differentiable) attribute classifier and face parser that provide the necessary reference information for the optimization procedure. MaskFaceGAN is evaluated in extensive experiments on the CelebA-HQ, Helen and SiblingsDB-HQf datasets and in comparison with several state-of-the-art techniques from the literature, i.e., StarGAN, AttGAN, STGAN, and two versions of InterFaceGAN. Our experimental results show that the proposed approach is able to edit face images with respect to several facial attributes with unprecedented image quality and at high resolutions (1024x1024), while exhibiting considerably fewer problems with attribute entanglement than competing solutions. The source code is made freely available from: https://github.com/MartinPernus/MaskFaceGAN. Comment: The updated paper will be submitted to IEEE Transactions on Image Processing. Added more qualitative and quantitative results to the main part of the paper. This version now also includes the supplementary material.
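    The abstract above describes an optimization directly over the latent code of a pre-trained generator, driven by content-preservation, attribute, and mask constraints. A minimal PyTorch sketch of such a loop is given below; the generator and attribute classifier are untrained stand-ins for StyleGAN2 and the paper's networks, and the mask, target attribute, and loss weights are illustrative assumptions rather than values from the paper.

    # Hedged sketch: optimize a latent code so that the generated image keeps its
    # content outside a mask while an attribute classifier is satisfied.
    # Stand-in modules only; the real method uses StyleGAN2 and a face parser.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ToyGenerator(nn.Module):                 # stand-in for StyleGAN2
        def __init__(self, latent_dim=64, size=32):
            super().__init__()
            self.size = size
            self.net = nn.Sequential(nn.Linear(latent_dim, 3 * size * size), nn.Tanh())
        def forward(self, w):
            return self.net(w).view(-1, 3, self.size, self.size)

    class ToyAttributeClassifier(nn.Module):       # stand-in for the attribute classifier
        def __init__(self, size=32):
            super().__init__()
            self.net = nn.Linear(3 * size * size, 1)
        def forward(self, img):
            return torch.sigmoid(self.net(img.flatten(1)))

    generator, classifier = ToyGenerator(), ToyAttributeClassifier()
    for p in list(generator.parameters()) + list(classifier.parameters()):
        p.requires_grad_(False)                    # both networks stay frozen

    original = generator(torch.randn(1, 64)).detach()   # image to be edited
    mask = torch.zeros(1, 1, 32, 32)
    mask[:, :, 8:20, 8:20] = 1.0                   # editable region (from a face parser in the paper)
    target_attr = torch.ones(1, 1)                 # desired value of the edited attribute

    w = torch.randn(1, 64, requires_grad=True)     # latent code being optimized
    optimizer = torch.optim.Adam([w], lr=0.05)
    for step in range(200):
        optimizer.zero_grad()
        img = generator(w)
        loss_content = F.l1_loss(img * (1 - mask), original * (1 - mask))  # (i) preserve content
        loss_attr = F.binary_cross_entropy(classifier(img), target_attr)   # (ii) hit the attribute
        loss = loss_content + 0.5 * loss_attr      # weights are illustrative assumptions
        loss.backward()
        optimizer.step()

    The mask restricted content loss is what makes the treatment spatially selective: edits are free inside the parser-derived region and penalized outside it.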

    Will smart surveillance systems listen, understand, and speak Slovene?

    Get PDF
    The article discusses spoken language technologies that could one day enable so-called smart surveillance systems to listen to, understand, and speak Slovene. Using sensors and advanced computational methods for artificial perception and pattern recognition, such systems are to some extent aware of their environment and of the presence of people and other phenomena that may be the subject of security surveillance. Speech is one such phenomenon and can be a key source of information in certain surveillance situations. Technologies that enable automatic speech recognition and synthesis, as well as automatic recognition of speakers and of their psychophysical state through advanced computational analysis of the speech audio signal, open entirely new dimensions in the development of smart surveillance systems. Automatic recognition of security-suspicious utterances, screaming, and calls for help, together with automatic detection of a speaker's security-suspicious psychophysical state, gives such systems a touch of artificial intelligence. The article presents the current state of development of these technologies and the possibilities of their use for the Slovene spoken language, as well as various security-surveillance scenarios in which such systems could be applied. Broader legal and ethical questions raised by the development and use of these technologies are also addressed, since speech surveillance is one of the most sensitive privacy issues.

    Compact representation of pronunciation lexicons with finite-state super transducers

    Get PDF
    Finite-state transducer models enable a compact representation of the pronunciation lexicons used by both speech synthesizers and speech recognizers. The article presents a new type of finite-state transducer, the finite-state super transducer, with which a pronunciation lexicon can be represented with fewer states and transitions than with conventional minimal deterministic finite-state transducers. An efficient procedure for building finite-state super transducers is presented; the resulting transducers remain deterministic and, in addition to the words from the given pronunciation lexicon, can also accept and transduce some other words that were not present in the original lexicon. The emitted phonetic transcriptions for some of the accepted out-of-vocabulary words may be incorrect, but the transduction error turns out to be comparable to the errors achieved by the current best methods for automatic grapheme-to-phoneme conversion for Slovene. The proposed construction procedure was tested and verified on the SI-PRON pronunciation lexicon for Slovene, which contains more than a million lexicon entries. Among other things, the experimental results yielded the surprising finding that, once the lexicon grows beyond a certain number of words, the size of the finite-state transducers begins to decrease, which we attribute mainly to the large number of inflected word forms in Slovene.
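    The super transducer construction itself is not detailed in the abstract, so the sketch below only illustrates the underlying idea of packing a pronunciation lexicon into a deterministic finite-state machine with outputs: graphemes drive the transitions of a shared-prefix trie and the phonetic transcription is emitted at the accepting state. The toy lexicon and phone symbols are assumptions, and real transducers would additionally push outputs onto the arcs and share suffixes to save states.

    # Hedged sketch: a pronunciation lexicon stored as a deterministic
    # grapheme-to-phoneme machine (shared-prefix trie, outputs at final states).
    class State:
        def __init__(self):
            self.arcs = {}            # input grapheme -> next State
            self.final = False
            self.output = None        # phonetic transcription emitted when final

    def build_lexicon_fst(lexicon):
        root = State()
        for word, phones in lexicon.items():
            node = root
            for grapheme in word:     # deterministic: at most one arc per grapheme
                node = node.arcs.setdefault(grapheme, State())
            node.final = True
            node.output = phones
        return root

    def transduce(root, word):
        node = root
        for grapheme in word:
            if grapheme not in node.arcs:
                return None           # word not accepted by the machine
            node = node.arcs[grapheme]
        return node.output if node.final else None

    toy_lexicon = {                   # made-up Slovene-like entries
        "miza": "m i z a",
        "mize": "m i z e",
        "mizo": "m i z o",
    }
    fst = build_lexicon_fst(toy_lexicon)
    print(transduce(fst, "mizo"))     # -> "m i z o"
    print(transduce(fst, "mizam"))    # -> None (out-of-vocabulary)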

    Making the most of single sensor information

    Full text link
    Most commercially successful face recognition systems combine information from multiple sensors (2D and 3D, visible light and infrared, etc.) to achieve reliable recognition in various environments. When only a single sensor is available, the robustness as well as the efficacy of the recognition process suffer. In this paper, we focus on face recognition using images captured by a single 3D sensor and propose a method based on the use of region covariance matrices and Gaussian mixture models (GMMs). All steps of the proposed framework are automated, and no metadata, such as pre-annotated eye, nose, or mouth positions, is required; only a very simple clustering-based face detection is performed. The framework computes a set of region covariance descriptors from local regions of different face image representations and then uses the unscented transform to derive low-dimensional feature vectors, which are finally modeled by GMMs. In the last step, a support vector machine classification scheme is used to make a decision about the identity of the input 3D facial image. The proposed framework has several desirable characteristics, such as an inherent mechanism for data fusion/integration (through the region covariance matrices), the ability to explore facial images at different levels of locality, and the ability to integrate domain-specific prior knowledge into the modeling procedure. Several normalization techniques are incorporated into the proposed framework to further improve performance. Extensive experiments are performed on three prominent databases (FRGC v2, CASIA, and UMB-DB), yielding competitive results.
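    To make the region covariance idea above concrete, the numpy sketch below pools per-pixel feature vectors over a local patch into a covariance matrix and vectorizes its upper triangle as the descriptor. The particular feature map (position, intensity, gradient magnitudes) is a common but assumed choice, and the unscented-transform reduction, GMM modeling, and SVM classification stages are not reproduced here.

    # Hedged sketch: region covariance descriptor of one local image patch.
    import numpy as np

    def region_covariance(image, top, left, size):
        h, w = image.shape
        gy, gx = np.gradient(image.astype(float))           # image gradients
        ys, xs = np.mgrid[0:h, 0:w]
        feats = np.stack([xs, ys, image, np.abs(gx), np.abs(gy)], axis=-1)
        patch = feats[top:top + size, left:left + size].reshape(-1, 5)
        return np.cov(patch, rowvar=False)                   # 5 x 5 covariance matrix

    rng = np.random.default_rng(0)
    img = rng.random((64, 64))                                # stand-in face image
    C = region_covariance(img, 16, 16, 24)
    descriptor = C[np.triu_indices(5)]                        # 15-dimensional descriptor
    print(descriptor.shape)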

    Utilizing forced alignment for phonetic analysis of Slovene speech

    No full text

    Towards Robust 3D Face Verification Using Gaussian Mixture Models

    No full text
    This paper focuses on the use of Gaussian mixture models (GMMs) for 3D face verification. A special interest is taken in the practical aspects of 3D face verification systems, where all steps of the verification procedure need to be automated and no metadata, such as pre-annotated eye/nose/mouth positions, is available to the system. In such settings, the performance of the verification system correlates heavily with the performance of the employed alignment (i.e., geometric normalization) procedure. We show that popular holistic as well as local recognition techniques, such as principal component analysis (PCA) or scale-invariant feature transform (SIFT)-based methods, deteriorate considerably in their performance when an "imperfect" geometric normalization procedure is used to align the 3D face scans, and that in these situations GMMs should be preferred. Moreover, several possibilities to improve the performance and robustness of the classical GMM framework are presented and evaluated: i) explicit inclusion of spatial information during the GMM construction procedure, ii) implicit inclusion of spatial information during the GMM construction procedure, and iii) on-line evaluation and possible rejection of local feature vectors based on their likelihood. We successfully demonstrate the feasibility of the proposed modifications on the Face Recognition Grand Challenge dataset.
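    As a small illustration of the verification step implied above, the sketch below fits a client GMM and a background GMM and thresholds the average log-likelihood ratio of a probe; this likelihood-ratio formulation is a standard way of using GMMs for verification and is assumed here, as are the synthetic feature vectors and the threshold. Feature extraction from 3D scans and the spatial-information extensions from the paper are omitted.

    # Hedged sketch: GMM verification via a client-vs-background log-likelihood ratio.
    import numpy as np
    from sklearn.mixture import GaussianMixture

    rng = np.random.default_rng(0)
    background = rng.normal(0.0, 1.0, size=(2000, 8))    # "world" feature vectors
    enrolment = rng.normal(0.8, 1.0, size=(300, 8))       # one client's feature vectors

    ubm = GaussianMixture(n_components=4, random_state=0).fit(background)
    client = GaussianMixture(n_components=4, random_state=0).fit(enrolment)

    def verify(features, threshold=0.0):
        # average log-likelihood ratio; the threshold is an assumed operating point
        llr = client.score(features) - ubm.score(features)
        return llr > threshold, llr

    genuine = rng.normal(0.8, 1.0, size=(100, 8))
    impostor = rng.normal(0.0, 1.0, size=(100, 8))
    print(verify(genuine))     # expected: accepted (positive ratio)
    print(verify(impostor))    # expected: rejected (negative ratio)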

    Confidence Weighted Subspace Projection Techniques for Robust Face Recognition in the Presence of Partial Occlusions

    No full text
    Subspace projection techniques are known to be susceptible to the presence of partial occlusions in the image data. To overcome this susceptibility, we present in this paper a confidence weighting scheme that assigns weights to pixels according to a measure that quantifies the confidence that the pixel in question represents an outlier. With this procedure, the impact of the occluded pixels on the subspace representation is reduced and robustness to partial occlusions is obtained. Next, the confidence weighting concept is improved by a local procedure for the estimation of the subspace representation. Both the global weighting approach and the local estimation procedure are assessed in face recognition experiments on the AR database, where encouraging results are obtained with partially occluded facial images.
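    The numpy sketch below is one way to make the weighting idea concrete: an occluded image is projected onto a PCA subspace with per-pixel confidence weights via weighted least squares, so pixels flagged as likely outliers barely influence the estimated coefficients. The synthetic data, the hand-set weights, and the subspace dimension are illustrative assumptions; in the paper the weights come from an outlier-confidence measure and a local estimation procedure is added on top.

    # Hedged sketch: confidence-weighted projection onto a PCA subspace.
    import numpy as np

    rng = np.random.default_rng(0)
    train = rng.random((200, 100))                        # 200 training "faces", 100 pixels each
    mean = train.mean(axis=0)
    _, _, Vt = np.linalg.svd(train - mean, full_matrices=False)
    B = Vt[:20].T                                         # 100 x 20 subspace basis

    probe = train[0].copy()
    probe[:30] = 1.0                                      # simulate an occlusion over 30 pixels
    weights = np.ones(100)
    weights[:30] = 0.01                                   # low confidence on (assumed known) occluded pixels

    # weighted least squares: argmin_a || W^(1/2) (probe - mean - B a) ||^2
    W = np.diag(weights)
    a = np.linalg.solve(B.T @ W @ B, B.T @ W @ (probe - mean))
    robust_rec = mean + B @ a

    a_plain, *_ = np.linalg.lstsq(B, probe - mean, rcond=None)
    plain_rec = mean + B @ a_plain
    print(np.linalg.norm(robust_rec - train[0]),           # weighted estimate is typically
          np.linalg.norm(plain_rec - train[0]))            # closer to the clean face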