High Resolution Face Editing with Masked GAN Latent Code Optimization
Face editing represents a popular research topic within the computer vision
and image processing communities. While significant progress has been made
recently in this area, existing solutions: (i) are still largely focused on
low-resolution images, (ii) often generate editing results with visual
artefacts, or (iii) lack fine-grained control and alter multiple (entangled)
attributes at once, when trying to generate the desired facial semantics. In
this paper, we aim to address these issues though a novel attribute editing
approach called MaskFaceGAN. The proposed approach is based on an optimization
procedure that directly optimizes the latent code of a pre-trained
(state-of-the-art) Generative Adversarial Network (i.e., StyleGAN2) with
respect to several constraints that ensure: (i) preservation of relevant image
content, (ii) generation of the targeted facial attributes, and (iii)
spatially selective treatment of local image areas. The constraints are
enforced with the help of a (differentiable) attribute classifier and face
parser that provide the necessary reference information for the optimization
procedure. MaskFaceGAN is evaluated in extensive experiments on the CelebA-HQ,
Helen and SiblingsDB-HQf datasets and in comparison with several
state-of-the-art techniques from the literature, i.e., StarGAN, AttGAN, STGAN,
and two versions of InterFaceGAN. Our experimental results show that the
proposed approach is able to edit face images with respect to several facial
attributes with unprecedented image quality and at high resolutions
(1024x1024), while exhibiting considerably fewer problems with attribute
entanglement than competing solutions. The source code is made freely available
from: https://github.com/MartinPernus/MaskFaceGAN.
Comment: The updated paper will be submitted to IEEE Transactions on Image
Processing. Added more qualitative and quantitative results to the main part
of the paper. This version now also includes the supplementary material.
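The core idea of the approach — directly optimizing a latent code against a weighted sum of an attribute constraint and a spatially masked content-preservation constraint — can be illustrated with a toy quadratic stand-in (a linear "generator" and "classifier" in place of StyleGAN2 and the neural attribute classifier; all dimensions and names below are illustrative, not the paper's actual models):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins (NOT StyleGAN2): a linear "generator" G(z) = W @ z producing
# a 16-pixel "image", and a linear "attribute classifier" score a @ x.
W = rng.standard_normal((16, 8))
a = rng.standard_normal(16)
x_ref = W @ rng.standard_normal(8)       # reference image whose content we keep
mask = np.zeros(16)
mask[:4] = 1.0                           # pixels the edit is allowed to change

def loss_and_grad(z, target=3.0):
    x = W @ z
    l_attr = (a @ x - target) ** 2                    # reach target attribute
    l_pres = np.sum(((1 - mask) * (x - x_ref)) ** 2)  # preserve unmasked pixels
    g = 2.0 * (a @ x - target) * (W.T @ a) + W.T @ ((1 - mask) * (x - x_ref))
    return l_attr + 0.5 * l_pres, g

# Plain gradient descent directly on the latent code, as in
# latent-code-optimization editing.
z = rng.standard_normal(8)
start, _ = loss_and_grad(z)
for _ in range(2000):
    _, g = loss_and_grad(z)
    z -= 0.002 * g
final, _ = loss_and_grad(z)
```

The masked preservation term is what gives the spatially selective behavior: gradients flow into the latent code only insofar as they change the masked region or the attribute score.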
Will smart surveillance systems listen to, understand, and speak Slovenian?
The article discusses spoken language technologies that could one day enable so-called smart surveillance systems to listen to, understand, and speak Slovenian. Using sensors and advanced computational methods for artificial perception and pattern recognition, such systems are to some extent aware of their environment and of the presence of people and other phenomena that may be the subject of security surveillance. Speech is one such phenomenon, and in certain security-surveillance situations it can represent a key source of information. Technologies that enable automatic speech recognition and speech synthesis, as well as automatic recognition of speakers and of their psychophysical state through advanced computational analysis of the speech audio signal, open entirely new dimensions in the development of smart surveillance systems. Automatic recognition of security-suspicious utterances, screaming, and calls for help, together with automatic detection of a speaker's security-suspicious psychophysical state, lends such systems a touch of artificial intelligence. The article presents the current state of development of these technologies, the possibilities of their use for the Slovenian spoken language, and various security-surveillance scenarios in which such systems could be applied. Broader legal and ethical questions raised by the development and use of these technologies are also addressed, since speech surveillance is one of the most sensitive privacy-protection issues.
Compact representation of pronunciation lexicons with finite-state super transducers
Finite-state transducer models enable a compact representation of the pronunciation lexicons used by both speech synthesizers and speech recognizers. The article presents a new type of finite-state transducer, the so-called finite-state super transducer, with which pronunciation lexicons can be represented with fewer states and transitions than with conventional minimal deterministic finite-state transducers. An efficient construction procedure for finite-state super transducers is presented; the resulting transducers remain deterministic and, in addition to the words from a given pronunciation lexicon, can also accept and transduce certain other words that were not represented in the original lexicon. The emitted phonetic transcriptions for some accepted out-of-vocabulary words may be incorrect, but the transduction error turns out to be comparable to the errors achieved by the current best methods for automatic grapheme-to-phoneme conversion for Slovenian. The SI-PRON pronunciation lexicon for Slovenian, which contains more than a million lexicon entries, was used to test and verify the proposed construction procedure. Among other findings, the experiments yielded the surprising result that, once the lexicon grows beyond a certain number of words, the size of the finite-state transducers begins to decrease, which we attribute primarily to the large number of inflected word forms in Slovenian.
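The compression effect can be illustrated with plain prefix sharing, the simplest ingredient of such lexicon automata. The toy lexicon and trie below are illustrative only; the paper's super-transducer construction additionally shares suffix structure and accepts some out-of-vocabulary words:

```python
# Toy grapheme -> phoneme lexicon (hypothetical entries, not SI-PRON).
lexicon = {
    "miza":  "m i z a",
    "mize":  "m i z e",
    "mizo":  "m i z o",
    "mesto": "m e s t o",
}

def build_trie(entries):
    """Share common grapheme prefixes; '#' marks word end and stores output."""
    root = {}
    for word, phones in entries.items():
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node["#"] = phones
    return root

def count_states(node):
    return 1 + sum(count_states(c) for k, c in node.items() if k != "#")

def transduce(trie, word):
    """Look up the phonetic transcription of an in-vocabulary word."""
    node = trie
    for ch in word:
        node = node[ch]
    return node["#"]

trie = build_trie(lexicon)
flat_states = sum(len(w) + 1 for w in lexicon)  # one state per char, no sharing
print(count_states(trie), "<", flat_states)     # prefix sharing uses fewer states
```

For highly inflected languages, many entries differ only in their endings, which is consistent with the abstract's observation that transducer size can even shrink as the lexicon grows.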
Making the most of single sensor information
Most commercially successful face recognition systems combine information from multiple sensors (2D and 3D, visible light and infrared, etc.) to achieve reliable recognition in various environments. When only a single sensor is available, the robustness as well as efficacy of the recognition process suffer. In this paper, we focus on face recognition using images captured by a single 3D sensor and propose a method based on the use of region covariance matrices and Gaussian mixture models (GMMs). All steps of the proposed framework are automated, and no metadata, such as pre-annotated eye, nose, or mouth positions, is required; only a very simple clustering-based face detection is performed. The framework computes a set of region covariance descriptors from local regions of different face image representations and then uses the unscented transform to derive low-dimensional feature vectors, which are finally modeled by GMMs. In the last step, a support vector machine classification scheme is used to make a decision about the identity of the input 3D facial image. The proposed framework has several desirable characteristics, such as an inherent mechanism for data fusion/integration (through the region covariance matrices), the ability to explore facial images at different levels of locality, and the ability to integrate domain-specific prior knowledge into the modeling procedure. Several normalization techniques are incorporated into the proposed framework to further improve performance. Extensive experiments are performed on three prominent databases (FRGC v2, CASIA, and UMB-DB), yielding competitive results.
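The first stage of such a pipeline — summarizing an image region by the covariance of its per-pixel feature vectors — can be sketched as follows. The feature set below (intensity, gradients, coordinates) is a typical illustrative choice, not necessarily the exact set used in the paper:

```python
import numpy as np

def region_covariance(features):
    """Region covariance descriptor: the covariance matrix of the
    per-pixel feature vectors inside an image region."""
    F = np.asarray(features, dtype=float)      # shape (n_pixels, d)
    mu = F.mean(axis=0)
    return (F - mu).T @ (F - mu) / (len(F) - 1)

# Example: per-pixel features for one 10x10 region of a synthetic image.
rng = np.random.default_rng(1)
ys, xs = np.mgrid[0:10, 0:10]
intensity = rng.random((10, 10))
gy, gx = np.gradient(intensity)               # vertical/horizontal gradients
feats = np.stack([intensity, gx, gy, xs, ys], axis=-1).reshape(-1, 5)
C = region_covariance(feats)                  # 5x5 symmetric PSD descriptor
```

A useful property, alluded to in the abstract, is that heterogeneous features are fused into one fixed-size, symmetric positive semi-definite matrix regardless of the region's size.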
Towards Robust 3D Face Verification Using Gaussian Mixture Models
This paper focuses on the use of Gaussian mixture models (GMMs) for 3D face verification. Special interest is taken in practical aspects of 3D face verification systems, where all steps of the verification procedure need to be automated and no metadata, such as pre-annotated eye/nose/mouth positions, is available to the system. In such settings, the performance of the verification system correlates heavily with the performance of the employed alignment (i.e., geometric normalization) procedure. We show that popular holistic as well as local recognition techniques, such as principal component analysis (PCA) or scale-invariant feature transform (SIFT)-based methods, deteriorate considerably in performance when an "imperfect" geometric normalization procedure is used to align the 3D face scans, and that in these situations GMMs should be preferred. Moreover, several possibilities to improve the performance and robustness of the classical GMM framework are presented and evaluated: i) explicit inclusion of spatial information during the GMM construction procedure, ii) implicit inclusion of spatial information during the GMM construction procedure, and iii) online evaluation and possible rejection of local feature vectors based on their likelihood. We successfully demonstrate the feasibility of the proposed modifications on the Face Recognition Grand Challenge data set.
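The score at the heart of GMM-based verification is the log-likelihood of the local feature vectors under a client-specific mixture model. A minimal sketch of the scoring step with a diagonal-covariance GMM follows; the parameters are illustrative (in practice they would be trained with EM), and training itself is omitted:

```python
import numpy as np

def gmm_logpdf(X, weights, means, variances):
    """Log-likelihood of feature vectors X under a diagonal-covariance GMM,
    computed stably with log-sum-exp over the mixture components."""
    X = np.atleast_2d(X)
    d = X.shape[1]
    comp = []
    for w, m, v in zip(weights, means, variances):
        log_norm = -0.5 * (d * np.log(2 * np.pi) + np.sum(np.log(v)))
        quad = -0.5 * np.sum((X - m) ** 2 / v, axis=1)
        comp.append(np.log(w) + log_norm + quad)
    return np.logaddexp.reduce(np.stack(comp), axis=0)

# Toy two-component model in 3-D feature space.
weights = [0.6, 0.4]
means = [np.zeros(3), np.ones(3) * 2]
variances = [np.ones(3), np.ones(3) * 0.5]
scores = gmm_logpdf(np.array([[0, 0, 0], [2, 2, 2]]), weights, means, variances)
```

Verification then reduces to thresholding the (averaged) log-likelihood of a probe scan's feature vectors, optionally rejecting low-likelihood vectors as in modification iii) above.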
Confidence Weighted Subspace Projection Techniques for Robust Face Recognition in the Presence of Partial Occlusions
Subspace projection techniques are known to be susceptible to the presence of partial occlusions in the image data. To overcome this susceptibility, we present in this paper a confidence weighting scheme that assigns weights to pixels according to a measure, which quantifies the confidence that the pixel in question represents an outlier. With this procedure the impact of the occluded pixels on the subspace representation is reduced and robustness to partial occlusions is obtained. Next, the confidence weighting concept is improved by a local procedure for the estimation of the subspace representation. Both the global weighting approach and the local estimation procedure are assessed in face recognition experiments on the AR database, where encouraging results are obtained with partially occluded facial images.
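The weighting idea can be sketched as a weighted least-squares projection onto a PCA subspace, in which low-confidence (potentially occluded) pixels contribute little to the estimated coefficients. All data, dimensions, and the confidence values below are toy illustrations, not the paper's actual weighting measure:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy "training set": 50 images of 20 pixels; PCA basis U via SVD.
X = rng.standard_normal((50, 20))
mean = X.mean(axis=0)
U = np.linalg.svd(X - mean, full_matrices=False)[2][:5].T   # 20 x 5 basis

def weighted_projection(x, weights):
    """Weighted least-squares fit of subspace coefficients:
    c = argmin_c || diag(w)^(1/2) (x - mean - U c) ||^2."""
    W = np.diag(weights)
    c = np.linalg.solve(U.T @ W @ U, U.T @ W @ (x - mean))
    return mean + U @ c

# Occlude half the pixels of a clean in-subspace sample.
clean = mean + U @ rng.standard_normal(5)
occluded = clean.copy()
occluded[:10] += 5.0
w = np.ones(20)
w[:10] = 0.01            # low confidence assigned to the occluded pixels

plain = weighted_projection(occluded, np.ones(20))   # unweighted PCA projection
robust = weighted_projection(occluded, w)            # confidence-weighted
err_plain = np.linalg.norm(plain - clean)
err_robust = np.linalg.norm(robust - clean)
```

Because the occluded pixels barely influence the weighted normal equations, the confidence-weighted reconstruction stays much closer to the clean image than the plain projection, which is exactly the robustness effect the abstract describes.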