19 research outputs found

    Efficient face recognition based on sequential analysis of neural network descriptors and detection of minority classes

    Π˜ΡΡΠ»Π΅Π΄ΡƒΡŽΡ‚ΡΡ способы ΠΏΠΎΠ²Ρ‹ΡˆΠ΅Π½ΠΈΡ точности распознавания Π»ΠΈΡ† Π½Π° основС обнаруТСния Π²Ρ…ΠΎΠ΄Π½Ρ‹Ρ… ΠΈΠ·ΠΎΠ±Ρ€Π°ΠΆΠ΅Π½ΠΈΠΉ, ΠΊΠΎΡ‚ΠΎΡ€Ρ‹Π΅ Ρ€Π΅Π΄ΠΊΠΎ Π²ΡΡ‚Ρ€Π΅Ρ‡Π°ΡŽΡ‚ΡΡ Π² Π½Π°Π±ΠΎΡ€Π°Ρ… Π΄Π°Π½Π½Ρ‹Ρ…, ΠΈΡΠΏΠΎΠ»ΡŒΠ·ΡƒΡŽΡ‰ΠΈΡ…ΡΡ для обучСния нСйросСтСвых дСскрипторов. Π’ соврСмСнных свободно распространяСмых ΠΎΠ±ΡƒΡ‡Π°ΡŽΡ‰ΠΈΡ… Π²Ρ‹Π±ΠΎΡ€ΠΊΠ°Ρ… обычнопрСдставлСны изобраТСния людСй Π² основном срСднСго возраста ΠΈ Π΅Π²Ρ€ΠΎΠΏΠ΅ΠΎΠΈΠ΄Π½ΠΎΠΉ расы, ΠΈΠ·-Π·Π° этого Π±ΠΎΠ»ΡŒΡˆΠΈΠ½ΡΡ‚Π²ΠΎ Π°Π»Π³ΠΎΡ€ΠΈΡ‚ΠΌΠΎΠ² ΠΎΡˆΠΈΠ±Π°ΡŽΡ‚ΡΡ Π½Π° изобраТСниях ΠΏΠΎΠΆΠΈΠ»Ρ‹Ρ… людСй ΠΈΠ»ΠΈ Π΄Π΅Ρ‚Π΅ΠΉ, Π»ΠΈΡ†Π°Ρ… Π±ΠΎΠ»Π΅Π΅ Ρ€Π΅Π΄ΠΊΠΈΡ… Π½Π°Ρ†ΠΈΠΎΠ½Π°Π»ΡŒΠ½ΠΎΡΡ‚Π΅ΠΉ ΠΈ Ρ‚.ΠΏ. Π’ Ρ€Π°Π±ΠΎΡ‚Π΅ ΠΏΡ€Π΅Π΄Π»ΠΎΠΆΠ΅Π½ Π°Π»Π³ΠΎΡ€ΠΈΡ‚ΠΌ дСтСктирования Ρ‚Π°ΠΊΠΈΡ… данныхс ΠΏΠΎΡΠ»Π΅Π΄ΡƒΡŽΡ‰Π΅ΠΉ ΠΈΡ… ΠΎΡ‚Π±Ρ€Π°ΠΊΠΎΠ²ΠΊΠΎΠΉ, Π½Π° ΠΏΠ΅Ρ€Π²ΠΎΠΌ этапС ΠΊΠΎΡ‚ΠΎΡ€ΠΎΠ³ΠΎ ΠΈΡΠΏΠΎΠ»ΡŒΠ·ΡƒΠ΅Ρ‚ΡΡ свСрточная нСйронная ΡΠ΅Ρ‚ΡŒ, прСдобучСнная Π½Π° ΡΠΏΠ΅Ρ†ΠΈΠ°Π»ΡŒΠ½ΠΎ созданном Π½Π°Π±ΠΎΡ€Π΅ Ρ€Π΅Π΄ΠΊΠΈΡ… Π΄Π°Π½Π½Ρ‹Ρ…. Π’Ρ‚ΠΎΡ€ΠΎΠΉ этап – ΠΏΡ€ΠΈΠΌΠ΅Π½Π΅Π½ΠΈΠ΅ ΠΏΠΎΡΠ»Π΅Π΄ΠΎΠ²Π°Ρ‚Π΅Π»ΡŒΠ½ΠΎΠ³ΠΎ Π°Π½Π°Π»ΠΈΠ·Π° дСскрипторов для ΠΏΠΎΠ²Ρ‹ΡˆΠ΅Π½ΠΈΡ Π²Ρ‹Ρ‡ΠΈΡΠ»ΠΈΡ‚Π΅Π»ΡŒΠ½ΠΎΠΉ эффСктивности классификации. Π­ΠΊΡΠΏΠ΅Ρ€ΠΈΠΌΠ΅Π½Ρ‚Π°Π»ΡŒΠ½ΠΎΠ΅ исслСдованиС Π½Π° Π½Π°Π±ΠΎΡ€Π΅ Π΄Π°Π½Π½Ρ‹Ρ… VGGFace2 с использованиСм нСйросСтСвых дСскрипторов, Π² Ρ‚ΠΎΠΌ числС соврСмСнных ΠΌΠΎΠ΄Π΅Π»Π΅ΠΉ InsightFace, продСмонстрировало ΠΏΠΎΠ²Ρ‹ΡˆΠ΅Π½Π½ΡƒΡŽ ΡΡ„Ρ„Π΅ΠΊΡ‚ΠΈΠ²Π½ΠΎΡΡ‚ΡŒ ΠΏΡ€Π΅Π΄Π»ΠΎΠΆΠ΅Π½Π½ΠΎΠ³ΠΎ Π°Π»Π³ΠΎΡ€ΠΈΡ‚ΠΌΠ° ΠΏ

    Balancing Biases and Preserving Privacy on Balanced Faces in the Wild

    Demographic biases exist in current models used for facial recognition (FR). Our Balanced Faces in the Wild (BFW) dataset is a proxy to measure bias across ethnicity and gender subgroups, allowing one to characterize FR performance per subgroup. We show that results are non-optimal when a single score threshold determines whether sample pairs are genuine or impostors. Furthermore, within subgroups, performance often varies significantly from the global average. Thus, specific error rates only hold for populations matching the validation data. We mitigate the imbalanced performance using a novel domain adaptation learning scheme on the facial features extracted from state-of-the-art neural networks, boosting the average performance. The proposed method also preserves identity information while removing demographic knowledge. The removal of demographic knowledge prevents potential biases from being injected into decision-making and protects privacy, since demographic information is no longer available. We explore the proposed method and show that subgroup classifiers can no longer learn from the features projected using our domain adaptation scheme. For source code and data, see https://github.com/visionjo/facerec-bias-bfw.
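    A common way to realize the "remove demographics, keep identity" objective described above is adversarial feature projection. The sketch below is a generic example of that idea under assumed dimensions, not the paper's domain adaptation scheme: a projection head is trained with an identity loss while a demographic head sits behind a gradient-reversal layer, so the projected features become uninformative for subgroup classifiers.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; flips (and scales) the gradient in backward."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None

class DebiasingHead(nn.Module):
    """Projects frozen FR embeddings so identity stays predictable while the
    adversarial demographic head (behind gradient reversal) is driven to fail."""
    def __init__(self, feat_dim=512, n_identities=1000, n_subgroups=8, lam=1.0):
        super().__init__()
        self.project = nn.Sequential(nn.Linear(feat_dim, feat_dim), nn.ReLU(),
                                     nn.Linear(feat_dim, feat_dim))
        self.id_head = nn.Linear(feat_dim, n_identities)   # keep identity information
        self.dem_head = nn.Linear(feat_dim, n_subgroups)   # demographic info to remove
        self.lam = lam

    def forward(self, features):
        z = self.project(features)
        id_logits = self.id_head(z)
        dem_logits = self.dem_head(GradReverse.apply(z, self.lam))
        return z, id_logits, dem_logits

# illustrative training step with random stand-in data
feats = torch.randn(32, 512)                 # embeddings from a frozen FR backbone
ids = torch.randint(0, 1000, (32,))
subgroups = torch.randint(0, 8, (32,))
model = DebiasingHead()
z, id_logits, dem_logits = model(feats)
loss = F.cross_entropy(id_logits, ids) + F.cross_entropy(dem_logits, subgroups)
loss.backward()                              # reversed gradient scrubs demographics from z
```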

    Open-set face identification with automatic detection of out-of-distribution images

    One of the main problems of modern neural network descriptors in the face identification task is the small number of training examples of certain kinds: low-quality images, unusual scale or illumination, faces of children and elderly people, rare ethnic groups. As a result, recognition accuracy is low for input images that do not resemble the majority of images in the dataset used to tune the feature extractor. The paper proposes a way to overcome this problem by automatically detecting atypical input images through an additional preliminary rejection stage. For this purpose, a dedicated convolutional network is used, trained on a collection of rare data processed with well-known image transformation algorithms. To improve computational efficiency, the decision about whether an input image is rare is made from the same face descriptor that is used in the classifier. An experimental study confirmed the accuracy advantages of the proposed approach on several face datasets and with modern neural network descriptors. The research was supported by a grant of the Russian Science Foundation (project No. 20-71-10010). The research of S. I. Nikolenko was supported by St. Petersburg State University, project No. 73555239 "Artificial Intelligence and Data Science: Theory, Technology, Industry-Specific and Interdisciplinary Research and Applications".
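    A minimal sketch of the rejection idea, assuming random stand-in data, is given below: the same embedding that feeds the identification step is also passed to a lightweight "typical vs. atypical" head, so no extra backbone pass is needed. The logistic-regression head, the nearest-neighbour gallery search, and the 0.5 rejection threshold are assumptions for illustration, not details of the paper.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
emb_dim = 512
train_emb = rng.normal(size=(200, emb_dim))        # descriptors of typical/atypical faces
train_lbl = rng.integers(0, 2, size=200)           # 1 = atypical (rare) example

ood_head = LogisticRegression(max_iter=1000).fit(train_emb, train_lbl)

gallery_emb = rng.normal(size=(50, emb_dim))       # descriptors of enrolled identities
gallery_ids = np.arange(50)
index = NearestNeighbors(n_neighbors=1).fit(gallery_emb)

def identify(query_emb: np.ndarray, reject_threshold: float = 0.5):
    """Reject atypical queries, otherwise return the nearest enrolled identity."""
    p_atypical = ood_head.predict_proba(query_emb.reshape(1, -1))[0, 1]
    if p_atypical >= reject_threshold:
        return None                                # out-of-distribution -> rejected
    _, idx = index.kneighbors(query_emb.reshape(1, -1))
    return int(gallery_ids[idx[0, 0]])

print(identify(rng.normal(size=emb_dim)))
```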

    Self-supervised Face Representation Learning

    This thesis investigates fine-tuning deep face features in a self-supervised manner for discriminative face representation learning, wherein we develop methods to automatically generate pseudo-labels for training a neural network. Most importantly, solving this problem helps us to advance the state of the art in representation learning and can benefit a variety of practical downstream tasks. Fortunately, there is a vast amount of video on the internet that machines can use to learn an effective representation. We present methods that can learn a strong face representation from large-scale data, be it in the form of images or videos. While learning a good representation with a deep learning algorithm normally requires a large-scale dataset with manually curated labels, we propose self-supervised approaches that generate pseudo-labels by exploiting the temporal structure of video data and similarity constraints, obtaining supervision from the data itself.

    We aim to learn a representation that exhibits small distances between samples from the same person and large inter-person distances in feature space. Metric learning can achieve this, as it comprises a pull term, pulling data points from the same class closer, and a push term, pushing data points from a different class further away. Metric learning for improving feature quality is useful but requires some form of external supervision to provide labels for same or different pairs. In the case of face clustering in TV series, we may obtain this supervision from tracks and other cues: tracking acts as a form of high-precision clustering (grouping detections within a shot) and is used to automatically generate positive and negative pairs of face images. Inspired by this, we propose two variants of discriminative approaches: the Track-supervised Siamese network (TSiam) and the Self-supervised Siamese network (SSiam). In TSiam, we use the tracking supervision to obtain pairs; additionally, we include negative training pairs for singleton tracks, i.e. tracks that are not temporally co-occurring. As supervision from tracking may not always be available, we propose SSiam, an effective approach that enables metric learning without any supervision by generating the required pairs automatically during training. In SSiam, we dynamically generate positive and negative pairs based on sorted distances (i.e. ranking) over a subset of frames, and thus do not have to rely solely on video- or track-based supervision.

    Next, we present Clustering-based Contrastive Learning (CCL), a new clustering-based representation learning approach that uses automatically discovered partitions obtained from a clustering algorithm (FINCH) as weak supervision, together with inherent video constraints, to learn discriminative face features. As annotating datasets is costly and difficult, using label-free weak supervision obtained from a clustering algorithm as a proxy learning task is promising. Our analysis shows that creating positive and negative training pairs from clustering predictions helps to improve performance for video face clustering. We then propose face grouping on graphs (FGG), a method for unsupervised fine-tuning of deep face feature representations. We build a graph with positive and negative edges over a set of face tracks based on the temporal structure of the video data and similarity-based constraints. Using graph neural networks, features communicate over the edges, allowing each track's feature to exchange information with its neighbours and thus pushing each representation in a direction in feature space that groups all representations of the same person together and separates representations of different persons.

    Having developed these methods to generate weak labels for face representation learning, we next propose to encode face tracks in videos into compact yet effective descriptors that can complement the previous methods towards learning a more powerful face representation. Specifically, we propose Temporal Compact Bilinear Pooling (TCBP) to encode the temporal segments of a video into a compact descriptor. TCBP can capture interactions between each element of the feature representation and every other element over a long-range temporal context. We integrated our previous methods TSiam, SSiam and CCL with TCBP and demonstrated that TCBP has excellent capabilities in learning a strong face representation. We further show that TCBP has exceptional transfer abilities to applications such as multimodal video clip representation that jointly encodes images, audio, video and text, and to video classification. All of these contributions are demonstrated on benchmark video clustering datasets: The Big Bang Theory, Buffy the Vampire Slayer and Harry Potter 1. We provide extensive evaluations on these datasets, achieving a significant boost in performance over the base features and in comparison with state-of-the-art results.
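    To make the SSiam-style pair mining concrete, the following hedged sketch ranks pairwise distances within a subset of frame features, treats the closest pairs as pseudo-positives and the farthest as pseudo-negatives, and trains them with a standard contrastive (pull/push) loss. The batch size, embedding dimension, number of mined pairs, and margin are illustrative assumptions, not values from the thesis.

```python
import torch
import torch.nn.functional as F

def mine_pairs(feats: torch.Tensor, k: int = 16):
    """Return (anchor, other, label) index tensors: label 1 for the k closest
    pairs (pseudo-positive), 0 for the k farthest pairs (pseudo-negative)."""
    dist = torch.cdist(feats, feats)                    # (N, N) Euclidean distances
    n = feats.size(0)
    iu = torch.triu_indices(n, n, offset=1)             # unique pairs only
    pair_d = dist[iu[0], iu[1]]
    order = torch.argsort(pair_d)
    pos = order[:k]                                     # most similar -> positives
    neg = order[-k:]                                    # least similar -> negatives
    idx = torch.cat([pos, neg])
    labels = torch.cat([torch.ones(k), torch.zeros(k)])
    return iu[0][idx], iu[1][idx], labels

def contrastive_loss(fa, fb, labels, margin: float = 1.0):
    d = F.pairwise_distance(fa, fb)
    pos_term = labels * d.pow(2)                        # pull same-person pairs together
    neg_term = (1 - labels) * F.relu(margin - d).pow(2) # push different pairs apart
    return (pos_term + neg_term).mean()

feats = F.normalize(torch.randn(64, 256), dim=1)        # stand-in for frame descriptors
a, b, y = mine_pairs(feats)
loss = contrastive_loss(feats[a], feats[b], y)
```

    A TSiam-style variant would replace the distance-based mining with pairs derived from track co-occurrence; the loss itself stays the same.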

    Introduction: Ways of Machine Seeing

    How do machines, and, in particular, computational technologies, change the way we see the world? This special issue brings together researchers from a wide range of disciplines to explore the entanglement of machines and their ways of seeing from new critical perspectives. This 'editorial' is for a special issue of AI & Society, which includes contributions from: MarΓ­a JesΓΊs Schultz Abarca, Peter Bell, Tobias Blanke, Benjamin Bratton, Claudio Celis Bueno, Kate Crawford, Iain Emsley, Abelardo Gil-Fournier, Daniel ChΓ‘vez Heras, Vladan Joler, Nicolas MalevΓ©, Lev Manovich, Nicholas Mirzoeff, Perle MΓΈhl, Bruno Moreschi, Fabian Offert, Trevor Paglan, Jussi Parikka, Luciana Parisi, Matteo Pasquinelli, Gabriel Pereira, Carloalberto Treccani, Rebecca Uliasz, and Manuel van der Veen

    Image and Video Forensics

    Nowadays, images and videos have become the main modalities of information exchanged in everyday life, and their pervasiveness has led the image forensics community to question their reliability, integrity, confidentiality, and security. Multimedia content is generated in many different ways through the use of consumer electronics and high-quality digital imaging devices, such as smartphones, digital cameras, tablets, and wearable and IoT devices. The ever-increasing convenience of image acquisition has facilitated instant distribution and sharing of digital images on social platforms, resulting in a great amount of exchanged data. Moreover, the pervasiveness of powerful image editing tools has allowed the manipulation of digital images for malicious or criminal ends, up to the creation of synthesized images and videos with deep learning techniques. In response to these threats, the multimedia forensics community has produced major research efforts on identifying the source and detecting manipulation. In all cases where images and videos serve as critical evidence (e.g., forensic investigations, fake news debunking, information warfare, and cyberattacks), forensic technologies that help determine the origin, authenticity, and integrity of multimedia content can become essential tools. This book aims to collect a diverse and complementary set of articles that demonstrate new developments and applications in image and video forensics to tackle new and serious challenges and to ensure media authenticity.