19 research outputs found
Efficient face recognition based on sequential analysis of neural network descriptors and detection of minority classes
We study ways to improve face recognition accuracy by detecting input images that rarely occur in the datasets used to train neural network descriptors. Modern freely available training sets mostly contain images of middle-aged people of Caucasian appearance, so most algorithms make errors on images of elderly people or children, on faces of less common ethnicities, and so on. The paper proposes an algorithm that detects such data and then rejects them. Its first stage uses a convolutional neural network pretrained on a specially created set of rare data. The second stage applies sequential analysis of descriptors to improve the computational efficiency of classification. An experimental study on the VGGFace2 dataset with neural network descriptors, including modern InsightFace models, demonstrated the improved efficiency of the proposed algorithm ...
Balancing Biases and Preserving Privacy on Balanced Faces in the Wild
Demographic biases exist in current models used for facial recognition (FR).
Our Balanced Faces in the Wild (BFW) dataset is a proxy to measure bias across
ethnicity and gender subgroups, allowing one to characterize FR performances
per subgroup. We show that results are non-optimal when a single score
threshold determines whether sample pairs are genuine or imposters.
Furthermore, within subgroups, performance often varies significantly from the
global average. Thus, specific error rates only hold for populations matching
the validation data. We mitigate the imbalanced performances using a novel
domain adaptation learning scheme on the facial features extracted from
state-of-the-art neural networks, boosting the average performance. The
proposed method also preserves identity information while removing demographic
knowledge. The removal of demographic knowledge prevents potential biases from
being injected into decision-making and protects privacy since demographic
information is no longer available. We explore the proposed method and show
that subgroup classifiers can no longer learn from the features projected using
our domain adaptation scheme. For source code and data, see
https://github.com/visionjo/facerec-bias-bfw.
Comment: arXiv admin note: text overlap with arXiv:2102.0894
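The difference between a single global score threshold and per-subgroup thresholds can be illustrated with a minimal sketch. The subgroup names, impostor scores, and helper functions below are hypothetical and are not taken from the BFW code; the sketch only assumes that a pair is accepted when its score exceeds the threshold and that every subgroup should meet the same target false-accept rate (FAR).

```python
import math

def threshold_at_far(impostor_scores, target_far):
    """Pick a threshold so that at most target_far of impostor pairs score above it."""
    ranked = sorted(impostor_scores, reverse=True)
    # Number of impostor pairs we can afford to accept (small epsilon guards float error).
    allowed = math.floor(target_far * len(ranked) + 1e-12)
    allowed = min(allowed, len(ranked) - 1)
    return ranked[allowed]

def false_accept_rate(impostor_scores, threshold):
    """Fraction of impostor pairs accepted, i.e. scoring strictly above the threshold."""
    return sum(s > threshold for s in impostor_scores) / len(impostor_scores)

# Hypothetical impostor similarity scores for two demographic subgroups.
impostors = {
    "group_a": [0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45, 0.50, 0.55],
    "group_b": [0.30, 0.35, 0.40, 0.45, 0.50, 0.55, 0.60, 0.65, 0.70, 0.75],
}
TARGET_FAR = 0.2

# One threshold calibrated on the pooled scores of all subgroups.
pooled = [s for scores in impostors.values() for s in scores]
global_thr = threshold_at_far(pooled, TARGET_FAR)

# One threshold per subgroup, each calibrated to the same target FAR.
group_thr = {g: threshold_at_far(s, TARGET_FAR) for g, s in impostors.items()}

for group, scores in impostors.items():
    print(group,
          "FAR with global threshold:", false_accept_rate(scores, global_thr),
          "FAR with subgroup threshold:", false_accept_rate(scores, group_thr[group]))
```

With these made-up scores the pooled threshold meets the target on average but not within each subgroup, which is the kind of mismatch the abstract describes.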
Open-set face identification with automatic detection of out-of-distribution images
One of the main problems of modern neural network descriptors in face identification is the small number of training examples of certain types: poor-quality images, varying scale or illumination, faces of children and elderly people, and rare ethnicities. As a result, recognition accuracy is low for input images that differ from most images in the dataset used to tune the feature extraction method. This paper proposes to overcome the problem by automatically detecting atypical input images through a preliminary stage that rejects them automatically. For this purpose, a dedicated convolutional network is used, trained on a set of rare data processed with well-known image transformation algorithms. To improve computational efficiency, the decision on whether an image is rare is made from the same face descriptor that is used in the classifier. An experimental study confirmed the accuracy advantages of the proposed approach for several face datasets and modern neural network descriptors.
The research was supported by a grant from the Russian Science Foundation (project No. 20-71-10010). The research of S.I. Nikolenko was supported by Saint Petersburg State University, project No. 73555239 "Artificial Intelligence and Data Science: Theory, Technology, Industrial and Interdisciplinary Research and Applications".
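The reject-then-identify flow described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the 2-D descriptors, the gallery, the linear gate (`weights`, `bias`), and the distance cutoff are all hypothetical, and a simple logistic scorer stands in for the rare-image detector. The one property taken from the abstract is that the rejection decision reuses the same descriptor as the classifier.

```python
import math

def is_atypical(descriptor, weights, bias, cutoff=0.5):
    """Hypothetical gate: logistic score computed on the face descriptor itself."""
    z = sum(w * x for w, x in zip(weights, descriptor)) + bias
    return 1.0 / (1.0 + math.exp(-z)) >= cutoff

def identify(descriptor, gallery, weights, bias, max_distance):
    """Reject atypical inputs first, then run open-set nearest-neighbour identification."""
    if is_atypical(descriptor, weights, bias):
        return "rejected"
    name, dist = min(
        ((n, math.dist(descriptor, d)) for n, d in gallery.items()),
        key=lambda pair: pair[1],
    )
    # Open-set rule: a face too far from every gallery identity is unknown.
    return name if dist <= max_distance else "unknown"

# Toy gallery of enrolled identities (made-up descriptors).
gallery = {"alice": [1.0, 0.0], "bob": [0.0, 1.0]}
# Made-up gate parameters: descriptors with small components are flagged as atypical.
weights, bias = [-5.0, -5.0], 2.5

print(identify([0.9, 0.1], gallery, weights, bias, max_distance=0.5))  # typical, close to alice
print(identify([0.1, 0.1], gallery, weights, bias, max_distance=0.5))  # flagged by the gate
print(identify([0.6, 0.6], gallery, weights, bias, max_distance=0.5))  # typical, but matches nobody
```

Because the gate works on an already extracted descriptor, it adds only a dot product per query, which is one way to read the abstract's remark about computational efficiency.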
Self-supervised Face Representation Learning
This thesis investigates fine-tuning deep face features in a self-supervised manner for discriminative face representation learning, wherein we develop methods to automatically generate pseudo-labels for training a neural network. Most importantly, solving this problem helps us advance the state of the art in representation learning and can benefit a variety of practical downstream tasks. Fortunately, there is a vast amount of video on the internet that machines can use to learn an effective representation. We present methods that can learn a strong face representation from large-scale data, be it in the form of images or video.
However, while learning a good representation using a deep learning algorithm requires a large-scale dataset with manually curated labels, we propose self-supervised approaches to generate pseudo-labels utilizing the temporal structure of the video data and similarity constraints to get supervision from the data itself.
We aim to learn a representation that exhibits small distances between samples from the same person and large inter-person distances in feature space. Metric learning can achieve this, as it comprises a pull term, which pulls data points from the same class closer, and a push term, which pushes data points from different classes further apart. Metric learning for improving feature quality is useful but requires some form of external supervision to label pairs as same or different. In the case of face clustering in TV series, we may obtain this supervision from tracks and other cues. The tracking acts as a form of high-precision clustering (grouping detections within a shot) and is used to automatically generate positive and negative pairs of face images. Inspired by this, we propose two variants of discriminative approaches: the Track-supervised Siamese network (TSiam) and the Self-supervised Siamese network (SSiam). In TSiam, we use the tracking supervision to obtain the pairs; additionally, we include negative training pairs for singleton tracks -- tracks that are not temporally co-occurring. As supervision from tracking may not always be available, we propose SSiam, an effective approach that enables metric learning without any supervision by generating the required pairs automatically during training. In SSiam, we dynamically generate positive and negative pairs by sorting distances (i.e. ranking) on a subset of frames, so we do not have to rely solely on video- or track-based supervision.
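The pull and push terms, and the idea of mining pairs by ranking distances, can be made concrete with a short sketch. It uses the standard contrastive loss as a stand-in for the metric-learning objective, and a simplified mining rule (closest pairs as pseudo-positives, farthest pairs as pseudo-negatives); both choices, along with the function names, the margin, and the toy features, are assumptions for illustration and may differ from the actual TSiam/SSiam formulation.

```python
import math
from itertools import combinations

def contrastive_loss(a, b, same, margin=1.0):
    """Standard contrastive loss for one pair of feature vectors."""
    d = math.dist(a, b)
    if same:
        return d ** 2                    # pull term: same-person pairs should be close
    return max(0.0, margin - d) ** 2     # push term: different-person pairs at least `margin` apart

def mine_pairs_by_ranking(features, k):
    """Rank all pairs by distance; the k closest become pseudo-positives, the k farthest pseudo-negatives."""
    ranked = sorted(
        combinations(range(len(features)), 2),
        key=lambda ij: math.dist(features[ij[0]], features[ij[1]]),
    )
    return ranked[:k], ranked[-k:]

# Toy 2-D features for four face crops (made-up values).
features = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [1.3, 1.0]]
positives, negatives = mine_pairs_by_ranking(features, k=1)

# Total loss over the mined pairs, with a margin large enough for the push term to be active.
loss = sum(contrastive_loss(features[i], features[j], same=True, margin=2.0) for i, j in positives)
loss += sum(contrastive_loss(features[i], features[j], same=False, margin=2.0) for i, j in negatives)
print(positives, negatives, loss)
```

In a training loop the mined pairs would be recomputed as the features change, which is what makes the pair generation dynamic.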
Next, we present Clustering-based Contrastive Learning (CCL), a new clustering-based representation learning approach that uses automatically discovered partitions obtained from a clustering algorithm (FINCH) as weak supervision, along with inherent video constraints, to learn discriminative face features. As annotating datasets is costly and difficult, using label-free, weak supervision obtained from a clustering algorithm as a proxy learning task is promising. Through our analysis, we show that creating positive and negative training pairs from clustering predictions helps to improve performance for video face clustering.
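Turning clustering predictions and video constraints into training pairs might look like the sketch below. The pairing rule is an assumption made for illustration (same cluster gives a positive pair, different clusters give a negative pair, and tracks that co-occur in time are always negative); the function name and inputs are hypothetical, and the cluster ids are simply given rather than computed with FINCH.

```python
from itertools import combinations

def pairs_from_clusters(cluster_ids, cooccurring=()):
    """Build positive/negative index pairs from cluster assignments and cannot-link constraints."""
    cannot_link = {frozenset(pair) for pair in cooccurring}
    positives, negatives = [], []
    for i, j in combinations(range(len(cluster_ids)), 2):
        if frozenset((i, j)) in cannot_link:
            negatives.append((i, j))      # video constraint overrides the clustering
        elif cluster_ids[i] == cluster_ids[j]:
            positives.append((i, j))      # same cluster: treated as the same person
        else:
            negatives.append((i, j))      # different clusters: treated as different people
    return positives, negatives

# Four face tracks; tracks 0 and 1 share a cluster but appear in the same frames,
# so they cannot show the same person.
cluster_ids = [0, 0, 1, 1]
positives, negatives = pairs_from_clusters(cluster_ids, cooccurring=[(0, 1)])
print(positives)
print(negatives)
```

The resulting pairs could then feed a pairwise objective such as a contrastive loss.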
We then propose Face Grouping on Graphs (FGG), a method for unsupervised fine-tuning of deep face feature representations. We build a graph with positive and negative edges over a set of face tracks, based on the temporal structure of the video data and similarity-based constraints. Using graph neural networks, the features communicate over the edges, allowing each track's feature to exchange information with its neighbors and thus pushing each representation in a direction in feature space that groups all representations of the same person together and separates representations of different people.
Having developed these methods to generate weak labels for face representation learning, we next propose to encode face tracks in videos into compact yet effective descriptors that can complement the previous methods in learning a more powerful face representation. Specifically, we propose Temporal Compact Bilinear Pooling (TCBP) to encode the temporal segments of videos into a compact descriptor. TCBP can capture interactions between the elements of the feature representation over a long-range temporal context. We integrated our previous methods TSiam, SSiam and CCL with TCBP and demonstrated that TCBP is highly effective at learning a strong face representation. We further show that TCBP transfers well to applications such as video classification and multimodal video clip representation that jointly encodes images, audio, video and text.
All of these contributions are demonstrated on benchmark video clustering datasets: The Big Bang Theory, Buffy the Vampire Slayer, and Harry Potter 1. We provide extensive evaluations on these datasets, achieving a significant boost in performance over the base features and in comparison to state-of-the-art results.
Introduction: Ways of Machine Seeing
How do machines, and, in particular, computational technologies, change the way we see the world? This special issue brings together researchers from a wide range of disciplines to explore the entanglement of machines and their ways of seeing from new critical perspectives.
This 'editorial' is for a special issue of AI & Society, which includes contributions from: María Jesús Schultz Abarca, Peter Bell, Tobias Blanke, Benjamin Bratton, Claudio Celis Bueno, Kate Crawford, Iain Emsley, Abelardo Gil-Fournier, Daniel Chávez Heras, Vladan Joler, Nicolas Malevé, Lev Manovich, Nicholas Mirzoeff, Perle Møhl, Bruno Moreschi, Fabian Offert, Trevor Paglen, Jussi Parikka, Luciana Parisi, Matteo Pasquinelli, Gabriel Pereira, Carloalberto Treccani, Rebecca Uliasz, and Manuel van der Veen.
Image and Video Forensics
Images and videos have become the main modalities of information exchanged in everyday life, and their pervasiveness has led the image forensics community to question their reliability, integrity, confidentiality, and security. Multimedia content is generated in many different ways using consumer electronics and high-quality digital imaging devices, such as smartphones, digital cameras, tablets, and wearable and IoT devices. The ever-increasing convenience of image acquisition has facilitated the instant distribution and sharing of digital images on social platforms, generating a large volume of exchanged data. Moreover, the pervasiveness of powerful image editing tools has allowed the manipulation of digital images for malicious or criminal ends, up to the creation of synthesized images and videos with deep learning techniques. In response to these threats, the multimedia forensics community has devoted major research efforts to source identification and manipulation detection. In all cases where images and videos serve as critical evidence (e.g., forensic investigations, fake news debunking, information warfare, and cyberattacks), forensic technologies that help determine the origin, authenticity, and integrity of multimedia content can become essential tools. This book aims to collect a diverse and complementary set of articles that demonstrate new developments and applications in image and video forensics to tackle new and serious challenges in ensuring media authenticity.