Language-Aware Soft Prompting: Text-to-Text Optimization for Few- and Zero-Shot Adaptation of V&L Models
Soft prompt learning has emerged as a promising direction for adapting V&L models to a downstream task using a few training examples. However, current methods significantly overfit the training data, suffering from large accuracy degradation when tested on unseen classes from the same domain. In addition, all prior methods operate exclusively under the assumption that both vision and language data are present. To this end, we make the following 5 contributions: (1) To alleviate base class overfitting, we propose a novel Language-Aware Soft Prompting (LASP) learning method by means of a text-to-text cross-entropy loss that maximizes the probability of the learned prompts to be correctly classified with respect to pre-defined hand-crafted textual prompts. (2) To increase the representation capacity of the prompts, we also propose grouped LASP where each group of prompts is optimized with respect to a separate subset of textual prompts. (3) Moreover, we identify a visual-language misalignment introduced by prompt learning and LASP, and more importantly, propose a re-calibration mechanism to address it. (4) Importantly, we show that LASP is inherently amenable to including, during training, virtual classes, i.e. class names for which no visual samples are available, further increasing the robustness of the learned prompts. Expanding for the first time the setting to language-only adaptation, (5) we present a novel zero-shot variant of LASP where no visual samples at all are available for the downstream task. Through evaluations on 11 datasets, we show that our approach (a) significantly outperforms all prior works on soft prompting, and (b) matches and surpasses, for the first time, the accuracy on novel classes obtained by hand-crafted prompts and CLIP for 8 out of 11 test datasets. Finally, (c) we show that our zero-shot variant improves upon CLIP without requiring any extra data. Code will be made available.
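To make the text-to-text cross-entropy idea in the abstract above concrete, the following is a minimal sketch, not the authors' implementation. It assumes each class has one learned-prompt text embedding and one hand-crafted-prompt text embedding, scores them by cosine similarity, and applies softmax cross-entropy so that each learned embedding is classified as its own class. The function names, the toy 3-dimensional vectors, and the temperature value of 0.07 are all illustrative assumptions.

```python
import math

def cosine(u, v):
    # Cosine similarity between two equal-length vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def text_to_text_ce(learned, handcrafted, temperature=0.07):
    # Mean cross-entropy of classifying the k-th learned prompt embedding
    # against the hand-crafted prompt embeddings of all classes.
    # (Hypothetical helper; the temperature is an assumed value.)
    total = 0.0
    for k, t_learned in enumerate(learned):
        logits = [cosine(t_learned, t_hand) / temperature for t_hand in handcrafted]
        m = max(logits)  # subtract the max for a stable log-sum-exp
        log_z = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_z - logits[k]
    return total / len(learned)

# Toy embeddings (assumed): three classes, one vector per class.
handcrafted = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
aligned = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.0, 0.1, 0.9]]
shuffled = [aligned[1], aligned[2], aligned[0]]

loss_aligned = text_to_text_ce(aligned, handcrafted)
loss_shuffled = text_to_text_ce(shuffled, handcrafted)
```

In this toy setting the loss is lower when each learned embedding lies close to the hand-crafted embedding of its own class, which is the behaviour the abstract attributes to the regularizer. In practice the embeddings would come from a text encoder rather than being fixed lists.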
How far are we from solving the 2D & 3D face alignment problem? (and a dataset of 230,000 3D facial landmarks)
This paper investigates how far a very deep neural network is from attaining close to saturating performance on existing 2D and 3D face alignment datasets. To this end, we make the following 5 contributions: (a) we construct, for the first time, a very strong baseline by combining a state-of-the-art architecture for landmark localization with a state-of-the-art residual block, train it on a very large yet synthetically expanded 2D facial landmark dataset and finally evaluate it on all other 2D facial landmark datasets. (b) We create a network, guided by 2D landmarks, which converts 2D landmark annotations to 3D and unifies all existing datasets, leading to the creation of LS3D-W, the largest and most challenging 3D facial landmark dataset to date (~230,000 images). (c) Following that, we train a neural network for 3D face alignment and evaluate it on the newly introduced LS3D-W. (d) We further look into the effect of all “traditional” factors affecting face alignment performance like large pose, initialization and resolution, and introduce a “new” one, namely the size of the network. (e) We show that both 2D and 3D face alignment networks achieve performance of remarkable accuracy which is probably close to saturating the datasets used. Training and testing code as well as the dataset can be downloaded from https://www.adrianbulat.com/face-alignment
Optimized Effective Potentials in Finite Basis Sets
The finite basis optimized effective potential (OEP) method within density
functional theory is examined as an ill-posed problem. It is shown that the
generation of nonphysical potentials is a controllable manifestation of the use
of unbalanced, and thus unsuitable, basis sets. A modified functional
incorporating a regularizing smoothness measure of the OEP is introduced. This
provides a condition on balanced basis sets for the potential, as well as a
method to determine the most appropriate OEP potential and energy from
calculations performed with any finite basis set.
Comment: 23 pages, 28 figures
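The modified functional mentioned in the abstract above can be illustrated with a generic regularized form. This is a sketch under stated assumptions, not a formula quoted from the paper: the symbols $\Omega_\lambda$, $\lambda$, $b_t$, and $g_t$ are introduced here for illustration, and the squared gradient norm is assumed as the smoothness measure.

```latex
% Assumed notation: v_b is the part of the potential expanded in a finite
% basis {g_t} with coefficients b_t; \lambda \ge 0 weights the smoothness penalty.
v_b(\mathbf{r}) = \sum_t b_t \, g_t(\mathbf{r}),
\qquad
\Omega_\lambda[v_b] = E[v_b] + \lambda \, \lVert \nabla v_b \rVert^2 ,
\qquad
\lVert \nabla v_b \rVert^2 = \int \lvert \nabla v_b(\mathbf{r}) \rvert^2 \, d\mathbf{r} .
```

With $\lambda = 0$ this reduces to the unregularized finite-basis problem. Increasing $\lambda$ penalizes rapidly oscillating potentials, which is the kind of nonphysical behaviour the abstract attributes to unbalanced basis sets.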
Large pose 3D face reconstruction from a single image via direct volumetric CNN regression
3D face reconstruction is a fundamental Computer Vision problem of
extraordinary difficulty. Current systems often assume the availability of
multiple facial images (sometimes from the same subject) as input, and must
address a number of methodological challenges such as establishing dense
correspondences across large facial poses, expressions, and non-uniform
illumination. In general these methods require complex and inefficient
pipelines for model building and fitting. In this work, we propose to address
many of these limitations by training a Convolutional Neural Network (CNN) on
an appropriate dataset consisting of 2D images and 3D facial models or scans.
Our CNN works with just a single 2D facial image, requires neither accurate
alignment nor dense correspondence between images, works for
arbitrary facial poses and expressions, and can be used to reconstruct the
whole 3D facial geometry (including the non-visible parts of the face)
bypassing the construction (during training) and fitting (during testing) of a
3D Morphable Model. We achieve this via a simple CNN architecture that performs
direct regression of a volumetric representation of the 3D facial geometry from
a single 2D image. We also demonstrate how the related task of facial landmark
localization can be incorporated into the proposed framework and help improve
reconstruction quality, especially for the cases of large poses and facial
expressions. Testing code will be made available online, along with pre-trained
models http://aaronsplace.co.uk/papers/jackson2017recon
Comment: 10 pages, ICCV 201
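As a rough illustration of what regressing a volumetric representation can mean, the sketch below turns a set of 3D points into a binary occupancy grid and scores a predicted volume against it with a per-voxel binary cross-entropy. It is a simplified sketch, not the procedure from the paper: the grid size, the point-sample voxelization, the loss choice, and the helper names `voxelize` and `voxel_bce` are all assumptions made for illustration.

```python
import math

def voxelize(points, grid=4):
    # Binary occupancy volume from 3D points with coordinates in [0, 1).
    # (Assumed simplification: a voxel is 1 if any point falls inside it.)
    vol = [[[0] * grid for _ in range(grid)] for _ in range(grid)]
    for x, y, z in points:
        i, j, k = (min(int(c * grid), grid - 1) for c in (x, y, z))
        vol[i][j][k] = 1
    return vol

def voxel_bce(pred_probs, target, eps=1e-7):
    # Mean per-voxel binary cross-entropy between predicted occupancy
    # probabilities and a binary target volume of the same shape.
    total, n = 0.0, 0
    for pred_plane, tgt_plane in zip(pred_probs, target):
        for pred_row, tgt_row in zip(pred_plane, tgt_plane):
            for p, t in zip(pred_row, tgt_row):
                p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
                total -= t * math.log(p) + (1 - t) * math.log(1 - p)
                n += 1
    return total / n

# Toy data (assumed): three surface points inside the unit cube.
points = [(0.1, 0.1, 0.1), (0.6, 0.6, 0.6), (0.9, 0.2, 0.5)]
target = voxelize(points)

# A confident, correct prediction versus an uninformative one.
good = [[[0.9 if v else 0.1 for v in row] for row in plane] for plane in target]
flat = [[[0.5 for _ in row] for row in plane] for plane in target]
```

A network trained in this style would output one probability per voxel from a single image, so the target volume plays the role that fitted model parameters play in a Morphable Model pipeline.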
ISLAM AND THE DYNAMICS OF ETHNO-CONFESSIONAL REGIMES IN RUSSIA, 1990-2012
What explains Russian state policies toward Islam during the first two decades after the Soviet collapse? Research on secularism and state policies toward religion suggests several models of interaction. However, these models are often better at describing static relationships than they are at explaining change. This study advances a framework for understanding the conditions that presage a transformation of state-religion relations by examining significant differences between Russian state attitudes toward Islam in the early 1990s and the 2000s. In particular, notable changes in the licensing of Imams, the building permissions granted for mosques, and registration requirements for religious organizations call for explanation. In the 1990s, state-Islam relations were accommodationist: the state granted unrestricted access to the Russian public sphere for all Muslim communities and allowed a wide range of Islamic religious practices. State-Islam relations after about 2000 became regulatory: the state assumed a more active interventional role in the domestic Islamic community in order to control religious practices of particular Muslim factions and ensure privileged access to the Russian public sphere for state-approved “traditional” religious organizations. This study finds that, contingent on the interplay among competing national ideologies, which shape the country’s ethno-confessional regime, the state may either embrace unrestricted religious pluralism or adopt a regulatory stance toward certain religious communities. In turn, structural factors such as public safety conditions and economic performance of the country may play an important role in determining the outcome of a struggle for ideological dominance.
This framework largely explains the dynamics of Russian state attitudes toward the largest minority religion in the country during the first two decades after the collapse of the Soviet state and offers analytical insights on the dynamic nature of state-Islam relations in other secular states with considerable Muslim populations.