Linear Object Classes and Image Synthesis from a Single Example Image
The need to generate new views of a 3D object from a single real image arises in several fields, including graphics and object recognition. While the traditional approach relies on 3D models, we have recently introduced simpler techniques that are applicable under restricted conditions. The approach exploits image transformations that are specific to the relevant object class and learnable from example views of other "prototypical" objects of the same class. In this paper, we introduce such a new technique by extending the notion of linear class first proposed by Poggio and Vetter. For linear object classes, it is shown that linear transformations can be learned exactly from a basis set of 2D prototypical views. We demonstrate the approach on artificial objects and then show preliminary evidence that the technique can effectively "rotate" high-resolution face images from a single 2D view.
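For a linear object class, the synthesis step reduces to linear algebra: recover the mixing coefficients of the novel view over the prototype basis in one pose, then reuse those coefficients on the same prototypes rendered in the target pose. A minimal sketch with synthetic data (all shapes and names are illustrative, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: each prototype object is observed in two poses.
# Columns of A are prototype views in pose A (flattened images);
# columns of B are the SAME prototypes rendered in pose B.
n_pixels, n_prototypes = 64, 5
A = rng.standard_normal((n_pixels, n_prototypes))
B = rng.standard_normal((n_pixels, n_prototypes))

# A novel object of the same linear class: an (unknown) combination
# of the prototypes, observed only in pose A.
alpha_true = rng.standard_normal(n_prototypes)
novel_pose_a = A @ alpha_true

# Recover the mixing coefficients from the single available view...
alpha, *_ = np.linalg.lstsq(A, novel_pose_a, rcond=None)

# ...and reuse them on the pose-B basis to synthesize the unseen view.
novel_pose_b = B @ alpha
```

Because the class is exactly linear and the basis has full column rank, the least-squares fit recovers the coefficients exactly in this noise-free sketch.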
Exemplar Learning for Medical Image Segmentation
Medical image annotation typically requires expert knowledge and hence incurs
time-consuming and expensive data annotation costs. To reduce this burden, we
propose a novel learning scenario, Exemplar Learning (EL), to explore automated
learning processes for medical image segmentation from a single annotated image
example. This innovative learning task is particularly suitable for medical
image segmentation, where all categories of organs can be presented in one
single image for annotation all at once. To address this challenging EL task,
we propose an Exemplar Learning-based Synthesis Net (ELSNet) framework for
medical image segmentation that enables innovative exemplar-based data
synthesis, pixel-prototype based contrastive embedding learning, and
pseudo-label based exploitation of the unlabeled data. Specifically, ELSNet
introduces two new modules for image segmentation: an exemplar-guided synthesis
module, which enriches and diversifies the training set by synthesizing
annotated samples from the given exemplar, and a pixel-prototype based
contrastive embedding module, which enhances the discriminative capacity of the
base segmentation model via contrastive self-supervised learning. Moreover, we
deploy a two-stage process for segmentation model training, which exploits the
unlabeled data with predicted pseudo segmentation labels. To evaluate this new
learning framework, we conduct extensive experiments on several organ
segmentation datasets and present an in-depth analysis. The empirical results
show that the proposed exemplar learning framework produces effective
segmentation results.
Learning to restore multiple image degradations simultaneously
Image corruptions are common in the real world; for example, images in the wild may come with unknown blur, bias field, noise, or other kinds of non-linear distributional shifts, hampering encoding methods and rendering downstream tasks unreliable. Image restoration requires a complicated balance between high-level contextualised information and spatially specific details. Existing approaches are designed to focus on a single corruption, which unavoidably results in poor performance when the acquisitions suffer from multiple degradations. In this study, we investigate the possibility of handling multiple degradations and enhancing the quality of images via deblurring, bias field correction, and denoising. To tackle the propagating errors caused by independent learning, we propose a unified and scalable framework that consists of three special decoders. Two decoders learn artifact attention from provided images, thereby generating realistic individual artifacts and multiple artifacts on a single image; the third decoder is trained to remove artifacts from the synthetic image with multiple corruptions, thereby generating a high-quality image. We additionally improve over previous image degradation synthesis approaches by modelling multiple image degradations directly from data observations. We first create a toy MNIST dataset and investigate the properties of the proposed algorithm. We then use brain MRI datasets to demonstrate our method's robustness, including both simulated (where necessary) and real-world artifacts. In addition, our method can be used for single or multiple degradation synthesis by applying the learned degradation operators to a new domain from a given dataset. The code will be released upon acceptance of the paper.
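The degradation-synthesis idea can be illustrated with hand-crafted stand-ins for the learned operators; the paper learns these from data, so the operators below (box blur, smooth multiplicative bias field, additive Gaussian noise) are purely illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def blur(img, k=3):
    # Simple box blur as a stand-in for an unknown blur operator.
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros_like(img)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

def bias_field(img):
    # Smooth multiplicative intensity bias, as commonly seen in MRI.
    h, w = img.shape
    y, x = np.mgrid[0:h, 0:w]
    field = 1.0 + 0.3 * np.sin(np.pi * x / w) * np.sin(np.pi * y / h)
    return img * field

def add_noise(img, sigma=0.05):
    # Additive Gaussian noise as the third degradation operator.
    return img + rng.normal(0.0, sigma, img.shape)

# Compose several degradations on one clean image, as in the paper's
# multi-corruption synthesis setting.
clean = rng.random((32, 32))
corrupted = add_noise(bias_field(blur(clean)))
```

A restoration model would then be trained to map `corrupted` back to `clean`; here the composition only demonstrates how multiple degradations can be stacked on a single image.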
Cross domain Image Transformation and Generation by Deep Learning
Compared with single domain learning, cross-domain learning is more challenging due to the large domain variation. In addition, cross-domain image synthesis is more difficult than other cross-domain learning problems, including, for example, correlation analysis, indexing, and retrieval, because it needs to learn a complex function that captures image details for photo-realism. This work investigates cross-domain image synthesis in two common and challenging tasks, i.e., image-to-image and non-image-to-image transfer/synthesis. The image-to-image transfer is investigated in Chapter 2, where we develop a method for transformation between face images and sketch images while preserving the identity. Different from existing works that conduct domain transfer in a one-pass manner, we design a recurrent bidirectional transformation network (r-BTN), which allows bidirectional domain transfer in an integrated framework. More importantly, it can perceptually compose partial inputs from two domains to simultaneously synthesize face and sketch images with consistent identity. Most existing works can synthesize images well only from patches that cover at least 70% of the original image. The proposed r-BTN can yield appealing results from patches that cover less than 10% because of the recursive estimation of the missing region in an incremental manner. Extensive experiments have been conducted to demonstrate the superior performance of r-BTN as compared to existing solutions. Chapter 3 targets image transformation/synthesis from non-image sources, i.e., generating a talking face based on audio input. Existing works either do not consider temporal dependency, thus yielding abrupt facial/lip movement, or are limited to generation for a specific person, thus lacking generalization capacity.
A novel conditional recurrent generation network, which incorporates image and audio features in the recurrent unit for temporal dependency, is proposed such that smooth transitions can be achieved for lip and facial movements. To achieve image- and video-realism, we adopt a pair of spatial-temporal discriminators. Accurate lip synchronization is essential to the success of talking face video generation, so we construct a lip-reading discriminator to boost the accuracy of lip synchronization. Extensive experiments demonstrate the superiority of our framework over the state of the art in terms of visual quality, lip sync accuracy, and smooth transitions of lip and facial movements.
End-to-End Optimization of Scene Layout
We propose an end-to-end variational generative model for scene layout
synthesis conditioned on scene graphs. Unlike unconditional scene layout
generation, we use scene graphs as an abstract but general representation to
guide the synthesis of diverse scene layouts that satisfy relationships
included in the scene graph. This gives rise to more flexible control over the
synthesis process, allowing various forms of inputs such as scene layouts
extracted from sentences or inferred from a single color image. Using our
conditional layout synthesizer, we can generate various layouts that share the
same structure of the input example. In addition to this conditional generation
design, we also integrate a differentiable rendering module that enables layout
refinement using only 2D projections of the scene. Given a depth and a
semantics map, the differentiable rendering module enables optimizing over the
synthesized layout to fit the given input in an analysis-by-synthesis fashion.
Experiments suggest that our model achieves higher accuracy and diversity in
conditional scene synthesis and allows exemplar-based scene generation from
various input forms. Comment: CVPR 2020 (Oral). Project page: http://3dsln.csail.mit.edu
Two-stage filtration algorithm with interframe causal processing for multichannel image with presence of uncorrelated noise
Introduction. When solving a number of practical problems, the use of multichannel images is common practice. The multichannel nature of this data either increases the efficiency of solving the problem or yields useful information that in principle cannot be extracted from single-channel images. One of the main types of noise occurring in multichannel images is uncorrelated noise. Optimal image filtering algorithms require enormous computational cost. Therefore, of important practical value is the synthesis of multichannel image filtering algorithms that provide the required performance indicators at moderate computational cost.
Theoretical results. Using conditional independence properties, an expression for the a posteriori probability density of pixels under two-stage multichannel image filtration with causal frame processing in the presence of uncorrelated noise is obtained. A Gaussian algorithm for estimating image pixels and error variance with causal intra- and inter-frame processing is derived for the multichannel case, along with expressions for its first and second moments.
Experimental results. For the considered example, the developed algorithm increases the filtration accuracy of a sequence of homogeneous Gaussian images by 20-45% compared to the inter-frame averaging algorithm. The analysis was carried out on a model example by means of statistical simulation.
Conclusion. Optimal and quasi-optimal two-stage multichannel image filtration algorithms were synthesized. In these algorithms, the first stage is one-dimensional causal filtration along each coordinate, and the second is the fusion of the results. The algorithms reduce computational cost in comparison with the optimal algorithm while ensuring acceptable accuracy characteristics.
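The two-stage scheme from the conclusion (one-dimensional causal filtration along each coordinate, then fusion of the results) can be sketched with a first-order recursive filter standing in for the derived Gaussian estimator; the filter coefficient and the plain-average fusion are illustrative assumptions, not the paper's exact algorithm:

```python
import numpy as np

def causal_1d(x, a=0.7):
    # First-order recursive (causal) filter: y[n] = a*y[n-1] + (1-a)*x[n].
    y = np.empty_like(x, dtype=float)
    y[0] = x[0]
    for n in range(1, len(x)):
        y[n] = a * y[n - 1] + (1 - a) * x[n]
    return y

def two_stage_filter(img, a=0.7):
    # Stage 1: causal filtration along each coordinate independently.
    rows = np.apply_along_axis(causal_1d, 1, img, a)
    cols = np.apply_along_axis(causal_1d, 0, img, a)
    # Stage 2: fuse the per-coordinate estimates (here, a plain average).
    return 0.5 * (rows + cols)

rng = np.random.default_rng(0)
clean = np.ones((16, 16))
noisy = clean + rng.normal(0.0, 0.5, clean.shape)
filtered = two_stage_filter(noisy)
```

Because each 1-D pass touches every pixel only once, the cost stays linear in the number of pixels per coordinate, which is the computational advantage the abstract claims over the optimal joint filter.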
Turn Fake into Real: Adversarial Head Turn Attacks Against Deepfake Detection
Malicious use of deepfakes leads to serious public concerns and reduces
people's trust in digital media. Although effective deepfake detectors have
been proposed, they are substantially vulnerable to adversarial attacks. To
evaluate the detector's robustness, recent studies have explored various
attacks. However, all existing attacks are limited to 2D image perturbations,
which are hard to translate into real-world facial changes. In this paper, we
propose adversarial head turn (AdvHeat), the first attempt at 3D adversarial
face views against deepfake detectors, based on face view synthesis from a
single-view fake image. Extensive experiments validate the vulnerability of
various detectors to AdvHeat in realistic, black-box scenarios. For example,
AdvHeat based on a simple random search yields a high attack success rate of
96.8% with 360 searching steps. When additional query access is allowed, we can
further reduce the step budget to 50. Additional analyses demonstrate that
AdvHeat is better than conventional attacks on both the cross-detector
transferability and robustness to defenses. The adversarial images generated by
AdvHeat are also shown to have natural looks. Our code, including that for
generating a multi-view dataset consisting of 360 synthetic views for each of
1000 IDs from FaceForensics++, is available at
https://github.com/twowwj/AdvHeaT
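The black-box random-search attack can be sketched generically: sample candidate view parameters and keep the one that most lowers the detector's fake score. The detector and its two-angle parameterization below are toy stand-ins, not the actual AdvHeat pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)

def detector_score(angles):
    # Hypothetical black-box detector: returns a "fake" score in (0, 1).
    # It stands in for a real deepfake detector fed a face view
    # synthesized at the given head-turn angles (yaw, pitch).
    return 1.0 / (1.0 + np.exp(5.0 * (np.abs(angles).sum() - 0.6)))

def random_search_attack(score_fn, dim=2, steps=360, scale=0.5, thresh=0.5):
    # Black-box random search: draw candidate view parameters and stop
    # at the first one that drives the fake score below the threshold.
    best, best_score = None, np.inf
    for _ in range(steps):
        candidate = rng.uniform(-scale, scale, dim)
        s = score_fn(candidate)
        if s < best_score:
            best, best_score = candidate, s
        if best_score < thresh:
            break
    return best, best_score

angles, score = random_search_attack(detector_score)
```

The 360-step budget mirrors the search budget quoted in the abstract; with query access, smarter samplers can shrink it, as the paper reports for its 50-step variant.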
Learning from one example in machine vision by sharing probability densities
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2002. Includes bibliographical references (p. 125-130). Human beings exhibit rapid learning when presented with a small number of images of a new object. A person can identify an object under a wide variety of visual conditions after having seen only a single example of that object. This ability can be partly explained by the application of previously learned statistical knowledge to a new setting. This thesis presents an approach to acquiring knowledge in one setting and using it in another. Specifically, we develop probability densities over common image changes. Given a single image of a new object and a model of change learned from a different object, we form a model of the new object that can be used for synthesis, classification, and other visual tasks. We start by modeling spatial changes. We develop a framework for learning statistical knowledge of spatial transformations in one task and using that knowledge in a new task. By sharing a probability density over spatial transformations learned from a sample of handwritten letters, we develop a handwritten digit classifier that achieves 88.6% accuracy using only a single hand-picked training example from each class. The classification scheme includes a new algorithm, congealing, for the joint alignment of a set of images using an entropy minimization criterion. We investigate properties of this algorithm and compare it to other methods of addressing spatial variability in images. We illustrate its application to binary images, gray-scale images, and a set of 3-D neonatal magnetic resonance brain volumes. Next, we extend the method of change modeling from spatial transformations to color transformations.
By measuring statistically common joint color changes of a scene in an office environment, and then applying standard statistical techniques such as principal components analysis, we develop a probabilistic model of color change. We show that these color changes, which we call color flows, can be shared effectively between certain types of scenes. That is, a probability density over color change developed by observing one scene can provide useful information about the variability of another scene. We demonstrate a variety of applications including image synthesis, image matching, and shadow detection. by Erik G. Miller. Ph.D.
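The color-flow construction (principal components analysis of per-pixel color changes) can be sketched directly with an SVD; the simulated illumination change below is an illustrative assumption, not the thesis's office data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: the same scene observed under two illuminants.
# Each row is one pixel's RGB value.
n_pixels = 500
reference = rng.random((n_pixels, 3))
# Simulated illumination change: a channel-wise gain plus small noise.
gains = np.array([1.2, 1.0, 0.8])
relit = reference * gains + rng.normal(0.0, 0.01, (n_pixels, 3))

# Per-pixel color changes ("flows").
flows = relit - reference

# PCA via SVD of the centered flow matrix: the rows of vt are the
# dominant directions of color change for this scene, and the squared
# singular values give the variance each direction explains.
centered = flows - flows.mean(axis=0)
_, s, vt = np.linalg.svd(centered, full_matrices=False)
explained = s**2 / np.sum(s**2)
```

Sharing the density then amounts to reusing the leading rows of `vt` (and the spread along them) as a prior on plausible color changes in a new scene.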
Bayesian inference for radio observations
New telescopes like the Square Kilometre Array (SKA) will push into a new sensitivity regime and expose systematics, such as direction-dependent effects, that could previously be ignored. Current methods for handling such systematics rely on alternating best estimates of instrumental calibration and models of the underlying sky, which can lead to inadequate uncertainty estimates and biased results because any correlations between parameters are ignored. These deconvolution algorithms produce a single image that is assumed to be a true representation of the sky, when in fact it is just one realization of an infinite ensemble of images compatible with the noise in the data. In contrast, here we report a Bayesian formalism that simultaneously infers both systematics and science. Our technique, Bayesian Inference for Radio Observations (BIRO), determines all parameters directly from the raw data, bypassing image-making entirely, by sampling from the joint posterior probability distribution. This enables it to derive both correlations and accurate uncertainties, making use of the flexible software meqtrees to model the sky and telescope simultaneously. We demonstrate BIRO with two simulated Westerbork Synthesis Radio Telescope data sets. In the first, we perform joint estimates of 103 scientific (flux densities of sources) and instrumental (pointing errors, beamwidth and noise) parameters. In the second example, we perform source separation with BIRO. Using the Bayesian evidence, we can accurately select between a single point source, two point sources and an extended Gaussian source, allowing for 'super-resolution' on scales much smaller than the synthesized beam.