13 research outputs found

Deep representation learning for photorealistic content creation

Author: Xia Xide
Publication date: 08/07/2021

We study the problem of deep representation learning for photorealistic content creation. This is a critical component in many computer vision applications, including virtual reality, videography, and even retail and advertising. In this thesis, we use deep neural techniques to develop end-to-end models that are capable of generating photorealistic results. Our framework is applied in three applications.
First, we study real-time universal Photorealistic Image Style Transfer. Photorealistic style transfer is the task of transferring the artistic style of an image onto a content target, producing a result that could plausibly have been taken with a camera. We propose a new end-to-end model for photorealistic style transfer that is both fast and inherently generates photorealistic results. The core of our approach is a feed-forward neural network that learns local edge-aware affine transforms that automatically obey the photorealism constraint. Our method produces visually superior results and is three orders of magnitude faster, enabling real-time performance at 4K on a mobile phone.
Next, we learn real-time localized Photorealistic Video Style Transfer. We present a novel algorithm for transferring artistic styles of an image onto local regions of a target video while preserving its photorealism. Local regions may be selected fully automatically from an image, through video segmentation algorithms, or from casual user guidance such as scribbles. Our method is real-time and works on arbitrary inputs without runtime optimization once trained. We demonstrate our method on a variety of style images and target videos, including the ability to transfer different styles onto multiple objects simultaneously, and smoothly transition between styles in time.
Lastly, we tackle the problem of attribute-based Fashion Image Retrieval and Content Creation. We present an effective approach for generating new outfits based on the input queries through generative adversarial learning. We address this challenge by decomposing the complicated process into two stages. In the first stage, we present a novel attribute-aware global ranking network for attribute-based fashion retrieval. In the second stage, a generative model is used to finalize the retrieved results conditioned on an individual’s preferred style. We demonstrate promising results on standard large-scale benchmarks.
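
The "local affine transforms" mentioned in the abstract above can be pictured with a small sketch. This is only an illustration under stated assumptions, not the thesis's model: the function name `apply_local_affine` and the dense per-pixel coefficient layout are hypothetical, and the edge-aware prediction of the coefficients by a network is not shown.

```python
import numpy as np


def apply_local_affine(image, coeffs):
    """Apply a separate 3x4 affine color transform at every pixel.

    image  : (H, W, 3) array of RGB values
    coeffs : (H, W, 3, 4) array; coeffs[y, x] is [A | b] for that pixel
             (hypothetical layout, chosen for illustration)
    """
    # Append a constant 1 to each pixel so the bias b is applied by the matmul.
    ones = np.ones(image.shape[:2] + (1,), dtype=image.dtype)
    homogeneous = np.concatenate([image, ones], axis=-1)  # (H, W, 4)
    # out[y, x] = A[y, x] @ rgb[y, x] + b[y, x]
    return np.einsum("hwij,hwj->hwi", coeffs, homogeneous)
```

Because each output pixel is an affine function of the input pixel at the same location, the result cannot move or invent image structure, which is one plausible reading of why such transforms preserve photorealism.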

Boston University Institutional Repository (OpenBU)

Deep metric learning to rank

Author: Cakir Fatih
He Kun
Kulis Brian
Sclaroff Stan
Xia Xide
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/06/2019

We propose a novel deep metric learning method by revisiting the learning to rank approach. Our method, named FastAP, optimizes the rank-based Average Precision measure, using an approximation derived from distance quantization. FastAP has a low complexity compared to existing methods, and is tailored for stochastic gradient descent. To fully exploit the benefits of the ranking formulation, we also propose a new minibatch sampling scheme, as well as a simple heuristic to enable large-batch training. On three few-shot image retrieval datasets, FastAP consistently outperforms competing methods, which often involve complex optimization heuristics or costly model ensembles. Accepted manuscript
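
The idea of approximating Average Precision through distance quantization can be sketched as follows. This is a minimal illustration, not the authors' implementation: it uses hard histogram bins via `np.histogram` (an assumption made here for simplicity, and not differentiable), and the function name `histogram_ap` and the default `max_dist=2.0` are hypothetical choices.

```python
import numpy as np


def histogram_ap(distances, is_positive, num_bins=10, max_dist=2.0):
    """Approximate Average Precision for one query from quantized distances.

    distances   : distances from the query to each database item
    is_positive : boolean array, True where the item matches the query
    """
    # Clip so that every item falls into some bin (max_dist is an assumption).
    distances = np.clip(np.asarray(distances, dtype=float), 0.0, max_dist)
    is_positive = np.asarray(is_positive, dtype=bool)
    edges = np.linspace(0.0, max_dist, num_bins + 1)

    # Per-bin counts of all items (h) and of positives only (h_pos).
    h, _ = np.histogram(distances, bins=edges)
    h_pos, _ = np.histogram(distances[is_positive], bins=edges)

    # Cumulative counts: items and positives up to and including each bin.
    H = np.cumsum(h).astype(float)
    H_pos = np.cumsum(h_pos).astype(float)

    n_pos = int(is_positive.sum())
    if n_pos == 0:
        return 0.0

    # Precision at each bin is H_pos / H (taken as 0 where no items seen yet),
    # weighted by the number of positives that land in that bin.
    precision = np.divide(H_pos, H, out=np.zeros(num_bins), where=H > 0)
    return float(np.sum(h_pos * precision) / n_pos)
```

With one item per bin the histogram estimate coincides with ordinary Average Precision; with coarser bins, items in the same bin are treated as tied, which is the source of the approximation.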

Crossref

Boston University Institutional Repository (OpenBU)

Learning to Approximate a Bregman Divergence

Author: Castanon David
Kulis Brian
Saligrama Venkatesh
Siahkamari Ali
Xia Xide
Publication date: 01/01/2020

Bregman divergences generalize measures such as the squared Euclidean distance and the KL divergence, and arise throughout many areas of machine learning. In this paper, we focus on the problem of approximating an arbitrary Bregman divergence from supervision, and we provide a well-principled approach to analyzing such approximations. We develop a formulation and algorithm for learning arbitrary Bregman divergences based on approximating their underlying convex generating function via a piecewise linear function. We provide theoretical approximation bounds using our parameterization and show that the generalization error $O_p(m^{-1/2})$ for metric learning using our framework matches the known generalization error in the strictly less general Mahalanobis metric learning setting. We further demonstrate empirically that our method performs well in comparison to existing metric learning methods, particularly for clustering and ranking problems. Comment: 19 pages, 4 figures
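
To make the piecewise linear parameterization concrete, here is a small sketch of a Bregman divergence D_phi(x, y) = phi(x) - phi(y) - <grad phi(y), x - y> whose generating function is a maximum of affine functions. It is an illustration only: the names `max_affine` and `bregman_divergence` are hypothetical, and the slopes and offsets are fixed by hand from tangent planes of phi(x) = ||x||^2 rather than learned from supervision as the abstract describes.

```python
import numpy as np


def max_affine(x, A, b):
    """Evaluate phi(x) = max_k (A[k] @ x + b[k]); return (value, active slope)."""
    scores = A @ x + b
    k = int(np.argmax(scores))
    return scores[k], A[k]


def bregman_divergence(x, y, A, b):
    """D_phi(x, y) for the max-affine phi, using the active slope at y
    as a (sub)gradient of phi at y."""
    phi_x, _ = max_affine(x, A, b)
    phi_y, grad_y = max_affine(y, A, b)
    return float(phi_x - phi_y - grad_y @ (x - y))


# Illustrative parameters: tangent planes of phi(x) = ||x||^2 at four anchors.
anchors = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
A = 2.0 * anchors                   # slope of the tangent plane at each anchor
b = -np.sum(anchors ** 2, axis=1)   # offset of the tangent plane at each anchor
```

At the anchor points this approximation reproduces the squared Euclidean distance exactly, and because a maximum of affine functions is convex, the divergence is non-negative everywhere.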

arXiv.org e-Print Archive

Boston University Institutional Repository (OpenBU)

DIME-FM: DIstilling Multimodal and Efficient Foundation Models

Author: Saenko Kate
Shah Hardik
Sun Ximeng
Xia Xide
Zhang Peizhao
Zhang Pengchuan
Publication date: 31/03/2023

Large Vision-Language Foundation Models (VLFMs), such as CLIP, ALIGN and Florence, are trained on large-scale datasets of image-caption pairs and achieve superior transferability and robustness on downstream tasks, but they are difficult to use in many practical applications due to their large size, high latency and fixed architectures. Unfortunately, recent work shows that training a small custom VLFM for resource-limited applications is currently very difficult using public and smaller-scale data. In this paper, we introduce a new distillation mechanism (DIME-FM) that allows us to transfer the knowledge contained in large VLFMs to smaller, customized foundation models using a relatively small amount of inexpensive, unpaired images and sentences. We transfer the knowledge from the pre-trained CLIP-ViT-L/14 model to a ViT-B/32 model, with only 40M public images and 28.4M unpaired public sentences. The resulting model "Distill-ViT-B/32" rivals the CLIP-ViT-B/32 model pre-trained on its private WiT dataset (400M image-text pairs): Distill-ViT-B/32 achieves similar results in terms of zero-shot and linear-probing performance on both ImageNet and the ELEVATER (20 image classification tasks) benchmarks. It also displays comparable robustness when evaluated on five datasets with natural distribution shifts from ImageNet.

arXiv.org e-Print Archive

An affinity-structure database of helix-turn-helix: DNA complexes with a universal coordinate system

Author: AlQuraishi Mohammed
Tang Shengdong
Xia Xide
Publication venue: Springer Science and Business Media LLC
Publication date: 04/12/2015

Background: Molecular interactions between proteins and DNA molecules underlie many cellular processes, including transcriptional regulation, chromosome replication, and nucleosome positioning. Computational analyses of protein-DNA interactions rely on experimental data characterizing known protein-DNA interactions structurally and biochemically. While many databases exist that contain either structural or biochemical data, few integrate these two data sources in a unified fashion. Such integration is becoming increasingly critical with the rapid growth of structural and biochemical data, and the emergence of algorithms that rely on the synthesis of multiple data types to derive computational models of molecular interactions.
Description: We have developed an integrated affinity-structure database in which the experimental and quantitative DNA binding affinities of helix-turn-helix proteins are mapped onto the crystal structures of the corresponding protein-DNA complexes. This database provides access to: (i) protein-DNA structures, (ii) quantitative summaries of protein-DNA binding affinities using position weight matrices, and (iii) raw experimental data of protein-DNA binding instances. Critically, this database establishes a correspondence between experimental structural data and quantitative binding affinity data at the single basepair level. Furthermore, we present a novel alignment algorithm that structurally aligns the protein-DNA complexes in the database and creates a unified residue-level coordinate system for comparing the physico-chemical environments at the interface between complexes. Using this unified coordinate system, we compute the statistics of atomic interactions at the protein-DNA interface of helix-turn-helix proteins. We provide an interactive website for visualizing, querying, and analyzing this database, and a downloadable version to facilitate programmatic analysis.
Conclusions: This database will facilitate the analysis of protein-DNA interactions and the development of programmatic computational methods that capitalize on integration of structural and biochemical datasets. The database can be accessed at http://ProteinDNA.hms.harvard.edu
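
For readers unfamiliar with position weight matrices, the following sketch shows one common way such a matrix summarizes binding preferences. It is an illustration under stated assumptions and not the database's own pipeline: the function names, the pseudocount, the uniform background, and the toy binding sites are all hypothetical.

```python
import math

BASES = "ACGT"


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds position weight matrix from equal-length aligned sites."""
    length = len(sites[0])
    pwm = []
    for pos in range(length):
        # Start every base at the pseudocount so unseen bases get a finite score.
        counts = {base: pseudocount for base in BASES}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        # Log2 ratio of observed frequency to the background frequency.
        pwm.append({base: math.log2((counts[base] / total) / background)
                    for base in BASES})
    return pwm


def score(pwm, sequence):
    """Sum the per-position log-odds scores of a sequence of matching length."""
    return sum(column[base] for column, base in zip(pwm, sequence))


# Made-up aligned sites, for illustration only.
toy_sites = ["TTGACA", "TTGACA", "TTTACA", "TAGACA"]
pwm = build_pwm(toy_sites)
```

Under this scoring, sequences resembling the aligned sites receive higher scores than unrelated sequences of the same length.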

Harvard University - DASH

Surface effects on the mechanical properties of nanoporous materials

Author: Feng Xi-Qiao
Li Xide
Liu Jianlin
Qin Qing Hua
Xia Re
Publication venue: IOP Publishing
Publication date: 24/02/2016

Using the theory of surface elasticity, we investigate the mechanical properties of nanoporous materials. The classical theory of porous materials is modified to account for surface effects, which become increasingly important as the characteristic size

The Australian National University

Expression profile of MSK1 and p-MSK1 (Thr-581 and Ser-360) following LPS intracerebral injection.

Author: Dekang Nie
Jian Chen
Jinlong Shi
Lanchun Ni
Liang Xia
Peipei Gong
Qingfeng Huang
Wei Shi
Xiaojian Lu
Xide Xu

A. Protein levels of t-MSK1, p-MSK1 Thr-581, and p-MSK1 Ser-360 were detected before (control) and after injury. GAPDH was also detected by Western blotting. B. Quantification graphs (relative optical density) of the intensity of staining of p-MSK1 (Thr-581) and total MSK1 to GAPDH at each time point. GAPDH was used to confirm that equal amounts of protein were run on the gel. C–H. Immunofluorescence staining of MSK1 and p-MSK1 (Thr-581) was performed to assess the staining changes for MSK1 and p-MSK1 immunoreactivity in the cortex at day 1 after LPS injection. I. Negative control. * and # indicate significant differences at P<0.05, compared with normal brain cortex. Scale bars: 40 µm (C–F), 20 µm (G–J).

The Francis Crick Institute