Reference-based Image Composition with Sketch via Structure-aware Diffusion Model
Recent remarkable improvements in large-scale text-to-image generative models
have shown promising results in generating high-fidelity images. To further
enhance editability and enable fine-grained generation, we introduce a
multi-input-conditioned image composition model that incorporates a sketch as a
novel modality, alongside a reference image. Thanks to the edge-level
controllability using sketches, our method enables a user to edit or complete
an image sub-part with a desired structure (i.e., sketch) and content (i.e.,
reference image). Our framework fine-tunes a pre-trained diffusion model to
complete missing regions using the reference image while maintaining sketch
guidance. Although simple, this approach opens up wide opportunities to meet user
needs for obtaining desired images. Through extensive experiments, we
demonstrate that our proposed method offers unique use cases for image
manipulation, enabling user-driven modifications of arbitrary scenes.
Comment: 7 pages; Code URL: https://github.com/kangyeolk/Paint-by-Sketc
Improving Scene Text Recognition for Character-Level Long-Tailed Distribution
Despite the recent remarkable improvements in scene text recognition (STR),
most studies have focused mainly on the English language, which includes only
a small number of characters. However, STR models show a large performance
degradation on languages with a large number of characters (e.g., Chinese
and Korean), especially on characters that rarely appear due to the long-tailed
distribution of characters in such languages. To address such an issue, we
conducted an empirical analysis using synthetic datasets with different
character-level distributions (e.g., balanced and long-tailed distributions).
While increasing a substantial number of tail classes without considering the
context helps the model to correctly recognize characters individually,
training with such a synthetic dataset interferes with the model learning the
contextual information (i.e., relation among characters), which is also
important for predicting the whole word. Based on this motivation, we propose a
novel Context-Aware and Free Experts Network (CAFE-Net) using two experts: 1)
a context-aware expert learns the contextual representation from a
long-tailed dataset composed of common words used in everyday life and 2)
a context-free expert focuses on correctly predicting individual characters by
utilizing a dataset with a balanced number of characters. By training two
experts to focus on learning contextual and visual representations,
respectively, we propose a novel confidence ensemble method to compensate for the
limitations of each expert. Through the experiments, we demonstrate that
CAFE-Net improves the STR performance on languages with a large number
of characters. Moreover, we show that CAFE-Net is easily applicable to various
STR models.
Comment: 17 pages
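The abstract does not spell out how the confidence ensemble combines the two experts, so the following is only a minimal sketch under a stated assumption: each expert's word-level confidence is the product of its per-step top probabilities, and the word from the more confident expert is kept. The function names, the toy character set, and the selection rule are all hypothetical and are not taken from the CAFE-Net paper.

```python
from math import prod


def word_confidence(char_probs):
    """Word confidence: product of the top probability at each decoding step (assumed rule)."""
    return prod(max(step) for step in char_probs)


def decode(char_probs, charset):
    """Greedy decoding: take the most likely character at every step."""
    return "".join(
        charset[max(range(len(step)), key=step.__getitem__)] for step in char_probs
    )


def confidence_ensemble(aware_probs, free_probs, charset):
    """Keep the prediction of whichever expert is more confident (hypothetical rule)."""
    if word_confidence(aware_probs) >= word_confidence(free_probs):
        return decode(aware_probs, charset)
    return decode(free_probs, charset)


# Toy per-step character distributions over a three-character alphabet.
charset = "abc"
aware = [[0.6, 0.3, 0.1], [0.5, 0.4, 0.1]]  # confidence 0.6 * 0.5
free = [[0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]   # confidence 0.8 * 0.8
result = confidence_ensemble(aware, free, charset)
```

In this toy case the context-free expert is more confident, so its word is kept. A real system would obtain the per-step distributions from the two trained experts.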
Label Shift Adapter for Test-Time Adaptation under Covariate and Label Shifts
Test-time adaptation (TTA) aims to adapt a pre-trained model to the target
domain in a batch-by-batch manner during inference. While label distributions
often exhibit imbalances in real-world scenarios, most previous TTA approaches
typically assume that both source and target domain datasets have balanced
label distributions. Because certain classes appear more frequently
in certain domains (e.g., buildings in cities, trees in forests), it is natural
that the label distribution shifts as the domain changes. However, we discover
that the majority of existing TTA methods fail to address the coexistence of
covariate and label shifts. To tackle this challenge, we propose a novel label
shift adapter that can be incorporated into existing TTA approaches to deal
with label shifts during the TTA process effectively. Specifically, we estimate
the label distribution of the target domain to feed it into the label shift
adapter. Subsequently, the label shift adapter produces optimal parameters for
the target label distribution. By predicting only the parameters for a part of
the pre-trained source model, our approach is computationally efficient and can
be easily applied, regardless of the model architectures. Through extensive
experiments, we demonstrate that integrating our strategy with TTA approaches
leads to substantial performance improvements under the joint presence of label
and covariate shifts.
Comment: Accepted to ICCV 202
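The abstract says the target label distribution is estimated and fed to the adapter but does not say how. As a minimal sketch, assuming the estimate is an exponential moving average of the batch-averaged softmax outputs, the estimation step could look like the code below. The function names and the `momentum` parameter are hypothetical, and the adapter that maps the estimate to model parameters is not shown.

```python
def batch_mean(probs):
    """Average the predicted class probabilities over one test batch."""
    n = len(probs)
    return [sum(row[k] for row in probs) / n for k in range(len(probs[0]))]


def update_estimate(estimate, probs, momentum=0.9):
    """Blend the running estimate with the current batch mean, then renormalize."""
    mean = batch_mean(probs)
    mixed = [momentum * e + (1.0 - momentum) * m for e, m in zip(estimate, mean)]
    total = sum(mixed)
    return [v / total for v in mixed]


# Start from a uniform prior over two classes and observe skewed batches.
estimate = [0.5, 0.5]
batch = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3]]  # softmax outputs, one row per sample
for _ in range(50):
    estimate = update_estimate(estimate, batch)
```

With repeated batches whose mean prediction is skewed toward the first class, the running estimate drifts from the uniform prior toward that batch mean.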
Low-swing signaling for energy efficient on-chip networks
Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2011. Cataloged from PDF version of thesis. Includes bibliographical references (p. 65-69).
On-chip networks have emerged as a scalable and high-bandwidth communication fabric in many-core processor chips. However, the energy consumption of these networks is becoming comparable to that of computation cores, making further scaling of core counts difficult. This thesis makes several contributions to low-swing signaling circuit design for energy-efficient on-chip networks in two separate projects: on-chip networks optimized for one-to-many multicasts and broadcasts, and link designs that allow on-chip networks to approach an ideal interconnection fabric. A low-swing crossbar switch, which is based on tri-state Reduced-Swing Drivers (RSDs), is presented for the first project. Measurement results of its test chip fabricated in 45nm SOI CMOS show that the tri-state RSD-based crossbar enables 55% power savings as compared to an equivalent full-swing crossbar and link. The measurement results also show that the proposed crossbar allows broadcast-optimized on-chip networks that use a single pipeline stage for physical data transmission to operate at a 21% higher data rate than full-swing networks. For the second project, two clockless low-swing repeaters, a Self-Resetting Logic Repeater (SRLR) and a Voltage-Locked Repeater (VLR), have been proposed and analyzed in simulation only. Neither requires a reference clock, differential signaling, or bias current. Such digital-intensive properties enable them to approach the energy and delay performance of a point-to-point interconnect of variable lengths. Simulated in 45nm SOI CMOS, the 10mm SRLR, designed for high energy efficiency, consumes 338fJ/b at 5.4Gb/s/ch, while the 10mm VLR raises its data rate up to 16.0Gb/s/ch at 427fJ/b.
by Sunghyun Park. S.M
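The repeater figures above are given as energy per bit at a data rate. A quick way to compare them is to convert each pair to average per-channel power, since power equals energy per bit times bit rate. The sketch below does only that arithmetic, reading the VLR data rate as 16.0Gb/s/ch; the function name is hypothetical, and the result ignores anything the abstract does not report (for example, activity factor or leakage).

```python
def link_power_mw(energy_fj_per_bit, rate_gbps):
    """Average power in mW: (fJ/b) * (Gb/s) = 1e-15 J/b * 1e9 b/s = 1e-6 W = 1e-3 mW."""
    return energy_fj_per_bit * rate_gbps * 1e-3


# Simulated 10mm repeaters from the abstract.
srlr_mw = link_power_mw(338, 5.4)   # about 1.83 mW per channel
vlr_mw = link_power_mw(427, 16.0)   # about 6.83 mW per channel

# Trade-off: the VLR spends more energy per bit in exchange for a higher data rate.
energy_ratio = 427 / 338
rate_ratio = 16.0 / 5.4
```

Under these numbers the VLR pays roughly a quarter more energy per bit for close to three times the per-channel data rate.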
Estimation of Water Quality Index for Coastal Areas in Korea Using GOCI Satellite Data Based on Machine Learning Approaches
In Korea, most industrial parks and major cities are located in coastal areas, which results in serious environmental problems in both coastal land and ocean. In order to effectively manage such problems, especially in the coastal ocean, water quality should be monitored. As there are many factors that influence water quality, the Korean Government proposed an integrated Water Quality Index (WQI) based on in situ measurements of ocean parameters (bottom dissolved oxygen, chlorophyll-a concentration, secchi disk depth, dissolved inorganic nitrogen, and dissolved inorganic phosphorus) by ocean division identified based on their ecological characteristics. Field-measured WQI, however, does not provide spatial continuity over vast areas. Satellite remote sensing can be an alternative for identifying WQI for surface water. In this study, two schemes were examined to estimate coastal WQI around the Korean Peninsula using in situ measurement data and Geostationary Ocean Color Imager (GOCI) satellite imagery from 2011 to 2013 based on machine learning approaches. Scheme 1 calculates WQI from water quality-related factors estimated using GOCI reflectance data, and scheme 2 estimates WQI directly using GOCI band reflectance data and basic products (chlorophyll-a, suspended sediment, colored dissolved organic matter). Three machine learning approaches, Random Forest (RF), Support Vector Regression (SVR), and a modified regression tree (Cubist), were used. Results show that estimation of secchi disk depth produced the highest accuracy among the ocean parameters, and RF performed best regardless of water quality-related factors. However, the accuracy of WQI from scheme 1 was lower than that from scheme 2 due to the estimation errors inherent in the water quality-related factors and the uncertainty of bottom dissolved oxygen.
Overall, scheme 2 appears more appropriate for estimating WQI for surface water in coastal areas, and chlorophyll-a concentration was identified as the most contributing factor to the estimation of WQI.
High-Resolution Virtual Try-On with Misalignment and Occlusion-Handled Conditions
Image-based virtual try-on aims to synthesize an image of a person wearing a
given clothing item. To solve the task, the existing methods warp the clothing
item to fit the person's body and generate the segmentation map of the person
wearing the item before fusing the item with the person. However, when the
warping and the segmentation generation stages operate individually without
information exchange, misalignment occurs between the warped clothes and the
segmentation map, which leads to artifacts in the final image. The
information disconnection also causes excessive warping near the clothing
regions occluded by the body parts, so-called pixel-squeezing artifacts. To
address these issues, we propose a novel try-on condition generator as a unified
module of the two stages (i.e., warping and segmentation generation stages). A
newly proposed feature fusion block in the condition generator implements the
information exchange, and the condition generator does not create any
misalignment or pixel-squeezing artifacts. We also introduce discriminator
rejection that filters out the incorrect segmentation map predictions and
assures the performance of virtual try-on frameworks. Experiments on a
high-resolution dataset demonstrate that our model successfully handles the
misalignment and occlusion, and significantly outperforms the baselines. Code
is available at https://github.com/sangyun884/HR-VITON.
Comment: Accepted to ECCV 202
RobustSwap: A Simple yet Robust Face Swapping Model against Attribute Leakage
Face swapping aims at injecting a source image's identity (i.e., facial
features) into a target image, while strictly preserving the target's
attributes, which are irrelevant to identity. However, we observed that
previous approaches still suffer from source attribute leakage, where the
source image's attributes interfere with the target image's. In this paper, we
analyze the latent space of StyleGAN and find an adequate combination of
latents for the face swapping task. Based on the findings, we develop a
simple yet robust face swapping model, RobustSwap, which is resistant to the
potential source attribute leakage. Moreover, we exploit the coordination of
3DMM's implicit and explicit information as guidance to incorporate the
structure of the source image and the precise pose of the target image. Despite
our method solely utilizing an image dataset without identity labels for
training, our model has the capability to generate high-fidelity and temporally
consistent videos. Through extensive qualitative and quantitative evaluations,
we demonstrate that our method shows significant improvements compared with the
previous face swapping models in synthesizing both images and videos. Project
page is available at https://robustswap.github.io/
Comment: 21 pages