229 research outputs found
Noise-aware Learning from Web-crawled Image-Text Data for Image Captioning
Image captioning is one of the straightforward tasks that can take advantage
of large-scale web-crawled data which provides rich knowledge about the visual
world for a captioning model. However, since web-crawled data contains
image-text pairs that are aligned at different levels, the inherent noises
(e.g., misaligned pairs) make it difficult to learn a precise captioning model.
Although a filtering strategy can effectively remove noisy data, it
leads to a decrease in learnable knowledge and sometimes brings about a new
problem of data deficiency. To take the best of both worlds, we propose a
noise-aware learning framework, which learns rich knowledge from the whole
web-crawled data while being less affected by the noises. This is achieved by
the proposed quality controllable model, which is learned using alignment
levels of the image-text pairs as an additional control signal during training.
The alignment-conditioned training allows the model to generate high-quality,
well-aligned captions by simply setting the control signal to the desired
alignment level at inference time. Through in-depth analysis, we show that our
controllable captioning model is effective in handling noise. In addition, with
two tasks of zero-shot captioning and text-to-image retrieval using generated
captions (i.e., self-retrieval), we also demonstrate our model can produce
high-quality captions in terms of descriptiveness and distinctiveness. Code is
available at https://github.com/kakaobrain/noc
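The alignment-conditioned training described in this abstract can be illustrated with a minimal sketch. The abstract does not say how alignment levels are computed or how the control signal is fed to the model, so everything below is an assumption made for illustration: a precomputed alignment score in [0, 1], the `LEVEL_THRESHOLDS` values, and the `<align_k>` prefix token are all hypothetical and are not taken from the authors' code.

```python
from bisect import bisect_right

# Hypothetical thresholds splitting an alignment score in [0, 1] into 5 levels.
LEVEL_THRESHOLDS = [0.2, 0.4, 0.6, 0.8]


def alignment_level(score, thresholds=LEVEL_THRESHOLDS):
    """Map a continuous alignment score to a discrete level 0..len(thresholds)."""
    return bisect_right(thresholds, score)


def control_token(level):
    """Hypothetical control signal: a text token naming the alignment level."""
    return f"<align_{level}>"


def make_training_example(caption, score):
    """Training: every pair is kept, tagged with its own alignment level."""
    return f"{control_token(alignment_level(score))} {caption}"


def make_inference_prefix(desired_level=len(LEVEL_THRESHOLDS)):
    """Inference: request the highest level to ask for well-aligned captions."""
    return control_token(desired_level)


# A noisy pair and a clean pair are both used for training, with different tags.
print(make_training_example("best deals this summer", 0.15))  # <align_0> best deals this summer
print(make_training_example("a dog running on grass", 0.90))  # <align_4> a dog running on grass
print(make_inference_prefix())                                # <align_4>
```

The point of the sketch is only that no pair is discarded: noisy pairs still contribute to training, while the level chosen at inference time steers generation toward the well-aligned end.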
Large Language Models can Share Images, Too!
This paper explores the image-sharing capability of Large Language Models
(LLMs), such as InstructGPT, ChatGPT, and GPT-4, in a zero-shot setting,
without the help of visual foundation models. Inspired by the two-stage process
of image-sharing in human dialogues, we propose a two-stage framework that
allows LLMs to predict potential image-sharing turns and generate related image
descriptions using our effective restriction-based prompt template. With
extensive experiments, we unlock the image-sharing capability of LLMs
in zero-shot prompting, with GPT-4 achieving the best performance.
Additionally, we uncover the emergent image-sharing ability in
zero-shot prompting, demonstrating the effectiveness of restriction-based
prompts in both stages of our framework. Based on this framework, we augment
the PhotoChat dataset with images generated by Stable Diffusion at predicted
turns, namely PhotoChat++. To our knowledge, this is the first study to assess
the image-sharing ability of LLMs in a zero-shot setting without visual
foundation models. The source code and the dataset will be released after
publication
Tgif1 Counterbalances The Activity Of Core Pluripotency Factors In Mouse Embryonic Stem Cells
Core pluripotency factors, such as Oct4, Sox2, and Nanog, play important roles in maintaining embryonic stem cell (ESC) identity by autoregulatory feedforward loops. Nevertheless, the mechanism that provides precise control of the levels of the ESC core factors without indefinite amplification has remained elusive. Here, we report the direct repression of core pluripotency factors by Tgif1, a previously known terminal repressor of TGF beta/activin/nodal signaling. Overexpression of Tgif1 reduces the levels of ESC core factors, whereas its depletion leads to the induction of the pluripotency factors. We confirm the existence of physical associations between Tgif1 and Oct4, Nanog, and HDAC1/2 and further show that the level of Tgif1 is not significantly altered by treatment with an activator/inhibitor of the TGF beta/activin/nodal signaling. Collectively, our findings establish Tgif1 as an integral member of the core regulatory circuitry of mouse ESCs that counterbalances the levels of the core pluripotency factors in a TGF beta/activin/nodal-independent manner. Funding: Cancer Prevention Research Institute of Texas (CPRIT) R1106. Molecular Bioscience
Single Cell Training on Architecture Search for Image Denoising
Neural Architecture Search (NAS) for automatically finding the optimal
network architecture has shown some success with competitive performances in
various computer vision tasks. However, NAS in general requires a tremendous
amount of computations. Thus reducing computational cost has emerged as an
important issue. Most of the attempts so far have been based on manual
approaches, and often the architectures developed from such efforts dwell in
the balance of the network optimality and the search cost. Additionally, recent
NAS methods for image restoration generally do not consider dynamic operations
that may transform dimensions of feature maps because of the dimensionality
mismatch in tensor calculations. This can greatly limit NAS in its search for
optimal network structure. To address these issues, we re-frame the optimal
search problem by focusing on the component block level. From previous work, it has
been shown that an effective denoising block can be connected in series to
further improve the network performance. By focusing on the block level, the search
space of reinforcement learning becomes significantly smaller and the evaluation
process can be conducted more rapidly. In addition, we integrate innovative
dimension-matching modules for dealing with spatial and channel-wise mismatch
that may occur in the optimal design search. This allows much flexibility in
optimal network search within the cell block. With these modules, we then
employ reinforcement learning in search of an optimal image denoising network
at a module level. Computational efficiency of our proposed Denoising Prior
Neural Architecture Search (DPNAS) was demonstrated by having it complete an
optimal architecture search for an image restoration task in just one day with
a single GPU.
Motion correction for phase-resolved dynamic optical coherence tomography imaging of rodent cerebral cortex
Cardiac and respiratory motions in animals are the primary source of image quality degradation in dynamic imaging studies, especially when using phase-resolved imaging modalities such as spectral-domain optical coherence tomography (SD-OCT), whose phase signal is very sensitive to movements of the sample. This study demonstrates a method to compensate for motion artifacts in dynamic SD-OCT imaging of the rodent cerebral cortex. We observed that respiratory and cardiac motions mainly caused, respectively, bulk image shifts (BISs) and global phase fluctuations (GPFs). A cross-correlation maximization-based shift correction algorithm was effective in suppressing BISs, while GPFs were significantly reduced by removing axial and lateral global phase variations. In addition, a non-origin-centered GPF correction algorithm was examined. Several combinations of these algorithms were tested to find an optimized approach that improved image stability from 0.5 to 0.8 in terms of the cross-correlation over 4 s of dynamic imaging, and reduced phase noise by two orders of magnitude in ~8% of voxels. Funding: K99 NS067050, P01 NS055104, R01 NS057476 (NINDS NIH HHS); R01 EB000790, R01 EB001954, R01 EB001954-09, P41 EB015896 (NIBIB NIH HHS).
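The idea behind shift correction by cross-correlation maximization can be sketched as follows. This is a simplified, one-dimensional illustration with integer shifts and an unnormalized correlation, not the authors' algorithm; the function names, the search range `max_shift`, and the toy signals are assumptions introduced here. Real OCT frames would need a two- or three-dimensional search, possibly with sub-pixel precision.

```python
def cross_correlation(ref, img, shift):
    """Sum of ref[i] * img[i + shift] over the samples where both exist."""
    n = len(ref)
    total = 0.0
    for i in range(n):
        j = i + shift
        if 0 <= j < n:
            total += ref[i] * img[j]
    return total


def estimate_shift(ref, img, max_shift):
    """Integer shift in [-max_shift, max_shift] that maximizes the correlation."""
    candidates = range(-max_shift, max_shift + 1)
    return max(candidates, key=lambda s: cross_correlation(ref, img, s))


def undo_shift(img, shift, fill=0.0):
    """Move img back by `shift` samples so that it lines up with the reference."""
    n = len(img)
    return [img[i + shift] if 0 <= i + shift < n else fill for i in range(n)]


# Toy depth profile and a copy displaced by two samples (a "bulk shift").
reference = [0, 0, 1, 3, 1, 0, 0, 0]
displaced = [0, 0, 0, 0, 1, 3, 1, 0]

shift = estimate_shift(reference, displaced, max_shift=3)
print(shift)                         # 2
print(undo_shift(displaced, shift))  # [0, 0, 1, 3, 1, 0, 0.0, 0.0]
```

Each frame of a dynamic series would be registered to a common reference in this way before any phase analysis, since a bulk shift otherwise mixes signals from different voxels.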
A novel method for crystalline silicon solar cells with low contact resistance and antireflection coating by an oxidized Mg layer
One of the key issues in the solar industry is lowering the dopant concentration of the emitter for high-efficiency crystalline solar cells. However, it is well known that a low surface concentration of dopants results in poor contact formation between the front Ag electrode and the n-layer of Si. In this paper, an evaporated Mg layer is used to reduce the series resistance of c-Si solar cells. A layer of Mg metal is deposited on a lightly doped n-type Si emitter by evaporation. An Ag electrode is screen-printed to collect the generated electrons. The small work function difference between Mg and n-type silicon reduces the contact resistance. During a co-firing process, Mg is oxidized, and the oxidized layer serves as an antireflection layer. The measurement of an Ag/Mg/n-Si solar cell shows that Voc, Jsc, FF, and efficiency are 602 mV, 36.9 mA/cm², 80.1%, and 17.75%, respectively. The method can be applied to the manufacturing of low-cost, simple, and high-efficiency solar cells.
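The four reported figures are linked by the standard relation efficiency = Voc × Jsc × FF / P_in. The short check below recomputes the efficiency from the other three values. The incident power of 100 mW/cm² is an assumption (the usual standard test condition); the abstract does not state the illumination used.

```python
def efficiency_percent(voc_mv, jsc_ma_cm2, ff_percent, p_in_mw_cm2=100.0):
    """Cell efficiency in percent from Voc (mV), Jsc (mA/cm^2), and FF (%).

    p_in_mw_cm2 is the incident power density; 100 mW/cm^2 is assumed here.
    """
    # V * mA/cm^2 gives mW/cm^2; the fill factor scales it to maximum power.
    p_max = (voc_mv / 1000.0) * jsc_ma_cm2 * (ff_percent / 100.0)
    return 100.0 * p_max / p_in_mw_cm2


eta = efficiency_percent(voc_mv=602, jsc_ma_cm2=36.9, ff_percent=80.1)
print(f"{eta:.2f} %")  # 17.79 %
```

The result, about 17.8%, is close to the reported 17.75%; a gap of this size may simply come from rounding of the three input values.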
Finite-difference Time-domain Study on Birefringence Changes of the Axon During Neural Activation
Recently, there has been a growing interest in optical imaging of neural activity because
optical neuroimaging has considerable advantages over conventional imaging. Birefringence of the
axon has been reported to change during neural activation, but the neurophysiological origin of
the change is still unresolved. This study hypothesizes that the birefringence signal is at least partially
attributed to the transient cellular volume change associated with nerve excitation. To examine
this hypothesis, we investigated how the intensity of cross-polarized light transmitting through the
axon would change as the size of the axon changes. For this purpose, a two-dimensional finite-difference
time-domain program was developed with an improvement to the total-field/scattered-field
method that reduces numerical noise. The results support our hypothesis in that the computed
cross-polarized signals exhibit some agreement with previously reported birefringence signals.
This work was supported by the ERC program of
MEST/KOSEF (grant #R11-2000-075-01001-0) and a
grant from the Industrial Technology Development Program
(10031270) of the Ministry of Knowledge Economy
(MKE) of Korea.
High-speed near-infrared transmission spectrometer for a new fast intrinsic optical neural recording
What do consumers think about widespread fashion counterfeits? A Q-methodological analysis of the diverse viewpoints
This study explores the complex facets of fashion counterfeits, focusing on (1) why such purchasing behaviors are widespread and (2) whether or not the behaviors are morally accepted and can be eliminated through law and regulations. In order to identify and categorize perceptual factors of the fashion counterfeits problem, we used Q-methodology with a combination of qualitative and quantitative techniques to identify different patterns of subjective perceptions (Brown, 2008)
- …