    Noise-aware Learning from Web-crawled Image-Text Data for Image Captioning

    Image captioning is one of the tasks that can most directly take advantage of large-scale web-crawled data, which provides a captioning model with rich knowledge about the visual world. However, because web-crawled data contain image-text pairs aligned at different levels, the inherent noise (e.g., misaligned pairs) makes it difficult to learn a precise captioning model. Filtering can effectively remove noisy data, but it also reduces the learnable knowledge and sometimes introduces a new problem of data deficiency. To get the best of both worlds, we propose a noise-aware learning framework that learns rich knowledge from the whole web-crawled dataset while being less affected by the noise. This is achieved by the proposed quality-controllable model, which is trained using the alignment levels of the image-text pairs as an additional control signal. This alignment-conditioned training allows the model to generate high-quality, well-aligned captions at inference time simply by setting the control signal to the desired alignment level. Through in-depth analysis, we show that our controllable captioning model is effective in handling noise. In addition, on two tasks, zero-shot captioning and text-to-image retrieval using generated captions (i.e., self-retrieval), we demonstrate that our model produces high-quality captions in terms of descriptiveness and distinctiveness. Code is available at https://github.com/kakaobrain/noc
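
    One way to picture alignment-conditioned training is to bucket each pair's alignment score into a discrete control token that is prepended to the caption during training, then fix that token to the best-aligned bucket at inference. The sketch below assumes the alignment score has already been computed (for example, as an image-text embedding similarity from a pretrained model) and uses quantile bucketing; the helper names and the `<align_k>` token format are hypothetical, not taken from the released code.

```python
import numpy as np

def make_bin_edges(scores: np.ndarray, num_bins: int) -> np.ndarray:
    """Interior quantile edges, so each bucket holds roughly the same number of pairs."""
    qs = np.linspace(0.0, 1.0, num_bins + 1)[1:-1]
    return np.quantile(scores, qs)

def alignment_bucket(score: float, edges: np.ndarray) -> int:
    """Map an alignment score to a bucket index in [0, num_bins - 1]."""
    return int(np.digitize(score, edges))

def control_token(bucket: int) -> str:
    # Hypothetical token format for the control signal.
    return f"<align_{bucket}>"

def training_text(caption: str, score: float, edges: np.ndarray) -> str:
    # Every pair is kept for training; its alignment level conditions the caption.
    return f"{control_token(alignment_bucket(score, edges))} {caption}"

def inference_prefix(num_bins: int) -> str:
    # At inference time, request the most-aligned bucket.
    return control_token(num_bins - 1)

# Toy scores standing in for image-text similarities of 100 crawled pairs.
scores = np.linspace(0.0, 1.0, 100)
edges = make_bin_edges(scores, num_bins=4)
print(training_text("a dog on a beach", 0.9, edges))  # <align_3> a dog on a beach
print(inference_prefix(4))                            # <align_3>
```

    Quantile bucketing is only one possible choice; it keeps every bucket populated even when the score distribution is skewed, so the model sees training examples for each control value.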

    Large Language Models can Share Images, Too!

    This paper explores the image-sharing capability of Large Language Models (LLMs), such as InstructGPT, ChatGPT, and GPT-4, in a zero-shot setting, without the help of visual foundation models. Inspired by the two-stage process of image-sharing in human dialogues, we propose a two-stage framework that allows LLMs to predict potential image-sharing turns and generate related image descriptions using our effective restriction-based prompt template. Through extensive experiments, we unlock the image-sharing capability of LLMs in zero-shot prompting, with GPT-4 achieving the best performance. Additionally, we uncover an emergent image-sharing ability in zero-shot prompting, demonstrating the effectiveness of restriction-based prompts in both stages of our framework. Based on this framework, we augment the PhotoChat dataset with images generated by Stable Diffusion at the predicted turns, yielding PhotoChat++. To our knowledge, this is the first study to assess the image-sharing ability of LLMs in a zero-shot setting without visual foundation models. The source code and the dataset will be released after publication.

    Tgif1 Counterbalances The Activity Of Core Pluripotency Factors In Mouse Embryonic Stem Cells

    Core pluripotency factors, such as Oct4, Sox2, and Nanog, play important roles in maintaining embryonic stem cell (ESC) identity through autoregulatory feedforward loops. Nevertheless, the mechanism that precisely controls the levels of the ESC core factors without indefinite amplification has remained elusive. Here, we report the direct repression of core pluripotency factors by Tgif1, a previously known terminal repressor of TGF beta/activin/nodal signaling. Overexpression of Tgif1 reduces the levels of ESC core factors, whereas its depletion induces the pluripotency factors. We confirm physical associations between Tgif1 and Oct4, Nanog, and HDAC1/2, and further show that the level of Tgif1 is not significantly altered by treatment with an activator or inhibitor of TGF beta/activin/nodal signaling. Collectively, our findings establish Tgif1 as an integral member of the core regulatory circuitry of mouse ESCs that counterbalances the levels of the core pluripotency factors in a TGF beta/activin/nodal-independent manner.
    Cancer Prevention Research Institute of Texas (CPRIT) R1106
    Molecular Bioscience

    Single Cell Training on Architecture Search for Image Denoising

    Neural Architecture Search (NAS), which automatically finds an optimal network architecture, has shown some success, with competitive performance in various computer vision tasks. However, NAS generally requires a tremendous amount of computation, so reducing its computational cost has emerged as an important issue. Most attempts so far have been based on manual approaches, and the resulting architectures often strike a compromise between network optimality and search cost. Additionally, recent NAS methods for image restoration generally do not consider dynamic operations that may transform the dimensions of feature maps, because of dimensionality mismatches in tensor calculations. This can greatly limit NAS in its search for an optimal network structure. To address these issues, we reframe the optimal search problem by focusing on the component block level. Previous work has shown that effective denoising blocks can be connected in series to further improve network performance. By focusing on the block level, the search space for reinforcement learning becomes significantly smaller, and the evaluation process can be conducted more rapidly. In addition, we integrate innovative dimension-matching modules to handle the spatial and channel-wise mismatches that may occur during the search. This allows greater flexibility in the search within the cell block. With these modules, we then employ reinforcement learning to search for an optimal image denoising network at the module level. The computational efficiency of our proposed Denoising Prior Neural Architecture Search (DPNAS) was demonstrated by having it complete an optimal architecture search for an image restoration task in just one day on a single GPU.
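
    The abstract does not specify how the dimension-matching modules are built. As a hypothetical illustration of the problem they solve, the NumPy sketch below brings a candidate block's output to the shape of the tensor it must be combined with, using nearest-neighbour spatial resampling and a 1x1 channel projection. These two operations are common choices assumed here for illustration, not the DPNAS modules themselves, and the function names are invented.

```python
import numpy as np

def match_spatial(x: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Nearest-neighbour resampling of a (C, H, W) feature map to (C, out_h, out_w)."""
    _, h, w = x.shape
    rows = (np.arange(out_h) * h) // out_h
    cols = (np.arange(out_w) * w) // out_w
    return x[:, rows][:, :, cols]

def match_channels(x: np.ndarray, out_c: int, rng: np.random.Generator) -> np.ndarray:
    """1x1 projection: the same linear map over channels at every pixel.
    Weights are random here; in a real network they would be learned."""
    in_c = x.shape[0]
    weight = rng.standard_normal((out_c, in_c)) / np.sqrt(in_c)
    return np.einsum("oc,chw->ohw", weight, x)

def match_dims(x: np.ndarray, target_shape, rng: np.random.Generator) -> np.ndarray:
    """Bring x to target_shape = (C, H, W) so it can be added to another tensor."""
    c, h, w = target_shape
    if x.shape[1:] != (h, w):
        x = match_spatial(x, h, w)
    if x.shape[0] != c:
        x = match_channels(x, c, rng)
    return x

rng = np.random.default_rng(0)
block_out = rng.standard_normal((16, 32, 32))  # candidate block that halves resolution
skip = rng.standard_normal((32, 64, 64))       # tensor it must be summed with
fused = skip + match_dims(block_out, skip.shape, rng)
print(fused.shape)  # (32, 64, 64)
```

    With such a step in place, a search controller can propose blocks that change resolution or channel count without the candidate network failing on a shape mismatch.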

    Motion correction for phase-resolved dynamic optical coherence tomography imaging of rodent cerebral cortex

    Cardiac and respiratory motions in animals are the primary source of image quality degradation in dynamic imaging studies, especially with phase-resolved imaging modalities such as spectral-domain optical coherence tomography (SD-OCT), whose phase signal is very sensitive to sample movement. This study demonstrates a method to compensate for motion artifacts in dynamic SD-OCT imaging of the rodent cerebral cortex. We observed that respiratory motion mainly caused bulk image shifts (BISs), whereas cardiac motion mainly caused global phase fluctuations (GPFs). A shift correction algorithm based on cross-correlation maximization was effective in suppressing BISs, while GPFs were significantly reduced by removing axial and lateral global phase variations. In addition, a non-origin-centered GPF correction algorithm was examined. Several combinations of these algorithms were tested to find an optimized approach, which improved image stability from 0.5 to 0.8 in terms of cross-correlation over 4 s of dynamic imaging and reduced phase noise by two orders of magnitude in ~8% of voxels.
    K99 NS067050, R01 NS057476, P01 NS055104 - NINDS NIH HHS; R01 EB000790, R01 EB001954, R01 EB001954-09, P41 EB015896 - NIBIB NIH HHS
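
    The two kinds of correction can be sketched on complex-valued B-scans as follows. This is a simplified illustration under stated assumptions, not the authors' pipeline: bulk shifts are taken to be integer and circular (estimated from the FFT-based cross-correlation of magnitude images), and each global phase fluctuation is modelled as a single phase offset per A-line (lateral) or per depth (axial) relative to a reference frame. The function names are hypothetical, and the non-origin-centered variant is not shown.

```python
from typing import Tuple

import numpy as np

def estimate_bulk_shift(ref: np.ndarray, img: np.ndarray) -> Tuple[int, int]:
    """Integer (axial, lateral) shift maximizing the circular cross-correlation
    of the magnitude images, so that np.roll(img, shift, axis=(0, 1)) ~ ref."""
    a, b = np.abs(ref), np.abs(img)
    xc = np.fft.ifft2(np.fft.fft2(a) * np.conj(np.fft.fft2(b)))
    peak = np.unravel_index(np.argmax(np.abs(xc)), xc.shape)
    # Peak indices past the midpoint correspond to negative shifts.
    return tuple(int(p - n) if p > n // 2 else int(p) for p, n in zip(peak, xc.shape))

def remove_global_phase(frame: np.ndarray, ref: np.ndarray, axis: int) -> np.ndarray:
    """Remove one phase offset per line: axis=0 gives one offset per A-line
    (lateral variation), axis=1 one offset per depth (axial variation)."""
    offset = np.angle(np.sum(frame * np.conj(ref), axis=axis, keepdims=True))
    return frame * np.exp(-1j * offset)

def correct_frame(frame: np.ndarray, ref: np.ndarray) -> np.ndarray:
    shift = estimate_bulk_shift(ref, frame)
    frame = np.roll(frame, shift, axis=(0, 1))        # undo the bulk image shift
    frame = remove_global_phase(frame, ref, axis=0)   # lateral global phase
    return remove_global_phase(frame, ref, axis=1)    # axial global phase

# Synthetic test: shift a random complex frame and add a random phase per A-line.
rng = np.random.default_rng(1)
ref = rng.standard_normal((64, 128)) + 1j * rng.standard_normal((64, 128))
phase = rng.uniform(-3, 3, size=(1, 128))
moved = np.roll(ref, (-3, 5), axis=(0, 1)) * np.exp(1j * phase)
print(estimate_bulk_shift(ref, moved))              # (3, -5)
print(np.allclose(correct_frame(moved, ref), ref))  # True
```

    Shift correction is applied before phase correction because the per-line phase estimate compares each A-line with the reference A-line at the same position, which is only meaningful once the frames are registered.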

    A novel method for crystalline silicon solar cells with low contact resistance and antireflection coating by an oxidized Mg layer

    One of the key issues in the solar industry is lowering the dopant concentration of the emitter for high-efficiency crystalline silicon solar cells. However, it is well known that a low surface dopant concentration results in poor contact formation between the front Ag electrode and the n-type Si layer. In this paper, an evaporated Mg layer is used to reduce the series resistance of c-Si solar cells. An Mg metal layer is deposited by evaporation on a lightly doped n-type Si emitter, and an Ag electrode is screen-printed to collect the generated electrons. The small work-function difference between Mg and n-type silicon reduces the contact resistance. During the co-firing process, Mg is oxidized, and the oxidized layer serves as an antireflection coating. Measurements of an Ag/Mg/n-Si solar cell give Voc, Jsc, FF, and efficiency of 602 mV, 36.9 mA/cm², 80.1%, and 17.75%, respectively. The method can be applied to manufacturing low-cost, simple, high-efficiency solar cells.
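
    As a quick consistency check on the reported figures, cell efficiency is the maximum output power density, Voc × Jsc × FF, divided by the incident power density. The abstract does not state the illumination; the sketch below assumes the standard 100 mW/cm² (AM1.5G) test condition.

```python
def efficiency_percent(voc_mv: float, jsc_ma_cm2: float, ff_percent: float,
                       pin_mw_cm2: float = 100.0) -> float:
    """Efficiency (%) = Voc * Jsc * FF / Pin; Pin defaults to an assumed 100 mW/cm^2."""
    p_max = (voc_mv / 1000.0) * jsc_ma_cm2 * (ff_percent / 100.0)  # V * mA/cm^2 = mW/cm^2
    return 100.0 * p_max / pin_mw_cm2

eta = efficiency_percent(602, 36.9, 80.1)
print(f"{eta:.2f} %")  # 17.79 %
```

    The computed value, about 17.79%, agrees with the reported 17.75% to within the rounding of the three reported parameters.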

    Finite-difference Time-domain Study on Birefringence Changes of the Axon During Neural Activation

    Recently, there has been growing interest in optical imaging of neural activity because optical neuroimaging has considerable advantages over conventional imaging. The birefringence of the axon has been reported to change during neural activation, but the neurophysiological origin of this change remains unresolved. This study hypothesizes that the birefringence signal is at least partially attributable to the transient cellular volume change associated with nerve excitation. To examine this hypothesis, we investigated how the intensity of cross-polarized light transmitted through the axon changes as the size of the axon changes. For this purpose, a two-dimensional finite-difference time-domain (FDTD) program was developed using an improved total-field/scattered-field method that reduces numerical noise. The results support our hypothesis: the computed cross-polarized signals show some agreement with previously reported birefringence signals.
    This work was supported by the ERC program of MEST/KOSEF (grant #R11-2000-075-01001-0) and by a grant from the Industrial Technology Development Program (10031270) of the Ministry of Knowledge Economy (MKE) of Korea.
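
    The total-field/scattered-field (TF/SF) idea can be illustrated in one dimension. The grid is split at a boundary node: to its right the simulation carries the total field, to its left only the scattered field, and the known incident field is subtracted or added at the two nodes that straddle the boundary. In empty space the scattered-field region should then stay at zero, so anything that appears there is scattering from an object (or numerical leakage). This is a minimal 1D sketch in normalized units with a Courant number of 1, following a common textbook formulation; it is not the authors' improved 2D method, and the grid size, pulse parameters, and dielectric slab standing in for a scatterer are arbitrary assumptions.

```python
from typing import Optional, Tuple

import numpy as np

IMP0 = 376.73  # impedance of free space (ohms)

def incident(t: float, delay: float = 60.0, width: float = 10.0) -> float:
    """Gaussian incident pulse evaluated at the TF/SF boundary."""
    return float(np.exp(-((t - delay) / width) ** 2))

def run_fdtd(size: int = 200, steps: int = 400, tfsf: int = 50,
             slab: Optional[Tuple[int, int]] = None, eps_slab: float = 4.0) -> float:
    """1D Yee scheme (Ez, Hy) with a TF/SF boundary at node `tfsf`.
    Returns the peak |Ez| ever seen in the scattered-field region (nodes < tfsf)."""
    ez = np.zeros(size)
    hy = np.zeros(size)
    eps_r = np.ones(size)
    if slab is not None:
        eps_r[slab[0]:slab[1]] = eps_slab
    peak_scattered = 0.0
    for q in range(steps):
        hy[-1] = hy[-2]                         # simple ABC, exact at Courant number 1 in free space
        hy[:-1] += (ez[1:] - ez[:-1]) / IMP0
        hy[tfsf - 1] -= incident(q) / IMP0      # Hy left of the boundary must see only scattered Ez
        ez[0] = ez[1]                           # simple ABC on the left edge
        ez[1:] += (hy[1:] - hy[:-1]) * IMP0 / eps_r[1:]
        ez[tfsf] += incident(q + 1.0)           # Ez at the boundary needs the incident Hy it lacks
        peak_scattered = max(peak_scattered, float(np.max(np.abs(ez[:tfsf]))))
    return peak_scattered

print(run_fdtd())                 # round-off level: no scatterer, so no leakage
print(run_fdtd(slab=(100, 130)))  # clearly nonzero: reflection from the slab
```

    Because the incident field never enters the scattered-field region, a weak scattered signal is not buried under the much larger incident field there, which is why TF/SF formulations are useful when the quantity of interest is a small change in scattering.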

    What do consumers think about widespread fashion counterfeits? A Q-methodological analysis of the diverse viewpoints

    This study explores the complex facets of fashion counterfeits, focusing on (1) why such purchasing behaviors are widespread and (2) whether these behaviors are morally accepted and can be eliminated through laws and regulations. To identify and categorize perceptual factors in the fashion counterfeit problem, we used Q-methodology, which combines qualitative and quantitative techniques, to identify different patterns of subjective perceptions (Brown, 2008).
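
    In Q-methodology, each participant rank-orders the same set of statements (a Q-sort); the participants, rather than the statements, are then correlated and factor-analyzed, so each factor groups people who share a viewpoint. The sketch below uses principal-component extraction with the Kaiser criterion (eigenvalue > 1) as a simple stand-in; Q studies often use centroid extraction and varimax rotation instead, and the abstract does not say which procedure the authors used. The toy Q-sorts are invented for illustration.

```python
import numpy as np

def q_factors(sorts: np.ndarray, min_eigenvalue: float = 1.0):
    """sorts: (participants, statements) array of Q-sort scores.
    Returns the eigenvalues of the retained factors and the participants' loadings."""
    corr = np.corrcoef(sorts)                  # participant-by-participant correlations
    eigvals, eigvecs = np.linalg.eigh(corr)    # ascending order for a symmetric matrix
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > min_eigenvalue            # Kaiser criterion
    loadings = eigvecs[:, keep] * np.sqrt(eigvals[keep])
    return eigvals[keep], loadings

# Five statements sorted on a -2..+2 scale by five hypothetical participants:
# three share viewpoint p, two share viewpoint q (p and q are uncorrelated).
p = [-2, -1, 0, 1, 2]
q = [1, -2, 0, 2, -1]
sorts = np.array([p, p, p, q, q], dtype=float)
eigvals, loadings = q_factors(sorts)
print(np.round(eigvals, 3))  # [3. 2.]
print(np.round(loadings, 3))
```

    Each retained factor then stands for one shared viewpoint: here the first factor loads only on the three participants who sorted like p, and the second only on the two who sorted like q.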