Enhanced bias stress stability of a-InGaZnO thin film transistors by inserting an ultra-thin interfacial InGaZnO:N layer
Amorphous indium-gallium-zinc oxide (a-IGZO) thin film transistors (TFTs) having an ultra-thin nitrogenated a-IGZO (a-IGZO:N) layer sandwiched at the channel/gate dielectric interface are fabricated. It is found that the device shows enhanced bias stress stability with significantly reduced threshold voltage drift under positive gate bias stress. Based on x-ray photoelectron spectroscopy measurements, the concentration of oxygen vacancies within the a-IGZO:N layer is suppressed due to the formation of N-Ga bonds. Meanwhile, low frequency noise analysis indicates that the average trap density near the channel/dielectric interface continuously drops as the nitrogen content within the a-IGZO:N layer increases. The improved interface quality upon nitrogen doping agrees with the enhanced bias stress stability of the a-IGZO TFTs.
This work was supported in part by the State Key Program for Basic Research of China under Grant Nos. 2010CB327504, 2011CB922100, and 2011CB301900; in part by the National Natural Science Foundation of China under Grant Nos. 60936004 and 11104130; in part by the Natural Science Foundation of Jiangsu Province under Grant Nos. BK2011556 and BK2011050; and in part by the Priority Academic Program Development of Jiangsu Higher Education Institutions.
Learning to Program with Natural Language
Large Language Models (LLMs) have shown remarkable performance on various basic natural language tasks, which raises hope for achieving Artificial General Intelligence. To complete a complex task, however, we still need a program for the task first and then ask LLMs to follow the program to generate a specific solution. We propose using natural language as a new programming language to describe task procedures, making them easily understandable to both humans and LLMs. The LLM is capable of directly generating natural language programs, but these programs may still contain factual errors or incomplete steps. Therefore, we further propose the Learning to Program (LP) method, which asks LLMs themselves to first learn the natural language program from the training dataset of the complex task and then use the learned program to guide inference. Our experiments on reasoning tasks of five different reasoning types (8 datasets) demonstrate the effectiveness of our approach. Further, our analysis experiment shows that the learned program can be directly used to guide another LLM to improve its performance, which reveals a new transfer learning paradigm.
Comment: Work in progress
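As a rough, hypothetical sketch of the idea described above (not the authors' implementation), the snippet below assumes a generic llm(prompt) completion call; the learn_program step asks the model to distill a natural language program from solved training examples, and solve then prepends that program when answering a new instance.

```python
def llm(prompt: str) -> str:
    """Placeholder for any chat/completion API call (an assumption, not a real client)."""
    raise NotImplementedError

def learn_program(train_examples: list[dict]) -> str:
    """Ask the LLM to write a natural language program (a step-by-step procedure)
    that would solve the kind of task shown in the training examples."""
    shown = "\n\n".join(
        f"Question: {ex['question']}\nAnswer: {ex['answer']}" for ex in train_examples
    )
    prompt = (
        "Study the solved examples below and write a general, numbered procedure "
        "in plain English for solving tasks of this type.\n\n" + shown
    )
    return llm(prompt)

def solve(program: str, question: str) -> str:
    """Guide inference on a new instance with the learned natural language program."""
    prompt = (
        "Follow this procedure step by step to answer the question.\n\n"
        f"Procedure:\n{program}\n\nQuestion: {question}\nAnswer:"
    )
    return llm(prompt)
```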
GameEval: Evaluating LLMs on Conversational Games
The rapid advancements in large language models (LLMs) have presented challenges in evaluating those models. Existing evaluation methods are either reference-based or preference-based, which inevitably need human intervention or introduce test bias caused by evaluator models. In this paper, we propose GameEval, a novel approach to evaluating LLMs through goal-driven conversational games, overcoming the limitations of previous methods. GameEval treats LLMs as game players and assigns them distinct roles with specific goals achieved by launching conversations of various forms, including discussion, question answering, and voting. We design three unique games with cooperative or adversarial objectives, accompanied by corresponding evaluation metrics, to show how this new paradigm comprehensively evaluates model performance. Through extensive experiments, we show that GameEval can effectively differentiate the capabilities of various LLMs, providing a comprehensive assessment of their integrated abilities to solve complex problems. Our public anonymous code is available at https://github.com/GameEval/GameEval
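To make the goal-driven game setup concrete, here is a minimal, hypothetical sketch (not the released GameEval code): each LLM player is given a role and a private goal, the players converse for a fixed number of rounds, and the outcome is judged only by whether each goal was reached, so no reference answers or preference judges are needed.

```python
def llm_reply(role: str, goal: str, history: list[str]) -> str:
    """Placeholder for a call to the evaluated model, conditioned on its role and goal."""
    raise NotImplementedError

def goal_reached(player: str, goal: str, history: list[str]) -> bool:
    """Game-specific success check (e.g. the spy stayed hidden, or the vote converged
    on the right suspect); a purely rule-based stand-in here for illustration."""
    return any(goal.lower() in turn.lower() for turn in history)

def play_game(roles: dict[str, str], rounds: int = 5) -> dict[str, bool]:
    """Run a goal-driven conversational game: players speak in turn for several rounds,
    then each player is scored by whether its assigned goal was achieved."""
    history: list[str] = []
    order = list(roles)
    for _ in range(rounds):
        for player in order:
            utterance = llm_reply(player, roles[player], history)
            history.append(f"{player}: {utterance}")
    return {player: goal_reached(player, roles[player], history) for player in order}
```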
DragNUWA: Fine-grained Control in Video Generation by Integrating Text, Image, and Trajectory
Controllable video generation has gained significant attention in recent years. However, two main limitations persist. Firstly, most existing works focus on either text-, image-, or trajectory-based control, leading to an inability to achieve fine-grained control in videos. Secondly, trajectory control research is still in its early stages, with most experiments being conducted on simple datasets like Human3.6M. This constraint limits the models' capability to process open-domain images and effectively handle complex curved trajectories. In this paper, we propose DragNUWA, an open-domain diffusion-based video generation model. To tackle the issue of insufficient control granularity in existing works, we simultaneously introduce text, image, and trajectory information to provide fine-grained control over video content from semantic, spatial, and temporal perspectives. To resolve the problem of limited open-domain trajectory control in current research, we propose trajectory modeling with three aspects: a Trajectory Sampler (TS) to enable open-domain control of arbitrary trajectories, a Multiscale Fusion (MF) to control trajectories at different granularities, and an Adaptive Training (AT) strategy to generate consistent videos following trajectories. Our experiments validate the effectiveness of DragNUWA, demonstrating its superior performance in fine-grained control in video generation. The homepage link is https://www.microsoft.com/en-us/research/project/dragnuwa/
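As a toy illustration of trajectory conditioning (the names, shapes, and sampling scheme are assumptions, not the released model), the sketch below mimics the kind of input a Trajectory Sampler and Multiscale Fusion might consume: sparse anchor points sampled from a dense drag path are rasterized into control maps at several resolutions.

```python
import numpy as np

def sample_trajectory(path: np.ndarray, num_anchors: int = 8) -> np.ndarray:
    """Toy trajectory sampler: keep a sparse set of (x, y) anchor points from a dense
    open-domain drag path so arbitrary curves can serve as a control signal."""
    idx = np.linspace(0, len(path) - 1, num_anchors).round().astype(int)
    return path[idx]

def rasterize_multiscale(anchors: np.ndarray, sizes=(64, 32, 16)) -> list[np.ndarray]:
    """Toy multiscale conditioning: draw the anchors into maps of decreasing resolution
    so a generator could be conditioned on the trajectory at several granularities."""
    maps = []
    for s in sizes:
        grid = np.zeros((s, s), dtype=np.float32)
        pts = np.clip((anchors * (s - 1)).round().astype(int), 0, s - 1)
        grid[pts[:, 1], pts[:, 0]] = 1.0  # mark trajectory points; (x, y) -> (col, row)
        maps.append(grid)
    return maps

# Example: a curved drag path with coordinates normalized to [0, 1]
t = np.linspace(0, 1, 200)
path = np.stack([t, 0.5 + 0.4 * np.sin(2 * np.pi * t)], axis=1)
control_maps = rasterize_multiscale(sample_trajectory(path))
```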
Electrical instability of amorphous indium-gallium-zinc oxide thin film transistors under monochromatic light illumination
The electrical instability behaviors of a positive-gate-bias-stressed amorphous indium-gallium-zinc oxide (a-IGZO) thin film transistor (TFT) are studied under monochromatic light illumination. It is found that as the wavelength of the incident light reduces from 750 nm to 450 nm, the threshold voltage of the illuminated TFT shows a continuous negative shift, which is caused by photo-excitation of trapped electrons at the channel/dielectric interface. Meanwhile, an increase of the sub-threshold swing (SS) is observed when the illumination wavelength is below 625 nm (∼2.0 eV). The SS degradation is accompanied by a simultaneous increase of the field effect mobility (μFE) of the TFT, which then decreases at even shorter wavelengths beyond 540 nm (∼2.3 eV). The variation of SS and μFE is explained by a physical model based on generation of singly ionized oxygen vacancies (Vo⁺) and doubly ionized oxygen vacancies (Vo²⁺) within the a-IGZO active layer by high energy photons, which would form trap states near the mid-gap and the conduction band edge, respectively.
This work was supported by the State Key Program for Basic Research of China under Grant Nos. 2010CB327504, 2011CB922100, and 2011CB301900, and by the National Natural Science Foundation of China under Grant Nos. 60825401, 60936004, 11104130, BK2011556, and BK2011050.
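For readers checking the quoted thresholds, the photon energies follow from the usual wavelength-to-energy conversion, and the sub-threshold swing has its standard definition; this is background arithmetic rather than material from the paper:

```latex
E_{\mathrm{ph}} = \frac{hc}{\lambda} \approx \frac{1240~\mathrm{eV\,nm}}{\lambda}
\;\Rightarrow\;
E(625~\mathrm{nm}) \approx 2.0~\mathrm{eV}, \quad
E(540~\mathrm{nm}) \approx 2.3~\mathrm{eV};
\qquad
\mathrm{SS} = \frac{\partial V_{\mathrm{GS}}}{\partial \log_{10} I_{\mathrm{DS}}}.
```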
Using Left and Right Brains Together: Towards Vision and Language Planning
Large Language Models (LLMs) and Large Multi-modality Models (LMMs) have demonstrated remarkable decision-making capabilities on a variety of tasks. However, they inherently plan within the language space alone, lacking visual and spatial imagination abilities. In contrast, humans utilize both the left and right hemispheres of the brain for language and visual planning during the thinking process. Therefore, we introduce a novel vision-language planning framework in this work to perform concurrent visual and language planning for tasks with inputs of any form. Our framework incorporates visual planning to capture intricate environmental details, while language planning enhances the logical coherence of the overall system. We evaluate the effectiveness of our framework across vision-language tasks, vision-only tasks, and language-only tasks. The results demonstrate the superior performance of our approach, indicating that the integration of visual and language planning yields better contextually aware task execution.
Comment: 19 pages, 13 figures
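A minimal sketch of the concurrent-planning idea, assuming two placeholder planners (the function names and the simple interleaving are illustrative, not the paper's framework): a visual plan grounded in the image supplies environmental detail, while a language plan keeps the overall procedure coherent.

```python
def visual_plan(image_bytes: bytes, task: str) -> list[str]:
    """Placeholder: ask a multi-modal model to propose steps grounded in the image."""
    raise NotImplementedError

def language_plan(task: str) -> list[str]:
    """Placeholder: ask a text-only LLM to propose logically ordered steps."""
    raise NotImplementedError

def combined_plan(image_bytes: bytes | None, task: str) -> list[str]:
    """Run both planners when an image is available and interleave their steps, so that
    visual steps contribute environmental detail and language steps keep the plan coherent."""
    lang_steps = language_plan(task)
    if image_bytes is None:
        return lang_steps
    vis_steps = visual_plan(image_bytes, task)
    merged: list[str] = []
    for i in range(max(len(lang_steps), len(vis_steps))):
        if i < len(lang_steps):
            merged.append(f"[language] {lang_steps[i]}")
        if i < len(vis_steps):
            merged.append(f"[visual] {vis_steps[i]}")
    return merged
```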
ManagerTower: Aggregating the Insights of Uni-Modal Experts for Vision-Language Representation Learning
Two-Tower Vision-Language (VL) models have shown promising improvements on various downstream VL tasks. Although the most advanced work improves performance by building bridges between encoders, it suffers from ineffective layer-by-layer utilization of uni-modal representations and cannot flexibly exploit different levels of uni-modal semantic knowledge. In this work, we propose ManagerTower, a novel VL model architecture that gathers and combines the insights of pre-trained uni-modal experts at different levels. The managers introduced in each cross-modal layer can adaptively aggregate uni-modal semantic knowledge to facilitate more comprehensive cross-modal alignment and fusion. ManagerTower outperforms previous strong baselines both with and without Vision-Language Pre-training (VLP). With only 4M VLP data, ManagerTower achieves superior performance on various downstream VL tasks, notably 79.15% accuracy on VQAv2 Test-Std, and 86.56% IR@1 and 95.64% TR@1 on Flickr30K. Code and checkpoints are available at https://github.com/LooperXX/ManagerTower.
Comment: Accepted by ACL 2023 Main Conference, Oral
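One way to picture the "manager" idea is a small module that learns mixing weights over the hidden states of every pre-trained uni-modal encoder layer and feeds the weighted mixture to its cross-modal layer; the sketch below uses made-up sizes and a plain softmax weighting, so it is an illustration rather than the released architecture.

```python
import torch
import torch.nn as nn

class Manager(nn.Module):
    """Toy 'manager': adaptively aggregate the hidden states of several pre-trained
    uni-modal encoder layers into one representation for a cross-modal fusion layer."""

    def __init__(self, num_layers: int, hidden: int):
        super().__init__()
        self.layer_logits = nn.Parameter(torch.zeros(num_layers))  # learned per-layer mixing weights
        self.proj = nn.Linear(hidden, hidden)

    def forward(self, layer_states: torch.Tensor) -> torch.Tensor:
        # layer_states: (num_layers, batch, seq_len, hidden) from the uni-modal expert
        w = torch.softmax(self.layer_logits, dim=0)            # one weight per encoder layer
        mixed = torch.einsum("l,lbsh->bsh", w, layer_states)   # weighted sum over layers
        return self.proj(mixed)

# Example with made-up sizes: 6 uni-modal layers, batch 2, 16 tokens, 768-dim states
states = torch.randn(6, 2, 16, 768)
fused_input = Manager(num_layers=6, hidden=768)(states)
```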
Low-code LLM: Graphical User Interface over Large Language Models
Utilizing Large Language Models (LLMs) for complex tasks is challenging,
often involving a time-consuming and uncontrollable prompt engineering process.
This paper introduces a novel human-LLM interaction framework, Low-code LLM. It
incorporates six types of simple low-code visual programming interactions to
achieve more controllable and stable responses. Through visual interaction with
a graphical user interface, users can incorporate their ideas into the process
without writing trivial prompts. The proposed Low-code LLM framework consists
of a Planning LLM that designs a structured planning workflow for complex
tasks, which can be correspondingly edited and confirmed by users through
low-code visual programming operations, and an Executing LLM that generates
responses following the user-confirmed workflow. We highlight three advantages
of the low-code LLM: user-friendly interaction, controllable generation, and
wide applicability. We demonstrate its benefits using four typical
applications. By introducing this framework, we aim to bridge the gap between
humans and LLMs, enabling more effective and efficient utilization of LLMs for
complex tasks. The code, prompts, and experimental details are available at
https://github.com/moymix/TaskMatrix/tree/main/LowCodeLLM. A system
demonstration video can be found at
https://www.youtube.com/watch?v=jb2C1vaeO3E.
Comment: Accepted as a Demo Track paper at NAACL 2024
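A minimal sketch of the two-stage interaction described above, with illustrative prompts and stdin editing standing in for the low-code GUI (this is not the project's code): a Planning LLM drafts a workflow, the user edits and confirms it, and an Executing LLM follows the confirmed steps.

```python
def llm(prompt: str) -> str:
    """Placeholder for any LLM completion call."""
    raise NotImplementedError

def plan(task: str) -> list[str]:
    """Planning LLM: draft a structured, step-by-step workflow for the complex task."""
    text = llm(f"Break the following task into a numbered workflow of short steps:\n{task}")
    return [line.strip() for line in text.splitlines() if line.strip()]

def confirm_with_user(workflow: list[str]) -> list[str]:
    """Stand-in for the low-code GUI: let the user edit or accept each drafted step."""
    confirmed = []
    for step in workflow:
        edited = input(f"Edit step (press Enter to keep): {step}\n> ").strip()
        confirmed.append(edited or step)
    return confirmed

def execute(task: str, workflow: list[str]) -> str:
    """Executing LLM: generate the final response by following the user-confirmed workflow."""
    steps = "\n".join(workflow)
    return llm(f"Task: {task}\nFollow these confirmed steps exactly:\n{steps}\nResponse:")

# Usage: execute(task, confirm_with_user(plan(task)))
```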