41 research outputs found
Efficiently Enhancing Zero-Shot Performance of Instruction Following Model via Retrieval of Soft Prompt
Enhancing the zero-shot performance of instruction-following models requires
heavy computation, either by scaling the total number of training datasets or
the model size. In this work, we explore how retrieval of soft prompts obtained
through prompt tuning can efficiently assist hard prompts in zero-shot task
generalization. Specifically, we train soft prompt embeddings for each prompt
through prompt tuning, store the samples of the training instances mapped with
the prompt embeddings, and retrieve the corresponding prompt embedding of the
training instance closest to the query instance during inference. While only
adding 0.007% additional parameters, retrieval of soft prompt enhances the
performance of T0 on unseen tasks by outperforming it on 10 out of 11 datasets
as well as improving the mean accuracy of T0 on the BIG-bench benchmark by 2.39
percentage points. Also, we report an interesting finding that retrieving source
embeddings trained on similar answer choice formats is more important than
those on similar task types.
Comment: EMNLP 2023 Findings
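The retrieval step described in this abstract (store training instances mapped to prompt embeddings, then return the prompt of the closest instance at inference) can be illustrated with a minimal sketch. The cosine similarity measure, the 3-dimensional toy vectors, and the names `retrieve_soft_prompt`, `index`, and `prompt_a`/`prompt_b`/`prompt_c` are all assumptions made for illustration; the abstract does not specify the retriever or the embedding model.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def retrieve_soft_prompt(query_vec, index):
    """Return the prompt id mapped to the stored instance closest to the query.

    `index` is a list of (instance_vector, prompt_id) pairs. Cosine similarity
    is an assumed choice, not taken from the abstract.
    """
    _, best_id = max(index, key=lambda item: cosine(query_vec, item[0]))
    return best_id

# Hypothetical store: each training-instance vector points at the soft prompt
# that was tuned for the hard prompt it came from.
index = [
    ([1.0, 0.0, 0.0], "prompt_a"),
    ([0.0, 1.0, 0.0], "prompt_b"),
    ([0.7, 0.7, 0.0], "prompt_c"),
]

print(retrieve_soft_prompt([0.9, 0.1, 0.0], index))
print(retrieve_soft_prompt([0.5, 0.6, 0.0], index))
```

In a real system the returned id would select a tuned embedding matrix to prepend to the frozen model's input; that step is omitted here.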
How Well Do Large Language Models Truly Ground?
Reliance on the inherent knowledge of Large Language Models (LLMs) can cause
issues such as hallucinations, lack of control, and difficulties in integrating
variable knowledge. To mitigate this, LLMs can be probed to generate responses
by grounding on external context, often given as input (knowledge-augmented
models). Yet, previous research is often confined to a narrow view of the term
"grounding", often only focusing on whether the response contains the correct
answer or not, which does not ensure the reliability of the entire response. To
address this limitation, we introduce a strict definition of grounding: a model
is considered truly grounded when its responses (1) fully utilize necessary
knowledge from the provided context, and (2) don't exceed the knowledge within
the contexts. We introduce a new dataset and a grounding metric to assess this
new definition and perform experiments across 13 LLMs of different sizes and
training methods to provide insights into the factors that influence grounding
performance. Our findings contribute to a better understanding of how to
improve grounding capabilities and suggest an area of improvement toward more
reliable and controllable LLM applications.
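The two conditions of the strict grounding definition can be read as a pair of set inclusions. The sketch below assumes that knowledge can be represented as sets of atomic fact strings, which is a simplification for illustration only; the function name `is_truly_grounded` and the toy facts are hypothetical, and the abstract does not describe how the actual metric is computed.

```python
def is_truly_grounded(response_facts, necessary_facts, context_facts):
    """Check the two conditions of the strict grounding definition.

    All arguments are sets of atomic facts (an assumed representation).
    """
    # (1) The response fully uses the necessary knowledge from the context.
    uses_all_necessary = necessary_facts <= response_facts
    # (2) The response does not exceed the knowledge within the context.
    stays_within_context = response_facts <= context_facts
    return uses_all_necessary and stays_within_context

# Hypothetical toy facts.
context = {"A founded X", "X is based in Y", "X was founded in 1990"}
necessary = {"A founded X", "X was founded in 1990"}

print(is_truly_grounded({"A founded X", "X was founded in 1990"}, necessary, context))
print(is_truly_grounded({"A founded X"}, necessary, context))
print(is_truly_grounded(necessary | {"X has 500 employees"}, necessary, context))
```

The second call fails condition (1) because a necessary fact is missing; the third fails condition (2) because the response adds a fact that is not in the context.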
The CoT Collection: Improving Zero-shot and Few-shot Learning of Language Models via Chain-of-Thought Fine-Tuning
Language models (LMs) with less than 100B parameters are known to perform
poorly on chain-of-thought (CoT) reasoning in contrast to large LMs when
solving unseen tasks. In this work, we aim to equip smaller LMs with the
step-by-step reasoning capability by instruction tuning with CoT rationales. In
order to achieve this goal, we first introduce a new instruction-tuning dataset
called the CoT Collection, which augments the existing Flan Collection
(including only 9 CoT tasks) with additional 1.84 million rationales across
1,060 tasks. We show that CoT fine-tuning Flan-T5 (3B & 11B) with CoT
Collection enables smaller LMs to have better CoT capabilities on unseen tasks.
On the BIG-Bench-Hard (BBH) benchmark, we report an average improvement of
+4.34% (Flan-T5 3B) and +2.60% (Flan-T5 11B), in terms of zero-shot task
accuracy. Furthermore, we show that instruction tuning with CoT Collection
allows LMs to possess stronger few-shot learning capabilities on 4
domain-specific tasks, resulting in an improvement of +2.24% (Flan-T5 3B) and
+2.37% (Flan-T5 11B), even outperforming ChatGPT utilizing demonstrations until
the max length by a +13.98% margin. Our code, the CoT Collection data, and
model checkpoints are publicly available.
Comment: EMNLP 2023 (Main Conference)
Low-frequency noise in junctionless multigate transistors
Low-frequency noise in n-type junctionless multigate transistors was investigated. It is well described by carrier number fluctuations, even though the conduction is mainly limited by the bulk, where Hooge mobility fluctuations would be expected. The trapping/release of charge carriers is related not only to the oxide-semiconductor interface but also to the depleted channel. The volume trap density is in the range of 6-30 × 10¹⁶ cm⁻³ eV⁻¹, which is similar to Si-SiO2 bulk transistors and remarkably lower than in high-k transistors. These results show that the noise in nanowire devices might be affected by additional trapping centers. © 2011 American Institute of Physics. (doi:10.1063/1.3569724)
Growth of vertically aligned arrays of carbon nanotubes for high field emission
Vertically aligned multi-walled carbon nanotubes have been grown on Ni-coated silicon substrates by using either direct current diode or triode plasma-enhanced chemical vapor deposition at low temperature (around 620 °C). Acetylene gas has been used as the carbon source, while ammonia and hydrogen have been used for etching. However, densely packed (∼10⁹ cm⁻²) CNTs were obtained when the pressure was ∼100 Pa. The alignment of nanotubes is a necessary, but not a sufficient, condition for efficient electron emission: the growth of nanotubes should be controlled along regular arrays in order to minimize the electrostatic interactions between them. A three-dimensional numerical simulation has therefore been developed to calculate the local electric field in the vicinity of the tips for a finite square array of nanotubes, and thus to calculate the maximum electron emission current density as a function of the spacing between nanotubes. Finally, the triode plasma-enhanced process combined with pre-patterned catalyst films (using different lithography techniques) has been chosen in order to grow regular arrays of aligned CNTs with different pitches in the micrometer range. The comparison between the experimental and the simulation data makes it possible to identify the most efficient CNT-based electron field emitter.
Tactile Avatar: Tactile Sensing System Mimicking Human Tactile Cognition
As a surrogate for human tactile cognition, an artificial tactile perception and cognition system is proposed that produces smooth/soft and rough tactile sensations according to its user's tactile feeling; this system is named the “tactile avatar”. A piezoelectric tactile sensor is developed to dynamically record various physical information such as pressure, temperature, hardness, sliding velocity, and surface topography. For artificial tactile cognition, the tactile feelings of humans toward various materials ranging from smooth/soft to rough are assessed and found to vary among participants. Because tactile responses vary among humans, a deep learning structure is designed to allow personalization through training based on individualized histograms of human tactile cognition and recorded physical tactile information. The decision error in each avatar system is less than 2% when 42 materials are used to measure the tactile data, with 100 trials for each material under 1.2 N of contact force and 4 cm s⁻¹ of sliding velocity. As a tactile avatar, the machine categorizes newly experienced materials based on the tactile knowledge obtained from training data. The tactile sensation showed a high correlation with the specific user's tendency. This approach can be applied to electronic devices with tactile emotional exchange capabilities, as well as advanced digital experiences. © 2021 The Authors. Advanced Science published by Wiley-VCH GmbH
Prediction of Atmospheric Duct Conditions from a Clutter Power Spectrum Using Deep Learning
This paper presents a method for predicting atmospheric duct conditions from a clutter power spectrum using deep learning. To accurately predict the duct conditions, deep learning with a binary classification is applied to the proposed refractivity from the clutter (RFC) method. The input data set is the artificial clutter data that are generated via the Advanced Refractive Prediction System (AREPS) simulation software Ver. 3.6 in conjunction with random atmospheric refractive indices. The output of the RFC method is then predicted via binary classification, indicating whether the atmospheric conditions are duct or non-duct. For the cross-validation, the clutter power spectrum data are generated based on real atmospheric refractivity data. The results show that the deep neural network (DNN) trained with 5,600 pieces of data (validation accuracy of 95.99%) exhibits a binary classification accuracy of 98.36%. The DNN trained with 28,000 pieces of data (validation accuracy of 98.20%) achieves a binary classification accuracy of 99.06% with an F1-score of 0.9921.
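For readers comparing the reported accuracy and F1-score, the sketch below shows how both are derived from the confusion counts of a binary (duct vs. non-duct) classifier. The function name `binary_metrics` and the counts in the example are hypothetical; the abstract does not report the underlying confusion matrix.

```python
def binary_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from binary confusion counts.

    Here "positive" stands for the duct class (an assumed convention).
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical counts, not taken from the paper.
print(binary_metrics(tp=8, fp=2, fn=2, tn=8))
```

Accuracy counts correct non-duct predictions (true negatives), while F1 ignores them, so the two figures can differ when the classes are imbalanced.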
Design of a Stacked Dual-Patch Antenna with 3D Printed Thick Quasi-Air Substrates and a Cavity Wall for Wideband Applications
In this paper, we propose a stacked dual-patch antenna with 3D printed thick quasi-air substrates and a cavity wall for wideband applications. To achieve the theoretical maximum bandwidth of the patch antenna, the quality factor of the system needs to be minimized. To achieve this, the area of the conductive radiator should be enlarged, while the permittivity of the substrate within the patch must be reduced close to 1. To realize a patch antenna with this maximum bandwidth, the stacked dual-patch configuration is employed to obtain an extended conductive radiator area. In addition, square-pipe resin frames manufactured using a 3D printing method are applied to the proposed antenna to implement a quasi-air substrate structure that has a low permittivity value close to 1. The proposed stacked dual-patch antenna with a quasi-air substrate has a broad bandwidth of 20.7%. The results demonstrate that by using the proposed antenna structure, broadband characteristics close to the fundamental bandwidth limit of the patch antenna can be achieved.
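A bandwidth quoted as a percentage, such as the 20.7% above, is conventionally a fractional bandwidth: the width of the band divided by its center frequency. The sketch below assumes the arithmetic-mean center frequency, which is a common convention but is not stated in the abstract; the function names and the 9-11 GHz example band are hypothetical.

```python
def fractional_bandwidth(f_low, f_high):
    """Fractional bandwidth (f_high - f_low) / f_center, arithmetic center."""
    if not 0 < f_low < f_high:
        raise ValueError("expected 0 < f_low < f_high")
    f_center = (f_low + f_high) / 2.0
    return (f_high - f_low) / f_center

def band_edges(f_center, fbw):
    """Band edges implied by a center frequency and fractional bandwidth."""
    half_width = f_center * fbw / 2.0
    return f_center - half_width, f_center + half_width

# Hypothetical band, chosen only to show the arithmetic.
print(fractional_bandwidth(9.0, 11.0))

# Edges of a 20.7% band, normalized to a center frequency of 1.
print(band_edges(1.0, 0.207))
```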
Design of a Shared-Aperture Dual-Loop Antenna Using a Mutual Complementary Shape to Improve an Electromagnetic Transparent Characteristics Between S/X-Band Elements
In this paper, we propose an S/X-band shared-aperture array antenna with a mutual complementary design to improve the electromagnetic (EM) transparent characteristics. A unit cell of the proposed antenna includes one dual-loop element for the S-band and dual-loop elements for the X-band. To configure the shared-aperture structure in a limited space, the S-band element is stacked on top of the X-band elements. To solve the practical engineering problems of shared-aperture antennas, novel design techniques such as a mutual complementary structure, a coupling compensation array, an interface layer, and antenna modularization are employed. To verify the antenna feasibility, the fabricated unit cell is extended into a unit-cell array. The fractional bandwidths of the reflection coefficients for the proposed array are 14.7% and 15% in the S- and X-bands, respectively. In the S-band, as the steering direction of the main beam increases from 0° to 45°, the maximum gain decreases from 14.6 dBi to 11.8 dBi. In the X-band under the same conditions, the maximum gain varies from 26.6 dBi to 25.3 dBi.
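The gain figures above imply a scan loss for each band, which is simply the difference between the broadside and steered gains in dB. The sketch below computes it from the reported values and, as an assumed point of reference that is not part of the abstract, also evaluates the idealized cos(θ) projected-aperture loss at 45°. The function names are hypothetical.

```python
import math

def scan_loss_db(gain_broadside_dbi, gain_steered_dbi):
    """Gain drop in dB between broadside and the steered beam."""
    return gain_broadside_dbi - gain_steered_dbi

def ideal_cos_scan_loss_db(theta_deg):
    """Idealized projected-aperture loss, -10*log10(cos(theta)).

    Used here only as an assumed reference; real element patterns differ.
    """
    return -10.0 * math.log10(math.cos(math.radians(theta_deg)))

s_band = scan_loss_db(14.6, 11.8)   # gains reported for the S-band
x_band = scan_loss_db(26.6, 25.3)   # gains reported for the X-band
reference = ideal_cos_scan_loss_db(45.0)

print(f"S-band scan loss: {s_band:.1f} dB")
print(f"X-band scan loss: {x_band:.1f} dB")
print(f"cos(theta) reference at 45 deg: {reference:.2f} dB")
```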
Statistical Indoor Exclusion Zone Analysis by Investigating Electromagnetic Fields inside a Nuclear Power Plant
This article investigates a statistical indoor exclusion zone (EZ) that can be efficiently applied to a nuclear power plant (NPP) by examining electromagnetic fields inside the actual NPP. To obtain the statistical indoor EZ, the indoor environment of the Korea Institute of Nuclear Safety (KINS) simulator room is modeled using the Wireless InSite commercial electromagnetic simulation software. The indoor space around the transmitting antenna is classified as multiple observation regions, and the EZ boundaries of each region are independently defined within each separate observation region. The EZ boundaries are then obtained using a margined regression model, which makes it possible to determine a reasonable boundary of the statistical indoor EZ. To validate the statistical indoor EZ, the received power inside the KINS simulator room is then measured, which agrees well with the simulated results. The results demonstrate that the proposed statistical indoor EZ can be properly obtained not only from the simulation data but also from the measurement data.