26 research outputs found
Size Dependence of the Magnetic and Electrical Properties of the Spin-Valve Transistor
The electrical and magnetic properties of the spin-valve transistor (SVT) are investigated as a function of transistor size. A new fabrication process, designed to study the size dependence of the SVT properties, uses silicon-on-insulator (SOI) wafers, a combination of ion beam and wet etching, and a negative-tone photoresist (SU8) as an insulating layer. The Si/Pt emitter and Si/Au collector Schottky barrier heights do not depend on the transistor dimensions. The parasitic leakage current of the Si/Au collector is, however, proportional to its area. The relative collector current change with magnetic field is 240%, independent of size, while the transfer ratio starts to decrease for SVTs with an emitter area below 25 × 25 μm². The maximum input current is found to be limited by the maximum current density allowed in the base (1.7 × 10⁷ A/cm²), which is in agreement with the maximum current density for spin valves.
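For reference, the two magnetotransport figures of merit quoted above have standard definitions in the SVT literature; a minimal statement of them follows, in conventional notation that may differ from this paper's:

```latex
% Conventional spin-valve-transistor quantities (assumed notation).
% I_C^P, I_C^{AP}: collector currents for parallel/antiparallel spin-valve
% alignment; I_E: emitter (input) current.
\[
  \mathrm{MC} = \frac{I_C^{P} - I_C^{AP}}{I_C^{AP}}
  \qquad \text{(relative collector current change; 240\% here)},
\]
\[
  \alpha = \frac{I_C}{I_E}
  \qquad \text{(transfer ratio).}
\]
```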
Revealing User Familiarity Bias in Task-Oriented Dialogue via Interactive Evaluation
Most task-oriented dialogue (TOD) benchmarks assume users who know exactly how to use the system, constraining user behavior to the system's capabilities via strict user goals; we call this "user familiarity" bias. This data bias deepens when combined with data-driven TOD systems, since its effect cannot be gauged with existing static evaluations. Hence, we conduct an interactive user study to unveil how vulnerable TOD systems are to realistic scenarios. In particular, we compare users given 1) detailed goal instructions that conform to the system's boundaries (closed-goal) and 2) vague goal instructions that are often unsupported but realistic (open-goal). Our study reveals that conversations in the open-goal setting lead to catastrophic failures of the system: 92% of the dialogues had significant issues. Moreover, we conduct a thorough error annotation to identify the features that distinguish the two settings. From this, we discover a novel "pretending" behavior, in which the system pretends to handle user requests even though they are beyond its capabilities. We discuss its characteristics and toxicity, emphasizing transparency and a fallback strategy for robust TOD systems.
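As a concrete illustration, the two goal conditions contrast roughly as below; these example instructions are hypothetical, not taken from the study:

```python
# Hypothetical goal instructions illustrating the two conditions compared in
# the study; the actual study materials are not given in this abstract.
closed_goal = (
    "Book a table for two at an Italian restaurant in the city centre "
    "on Friday at 7pm."  # fully specified, within system capabilities
)
open_goal = (
    "Find somewhere nice to celebrate an anniversary this weekend."
    # vague and possibly unsupported: no concrete slot values given
)
print(closed_goal, open_goal, sep="\n")
```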
Leveraging Large Language Models to Power Chatbots for Collecting User Self-Reported Data
Large language models (LLMs) provide a new way to build chatbots by accepting natural language prompts. Yet, it is unclear how to design prompts that steer chatbots to carry on naturalistic conversations while pursuing a given goal, such as collecting self-report data from users. We explore which design factors of prompts can help steer chatbots to talk naturally and collect data reliably. To this end, we formulated four prompt designs with different structures and personas. Through an online study (N = 48) in which participants conversed with chatbots driven by the different prompt designs, we assessed how prompt designs and conversation topics affected the conversation flows and users' perceptions of the chatbots. Our chatbots covered 79% of the desired information slots during conversations, and the prompt designs and topics significantly influenced the conversation flows and the data-collection performance. We discuss the opportunities and challenges of building chatbots with LLMs.
Comment: 22 pages including Appendix, 7 figures, 7 tables. Accepted to PACM HCI (CSCW 2024).
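To make the "structure + persona" idea concrete, here is one plausible prompt of that kind; the persona and slot names are hypothetical, not the paper's actual prompts:

```python
# A hedged illustration of a structured data-collection prompt: a persona plus
# explicit information slots the chatbot should cover while staying natural.
# "Nia" and the slot list are invented for this sketch.
PERSONA = "You are Nia, a friendly and attentive wellbeing coach."
SLOTS = ["sleep duration", "exercise", "mood", "stress level"]  # desired information slots

prompt = (
    f"{PERSONA}\n"
    "Hold a natural conversation with the user and, without sounding like a "
    "survey, collect their self-reports on the following topics:\n"
    + "\n".join(f"- {slot}" for slot in SLOTS)
    + "\nAsk one question at a time and acknowledge each answer before moving on."
)
print(prompt)
```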
QADiver: Interactive Framework for Diagnosing QA Models
Question answering (QA), which extracts answers from text for a given question in natural language, has been actively studied, and existing models have shown promise of outperforming human performance when trained and evaluated on the SQuAD dataset. However, such performance may not carry over to real-world settings, in which case we need to diagnose the cause, a non-trivial task given the complexity of the model. We thus propose a web-based UI that shows how each component of the model contributes to QA performance by integrating visualization and analysis tools for model explanation. We expect this framework to help QA researchers refine and improve their models.
Comment: AAAI 2019 Demonstration.
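As a minimal sketch of one analysis such a diagnosis UI could expose (an assumption on our part, not QADiver's actual code): gradient-based token saliency for an extractive QA model, here on a public SQuAD-finetuned checkpoint:

```python
# Gradient-based token saliency for an extractive QA model: how strongly each
# input token influences the top start-position score.
import torch
from transformers import AutoTokenizer, AutoModelForQuestionAnswering

name = "distilbert-base-cased-distilled-squad"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForQuestionAnswering.from_pretrained(name)

question = "What does QA extract from text?"
context = "Question answering extracts answers from text for a given question."
inputs = tokenizer(question, context, return_tensors="pt")

# Run the model on word embeddings so we can take gradients w.r.t. them.
embeds = model.get_input_embeddings()(inputs["input_ids"])
embeds.retain_grad()
out = model(inputs_embeds=embeds, attention_mask=inputs["attention_mask"])
out.start_logits.max().backward()  # explain the most likely answer start

saliency = embeds.grad.norm(dim=-1).squeeze(0)  # one score per input token
for tok, s in zip(tokenizer.convert_ids_to_tokens(inputs["input_ids"][0]), saliency):
    print(f"{tok:>12} {s:.4f}")
```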
Tree of Clarifications: Answering Ambiguous Questions with Retrieval-Augmented Large Language Models
Questions in open-domain question answering are often ambiguous, allowing multiple interpretations. One approach to handling them is to identify all possible interpretations of the ambiguous question (AQ) and to generate a long-form answer addressing them all, as suggested by Stelmakh et al. (2022). While this provides a comprehensive response without bothering the user for clarification, considering multiple dimensions of ambiguity and gathering the corresponding knowledge remains a challenge. To cope with the challenge, we propose a novel framework, Tree of Clarifications (ToC): it recursively constructs a tree of disambiguations for the AQ -- via few-shot prompting leveraging external knowledge -- and uses it to generate a long-form answer. ToC outperforms existing baselines on ASQA in a few-shot setup across the metrics, while surpassing fully supervised baselines trained on the whole training set in terms of Disambig-F1 and Disambig-ROUGE. Code is available at https://github.com/gankim/tree-of-clarifications.
Comment: Accepted to EMNLP 2023.
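A minimal sketch of the recursive idea follows; this is our reading of the abstract, not the authors' implementation, and `call_llm` and `retrieve` are hypothetical stand-ins for an LLM API and a passage retriever supplying the external knowledge:

```python
# Recursively expand an ambiguous question into disambiguated sub-questions
# via (few-shot) prompting grounded in retrieved passages, then compose one
# long-form answer covering the whole tree.
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class Node:
    question: str
    answer: str = ""
    children: list[Node] = field(default_factory=list)

def expand(question: str, call_llm, retrieve, depth=0, max_depth=2, width=3) -> Node:
    """Build a tree of disambiguations rooted at an ambiguous question."""
    node = Node(question)
    passages = retrieve(question)  # ground each expansion in external knowledge
    if depth < max_depth:
        prompt = (f"Passages:\n{passages}\n\n"
                  f"List up to {width} disambiguated versions of: {question}")
        for sub_q in call_llm(prompt).splitlines()[:width]:
            child = expand(sub_q, call_llm, retrieve, depth + 1, max_depth, width)
            child.answer = call_llm(f"Using the passages, answer concisely: {sub_q}")
            node.children.append(child)
    return node

def long_form_answer(root: Node, call_llm) -> str:
    """Compose one long-form answer covering every disambiguation in the tree."""
    pairs, stack = [], list(root.children)
    while stack:  # flatten the whole tree, not just the first level
        n = stack.pop()
        pairs.append(f"Q: {n.question}\nA: {n.answer}")
        stack.extend(n.children)
    return call_llm(f"Write one long-form answer to '{root.question}' covering:\n"
                    + "\n".join(pairs))
```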
Design of a High Performance Earth Imaging Microsatellite
The Multispectral Earth Imaging Satellite (MEISAT) is SaTReCi's 100 kg microsatellite, designed to support a high-resolution multispectral Earth imaging camera. MEISAT, whose development is currently in progress, is capable of achieving a ground resolution of 8.5 meters with a 47 km swath width and a 400 km scan length from its intended orbit of 730 km. The low-cost satellite can also store 8 Gbits of image data, which will be transmitted to the ground at 10 Mbps with an X-band transmitter. This paper provides an overview of the MEISAT mission operation concept. The mission requirements and system specifications are also discussed. The paper then describes the overall system configuration. The design of the command and data handling system, the power system, the attitude control system, the RF system, and the payload system is presented.
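A quick back-of-the-envelope check of the stated figures (our arithmetic, not from the paper): emptying a full 8 Gbit image store at 10 Mbps takes about 13 minutes, on the order of a single ground-station pass for a 730 km orbit.

```python
# Downlink time for one full image-store dump at the stated rate.
storage_bits = 8e9   # 8 Gbit of on-board image storage
downlink_bps = 10e6  # 10 Mbps X-band transmitter
seconds = storage_bits / downlink_bps
print(f"{seconds:.0f} s = {seconds / 60:.1f} min per full dump")  # 800 s = 13.3 min
```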
FLASK: Fine-grained Language Model Evaluation based on Alignment Skill Sets
Evaluation of Large Language Models (LLMs) is challenging because aligning to human values requires the composition of multiple skills, and the required set of skills varies depending on the instruction. Recent studies have evaluated the performance of LLMs in two ways: (1) automatic evaluation on several independent benchmarks and (2) human- or machine-based evaluation giving an overall score to the response. However, both settings are coarse-grained evaluations that do not consider the nature of user instructions requiring instance-wise skill composition, which limits the interpretation of the true capabilities of LLMs. In this paper, we introduce FLASK (Fine-grained Language Model Evaluation based on Alignment SKill Sets), a fine-grained evaluation protocol usable for both model-based and human-based evaluation, which decomposes coarse-level scoring into an instance-wise, skill-set-level one. Specifically, we define 12 fine-grained skills needed for LLMs to follow open-ended user instructions and construct an evaluation set by allocating a set of skills to each instance. Additionally, by annotating the target domain and difficulty level of each instance, FLASK provides a holistic view and a comprehensive analysis of a model's performance depending on skill, domain, and difficulty. Using FLASK, we compare multiple open-source and proprietary LLMs and observe highly correlated findings between model-based and human-based evaluations. FLASK enables developers to measure model performance more accurately and to improve it by analyzing the factors that make LLMs proficient in particular skills. For practitioners, FLASK can recommend suitable models for particular situations through comprehensive comparison among various LLMs. We release the evaluation data and code implementation at https://github.com/kaistAI/FLASK.
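A minimal sketch of the instance-wise aggregation the protocol implies (an assumed record format, not FLASK's released code): each instance carries a domain, a difficulty, and 1-5 scores for only the skills allocated to it, and results are then averaged per skill (per domain or difficulty works the same way).

```python
# Aggregate per-instance skill scores into per-skill averages.
from collections import defaultdict
from statistics import mean

# Hypothetical evaluation records: (domain, difficulty, {skill: score 1-5})
records = [
    ("math", "hard", {"logical_correctness": 3, "completeness": 4}),
    ("humanities", "easy", {"readability": 5, "completeness": 3}),
]

per_skill = defaultdict(list)
for domain, difficulty, scores in records:
    for skill, score in scores.items():
        per_skill[skill].append(score)

for skill, vals in sorted(per_skill.items()):
    print(f"{skill:>20}: {mean(vals):.2f} over {len(vals)} instance(s)")
```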