Is GPT-4 a Good Data Analyst?
As large language models (LLMs) have demonstrated powerful capabilities across
many domains and tasks, including context understanding, code generation,
language generation, and data storytelling, many data analysts worry that
their jobs will be replaced by AI. This controversial topic has drawn
considerable public attention, yet opinions remain divided and no definitive
conclusion has been reached. Motivated by this, we raise the research question
"Is GPT-4 a good data analyst?" in this work and aim to answer it by
conducting head-to-head comparative studies. Specifically, we treat GPT-4 as a
data analyst performing end-to-end data analysis on databases from a wide
range of domains. We propose a framework that tackles these problems by
carefully designing the prompts GPT-4 uses in our experiments. We also design
several task-specific evaluation metrics to systematically compare GPT-4
against several professional human data analysts. Experimental results show
that GPT-4 can achieve performance comparable to humans. We also provide
in-depth discussion of our results to inform further studies before any
conclusion that GPT-4 can replace data analysts is reached.
Comment: 11 pages, 2 figures
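The end-to-end setup described above can be pictured with a minimal prompt-construction sketch. The function name, prompt wording, and schema below are hypothetical illustrations, not the paper's actual prompts.

```python
# Hypothetical sketch of an end-to-end "LLM as data analyst" prompt, in the
# spirit of the framework described above; the paper's actual prompts differ.

def analyst_prompt(question: str, schema: str) -> str:
    """Build a single prompt asking the model to analyze a database."""
    return (
        "You are a data analyst.\n"
        f"Database schema:\n{schema}\n"
        f"Business question: {question}\n"
        "1. Write the SQL needed to answer the question.\n"
        "2. Describe a chart that best presents the result.\n"
        "3. Summarize the key insight in two bullet points."
    )

print(analyst_prompt("Which region had the highest Q3 sales?",
                     "sales(region TEXT, quarter TEXT, amount REAL)"))
```

A real pipeline would send this prompt to the model, execute the returned SQL against the database, and render the described chart, which matches the end-to-end evaluation the abstract describes.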
Exploring the Potential of Large Language Models in Computational Argumentation
Computational argumentation has become an essential tool in various fields,
including artificial intelligence, law, and public policy. It is an emerging
research field in natural language processing (NLP) that attracts increasing
attention. Research on computational argumentation mainly involves two types of
tasks: argument mining and argument generation. As large language models (LLMs)
have demonstrated strong abilities in understanding context and generating
natural language, it is worthwhile to evaluate the performance of LLMs on
various computational argumentation tasks. This work aims to embark on an
assessment of LLMs, such as ChatGPT, Flan models and LLaMA2 models, under
zero-shot and few-shot settings within the realm of computational
argumentation. We organize existing tasks into 6 main classes and standardise
the format of 14 open-sourced datasets. In addition, we present a new benchmark
dataset on counter speech generation, which aims to holistically evaluate the
end-to-end performance of LLMs on argument mining and argument generation.
Extensive experiments show that LLMs exhibit commendable performance across
most of these datasets, demonstrating their capabilities in the field of
argumentation. We also highlight the limitations in evaluating computational
argumentation and provide suggestions for future research directions in this
field.
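As an illustration of the zero-shot setting evaluated above, a prompt for stance classification (one common argument-mining subtask) might look like the following. The template wording and label set are assumptions for illustration, not the paper's exact prompts.

```python
# Illustrative zero-shot prompt for stance classification, one common
# argument-mining subtask; wording and labels are assumptions, not the
# exact prompts used in the paper.

def zero_shot_stance_prompt(topic: str, argument: str) -> str:
    """Build a zero-shot prompt asking for the argument's stance."""
    return (
        f"Topic: {topic}\n"
        f"Argument: {argument}\n"
        "Does the argument support or oppose the topic? "
        "Answer with exactly one word: 'support' or 'oppose'."
    )

print(zero_shot_stance_prompt("School uniforms should be mandatory.",
                              "Uniforms reduce peer pressure over clothing."))
```

A few-shot variant would simply prepend labeled (topic, argument, stance) examples before the query, which is the other setting the abstract evaluates.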
The Structure of Coronal Mass Ejections Recorded by the K-Coronagraph at Mauna Loa Solar Observatory
Previous survey studies reported that coronal mass ejections (CMEs) can
exhibit various structures in white-light coronagraphs, and that 30% of them
have the typical three-part feature in the high corona (e.g., 2--6 R_⊙),
which has been taken as the prototypical structure of CMEs. It is widely
accepted that CMEs result from the eruption of magnetic flux ropes (MFRs), and
the three-part structure can be understood easily in terms of the MFR
eruption. An interesting and significant question is why only 30% of CMEs
exhibited the three-part feature in previous studies. Here we conduct a
synthesis of the CME structure in the field of view (FOV) of the K-Coronagraph
(1.05--3 R_⊙). In total, 369 CMEs were observed from 2013 September to 2022
November. After inspecting the CMEs one by one through joint observations of
AIA, the K-Coronagraph and LASCO/C2, we identify 71 events that meet the
criteria: 1) limb event; 2) normal CME, i.e., angular width ≥ 30°; 3) the
K-Coronagraph caught the early eruption stage. All (or more than 90%,
considering several ambiguous events) of the 71 CMEs exhibit the three-part
feature in the FOV of the K-Coronagraph, while only 30--40% have the feature
in the C2 FOV (2--6 R_⊙). For the first time, our studies show that 90--100%
and 30--40% of normal CMEs possess the three-part structure in the low and
high corona, respectively, which demonstrates that many CMEs can lose the
three-part feature during their early evolution, and strongly supports that
most (if not all) CMEs have MFR structures.
Comment: 10 pages, 4 figures, accepted for publication in ApJ
MReD: A Meta-Review Dataset for Structure-Controllable Text Generation
When directly using existing text generation datasets for controllable
generation, we face the problem of lacking domain knowledge, and thus the
aspects that can be controlled are limited. A typical example is using the
CNN/Daily Mail dataset for controllable text summarization: there is no
guiding information on the emphasis of summary sentences. A more useful text
generator should leverage both the input text and the control signal to guide
generation, which can only be built with a deep understanding of the domain
knowledge. Motivated by this vision, our paper introduces a new text
generation dataset, named MReD. Our new dataset consists of 7,089 meta-reviews,
and all of its 45k meta-review sentences are manually annotated with one of 9
carefully defined categories, including abstract, strength, decision, etc. We
present experimental results on state-of-the-art summarization models, and
propose methods for structure-controlled generation with both extractive and
abstractive models using our annotated data. By exploring various settings and
analyzing model behavior with respect to the control signal, we demonstrate
the challenges of our proposed task and the value of our dataset MReD.
Meanwhile, MReD also allows us to gain a better understanding of the
meta-review domain.
Comment: 15 pages, 5 figures, accepted at ACL 202
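One simple way to realize the structure control described above is to prepend the desired category sequence to the encoder input, so a seq2seq model learns to emit one sentence per requested label. This is a minimal sketch under that assumption; the separator tokens and the label subset are illustrative and need not match MReD's exact conditioning scheme.

```python
# Minimal sketch of control-sequence conditioning for structure-controllable
# summarization. Separator tokens and label subset are illustrative; the MReD
# dataset defines 9 categories in total.

CATEGORIES = {"abstract", "strength", "weakness", "decision"}

def build_controlled_input(control_labels: list[str], source_text: str) -> str:
    """Prepend the requested sentence-category sequence to the source text."""
    assert all(label in CATEGORIES for label in control_labels)
    return " | ".join(control_labels) + " ==> " + source_text

print(build_controlled_input(["abstract", "strength", "decision"],
                             "Review 1: ... Review 2: ..."))
```

An extractive variant would instead use the per-sentence category annotations to select source sentences whose labels match the requested control sequence.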
Unlocking Temporal Question Answering for Large Language Models Using Code Execution
Large language models (LLMs) have made significant progress in natural
language processing (NLP), and are utilized extensively in various
applications. Recent works, such as chain-of-thought (CoT), have shown that
intermediate reasoning steps can improve the performance of LLMs for complex
reasoning tasks, such as math problems and symbolic question-answering tasks.
However, we observe that LLMs face particular challenges in temporal
reasoning. Our preliminary experiments show that generating intermediate
reasoning steps does not always boost performance on complex temporal
question-answering tasks. Therefore, we propose a novel framework that
combines the extraction capability of LLMs with the logical reasoning
capability of a Python solver to tackle this issue. Extensive experiments and
analysis demonstrate the effectiveness of our framework in handling intricate
time-bound reasoning tasks.
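The division of labor described above (the LLM extracts time-stamped facts; plain Python performs the date arithmetic) can be sketched as follows. The hard-coded facts stand in for LLM output, and the helper function is an illustration, not the paper's actual solver.

```python
from datetime import date

# Sketch of the extract-then-solve idea: the dict below stands in for facts
# an LLM would extract from a passage, and the solver performs the date
# arithmetic that LLMs often get wrong.

extracted_facts = {
    "joined_company": date(2015, 3, 1),
    "left_company": date(2019, 8, 1),
}

def years_between(start: date, end: date) -> int:
    """Whole years elapsed from start to end."""
    years = end.year - start.year
    if (end.month, end.day) < (start.month, start.day):
        years -= 1  # anniversary not yet reached in the final year
    return years

print(years_between(extracted_facts["joined_company"],
                    extracted_facts["left_company"]))  # 4
```

Delegating the arithmetic to code sidesteps the intermediate-step errors that chain-of-thought prompting alone does not reliably fix on temporal questions.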
High-Performance Pseudocubic Thermoelectric Materials from Non-cubic Chalcopyrite Compounds
Peer Reviewed
SPEEK Membrane of Ultrahigh Stability Enhanced by Functionalized Carbon Nanotubes for Vanadium Redox Flow Battery
The proton exchange membrane is a key component of the vanadium redox flow battery (VRB), as its stability largely determines the battery's lifetime. In this study, a SPEEK/MWCNTs-OH composite membrane with ultrahigh stability is constructed by blending sulfonated poly(ether ether ketone) (SPEEK) with hydroxyl-functionalized multi-walled carbon nanotubes for VRB applications. The carbon nanotubes disperse homogeneously in the SPEEK matrix with the assistance of the hydroxyl groups. The blended membrane exhibits 94.2% and 73.0% capacity retention after 100 and 500 cycles, respectively, in a VRB single cell, with a coulombic efficiency of over 99.4% at 60 mA cm−2, indicating an outstanding capability to reduce the permeability of vanadium ions and enhance the transport of protons. The ultrahigh stability and low cost of the composite membrane make it a strong candidate for the next generation of large-scale vanadium redox flow batteries.
SuperScaler: Supporting Flexible DNN Parallelization via a Unified Abstraction
With the growing model size, deep neural networks (DNN) are increasingly
trained over massive GPU accelerators, which demands a proper parallelization
plan that transforms a DNN model into fine-grained tasks and then schedules
them to GPUs for execution. Due to the large search space, the contemporary
parallelization plan generators often rely on empirical rules that couple
transformation and scheduling, and fall short in exploring more flexible
schedules that yield better memory usage and compute efficiency. This tension
can be exacerbated by the emerging models with increasing complexity in their
structure and model size. SuperScaler is a system that facilitates the design
and generation of highly flexible parallelization plans. It explicitly
formulates plan design and generation as three sequential phases: model
transformation, space-time scheduling, and data-dependency preservation. Such
a principled approach decouples multiple seemingly intertwined factors and
enables the composition of highly flexible parallelization plans. As a result,
SuperScaler can not only generate empirical parallelization plans, but also
construct new plans that achieve up to 3.5X speedup over state-of-the-art
solutions such as DeepSpeed, Megatron and Alpa, for emerging DNN models like
Swin-Transformer and AlphaFold2, as well as for well-optimized models like
GPT-3.
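The three-phase decomposition can be illustrated with toy data structures. The names, the sharding rule, and the round-robin placement below are hypothetical simplifications for illustration, not SuperScaler's actual interfaces.

```python
from dataclasses import dataclass

# Toy illustration of the three sequential phases; names and policies are
# hypothetical simplifications, not SuperScaler's actual API.

@dataclass
class Task:
    name: str
    device: int = -1  # filled in by space-time scheduling
    step: int = -1

def transform(layers, shards=2):
    # Phase 1: model transformation into fine-grained tasks.
    return [Task(f"{layer}/s{i}") for layer in layers for i in range(shards)]

def schedule(tasks, num_devices=2):
    # Phase 2: space-time scheduling, decoupled from transformation so that
    # alternative placements can be explored independently.
    for i, task in enumerate(tasks):
        task.device, task.step = i % num_devices, i // num_devices
    return tasks

def data_deps(layers, shards=2):
    # Phase 3: edges any valid plan must preserve; here each shard of a
    # layer feeds the co-indexed shard of the next layer.
    return [(f"{a}/s{i}", f"{b}/s{i}")
            for a, b in zip(layers, layers[1:]) for i in range(shards)]

tasks = schedule(transform(["L0", "L1"]))
print([(t.name, t.device, t.step) for t in tasks])
```

Keeping the phases separate is what lets a generator swap in a different scheduling policy without re-deriving the task decomposition or the dependency edges, which is the decoupling the abstract argues for.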