
    Is GPT-4 a Good Data Analyst?

    As large language models (LLMs) have demonstrated powerful capabilities across many domains and tasks, including context understanding, code generation, language generation, and data storytelling, many data analysts wonder whether their jobs will be replaced by AI. This controversial topic has drawn considerable public attention, yet opinions remain divergent and no definitive conclusion has been reached. Motivated by this, we raise the research question "Is GPT-4 a good data analyst?" and aim to answer it through head-to-head comparative studies. Specifically, we treat GPT-4 as a data analyst performing end-to-end data analysis on databases from a wide range of domains. We propose a framework with carefully designed prompts that guide GPT-4 through the experiments, and we design several task-specific evaluation metrics to systematically compare GPT-4 against several professional human data analysts. Experimental results show that GPT-4 achieves performance comparable to humans. We also provide in-depth discussion of our results to inform further study before concluding whether GPT-4 can replace data analysts.
    Comment: 11 pages, 2 figures
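    The framework described above amounts to a prompt-driven extract-query-summarize loop. The following minimal Python sketch is our own illustration, not the authors' released code; the model name, prompt wording, and OpenAI client usage are all assumptions:

        import sqlite3
        from openai import OpenAI  # assumes the openai>=1.0 Python client is installed

        client = OpenAI()  # reads OPENAI_API_KEY from the environment

        def ask(prompt: str) -> str:
            # Single-turn chat call; returns the model's text reply.
            resp = client.chat.completions.create(
                model="gpt-4",
                messages=[{"role": "user", "content": prompt}],
            )
            return resp.choices[0].message.content

        def analyze(db_path: str, schema: str, question: str) -> str:
            # Step 1: the model translates the business question into SQL.
            sql = ask(f"Schema:\n{schema}\n\nWrite one SQLite query answering: {question}")
            # Step 2: the generated query runs locally against the real database.
            rows = sqlite3.connect(db_path).execute(sql).fetchall()
            # Step 3: the model turns raw rows into analyst-style findings.
            return ask(f"Question: {question}\nQuery result: {rows}\n\nSummarize the key insights.")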

    Exploring the Potential of Large Language Models in Computational Argumentation

    Computational argumentation has become an essential tool in various fields, including artificial intelligence, law, and public policy. It is an emerging research area in natural language processing (NLP) that is attracting increasing attention. Research on computational argumentation mainly involves two types of tasks: argument mining and argument generation. Since large language models (LLMs) have demonstrated strong abilities in understanding context and generating natural language, it is worthwhile to evaluate their performance on computational argumentation tasks. This work assesses LLMs, including ChatGPT, the Flan models, and the LLaMA2 models, under zero-shot and few-shot settings. We organize existing tasks into 6 main classes and standardise the format of 14 open-sourced datasets. In addition, we present a new benchmark dataset on counter-speech generation that aims to holistically evaluate the end-to-end performance of LLMs on argument mining and argument generation. Extensive experiments show that LLMs exhibit commendable performance across most of these datasets, demonstrating their capabilities in the field of argumentation. We also highlight the limitations of evaluating computational argumentation and provide suggestions for future research directions in this field.
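    To make the zero-shot versus few-shot distinction concrete, here is a minimal Python sketch of prompt construction for one argument-mining task (stance classification); the task wording and labels are our illustration, not the paper's exact prompts:

        def build_prompt(argument: str, examples=None) -> str:
            # Zero-shot: instruction only; few-shot: labeled demonstrations first.
            instruction = "Classify the stance of the argument as 'support' or 'oppose'."
            shots = ""
            if examples:
                shots = "\n".join(f"Argument: {a}\nStance: {s}" for a, s in examples) + "\n"
            return f"{instruction}\n{shots}Argument: {argument}\nStance:"

        zero_shot = build_prompt("School uniforms suppress individuality.")
        few_shot = build_prompt(
            "School uniforms suppress individuality.",
            examples=[("Uniforms cut morning decision fatigue.", "support")],
        )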

    The Structure of Coronal Mass Ejections Recorded by the K-Coronagraph at Mauna Loa Solar Observatory

    Previous survey studies reported that coronal mass ejections (CMEs) can exhibit various structures in white-light coronagraphs, and ~30% of them show the typical three-part feature in the high corona (e.g., 2–6 R⊙), which has been taken as the prototypical structure of CMEs. It is widely accepted that CMEs result from eruptions of magnetic flux ropes (MFRs), and the three-part structure is readily understood in terms of an erupting MFR. An interesting and important question is why only ~30% of CMEs showed the three-part feature in previous studies. Here we conduct a synthesis of CME structure in the field of view (FOV) of the K-Coronagraph (1.05–3 R⊙). In total, 369 CMEs were observed from September 2013 to November 2022. After inspecting the CMEs one by one through joint observations with AIA, the K-Coronagraph, and LASCO/C2, we identify 71 events meeting three criteria: 1) limb event; 2) normal CME, i.e., angular width ≥ 30°; 3) the K-Coronagraph caught the early eruption stage. All (or more than 90%, allowing for several ambiguous events) of the 71 CMEs exhibit the three-part feature in the K-Coronagraph FOV, while only 30–40% retain it in the C2 FOV (2–6 R⊙). For the first time, our study shows that 90–100% of normal CMEs possess the three-part structure in the low corona but only 30–40% in the high corona, which demonstrates that many CMEs lose the three-part feature during their early evolution and strongly supports the view that most (if not all) CMEs have MFR structures.
    Comment: 10 pages, 4 figures, accepted for publication in ApJ
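    The three selection criteria above are straightforward to express as a catalog filter. In this Python sketch the field names and the numeric thresholds for "limb" and "early stage" are our assumptions; only the 30-degree width cut comes from the abstract:

        def passes_criteria(cme: dict) -> bool:
            limb = abs(cme["source_longitude_deg"]) >= 60   # 1) limb event (threshold assumed)
            normal = cme["angular_width_deg"] >= 30         # 2) normal CME, width >= 30 degrees
            early = cme["first_height_rsun"] <= 1.5         # 3) early stage in K-Cor FOV (assumed)
            return limb and normal and early

        catalog = [
            {"source_longitude_deg": 78, "angular_width_deg": 120, "first_height_rsun": 1.1},
            {"source_longitude_deg": 15, "angular_width_deg": 45,  "first_height_rsun": 1.2},
        ]
        selected = [c for c in catalog if passes_criteria(c)]   # keeps only the first event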

    MReD: A Meta-Review Dataset for Structure-Controllable Text Generation

    When existing text generation datasets are used directly for controllable generation, the necessary domain knowledge is missing, so the aspects that can be controlled are limited. A typical example is controllable text summarization with the CNN/Daily Mail dataset, where there is no guiding information about which points a summary should emphasize. A more useful text generator should leverage both the input text and a control signal to guide generation, which can only be built with a deep understanding of the domain. Motivated by this vision, our paper introduces a new text generation dataset named MReD. It consists of 7,089 meta-reviews, and all of its 45k meta-review sentences are manually annotated with one of 9 carefully defined categories, including abstract, strength, and decision. We present experimental results with state-of-the-art summarization models and propose methods for structure-controlled generation with both extractive and abstractive models using our annotated data. By exploring various settings and analyzing model behavior with respect to the control signal, we demonstrate the challenges of the proposed task and the value of MReD. Meanwhile, MReD also enables a better understanding of the meta-review domain.
    Comment: 15 pages, 5 figures, accepted at ACL 2022
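    One way to picture the control signal is as a target sequence of sentence categories that the summarizer must follow. In this Python sketch the "Structure:" prompt format and the example category sequence are our illustration; only the category names abstract, strength, and decision are taken from the abstract:

        def make_input(reviews: str, structure: list[str]) -> str:
            # Prepend the desired sentence-label sequence as a control signal.
            control = " | ".join(structure)   # e.g. "abstract | strength | decision"
            return f"Structure: {control}\nReviews: {reviews}"

        src = make_input(
            "Reviewer 1 praises the results; Reviewer 2 questions the baselines.",
            ["abstract", "strength", "decision"],
        )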

    Unlocking Temporal Question Answering for Large Language Models Using Code Execution

    Large language models (LLMs) have made significant progress in natural language processing (NLP) and are used extensively in various applications. Recent work, such as chain-of-thought (CoT) prompting, has shown that intermediate reasoning steps can improve LLM performance on complex reasoning tasks, such as math problems and symbolic question answering. However, we observe that LLMs struggle with temporal reasoning: our preliminary experiments show that generating intermediate reasoning steps does not consistently improve performance on complex temporal question-answering tasks. We therefore propose a novel framework that combines the extraction capability of LLMs with the logical reasoning capability of a Python solver. Extensive experiments and analysis demonstrate the effectiveness of our framework in handling intricate time-bound reasoning tasks.
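    The core idea, delegating date arithmetic to Python rather than to free-text reasoning, can be sketched in a few lines. The extraction step is mocked below; in a real system the LLM would be prompted to emit the structured dates:

        from datetime import date

        def llm_extract(question: str) -> dict:
            # Stand-in for the LLM extraction step; a real system would prompt the
            # model to return structured dates (e.g., as JSON) from the question/context.
            return {"start": date(1969, 7, 16), "end": date(1969, 7, 24)}

        def solve(question: str) -> int:
            spans = llm_extract(question)
            # The Python solver, not the LLM, performs the temporal arithmetic.
            return (spans["end"] - spans["start"]).days

        print(solve("How many days did the Apollo 11 mission last?"))  # -> 8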

    High‐Performance Pseudocubic Thermoelectric Materials from Non‐cubic Chalcopyrite Compounds

    Peer Reviewed
    http://deepblue.lib.umich.edu/bitstream/2027.42/107585/1/adma201400058-sup-0001-S1.pdf
    http://deepblue.lib.umich.edu/bitstream/2027.42/107585/2/adma201400058.pdf

    SPEEK Membrane of Ultrahigh Stability Enhanced by Functionalized Carbon Nanotubes for Vanadium Redox Flow Battery

    The proton exchange membrane is a key component of the vanadium redox flow battery (VRB), as its stability largely determines the battery's lifetime. In this study, a SPEEK/MWCNTs-OH composite membrane with ultrahigh stability is constructed by blending sulfonated poly(ether ether ketone) (SPEEK) with hydroxyl-functionalized multi-walled carbon nanotubes for VRB application. The carbon nanotubes disperse homogeneously in the SPEEK matrix with the assistance of the hydroxyl groups. In a VRB single cell, the blended membrane retains 94.2% of its capacity after 100 cycles and 73.0% after 500 cycles, with a coulombic efficiency above 99.4% at 60 mA cm⁻², indicating an outstanding ability to reduce the permeability of vanadium ions while enhancing proton transport. The ultrahigh stability and low cost of the composite membrane make it a strong candidate for next-generation large-scale vanadium redox flow batteries.
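    As a quick sanity check on the cycling numbers, the two reported retentions imply nearly identical average per-cycle retention rates; the short calculation below is our own arithmetic, not from the paper:

        retention_100 = 0.942   # reported capacity retention after 100 cycles
        retention_500 = 0.730   # reported capacity retention after 500 cycles
        per_cycle_100 = retention_100 ** (1 / 100)   # ~0.99940 average retention per cycle
        per_cycle_500 = retention_500 ** (1 / 500)   # ~0.99937 average retention per cycle
        print(f"{per_cycle_100:.5f}, {per_cycle_500:.5f}")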

    SuperScaler: Supporting Flexible DNN Parallelization via a Unified Abstraction

    With growing model sizes, deep neural networks (DNNs) are increasingly trained on massive GPU accelerator fleets, which demands a parallelization plan that transforms a DNN model into fine-grained tasks and then schedules them onto GPUs for execution. Due to the large search space, contemporary parallelization plan generators often rely on empirical rules that couple transformation and scheduling, and they fall short in exploring more flexible schedules that yield better memory usage and compute efficiency. This tension is exacerbated by emerging models of increasing structural complexity and size. SuperScaler is a system that facilitates the design and generation of highly flexible parallelization plans. It explicitly formulates plan design and generation as three sequential phases: model transformation, space-time scheduling, and data-dependency preservation. This principled approach decouples multiple seemingly intertwined factors and enables the composition of highly flexible parallelization plans. As a result, SuperScaler can not only generate empirical parallelization plans, but also construct new plans that achieve up to 3.5X speedup over state-of-the-art solutions such as DeepSpeed, Megatron, and Alpa, for emerging DNN models such as Swin-Transformer and AlphaFold2, as well as for well-optimized models such as GPT-3.
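    The three-phase decomposition can be pictured as three independent functions composed in sequence. The Python sketch below is a conceptual toy based only on the abstract's phase names; the data structures and phase logic are our illustration, not SuperScaler's actual API:

        from dataclasses import dataclass

        @dataclass
        class Task:
            op: str      # fine-grained task produced by the transformation phase
            device: int  # GPU chosen in the scheduling phase
            step: int    # time slot chosen in the scheduling phase

        def transform(model: list[str]) -> list[str]:
            # Phase 1 (model transformation): split each op into fine-grained shards.
            return [f"{op}/shard{i}" for op in model for i in range(2)]

        def schedule(tasks: list[str], n_gpus: int) -> list[Task]:
            # Phase 2 (space-time scheduling): round-robin placement for brevity.
            return [Task(t, i % n_gpus, i // n_gpus) for i, t in enumerate(tasks)]

        def preserve_deps(plan: list[Task]) -> list[Task]:
            # Phase 3 (data-dependency preserving): would insert communication and
            # ordering constraints; a no-op in this toy version.
            return plan

        plan = preserve_deps(schedule(transform(["matmul", "softmax"]), n_gpus=2))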