    Deanthropomorphising NLP: Can a Language Model Be Conscious?

    This work is intended as a voice in the discussion over the recent claims that LaMDA, a pretrained language model based on the Transformer architecture, is sentient. This claim, if confirmed, would have serious ramifications in the Natural Language Processing (NLP) community due to the widespread use of similar models. However, we take the position that such a language model cannot be sentient, or conscious, and that LaMDA in particular exhibits no advances over similar models that would qualify it. We justify this by analysing the Transformer architecture through the lens of Integrated Information Theory. We see the claims of consciousness as part of a wider tendency to use anthropomorphic language in NLP reporting. Regardless of the veracity of the claims, we consider this an opportune moment to take stock of progress in language modelling and to consider the ethical implications of the task. To make this work helpful for readers outside the NLP community, we also present the necessary background in language modelling.

    PaLM: Scaling Language Modeling with Pathways

    Large language models have been shown to achieve remarkable performance across a variety of natural language tasks using few-shot learning, which drastically reduces the number of task-specific training examples needed to adapt the model to a particular application. To further our understanding of the impact of scale on few-shot learning, we trained a 540-billion-parameter, densely activated Transformer language model, which we call the Pathways Language Model (PaLM). We trained PaLM on 6144 TPU v4 chips using Pathways, a new ML system that enables highly efficient training across multiple TPU Pods. We demonstrate continued benefits of scaling by achieving state-of-the-art few-shot learning results on hundreds of language understanding and generation benchmarks. On a number of these tasks, PaLM 540B achieves breakthrough performance, outperforming the finetuned state of the art on a suite of multi-step reasoning tasks and outperforming average human performance on the recently released BIG-bench benchmark. A significant number of BIG-bench tasks showed discontinuous improvements from model scale, meaning that performance increased steeply as we scaled to our largest model. PaLM also has strong capabilities in multilingual tasks and source-code generation, which we demonstrate on a wide array of benchmarks. We additionally provide a comprehensive analysis of bias and toxicity, and study the extent of training-data memorization with respect to model scale. Finally, we discuss the ethical considerations related to large language models and potential mitigation strategies.
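
    The "few-shot" setting the abstract refers to specifies a task entirely in the prompt, through a handful of worked examples, rather than through fine-tuning. Below is a minimal, generic sketch of that prompt format; the commented-out `generate` call stands in for any LLM completion API and is not PaLM's actual interface.

```python
# Minimal sketch of few-shot prompting: the task is conveyed through a few
# in-context examples instead of gradient updates. `generate` (commented out)
# is a hypothetical stand-in for any large-language-model completion API.

FEW_SHOT_PROMPT = """\
Translate English to French.

English: cheese
French: fromage

English: house
French: maison

English: {query}
French:"""

def build_prompt(query: str) -> str:
    """Embed the new input into the fixed few-shot template."""
    return FEW_SHOT_PROMPT.format(query=query)

prompt = build_prompt("bread")
# completion = generate(prompt)  # expected output: "pain"
print(prompt)
```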

    LLM for SoC Security: A Paradigm Shift

    As the ubiquity and complexity of system-on-chip (SoC) designs increase across electronic devices, incorporating security into an SoC design flow poses significant challenges. Existing security solutions are inadequate for effective verification of modern SoC designs due to their limitations in scalability, comprehensiveness, and adaptability. Large Language Models (LLMs), on the other hand, are celebrated for their remarkable success in natural language understanding, advanced reasoning, and program synthesis. Recognizing an opportunity, our research delves into leveraging the emergent capabilities of Generative Pre-trained Transformers (GPTs) to address the existing gaps in SoC security, aiming for a more efficient, scalable, and adaptable methodology. By integrating LLMs into the SoC security verification paradigm, we open a new frontier of possibilities and challenges for ensuring the security of increasingly complex SoCs. This paper offers an in-depth analysis of existing work, showcases practical case studies, demonstrates comprehensive experiments, and provides practical guidelines. We also present the achievements, prospects, and challenges of employing LLMs in different SoC security verification tasks. (Comment: 42 pages)
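
    To make the proposed integration concrete, here is an illustrative sketch, not the paper's actual pipeline, of how an LLM might be prompted to review an RTL snippet against a security property. The `ask_llm` wrapper and the toy Verilog module are assumptions introduced for the example.

```python
# Illustrative only: one way an LLM could be queried to review RTL against a
# security property, in the spirit of the verification flow described above.
# `ask_llm` is a hypothetical wrapper around any chat-completion API, and the
# Verilog snippet and property are toy examples, not taken from the paper.

RTL_SNIPPET = """
module lock(input clk, input [7:0] key, output reg unlocked);
  always @(posedge clk)
    if (key == 8'hA5) unlocked <= 1;  // hard-coded unlock key
endmodule
"""

def build_review_prompt(rtl: str, security_property: str) -> str:
    """Assemble a review request pairing the RTL with the property to check."""
    return (
        "You are a hardware security reviewer.\n"
        f"Security property: {security_property}\n"
        f"RTL under review:\n{rtl}\n"
        "List any violations of the property and suggest a fix."
    )

prompt = build_review_prompt(RTL_SNIPPET, "No secret values may be hard-coded in RTL.")
# report = ask_llm(prompt)  # hypothetical LLM call
print(prompt)
```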

    A Hierarchical Neural Framework for Classification and its Explanation in Large Unstructured Legal Documents

    Automatic legal judgment prediction and its explanation suffer from long case documents that generally exceed tens of thousands of words and have a non-uniform structure. Predicting judgments from such documents and extracting their explanation becomes a challenging task, more so on documents with no structural annotation. We define this problem as "scarce annotated legal documents" and address the lack of structural information and the long document lengths with a deep-learning-based classification framework for judgment prediction, which we call MESc ("Multi-stage Encoder-based Supervised with-clustering"). We explore the adaptability of multi-billion-parameter LLMs (GPT-Neo and GPT-J) to legal texts and their intra-domain (legal) transfer-learning capacity, comparing their performance and adaptability with MESc and the impact of combining embeddings from their last layers. For such hierarchical models, we also propose an explanation-extraction algorithm named ORSE (Occlusion sensitivity-based Relevant Sentence Extractor), based on the input-occlusion sensitivity of the model, to explain predictions with the most relevant sentences from the document. We test the effectiveness of these methods with extensive experiments and ablation studies on legal documents from India, the European Union, and the United States, using the ILDC dataset and a subset of the LexGLUE dataset. MESc achieves a minimum total performance gain of approximately 2 points over previous state-of-the-art methods, while ORSE applied to MESc achieves a total average gain of 50% over the baseline explainability scores.
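
    The core idea behind occlusion sensitivity, on which ORSE builds, fits in a few lines: remove one sentence at a time and measure how much the model's predicted probability drops. The sketch below is a generic reading of that idea, not the paper's exact ORSE algorithm; `classify` is a hypothetical stand-in for a trained judgment-prediction model.

```python
from typing import Callable, List

def occlusion_scores(
    sentences: List[str],
    classify: Callable[[str], float],  # hypothetical model: text -> P(label)
) -> List[float]:
    """Generic occlusion-sensitivity scores: a sentence is relevant in
    proportion to how much removing it lowers the predicted probability."""
    base = classify(" ".join(sentences))
    scores = []
    for i in range(len(sentences)):
        occluded = " ".join(s for j, s in enumerate(sentences) if j != i)
        scores.append(base - classify(occluded))
    return scores

# Toy stand-in model: probability grows with occurrences of the word "guilty".
toy_model = lambda text: min(1.0, 0.1 + 0.3 * text.count("guilty"))
doc = ["The court heard the case.", "The accused was found guilty.", "Costs were awarded."]
print(occlusion_scores(doc, toy_model))  # the "guilty" sentence scores highest
```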

    Multiphysics Modeling and Simulation Process to Develop Thin Piezoelectric Film Sensors to Measure the Vibration of Structures with Complex Shapes and Boundary Conditions

    Piezoelectricity was discovered in 1880 by Jacques and Pierre Curie. Its application has since been extended to actuators and sensors, which are widely used in industrial, automotive, and aerospace applications. The last two decades have seen intensive research in piezoelectric theory in an effort to effectively capture and control the distinctive coupling of electricity and elasticity. However, due to the complexity of the theory involved, finite element and numerical methods are often used in the process, and only a limited number of exact analytical solutions are found in the literature. The objective of this work is to devise a multiphysics modeling and simulation process for developing thin piezoelectric film sensors that measure the vibration of structures with complex shapes and boundary conditions. First, the output charge of generic piezoelectric films, attached respectively to a beam and a plate, is modeled using ANSYS and verified experimentally. Second, the modeling method is extended to a cylindrical shell, again with experimental verification. Appropriate material properties obtained from previous studies were incorporated as required. Finally, shaped sensors for measuring specific dynamic characteristics of a beam, a plate, and a cylindrical shell, respectively, are developed and experimentally validated. The results show that multiphysics modeling can be an efficient design tool and can be effectively used to simulate complex systems, as well as to detect or simulate design flaws and errors.
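
    As a rough illustration of the physics such a film sensor exploits: for a uniformly strained film with equal in-plane coupling constants, the generated charge is Q = e31(eps_x + eps_y)A. The numbers below are placeholders chosen only to show the calculation, not values from this work, which instead integrates the strain field over shaped electrodes via finite elements.

```python
# Back-of-envelope sketch of the direct piezoelectric effect used by a film
# sensor: surface charge from uniform in-plane strain, Q = e31*(eps_x+eps_y)*A.
# All values are illustrative placeholders; a real design would integrate the
# strain field over the shaped electrode, as done numerically in this work.

e31 = -0.19                # piezoelectric stress constant, C/m^2 (placeholder)
area = 1e-4                # electrode area, m^2 (1 cm^2)
eps_x, eps_y = 5e-6, 2e-6  # uniform in-plane strains (dimensionless)

charge = e31 * (eps_x + eps_y) * area  # induced charge in coulombs
print(f"induced charge: {charge:.3e} C")
```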

    Improved Instruction Ordering in Recipe-Grounded Conversation

    In this paper, we study the task of instructional dialogue, focusing on the cooking domain. Analyzing the generated output of the GPT-J model, we find that the primary challenge for a recipe-grounded dialogue system is providing the instructions in the correct order. We hypothesize that this is due to the model's lack of understanding of user intent and its inability to track the instruction state (i.e., which step was last instructed). We therefore propose two auxiliary subtasks, User Intent Detection and Instruction State Tracking, to support Response Generation with improved instruction grounding. Experiments on our newly collected dataset, ChattyChef, show that incorporating user-intent and instruction-state information helps the response generation model mitigate the incorrect-order issue. Furthermore, to investigate whether ChatGPT has completely solved this task, we analyze its outputs and find that it also makes mistakes (in 10.7% of its responses), about half of which are out-of-order instructions. We will release ChattyChef to facilitate further research in this area at https://github.com/octaviaguo/ChattyChef. (Comment: Accepted at ACL 2023 main conference)
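
    At its simplest, Instruction State Tracking reduces to remembering which recipe step was last given so the next response is grounded in the right one. The sketch below illustrates that idea only; it is not the paper's learned tracker, and the class and recipe are made up for the example.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class InstructionState:
    """Toy instruction-state tracker: remembers which recipe step was last
    instructed so the next response can be grounded in the correct step.
    (A sketch of the idea only, not the paper's learned model.)"""
    steps: List[str]
    last_instructed: int = -1  # index of the most recently given step

    def next_step(self) -> str:
        if self.last_instructed + 1 >= len(self.steps):
            return "The recipe is finished."
        self.last_instructed += 1
        return self.steps[self.last_instructed]

recipe = InstructionState(steps=["Boil water.", "Add pasta.", "Drain and serve."])
print(recipe.next_step())  # "Boil water."
print(recipe.next_step())  # "Add pasta."
```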

    Conversational Agents in Education – A Systematic Literature Review

    Conversational Agents (CAs) are widespread in a variety of domains, such as health and customer service, and there is a recent trend of increasing publications on and implementations of CAs in education. We conduct a systematic literature review to identify common methodologies, pedagogical CA roles, addressed target groups, the underlying technologies and theories, and human-like design aspects. An initial set of 3329 records was systematically reduced to 252 fully coded articles. Based on an analysis of the codings, we derive further research streams. Our results reveal a research gap in long-term studies on the use of CAs in education and insufficient holistic design knowledge for pedagogical CAs. Moreover, target groups other than academic students are rarely considered. We condense our findings in a morphological box and conclude that pedagogical CAs have not yet reached their full potential for long-term practical application in education.