Deanthropomorphising NLP: Can a Language Model Be Conscious?
This work is intended as a voice in the discussion over the recent claims
that LaMDA, a pretrained language model based on the Transformer model
architecture, is sentient. This claim, if confirmed, would have serious
ramifications in the Natural Language Processing (NLP) community due to
widespread use of similar models. However, here we take the position that such
a language model cannot be sentient, or conscious, and that LaMDA in particular
exhibits no advances over other similar models that would qualify it. We
justify this by analysing the Transformer architecture through Integrated
Information Theory. We see the claims of consciousness as part of a wider
tendency to use anthropomorphic language in NLP reporting. Regardless of the
veracity of the claims, we consider this an opportune moment to take stock of
progress in language modelling and consider the ethical implications of the
task. In order to make this work helpful for readers outside the NLP community,
we also present the necessary background in language modelling.
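For readers outside NLP, the core of the language-modelling task that models such as LaMDA are trained on can be stated as the standard autoregressive factorization, in which the probability of a token sequence is the product of next-token probabilities conditioned on the preceding context:

```latex
p(w_1, \dots, w_T) \;=\; \prod_{t=1}^{T} p(w_t \mid w_1, \dots, w_{t-1})
```

A Transformer language model parameterizes each conditional term on the right-hand side; nothing in this objective refers to sentience or consciousness, which is part of what the paper's argument turns on.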
PaLM: Scaling Language Modeling with Pathways
Large language models have been shown to achieve remarkable performance
across a variety of natural language tasks using few-shot learning, which
drastically reduces the number of task-specific training examples needed to
adapt the model to a particular application. To further our understanding of
the impact of scale on few-shot learning, we trained a 540-billion parameter,
densely activated Transformer language model, which we call the Pathways
Language Model (PaLM). We trained PaLM on 6144 TPU v4 chips using Pathways, a new ML
system which enables highly efficient training across multiple TPU Pods. We
demonstrate continued benefits of scaling by achieving state-of-the-art
few-shot learning results on hundreds of language understanding and generation
benchmarks. On a number of these tasks, PaLM 540B achieves breakthrough
performance, outperforming the finetuned state-of-the-art on a suite of
multi-step reasoning tasks, and outperforming average human performance on the
recently released BIG-bench benchmark. A significant number of BIG-bench tasks
showed discontinuous improvements from model scale, meaning that performance
steeply increased as we scaled to our largest model. PaLM also has strong
capabilities in multilingual tasks and source code generation, which we
demonstrate on a wide array of benchmarks. We additionally provide a
comprehensive analysis on bias and toxicity, and study the extent of training
data memorization with respect to model scale. Finally, we discuss the ethical
considerations related to large language models and discuss potential
mitigation strategies.
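Few-shot learning, as evaluated in the PaLM abstract, amounts to conditioning the model on a handful of worked examples placed directly in the prompt rather than fine-tuning on task-specific data. A minimal sketch of how such a prompt might be assembled (the sentiment task, the Q/A template, and the examples are illustrative assumptions, not from the paper):

```python
def build_few_shot_prompt(examples, query):
    """Concatenate worked (input, output) examples and a new query into a
    single prompt; the model is expected to continue the pattern and
    complete the answer for the final query."""
    parts = []
    for text, label in examples:
        parts.append(f"Q: {text}\nA: {label}")
    parts.append(f"Q: {query}\nA:")
    return "\n\n".join(parts)

# Illustrative sentiment task with two in-context examples.
examples = [
    ("The movie was wonderful.", "positive"),
    ("The food was cold and bland.", "negative"),
]
prompt = build_few_shot_prompt(examples, "I loved the soundtrack.")
print(prompt)
```

The point of the technique is visible in the structure: no gradient update occurs, so the same frozen model can be "adapted" to a new task simply by swapping the in-context examples.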
LLM for SoC Security: A Paradigm Shift
As the ubiquity and complexity of system-on-chip (SoC) designs increase
across electronic devices, the task of incorporating security into an SoC
design flow poses significant challenges. Existing security solutions are
inadequate to provide effective verification of modern SoC designs due to their
limitations in scalability, comprehensiveness, and adaptability. On the other
hand, Large Language Models (LLMs) are celebrated for their remarkable success
in natural language understanding, advanced reasoning, and program synthesis
tasks. Recognizing an opportunity, our research delves into leveraging the
emergent capabilities of Generative Pre-trained Transformers (GPTs) to address
the existing gaps in SoC security, aiming for a more efficient, scalable, and
adaptable methodology. By integrating LLMs into the SoC security verification
paradigm, we open a new frontier of possibilities and challenges to ensure the
security of increasingly complex SoCs. This paper offers an in-depth analysis
of existing works, showcases practical case studies, demonstrates comprehensive
experiments, and provides practical guidelines. We also present the
achievements, prospects, and challenges of employing LLMs in different SoC
security verification tasks.
Comment: 42 pages
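As a concrete illustration of how an LLM might be queried inside an SoC security-verification flow, the sketch below assembles a prompt asking for a SystemVerilog assertion from an RTL signal description. The module, the signal names, and the security property are hypothetical placeholders, not the paper's methodology, and the resulting string would be sent to whatever LLM API the flow uses:

```python
def build_assertion_prompt(module_name, signals, property_text):
    """Build a natural-language prompt asking an LLM to draft a
    SystemVerilog assertion for a stated security property of an
    RTL module, given its signal list."""
    signal_list = "\n".join(f"- {name}: {desc}" for name, desc in signals)
    return (
        f"Module `{module_name}` has the following signals:\n"
        f"{signal_list}\n"
        f"Write a SystemVerilog assertion checking that {property_text}."
    )

# Hypothetical example: a debug-unlock request must never be granted
# while the chip is in its secure lifecycle state.
prompt = build_assertion_prompt(
    "debug_ctrl",
    [("dbg_unlock", "debug unlock request"),
     ("lc_secure", "secure lifecycle state flag")],
    "dbg_unlock is never asserted while lc_secure is high",
)
print(prompt)
```

In a real flow the generated assertion would still be checked by a formal tool or simulation; prompt construction only addresses the authoring bottleneck, not verification soundness.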
A Hierarchical Neural Framework for Classification and its Explanation in Large Unstructured Legal Documents
Automatic legal judgment prediction and its explanation suffer from the
problem of long case documents exceeding tens of thousands of words, in
general, and having a non-uniform structure. Predicting judgments from such
documents and extracting their explanation becomes a challenging task, more so
on documents with no structural annotation. We define this problem as "scarce
annotated legal documents" and address their lack of structural information and
their long lengths with a deep-learning-based classification framework, which
we call MESc ("Multi-stage Encoder-based Supervised with-clustering"), for
judgment prediction. We explore the adaptability of LLMs with multi-billion
parameters (GPT-Neo and GPT-J) to legal texts and their intra-domain (legal) transfer
learning capacity. Alongside this, we compare their performance and
adaptability with MESc and the impact of combining embeddings from their last
layers. For such hierarchical models, we also propose an explanation extraction
algorithm named ORSE (Occlusion sensitivity-based Relevant Sentence Extractor),
based on the input-occlusion sensitivity of the model, to explain the
predictions with the most relevant sentences from the document. We explore
these methods and test their effectiveness with extensive experiments and
ablation studies on legal documents from India, the European Union, and the
United States with the ILDC dataset and a subset of the LexGLUE dataset. MESc
achieves a minimum total performance gain of approximately 2 points over
previous state-of-the-art proposed methods, while ORSE applied on MESc achieves
a total average gain of 50% over the baseline explainability scores.
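The idea behind occlusion-sensitivity explanation, as the abstract describes it for ORSE, can be sketched with a toy classifier: remove one sentence at a time and rank sentences by how much the prediction score drops when they are occluded. The scoring function and the example document below are stand-ins, not the paper's model or data:

```python
def occlusion_relevance(sentences, score_fn):
    """Rank sentences by the drop in the classifier's score when each
    sentence is removed (occluded) from the document."""
    base = score_fn(sentences)
    relevance = []
    for i in range(len(sentences)):
        occluded = sentences[:i] + sentences[i + 1:]
        relevance.append((base - score_fn(occluded), sentences[i]))
    # The most relevant sentences are those whose removal hurts the score most.
    return sorted(relevance, reverse=True)

# Toy score: fraction of sentences mentioning a (hypothetical) keyword.
def toy_score(sents):
    return sum("guilty" in s for s in sents) / max(len(sents), 1)

doc = ["The court heard arguments.",
       "The accused was found guilty of fraud.",
       "Proceedings closed at noon."]
ranked = occlusion_relevance(doc, toy_score)
print(ranked[0][1])  # the sentence whose removal changes the score most
```

With a real judgment-prediction model in place of `toy_score`, the same loop yields the "most relevant sentences" the abstract refers to, at the cost of one forward pass per occluded sentence.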
Multiphysics Modeling And Simulation Process To Develop Thin Piezoelectric Film Sensors To Measure The Vibration Of Structures With Complex Shapes And Boundary Conditions.
Piezoelectricity was discovered in 1880 by Jacques and Pierre Curie. Its application has since been extended to actuators and sensors, widely used in industrial, automotive, and aerospace applications. The last two decades have seen intensive research in piezoelectric theory in an effort to effectively capture and control the distinctive coupling of electricity and elasticity. However, due to the complexity of the theory involved, finite element and numerical methods are often used in the process, and only limited analytical exact solutions are found in the literature. The objective of this work is to devise a multiphysics modeling and simulation process to develop thin piezoelectric film sensors to measure the vibration of structures with complex shapes and boundary conditions. First, the output charge of generic piezoelectric films, respectively attached to a beam and a plate, is modeled using ANSYS and experimentally verified. Second, the modeling method is extended to a cylindrical shell, followed by experimental verification. Appropriate material properties obtained from past research were incorporated as required. Finally, shaped sensors for the measurement of specific dynamic characteristics of a beam, a plate, and a cylindrical shell, respectively, are developed and experimentally validated. The results show that multiphysics modeling can be an efficient design tool and can be effectively used to simulate complex systems. This tool can also be used to detect or simulate design flaws and errors.
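The electromechanical coupling such models must capture is conventionally written in the strain-charge form of the linear piezoelectric constitutive equations, relating strain S, stress T, electric field E, and electric displacement D:

```latex
S = s^{E}\,T + d^{\mathsf{t}}\,E, \qquad D = d\,T + \varepsilon^{T}\,E
```

Here s^E is the elastic compliance at constant electric field, d the matrix of piezoelectric coefficients (with transpose d^t), and ε^T the permittivity at constant stress; the sensor's output charge follows from integrating D over the film's electrode area.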
Improved Instruction Ordering in Recipe-Grounded Conversation
In this paper, we study the task of instructional dialogue and focus on the
cooking domain. Analyzing the generated output of the GPT-J model, we reveal
that the primary challenge for a recipe-grounded dialog system is how to
provide the instructions in the correct order. We hypothesize that this is due
to the model's lack of understanding of user intent and inability to track the
instruction state (i.e., which step was last instructed). Therefore, we propose
to explore two auxiliary subtasks, namely User Intent Detection and Instruction
State Tracking, to support Response Generation with improved instruction
grounding. Experimenting with our newly collected dataset, ChattyChef, shows
that incorporating user intent and instruction state information helps the
response generation model mitigate the incorrect order issue. Furthermore, to
investigate whether ChatGPT has completely solved this task, we analyze its
outputs and find that it also makes mistakes (10.7% of the responses), about
half of which are out-of-order instructions. We will release ChattyChef to
facilitate further research in this area at:
https://github.com/octaviaguo/ChattyChef
Comment: Accepted at ACL 2023 main conference
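Instruction State Tracking, as the abstract describes it, reduces to remembering which recipe step was last instructed so the next response continues in order. A minimal stand-in for that state (the class name and recipe are illustrative assumptions, not the paper's implementation):

```python
class InstructionStateTracker:
    """Track which recipe step was most recently instructed, so the
    dialog system can emit steps in the correct order."""

    def __init__(self, steps):
        self.steps = steps
        self.last_instructed = -1  # no step has been given yet

    def next_step(self):
        """Return the next uninstructed step, or None when the recipe
        is finished."""
        if self.last_instructed + 1 >= len(self.steps):
            return None
        self.last_instructed += 1
        return self.steps[self.last_instructed]

recipe = ["Boil the pasta.", "Drain it.", "Add the sauce."]
tracker = InstructionStateTracker(recipe)
print(tracker.next_step())  # Boil the pasta.
print(tracker.next_step())  # Drain it.
```

In the paper's setting this state is predicted from the dialog history rather than maintained explicitly, but the invariant is the same: responses should never skip ahead of, or repeat, `last_instructed`.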
Conversational Agents in Education – A Systematic Literature Review
Conversational Agents (CAs) are widespread across a variety of domains, such as health and customer service, and there is a recent trend of increasing publications and implementations of CAs in education. We conduct a systematic literature review to identify common methodologies, pedagogical CA roles, addressed target groups, underlying technologies and theories, as well as human-like design aspects. The initially found 3329 records were systematically reduced to 252 fully coded articles. Based on the analysis of the codings, we derive further research streams. Our results reveal a research gap regarding long-term studies on the use of CAs in education, and holistic design knowledge for pedagogical CAs remains insufficient. Moreover, target groups other than academic students are rarely considered. We condense our findings in a morphological box and conclude that pedagogical CAs have not yet reached their full potential for long-term practical application in education.