Evaluating the Capability of Large-scale Language Models on Chinese Grammatical Error Correction Task
Large-scale language models (LLMs) have shown remarkable capability on various
Natural Language Processing (NLP) tasks and have attracted much attention
recently. However, some studies indicate that large language models fail to
achieve promising results beyond the state-of-the-art models on English
grammatical error correction (GEC) tasks. In this report, we aim to explore
how large language models perform on Chinese grammatical error correction tasks
and to provide guidance for future work. We conduct experiments with 3 different
LLMs of different model scales on 4 Chinese GEC datasets. Our experimental
results indicate that the performance of LLMs on automatic evaluation metrics
falls short of previous state-of-the-art models because of the problem of
over-correction. Furthermore, we also discover notable variations in the
performance of LLMs when evaluated on different data distributions. Our
findings demonstrate that further investigation is required before LLMs can be
applied to Chinese GEC tasks.
Automated Assessment of Students' Code Comprehension using LLMs
Assessing students' answers, and in particular natural language answers, is a
crucial challenge in the field of education. Advances in machine learning,
including transformer-based models such as Large Language Models (LLMs), have
led to significant progress on various natural language tasks. Nevertheless,
amidst the growing trend of evaluating LLMs across diverse tasks, evaluating
LLMs in the realm of automated answer assessment has not received much
attention. To address this gap, we explore the potential of using LLMs for
automated assessment of students' short and open-ended answers. In particular,
we use LLMs to compare students' explanations with expert explanations in the
context of line-by-line explanations of computer programs.
For comparison purposes, we assess both Large Language Models (LLMs) and
encoder-based Semantic Textual Similarity (STS) models in the context of
assessing the correctness of students' explanations of computer code. Our
findings indicate that LLMs, when prompted in few-shot and chain-of-thought
settings, perform comparably to fine-tuned encoder-based models in evaluating
students' short answers in the programming domain.
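The few-shot, chain-of-thought prompting the abstract describes can be sketched roughly as follows. This is a hypothetical illustration, not the authors' actual prompt or grading pipeline: the example exchange, field names, and wording are assumptions, and the model call itself is omitted so only prompt assembly is shown.

```python
# Hypothetical sketch: grading a student's line-by-line code explanation
# against an expert explanation with a few-shot chain-of-thought prompt.

FEW_SHOT_EXAMPLES = [
    {
        "line": "total += x",
        "expert": "Adds the current element x to the running total.",
        "student": "It increases total by x.",
        "reasoning": "The student captures the accumulation of x into total.",
        "verdict": "correct",
    },
]

def build_assessment_prompt(code_line, expert_expl, student_expl):
    """Assemble a few-shot chain-of-thought prompt that asks the model to
    reason step by step before judging the student's explanation."""
    parts = [
        "You grade student explanations of code, line by line.",
        "Think step by step, then answer 'correct' or 'incorrect'.\n",
    ]
    for ex in FEW_SHOT_EXAMPLES:
        parts.append(
            f"Line: {ex['line']}\nExpert: {ex['expert']}\n"
            f"Student: {ex['student']}\nReasoning: {ex['reasoning']}\n"
            f"Verdict: {ex['verdict']}\n"
        )
    # Leave 'Reasoning:' open so the model produces its chain of thought.
    parts.append(
        f"Line: {code_line}\nExpert: {expert_expl}\n"
        f"Student: {student_expl}\nReasoning:"
    )
    return "\n".join(parts)

prompt = build_assessment_prompt(
    "for i in range(n):",
    "Loops i over the integers 0 to n-1.",
    "Repeats the body n times with i counting up from 0.",
)
print(prompt)
```

The returned string would be sent to an LLM, whose final "Verdict:" line can then be compared against a human rubric.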
Several categories of Large Language Models (LLMs): A Short Survey
Large Language Models (LLMs) have become effective tools for natural language
processing and have been used in many different fields. This essay offers a
succinct summary of various LLM subcategories. The survey emphasizes recent
developments and efforts made for various kinds of LLMs, including task-based
financial LLMs, multilingual LLMs, biomedical and clinical LLMs,
vision-language LLMs, and code language models. The survey gives a general
summary of the methods, attributes, datasets, transformer models, and
comparison metrics applied in each category of LLMs. Furthermore, it highlights
unresolved problems in the field of developing chatbots and virtual assistants,
such as boosting natural language processing, enhancing chatbot intelligence,
and resolving moral and legal dilemmas. The purpose of this study is to provide
readers, developers, academics, and users interested in LLM-based chatbots and
virtual intelligent assistant technologies with useful information and future
directions.
Benchmarking Large Language Models in Retrieval-Augmented Generation
Retrieval-Augmented Generation (RAG) is a promising approach for mitigating
the hallucination of large language models (LLMs). However, existing research
lacks rigorous evaluation of the impact of retrieval-augmented generation on
different large language models, which makes it challenging to identify the
potential bottlenecks in the capabilities of RAG for different LLMs. In this
paper, we systematically investigate the impact of Retrieval-Augmented
Generation on large language models. We analyze the performance of different
large language models on 4 fundamental abilities required for RAG:
noise robustness, negative rejection, information integration, and
counterfactual robustness. To this end, we establish the Retrieval-Augmented
Generation Benchmark (RGB), a new corpus for RAG evaluation in both English and
Chinese. RGB divides the instances within the benchmark into 4 separate
testbeds based on the aforementioned fundamental abilities required to resolve
each case. We then evaluate 6 representative LLMs on RGB to diagnose the
challenges current LLMs face when applying RAG. The evaluation reveals that
while LLMs exhibit a certain degree of noise robustness, they still struggle
significantly with negative rejection, information integration, and
dealing with false information. These assessment outcomes indicate
that there is still a considerable journey ahead to effectively apply RAG to
LLMs.
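The abilities RGB probes can be illustrated with a toy RAG setup. This is not the benchmark's code: the word-overlap retriever, the corpus, and the "cannot answer" instruction (which loosely mirrors the negative-rejection idea) are all illustrative assumptions.

```python
# Illustrative sketch of a RAG pipeline: retrieve passages (possibly noisy),
# then assemble a prompt that permits refusal when evidence is insufficient.

def retrieve(query, corpus, k=2):
    """Rank passages by naive word overlap with the query (toy retriever)."""
    qwords = set(query.lower().split())
    scored = sorted(
        corpus,
        key=lambda p: len(qwords & set(p.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_rag_prompt(query, passages):
    """Number the retrieved passages and append the question."""
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer using only the passages below; reply 'cannot answer' "
        f"if they are insufficient.\n{context}\nQuestion: {query}\nAnswer:"
    )

corpus = [
    "The Eiffel Tower is located in Paris and was completed in 1889.",
    "Bananas are rich in potassium.",  # irrelevant noise passage
    "Paris is the capital city of France.",
]
passages = retrieve("When was the Eiffel Tower completed?", corpus)
print(build_rag_prompt("When was the Eiffel Tower completed?", passages))
```

A noise-robustness testbed in this spirit would deliberately mix irrelevant passages (like the banana sentence) into the retrieved context and check whether the model's answer survives.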
The Role of Large Language Models in Enhancing Cybersecurity Measures: Empirical Evidence from Regional Banking Institutions
The rapid advancements in artificial intelligence (AI) and machine learning (ML) have significantly influenced the cybersecurity landscape, particularly in the banking sector, where threats are increasingly sophisticated. Large Language Models (LLMs) such as OpenAI's GPT-4 and Google's BERT offer novel approaches to threat detection, fraud prevention, and automated risk assessment. This paper explores the integration of LLMs into cybersecurity frameworks within financial institutions, highlighting their role in real-time anomaly detection, predictive analytics, and intelligent automation of security operations. By leveraging LLMs, banks can enhance their cybersecurity resilience, mitigate cyber threats, and improve regulatory compliance. However, challenges such as data privacy concerns, adversarial attacks, and computational resource demands must be addressed to ensure the secure and ethical deployment of these models. This study provides insights into the current applications, benefits, and limitations of LLMs in strengthening cybersecurity measures in the banking sector.
After-School Care and Parents' Labor Supply
Does after-school care provision promote mothers' employment and balance the allocation of paid work among parents of schoolchildren? We address this question by exploiting variation in cantonal (state) regulations of after-school care provision in Switzerland. To establish exogeneity of cantonal regulations with respect to employment opportunities and preferences of the population, we restrict our analysis to confined regions along cantonal borders. Using semi-parametric instrumental variable methods, we find a positive impact of after-school care provision on mothers' full-time employment, but a negative impact on fathers' full-time employment. Thus, the supply of after-school care fosters a convergence of parental working hours.
Visual Literacy and New Technologies
This body of research addresses the connection between arts, identity, and new technology. It investigates the impact of images on adolescent identities and the relationship between online modes of communication and cyber-bullying, examines the increasing visualization of information, and explores the way drawing and critical analysis of imagery develop visual literacy.
Commissioned by Adobe Systems Pty Ltd, Australia (2003) to compile the Visual Literacy White Paper, Bamford's report defines visual literacy and highlights its importance in the learning of such skills as problem solving and critical thinking. Providing strategies to promote visual literacy and emphasizing the role of technology in visual communication, this report has become a major reference for policy on visual literacy and cyber-bullying in the UK, USA, and Asia.
Supervised Knowledge Makes Large Language Models Better In-context Learners
Large Language Models (LLMs) exhibit emerging in-context learning abilities
through prompt engineering. The recent progress in large-scale generative
models has further expanded their use in real-world language applications.
However, the critical challenge of improving the generalizability and
factuality of LLMs in natural language understanding and question answering
remains under-explored. While previous in-context learning research has focused
on enhancing models to adhere to users' specific instructions and quality
expectations, and to avoid undesired outputs, little to no work has explored
the use of task-specific fine-tuned Language Models (SLMs) to improve LLMs'
in-context learning during the inference stage. Our primary contribution is the
establishment of a simple yet effective framework that enhances the reliability
of LLMs as it: 1) generalizes to out-of-distribution data, 2) elucidates how
LLMs benefit from discriminative models, and 3) minimizes hallucinations in
generative tasks. Using our proposed plug-in method, enhanced versions of Llama
2 and ChatGPT surpass their original versions in generalizability and
factuality. We offer a comprehensive suite of resources, including 16 curated
datasets, prompts, model checkpoints, and LLM outputs across 9 distinct tasks.
The code and data are released at:
https://github.com/YangLinyi/Supervised-Knowledge-Makes-Large-Language-Models-Better-In-context-Learners.
Our empirical analysis sheds light on the advantages of incorporating
discriminative models into LLMs and highlights the potential of our methodology
in fostering more reliable LLMs.
Comment: Accepted to ICLR 202
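The general plug-in idea, an SLM's prediction injected into the LLM prompt at inference time, can be sketched as below. This is not the paper's released code: the sentiment task, the keyword stand-in for a fine-tuned classifier, and the prompt wording are all illustrative assumptions.

```python
# Hedged sketch: supply a task-specific discriminative model's prediction
# as supervised knowledge inside the LLM prompt at inference time.

def slm_predict(text):
    """Stand-in for a fine-tuned task-specific model (SLM); a trivial
    keyword rule plays the role of a sentiment classifier here."""
    return "positive" if "great" in text.lower() else "negative"

def build_plugin_prompt(text):
    """Embed the SLM's label as a reference the LLM may accept or override."""
    hint = slm_predict(text)
    return (
        "Classify the sentiment of the review as positive or negative.\n"
        f"A fine-tuned classifier predicts: {hint}. "
        "Use it as a reference, but judge the text yourself.\n"
        f"Review: {text}\nAnswer:"
    )

print(build_plugin_prompt("The battery life is great and the screen is sharp."))
```

Phrasing the hint as a reference rather than a verdict is what lets the LLM override a miscalibrated classifier, which is how a plug-in of this kind could reduce hallucination without blindly trusting the smaller model.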
