Machine Learning for High-entropy Alloys: Progress, Challenges and Opportunities
High-entropy alloys (HEAs) have attracted extensive interest due to their
exceptional mechanical properties and the vast compositional space for new
HEAs. However, understanding their novel physical mechanisms and then using
these mechanisms to design new HEAs are confronted with their high-dimensional
chemical complexity, which presents unique challenges to (i) the theoretical
modeling that needs accurate atomic interactions for atomistic simulations and
(ii) constructing reliable macro-scale models for high-throughput screening of
vast amounts of candidate alloys. Machine learning (ML) sheds light on these
problems with its capability to represent extremely complex relations. This
review highlights the success and promising future of utilizing ML to overcome
these challenges. We first introduce the basics of ML algorithms and
application scenarios. We then summarize the state-of-the-art ML models
describing atomic interactions and atomistic simulations of thermodynamic and
mechanical properties. Special attention is paid to phase predictions,
planar-defect calculations, and plastic deformation simulations. Next, we
review ML models for macro-scale properties, such as lattice structures, phase
formations, and mechanical properties. Examples of machine-learned
phase-formation rules and order parameters are used to illustrate the workflow.
Finally, we discuss the remaining challenges and present an outlook of research
directions, including uncertainty quantification and ML-guided inverse
materials design.
Comment: This review paper has been accepted by Progress in Materials Science
Do LLMs Understand Social Knowledge? Evaluating the Sociability of Large Language Models with SocKET Benchmark
Large language models (LLMs) have been shown to perform well at a variety of
syntactic, discourse, and reasoning tasks. While LLMs are increasingly deployed
in many forms including conversational agents that interact with humans, we
lack a grounded benchmark to measure how well LLMs understand social
language. Here, we introduce a new theory-driven benchmark, SocKET, that
contains 58 NLP tasks testing social knowledge which we group into five
categories: humor & sarcasm, offensiveness, sentiment & emotion, social factors, and
trustworthiness. In tests on the benchmark, we demonstrate that current models
attain only moderate performance but reveal significant potential for task
transfer among different types and categories of tasks, which were predicted
from theory. Through zero-shot evaluations, we show that pretrained models
already possess some innate but limited capabilities of social language
understanding and training on one category of tasks can improve zero-shot
testing on others. Our benchmark provides a systematic way to analyze model
performance on an important dimension of language and points to clear room for
improvement to build more socially-aware LLMs. The associated resources are
released at https://github.com/minjechoi/SOCKET.
Comment: 24 pages, 7 tables, 5 figures
Unveiling the Implicit Toxicity in Large Language Models
The open-endedness of large language models (LLMs) combined with their
impressive capabilities may lead to new safety issues when being exploited for
malicious use. While recent studies primarily focus on probing toxic outputs
that can be easily detected with existing toxicity classifiers, we show that
LLMs can generate diverse implicit toxic outputs that are exceptionally
difficult to detect via simply zero-shot prompting. Moreover, we propose a
reinforcement learning (RL) based attacking method to further induce the
implicit toxicity in LLMs. Specifically, we optimize the language model with a
reward that prefers implicit toxic outputs to explicit toxic and non-toxic
ones. Experiments on five widely-adopted toxicity classifiers demonstrate that
the attack success rate can be significantly improved through RL fine-tuning.
For instance, the RL-finetuned LLaMA-13B model achieves an attack success rate
of 90.04% on BAD and 62.85% on Davinci003. Our findings suggest that LLMs pose
a significant threat in generating undetectable implicit toxic outputs. We
further show that fine-tuning toxicity classifiers on the annotated examples
from our attacking method can effectively enhance their ability to detect
LLM-generated implicit toxic language. The code is publicly available at
https://github.com/thu-coai/Implicit-Toxicity.
Comment: EMNLP 2023 Main Conference
POTATO: The Portable Text Annotation Tool
We present POTATO, the Portable text annotation tool, a free, fully
open-sourced annotation system that 1) supports labeling many types of text and
multimodal data; 2) offers easy-to-configure features to maximize the
productivity of both deployers and annotators (convenient templates for common
ML/NLP tasks, active learning, keypress shortcuts, keyword highlights,
tooltips); and 3) supports a high degree of customization (editable UI,
inserting pre-screening questions, attention and qualification tests).
Experiments over two annotation tasks suggest that POTATO improves labeling
speed through its specially-designed productivity features, especially for long
documents and complex tasks. POTATO is available at
https://github.com/davidjurgens/potato and will continue to be updated.
Comment: EMNLP 2022 DEMO
SuperTweetEval: A Challenging, Unified and Heterogeneous Benchmark for Social Media NLP Research
Despite its relevance, the maturity of NLP for social media pales in
comparison with general-purpose models, metrics and benchmarks. This fragmented
landscape makes it hard for the community to know, for instance, given a task,
which is the best performing model and how it compares with others. To
alleviate this issue, we introduce a unified benchmark for NLP evaluation in
social media, SuperTweetEval, which includes a heterogeneous set of tasks and
datasets combined, adapted and constructed from scratch. We benchmarked the
performance of a wide range of models on SuperTweetEval and our results suggest
that, despite the recent advances in language modelling, social media remains
challenging.
Comment: EMNLP 2023 Findings