2,195 research outputs found
BYOC: Personalized Few-Shot Classification with Co-Authored Class Descriptions
Text classification is a well-studied and versatile building block for many
NLP applications. Yet, existing approaches require either large annotated
corpora to train a model with or, when using large language models as a base,
require carefully crafting the prompt as well as using a long context that can
fit many examples. As a result, it is not possible for end-users to build
classifiers for themselves. To address this issue, we propose a novel approach
to few-shot text classification using an LLM. Rather than few-shot examples,
the LLM is prompted with descriptions of the salient features of each class.
These descriptions are coauthored by the user and the LLM interactively: while
the user annotates each few-shot example, the LLM asks relevant questions that
the user answers. Examples, questions, and answers are summarized to form the
classification prompt. Our experiments show that our approach yields high
accuracy classifiers, within 82% of the performance of models trained with
significantly larger datasets while using only 1% of their training sets.
Additionally, in a study with 30 participants, we show that end-users are able
to build classifiers to suit their specific needs. The personalized classifiers
show an average accuracy of 90%, which is 15% higher than the state-of-the-art
approach.
Comment: Accepted at EMNLP 2023 (Findings).
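The abstract describes prompting with co-authored class descriptions rather than few-shot examples. As a rough sketch of that idea (function names and prompt format are illustrative assumptions, not the paper's actual pipeline), the classification prompt can be assembled from per-class feature descriptions distilled from the user/LLM dialogue:

```python
# Sketch of a BYOC-style classification prompt built from class
# descriptions instead of few-shot examples (format is an assumption).

def build_prompt(class_descriptions: dict[str, str], text: str) -> str:
    """Assemble an LLM prompt that classifies `text` using the
    salient-feature description of each class."""
    lines = ["Classify the text into exactly one of these classes.", ""]
    for name, description in class_descriptions.items():
        lines.append(f"Class '{name}': {description}")
    lines += ["", f"Text: {text}", "Answer with the class name only."]
    return "\n".join(lines)

# Hypothetical descriptions a user and an LLM might co-author.
descriptions = {
    "urgent": "mentions deadlines, outages, or explicit requests for fast action",
    "newsletter": "bulk announcements with no personal request to the reader",
}
prompt = build_prompt(descriptions, "Server down, need a fix before 5pm!")
```

The prompt stays short because each class is summarized once, instead of carrying many few-shot examples in context.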
Characterizing and Predicting Email Deferral Behavior
Email triage involves going through unhandled emails and deciding what to do
with them. This familiar process can become increasingly challenging as the
number of unhandled emails grows. During a triage session, users commonly defer
emails that they cannot immediately deal with until later. These deferred
emails are often related to tasks that are postponed until the user has more
time or the right information to deal with them. In this paper, through
qualitative interviews and a large-scale log analysis, we study when and what
enterprise email users tend to defer. We found that users are more likely to
defer emails when handling them involves replying, reading carefully, or
clicking on links and attachments. We also learned that the decision to defer
emails depends on many factors, such as the user's workload and the importance
of the sender. Our qualitative results suggest that deferring is very common,
and our quantitative log analysis confirms that 12% of triage sessions and 16%
of daily active users had at least one deferred email on weekdays. We also
discuss several deferral strategies reported by our interviewees, such as
marking emails as unread and flagging them, and illustrate how such patterns
can also be observed in user logs. Inspired by the characteristics of
deferred emails and contextual factors involved in deciding if an email should
be deferred, we train a classifier for predicting whether a recently triaged
email is actually deferred. Our experimental results suggest that deferral can
be classified with modest effectiveness. Overall, our work provides novel
insights about how users handle their emails and how deferral can be modeled.
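The abstract describes training a classifier over contextual features (required action, workload, sender importance). A minimal sketch of such a predictor, using hand-rolled logistic regression over illustrative binary features (the paper's actual model and feature set are richer and not reproduced here):

```python
import math

# Sketch: a tiny logistic-regression deferral predictor. Feature names are
# illustrative assumptions inspired by the factors the study highlights.
FEATURES = ["requires_reply", "has_attachment", "sender_importance", "workload"]

def predict(weights, bias, x):
    """Probability that an email with feature vector x is deferred."""
    z = bias + sum(w * v for w, v in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))

def train(data, labels, lr=0.5, epochs=200):
    """Plain stochastic gradient descent on the logistic loss."""
    weights, bias = [0.0] * len(FEATURES), 0.0
    for _ in range(epochs):
        for x, y in zip(data, labels):
            err = predict(weights, bias, x) - y
            bias -= lr * err
            weights = [w - lr * err * v for w, v in zip(weights, x)]
    return weights, bias

# Toy data: emails needing a reply under high workload tend to be deferred.
data = [[1, 0, 1, 1], [1, 1, 1, 1], [0, 0, 0, 0], [0, 1, 0, 0]]
labels = [1, 1, 0, 0]
weights, bias = train(data, labels)
```

In practice such features would be extracted from email logs and the model evaluated against observed deferral behavior.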
Genres in young learner L2 English writing: A genre typology for the TRAWL (Tracking Written Learner Language) corpus
In learner corpus research, it is well known that one should control for genre when collecting and analysing written L2 (second language) English data, as genre is one factor that has been shown to account for language variation. This article presents a genre typology for annotating learner texts from the lower secondary level in Norway (ages 13-15, school years 8-10). The data are drawn from TRAWL (Tracking Written Learner Language), a new learner corpus currently under compilation. As the TRAWL corpus will be openly available for research, it is important that the typology is clearly described, which is the primary aim of the present study.
Little research has been carried out on younger learners, and no detailed genre typology exists for classifying learner texts at the lower secondary level. Therefore, a genre typology developed by Ørevik (2019) for the upper secondary level was tested on data from TRAWL using a functional, social semiotic perspective and a mixed-methods (quantitative and qualitative) approach. The analysis showed that Ørevik’s typology was largely suitable for annotating the selected TRAWL data and only had to be slightly modified.
By highlighting some of the theoretical and methodological challenges with the genre typology, the analysis may inform discussions about genre in L2 English teaching, which was a secondary aim of the present study. Not only do the results mirror the tensions in the international debate within genre research, but they also reflect the everyday challenges of lower secondary school teachers/examiners, who seem to adopt an eclectic approach to genre.
CVE-driven Attack Technique Prediction with Semantic Information Extraction and a Domain-specific Language Model
This paper addresses a critical challenge in cybersecurity: the gap between
vulnerability information represented by Common Vulnerabilities and Exposures
(CVEs) and the resulting cyberattack actions. CVEs provide insights into
vulnerabilities, but often lack details on potential threat actions (tactics,
techniques, and procedures, or TTPs) within the ATT&CK framework. This gap
hinders accurate CVE categorization and proactive countermeasure initiation.
The paper introduces the TTPpredictor tool, which uses innovative techniques to
analyze CVE descriptions and infer plausible TTP attacks resulting from CVE
exploitation. TTPpredictor overcomes challenges posed by limited labeled data
and semantic disparities between CVE and TTP descriptions. It initially
extracts threat actions from unstructured cyber threat reports using Semantic
Role Labeling (SRL) techniques. These actions, along with their contextual
attributes, are correlated with MITRE's attack functionality classes. This
automated correlation facilitates the creation of labeled data, essential for
categorizing novel threat actions into threat functionality classes and TTPs.
The paper presents an empirical assessment, demonstrating TTPpredictor's
effectiveness, with accuracy rates of approximately 98% and F1-scores ranging
from 95% to 98% in classifying CVEs to ATT&CK techniques.
TTPpredictor outperforms state-of-the-art language model tools like ChatGPT.
Overall, this paper offers a robust solution for linking CVEs to potential
attack techniques, enhancing cybersecurity practitioners' ability to
proactively identify and mitigate threats.
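TTPpredictor itself relies on Semantic Role Labeling and a domain-specific language model, which are not reproduced here. As a hedged sketch of just the final matching step, an extracted threat action can be mapped to the most similar ATT&CK technique description by bag-of-words cosine similarity (the two techniques shown are real ATT&CK entries; the description strings are abbreviated stand-ins):

```python
from collections import Counter
import math

# Sketch: map an extracted threat action to the nearest ATT&CK technique
# by bag-of-words cosine similarity (a simplification of the paper's
# language-model-based classification).

def bow(text):
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Abbreviated, illustrative technique descriptions.
TECHNIQUES = {
    "T1059 Command and Scripting Interpreter":
        "execute arbitrary commands scripts via an interpreter",
    "T1566 Phishing":
        "send spearphishing email with malicious attachment or link",
}

def predict_technique(action: str) -> str:
    q = bow(action)
    return max(TECHNIQUES, key=lambda t: cosine(q, bow(TECHNIQUES[t])))

best = predict_technique("attacker can execute arbitrary commands on the host")
```

A real system would embed both sides with the domain-specific model rather than raw word counts, but the retrieval structure is the same.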
Caring for the patient, caring for the record: an ethnographic study of 'back office' work in upholding quality of care in general practice
Delivering Behaviour Change Interventions: Development of a Mode of Delivery Ontology [version 1; peer review: 1 approved, 1 approved with reservations]
Background: Investigating and improving the effects of behaviour change interventions requires detailed and consistent specification of all aspects of interventions. An important feature of interventions is the way in which these are delivered, i.e. their mode of delivery. This paper describes an ontology for specifying the mode of delivery of interventions, which forms part of the Behaviour Change Intervention Ontology, currently being developed in the Wellcome Trust funded Human Behaviour-Change Project.
Methods: The Mode of Delivery Ontology was developed in an iterative process of annotating behaviour change intervention evaluation reports and consulting with expert stakeholders. It consisted of seven steps: 1) annotation of 110 intervention reports to develop a preliminary classification of modes of delivery; 2) open review from international experts (n=25); 3) second round of annotations with 55 reports to test inter-rater reliability and identify limitations; 4) second round of expert review feedback (n=16); 5) final round of testing of the refined ontology by two annotators familiar and two annotators unfamiliar with the ontology; 6) specification of ontological relationships between entities; and 7) transformation into a machine-readable format using the Web Ontology Language (OWL) and publishing online.
Results: The resulting ontology is a four-level hierarchical structure comprising 65 unique modes of delivery, organised by 15 upper-level classes: Informational, Environmental change, Somatic, Somatic alteration, Individual-based/Pair-based/Group-based, Uni-directional/Interactional, Synchronous/Asynchronous, Push/Pull, Gamification, Arts feature. Relationships between entities consist of is_a. Inter-rater reliability of the Mode of Delivery Ontology for annotating intervention evaluation reports was α = 0.80 (very good) for annotators familiar with the ontology and α = 0.58 (acceptable) for those unfamiliar with it.
Conclusion: The ontology can be used for both annotating and writing behaviour change intervention evaluation reports in a consistent and coherent manner, thereby improving evidence comparison, synthesis, replication, and implementation of effective interventions.
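The abstract states that all relationships in the ontology are is_a links. A minimal in-memory sketch of such a hierarchy (class names below the top level are illustrative abbreviations; the published OWL file is the authoritative source):

```python
# Sketch: an is_a hierarchy like the Mode of Delivery Ontology's, stored as
# child -> parent links. Lower-level class names here are illustrative only.
PARENT = {
    "Mode of delivery": None,          # root
    "Informational": "Mode of delivery",
    "Somatic": "Mode of delivery",
    "Printed material": "Informational",
    "Website": "Informational",
}

def ancestors(cls):
    """Walk is_a links from a class up to the root."""
    chain = []
    while PARENT.get(cls) is not None:
        cls = PARENT[cls]
        chain.append(cls)
    return chain

chain = ancestors("Website")
```

An annotation tagged with a specific class thus implicitly carries every ancestor class, which is what makes consistent report annotation and later synthesis possible.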
The classification of gene products in the molecular biology domain: Realism, objectivity, and the limitations of the Gene Ontology
Background: Controlled vocabularies in the molecular biology domain exist to facilitate data integration across database resources. One such tool is the Gene Ontology (GO), a classification designed to act as a universal index for gene products from any species. The Gene Ontology is used extensively in annotating gene products and analysing gene expression data, yet very little research exists from a library and information science perspective exploring the design principles, philosophy and social role of ontologies in biology.
Aim: To explore how molecular biologists, in creating the Gene Ontology, devised guidelines and rules for determining which scientific concepts are included in the ontology, and the criteria for how these concepts are represented.
Methods: A domain analysis approach was used to devise a mixed methodology to study the design of the Gene Ontology. Concept analysis of a GO term and a critical discourse analysis of GO developer mailing list texts were used to test whether ontological realism is a tenable basis for constructing objective ontologies. A comparison of the current GO vocabulary construction guidelines and a study of the reasons why GO terms are removed from the ontology further explored the justifications for the design of the Gene Ontology. Finally, a content analysis of published GO papers examined how authors use and cite GO data and terminology.
Results: Gene Ontology terms can be presented according to different epistemologies for concepts, indicating that ontological realism is not the only way objective ontologies can be designed. Social roles and the exercise of power were found to play an important role in determining ontology content, and poor synonym control, a lack of clear warrant for deciding terminology and arbitrary decisions to delete and invent new terms undermine the objectivity and universal applicability of the Gene Ontology. Authors exhibited poor compliance with GO data citation policies, and in re-wording and misquoting GO terminology, risk exacerbating the semantic problems this controlled vocabulary was designed to solve.
Conclusions: The failure of the Gene Ontology to define what is meant by a molecular function, the exercise of power by GO developers in clearing contentious concepts from the ontology, and the strict adherence to ontological realism, which marginalises social and subjective ways of classifying scientific concepts, limit the utility of the ontology as a tool to unify the molecular biology domain. These limitations to the Gene Ontology design could be overcome with the development of lighter, pluralistic, user-controlled ‘open ontologies’ for gene products that can work alongside more traditional, ‘top-down’ developed vocabularies.