3,412 research outputs found
Obfuscating Authorship: Results of a User Study on Nondescript, a Digital Privacy Tool
For those who write anonymously, particularly for safety reasons, authorship attribution poses a threat. Nondescript, my web app, guides writers in achieving stylometric obfuscation in order to preserve anonymity. The app runs simulations of authorship attribution scenarios by analyzing the userâs linguistic features. In this paper, I will describe the conception of the Nondescript app; discuss related work; and present the results of a user study. Most users in the study were able to anonymize their writing in at least 5 out of 10 authorship attribution scenarios. Users rated the anonymization process an average of 3.6 out of 5 in terms of ease of use. This work-in-progress project is situated in two domains: privacy technologies and computational linguistics
Artificial exam scorer for efficient marking and grading of short essay tests
Thesis submitted in partial fulfillment of the requirements for the Degree of Master of Science in Information Technology (MSIT) at Strathmore UniversityLearning is an integral aspect to the development of students as well as progressing of a society. The process is always marked with milestones from class work to semester projects and eventually examinations. Students are always required, as a standard, to sit for an instructor set exam paper. The grade and scores that the student garners indicator of progress, amount of knowledge acquired as well as whether or not the student is qualified for the next academic level. Exams are thus an imperative aspect in the academic life cycle and a critical one for that matter. However, the examinations marking and grading process has been marred with inefficiencies, irregularities and unethical practices over the years. This study aimed at achieving the automation of the exam marking process. This approach seeks to introduce efficiencies cutting down time and cost involved in examinations marking in addition to eliminating human bias in the marking process. Research objectives were centered around studying accuracy levels of past exam papers marked by human instructors, reviewing challenges linked to the examination marking process, reviewing existing models, frameworks, architectures and algorithms that have tried exam marking automation, to develop an improved algorithm-based solution that is efficient for the marking problem and performing of experiments to validate the algorithm. The research engaged experimental research experimenting the relation between keywords, synonyms and their related words involvement in artificial marking and marking accuracy. The outcome is an algorithm that mines related words and counts between scheme and student answer to mark exams. The findings were that the model achieves an improved marking accuracy by a margin of 16% from 73% to 89%. The model achieved more accuracy when grading lower mark answers achieving 99.9% when marking 1-mark answers
Deep Learning for Text Style Transfer: A Survey
Text style transfer is an important task in natural language generation,
which aims to control certain attributes in the generated text, such as
politeness, emotion, humor, and many others. It has a long history in the field
of natural language processing, and recently has re-gained significant
attention thanks to the promising performance brought by deep neural models. In
this paper, we present a systematic survey of the research on neural text style
transfer, spanning over 100 representative articles since the first neural text
style transfer work in 2017. We discuss the task formulation, existing datasets
and subtasks, evaluation, as well as the rich methodologies in the presence of
parallel and non-parallel data. We also provide discussions on a variety of
important topics regarding the future development of this task. Our curated
paper list is at https://github.com/zhijing-jin/Text_Style_Transfer_SurveyComment: Computational Linguistics Journal 202
Automatic Image Captioning with Style
This thesis connects two core topics in machine learning, vision
and language. The problem of choice is image caption generation:
automatically constructing natural language descriptions of image
content. Previous research into image caption generation has
focused on generating purely descriptive captions; I focus on
generating visually relevant captions with a distinct linguistic
style. Captions with style have the potential to ease
communication and add a new layer of personalisation.
First, I consider naming variations in image captions, and
propose a method for predicting context-dependent names that
takes into account visual and linguistic information. This method
makes use of a large-scale image caption dataset, which I also
use to explore naming conventions and report naming conventions
for hundreds of animal classes. Next I propose the SentiCap
model, which relies on recent advances in artificial neural
networks to generate visually relevant image captions with
positive or negative sentiment. To balance descriptiveness and
sentiment, the SentiCap model dynamically switches between two
recurrent neural networks, one tuned for descriptive words and
one for sentiment words. As the first published model for
generating captions with sentiment, SentiCap has influenced a
number of subsequent works. I then investigate the sub-task of
modelling styled sentences without images. The specific task
chosen is sentence simplification: rewriting news article
sentences to make them easier to understand.
For this task I design a neural sequence-to-sequence model that
can work with
limited training data, using novel adaptations for word copying
and sharing
word embeddings. Finally, I present SemStyle, a system for
generating visually
relevant image captions in the style of an arbitrary text corpus.
A shared term
space allows a neural network for vision and content planning to
communicate
with a network for styled language generation. SemStyle achieves
competitive
results in human and automatic evaluations of descriptiveness and
style.
As a whole, this thesis presents two complete systems for styled
caption generation that are first of their kind and demonstrate,
for the first time, that automatic style transfer for image
captions is achievable. Contributions also include novel ideas
for object naming and sentence simplification. This thesis opens
up inquiries into highly personalised image captions; large scale
visually grounded concept naming; and more generally, styled text
generation with content control
Linguistic Refactoring of Business Process Models
In the past decades, organizations had to face numerous challenges due to intensifying globalization and internationalization, shorter innovation cycles and growing IT support for business. Business process management is seen as a comprehensive approach to align business strategy, organization, controlling, and business activities to react flexibly to market changes. For this purpose, business process models are increasingly utilized to document and redesign relevant parts of the organization's business operations. Since companies tend to have a growing number of business process models stored in a process model repository, analysis techniques are required that assess the quality of these process models in an automatic fashion. While available techniques can easily check the formal content of a process model, there are only a few techniques available that analyze the natural language content of a process model. Therefore, techniques are required that address linguistic issues caused by the actual use of natural language. In order to close this gap, this doctoral thesis explicitly targets inconsistencies caused by natural language and investigates the potential of automatically detecting and resolving them under a linguistic perspective. In particular, this doctoral thesis provides the following contributions. First, it defines a classification framework that structures existing work on process model analysis and refactoring. Second, it introduces the notion of atomicity, which implements a strict consistency condition between the formal content and the textual content of a process model. Based on an explorative investigation, we reveal several reoccurring violation patterns are not compliant with the notion of atomicity. Third, this thesis proposes an automatic refactoring technique that formalizes the identified patterns to transform a non-atomic process models into an atomic one. Fourth, this thesis defines an automatic technique for detecting and refactoring synonyms and homonyms in process models, which is eventually useful to unify the terminology used in an organization. Fifth and finally, this thesis proposes a recommendation-based refactoring approach that addresses process models suffering from incompleteness and leading to several possible interpretations. The efficiency and usefulness of the proposed techniques is further evaluated by real-world process model repositories from various industries. (author's abstract
Dancing Between Success and Failure: Edit-level Simplification Evaluation using SALSA
Large language models (e.g., GPT-4) are uniquely capable of producing highly
rated text simplification, yet current human evaluation methods fail to provide
a clear understanding of systems' specific strengths and weaknesses. To address
this limitation, we introduce SALSA, an edit-based human annotation framework
that enables holistic and fine-grained text simplification evaluation. We
develop twenty one linguistically grounded edit types, covering the full
spectrum of success and failure across dimensions of conceptual, syntactic and
lexical simplicity. Using SALSA, we collect 19K edit annotations on 840
simplifications, revealing discrepancies in the distribution of simplification
strategies performed by fine-tuned models, prompted LLMs and humans, and find
GPT-3.5 performs more quality edits than humans, but still exhibits frequent
errors. Using our fine-grained annotations, we develop LENS-SALSA, a
reference-free automatic simplification metric, trained to predict sentence-
and word-level quality simultaneously. Additionally, we introduce word-level
quality estimation for simplification and report promising baseline results.
Our data, new metric, and annotation toolkit are available at
https://salsa-eval.com.Comment: Accepted to EMNLP 202
- âŠ