3,412 research outputs found

Obfuscating Authorship: Results of a User Study on Nondescript, a Digital Privacy Tool

Author: Davis Robin Camille
Publication venue: CUNY Academic Works
Publication date: 01/02/2019

For those who write anonymously, particularly for safety reasons, authorship attribution poses a threat. Nondescript, my web app, guides writers in achieving stylometric obfuscation in order to preserve anonymity. The app runs simulations of authorship attribution scenarios by analyzing the user’s linguistic features. In this paper, I will describe the conception of the Nondescript app; discuss related work; and present the results of a user study. Most users in the study were able to anonymize their writing in at least 5 out of 10 authorship attribution scenarios. Users rated the anonymization process an average of 3.6 out of 5 in terms of ease of use. This work-in-progress project is situated in two domains: privacy technologies and computational linguistics.

City University of New York

Artificial exam scorer for efficient marking and grading of short essay tests

Author: Menya Edmond Odhiambo
Publication venue: Strathmore University
Publication date: 01/01/2018

Thesis submitted in partial fulfillment of the requirements for the Degree of Master of Science in Information Technology (MSIT) at Strathmore University. Learning is integral to the development of students and to the progress of a society. The process is always marked with milestones, from class work to semester projects and eventually examinations. Students are required, as a standard, to sit for an instructor-set exam paper. The grades and scores that the student garners are an indicator of progress, of the amount of knowledge acquired, and of whether the student is qualified for the next academic level. Exams are thus an imperative and critical aspect of the academic life cycle. However, the examination marking and grading process has been marred by inefficiencies, irregularities and unethical practices over the years. This study aimed to automate the exam marking process. This approach seeks to introduce efficiencies, cutting down the time and cost involved in examination marking, and to eliminate human bias in the marking process. The research objectives were: to study the accuracy levels of past exam papers marked by human instructors; to review challenges linked to the examination marking process; to review existing models, frameworks, architectures and algorithms that have attempted exam marking automation; to develop an improved algorithm-based solution that is efficient for the marking problem; and to perform experiments to validate the algorithm. The research used an experimental design to examine the relation between the involvement of keywords, synonyms and related words in artificial marking and marking accuracy. The outcome is an algorithm that mines and counts related words between the marking scheme and the student answer to mark exams. The findings were that the model improves marking accuracy by 16 percentage points, from 73% to 89%.
The model was most accurate on lower-mark answers, reaching 99.9% on 1-mark answers.

SU+ Digital Repository

Deep Learning for Text Style Transfer: A Survey

Author: Hu Zhiting
Jin Di
Jin Zhijing
Mihalcea Rada
Vechtomova Olga
Publication date: 16/12/2021

Text style transfer is an important task in natural language generation, which aims to control certain attributes in the generated text, such as politeness, emotion, humor, and many others. It has a long history in the field of natural language processing, and recently has regained significant attention thanks to the promising performance brought by deep neural models. In this paper, we present a systematic survey of the research on neural text style transfer, spanning over 100 representative articles since the first neural text style transfer work in 2017. We discuss the task formulation, existing datasets and subtasks, evaluation, as well as the rich methodologies in the presence of parallel and non-parallel data. We also provide discussions on a variety of important topics regarding the future development of this task. Our curated paper list is at https://github.com/zhijing-jin/Text_Style_Transfer_Survey
Comment: Computational Linguistics Journal 202

arXiv.org e-Print Archive

Repository for Publications and Research Data

Adapting a relation extraction pipeline for the BioCreAtIvE II task

Author: Grover Claire
Haddow Barry
Klein Ewan
Matthews Michael
Nielsen Leif Arda
Tobin Richard
Wang Xinglong
Publication date: 01/01/2007

Edinburgh Research Explorer

Automatic Image Captioning with Style

Author: Mathews Alexander Patrick
Publication date: 01/01/2018

This thesis connects two core topics in machine learning, vision and language. The problem of choice is image caption generation: automatically constructing natural language descriptions of image content. Previous research into image caption generation has focused on generating purely descriptive captions; I focus on generating visually relevant captions with a distinct linguistic style. Captions with style have the potential to ease communication and add a new layer of personalisation. First, I consider naming variations in image captions, and propose a method for predicting context-dependent names that takes into account visual and linguistic information. This method makes use of a large-scale image caption dataset, which I also use to explore naming conventions and report naming conventions for hundreds of animal classes. Next I propose the SentiCap model, which relies on recent advances in artificial neural networks to generate visually relevant image captions with positive or negative sentiment. To balance descriptiveness and sentiment, the SentiCap model dynamically switches between two recurrent neural networks, one tuned for descriptive words and one for sentiment words. As the first published model for generating captions with sentiment, SentiCap has influenced a number of subsequent works. I then investigate the sub-task of modelling styled sentences without images. The specific task chosen is sentence simplification: rewriting news article sentences to make them easier to understand. For this task I design a neural sequence-to-sequence model that can work with limited training data, using novel adaptations for word copying and sharing word embeddings. Finally, I present SemStyle, a system for generating visually relevant image captions in the style of an arbitrary text corpus. A shared term space allows a neural network for vision and content planning to communicate with a network for styled language generation. 
SemStyle achieves competitive results in human and automatic evaluations of descriptiveness and style. As a whole, this thesis presents two complete systems for styled caption generation that are the first of their kind and demonstrate, for the first time, that automatic style transfer for image captions is achievable. Contributions also include novel ideas for object naming and sentence simplification. This thesis opens up inquiries into highly personalised image captions; large-scale visually grounded concept naming; and, more generally, styled text generation with content control.

The Australian National University

Linguistic Refactoring of Business Process Models

Author: Pittke Fabian
Publication date: 01/11/2015

In the past decades, organizations had to face numerous challenges due to intensifying globalization and internationalization, shorter innovation cycles and growing IT support for business. Business process management is seen as a comprehensive approach to align business strategy, organization, controlling, and business activities to react flexibly to market changes. For this purpose, business process models are increasingly utilized to document and redesign relevant parts of the organization's business operations. Since companies tend to have a growing number of business process models stored in a process model repository, analysis techniques are required that assess the quality of these process models in an automatic fashion. While available techniques can easily check the formal content of a process model, there are only a few techniques available that analyze the natural language content of a process model. Therefore, techniques are required that address linguistic issues caused by the actual use of natural language. In order to close this gap, this doctoral thesis explicitly targets inconsistencies caused by natural language and investigates the potential of automatically detecting and resolving them from a linguistic perspective. In particular, this doctoral thesis provides the following contributions. First, it defines a classification framework that structures existing work on process model analysis and refactoring. Second, it introduces the notion of atomicity, which implements a strict consistency condition between the formal content and the textual content of a process model. Based on an explorative investigation, we reveal several recurring violation patterns that are not compliant with the notion of atomicity. Third, this thesis proposes an automatic refactoring technique that formalizes the identified patterns to transform a non-atomic process model into an atomic one.
Fourth, this thesis defines an automatic technique for detecting and refactoring synonyms and homonyms in process models, which is useful for unifying the terminology used in an organization. Fifth and finally, this thesis proposes a recommendation-based refactoring approach that addresses process models that suffer from incompleteness and admit several possible interpretations. The efficiency and usefulness of the proposed techniques are further evaluated on real-world process model repositories from various industries. (author's abstract)

Elektronische Publikationen der Wirtschaftsuniversität Wien

CERN Document Server

Dancing Between Success and Failure: Edit-level Simplification Evaluation using SALSA

Author: Dou Yao
Heineman David
Maddela Mounica
Xu Wei
Publication date: 22/10/2023

Large language models (e.g., GPT-4) are uniquely capable of producing highly rated text simplification, yet current human evaluation methods fail to provide a clear understanding of systems' specific strengths and weaknesses. To address this limitation, we introduce SALSA, an edit-based human annotation framework that enables holistic and fine-grained text simplification evaluation. We develop twenty-one linguistically grounded edit types, covering the full spectrum of success and failure across dimensions of conceptual, syntactic and lexical simplicity. Using SALSA, we collect 19K edit annotations on 840 simplifications, revealing discrepancies in the distribution of simplification strategies performed by fine-tuned models, prompted LLMs and humans, and find GPT-3.5 performs more quality edits than humans, but still exhibits frequent errors. Using our fine-grained annotations, we develop LENS-SALSA, a reference-free automatic simplification metric, trained to predict sentence- and word-level quality simultaneously. Additionally, we introduce word-level quality estimation for simplification and report promising baseline results. Our data, new metric, and annotation toolkit are available at https://salsa-eval.com
Comment: Accepted to EMNLP 202

arXiv.org e-Print Archive