5 research outputs found

Automatic Text Simplification for People with Intellectual Disabilities

Author: Chen Ping
Djamasbi Soussan
Fay Peter
Kennedy David N.
Rochford John
Scott Will
Publication venue: eScholarship@UMassChan
Publication date: 17/07/2017

Text simplification (TS) aims to reduce the lexical and structural complexity of a text while retaining its semantic meaning. Current automatic TS techniques are limited to either lexical-level applications or the manual definition of a large number of rules. In this paper, we propose to simplify text at both the lexical and the sentence level. Preliminary experiments show that our approach yields promising results.

eScholarship@UMMS

Hybrid text simplification using synchronous dependency grammars with hand-written and automatically harvested rules

Author: Angrosh M.A.
Siddharthan Advaith
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/01/2014

We present an approach to text simplification based on synchronous dependency grammars. The higher level of abstraction afforded by dependency representations allows for a linguistically sound treatment of complex constructs requiring reordering and morphological change, such as conversion of passive voice to active. We present a synchronous grammar formalism in which it is easy to write rules by hand and also acquire them automatically from dependency parses of aligned English and Simple English sentences. The grammar formalism is optimised for monolingual translation in that it reuses ordering information from the source sentence where appropriate. We demonstrate the superiority of our approach over a leading contemporary system based on quasi-synchronous tree substitution grammars, both in terms of expressivity and performance.

Aberdeen University Research

Crossref

Open Research Online (The Open University)

Sentence simplification, compression, and disaggregation for summarization of sophisticated documents

Author: Finegan-Dollak Catherine
Radev Dragomir R.
Publication venue: 'Wiley'
Publication date: 23/09/2015

Peer Reviewed
http://deepblue.lib.umich.edu/bitstream/2027.42/134176/1/asi23576.pd

Crossref

Deep Blue Documents at the University of Michigan

Complex word identification model for lexical simplification in the Malay language for non-native speakers

Author: Salehah Omar
Publication date: 01/01/2023

Text Simplification (TS) is the process of converting complex text into more easily understandable text. Lexical Simplification (LS), a method within TS, is the task of replacing words with simpler words. Past studies have shown weaknesses in the first LS task, called Complex Word Identification (CWI), where previous CWI models have misidentified simple and complex words. The main objective of this study is to produce a Malay CWI model, with three sub-objectives: i) to propose a dataset based on the state-of-the-art Malay corpus, ii) to produce a Malay CWI model, and iii) to perform an evaluation based on the standard statistical metrics: accuracy, precision, recall, F1-score, and G1-score. The model is constructed following the CWI model development outlined by the previous researcher. The study consists of three modules: i) a Malay CWI dataset, ii) Malay CWI features with the new enhanced stemmer rules, and iii) a CWI model based on the Gradient Boosted Tree (GB) algorithm. The model is evaluated on a state-of-the-art Malay corpus, which is divided into training and testing data using k-fold cross-validation with k=10. A series of tests was performed to ensure the best model was produced, including feature selection, generation of an improved stemmer algorithm, data imbalance, and classifier testing. The best model, using the Gradient Boost algorithm, showed an average accuracy of 92.55%, an F1-score of 92.09%, and a G1-score of 89.7%. The F1-score was better than the English standard baseline score, with a difference of 16.3%. Three linguistic experts verified the results for 38 unseen sentences, and the results were significantly positive when the model was compared with the linguistic experts' assessment. The proposed CWI model improves on the F1-score obtained in the second CWI shared task and positively affects non-native speakers and researchers.

Universiti Utara Malaysia: UUM eTheses
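
The evaluation protocol named in the Malay CWI abstract above (k-fold cross-validation with k=10, scored by accuracy, precision, recall, and F1-score) can be illustrated with a minimal sketch. This is not the thesis author's code: the function names `k_fold_indices` and `binary_metrics`, the contiguous (unshuffled) fold layout, and the convention that label 1 means "complex word" are all assumptions made for illustration. The G1-score is left out because the abstract does not define it.

```python
from typing import Dict, List, Sequence, Tuple


def k_fold_indices(n_items: int, k: int = 10) -> List[Tuple[List[int], List[int]]]:
    """Split range(n_items) into k contiguous folds (hypothetical helper).

    Returns one (train_indices, test_indices) pair per fold. Fold sizes
    differ by at most one item when n_items is not divisible by k.
    """
    if not 2 <= k <= n_items:
        raise ValueError("k must be between 2 and n_items")
    base, extra = divmod(n_items, k)
    folds = []
    start = 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_items))
        folds.append((train, test))
        start += size
    return folds


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1 for binary labels.

    Assumes label 1 = complex word, label 0 = simple word.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

In a full evaluation, a classifier would be trained on each fold's training indices, scored on its test indices with `binary_metrics`, and the per-fold scores averaged, which is presumably what the "average accuracy" in the abstract refers to.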