15 research outputs found

    Multilingual Unsupervised Sentence Simplification

    Progress in Sentence Simplification has been hindered by the lack of supervised data, particularly in languages other than English. Previous work has aligned sentences from original and simplified corpora such as English Wikipedia and Simple English Wikipedia, but this limits corpus size, domain, and language. In this work, we propose using unsupervised mining techniques to automatically create training corpora for simplification in multiple languages from raw Common Crawl web data. When coupled with a controllable generation mechanism that can flexibly adjust attributes such as length and lexical complexity, these mined paraphrase corpora can be used to train simplification systems in any language. We further incorporate multilingual unsupervised pretraining methods to create even stronger models and show that by training on mined data rather than supervised corpora, we outperform the previous best results. We evaluate our approach on English, French, and Spanish simplification benchmarks and reach state-of-the-art performance with a totally unsupervised approach. We will release our models and code to mine the data in any language included in Common Crawl.

    Literature Survey on Interaction Design and Existing Software Applications for Dyslectic Users

    Enabling text comprehensibility assessment for people with intellectual disabilities using a mobile application

    In research on Easy Language and automatic text simplification, it is imperative to evaluate the comprehensibility of texts by presenting them to target users and assessing their level of comprehension. Target readers often include people with intellectual or other disabilities, which renders conducting experiments more challenging and time-consuming. In this paper, we introduce Okra, an openly available touchscreen-based application to facilitate the inclusion of people with disabilities in studies of text comprehensibility. It implements several tasks related to reading comprehension and cognition and its user interface is optimized toward the needs of people with intellectual disabilities (IDs). We used Okra in a study with 16 participants with IDs and tested for effects of modality, comparing reading comprehension results when texts are read on paper and on an iPad. We found no evidence of such an effect on multiple-choice comprehension questions and perceived difficulty ratings, but reading time was significantly longer on paper. We also tested the feasibility of assessing cognitive skill levels of participants in Okra, and discuss problems and possible improvements. We will continue development of the application and use it for evaluating automatic text simplification systems in the future.

    Reference-less Quality Estimation of Text Simplification Systems

    The evaluation of text simplification (TS) systems remains an open challenge. As the task has common points with machine translation (MT), TS is often evaluated using MT metrics such as BLEU. However, such metrics require high quality reference data, which is rarely available for TS. TS has the advantage over MT of being a monolingual task, which allows for direct comparisons to be made between the simplified text and its original version. In this paper, we compare multiple approaches to reference-less quality estimation of sentence-level text simplification systems, based on the dataset used for the QATS 2016 shared task. We distinguish three different dimensions: grammaticality, meaning preservation and simplicity. We show that n-gram-based MT metrics such as BLEU and METEOR correlate the most with human judgment of grammaticality and meaning preservation, whereas simplicity is best evaluated by basic length-based metrics.
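
    The abstract above notes that, because simplification is monolingual, a simplified sentence can be compared directly with its source, and that simplicity is best tracked by basic length-based metrics. A minimal sketch of such reference-less features follows. The function name `length_features`, the three features, and the whitespace tokenization are illustrative assumptions, not the feature set used in the paper.

```python
def length_features(original: str, simplified: str) -> dict:
    """Compare a simplified sentence with its source using length only.

    Hypothetical helper: the feature names and whitespace tokenization
    are assumptions made for illustration, not the paper's exact metrics.
    """
    orig_tokens = original.split()
    simp_tokens = simplified.split()
    if not orig_tokens or not simp_tokens:
        raise ValueError("both sentences must contain at least one token")
    return {
        # Ratios below 1.0 mean the output is shorter than the source.
        "char_ratio": len(simplified) / len(original),
        "token_ratio": len(simp_tokens) / len(orig_tokens),
        # Average word length of the output, a crude lexical-complexity proxy.
        "avg_word_len": sum(len(t) for t in simp_tokens) / len(simp_tokens),
    }


features = length_features("the cat sat on the mat", "the cat sat")
```

    No reference simplification is needed: every feature is computed from the system output and its own source sentence.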

    Text Simplifier Based on Syntax Analysis

    This bachelor's thesis gives an overview of text simplification, focusing mainly on syntactic simplification. Syntactic simplification has been addressed in numerous studies for English, and the thesis applies those results to Estonian. The aim of the thesis was to create a web-based text simplifier whose main simplification method is sentence splitting, that is, dividing compound sentences into simple sentences. For syntax analysis, the simplifier uses the Estonian natural language toolkit EstNLTK.

    Does splitting make sentence easier?

    In this study, we focus on sentence splitting, a subfield of text simplification, motivated largely by an unproven idea that if you divide a sentence into pieces, it should become easier to understand. Our primary goal in this study is to find out whether this is true. In particular, we ask, does it matter whether we break a sentence into two, three, or more? We report on our findings based on Amazon Mechanical Turk. More specifically, we introduce a Bayesian modeling framework to further investigate to what degree a particular way of splitting the complex sentence affects readability, along with a number of other parameters adopted from diverse perspectives, including clinical linguistics and cognitive linguistics. The Bayesian modeling experiment provides clear evidence that bisecting the sentence leads to enhanced readability to a degree greater than when we create simplifications with more splits.

    Ylen uutissivuston käytettävyysanalyysi lukivaikeuksisten käyttäjien näkökulmasta

    Dyslexia is a neurobiological special need associated with, for example, difficulties in reading and writing text. The Finnish Broadcasting Company, or Yle, is Finland's national public broadcaster and reaches about half a million unique users weekly. Because of its role, Yle must be able to serve all of its users regardless of their special needs, and its general usability has been recognized with the Esteettömyys Huomioitu (Accessibility Considered) label. The purpose of this study was to test the usability of Yle's news service from the perspective of people with dyslexia, and to survey the subjective and objective readability of solutions gathered from the literature by means of eye-tracking tests and an online survey. The online survey and the eye-tracking tests examined, among other things, the subjective and objective readability of font size, color, typeface, line spacing, and character spacing. The tests also compared Yle's regular news articles with their simplified versions, the so-called selkouutiset (easy-language news), in a few categories. Yle's site performed well in the evaluation carried out on the basis of the literature review, although there was some room for improvement, for example in the contrast of site elements and in the general layout of the site. The data from the eye-tracking and online survey tests revealed few statistically significant subjective or objective improvements, but the data and the free-form user feedback suggested several possible directions for further research.

    Creating New Pathways to Justice Using Simple Artificial Intelligence and Online Dispute Resolution

    Access to justice can be improved significantly through implementation of simple artificial intelligence (AI) based expert systems deployed within a broader online dispute resolution (ODR) framework. Simple expert systems can bridge the ‘implementation gap’ that continues to impede the adoption of AI in the justice domain. This gap can be narrowed further through the design of multi-disciplinary expert systems that address user needs through simple, non-legalistic user interfaces. This article provides a non-technical conceptual description of an expert system designed to enhance access to justice for non-experts. The system’s knowledge base would be populated with expert knowledge from the justice and dispute resolution domains. A conditional logic rule-based system forms the basis of the inference engine located between the knowledge base and a questionnaire-based user interface. The expert system’s functions include problem diagnosis, delivery of customized information, self-help support, triage and streaming into subsequent ODR processes. Its usability is optimized through the engagement of human computer interaction (HCI) and affective computing techniques that engage the social and emotional sides of technology. The conceptual descriptions offered in this article draw support from empirical observations of an innovative project aimed at creating an expert system for an ODR-enabled civil justice tribunal.
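
    The architecture described in the abstract above (a knowledge base, a conditional-logic inference engine, and a questionnaire-based interface) can be sketched in a few lines. Everything concrete here is a hypothetical illustration: the `Rule` class, the question keys, the rule contents, and the outcomes are assumptions, not taken from the article or the tribunal project it mentions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Questionnaire answers, keyed by a (hypothetical) question identifier.
Answers = Dict[str, str]


@dataclass
class Rule:
    """One conditional-logic rule: if `condition` holds, return `outcome`."""
    name: str
    condition: Callable[[Answers], bool]
    outcome: str


# Hypothetical knowledge base; a real one would hold expert-authored rules.
KNOWLEDGE_BASE: List[Rule] = [
    Rule("urgent",
         lambda a: a.get("deadline_passed") == "yes",
         "Refer to a human adviser"),
    Rule("small_debt",
         lambda a: a.get("dispute_type") == "debt"
         and a.get("amount") == "under_limit",
         "Stream into online negotiation"),
    Rule("fallback",
         lambda a: True,
         "Provide general self-help information"),
]


def infer(answers: Answers, rules: List[Rule] = KNOWLEDGE_BASE) -> str:
    """Inference engine: return the outcome of the first matching rule."""
    for rule in rules:
        if rule.condition(answers):
            return rule.outcome
    raise LookupError("no rule matched")


outcome = infer({"dispute_type": "debt", "amount": "under_limit"})
```

    Ordering the rules from most to least specific, with a catch-all last, is one simple way to express triage; it is a design choice of this sketch rather than something the article specifies.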