Influence of Term Familiarity in Readability of Spanish e-Government Web Information

Abstract

It is well known that the linguistic features of a written text affect its readability, understood here as the ease with which a reader can understand the text. This paper analyses the influence of some linguistic features on the readability of current Spanish e-Government websites. Specifically, it studies, among other features, the “familiarity” of the terms on web pages and the “frequency” of these terms. First, a corpus extracted from the current information websites of the Spanish e-Government and their simplified counterparts is analysed. Then, using machine learning methods, a supervised model is built to assess the influence of different term familiarity lists on text readability in the corpus. Different term lists have been tested, and the differences between them have been found to have a great impact on their performance. An accuracy of 81% has been achieved with a combination of frequency lists. In conclusion, term lists and term frequencies make it possible to determine, to a high degree, how difficult a text is to understand.

Work supported by the Spanish Ministry of Economy, Industry and Competitiveness (CSO2017-86747-R).
