All mixed up? Finding the optimal feature set for general readability prediction and its application to English and Dutch
Readability research has a long and rich tradition, but there has been too little focus on general readability prediction without targeting a specific audience or text genre. Moreover, though NLP-inspired research has focused on adding more complex readability features, there is still no consensus on which features contribute most to the prediction. In this article, we investigate in close detail the feasibility of constructing a readability prediction system for English and Dutch generic text using supervised machine learning. Based on readability assessments by both experts and a crowd, we implement different types of text characteristics, ranging from easy-to-compute superficial text characteristics to features requiring deep linguistic processing, resulting in ten different feature groups. Both a regression and a classification setup are investigated, reflecting the two possible readability prediction tasks: scoring individual texts or comparing two texts. We show that going beyond correlation calculations for readability optimization by using a wrapper-based genetic algorithm optimization approach is promising and provides considerable insight into which feature combinations contribute to the overall readability prediction. Since we also have gold standard information available for those features requiring deep processing, we are able to investigate the true upper bound of our Dutch system. Interestingly, we observe that the performance of our fully automatic readability prediction pipeline is on par with the pipeline using gold-standard deep syntactic and semantic information.
Enhancing the Communication of Law: a cross-disciplinary investigation applying information technology
Law is pervasive in culture. It is a form of
communication between government and citizens. When effective,
it is a tool of government policy. If poorly designed, law
results in unnecessary costs to society. Impediments to
understanding of the law limit and distort democratic
participation. Yet, historically, the law has been
inaccessible to most. Thus enhancing the communication of
law is an important and standing problem. Much work has
been done (for example through the plain language
movement) to improve the communication of law.
Nonetheless, the law remains largely unreadable to non-legal
users. This thesis applies information technology to investigate
and enhance the communication of law. To this end, this thesis
focusses on four main areas. To improve the readability of law, it
must be better described as a form of language. Corpus
linguistics is applied for this purpose. A linguistic
description of contract language arose from this work, which,
along with the corpus itself, has been made available to the
research community. The thesis also describes work for the
automatic classification of text in legal contracts by legal
function. Reliable measures for the readability of law are needed,
but they do not exist. To develop such measures, gold standard
data is needed to evaluate possible measures. To create this
gold standard data, the research engaged citizen
scientists, in the form of the online “crowd”.
However, methods for creating and using such user
assessments for readability are rudimentary. The research
therefore investigated, developed and applied a number of methods
for collecting user ratings of readability in an online
environment. Also, the research applied machine learning to
investigate and identify linguistic factors that are specifically
associated with language difficulty of legislative sentences.
This resulted in recommendations for improving legislative
readability. A parallel line of investigation concerned the
application of visualization to enhance the communication of law.
Visualization engages human visual perception and its parallel
processing capacities for the communication of law. The
research applied computational tools: natural language
processing, graph characteristics and data driven algorithms.
It resulted in prototype tools for automatically visualizing
definition networks and automating the visualization of selected
contract clauses. Also, the work has fostered an investigation
of the nature of law itself. A “law as” framework is used to
query the nature of law and illuminate law in new ways. The
framework is re-assessed as a tool for the experimental
investigation of law. This results in an enhanced description of
law, applying a number of investigatory frames: law;
communication; document; information; computation; design
and complex systems theory. It also provides a
contrastive study with traditional theories of law -
demonstrating how traditional theories can be extended in
the light of these multidisciplinary results. In sum, this
thesis reports a body of work advancing the existing
knowledge base and state of the art in respect of
application of computational techniques to enhancing the
communication of law.