
    A Right to Access Implies A Right to Know: An Open Online Platform for Research on the Readability of Law

    The widespread availability of legal materials online has opened the law to a new and greatly expanded readership. These new readers need the law to be readable when they encounter it. However, the available empirical research supports the conclusion that legislation is difficult to read, if not incomprehensible, for most citizens. We review approaches that have been used to measure the readability of text, including readability metrics, cloze testing, and the application of machine learning. We report the creation and testing of an open online platform for readability research. This platform is made available to researchers interested in undertaking research on the readability of legal materials. To demonstrate the capabilities of the platform, we report its initial application to a corpus of legislation. Linguistic characteristics are extracted using the platform and then used as input features for machine learning using the Weka package. Wide divergences are found between sentences in a corpus of legislation and those in a corpus of graded reading material or in the Brown corpus (a balanced corpus of English written genres). Readability metrics are found to be of little value in classifying sentences by grade reading level (noting that such metrics were not designed to be used with isolated sentences).
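    The classic readability metrics this abstract refers to reduce a text to a few surface counts. As a minimal sketch (not the platform's implementation, and using a rough vowel-group syllable heuristic), the Flesch-Kincaid Grade Level can be computed like this:

```python
import re

def count_syllables(word):
    # Rough heuristic: count vowel groups; discount one silent trailing 'e'.
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and count > 1:
        count -= 1
    return max(count, 1)

def flesch_kincaid_grade(text):
    # FKGL = 0.39 * (words/sentences) + 11.8 * (syllables/words) - 15.59
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words) - 15.59)

print(round(flesch_kincaid_grade(
    "The person who signs the form must keep a copy."), 1))  # prints 2.5
```

    Applied to a single isolated sentence, as here, the formula collapses to a function of one sentence's length and syllable density, which illustrates the abstract's caveat that such metrics were not designed for sentence-level classification.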

    Citizen Science for Citizen Access to Law

    This paper sits at the intersection of citizen access to law, legal informatics and plain language. The paper reports the results of a joint project of the Cornell University Legal Information Institute and the Australian National University which collected thousands of crowdsourced assessments of the readability of law through the Cornell LII site. The aim of the project is to enhance accuracy in the prediction of the readability of legal sentences. The study asked readers on legislative pages of the LII site to rate passages from the United States Code, the Code of Federal Regulations and other texts for readability and other characteristics. The research provides insight into who uses legal rules and how they do so. The study enables conclusions to be drawn as to the current readability of law and the spread of readability among legal rules. The research is intended to enable the creation of a dataset of legal rules labelled by human judges as to readability. Such a dataset, in combination with machine learning, will assist in identifying factors in legal language which impede readability and access for citizens. As far as we are aware, this research is the largest-ever study of the readability and usability of legal language and the first to apply crowdsourcing to such an investigation. The research is an example of the possibilities open for enhancing access to law through engagement of end users in the online legal publishing environment and through collaboration between legal publishers and researchers.

    Legislative Language For Success

    Legislative committee meetings are an integral part of the lawmaking process for local and state bills. The testimony presented during these meetings is a large factor in the outcome of the proposed bill. This research uses Natural Language Processing and Machine Learning techniques to analyze testimonies from California legislative committee meetings from 2015-2016 in order to identify what aspects of a testimony make it successful. A testimony is considered successful if the alignment of the testimony matches the bill outcome (alignment is For and the bill passes, or alignment is Against and the bill fails). The process of finding what makes a testimony successful was accomplished through data filtration, feature extraction, implementation of classification models, and feature analysis. Several features were extracted and tested to find those that had the greatest impact on the bill outcome. The features chosen provided information on the sentence complexity and type of words used (adjectives, verbs, nouns) for each testimony. Additionally, all the testimonies were analyzed to find common phrases used within successful testimonies. Two types of classification models were implemented: ones that used the manually extracted features as input and ones that used their own feature extraction process. The results from the classification models and feature analysis show that certain aspects within a testimony, such as sentence complexity and the use of specific phrases, significantly impact the bill outcome. The most successful models, Support Vector Machine and Multinomial Naive Bayes, achieved accuracies of 91.79% and 91.22% respectively.
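    One of the two best-performing models named above, Multinomial Naive Bayes, classifies a document by combining per-class word likelihoods with a class prior. A self-contained sketch with Laplace smoothing (the toy testimony snippets and labels below are illustrative, not the study's data) looks like this:

```python
import math
from collections import Counter, defaultdict

def train_mnb(docs, labels):
    """Multinomial Naive Bayes over bag-of-words counts, with add-one smoothing."""
    vocab = {w for d in docs for w in d.split()}
    counts = defaultdict(Counter)          # label -> word frequency counts
    priors = Counter(labels)
    for d, y in zip(docs, labels):
        counts[y].update(d.split())
    model = {}
    for y in priors:
        total = sum(counts[y].values())
        model[y] = {
            "logprior": math.log(priors[y] / len(docs)),
            "loglik": {w: math.log((counts[y][w] + 1) / (total + len(vocab)))
                       for w in vocab},
            # smoothed log-probability for words never seen in class y's training docs
            "unseen": math.log(1 / (total + len(vocab))),
        }
    return model

def predict(model, doc):
    def score(y):
        m = model[y]
        return m["logprior"] + sum(m["loglik"].get(w, m["unseen"])
                                   for w in doc.split())
    return max(model, key=score)

docs = ["we strongly support this bill",
        "this bill protects workers",
        "we oppose this harmful bill",
        "this bill imposes unfair costs"]
labels = ["for", "for", "against", "against"]
model = train_mnb(docs, labels)
print(predict(model, "we support workers"))  # prints "for"
```

    In the study's framing, the predicted label would then be compared with the bill's actual outcome to decide whether the testimony counts as successful.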

    Neural Discourse Structure for Text Categorization

    Full text link
    We show that discourse structure, as defined by Rhetorical Structure Theory and provided by an existing discourse parser, benefits text categorization. Our approach uses a recursive neural network and a newly proposed attention mechanism to compute a representation of the text that focuses on salient content, from the perspective of both RST and the task. Experiments consider variants of the approach and illustrate its strengths and weaknesses. Comment: ACL 2017 camera-ready version.

    Machine Learning for Readability Assessment and Text Simplification in Crisis Communication: A Systematic Review

    In times of social media, crisis managers can interact with citizens in a variety of ways. Since machine learning has already been used to classify messages from the population, the question is whether such technologies can also play a role in the creation of messages from crisis managers to the population. This paper presents exploratory research on selected machine learning solutions for crisis communication. We present systematic literature reviews of readability assessment and text simplification. Our research suggests that readability assessment has the potential for effective use in crisis communication, but there is a lack of sufficient training data. This also applies to text simplification, where an exact assessment is only partly possible due to unreliable or non-existent training data and validation measures.

    Automated Readability Assessment for Spanish e-Government Information

    This paper automatically evaluates the readability of Spanish e-government websites. Specifically, the websites collected explain e-government administrative procedures. The evaluation is carried out through the analysis of different linguistic characteristics that are presumably associated with a better understanding of these resources. To this end, texts from websites outside the government websites have been collected. These texts clarify the procedures published on the Spanish Government's websites, and they constitute the part of the corpus considered as the set of easy documents. The rest of the corpus has been completed with counterpart documents from government websites. The text of the documents has been processed, and the difficulty is evaluated through different classic readability metrics. Machine learning algorithms are then applied to predict the difficulty of the text. The results of the study show that government web pages score high for comprehension difficulty. This work proposes a new Spanish-language corpus of official e-government websites. In addition, a large number of combined linguistic attributes are applied, which improve the identification of the level of comprehensibility of a text with respect to classic metrics. Work supported by the Spanish Ministry of Economy, Industry and Competitiveness (CSO2017-86747-R).
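    The combined linguistic attributes mentioned here are typically surface features extracted per document and fed to a classifier alongside classic metric scores. A minimal sketch of such a feature extractor (the feature set is illustrative, not the paper's actual attribute inventory) could look like this:

```python
import re

def linguistic_features(text):
    # Toy extractor: a few surface attributes of the kind combined with
    # classic readability metrics when predicting text difficulty.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"\w+", text)
    return {
        "avg_sentence_len": len(words) / len(sentences),
        "avg_word_len": sum(len(w) for w in words) / len(words),
        "type_token_ratio": len({w.lower() for w in words}) / len(words),
        "long_word_ratio": sum(len(w) > 6 for w in words) / len(words),
    }

feats = linguistic_features(
    "El solicitante debe presentar el formulario. Recibe una copia sellada.")
print(feats["avg_sentence_len"])  # prints 5.0
```

    Each document in the corpus would yield one such feature vector, labelled easy or difficult according to whether it came from the clarifying websites or the official ones.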

    The Impact of Cross-References on the Readability of the U.S. Internal Revenue Code

    Scholars and practitioners have long argued that U.S. income tax law (“the Tax Code”) is excessively complex and difficult to understand, and hence imposes non-trivial adjudication, administration, planning, and compliance costs across the spectrum of income tax stakeholders: the courts, the Internal Revenue Service, tax practitioners, business managers, and individual taxpayers. There is therefore considerable interest in reducing the effort needed to accurately understand and apply the provisions of income tax law. Prior scholarly work has strongly argued that exceptions to Tax Code provisions, as expressed by cross-references embedded in the Tax Code text, constitute a major source of reading complexity. The goal of the study was to gain a first empirical understanding of the readability impacts on users who encounter cross-references while reading Tax Code provisions. The study included a human subjects task performance experiment with 75 undergraduate and graduate accounting student participants who were completing or had completed an introductory-level course in federal income taxation. Participants were presented with integrated tax scenarios and accompanying sets of scenario questions. Copies of several Tax Code sections were the only reference materials available to the study participants. The study was based on a within-subjects experimental design. To investigate the prior work's argument, cross-references embedded in the Tax Code reference materials provided to study participants that expressed exceptions were all assigned to one cross-reference category, and all other cross-references that served different purposes were assigned to a second category. As responses to scenario questions were binary (correct/incorrect), logistic regression was used to test study hypotheses.
The study’s major finding was that reading cross-references assigned to the exceptions category had a very strong negative effect on task performance, while reading cross-references assigned to the second category had a modest positive effect on task performance. The finding thus supports decades of analysis and argument that cross-references related to expressing exceptions are a major source of Tax Code reading complexity. This outcome warrants further research into statutory exception language, the subset of statutory language used to express exceptions. Such a subset will include cross-references as one of many language elements that are available for the purpose of expressing exceptions.
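    The analysis described here fits a logistic regression of a binary correct/incorrect response on category indicators. As a purely hypothetical sketch of that setup (the data, coefficient values, and feature names below are synthetic and assumed, not the study's), the model can be fit with plain stochastic gradient ascent on the Bernoulli log-likelihood:

```python
import math
import random

random.seed(0)

def simulate(n=2000):
    # Synthetic participants: x1 = read an exception cross-reference,
    # x2 = read an other-purpose cross-reference; the "true" coefficients
    # (0.8, -1.5, 0.4) are assumed for illustration only.
    data = []
    for _ in range(n):
        x1 = float(random.random() < 0.5)
        x2 = float(random.random() < 0.5)
        logit = 0.8 - 1.5 * x1 + 0.4 * x2
        p = 1.0 / (1.0 + math.exp(-logit))
        data.append(((1.0, x1, x2), int(random.random() < p)))
    return data

def fit_logistic(data, lr=0.05, epochs=100):
    # Stochastic gradient ascent on the log-likelihood of a logistic model.
    w = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        for x, y in data:
            p = 1.0 / (1.0 + math.exp(-sum(wi * xi for wi, xi in zip(w, x))))
            for i in range(len(w)):
                w[i] += lr * (y - p) * x[i]
    return w

w = fit_logistic(simulate())
print("intercept %.2f, exception %.2f, other %.2f" % tuple(w))
```

    On this synthetic data the fitted exception coefficient comes out clearly negative and the other-category coefficient positive, mirroring the direction of the study's reported finding.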