Visualising a Visual Movement – Reflections on a Growing Body of Research
In the two editions of the Journal of Open Access to Law dedicated to Visual Law we traverse a delightful panorama. We observe a diverse and maturing body, not only of scholarship, but also of practical application of “Visual Law”. Closely allied to it are the themes and disciplines of Legal Design, which are woven through much of its unfoldment.
Corpus based classification of text in Australian contracts
Written contracts are a fundamental framework for commercial and cooperative transactions and relationships. Limited research has been published on the application of machine learning and natural language processing (NLP) to contracts. In this paper we report the classification of components of contract texts using machine learning and hand-coded methods. Authors studying a range of domains have found that combining machine learning and rule-based approaches increases the accuracy of machine learning. We find similar results, which suggest the utility of leveraging hand-coded classification rules for machine learning. We attained an average accuracy of 83.48% on a multiclass labelling task on 20 contracts by combining machine learning and rule-based approaches, increasing performance over machine learning alone.
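As an illustration of what combining hand-coded rules with machine learning can look like, the following is a minimal sketch only. The abstract does not say how the two were combined; here it is assumed that regular-expression rules take precedence and a statistical model handles everything else. The rules, the label names and the tiny naive Bayes classifier are all hypothetical and are not taken from the paper.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical hand-coded rules as (pattern, label) pairs; not from the paper.
RULES = [
    (re.compile(r'^\s*".+?"\s+means\b', re.IGNORECASE), "definition"),
    (re.compile(r"^\s*\d+(\.\d+)*\s+[A-Z][A-Za-z ]*$"), "heading"),
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Minimal multinomial naive Bayes with add-one smoothing."""

    def fit(self, texts, labels):
        self.label_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.label_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus smoothed log likelihood of each token.
            score = math.log(n / total)
            for word in tokenize(text):
                score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def classify(text, model):
    """Assumed combination scheme: rules first, learned model as fallback."""
    for pattern, label in RULES:
        if pattern.search(text):
            return label
    return model.predict(text)
```

Other combination schemes are equally plausible, for example feeding rule matches to the learner as input features.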
A Right to Access Implies A Right to Know: An Open Online Platform for Research on the Readability of Law
The widespread availability of legal materials online has opened the law to a new and greatly expanded readership. These new readers need the law to be readable by them when they encounter it. However, the available empirical research supports a conclusion that legislation is difficult to read if not incomprehensible to most citizens. We review approaches that have been used to measure the readability of text including readability metrics, cloze testing and application of machine learning. We report the creation and testing of an open online platform for readability research. This platform is made available to researchers interested in undertaking research on the readability of legal materials. To demonstrate the capabilities of the platform, we report its initial application to a corpus of legislation. Linguistic characteristics are extracted using the platform and then used as input features for machine learning using the Weka package. Wide divergences are found between sentences in a corpus of legislation and those in a corpus of graded reading material or in the Brown corpus (a balanced corpus of English written genres). Readability metrics are found to be of little value in classifying sentences by grade reading level (noting that such metrics were not designed to be used with isolated sentences).
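For readers unfamiliar with readability metrics, the sketch below computes one widely used example, the Flesch-Kincaid grade level. The abstract does not say which metrics were evaluated, so the choice of metric is an assumption, and the syllable counter is a crude vowel-group heuristic rather than anything from the platform described.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption): one syllable per run of vowels."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid grade level:
    0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word")
    syllables = sum(count_syllables(w) for w in words)
    return (
        0.39 * len(words) / len(sentences)
        + 11.8 * syllables / len(words)
        - 15.59
    )
```

Because the formula depends on only two averages, it can be unstable on a single sentence, which fits the abstract's caveat that such metrics were not designed for isolated sentences.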
Enhancing the Communication of Law: a cross-disciplinary investigation applying information technology
Law is pervasive in culture. It is a form of communication between government and citizens. When effective, it is a tool of government policy. If poorly designed, law results in unnecessary costs to society. Impediments to understanding the law limit and distort democratic participation. Yet, historically, the law has been inaccessible to most. Thus enhancing the communication of law is an important and standing problem. Much work has been done (for example through the plain language movement) to improve the communication of law. Nonetheless, the law remains largely unreadable to non-legal users. This thesis applies information technology to investigate and enhance the communication of law. To this end, this thesis focusses on four main areas. To improve the readability of law, it must be better described as a form of language. Corpus linguistics is applied for this purpose. A linguistic description of contract language arose from this work, which, along with the corpus itself, has been made available to the research community. The thesis also describes work for the automatic classification of text in legal contracts by legal function. Reliable measures for the readability of law are needed, but they do not exist. To develop such measures, gold standard data is needed to evaluate possible measures. To create this gold standard data, the research engaged citizen scientists, in the form of the online “crowd”. However, methods for creating and using such user assessments for readability are rudimentary. The research therefore investigated, developed and applied a number of methods for collecting user ratings of readability in an online environment. Also, the research applied machine learning to investigate and identify linguistic factors that are specifically associated with language difficulty of legislative sentences. This resulted in recommendations for improving legislative readability. A parallel line of investigation concerned the application of visualization to enhance the communication of law. Visualization engages human visual perception and its parallel processing capacities for the communication of law. The research applied computational tools: natural language processing, graph characteristics and data-driven algorithms. It resulted in prototype tools for automatically visualizing definition networks and automating the visualization of selected contract clauses. Also, the work has fostered an investigation of the nature of law itself. A “law as” framework is used to query the nature of law and illuminate law in new ways. The framework is re-assessed as a tool for the experimental investigation of law. This results in an enhanced description of law, applying a number of investigatory frames: law; communication; document; information; computation; design and complex systems theory. It also provides a contrastive study with traditional theories of law, demonstrating how traditional theories can be extended in the light of these multidisciplinary results. In sum, this thesis reports a body of work advancing the existing knowledge base and state of the art in respect of the application of computational techniques to enhancing the communication of law.
Citizen Science for Citizen Access to Law
This paper sits at the intersection of citizen access to law, legal informatics and plain language. The paper reports the results of a joint project of the Cornell University Legal Information Institute and the Australian National University which collected thousands of crowdsourced assessments of the readability of law through the Cornell LII site. The aim of the project is to enhance accuracy in the prediction of the readability of legal sentences. The study asked readers on legislative pages of the LII site to rate passages from the United States Code, the Code of Federal Regulations and other texts for readability and other characteristics. The research provides insight into who uses legal rules and how they do so. The study enables conclusions to be drawn as to the current readability of law and the spread of readability among legal rules. The research is intended to enable the creation of a dataset of legal rules labelled by human judges as to readability. Such a dataset, in combination with machine learning, will assist in identifying factors in legal language which impede readability and access for citizens. As far as we are aware, this research is the largest ever study of readability and usability of legal language and the first research which has applied crowdsourcing to such an investigation. The research is an example of the possibilities open for enhancing access to law, both through engagement of end users in the online legal publishing environment and through collaboration between legal publishers and researchers.
A Corpus of Australian Contract Language
Written contracts are a fundamental framework for economic and cooperative transactions in society. Little work has been reported on the application of natural language processing or corpus linguistics to contracts. In this paper we report the design, profiling and initial analysis of a corpus of Australian contract language. This corpus enables a quantitative and qualitative characterisation of Australian contract language as an input to the development of contract drafting tools. Profiling of the corpus is consistent with its suitability for use in language engineering applications. We provide descriptive statistics for the corpus and show that document length and document vocabulary size approximate log-normal distributions. The corpus conforms to Zipf's law, and comparative type-to-token ratios are consistent with lower term sparsity (an expectation for legal language). We highlight distinctive term usage in Australian contract language. Results derived from the corpus indicate a longer prepositional phrase depth in sentences in contract rules extracted from the corpus, as compared to other corpora.
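The two profiling measures named in this abstract can be sketched briefly. This is an illustrative outline under stated assumptions, not the authors' procedure: the tokenizer is a simple alphabetic split, and the Zipf check is an ordinary least-squares slope on the log-log rank-frequency curve, where a slope near -1 is the classic Zipfian pattern. All function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase alphabetic tokens (a simplifying assumption)."""
    return re.findall(r"[a-z]+", text.lower())


def type_token_ratio(tokens):
    """Distinct word types divided by total tokens."""
    if not tokens:
        raise ValueError("no tokens")
    return len(set(tokens)) / len(tokens)


def rank_frequencies(tokens):
    """Token frequencies ordered from most to least frequent."""
    return [count for _, count in Counter(tokens).most_common()]


def zipf_slope(frequencies):
    """Least-squares slope of log(frequency) against log(rank)."""
    if len(frequencies) < 2:
        raise ValueError("need at least two ranks")
    xs = [math.log(rank) for rank in range(1, len(frequencies) + 1)]
    ys = [math.log(freq) for freq in frequencies]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den
```

A lower type-to-token ratio on samples of equal size means the same terms recur more often, which is the sense in which the abstract speaks of lower term sparsity. The ratio falls as texts get longer, so comparisons across corpora are only meaningful at matched sample sizes.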
Software Tools for the Visualization of Definition Networks in Legal Contracts
This paper describes the development of prototype software-based tools for visualizing definitions within legal contracts. The tools demonstrate visualization techniques for enhancing the readability and comprehension of definitions and their associated