Readability in the Canadian Tax System
This paper reports the results of a readability analysis of various parts of the Canadian tax system, with a particular focus on Canada’s income tax. The results indicate that Canada’s Income Tax Act is significantly more difficult to read than the taxation statutes of several comparable jurisdictions and more difficult to read than other Canadian legislation governing economic relationships. The guidance published by the Canada Revenue Agency for the use of tax professionals and the public appears more accessible. While it may be hoped that the statutory provisions that apply to low- and middle-income individuals would be more readable than the Income Tax Act as a whole, the study found evidence to the contrary. Although the readability of the statute can only be one part of a program toward making the tax system more accessible, this paper argues that it is a project worth pursuing.
Exploring Linguistic Features for Turkish Text Readability
This paper presents the first comprehensive study on automatic readability assessment of Turkish texts. We combine state-of-the-art neural network models with linguistic features at the lexical, morphosyntactic, syntactic, and discourse levels to develop an advanced readability tool. We evaluate the effectiveness of traditional readability formulas compared to modern automated methods and identify key linguistic features that determine the readability of Turkish texts.
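Traditional readability formulas of the kind benchmarked here combine surface statistics such as sentence length and syllable counts. As an illustration only (not the paper's Turkish-specific features), a minimal Flesch Reading Ease calculator with a crude vowel-group syllable heuristic might look like:

```python
import re

def count_syllables(word):
    # Crude heuristic: count runs of vowels; real analyzers use
    # pronunciation dictionaries or language-specific rules.
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def flesch_reading_ease(text):
    # Flesch Reading Ease: higher scores mean easier text.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))
```

Such formulas ignore morphology and discourse structure entirely, which is precisely the gap the paper's richer linguistic features aim to close.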
A Web-Based Medical Text Simplification Tool
With the increasing demand for improved health literacy, better tools are needed to efficiently produce personalized health information that is readable and understandable by the patient. In this paper, we introduce a web-based text simplification tool that helps content producers simplify existing text materials to make them more broadly accessible. The tool uses features that provide concrete suggestions, each of which has been individually shown in previous research to improve the understandability of text. We provide an overview of the tool along with a quantitative analysis of its impact on medical texts. On a medical corpus, the tool provides good coverage, offering suggestions on over a third of the words and over a third of the sentences. These suggestions are over 40% accurate, though the accuracy varies by text source.
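The coverage figure reported above can be made concrete with a toy calculation. Assuming a hypothetical lexicon mapping complex words to simpler substitutes (the tool's actual feature set is richer than word substitution alone), word-level suggestion coverage could be estimated as:

```python
def suggestion_coverage(text, lexicon):
    # Fraction of words for which the (hypothetical) simplification
    # lexicon offers a simpler substitute.
    words = [w.strip(".,;:!?").lower() for w in text.split()]
    covered = sum(1 for w in words if w in lexicon)
    return covered / len(words) if words else 0.0
```

A coverage of roughly one third, as reported for the medical corpus, would mean about one word in three receives a suggestion under such a measure.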
Into the Vortex of Legal Precision: Access to Justice, Complexity, and the Canadian Tax System
This thesis is an exploration of access to justice issues in the Canadian tax system. Drawing on the work of Roderick Macdonald, it argues for a broad conception of access to justice based on the empowerment of individuals in all of the sites, processes, and institutions where law is made, administered, and applied. It argues that tax law shows the usefulness of this comprehensive approach to access to justice. Using this comprehensive approach, the thesis goes on to argue that legal complexity should be seen as an important access to justice issue in tax law. It lays out a pragmatic, access to justice-oriented framework for talking about complexity in tax law. Applying this framework, it examines the sources of complexity in Canadian tax law and suggests a path toward simplification. The final chapters of the thesis contain examples of the type of access to justice-oriented research and advocacy that these frameworks facilitate.
DEPLAIN: A German Parallel Corpus with Intralingual Translations into Plain Language for Sentence and Document Simplification
Text simplification is an intralingual translation task in which documents or sentences of a complex source text are simplified for a target audience. The success of automatic text simplification systems depends heavily on the quality of the parallel data used for training and evaluation. To advance sentence and document simplification in German, this paper presents DEplain, a new dataset of parallel, professionally written, and manually aligned simplifications in plain German ("plain DE"; in German: "Einfache Sprache"). DEplain consists of a news-domain corpus (approx. 500 document pairs, approx. 13k sentence pairs) and a web-domain corpus (approx. 150 aligned documents, approx. 2k aligned sentence pairs). In addition, we are building a web harvester and experimenting with automatic alignment methods to facilitate the integration of non-aligned and yet-to-be-published parallel documents. Using this approach, we are dynamically growing the web-domain corpus, which has currently been extended to approx. 750 document pairs and approx. 3.5k aligned sentence pairs. We show that using DEplain to train a transformer-based seq2seq text simplification model can achieve promising results. We make the corpus, the adapted alignment methods for German, the web harvester, and the trained models available at https://github.com/rstodden/DEPlain. Comment: Accepted to ACL 202
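Automatic sentence alignment of the kind mentioned above pairs each simplified sentence with its most similar source sentence. A greedy lexical-overlap aligner is a minimal sketch of the underlying idea (the adapted alignment methods released with DEplain are considerably more sophisticated):

```python
def jaccard(a, b):
    # Lexical overlap between two sentences as Jaccard similarity
    # over their lowercased token sets.
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def align_sentences(complex_sents, simple_sents, threshold=0.3):
    # Greedy alignment: pair each simple sentence with the most
    # lexically similar complex sentence, if similarity clears
    # the threshold; otherwise leave it unaligned.
    pairs = []
    for s in simple_sents:
        best = max(complex_sents, key=lambda c: jaccard(c, s))
        if jaccard(best, s) >= threshold:
            pairs.append((best, s))
    return pairs
```

Plain-language rewrites often share little vocabulary with their sources, which is why purely lexical aligners like this sketch are a weak baseline compared to embedding-based methods.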
Sentence Simplification for Text Processing
A thesis submitted in partial fulfilment of the requirement of the University of Wolverhampton for the degree of Doctor of Philosophy.
Propositional density and syntactic complexity are two features of sentences which affect the ability of humans and machines to process them effectively. In this thesis, I present a new approach to automatic sentence simplification which processes sentences containing compound clauses and complex noun phrases (NPs) and converts them into sequences of simple sentences which contain fewer of these constituents and have reduced per-sentence propositional density and syntactic complexity.
My overall approach is iterative and relies on both machine learning and handcrafted rules. It implements a small set of sentence transformation schemes, each of which takes one sentence containing compound clauses or complex NPs and converts it into one or two simplified sentences containing fewer of these constituents (Chapter 5). The iterative algorithm applies the schemes repeatedly and is able to simplify sentences which contain arbitrary numbers of compound clauses and complex NPs. The transformation schemes rely on automatic detection of these constituents, which may take a variety of forms in input sentences. In the thesis, I present two new shallow syntactic analysis methods which facilitate the detection process.
The first of these identifies various explicit signs of syntactic complexity in input sentences and classifies them according to their specific syntactic linking and bounding functions. I present the annotated resources used to train and evaluate this sign tagger (Chapter 2) and the machine learning method used to implement it (Chapter 3). The second syntactic analysis method exploits the sign tagger and identifies the spans of compound clauses and complex NPs in input sentences. In Chapter 4 of the thesis, I describe the development and evaluation of a machine learning approach performing this task. This chapter also presents a new annotated dataset supporting this activity.
In the thesis, I present two implementations of my approach to sentence simplification. One of these exploits handcrafted rule activation patterns to detect different parts of input sentences which are relevant to the simplification process. The other implementation uses my machine learning method to identify compound clauses and complex NPs for this purpose.
Intrinsic evaluation of the two implementations is presented in Chapter 6 together with a comparison of their performance with several baseline systems. The evaluation includes comparisons of system output with human-produced simplifications, automated estimations of the readability of system output, and surveys of human opinions on the grammaticality, accessibility, and meaning of automatically produced simplifications.
Chapter 7 presents extrinsic evaluation of the sentence simplification method exploiting handcrafted rule activation patterns. The extrinsic evaluation involves three NLP tasks: multidocument summarisation, semantic role labelling, and information extraction. Finally, in Chapter 8, conclusions are drawn and directions for future research considered.
Measuring and improving the readability of network visualizations
Network data structures have been used extensively for modeling entities and their ties across such diverse disciplines as Computer Science, Sociology, Bioinformatics, Urban Planning, and Archeology. Analyzing networks involves understanding the complex relationships between entities as well as any attributes, statistics, or groupings associated with them. The widely used node-link visualization excels at showing the topology, attributes, and groupings simultaneously. However, many existing node-link visualizations are difficult to extract meaning from because of (1) the inherent complexity of the relationships, (2) the number of items designers try to render in limited screen space, and (3) the fact that every network admits many potential visualizations that are unintelligible or even misleading. Automated layout algorithms have helped, but frequently generate ineffective visualizations even when used by expert analysts. Past work, including my own described herein, has shown there can be vast improvements in network visualizations, but no one can yet produce readable and meaningful visualizations for all networks.
Since there is no single way to visualize all networks effectively, in this dissertation I investigate three complementary strategies. First, I introduce a technique called motif simplification that leverages the repeating patterns or motifs in a network to reduce visual complexity. I replace common, high-payoff motifs with easily understandable glyphs that require less screen space, can reveal otherwise hidden relationships, and improve user performance on many network analysis tasks. Next, I present new Group-in-a-Box layouts that subdivide large, dense networks using attribute- or topology-based groupings. These layouts take group membership into account to more clearly show the ties within groups as well as the aggregate relationships between groups. Finally, I develop a set of readability metrics to measure visualization effectiveness and localize areas needing improvement. I detail optimization recommendations for specific user tasks, in addition to leveraging the readability metrics in a user-assisted layout optimization technique.
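Readability metrics for node-link diagrams commonly include counts of edge crossings, since fewer crossings generally make a drawing easier to read. A minimal sketch of such a metric, assuming a straight-line layout given as node positions (an illustration, not the dissertation's actual metric suite):

```python
def ccw(a, b, c):
    # Signed area test: positive if a->b->c turns counter-clockwise.
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])

def segments_cross(p1, p2, p3, p4):
    # Proper intersection of segments p1p2 and p3p4
    # (ignores shared endpoints and collinear touching).
    d1, d2 = ccw(p3, p4, p1), ccw(p3, p4, p2)
    d3, d4 = ccw(p1, p2, p3), ccw(p1, p2, p4)
    return d1 * d2 < 0 and d3 * d4 < 0

def crossing_count(pos, edges):
    # Count pairwise edge crossings in a straight-line layout;
    # pos maps node -> (x, y), edges is a list of (u, v) pairs.
    count = 0
    for i in range(len(edges)):
        for j in range(i + 1, len(edges)):
            (u1, v1), (u2, v2) = edges[i], edges[j]
            if len({u1, v1, u2, v2}) < 4:
                continue  # edges sharing a node cannot properly cross
            if segments_cross(pos[u1], pos[v1], pos[u2], pos[v2]):
                count += 1
    return count
```

A layout optimizer of the user-assisted kind described above could use a metric like this as one term in its objective, alongside measures such as node occlusion and edge length variance.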
This dissertation contributes an understanding of why some node-link visualizations are difficult to read, what measures of readability could help guide designers and users, and several promising strategies for improving readability which demonstrate that progress is possible. This work also opens several avenues of research, both technical and in user education.
Natural Language Processing for Technology Foresight Summarization and Simplification: the case of patents
Technology foresight aims to anticipate possible developments, understand trends, and identify technologies of high impact. To this end, monitoring emerging technologies is crucial. Patents -- the legal documents that protect novel inventions -- can be a valuable source for technology monitoring.
Millions of patent applications are filed yearly, with 3.4 million applications filed in 2021 alone. Patent documents are primarily textual documents and disclose innovative and potentially valuable inventions. However, their processing is currently under-researched. This is due to several reasons, including the high document complexity: patents are very lengthy and are written in extremely hard-to-read language, a mix of technical and legal jargon.
This thesis explores how Natural Language Processing -- the discipline that enables machines to process human language automatically -- can aid patent processing. Specifically, we focus on two tasks: patent summarization (i.e., we try to reduce the document length while preserving its core content) and patent simplification (i.e., we try to reduce the document's linguistic complexity while preserving its original core meaning).
We found that older patent summarization approaches were not compared on shared benchmarks (thus making it hard to draw conclusions), and even the most recent abstractive dataset presents important issues that might make comparisons meaningless.
We try to fill both gaps: we first document the issues related to the BigPatent dataset and then benchmark extractive, abstractive, and hybrid approaches in the patent domain.
We also explore transferring summarization methods from the scientific paper domain with limited success.
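Of the approaches benchmarked, extractive summarization is the simplest to sketch: select the most salient sentences verbatim rather than generating new text. A frequency-based extractive summarizer (illustrative only; the benchmarked systems are substantially stronger) might look like:

```python
import re
from collections import Counter

def extractive_summary(text, n=2):
    # Frequency-based extraction: score each sentence by the total
    # corpus frequency of its words, then keep the top-n sentences
    # in their original document order.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    words = re.findall(r"[a-z']+", text.lower())
    freq = Counter(words)
    scored = sorted(
        sentences,
        key=lambda s: -sum(freq[w] for w in re.findall(r"[a-z']+", s.lower())),
    )
    keep = set(scored[:n])
    return [s for s in sentences if s in keep]
```

For patents, where claims and descriptions run to dozens of pages of legal boilerplate, such purely frequency-driven scoring is a weak baseline, which is part of why shared benchmarks matter for comparing systems.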
For the automatic text simplification task, we noticed a lack of simplified text and parallel corpora. We fill this gap by defining a method to generate a silver standard for patent simplification automatically. Lay human judges evaluated the simplified sentences in the corpus as grammatical, adequate, and simpler, and we show that it can be used to train a state-of-the-art simplification model.
This thesis describes the first steps toward Natural Language Processing-aided patent summarization and simplification. We hope it will encourage more research on the topic, opening doors for a productive dialog between NLP researchers and domain experts.