Readability assessment and automatic text simplification, the analysis of Basque complex structures
301 pp. (Basque); 217 pp. (English). In this thesis, we take the first steps toward automatically analysing the complexity and simplification of Basque texts. To analyse text complexity, we draw on work on automatic text simplification for other languages and on a linguistic analysis of Basque corpora. From these analyses we establish the linguistic foundations for simplifying texts automatically. To analyse complexity automatically, we designed and implemented the ErreXail system, which is based on linguistic features and machine learning techniques. In addition, we designed the architecture of the Euskarazko Testuen Sinplifikatzailea (EuTS) system, which will simplify texts automatically, defining the operations to be performed in each of its modules and, as a case study, implementing Biografix, a multilingual tool that simplifies parenthetical structures containing biographical information. Finally, we compiled the Euskarazko Testu Sinplifikatuen Corpusa (ETSC), a corpus of simplified Basque texts. We used this corpus to compare the approach derived from our simplification analyses with other approaches. To make these comparisons, we also defined an annotation scheme.
Proceedings of QG2010: The Third Workshop on Question Generation
These are the peer-reviewed proceedings of "QG2010, The Third Workshop on Question Generation". The workshop included a special track for "QGSTEC2010: The First Question Generation Shared Task and Evaluation Challenge".
QG2010 was held as part of The Tenth International Conference on Intelligent Tutoring Systems (ITS2010).
Design and Annotation of the First Italian Corpus for Text Simplification
In this paper, we present the design and construction of the first Italian corpus for automatic and semi-automatic text simplification. In line with current approaches, we propose a new annotation scheme specifically conceived to identify the types of change an original sentence undergoes when it is manually simplified. This scheme has been applied to two aligned Italian corpora containing original texts with corresponding simplified versions, selected as representative of two different manual simplification strategies and addressing different target reader populations. Each corpus was annotated with the operations foreseen in the annotation scheme, covering different levels of linguistic description. Annotation results were analysed with the final aim of capturing the peculiarities of, and differences between, the simplification strategies pursued in the two corpora.
Sentence Simplification for Text Processing
A thesis submitted in partial fulfilment of the requirements of the University of Wolverhampton for the degree of Doctor of Philosophy.
Propositional density and syntactic complexity are two features of sentences which
affect the ability of humans and machines to process them effectively. In this
thesis, I present a new approach to automatic sentence simplification which processes
sentences containing compound clauses and complex noun phrases (NPs)
and converts them into sequences of simple sentences which contain fewer of these
constituents and have reduced per sentence propositional density and syntactic
complexity.
My overall approach is iterative and relies on both machine learning and handcrafted
rules. It implements a small set of sentence transformation schemes, each
of which takes one sentence containing compound clauses or complex NPs and
converts it into one or two simplified sentences containing fewer of these constituents
(Chapter 5). The iterative algorithm applies the schemes repeatedly and is able
to simplify sentences which contain arbitrary numbers of compound clauses and
complex NPs. The transformation schemes rely on automatic detection of these
constituents, which may take a variety of forms in input sentences. In the thesis, I
present two new shallow syntactic analysis methods which facilitate the detection
process.
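The iterative procedure described above can be sketched roughly as follows. This is a minimal illustration, not the thesis's implementation: the `Scheme` type, the toy `split_on_and` scheme, and the `max_steps` guard are all hypothetical names introduced here, and a plain string match stands in for the automatic detection of compound clauses and complex NPs.

```python
from typing import Callable, List, Optional

# Hypothetical type: a scheme maps one sentence to replacement sentences,
# or returns None when it does not apply.
Scheme = Callable[[str], Optional[List[str]]]


def split_on_and(sentence: str) -> Optional[List[str]]:
    """Toy scheme (assumption): split at the first ', and ' into two sentences."""
    marker = ", and "
    idx = sentence.find(marker)
    if idx == -1:
        return None
    right = sentence[idx + len(marker):]
    if not right:
        return None
    left = sentence[:idx].rstrip(".") + "."
    return [left, right[0].upper() + right[1:]]


def simplify(sentence: str, schemes: List[Scheme], max_steps: int = 100) -> List[str]:
    """Apply schemes repeatedly until none matches any pending sentence."""
    pending = [sentence]
    done: List[str] = []
    steps = 0
    while pending:
        current = pending.pop(0)
        result = None
        if steps < max_steps:  # guard against schemes that never terminate
            for scheme in schemes:
                result = scheme(current)
                if result is not None:
                    break
        if result is None:
            done.append(current)  # nothing left to simplify in this sentence
        else:
            steps += 1
            pending = result + pending  # re-queue outputs, preserving order
    return done
```

Because each scheme's output is fed back into the queue, a sentence with several coordinated clauses is reduced one split at a time, which is how repeated application can handle an arbitrary number of such constituents.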
The first of these identifies various explicit signs of syntactic complexity in
input sentences and classifies them according to their specific syntactic linking and bounding functions. I present the annotated resources used to train and
evaluate this sign tagger (Chapter 2) and the machine learning method used to
implement it (Chapter 3). The second syntactic analysis method exploits the sign
tagger and identifies the spans of compound clauses and complex NPs in input
sentences. In Chapter 4 of the thesis, I describe the development and evaluation
of a machine learning approach performing this task. This chapter also presents
a new annotated dataset supporting this activity.
In the thesis, I present two implementations of my approach to sentence simplification.
One of these exploits handcrafted rule activation patterns to detect
different parts of input sentences which are relevant to the simplification process.
The other implementation uses my machine learning method to identify
compound clauses and complex NPs for this purpose.
Intrinsic evaluation of the two implementations is presented in Chapter 6 together
with a comparison of their performance with several baseline systems. The
evaluation includes comparisons of system output with human-produced simplifications,
automated estimations of the readability of system output, and surveys
of human opinions on the grammaticality, accessibility, and meaning of automatically
produced simplifications.
Chapter 7 presents extrinsic evaluation of the sentence simplification method
exploiting handcrafted rule activation patterns. The extrinsic evaluation involves
three NLP tasks: multidocument summarisation, semantic role labelling, and information
extraction. Finally, in Chapter 8, conclusions are drawn and directions
for future research considered.
New Data-Driven Approaches to Text Simplification
A thesis submitted in partial fulfilment of the requirements of the University of
Wolverhampton for the degree of Doctor of Philosophy.
Many texts we encounter in our everyday lives are lexically and syntactically very complex. This makes them difficult to understand for people with intellectual or reading impairments, and difficult for various natural language processing systems to process. This motivates text simplification (TS), which transforms texts into simpler variants. Given that this is still a relatively new research area, many challenges remain. The focus of this thesis is on better understanding the current problems in automatic text simplification (ATS) and proposing new data-driven approaches to solving them. We propose methods for learning sentence splitting and deletion decisions, built upon parallel corpora of original and manually simplified Spanish texts, which outperform existing similar systems. Our experiments in adapting those methods to different text genres and target populations report promising results, thus offering one possible solution to the scarcity of parallel corpora for text simplification aimed at specific target populations, which is currently one of the main issues in ATS. The results of our extensive analysis of the phrase-based statistical machine translation (PB-SMT) approach to ATS reject the widespread assumption that the success of that approach largely depends on the size of the training and development datasets. They indicate more influential factors for the success of the PB-SMT approach to ATS, and reveal some important differences between cross-lingual MT and the monolingual MT used in ATS. Our event-based system for simplifying news stories in English (EventSimplify) overcomes some of the main problems in ATS. It requires neither a large number of handcrafted simplification rules nor parallel data, and it performs significant content reduction.
The automatic and human evaluations conducted show that it produces grammatical text and increases readability, preserving and simplifying relevant content and reducing irrelevant content. Finally, this thesis addresses another important issue in TS: how to automatically evaluate the performance of TS systems, given that access to the target users might be difficult. Our experiments indicate that existing readability metrics can successfully be used for this task when enriched with human evaluation of grammaticality and preservation of meaning.
Natural Language Processing and Language Technologies for the Basque Language
The presence of a language in the digital domain is crucial for its survival, as online communication and digital language resources have become the standard in recent decades and will gain more importance in the coming years. In order to develop the advanced systems that are considered basic to efficient digital communication (e.g. machine translation systems, text-to-speech and speech-to-text converters, and digital assistants), it is necessary to digitalise linguistic resources and create tools. In the case of Basque, scholars have spent the last forty years studying the creation of digital linguistic resources and the tools that allow those systems to be developed. In this paper, we present an overview of the natural language processing and language technology resources developed for Basque, their impact on the process of making Basque a “digital language”, and the applications and challenges in multilingual communication. More precisely, we present the well-known products for Basque, the basic tools, and the resources that are behind the products we use every day. We also hope this survey can serve as a guide for other minority languages that are making their way to digitalisation.
Received: 05 April 2022
Accepted: 20 May 202
Natural language generation as neural sequence learning and beyond
Natural Language Generation (NLG) is the task of generating natural language (e.g.,
English sentences) from machine readable input. In the past few years, deep neural networks
have received great attention from the natural language processing community
due to impressive performance across different tasks. This thesis addresses NLG problems
with deep neural networks from two different modeling views. Under the first
view, natural language sentences are modelled as sequences of words, which greatly
simplifies their representation and allows us to apply classic sequence modelling neural
networks (i.e., recurrent neural networks) to various NLG tasks. Under the second
view, natural language sentences are modelled as dependency trees, which are more expressive
and allow us to capture linguistic generalisations, leading to neural models which
operate on tree structures.
Specifically, this thesis develops several novel neural models for natural language
generation. Contrary to many existing models which aim to generate a single sentence,
we propose a novel hierarchical recurrent neural network architecture to represent and
generate multiple sentences. Beyond the hierarchical recurrent structure, we also propose
a means to model context dynamically during generation. We apply this model to
the task of Chinese poetry generation and show that it outperforms competitive poetry
generation systems.
Neural based natural language generation models usually work well when there is
a lot of training data. When the training data is not sufficient, prior knowledge for the
task at hand becomes very important. To this end, we propose a deep reinforcement
learning framework to inject prior knowledge into neural based NLG models and apply
it to sentence simplification. Experimental results show promising performance using
our reinforcement learning framework.
Both poetry generation and sentence simplification are tackled with models following
the sequence learning view, where sentences are treated as word sequences. In this
thesis, we also explore how to generate natural language sentences as tree structures.
We propose a neural model, which combines the advantages of syntactic structure and
recurrent neural networks. More concretely, our model defines the probability of a
sentence by estimating the generation probability of its dependency tree. At each time
step, a node is generated based on the representation of the generated subtree. We
show experimentally that this model achieves good performance in language modeling
and can also generate dependency trees.
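The tree-based view in the preceding paragraph can be written as a factorisation. This is only a sketch under assumed notation, not the thesis's own formulation: $S$ is the sentence, $T(S)$ its dependency tree, $n_t$ the node generated at time step $t$, and $T_{<t}$ the subtree generated before step $t$.

```latex
P(S) \;=\; P\bigl(T(S)\bigr) \;=\; \prod_{t=1}^{|T(S)|} P\bigl(n_t \mid T_{<t}\bigr)
```

Each factor conditions on a representation of the partial tree rather than on a flat word history, which is what distinguishes this view from the sequence view used for poetry generation and sentence simplification.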
Analyzing Text Complexity and Text Simplification: Connecting Linguistics, Processing and Educational Applications
Reading plays an important role in the process of learning and knowledge acquisition
for both children and adults. However, not all texts are accessible to every
prospective reader. Reading difficulties can arise when there is a mismatch between
a reader’s language proficiency and the linguistic complexity of the text
they read. In such cases, simplifying the text in its linguistic form while retaining
all the content could aid reader comprehension. In this thesis, we study text
complexity and simplification from a computational linguistic perspective.
We propose a new approach to automatically predict the text complexity using
a wide range of word level and syntactic features of the text. We show that this
approach results in accurate, generalizable models of text readability that work
across multiple corpora, genres and reading scales. Moving from documents to
sentences, we show that our text complexity features also accurately distinguish
different versions of the same sentence in terms of the degree of simplification
performed. This is useful in evaluating the quality of simplification performed by
a human expert or a machine-generated output and for choosing targets to simplify
in a difficult text. We also experimentally show the effect of text complexity on
readers’ performance outcomes and cognitive processing through an eye-tracking
experiment.
Turning from analyzing text complexity and identifying sentential simplifications
to generating simplified text, one can view automatic text simplification as a
process of translation from English to simple English. In this thesis, we propose
a statistical machine translation based approach for text simplification, exploring
the role of focused training data and language models in the process.
Exploring the linguistic complexity analysis further, we show that our text
complexity features can be useful in assessing the language proficiency of English
learners. Finally, we analyze German school textbooks in terms of their
linguistic complexity, across various grade levels, school types and among different
publishers by applying a pre-existing set of text complexity features developed
for German.
Analyzing, enhancing, optimizing and applying dependency analysis
Unpublished thesis of the Universidad Complutense de Madrid, Facultad de Informática, Departamento de Ingeniería del Software e Inteligencia Artificial, defended on 19/12/2012.
Statistical dependency parsers have improved greatly in recent years. This has been possible thanks to machine-learning-based systems that achieve high accuracy. These systems make it possible to generate parsers for any language with a suitable corpus without demanding great effort from the end user. MaltParser is one of these systems. In this thesis we have used state-of-the-art systems to present a series of contributions closely related to natural language processing (NLP) and dependency parsing: (i) A study of the dependency parsing problem, demonstrating homogeneity in accuracy and presenting interesting contributions on sentence length, training corpus size and how we evaluate parsers. (ii) We have also studied some ways of improving accuracy by modifying the parsing flow in two different ways: parsing some segments of the sentences separately, and modifying the internal behaviour of the parsing algorithms. (iii) We have investigated automatic feature selection for machine learning in transition-based dependency parsers, which we consider an important problem that really needs to be solved given the state of the art, since it can also help to solve tasks related to dependency parsing more effectively. (iv) Finally, we have applied dependency parsing to some currently important NLP problems, such as text simplification and inferring the scope of negation cues.
Lastly, the knowledge acquired in carrying out this thesis can be used to implement more robust dependency-parsing-based applications in NLP or other related areas, as demonstrated throughout the thesis.
[ABSTRACT] Statistical dependency parsing accuracy has improved substantially in recent years. One of the main reasons is the inclusion of data-driven (or machine learning) based methods. Machine learning allows the development of parsers for every language that has an adequate training corpus without requiring a great effort. MaltParser is one such system. In the present thesis we have used state-of-the-art systems (mainly MaltParser) to show some contributions in four different areas inherently related to natural language processing (NLP) and dependency parsing: (i) We studied the parsing problem, demonstrating the homogeneity of the performance and showing interesting contributions about sentence length, corpora size and how we normally evaluate the parsers. (ii) We have also tried some ways of improving the parsing accuracy by modifying the flow of analysis, parsing some segments of the sentences separately and finally constructing a parsing combination problem. We also studied the modification of the internal behavior of the parsers, focusing on the root of dependency structures, which is an important part of what a dependency parser parses and worth studying. (iii) We have researched automatic feature selection and parsing optimization for transition-based parsers, which we consider an important problem and something that definitely needs to be done in dependency parsing in order to solve parsing problems in a more successful way. And (iv) we have applied syntactic dependency structures and dependency parsing to solve some NLP problems such as text simplification and inferring the scope of negation cues. Furthermore, the knowledge acquired when developing this thesis could be used to implement more robust dependency parsing-based applications in different NLP (or related) areas, as we demonstrate in the present thesis.
Depto. de Ingeniería de Software e Inteligencia Artificial (ISIA), Fac. de Informática