395 research outputs found

Structural Features for Predicting the Linguistic Quality of Text: Applications to Machine Translation, Automatic Summarization and Human-Authored Text

Author: Chae Jieun
Louis Annie
Nenkova Ani
Pitler Emily
Publication venue: ScholarlyCommons
Publication date: 01/01/2010

Sentence structure is considered to be an important component of the overall linguistic quality of text. Yet few empirical studies have sought to characterize how and to what extent structural features determine fluency and linguistic quality. We report the results of experiments on the predictive power of syntactic phrasing statistics and other structural features for these aspects of text. Manual assessments of sentence fluency for machine translation evaluation and text quality for summarization evaluation are used as the gold standard. We find that many structural features related to phrase length are weakly but significantly correlated with fluency, and classifiers based on the entire suite of structural features can achieve high accuracy in pairwise comparison of sentence fluency and in distinguishing machine translations from human translations. We also test the hypothesis that the learned models capture general fluency properties applicable to human-authored text. The results from our experiments do not support the hypothesis. At the same time, structural features and models based on them prove to be robust for automatic evaluation of the linguistic quality of multi-document summaries.

ScholarlyCommons@Penn

Mix Multiple Features to Evaluate the Content and the Linguistic Quality of Text Summaries

Author: Lamia Hadrich Belguith
Maher Jaoua
Samira Ellouze
Publication venue: Faculty of Electrical Engineering and Computing, Univ. of Zagreb
Publication date: 01/01/2017

In this article, we propose a machine learning method for evaluating the content and linguistic quality of text summaries. The method combines multiple features to build predictive models that evaluate the content and the linguistic quality of new (unseen) summaries constructed from the same source documents as the summaries used to train and validate the models. To obtain the best model, many single and ensemble learning classifiers are tested. Using the constructed models, we achieve good performance in predicting the content and linguistic quality scores. To evaluate the summarization systems, we calculate the system score as the average of the scores of the summaries built by the same system. We then evaluate the correlation of the system score with the manual system score. The obtained correlation indicates that the system score outperforms the baseline scores.
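The system-level evaluation described in this abstract (average the per-summary scores for each system, then correlate with manual system scores) can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say which correlation coefficient was used, so Pearson is assumed here, and the function names, data layout, and all scores are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def system_scores(summary_scores):
    """Average the per-summary scores of each system.

    summary_scores: iterable of (system_id, score) pairs (hypothetical layout).
    """
    grouped = defaultdict(list)
    for system_id, score in summary_scores:
        grouped[system_id].append(score)
    return {sid: sum(vals) / len(vals) for sid, vals in grouped.items()}


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (assumed metric)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Made-up predicted scores per summary and made-up manual system scores.
predicted = [
    ("sysA", 3.0), ("sysA", 4.0),
    ("sysB", 2.0), ("sysB", 3.0),
    ("sysC", 4.5), ("sysC", 5.0),
]
manual = {"sysA": 3.4, "sysB": 2.6, "sysC": 4.9}

auto = system_scores(predicted)
systems = sorted(auto)  # fixed order so both score lists line up
r = pearson([auto[s] for s in systems], [manual[s] for s in systems])
print(auto, round(r, 3))
```

A rank-based coefficient such as Spearman or Kendall could be swapped in for `pearson` without changing the averaging step.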

Directory of Open Access Journals

HRČAK - Portal of Croatian Scientific and Professional Journals

Hrčak - Portal of scientific journals of Croatia

Survey of the State of the Art in Natural Language Generation: Core tasks, applications and evaluation

Author: Gatt Albert
Krahmer Emiel
Publication date: 01/01/2017

This paper surveys the current state of the art in Natural Language Generation (NLG), defined as the task of generating text or speech from non-linguistic input. A survey of NLG is timely in view of the changes that the field has undergone over the past decade or so, especially in relation to new (usually data-driven) methods, as well as new applications of NLG technology. This survey therefore aims to (a) give an up-to-date synthesis of research on the core tasks in NLG and the architectures adopted in which such tasks are organised; (b) highlight a number of relatively recent research topics that have arisen partly as a result of growing synergies between NLG and other areas of artificial intelligence; (c) draw attention to the challenges in NLG evaluation, relating them to similar challenges faced in other areas of Natural Language Processing, with an emphasis on different evaluation methods and the relationships between them. Comment: Published in Journal of AI Research (JAIR), volume 61, pp 75-170. 118 pages, 8 figures, 1 table

arXiv.org e-Print Archive

OAR@UM

Tilburg University Repository

A prior case study of natural language processing on different domain

Author: J. Shruthi
Swamy Suma
Publication venue: Institute of Advanced Engineering and Science
Publication date: 01/10/2020

In today's digital world, computers do not understand ordinary human language, which is a major barrier between humans and digital systems. Researchers have therefore developed technology that lets digital machines provide information to users. Natural language processing (NLP) is a branch of AI with significant implications for how computers and humans can interact, and it has become an essential technology for bridging the communication gap between humans and digital data. This study describes the need for NLP in current computing, along with different approaches and their applications. It also highlights the key challenges in developing new NLP models.

ZENODO

Institute of Advanced Engineering and Science

NLP Driven Models for Automatically Generating Survey Articles for Scientific Topics.

Author: Jha Rahul Kumar
Publication date: 01/01/2015

This thesis presents new methods that use natural language processing (NLP) driven models for summarizing research in scientific fields. Given a topic query in the form of a text string, we present methods for finding research articles relevant to the topic as well as summarization algorithms that use lexical and discourse information present in the text of these articles to generate coherent and readable extractive summaries of past research on the topic. In addition to summarizing prior research, good survey articles should also forecast future trends. With this motivation, we present work on forecasting future impact of scientific publications using NLP driven features. PhD, Computer Science and Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/113407/1/rahuljha_1.pd

Deep Blue Documents at the University of Michigan

READ-IT: assessing readability of Italian texts with a view to text simplification

Author: Dell'Orletta Felice
Montemagni Simonetta
Venturi Giulia
Publication venue: Association for Computational Linguistics Stroudsburg, PA, USA

In this paper, we propose a new approach to readability assessment with a specific view to the task of text simplification: the intended audience includes people with low literacy skills and/or with mild cognitive impairment. READ-IT is the first advanced readability assessment tool for Italian, and it combines traditional raw text features with lexical, morpho-syntactic and syntactic information. In READ-IT, readability assessment is carried out with respect to both documents and sentences. The latter is an important novelty of the proposed approach, creating the prerequisites for aligning the readability assessment step with the text simplification process. READ-IT shows high accuracy in the document classification task and promising results in the sentence classification scenario.

PUblication MAnagement

STYLENE : an environment for stylometry and readability research for Dutch

Author: Daelemans Walter
De Clercq Orphée
Hoste Veronique
Publication venue: Ubiquity Press, Ltd.
Publication date: 01/01/2017

We describe an educational demonstration interface and tools for stylometry (authorship attribution and profiling) and readability research for Dutch. The Stylene system consists of a popularisation interface for learning about stylometric analysis, and of web-based interfaces to software for readability and stylometry research aimed at researchers from the humanities and social sciences who do not want to develop or install such software themselves

Ghent University Academic Bibliography

Institutional Repository Universiteit Antwerpen

Semantification of text through summarisation

Author: Joshi Monika
Publication date: 01/03/2019

Ulster University's Research Portal

Mix Multiple Features to Evaluate the Content and the Linguistic Quality of Text Summaries

Author: Lamia Hadrich Belguith
Maher Jaoua
Samira Ellouze
Publication venue: Faculty of Electrical Engineering and Computing, Univ. of Zagreb

Crossref