885 research outputs found

Is sentence compression an NLG task?

Author: Daelemans W.
Hendrickx I.
Krahmer E.J.
Marsi E.C.
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/01/2009

Tilburg University Repository

Experiments with and Implementation of a Context Sensitive Text Summarizer

Author: Bocage Charles Joseph
Publication venue: SJSU ScholarWorks
Publication date: 01/10/2016

Automatic text summarization is the ability to obtain the key ideas from a text passage using as few words as possible. With the increase in data on the web, manual summarization of web pages has become infeasible, and the need for automatic text summarization has become ever greater. This project explored and implemented various parts of the automatic text summarization process for an open-source search engine, Yioop. These parts included stemming, text segmentation, term frequency weighting, automatic sentence compression, and content management system detection. In addition, experiments were conducted on different pre-existing Yioop summarizers. These results served as a baseline for comparison with results obtained from two new ways of generating summaries that we implemented: a graph-based approach and an average-sentence approach. Summaries were evaluated using Recall-Oriented Understudy for Gisting Evaluation (ROUGE). Analyzing the ROUGE results of each summarizer showed that the new summarizers did not produce better summaries than Yioop's pre-existing summarizers. While conducting these experiments, it was noted that the location of useful information on a web page could often be obtained if one could determine the content management system that created the page. An extensible detector for the content management system was written for the Yioop search engine, and ROUGE results were recomputed for the various summarizers using this detector. Using the content management system detector resulted in a ten to twenty percent increase in ROUGE scores across various page experiments.
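As an illustration of the evaluation metric named in this abstract, here is a minimal sketch of ROUGE-1 (unigram overlap between a candidate summary and a reference summary). It is not the evaluation code used in the project: the function name `rouge_1`, the lowercasing, and the whitespace tokenization are assumptions made for this example, and published ROUGE toolkits add options such as stemming and stop-word removal.

```python
from collections import Counter


def rouge_1(candidate: str, reference: str) -> dict:
    """ROUGE-1 sketch: clipped unigram overlap between two texts.

    Assumption: tokens are lowercased, whitespace-separated words.
    """
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())

    # Counter intersection keeps the minimum count per word, which
    # clips each overlapping unigram to how often it occurs in both texts.
    overlap = sum((cand_counts & ref_counts).values())

    ref_total = sum(ref_counts.values())
    cand_total = sum(cand_counts.values())

    recall = overlap / ref_total if ref_total else 0.0
    precision = overlap / cand_total if cand_total else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"recall": recall, "precision": precision, "f1": f1}


if __name__ == "__main__":
    print(rouge_1("the cat sat", "the cat sat on the mat"))
```

Recall is the share of reference unigrams that the candidate recovers, which is why ROUGE is described as recall-oriented; precision and F1 are included only for convenience.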

SJSU ScholarWorks

Efficient annotated terms

Author: H. A. de Jong
M. G. J. van den Brand
P. A. Olivier
P. Klint
Publication venue: 'Wiley'
Publication date: 01/01/2002

Crossref

Introduction

Author: J. De Belder
J.F. Gemmeke
N. Oostdijk
V. Vandeghinste
Publication venue: Springer Berlin Heidelberg

Crossref

Springer - Publisher Connector

Inferring Morphological Rules from Small Examples using 0/1 Linear Programming

Author: Claessen Koen
Lillieström Ann
Smallbone Nicholas
Publication date: 01/01/2019

We show how to express the problem of finding an optimal morpheme segmentation from a set of labelled words as a 0/1 linear programming problem, and how to build on this to analyse a language's morphology. The result is an automatic method for segmentation and labelling that works well even when very little training data is available.

Chalmers Research

Exercises in computational linguistics

Author: Brandt Corstius H. (Hugo)
Publication venue: Centrum Voor Wiskunde en Informatica
Publication date: 01/01/1970

CWI's Institutional Repository

Essential Speech and Language Technology for Dutch: Results by the STEVIN-programme

Author: Peter Spyns
Jan Odijk
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/04/2020
Field of study

Computational Linguistics; Germanic Languages; Artificial Intelligence (incl. Robotics); Computing Methodologies

Directory of Open Access Books (DOAB)

Automatic Text Summarization

Author: Saraiva João Pedro Abrantes
Publication date: 20/07/2021

Writing text was one of the first methods humans used to represent their knowledge. Text can be of different types and serve different purposes. With the evolution of information systems and the Internet, the amount of textual information available has increased exponentially on a worldwide scale, and many documents contain a share of unnecessary information. As a result, most readers have difficulty digesting the extensive information contained in the many documents produced daily. A simple solution to excessive irrelevant information in texts is to create summaries, which keep the parts related to the subject and remove the unnecessary ones. In Natural Language Processing, the goal of automatic text summarization is to create systems that process text and keep only the most important data. Since its inception, several approaches have been designed to create better text summaries. They fall into two groups: extractive and abstractive. In extractive approaches, the summarizer decides which text elements should be in the summary; the selection criteria are diverse. Once selected, the elements are combined into the summary. In abstractive approaches, the text elements are generated from scratch. Abstractive summarizers are much more complex, so they still need a lot of research to produce good results. In this thesis, we investigated the state-of-the-art approaches, implemented our own versions, and tested them on conventional datasets such as the DUC dataset. Our first approach was frequency-based: it analyses how often the text's words and sentences appear in the text. Higher-frequency words and sentences receive higher scores; the scored sentences are then filtered with a compression rate and combined into a summary.
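The frequency-based approach described above can be sketched roughly as follows. This is not the thesis implementation: the function name `frequency_summary`, the regex-based sentence splitting and tokenization, the averaging of word frequencies per sentence, and the reading of "compression rate" as the fraction of sentences kept are all assumptions made for this example.

```python
import math
import re
from collections import Counter


def frequency_summary(text: str, compression_rate: float = 0.3) -> str:
    """Extractive summary sketch based on word frequencies.

    Assumption: compression_rate is the fraction of sentences to keep.
    """
    # Naive sentence split on whitespace that follows ., ! or ?
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    if not sentences:
        return ""

    tokenized = [re.findall(r"[a-z']+", s.lower()) for s in sentences]
    word_freq = Counter(word for tokens in tokenized for word in tokens)

    # Score each sentence by the mean document frequency of its words,
    # so that long sentences are not favoured merely for their length.
    scores = [
        sum(word_freq[w] for w in tokens) / len(tokens) if tokens else 0.0
        for tokens in tokenized
    ]

    keep = max(1, math.ceil(len(sentences) * compression_rate))
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:keep]

    # Emit the selected sentences in their original order.
    return " ".join(sentences[i] for i in sorted(top))


if __name__ == "__main__":
    sample = "Cats chase mice. Cats sleep. Dogs bark loudly at night."
    print(frequency_summary(sample, compression_rate=0.6))
```

A practical system would typically remove stop words before counting, since otherwise very common function words dominate the scores.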
For our second approach, we improved the original TextRank algorithm by combining it with word embedding vectors. The goal was to represent the text's sentences as nodes of a graph and, with the help of word embeddings, determine how similar pairs of sentences are and rank them by their similarity scores. The highest-ranking sentences were filtered with a compression rate and picked for the summary. In the third approach, we combined feature analysis with deep learning. By analysing certain characteristics of the text's sentences, one can assign scores that represent the importance of a given sentence for the summary. With these computed values, we created a dataset for training a deep neural network capable of deciding whether a given sentence should be in the summary. An abstractive encoder-decoder summarizer was created to generate words related to the document's subject and combine them into a summary. Finally, all the summarizers were combined into a full system. Each of our approaches was evaluated with several evaluation metrics, such as ROUGE. We used the DUC dataset for this purpose, and the results were fairly similar to those reported in the scientific community. As for our encoder-decoder, we got promising results.
Text is one of the most important tools for transmitting ideas between human beings. It can be of various types, and its content can be more or less easy to interpret depending on the amount of relevant information about the main subject. To make processing easier for the reader, there is a mechanism created specifically to reduce irrelevant information in a text, called text summarization. Summarization creates reduced versions of the original text while keeping the information about the main subject.
With the creation and evolution of the Internet and other means of communication, there has been an exponential increase in textual documents, a phenomenon known as information overload, and most of these documents contain unnecessary information about the subject they address. To solve this global problem, automatic text summarization emerged within the scientific field of Natural Language Processing; it makes it possible to create automatic summaries of any type of text in any language by means of computational algorithms. Since its creation, numerous text summarization techniques have been devised, which can be classified into two types: extractive and abstractive. In extractive techniques, elements of the original text, such as words or whole sentences that best illustrate the subject of the text, are transcribed and combined into a document. In abstractive techniques, the algorithms generate new elements. In this dissertation, some of the best-performing techniques were researched, implemented, and combined to create a complete system for producing summaries. Of the techniques implemented, the first three are extractive and the last is abstractive. The first computes the frequencies of the text's elements and assigns values to the most frequent sentences, which are then chosen for the summary according to a compression rate. Another technique represents the textual elements as nodes of a graph, assigns similarity values between them, and then chooses the sentences with the highest values according to a compression rate. A further approach combines a mechanism for analysing the text's characteristics with methods based on artificial intelligence. In it, each sentence has a set of features that are used to train a neural network model.
The model evaluates and decides which sentences should belong to the summary and filters them according to a compression rate. An abstractive summarizer was created to generate words about the subject of the text and combine them into a summary. All of these summarizers were combined into a single system. Finally, each technique can be evaluated with several evaluation metrics, such as ROUGE. According to the evaluation results on the DUC dataset, our summarizers obtained results fairly similar to those found in the scientific community, with special attention to the encoder-decoder, which in some cases showed promising results.
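The graph-based technique mentioned in this abstract can be illustrated with a small TextRank-style ranking. This sketch deliberately swaps in bag-of-words cosine similarity where the thesis uses word embeddings, so it shows only the graph-ranking step; the function names `cosine` and `textrank_scores`, the damping factor of 0.85, and the fixed iteration count are assumptions made for this example.

```python
import math
import re
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def textrank_scores(sentences, damping=0.85, iterations=50):
    """Rank sentences by weighted PageRank over a similarity graph.

    Assumption: bag-of-words cosine stands in for embedding similarity.
    """
    n = len(sentences)
    if n == 0:
        return []
    vectors = [Counter(re.findall(r"[a-z']+", s.lower())) for s in sentences]

    # Weighted adjacency matrix; no self-loops.
    weights = [
        [cosine(vectors[i], vectors[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_weight = [sum(row) for row in weights]

    scores = [1.0 / n] * n
    for _ in range(iterations):
        # Each sentence receives score from its neighbours in proportion
        # to edge weight. Isolated sentences pass nothing on (their mass
        # is not redistributed in this simplified version).
        scores = [
            (1 - damping) / n
            + damping * sum(
                weights[j][i] / out_weight[j] * scores[j]
                for j in range(n)
                if out_weight[j] > 0
            )
            for i in range(n)
        ]
    return scores


if __name__ == "__main__":
    docs = [
        "the cat sat on the mat",
        "the cat lay on the rug",
        "stock prices fell sharply today",
    ]
    for sentence, score in zip(docs, textrank_scores(docs)):
        print(f"{score:.3f}  {sentence}")
```

To turn the scores into a summary, one would keep the top-scoring sentences up to the chosen compression rate, as the abstract describes.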

UBibliorum repositorio digital da ubi