Search CORE

17 research outputs found

Do Dependency Relations Help in the Task of Stance Detection?

Author: Bosco Cristina
Cignarella ALESSANDRA TERESA
Rosso Paolo
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/01/2022
Field of study

Institutional Research Information System University of Turin

Approaching Questions of Text Reuse in Ancient Greek Using Computational Syntactic Stylometry

Author: Gorman Robert J.
Gorman Vanessa
Publication venue: DigitalCommons@University of Nebraska - Lincoln
Publication date: 01/01/2016
Field of study

We are investigating methods by which data from dependency syntax treebanks of ancient Greek can be applied to questions of authorship in ancient Greek historiography. From the Ancient Greek Dependency Treebank were constructed syntax words (sWords) by tracing the shortest path from each leaf node to the root for each sentence tree. This paper presents the results of a preliminary test of the usefulness of the sWord as a stylometric discriminator. The sWord data was subjected to clustering analysis. The resultant groupings were in accord with traditional classifications. The use of sWords also allows a more fine-grained heuristic exploration of difficult questions of text reuse. A comparison of relative frequencies of sWords in the directly transmitted Polybius book 1 and the excerpted books 9–10 indicate that the measurements of the two texts are generally very close, but when frequencies do vary, the differences are surprisingly large. These differences reveal that a certain syntactic simplification is a salient characteristic of Polybius’ excerptor, who leaves conspicuous syntactic indicators of his modifications

Crossref

DigitalCommons@University of Nebraska

Directory of Open Access Journals

Approaching Questions of Text Reuse in Ancient Greek Using Computational Syntactic Stylometry

Author: Gorman Robert J.
Gorman Vanessa
Publication venue: DigitalCommons@University of Nebraska - Lincoln
Publication date: 01/01/2016
Field of study

Программно-алгоритмическое обеспечение создания синтаксическо-статистической модели русского языка по текстовому корпусу

Author: Кипяткова Ирина Сергеевна
Publication venue: СПб ФИЦ РАН
Publication date: 01/02/2013
Field of study

Creation of the language model is one of the stages of training of a continuous speech recognition system. In the paper, the developed software for creation of syntactic-statistical Russian language model based on a text corpus is described. The main stages of the algorithm are preliminary text material processing, creation of statistical n-gram language model, extension of the statistical model by n-grams obtained by syntactical analysis. Syntactical analysis permits to increase the quantity of different bigrams created during text processing and to improve the quality of the language model by extracting grammatically-connected word pairs. The results of the testing of the language models created with the help of the software module are presented.Создание модели языка является одним из этапов обучения системы распознавания слитной речи. В статье описаны алгоритм и разработанные программные средства для создания синтаксическо-статистической модели русского языка по текстовому корпусу. Основными этапами в работе алгоритма являются предварительная обработка текстового материала, создание статистической n-граммной модели языка, дополнение статистической модели n-граммами, полученными в результате синтаксического анализа. Синтаксический анализ позволяет увеличить количество создаваемых в результате обработки текста различных биграмм и тем самым повысить качество модели языка за счет выявления грамматически связанных пар слов. Приводятся результаты тестирования созданных с помощью программного модуля моделей языка по показателям информационной энтропии, коэффициента неопределенности, относительного количества внесловарных слов и совпадений n-грамм

Информатика и автоматизация

Hate speech and offensive language detection: a new feature set with filter-embedded combining feature selection

Author: Abdul Aziz N. A.
Maarof M. A.
Zainal A.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2021
Field of study

Social media has changed the world and play an important role in people lives. Social media platforms like Twitter, Facebook and YouTube create a new dimension of communication by providing channels to express and exchange ideas freely. Although the evolution brings numerous benefits, the dynamic environment and the allowable of anonymous posts could expose the uglier side of humanity. Irresponsible people would abuse the freedom of speech by aggressively express opinion or idea that incites hatred. This study performs hate speech and offensive language detection. The problem of this task is there is no clear boundary between hate speech and offensive language. In this study, a selected new features set is proposed for detecting hate speech and offensive language. Using Twitter dataset, the experiments are performed by considering the combination of word n-gram and enhanced syntactic n-gram. To reduce the feature set, filter-embedded combining feature selection is used. The experimental results indicate that the combination of word n-gram and enhanced syntactic n-gram with feature selection to classify the data into three classes: hate speech, offensive language or neither could give good performance. The result reaches 91% for accuracy and the averages of precision, recall and F1

Universiti Teknologi Malaysia Institutional Repository

Modelling and Optimizing on Syntactic N-Grams for Statistical Machine Translation

Author: Sennrich Rico
Publication venue
Publication date: 01/01/2015
Field of study

Edinburgh Research Explorer