8 research outputs found

Sentiment classification of long newspaper articles based on automatically generated thesaurus with various semantic relationships

Author: Ilya Paramonov
Ivan Shchitov
Ksenia Lagutina
Nadezhda Lagutina
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/11/2017

The paper describes a new approach to sentiment classification of long newspaper texts using an automatically generated thesaurus. An important part of the proposed approach is the creation of a specialized thesaurus and the computation of terms' sentiment polarities based on relationships between terms. The approach's efficiency was demonstrated on a corpus of articles about American immigrants. The experiments showed that the automatically created thesaurus provides better classification quality than manually created ones, and that for this task the approach generally outperforms existing ones.

Directory of Open Access Journals

A survey on thesauri application in automatic natural language processing

Author: Andrey Vasilyev
Ilya Paramonov
Ivan Shchitov
Ksenia Lagutina
Nadezhda Lagutina
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/11/2017

This paper investigates the efficiency of thesaurus use in popular natural language processing (NLP) fields: information retrieval and the analysis of texts and subject areas. A thesaurus is a natural language resource that models a subject area and can reflect a human expert's knowledge in many NLP tasks. The main goal of this survey is to determine how much thesauri affect processing quality and where they can provide better performance. We describe studies that use different types of thesauri, discuss the contribution of the thesaurus to the achieved results, and propose directions for future research in the thesaurus field.

Directory of Open Access Journals

Sentiment Classification into Three Classes Applying Multinomial Bayes Algorithm, N-grams, and Thesaurus

Author: Ilya Paramonov
Ivan Shchitov
Ksenia Lagutina
Nadezhda Lagutina
Vladislav Larionov
Vladislav Petryakov
Publication venue: FRUCT
Publication date: 01/04/2019

The paper is devoted to the development of a method that classifies texts in English and Russian by sentiment into positive, negative, and neutral. The proposed method is based on the Multinomial Naive Bayes classifier with the additional application of n-grams. The classifier is trained either on three classes, or on two contrasting classes with a threshold to separate neutral texts. Experiments with texts on various topics showed a significant improvement in classification quality for reviews from a particular domain. In addition, the application of thesaurus relationships to sentiment classification into three classes was analyzed; however, it did not show a significant improvement in the classification results.
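
The two-class variant with a neutral threshold mentioned in the abstract can be illustrated roughly as follows. This is a minimal sketch and not the authors' implementation: the class name `TwoClassNB`, the use of unigrams plus bigrams, Laplace smoothing, and the threshold on the log-score margin are all assumptions made for illustration.

```python
import math
from collections import Counter

def ngrams(text, n_max=2):
    """Return word n-grams of length 1..n_max from lowercased text."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(tokens) - n + 1)]

class TwoClassNB:
    """Hypothetical Multinomial Naive Bayes over n-gram counts (Laplace smoothing)."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.counts = {c: Counter() for c in self.classes}
        self.docs = Counter(labels)
        for text, label in zip(texts, labels):
            self.counts[label].update(ngrams(text))
        self.vocab = set().union(*self.counts.values())
        return self

    def log_scores(self, text):
        """Log prior plus smoothed log likelihood of the known n-grams, per class."""
        total_docs = sum(self.docs.values())
        scores = {}
        for c in self.classes:
            total = sum(self.counts[c].values())
            score = math.log(self.docs[c] / total_docs)
            for g in ngrams(text):
                if g in self.vocab:  # n-grams never seen in training are ignored
                    score += math.log((self.counts[c][g] + 1) / (total + len(self.vocab)))
            scores[c] = score
        return scores

    def predict(self, text, threshold=0.5):
        """Assumed rule: a small margin between the two classes means 'neutral'."""
        s = self.log_scores(text)
        margin = s["positive"] - s["negative"]
        if abs(margin) < threshold:
            return "neutral"
        return "positive" if margin > 0 else "negative"

model = TwoClassNB().fit(["good great film", "bad awful film"],
                         ["positive", "negative"])
print(model.predict("great good"), model.predict("film"), model.predict("awful bad"))
```

The threshold value would have to be tuned on held-out data; the abstract does not say how the authors chose it.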

Crossref

Directory of Open Access Journals

Analysis of the Use of Different Types of Relations between Terms of a Thesaurus Generated with Hybrid Methods in Text Classification Tasks

Author: Ilya V. Paramonov
Ivan A. Shchitov
Ksenia V. Lagutina
Nadezhda S. Lagutina
Publication venue: 'P.G. Demidov Yaroslavl State University'
Publication date: 18/12/2017

The main purpose of the article is to analyze how effectively different types of thesaurus relations can be used to solve text classification tasks. The basis of the study is an automatically generated thesaurus of a subject area that contains three types of relations: synonymous, hierarchical, and associative. To generate the thesaurus, the authors use a hybrid method based on several linguistic and statistical algorithms for the extraction of semantic relations. The method makes it possible to create a thesaurus with a sufficiently large number of terms and relations among them. The authors consider two problems: topical text classification and sentiment classification of large newspaper articles. To solve them, the authors developed two approaches that complement standard algorithms with a procedure that takes thesaurus relations into account to determine semantic features of texts. The approach to topical classification includes the standard unsupervised BM25 algorithm and a procedure that takes into account synonymous and hierarchical relations of the subject-area thesaurus. The approach to sentiment classification consists of two steps. In the first step, a thesaurus is created whose term polarity weights are calculated depending on the term occurrences in the training set or on the weights of related thesaurus terms. In the second step, the thesaurus is used to compute the features of words from texts and to classify the texts with the SVM or Naive Bayes algorithm. In experiments with the text corpora BBCSport, Reuters, PubMed, and the corpus of articles about American immigrants, the authors varied the types of thesaurus relations involved in the classification and the degree of their use. The results of the experiments make it possible to evaluate the efficiency of applying thesaurus relations to the classification of raw texts and to determine under what conditions particular relations matter more or less. In particular, the most useful thesaurus relations are synonymous and hierarchical, as they provide better classification quality.
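
The topical-classification approach in the abstract above combines BM25 with synonymous and hierarchical thesaurus relations. The following is a minimal sketch of one way such a combination could look, not the authors' procedure: the functions `expand` and `classify`, the representation of the thesaurus as a plain dictionary of related terms, and the rule of picking the topic with the highest score are all assumptions made for illustration.

```python
import math
from collections import Counter

def bm25_score(query_terms, doc_tokens, corpus, k1=1.5, b=0.75):
    """Okapi BM25 score of one tokenized document for a bag of query terms."""
    n_docs = len(corpus)
    avg_len = sum(len(d) for d in corpus) / n_docs
    tf = Counter(doc_tokens)
    score = 0.0
    for term in set(query_terms):
        df = sum(1 for d in corpus if term in d)  # document frequency
        if df == 0 or tf[term] == 0:
            continue
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        norm = tf[term] + k1 * (1 - b + b * len(doc_tokens) / avg_len)
        score += idf * tf[term] * (k1 + 1) / norm
    return score

def expand(terms, thesaurus):
    """Hypothetical helper: add related thesaurus terms to the term list."""
    expanded = set(terms)
    for t in terms:
        expanded.update(thesaurus.get(t, ()))
    return expanded

def classify(doc_tokens, topics, corpus, thesaurus):
    """Assumed rule: pick the topic whose expanded term list scores highest."""
    return max(topics, key=lambda name: bm25_score(
        expand(topics[name], thesaurus), doc_tokens, corpus))

# Toy data, invented for illustration only.
corpus = [["striker", "scored", "goal"],
          ["bowler", "took", "wicket"],
          ["doctor", "treated", "patient"]]
topics = {"football": ["football"], "cricket": ["cricket"]}
thesaurus = {"football": ["striker", "goal"], "cricket": ["bowler", "wicket"]}
print(classify(corpus[1], topics, corpus, thesaurus))
```

In this toy setup the topic name itself never occurs in the text, so the unexpanded query scores zero and only the related thesaurus terms produce a match.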

Modeling and Analysis of Information Systems / Моделирование и анализ информационных систем (МАИС)

Design of Diary Applications for Vital Sign Registration Targeted at Multiple Android Application Stores

Author: Eldar Mamedov
Ilya Paramonov
Ivan Shchitov
Publication venue: FRUCT
Publication date: 01/03/2014

The paper considers two aspects of expanding the user base of mobile applications for vital sign registration: making one application easily accessible from the others and targeting multiple application stores. We provide a special design solution that makes it possible to resolve these issues in a maintainable way.

Directory of Open Access Journals

Analysis of relation extraction methods for automatic generation of specialized thesauri: Prospect of hybrid methods

Author: Eldar Mamedov
Ilya Paramonov
Ivan Shchitov
Ksenia Lagutina
Nadezhda Lagutina
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/11/2016

The paper is devoted to the analysis of methods that can be used for automatic generation of specialized thesauri. The authors developed a test bench for evaluating the most popular relation extraction methods, which constitute the main part of such generation. On the basis of experiments conducted on the test bench, the authors propose the idea of hybrid thesaurus generation methods that combine the algorithms that showed the best performance. Its efficiency was illustrated by creating a thesaurus for the medical domain and subsequently evaluating it on the test bench.

Directory of Open Access Journals

Analysis of Influence of Different Relations Types on the Quality of Thesaurus Application to Text Classification Problems

Author: Ilya V. Paramonov
Ivan A. Shchitov
Ksenia V. Lagutina
Nadezhda S. Lagutina
Publication venue: 'P.G. Demidov Yaroslavl State University'
Publication date: 01/12/2017

Directory of Open Access Journals

Increasing the Efficiency in Blending of Source Components in the Production of Granulated Compositions

Author: Evgeniy E Kuznetsov
Ivan V Bumbar
Maksim V Shevchenko
Olga V Shchegorets
Seraphima P Prisyazhnaya
Sergei V Shchitov
Sergey N Voyakin
Yuri B Kurkov
Publication venue: 'American Scientific Publishers'
Publication date

Crossref