8 research outputs found
Sentiment classification of long newspaper articles based on automatically generated thesaurus with various semantic relationships
The paper describes a new approach to sentiment classification of long newspaper texts using an automatically generated thesaurus. An important part of the proposed approach is the creation of a specialized thesaurus and the computation of terms' sentiment polarities based on relationships between terms. The approach's efficiency was demonstrated on a corpus of articles about American immigrants. The experiments showed that the automatically created thesaurus provides better classification quality than manual ones, and that for this task our approach generally outperforms existing ones.
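As a rough illustration of deriving a term's polarity from related terms, the Python sketch below gives each unlabeled term the average polarity of its thesaurus neighbours. The toy thesaurus, the seed polarities, the function name `propagate_polarity`, and the averaging rule are all assumptions made for illustration; the abstract does not state the actual formula.

```python
def propagate_polarity(relations, seed, rounds=1):
    """Fill in polarities for unlabeled terms from their thesaurus neighbours.

    relations: dict mapping a term to a list of related terms (hypothetical data).
    seed: dict mapping a term to a polarity in [-1, 1], e.g. from a training set.
    """
    polarity = dict(seed)
    for _ in range(rounds):
        updates = {}
        for term, neighbours in relations.items():
            if term in polarity:
                continue  # keep polarities that are already known
            known = [polarity[n] for n in neighbours if n in polarity]
            if known:
                # Assumed rule: plain average of the neighbours' polarities.
                updates[term] = sum(known) / len(known)
        polarity.update(updates)
    return polarity


# Hypothetical toy thesaurus and seed polarities.
relations = {
    "refugee": ["migrant", "asylum seeker"],
    "asylum seeker": ["refugee"],
    "migrant": ["refugee"],
}
seed = {"migrant": -0.2, "asylum seeker": 0.4}
result = propagate_polarity(relations, seed)
```

Here "refugee" has no polarity of its own and receives the mean of its two neighbours, while the seeded terms keep their original values.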
A survey on thesauri application in automatic natural language processing
This paper investigates the efficiency of thesaurus use in popular natural language processing (NLP) fields: information retrieval and the analysis of texts and subject areas. A thesaurus is a natural language resource that models a subject area and can reflect a human expert's knowledge in many NLP tasks. The main goal of this survey is to determine how much thesauri affect processing quality and where they can provide better performance. We describe studies that use different types of thesauri, discuss the contribution of the thesaurus to the achieved results, and propose directions for future research in the thesaurus field.
Sentiment Classification into Three Classes Applying Multinomial Bayes Algorithm, N-grams, and Thesaurus
The paper develops a method that classifies texts in English and Russian by sentiment into positive, negative, and neutral. The proposed method is based on the Multinomial Naive Bayes classifier with the additional use of n-grams. The classifier is trained either on three classes, or on two contrasting classes with a threshold to separate neutral texts. Experiments with texts on various topics showed a significant improvement in classification quality for reviews from a particular domain. In addition, the use of thesaurus relationships for sentiment classification into three classes was analyzed; however, it did not show a significant improvement in the classification results.
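The two-class variant with a neutral threshold can be sketched as follows. This is a minimal pure-Python illustration, not the authors' implementation: the class name `TwoClassNB`, the add-one smoothing, the log-odds margin used as the threshold quantity, the threshold value, and the toy training texts are all assumptions.

```python
import math
from collections import Counter


def ngrams(text, n_max=2):
    """Return all n-grams up to n_max (unigrams and bigrams by default)."""
    tokens = text.lower().split()
    feats = []
    for n in range(1, n_max + 1):
        feats += [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return feats


class TwoClassNB:
    """Multinomial Naive Bayes over n-grams, trained on 'pos' and 'neg' only."""

    def fit(self, texts, labels):
        self.counts = {"pos": Counter(), "neg": Counter()}
        self.docs = Counter(labels)
        for text, label in zip(texts, labels):
            self.counts[label].update(ngrams(text))
        self.vocab = set(self.counts["pos"]) | set(self.counts["neg"])
        return self

    def log_score(self, text, label):
        total = sum(self.counts[label].values())
        score = math.log(self.docs[label] / sum(self.docs.values()))
        for feat in ngrams(text):
            if feat in self.vocab:  # ignore features never seen in training
                # Laplace (add-one) smoothing, an assumed choice
                score += math.log(
                    (self.counts[label][feat] + 1) / (total + len(self.vocab))
                )
        return score

    def predict(self, text, threshold=0.5):
        """Return 'neutral' when the log-odds margin is below the threshold."""
        margin = self.log_score(text, "pos") - self.log_score(text, "neg")
        if abs(margin) < threshold:
            return "neutral"
        return "pos" if margin > 0 else "neg"


# Hypothetical toy training data.
model = TwoClassNB().fit(
    ["great phone love it", "terrible battery hate it"],
    ["pos", "neg"],
)
```

A text whose features give no clear evidence for either class, such as `model.predict("it arrived today")`, falls inside the threshold band and is labeled neutral, without the classifier ever being trained on a neutral class.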
Analysis of the use of different types of relations between terms of a thesaurus generated by hybrid methods in text classification tasks
The main purpose of the article is to analyze how effectively different types of thesaurus relations can be used to solve text classification tasks. The study is based on an automatically generated thesaurus of a subject area that contains three types of relations: synonymous, hierarchical, and associative. To generate the thesaurus, the authors use a hybrid method based on several linguistic and statistical algorithms for the extraction of semantic relations. The method makes it possible to create a thesaurus with a sufficiently large number of terms and relations among them. The authors consider two problems: topical text classification and sentiment classification of large newspaper articles. To solve them, the authors developed two approaches that complement standard algorithms with a procedure that takes thesaurus relations into account to determine semantic features of texts. The approach to topical classification includes the standard unsupervised BM25 algorithm and a procedure that takes into account synonymous and hierarchical relations of the subject-area thesaurus. The approach to sentiment classification consists of two steps. In the first step, a thesaurus is created whose terms' polarity weights are calculated from the term occurrences in the training set or from the weights of related thesaurus terms. In the second step, the thesaurus is used to compute the features of words from texts and to classify the texts with the SVM or Naive Bayes algorithm. In experiments with the text corpora BBCSport, Reuters, PubMed, and the corpus of articles about American immigrants, the authors varied the types of thesaurus relations involved in the classification and the degree of their use. The results of the experiments make it possible to evaluate the efficiency of applying thesaurus relations to the classification of raw texts and to determine under what conditions certain relationships have more or less effect.
In particular, the most useful thesaurus relations are synonymous and hierarchical, as they provide better classification quality.
Design of Diary Applications for Vital Sign Registration Targeted at Multiple Android Application Stores
The paper considers two aspects of expanding the user base of mobile applications for vital sign registration: making one application easily accessible from the others and targeting multiple application stores. We provide a special design solution that resolves these issues in a maintainable way.
Analysis of relation extraction methods for automatic generation of specialized thesauri: Prospect of hybrid methods
The paper analyzes methods that can be used for the automatic generation of specialized thesauri. The authors developed a test bench for evaluating the most popular relation extraction methods, which constitute the main part of such generation. On the basis of experiments conducted on the test bench, the authors propose hybrid thesaurus generation methods that combine the algorithms that showed the best performance. The efficiency of this idea was illustrated by creating a thesaurus for the medical domain and then evaluating it on the test bench.
Analysis of Influence of Different Relations Types on the Quality of Thesaurus Application to Text Classification Problems
The main purpose of the article is to analyze how effectively different types of thesaurus relations can be used to solve text classification tasks. The study is based on an automatically generated thesaurus of a subject area that contains three types of relations: synonymous, hierarchical, and associative. To generate the thesaurus, the authors use a hybrid method based on several linguistic and statistical algorithms for the extraction of semantic relations. The method makes it possible to create a thesaurus with a sufficiently large number of terms and relations among them. The authors consider two problems: topical text classification and sentiment classification of large newspaper articles. To solve them, the authors developed two approaches that complement standard algorithms with a procedure that takes thesaurus relations into account to determine semantic features of texts. The approach to topical classification includes the standard unsupervised BM25 algorithm and a procedure that takes into account synonymous and hierarchical relations of the subject-area thesaurus. The approach to sentiment classification consists of two steps. In the first step, a thesaurus is created whose terms' polarity weights are calculated from the term occurrences in the training set or from the weights of related thesaurus terms. In the second step, the thesaurus is used to compute the features of words from texts and to classify the texts with the SVM or Naive Bayes algorithm. In experiments with the text corpora BBCSport, Reuters, PubMed, and the corpus of articles about American immigrants, the authors varied the types of thesaurus relations involved in the classification and the degree of their use. The results of the experiments make it possible to evaluate the efficiency of applying thesaurus relations to the classification of raw texts and to determine under what conditions certain relationships have more or less effect.
In particular, the most useful thesaurus relations are synonymous and hierarchical, as they provide better classification quality.
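To make the topical-classification idea more concrete, the sketch below scores documents with BM25 against a topic's terms, first without and then with thesaurus expansion. It is an illustration under stated assumptions only: the abstract does not say how BM25 scores are turned into class decisions, the IDF form is the common Lucene-style variant, and the `expand` helper, the parameter values, the toy documents, and the toy thesaurus are all hypothetical.

```python
import math
from collections import Counter


def bm25_scores(docs, query_terms, k1=1.5, b=0.75):
    """Score tokenized documents against query terms with Okapi BM25."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    doc_freq = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in set(query_terms):
            if tf[term] == 0:
                continue
            # Lucene-style IDF, which is always non-negative
            idf = math.log(
                1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5)
            )
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def expand(terms, thesaurus):
    """Add related terms (e.g. synonyms, narrower terms) from a thesaurus dict."""
    expanded = list(terms)
    for term in terms:
        expanded += thesaurus.get(term, [])
    return expanded


# Hypothetical toy corpus and thesaurus.
docs = [
    "the striker scored a goal in the final".split(),
    "the bowler took five wickets".split(),
]
thesaurus = {"football": ["striker", "goal"], "cricket": ["bowler", "wickets"]}

plain = bm25_scores(docs, ["football"])
with_thesaurus = bm25_scores(docs, expand(["football"], thesaurus))
```

Neither document contains the word "football", so the plain scores are both zero. After expansion the first document matches "striker" and "goal" and is scored above the second, which is the kind of effect the thesaurus relations are meant to provide.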