8 research outputs found

    Sentiment classification of long newspaper articles based on automatically generated thesaurus with various semantic relationships

    The paper describes a new approach to sentiment classification of long newspaper texts using an automatically generated thesaurus. A key part of the approach is the creation of a specialized thesaurus and the computation of terms' sentiment polarities based on the relationships between terms. The efficiency of the approach has been demonstrated on a corpus of articles about American immigrants. The experiments showed that the automatically created thesaurus provides better classification quality than manually created ones and that, overall, our approach outperforms existing ones on this task.
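    A minimal sketch of computing term polarities from training data plus thesaurus relationships, in Python. The scoring rule (share of positive minus negative training documents containing a term, for terms seen in at least `min_count` documents) and the fallback (mean polarity of related thesaurus terms) are assumptions for illustration, not the authors' published formulas; the function name `term_polarities` and the thesaurus format (a dict from a term to its related terms) are hypothetical.

```python
from collections import Counter


def term_polarities(docs, labels, thesaurus, min_count=2):
    """Hypothetical sketch: estimate a sentiment polarity in [-1, 1] per term.

    docs      -- list of token lists
    labels    -- +1 (positive) or -1 (negative) for each document
    thesaurus -- assumed format: {term: [related terms]}
    """
    pos, neg = Counter(), Counter()
    for tokens, label in zip(docs, labels):
        # Count documents (not occurrences) containing each term, per class.
        (pos if label > 0 else neg).update(set(tokens))

    # Step 1 (assumed rule): frequent training terms get a count-based polarity.
    base = {}
    for term in set(pos) | set(neg):
        total = pos[term] + neg[term]
        if total >= min_count:
            base[term] = (pos[term] - neg[term]) / total

    # Step 2 (assumed rule): other thesaurus terms inherit the mean polarity
    # of their related terms; reading only from `base` keeps this order-independent.
    polarity = dict(base)
    for term, related in thesaurus.items():
        if term in base:
            continue
        known = [base[r] for r in related if r in base]
        if known:
            polarity[term] = sum(known) / len(known)
    return polarity


# Toy usage with made-up data.
docs = [["good", "film"], ["good", "plot"], ["bad", "film"], ["bad", "acting"]]
labels = [1, 1, -1, -1]
thesaurus = {"excellent": ["good"], "awful": ["bad", "film"]}
polarities = term_polarities(docs, labels, thesaurus)
print(polarities["excellent"], polarities["awful"])  # 1.0 -0.5
```

    Here "excellent" never occurs in training but still receives a polarity through its thesaurus link to "good", which is the kind of coverage gain a relation-aware thesaurus is meant to provide.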

    A survey on thesauri application in automatic natural language processing

    This paper investigates how efficiently thesauri can be used in popular natural language processing (NLP) fields: information retrieval and the analysis of texts and subject areas. A thesaurus is a natural language resource that models a subject area and can reflect human experts' knowledge in many NLP tasks. The main goal of this survey is to determine how much thesauri affect processing quality and where they can improve performance. We describe studies that use different types of thesauri, discuss the thesaurus's contribution to the results achieved, and propose directions for future research on thesauri.

    Sentiment Classification into Three Classes Applying Multinomial Bayes Algorithm, N-grams, and Thesaurus

    The paper describes the development of a method that classifies English and Russian texts by sentiment into positive, negative, and neutral. The proposed method is based on the Multinomial Naive Bayes classifier with additional use of n-grams. The classifier is trained either on three classes or on two opposing classes with a threshold that separates neutral texts. Experiments with texts on various topics showed a significant improvement in classification quality for reviews from a particular domain. In addition, the use of thesaurus relationships for three-class sentiment classification was analyzed; however, it did not significantly improve the classification results.
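    The two-class-plus-threshold variant can be sketched as below: a from-scratch Multinomial Naive Bayes over unigrams and bigrams, trained on two opposing classes, labels a text neutral when the posterior probability of the positive class lies within `threshold` of 0.5. The add-alpha smoothing, the n-gram range, the exact threshold rule, and all names are assumptions; the paper's thresholding may differ.

```python
import math
from collections import Counter


def ngrams(tokens, n_max=2):
    """All n-grams of length 1..n_max, each joined into one string."""
    feats = []
    for n in range(1, n_max + 1):
        feats += [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return feats


class MultinomialNB:
    """Multinomial Naive Bayes with add-alpha smoothing over n-gram counts."""

    def __init__(self, alpha=1.0, n_max=2):
        self.alpha, self.n_max = alpha, n_max

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.counts = {c: Counter() for c in self.classes}
        for tokens, label in zip(docs, labels):
            self.counts[label].update(ngrams(tokens, self.n_max))
        self.vocab = set().union(*self.counts.values())
        n_docs = Counter(labels)
        self.log_prior = {c: math.log(n_docs[c] / len(labels)) for c in self.classes}
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict_proba(self, tokens):
        v = len(self.vocab)
        log_scores = {}
        for c in self.classes:
            s = self.log_prior[c]
            for f in ngrams(tokens, self.n_max):
                if f in self.vocab:  # features never seen in training are skipped
                    s += math.log((self.counts[c][f] + self.alpha)
                                  / (self.totals[c] + self.alpha * v))
            log_scores[c] = s
        # Normalize in log space for numerical stability.
        m = max(log_scores.values())
        exp = {c: math.exp(s - m) for c, s in log_scores.items()}
        z = sum(exp.values())
        return {c: e / z for c, e in exp.items()}


def classify_with_neutral(model, tokens, threshold=0.2):
    """Two-class model ("pos"/"neg"): "neutral" if P(pos) is within threshold of 0.5."""
    p_pos = model.predict_proba(tokens)["pos"]
    if abs(p_pos - 0.5) < threshold:
        return "neutral"
    return "pos" if p_pos > 0.5 else "neg"


# Toy usage with made-up training data.
train = [["great", "movie"], ["really", "great", "acting"],
         ["awful", "movie"], ["really", "awful", "plot"]]
model = MultinomialNB().fit(train, ["pos", "pos", "neg", "neg"])
print(classify_with_neutral(model, ["great", "movie"]))   # pos
print(classify_with_neutral(model, ["really", "movie"]))  # neutral
```

    Training on three classes instead would simply pass "neutral" labels to `fit` and take the arg-max of `predict_proba`; the threshold variant avoids needing labelled neutral examples.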

    Analysis of the use of various types of relations between terms of a thesaurus generated with hybrid methods in text classification tasks

    The main purpose of the article is to analyze how effectively different types of thesaurus relations can be used to solve text classification tasks. The study is based on an automatically generated subject-area thesaurus that contains three types of relations: synonymous, hierarchical, and associative. To generate the thesaurus, the authors use a hybrid method based on several linguistic and statistical algorithms for extracting semantic relations; the method makes it possible to create a thesaurus with a sufficiently large number of terms and relations among them. The authors consider two problems: topical text classification and sentiment classification of long newspaper articles. To solve them, the authors developed two approaches that complement standard algorithms with a procedure that takes thesaurus relations into account to determine semantic features of texts. The approach to topical classification combines the standard unsupervised BM25 algorithm with a procedure that takes into account the synonymous and hierarchical relations of the subject-area thesaurus. The approach to sentiment classification consists of two steps. In the first step, a thesaurus is created whose terms' polarity weights are calculated either from term occurrences in the training set or from the weights of related thesaurus terms. In the second step, the thesaurus is used to compute features of the words in texts, and the texts are classified with an SVM or a Naive Bayes classifier. In experiments with the BBCSport, Reuters, and PubMed corpora and a corpus of articles about American immigrants, the authors varied which types of thesaurus relations are involved in the classification and the degree to which they are used. The experimental results make it possible to evaluate how efficiently thesaurus relations can be applied to the classification of natural-language texts and to determine under what conditions particular relations have more or less influence. In particular, synonymous and hierarchical relations proved the most useful, as they provide better classification quality.
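    As a rough illustration of the topical approach, the sketch below scores documents against each category's keyword query with standard Okapi BM25 and expands that query with synonyms (full weight) and narrower terms from the hierarchy (weight 0.5). Using category keywords as queries, the expansion weights, treating hierarchy as "narrower terms only", and the helper names are assumptions made for this example, not details from the paper; `k1=1.2` and `b=0.75` are common BM25 defaults.

```python
import math
from collections import Counter


def bm25_scores(query_weights, docs, k1=1.2, b=0.75):
    """Okapi BM25 score of each tokenized doc for a weighted query {term: weight}."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    df = Counter(t for d in docs for t in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for term, w in query_weights.items():
            if tf[term] == 0:
                continue
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1)
            s += w * idf * tf[term] * (k1 + 1) / (tf[term] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores


def expand_query(terms, synonyms, narrower, syn_w=1.0, hier_w=0.5):
    """Add synonyms and narrower (hierarchical) terms with assumed weights."""
    weights = {t: 1.0 for t in terms}
    for t in terms:
        for s in synonyms.get(t, []):
            weights[s] = max(weights.get(s, 0.0), syn_w)
        for h in narrower.get(t, []):
            weights[h] = max(weights.get(h, 0.0), hier_w)
    return weights


def classify_topics(docs, categories, synonyms, narrower):
    """Assign each doc to the category whose expanded query scores highest."""
    per_cat = {c: bm25_scores(expand_query(ts, synonyms, narrower), docs)
               for c, ts in categories.items()}
    return [max(per_cat, key=lambda c: per_cat[c][i]) for i in range(len(docs))]


# Toy usage with a hypothetical thesaurus fragment.
articles = [["striker", "scored", "goal"], ["stock", "market", "fell"], ["footballer", "injured"]]
categories = {"sport": ["football"], "business": ["finance"]}
synonyms = {"football": ["soccer"]}
narrower = {"football": ["striker", "footballer", "goal"], "finance": ["stock", "market"]}
topics = classify_topics(articles, categories, synonyms, narrower)
print(topics)  # ['sport', 'business', 'sport']
```

    None of the toy articles contains a category name, so plain BM25 scores every category at zero; only the thesaurus expansion lets the unsupervised classifier separate them.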

    Design of Diary Applications for Vital Sign Registration Targeted at Multiple Android Application Stores

    The paper considers two aspects of expanding the user base of mobile applications for vital sign registration: making one application easily accessible from the others and targeting multiple application stores. We provide a dedicated design solution that resolves these issues in a maintainable way.

    Analysis of relation extraction methods for automatic generation of specialized thesauri: Prospect of hybrid methods

    The paper analyzes methods that can be used for the automatic generation of specialized thesauri. The authors developed a test bench for evaluating the most popular relation extraction methods, which constitute the main part of such generation. Based on experiments conducted on the test bench, the authors proposed hybrid thesaurus generation methods that combine the best-performing algorithms. The efficiency of this idea was illustrated by creating a thesaurus for the medical domain and then evaluating it on the test bench.
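    One simple way to combine several relation extractors, sketched below under stated assumptions, is precision-weighted voting: each method's vote for a candidate relation counts as much as that method's precision measured on a test bench, and a relation is kept when its total support reaches a threshold. The voting rule, the threshold, the method names, and the toy medical relations are all hypothetical; the abstract does not say how the hybrid methods combine algorithms.

```python
from collections import defaultdict


def hybrid_relations(method_outputs, method_precision, threshold=0.8):
    """Precision-weighted voting over relation triples (hypothetical scheme).

    method_outputs   -- {method: set of (term_a, relation, term_b)}
    method_precision -- {method: precision measured on a test bench}
    """
    support = defaultdict(float)
    for method, relations in method_outputs.items():
        for triple in relations:
            support[triple] += method_precision[method]
    # Keep only the triples whose accumulated support reaches the threshold.
    return {triple for triple, score in support.items() if score >= threshold}


# Toy usage: outputs of three hypothetical extractors on medical terms.
outputs = {
    "patterns": {("aspirin", "is_a", "drug"), ("fever", "synonym", "pyrexia")},
    "distributional": {("aspirin", "is_a", "drug"), ("aspirin", "associated", "headache")},
    "lexicon": {("fever", "synonym", "pyrexia")},
}
precision = {"patterns": 0.6, "distributional": 0.4, "lexicon": 0.9}
accepted = hybrid_relations(outputs, precision)
print(sorted(accepted))
# [('aspirin', 'is_a', 'drug'), ('fever', 'synonym', 'pyrexia')]
```

    The relation proposed only by the weakest extractor is dropped, while relations confirmed by two methods survive; raising `threshold` trades recall for precision.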

    Analysis of Influence of Different Relations Types on the Quality of Thesaurus Application to Text Classification Problems

    The main purpose of the article is to analyze how effectively different types of thesaurus relations can be used to solve text classification tasks. The study is based on an automatically generated subject-area thesaurus that contains three types of relations: synonymous, hierarchical, and associative. To generate the thesaurus, the authors use a hybrid method based on several linguistic and statistical algorithms for extracting semantic relations; the method makes it possible to create a thesaurus with a sufficiently large number of terms and relations among them. The authors consider two problems: topical text classification and sentiment classification of long newspaper articles. To solve them, the authors developed two approaches that complement standard algorithms with a procedure that takes thesaurus relations into account to determine semantic features of texts. The approach to topical classification combines the standard unsupervised BM25 algorithm with a procedure that takes into account the synonymous and hierarchical relations of the subject-area thesaurus. The approach to sentiment classification consists of two steps. In the first step, a thesaurus is created whose terms' polarity weights are calculated either from term occurrences in the training set or from the weights of related thesaurus terms. In the second step, the thesaurus is used to compute features of the words in texts, and the texts are classified with an SVM or a Naive Bayes classifier. In experiments with the BBCSport, Reuters, and PubMed corpora and a corpus of articles about American immigrants, the authors varied which types of thesaurus relations are involved in the classification and the degree to which they are used. The experimental results make it possible to evaluate how efficiently thesaurus relations can be applied to the classification of natural-language texts and to determine under what conditions particular relations have more or less influence. In particular, synonymous and hierarchical relations proved the most useful, as they provide better classification quality.
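    The two-step sentiment approach, including the varied degree of use of each relation type, can be sketched as follows: polarities spread from already weighted terms to their thesaurus neighbors, attenuated by a per-relation-type weight (0 switches a relation type off), and each text is then turned into a short feature vector that could be fed to an SVM or Naive Bayes classifier. The weights, the symmetric treatment of relations, the feature set, and the function names are assumptions for illustration, not the authors' configuration.

```python
def propagate(polarity, relations, type_weights):
    """One propagation pass over thesaurus relations (hypothetical scheme).

    polarity     -- {term: polarity in [-1, 1]} for terms weighted from training data
    relations    -- list of (term_a, relation_type, term_b), treated as symmetric
    type_weights -- {relation_type: attenuation factor}; 0 disables a relation type
    """
    contributions = {}
    for a, rel, b in relations:
        w = type_weights.get(rel, 0.0)
        if w == 0.0:
            continue
        for src, dst in ((a, b), (b, a)):
            if src in polarity and dst not in polarity:
                contributions.setdefault(dst, []).append(w * polarity[src])
    result = dict(polarity)
    for term, values in contributions.items():
        # A new term gets the mean of its attenuated neighbor polarities.
        result[term] = sum(values) / len(values)
    return result


def text_features(tokens, polarity):
    """[positive mass, negative mass, share of tokens with a known polarity]."""
    values = [polarity[t] for t in tokens if t in polarity]
    positive = sum(v for v in values if v > 0)
    negative = -sum(v for v in values if v < 0)
    return [positive, negative, len(values) / max(len(tokens), 1)]


# Toy usage: synonyms fully used, hierarchy at half strength, associations off.
seed = {"good": 0.8, "bad": -0.6}
rels = [("good", "synonym", "fine"), ("bad", "hierarchical", "terrible"),
        ("bad", "associative", "weather")]
weights = {"synonym": 1.0, "hierarchical": 0.5, "associative": 0.0}
expanded = propagate(seed, rels, weights)
features = text_features(["fine", "terrible", "day"], expanded)
```

    Rerunning with different `weights` dictionaries mirrors, in miniature, an experiment that varies which relation types participate and how strongly.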