10 research outputs found

    Building sentiment Lexicons applying graph theory on information from three Norwegian thesauruses

    Sentiment lexicons are the most widely used tool for automatically predicting sentiment in text. To the best of our knowledge, there are no openly available sentiment lexicons for the Norwegian language. In this paper we therefore applied two different strategies to automatically generate sentiment lexicons for Norwegian. The first strategy used machine translation to translate an English sentiment lexicon to Norwegian, and the second used information from three different thesauruses to build several sentiment lexicons. The thesaurus-based lexicons were built using the label propagation algorithm from graph theory. The lexicons were evaluated by classifying product and movie reviews. The results show satisfactory classification performance. Different sentiment lexicons perform well on product reviews and on movie reviews. Overall, the lexicon based on machine translation performed best, showing that linguistic resources in English can be translated to Norwegian without losing significant value
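    The abstract does not give the details of the label propagation variant used, so the following is only a minimal sketch of the general idea: polarity scores spread from a few seed words across a synonym graph by repeated neighbour averaging, with the seeds held fixed. The function name `propagate_labels`, the toy English edges, and the seed scores are all assumptions made for illustration, not data from the paper.

```python
from collections import defaultdict


def propagate_labels(edges, seeds, iterations=20):
    """Spread polarity scores from seed words over an undirected synonym graph.

    edges: iterable of (word, word) synonym pairs (hypothetical toy data).
    seeds: dict mapping seed words to fixed scores (+1.0 positive, -1.0 negative).
    """
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)

    # Every non-seed word starts neutral.
    scores = {word: 0.0 for word in neighbours}
    scores.update({w: s for w, s in seeds.items() if w in neighbours})

    for _ in range(iterations):
        updated = {}
        for word, adjacent in neighbours.items():
            if word in seeds:
                updated[word] = seeds[word]  # seed scores stay clamped
            else:
                # A word takes the mean score of its neighbours.
                updated[word] = sum(scores[n] for n in adjacent) / len(adjacent)
        scores = updated
    return scores


# Toy graph, not taken from any of the Norwegian thesauruses.
edges = [("good", "fine"), ("fine", "decent"), ("bad", "poor"), ("poor", "awful")]
seeds = {"good": 1.0, "bad": -1.0}
lexicon = propagate_labels(edges, seeds)
```

    In this toy graph, words connected to "good" end up with positive scores and words connected to "bad" with negative scores; a real lexicon would threshold or rank these scores.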

    Automatically generating a sentiment lexicon for the Malay language

    This paper proposes an automated sentiment lexicon generation model specifically designed for the Malay language. Lexicon-based Sentiment Analysis (SA) models rely on a sentiment lexicon, a linguistic resource that comprises a priori information about the sentiment properties of words. A sentiment lexicon is an indispensable resource for SA tasks, as is evident from the large volume of research on sentiment lexicon generation algorithms. This is not the case for low-resource languages such as Malay, for which research in this area is lacking. This gap motivates the sentiment lexicon generation algorithm proposed here. WordNet Bahasa was first mapped onto the English WordNet to construct a multilingual word network. A seed set of prototypical positive and negative terms was then automatically expanded by recursively adding terms linked via WordNet’s synonymy and antonymy semantic relations. The underlying intuition is that the sentiment properties of terms added via these relations are preserved. A supervised classifier was employed for the word-polarity tagging task, with textual representations of the expanded seed set as features. Evaluation of the model against the General Inquirer lexicon as a benchmark demonstrates that it performs with reasonable accuracy. This paper aims to provide a foundation for further research for the Malay language in this area
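    The seed-expansion step described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the function name `expand_seeds` and the toy English dictionaries are hypothetical stand-ins for lookups in the WordNet Bahasa and English WordNet network, the traversal is written as an iterative breadth-first search rather than literal recursion, and the supervised classification stage is not covered.

```python
from collections import deque


def expand_seeds(positive, negative, synonyms, antonyms):
    """Expand seed terms through synonym and antonym links.

    Synonyms inherit the polarity of the term that reached them;
    antonyms receive the opposite polarity. The first label a term
    receives is kept.
    """
    polarity = {word: 1 for word in positive}
    polarity.update({word: -1 for word in negative})
    queue = deque(polarity)

    while queue:
        word = queue.popleft()
        sign = polarity[word]
        related = [(s, sign) for s in synonyms.get(word, ())]
        related += [(a, -sign) for a in antonyms.get(word, ())]
        for term, term_sign in related:
            if term not in polarity:
                polarity[term] = term_sign
                queue.append(term)
    return polarity


# Hypothetical toy relations standing in for WordNet lookups.
synonyms = {"good": ["nice"], "nice": ["pleasant"], "bad": ["nasty"]}
antonyms = {"pleasant": ["unpleasant"]}
expanded = expand_seeds(["good"], ["bad"], synonyms, antonyms)
```

    Keeping the first label a term receives is one simple way to handle conflicting paths; other policies (for example, voting across paths) would be equally plausible.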

    SENTIMENT ANALYSIS OF INSTAGRAM USERS TOWARD THE KEMDIKBUD POLICY ON INTERNET QUOTA ASSISTANCE USING THE SUPPORT VECTOR MACHINE (SVM) METHOD

    COVID-19 is a new pandemic caused by a coronavirus that has had many impacts, including on education, which has had to move to a distance learning system. To support that system, the Indonesian government, through Kemdikbud, provided students and educators with internet quota assistance. Some members of the public shared their responses and opinions about the government's quota assistance on social media, including Instagram. These opinions were used, through sentiment analysis, to determine whether the public's assessment of the quota assistance was positive or negative. The data used in this study are Instagram user comments on 7 posts by the @kemdikbud.ri account relating to internet quota assistance from 27 August to 30 September 2020, obtained by scraping, which yielded 4520 comments that were then processed with text preprocessing and classified using the support vector machine algorithm. After preprocessing, 32.81% of the data (1483 comments) were ready for sentiment analysis. The classification analysis used a C-classification model with a radial kernel SVM (radial basis function, RBF), and 61.5% of the comments were classified as positive sentiment. The RBF SVM model classified Instagram users' responses to the internet quota assistance reasonably well, as shown by model evaluation values of 79.67% accuracy, 78.89% sensitivity, and 81.82% specificity
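    For readers unfamiliar with the three evaluation measures reported above, the sketch below shows how accuracy, sensitivity, and specificity are computed from a binary confusion matrix. The abstract does not give the study's confusion matrix; the counts used here are hypothetical, chosen only because their ratios reproduce the reported percentages, and should not be read as the authors' data.

```python
def binary_metrics(tp, fn, tn, fp):
    """Return (accuracy, sensitivity, specificity) from confusion-matrix counts.

    tp/fn: positive comments classified correctly / incorrectly.
    tn/fp: negative comments classified correctly / incorrectly.
    """
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return accuracy, sensitivity, specificity


# Hypothetical counts (an assumption, not taken from the paper).
accuracy, sensitivity, specificity = binary_metrics(tp=71, fn=19, tn=27, fp=6)
percentages = tuple(round(100 * m, 2) for m in (accuracy, sensitivity, specificity))
```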

    Identifying Cyberbullying in Instagram Comments Using the Lexicon-Based Method and Naïve Bayes Classifier (Case Study: The 2019 Indonesian Presidential Election)

    In 2019 Indonesia was enlivened by a festival of democracy. The public welcomed the presidential election held in April 2019 with joy and great enthusiasm. The election was widely discussed both offline and online, particularly on the social media platform Instagram. Everyone was free to express opinions about each presidential candidate. The problem arises when opinions are expressed without regard for ethics, creating conflict between the supporters of each candidate pair. Comment wars that bullied, disparaged, or attacked opponents marked the situation. It is therefore necessary to identify cyberbullying in Instagram comments by classifying comments as cyberbullying or non-cyberbullying. The methods used in this study are a lexicon-based method and a learning-based method, namely the naïve Bayes classifier. The system begins with text preprocessing, consisting of cleaning, case folding, and stemming. Classification is then performed using the lexicon-based method and the naïve Bayes classifier, and the system outputs whether a comment is cyberbullying or non-cyberbullying. In this study, the lexicon-based method achieved 58% accuracy, 52% precision, 75% recall, and an F-score of 61%, while the naïve Bayes classifier achieved 97% accuracy, 94% precision, 100% recall, and an F1-score of 97%. Keywords: cyberbullying, Instagram, lexicon-based, naïve Bayes classifier
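    The reported F-scores follow from the reported precision and recall if the F-score is taken to be the usual F1 measure, the harmonic mean of the two. That interpretation is an assumption for the lexicon-based figure, which the abstract calls only "F-score". A short check, using a helper named `f1_score` purely for illustration:

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (the F1 measure)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract.
lexicon_f1 = round(f1_score(0.52, 0.75), 2)
naive_bayes_f1 = round(f1_score(0.94, 1.00), 2)
```

    Both values round to the figures given in the abstract (0.61 and 0.97), so the reported metrics are internally consistent.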

    Generate Adjective Sentiment Dictionary for Social Media Sentiment Analysis Using Constrained Nonnegative Matrix Factorization

    Although sentiment analysis has attracted a lot of research, little work has been done on social media data compared to product and movie reviews. This is due to the low accuracy that results from the more informal writing seen in social media data. Currently, most sentiment analysis tools for social media choose the lexicon-based approach over the machine learning approach, because the latter faces the huge challenge of obtaining enough human-labeled training data for extremely large-scale and diverse social opinion data. The lexicon-based approach requires a sentiment dictionary to determine opinion polarity. This dictionary can also provide useful features for any supervised learning method in the machine learning approach. However, many benchmark sentiment dictionaries do not cover the many informal and spoken words used in social media. In addition, they cannot be updated frequently enough to include words newly coined online. In this paper, we present an automatic sentiment dictionary generation method, the Constrained Symmetric Nonnegative Matrix Factorization (CSNMF) algorithm, which assigns polarity scores to each word in the dictionary using a large social media corpus from digg.com. Moreover, we present our Amazon Mechanical Turk (AMT) study of social media word polarity, using both the human-labeled dictionaries from AMT and the General Inquirer Lexicon as references against which to compare our generated dictionary. In our experiment, we show that combining links from both WordNet and the corpus to generate sentiment dictionaries outperforms using only one of them, and that words with higher sentiment scores yield better precision. Finally, we conducted a lexicon-based sentiment analysis on human-labeled social comments using our generated sentiment dictionary to show the effectiveness of our method

    Three Essays on Trust Mining in Online Social Networks

    This dissertation research consists of three essays on studying trust in online social networks. Trust plays a critical role in online social relationships, because of the high levels of risk and uncertainty involved. Guided by relevant social science and computational graph theories, I develop conceptual and predictive models to gain insights into trusting behaviors in online social relationships. In the first essay, I propose a conceptual model of trust formation in online social networks. This is the first study that integrates the existing graph-based view of trust formation in social networks with socio-psychological theories of trust to provide a richer understanding of trusting behaviors in online social networks. I introduce new behavioral antecedents of trusting behaviors and redefine and integrate existing graph-based concepts to develop the proposed conceptual model. The empirical findings indicate that both socio-psychological and graph-based trust-related factors should be considered in studying trust formation in online social networks. In the second essay, I propose a theory-based predictive model to predict trust and distrust links in online social networks. Previous trust prediction models used limited network structural data to predict future trust/distrust relationships, ignoring the underlying behavioral trust-inducing factors. I identify a comprehensive set of behavioral and structural predictors of trust/distrust links based on related theories, and then build multiple supervised classification models to predict trust/distrust links in online social networks. The empirical results confirm the superior fit and predictive performance of the proposed model over the baselines. In the third essay, I propose a lexicon-based text mining model to mine trust related user-generated content (UGC). This is the first theory-based text mining model to examine important factors in online trusting decisions from UGC. I build domain-specific trustworthiness lexicons for online social networks based on related behavioral foundations and text mining techniques. Next, I propose a lexicon-based text mining model that automatically extracts and classifies trustworthiness characteristics from trust reviews. The empirical evaluations show the superior performance of the proposed text mining system over the baselines

    Sentiment Analysis of Cyberbullying in Instagram Comments Using the Support Vector Machine Classification Method

    Instagram is the most popular social media platform today. Users ranging from children and teenagers to adults have boosted Instagram's popularity. However, the platform is not free from the danger of cyberbullying, which users often commit, particularly in the comment section. According to the statistics obtained, 42% of teenagers aged 12-20 have been victims of cyberbullying. The danger of cyberbullying worries many people because of its impact, so sentiment analysis can be applied to Instagram comment sections to determine the sentiment of each comment. Sentiment analysis is a branch of text mining used to extract, understand, and process text data. To determine the sentiment of each comment, Term Frequency-Inverse Document Frequency (TF-IDF) features and the Support Vector Machine (SVM) classification method are used. The document set contains 400 items collected offline, with a total of 1799 features. The comment documents were split into 70% training data and 30% test data. Based on the tests performed, the best parameters for the SVM method were a polynomial kernel degree of 2, a learning rate of 0.0001, and a maximum of 200 iterations. These tests yielded a highest accuracy of 90% with a split of 50% training data and 50% test data
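    TF-IDF has several common variants, and the abstract does not say which one was used. The sketch below assumes the plain textbook form, term frequency as count divided by document length and inverse document frequency as log(N / df), with whitespace tokenisation. The function name `tf_idf` and the example comments are hypothetical.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document.

    Assumed variant: tf = count / document length, idf = log(N / df).
    """
    tokenised = [doc.lower().split() for doc in documents]
    n_docs = len(tokenised)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for tokens in tokenised for term in set(tokens))

    weights = []
    for tokens in tokenised:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical comments, not drawn from the study's 400-comment data set.
comments = ["you are great", "you are awful", "great photo"]
vectors = tf_idf(comments)
```

    Terms that occur in fewer comments receive larger weights, which is why "awful" outweighs "you" in the second comment. The resulting vectors are the kind of feature representation an SVM classifier would then be trained on.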

    Learning domain-specific sentiment lexicons with applications to recommender systems

    Search is now going beyond looking for factual information, and people wish to search for the opinions of others to help them in their own decision-making. Users employ sentiment or opinion expressions to express their opinions, and these expressions embody important pieces of information, particularly in online commerce. The main problem that the present dissertation addresses is how to model text to find meaningful words that express a sentiment. In this context, I investigate the viability of automatically generating a sentiment lexicon for opinion retrieval and sentiment classification applications. For this research objective we propose to capture sentiment words that are derived from online users’ reviews. In this approach, we tackle a major challenge in sentiment analysis: detecting words that express subjective preference and domain-specific sentiment words such as jargon. To this end we present a fully generative method that automatically learns a domain-specific lexicon and is fully independent of external sources. Sentiment lexicons can be applied in a broad set of applications; however, popular recommendation algorithms have largely been disconnected from sentiment analysis. Therefore, we present a study that explores the viability of applying sentiment analysis techniques to infer ratings in a recommendation algorithm. Furthermore, an entity’s reputation is intrinsically associated with sentiment words that have a positive or negative relation with that entity. Hence, we provide a study that examines the viability of using a domain-specific lexicon to compute entities’ reputation. Finally, a recommendation system algorithm is improved with the use of sentiment-based ratings and entities’ reputation

    MELex: a new lexicon for sentiment analysis in mining public opinion of Malaysia affordable housing projects

    Sentiment analysis has potential as an analytical tool for understanding the preferences of the public. It has become one of the most active and increasingly popular areas in information retrieval and text mining. However, in the Malaysian context, sentiment analysis is still limited by the lack of a sentiment lexicon. Thus, the focus of this study is to construct a new lexicon and enhance the classification accuracy of sentiment analysis in mining public opinion on Malaysian affordable housing projects. The new lexicon for sentiment analysis is constructed using a bilingual and domain-specific sentiment lexicon approach. A detailed review of existing approaches was conducted, and a new bilingual sentiment lexicon known as MELex (Malay-English Lexicon) was generated. The developed approach is able to analyze text in the two most widely used languages in Malaysia, Malay and English, with better accuracy. The process of constructing MELex involves three activities: seed word selection, polarity assignment, and synonym expansion, and four different experiments were implemented. MELex is evaluated through experimentation and case studies, with PR1MA and PPAM selected as case projects. Based on comparative results over 2,230 test items, the study shows that classification using MELex outperforms the existing approaches, achieving accuracies of 90.02% and 89.17% for the PR1MA and PPAM projects, respectively. This indicates the capability of MELex in classifying public sentiment towards the PR1MA and PPAM housing projects. The study has shown promising and better results in the property domain than previous research. Hence, the lexicon-based approach implemented in this study reflects the reliability of the sentiment lexicon in classifying public sentiment