1,170 research outputs found

    Text Catagorization Using Hybrid Naïve Bayes Algorithm

    Automated text categorization and class prediction are important in text mining, where reducing the feature size speeds up the learning process of classifiers. Text classification is attracting growing interest in text-mining research. Correctly assigning a text to a particular category remains a challenge because of the very large number of features in the dataset. Among existing classification approaches, Naïve Bayes is a good candidate for a document classification model thanks to its simplicity. The aim of this project is to highlight the performance of Naïve Bayes for text categorization and class prediction in text classification.
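
    As an illustration of why Naïve Bayes is considered simple, the following is a minimal sketch of a plain multinomial Naïve Bayes classifier with add-one smoothing. It is not the hybrid variant the abstract refers to, which the abstract does not specify; the function names, categories, and toy documents are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs):
    """docs: list of (tokens, label). Returns log priors, term counts, vocabulary."""
    class_docs = Counter()
    term_counts = defaultdict(Counter)  # term_counts[label][token]
    vocab = set()
    for tokens, label in docs:
        class_docs[label] += 1
        term_counts[label].update(tokens)
        vocab.update(tokens)
    total = sum(class_docs.values())
    log_prior = {c: math.log(n / total) for c, n in class_docs.items()}
    return log_prior, term_counts, vocab


def predict(tokens, log_prior, term_counts, vocab):
    """Return the label with the highest posterior log-probability."""
    best_label, best_score = None, float("-inf")
    for label, prior in log_prior.items():
        counts = term_counts[label]
        denom = sum(counts.values()) + len(vocab)  # add-one (Laplace) smoothing
        score = prior
        for tok in tokens:
            if tok in vocab:  # ignore words never seen in training
                score += math.log((counts[tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy corpus with two categories.
train = [
    ("cheap pills buy now".split(), "spam"),
    ("buy cheap watches".split(), "spam"),
    ("meeting agenda attached".split(), "ham"),
    ("project meeting tomorrow".split(), "ham"),
]
model = train_naive_bayes(train)
print(predict("cheap watches now".split(), *model))
print(predict("project agenda".split(), *model))
```

    Working in log space avoids numeric underflow when many small word probabilities are multiplied, which matters for the large feature sets the abstract mentions.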

    An Overview on Implementation Using Hybrid Naïve Bayes Algorithm for Text Categorization

    Automated text categorization and class prediction are important in text mining, where reducing the feature size speeds up the learning process of classifiers. Text classification is attracting growing interest in text-mining research. Correctly assigning a text to a particular category remains a challenge because of the very large number of features in the dataset. Among existing classification approaches, Naïve Bayes is a good candidate for a document classification model thanks to its simplicity. The aim of this project is to highlight the performance of Naïve Bayes for text categorization and class prediction in text classification.

    Classification of Public Complaints about Public Services in the Radar Tarakan Daily

    The website of the daily newspaper Radar Tarakan has a column titled “Warga Menulis” (“Citizens Write”), which lets readers submit complaints or aspirations. The problem is that the readers' messages or opinions are displayed as-is: only the opinion text as sent by the reader, with no additional information about whom the opinion is addressed to. The aim of this study is to classify opinion data from the Radar Tarakan website, in particular opinions concerning public facilities and services. Classification is the process of grouping data into previously defined classes or categories. The hypothesis is that the classification results will reach an accuracy of up to 70%. The first stage of the classification process is preprocessing, which includes case folding, tokenizing, word conversion, stopword removal (filtering), and stemming. The algorithm used in this study is the Frequency Ratio Accumulation Method (FRAM). The application was built with the PHP programming language and a MySQL database. Test results show that the average accuracy obtained when classifying opinions with the FRAM algorithm is 60%. The accuracy depends on the amount of training data used: more training data can increase accuracy, but it also affects the efficiency of system performance.
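
    The preprocessing and classification steps described above can be sketched as follows. The abstract does not give the FRAM formulas, so this assumes one common reading: each term's relative frequency within a category is divided by the sum of its relative frequencies over all categories, and a document is assigned to the category with the largest accumulated ratio. The study's application was written in PHP; Python is used here only for illustration. The stopword list, category names, and example complaints are hypothetical, and stemming and word conversion are omitted.

```python
import re
from collections import Counter, defaultdict

# Hypothetical stopword list; a real system would use a full language-specific list.
STOPWORDS = {"the", "is", "a", "of", "in", "and", "to", "on", "are", "our", "up", "not"}


def preprocess(text):
    """Case folding, tokenizing, and stopword removal (stemming omitted)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def train_fram(labelled_texts):
    """Return ratios[term][category]; each term's ratios sum to 1 (assumed formulation)."""
    freq = defaultdict(Counter)  # freq[category][term]
    for text, category in labelled_texts:
        freq[category].update(preprocess(text))
    # Relative frequency of each term within its own category.
    rel = {}
    for category, counts in freq.items():
        total = sum(counts.values())
        rel[category] = {t: n / total for t, n in counts.items()}
    ratios = defaultdict(dict)
    for category, terms in rel.items():
        for term, r in terms.items():
            across = sum(rel[other].get(term, 0.0) for other in rel)
            ratios[term][category] = r / across
    return ratios, sorted(rel)


def classify(text, ratios, categories):
    """Accumulate frequency ratios per category and return the best category."""
    scores = {c: 0.0 for c in categories}
    for term in preprocess(text):
        for category, fr in ratios.get(term, {}).items():
            scores[category] += fr
    return max(scores, key=scores.get)


train = [
    ("The street lights on our road are broken", "infrastructure"),
    ("Potholes in the road damage cars", "infrastructure"),
    ("Garbage is piling up and the bins are not emptied", "sanitation"),
    ("The garbage truck skips our street", "sanitation"),
]
ratios, categories = train_fram(train)
print(classify("Broken lights and potholes", ratios, categories))
print(classify("Garbage bins overflowing", ratios, categories))
```

    Because every training document adds term counts to its category, the ratios become more reliable as the training set grows, at the cost of a larger term table to store and scan.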

    Holy Tweets: Exploring the Sharing of the Quran on Twitter

    While social media offer users a platform for self-expression, identity exploration, and community management, among other functions, they also offer space for religious practice and expression. In this paper, we explore social media spaces as they subtend new forms of religious experiences and rituals. We present a mixed-method study to understand the practice of sharing Quran verses on Arabic Twitter in their cultural context by combining a quantitative analysis of the most shared Quran verses, the topics covered by these verses, and the modalities of sharing, with a qualitative study of users' goals. This analysis of a set of 2.6 million tweets containing Quran verses demonstrates that online religious expression in the form of sharing Quran verses both extends offline religious life and supports new forms of religious expression, including goals such as doing good deeds, giving charity, holding memorials, and showing solidarity. By analysing the responses to a survey, we found that our Arab Muslim respondents conceptualize social media platforms as everlasting, at least beyond their lifetimes, and consider them effective for certain religious practices, such as reciting the Quran, supplication (dua), and ceaseless charity. Our quantitative analysis of the most shared verses of the Quran underlines this commitment to religious expression as an act of worship, highlighting topics such as the hereafter, God's mercy, and sharia law. We note that verses on topics such as jihad are shared much less often, contradicting some media representations of Muslim social media use and practice. Comment: Paper accepted to The 23rd ACM Conference on Computer-Supported Cooperative Work and Social Computing (CSCW) 202

    Detecting Political Framing Shifts and the Adversarial Phrases within Rival Factions and Ranking Temporal Snapshot Contents in Social Media

    Social Computing is an area of computer science concerned with the dynamics of communities and cultures created through computer-mediated social interaction. Various social media platforms, such as social network services and microblogging, enable users to come together and create social movements expressing their opinions on diverse sets of issues, events, complaints, grievances, and goals. Methods for monitoring and summarizing these types of sociopolitical trends, their leaders and followers, messages, and dynamics are needed. In this dissertation, a framework comprising community and content-based computational methods is presented to provide insights for multilingual and noisy political social media content. First, a model is developed to predict the emergence of viral hashtag breakouts, using network features. Next, another model is developed to detect and compare individual and organizational accounts, by using a set of domain and language-independent features. The third model exposes contentious issues driving reactionary dynamics between opposing camps. The fourth model develops community detection and visualization methods to reveal underlying dynamics and the key messages that drive them. The final model presents a use case methodology for detecting and monitoring foreign influence, wherein a state actor and news media under its control attempt to shift public opinion by framing information to support multiple adversarial narratives that facilitate their goals. In each case, a discussion of novel aspects and contributions of the models is presented, as well as quantitative and qualitative evaluations. An analysis of multiple conflict situations will be conducted, covering areas in the UK, Bangladesh, Libya and the Ukraine where adversarial framing led to polarization, declines in social cohesion, social unrest, and even civil wars (e.g., Libya and the Ukraine). Dissertation/Thesis. Doctoral Dissertation, Computer Science 201

    Geospatial Analysis and Modeling of Textual Descriptions of Pre-modern Geography

    Textual descriptions of pre-modern geography offer a different view of classical geography. The descriptions were produced when none of the modern geographical concepts and tools were available. In this dissertation, we study pre-modern geography primarily by finding the existing structures of the descriptions and the different cases of geographical data. We first explain four major geographical cases in pre-modern Arabic sources: gazetteers, administrative hierarchies, routes, and toponyms associated with people. Focusing on hierarchical divisions and routes, we offer approaches for manual annotation of administrative hierarchies and route sections as well as a semi-automated toponym annotation. The latter starts with a fuzzy search of toponyms from an authority list and applies two different extrapolation models to infer true or false values, based on the context, for disambiguating the automatically annotated toponyms. With the annotated data, we introduce mathematical models to shape and visualize regions based on the description of administrative hierarchies. Moreover, we offer models for comparing hierarchical divisions and route networks from different sources. We also suggest approaches to approximate geographical coordinates for places that do not have them (we call them unknown places), which is a major issue in the visualization of pre-modern places on a map. The final chapter of the dissertation introduces the new version of al-Ṯurayyā, a gazetteer and a spatial model of the classical Islamic world using georeferenced data of a pre-modern atlas with more than 2,000 toponyms and routes. It offers search, path finding, and flood network functionalities as well as visualizations of regions using one of the models that we describe for regions. Although the gazetteer is designed using data from the classical Islamic world, the spatial model and features can be used for similarly prepared datasets.

    Contents:
    1 Introduction
    2 Related Work
    2.1 GIS
    2.2 NLP, Georeferencing, Geoparsing, Annotation
    2.3 Gazetteer
    2.4 Modeling
    3 Classical Geographical Cases
    3.1 Gazetteer
    3.2 Routes and Travelogues
    3.3 Administrative Hierarchy
    3.4 Geographical Aspects of Biographical Data
    4 Annotation and Extraction
    4.1 Annotation
    4.1.1 Manual Annotation of Geographical Texts
    4.1.1.1 Administrative Hierarchy
    4.1.1.2 Routes and Travelogues
    4.1.2 Semi-Automatic Toponym Annotation
    4.1.2.1 The Annotation Process
    4.1.2.2 Extrapolation Models
    4.1.2.2.1 Frequency of Toponymic N-grams
    4.1.2.2.2 Co-occurrence Frequencies
    4.1.2.2.3 A Supervised ML Approach
    4.1.2.3 Summary
    4.2 Data Extraction and Structures
    4.2.1 Administrative Hierarchy
    4.2.2 Routes and Distances
    5 Modeling Geographical Data
    5.1 Mathematical Models for Administrative Hierarchies
    5.1.1 Sample Data
    5.1.2 Quadtree
    5.1.3 Voronoi Diagram
    5.1.4 Voronoi Clippings
    5.1.4.1 Convex Hull
    5.1.4.2 Concave Hull
    5.1.5 Convex Hulls
    5.1.6 Concave Hulls
    5.1.7 Route Network
    5.1.8 Summary of Models for Administrative Hierarchy
    5.2 Comparison Models
    5.2.1 Hierarchical Data
    5.2.1.1 Test Data
    5.2.2 Route Networks
    5.2.2.1 Post-processing
    5.2.2.2 Applications
    5.3 Unknown Places
    6 Al-Ṯurayyā
    6.1 Introducing al-Ṯurayyā
    6.2 Gazetteer
    6.3 Spatial Model
    6.3.1 Provinces and Administrative Divisions
    6.3.2 Pathfinding and Itineraries
    6.3.3 Flood Network
    6.3.4 Path Alignment Tool
    6.3.5 Data Structure
    6.3.5.1 Places
    6.3.5.2 Routes and Distances
    7 Conclusions and Further Work
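
    One of the region-shaping models named in the dissertation outline is the convex hull. As a rough illustration of the idea only, and not the dissertation's own implementation, the sketch below computes the convex hull of the places assigned to one administrative region using Andrew's monotone chain algorithm. The coordinates are hypothetical and are treated as points on a plane, which is an assumption that ignores map projection.

```python
def cross(o, a, b):
    """Z-component of the cross product OA x OB; positive for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Return hull vertices in counter-clockwise order (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        # Drop the last vertex while it would make a clockwise or straight turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other chain.
    return lower[:-1] + upper[:-1]


# Hypothetical places of one region as planar (x, y) pairs.
region_places = [(0, 0), (4, 0), (4, 3), (0, 3), (2, 1), (1, 2)]
print(convex_hull(region_places))
```

    A convex hull always encloses every place of the region but can also cover empty space between them, which is one plausible reason the outline lists concave hulls and Voronoi-based models as alternatives.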

    Unsupervised learning of Arabic non-concatenative morphology

    Unsupervised approaches to learning the morphology of a language play an important role in computer processing of language from a practical and theoretical perspective, due to their minimal reliance on manually produced linguistic resources and human annotation. Such approaches have been widely researched for the problem of concatenative affixation, but less attention has been paid to the intercalated (non-concatenative) morphology exhibited by Arabic and other Semitic languages. The aim of this research is to learn the root and pattern morphology of Arabic, with accuracy comparable to manually built morphological analysis systems. The approach is kept free from human supervision or manual parameter settings, assuming only that roots and patterns intertwine to form a word. Promising results were obtained by applying a technique adapted from previous work in concatenative morphology learning, which uses machine learning to determine relatedness between words. The output, with probabilistic relatedness values between words, was then used to rank all possible roots and patterns to form a lexicon. Analysis using trilateral roots resulted in correct root identification accuracy of approximately 86% for inflected words. Although the machine learning-based approach is effective, it is conceptually complex. An alternative, simpler and computationally efficient approach was therefore devised to obtain morpheme scores based on comparative counts of roots and patterns. In this approach, root and pattern scores are defined in terms of each other in a mutually recursive relationship, converging to an optimized morpheme ranking. This technique gives slightly better accuracy while being conceptually simpler and more efficient. The approach, after further enhancements, was evaluated on a version of the Quranic Arabic Corpus, attaining a final accuracy of approximately 93%. A comparative evaluation shows this to be superior to two existing, widely used, manually built Arabic stemmers, thus demonstrating the practical feasibility of unsupervised learning of non-concatenative morphology.
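
    The mutually recursive scoring of roots and patterns can be illustrated with a toy sketch. The abstract gives no formulas, so the update rule below is an assumption chosen for illustration: a root's score is the sum of the scores of the patterns it co-occurs with, a pattern's score is the sum of the scores of its roots, and both are normalised on every pass. The candidate generation (any three letters of a word, in order, form a possible root) and the transliterated words are likewise hypothetical, and words are assumed to have at least three letters.

```python
from itertools import combinations
from collections import defaultdict


def candidates(word):
    """All (root, pattern) splits that use three letters of the word, in order."""
    out = []
    for idx in combinations(range(len(word)), 3):
        root = "".join(word[i] for i in idx)
        pattern = "".join("_" if i in idx else ch for i, ch in enumerate(word))
        out.append((root, pattern))
    return out


def normalise(scores):
    total = sum(scores.values())
    return {k: v / total for k, v in scores.items()}


def score_morphemes(words, iterations=20):
    """Iterate the assumed mutual update until the ranking stabilises."""
    analyses = [a for w in sorted(set(words)) for a in candidates(w)]
    roots = normalise({r: 1.0 for r, _ in analyses})
    patterns = normalise({p: 1.0 for _, p in analyses})
    for _ in range(iterations):
        new_roots = defaultdict(float)
        for r, p in analyses:
            new_roots[r] += patterns[p]  # roots are scored by their patterns
        roots = normalise(new_roots)
        new_patterns = defaultdict(float)
        for r, p in analyses:
            new_patterns[p] += roots[r]  # patterns are scored by their roots
        patterns = normalise(new_patterns)
    return roots, patterns


def best_analysis(word, roots, patterns):
    """Pick the split whose root and pattern scores have the largest product."""
    return max(candidates(word), key=lambda rp: roots[rp[0]] * patterns[rp[1]])


# Hypothetical unvocalised transliterations built on the roots k-t-b and d-r-s.
words = ["ktb", "kAtb", "drs", "dArs"]
roots, patterns = score_morphemes(words)
print(best_analysis("kAtb", roots, patterns))
```

    In this toy lexicon the roots `ktb` and `drs` both combine with the patterns `___` and `_A__`, so they reinforce each other on every pass, while spurious splits such as `kAt` with `___b` occur only once and fade.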

    Sentiment analysis in Arabic: opinion polarity detection

    With International Doctorate Mention. Sentiment analysis is becoming increasingly important due to the growing popularity of Web 2.0. This thesis studies several aspects of sentiment analysis. The first objective is to analyze opinions written in Arabic and predict their polarity. To achieve this, two corpora were generated: OCA, an opinion corpus of Arabic movie reviews, and EVOCA, a parallel corpus containing the English translation of the OCA opinions. A second objective is sentiment analysis adapted to different domains. For this purpose, the SINAI-SA corpus was created and used together with other corpora, applying several machine learning techniques. The SINAI corpus was also used to study how short comments behave as textual information for predicting customer ratings. Finally, the thesis addresses the question of how to treat neutral reviews. Two main approaches were investigated, one based on semantic orientation and the other based on machine learning algorithms such as SVM or NB. Thesis, Univ. Jaén, Departamento de Informática, defended on 7 October 201