
    Text Mining Infrastructure in R

    During the last decade text mining has become a widely used discipline utilizing statistical and machine learning methods. We present the tm package which provides a framework for text mining applications within R. We give a survey on text mining facilities in R and explain how typical application tasks can be carried out using our framework. We present techniques for count-based analysis methods, text clustering, text classification and string kernels.
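The count-based analyses mentioned in this abstract usually start from a document-term matrix. The following is a minimal sketch in plain Python, not the tm package's R API; the helper names `tokenize` and `document_term_matrix` and the two sample documents are assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def document_term_matrix(docs):
    """Return (vocab, rows), where rows[i][j] counts term vocab[j] in docs[i]."""
    counts = [Counter(tokenize(d)) for d in docs]
    # The vocabulary is the sorted union of all terms seen in any document.
    vocab = sorted(set().union(*counts))
    rows = [[c[term] for term in vocab] for c in counts]
    return vocab, rows


# Hypothetical sample corpus, not taken from the paper.
docs = ["Text mining in R", "Mining text with text kernels"]
vocab, rows = document_term_matrix(docs)
```

Each row is one document and each column one term, so row sums give document lengths and column sums give corpus-wide term frequencies.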

    Baybayin Character Instance Detection

    The Philippine Government recently passed the "National Writing System Act," which promotes using Baybayin in Philippine texts. In support of this effort to promote the use of Baybayin, we present a computer vision system which can aid individuals who cannot easily read Baybayin script. In this paper, we survey the existing methods of identifying Baybayin scripts using computer vision and machine learning techniques and discuss their capabilities and limitations. Further, we propose a Baybayin Optical Character Instance Segmentation and Classification model using state-of-the-art Convolutional Neural Networks (CNNs) that detects Baybayin character instances in an image and then outputs the Latin alphabet counterpart of each character instance. Most existing systems are limited to character-level image classification and often misclassify or do not natively support characters with diacritics. In addition, these existing models often have specific input requirements, such as constraints on clarity and contrast, that limit them to classifying Baybayin text in a controlled setting. To our knowledge, our proposed method is the first end-to-end character instance detection model for Baybayin, achieving a mAP50 score of 93.30%, a mAP50-95 score of 80.50%, and an F1-Score of 84.84%.
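The reported metrics rest on two standard quantities: the intersection over union (IoU) between a predicted and a ground-truth box (mAP50 counts a detection as correct at IoU of at least 0.5), and the F1 score derived from true positives, false positives and false negatives. The sketch below illustrates both under the assumption that boxes are `(x1, y1, x2, y2)` tuples; it is not the authors' evaluation code, and the function names are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Overlap is zero when the boxes do not intersect.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall from raw detection counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For example, two 2x2 boxes offset by one unit in each direction overlap in a 1x1 square, giving an IoU of 1/7, which would not count as a match at the 0.5 threshold.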

    Theory-enhanced automation of the digital publics' relationship assessments

    This dissertation aims to develop a Machine Learning (ML) method for automating the assessment of digital public relations by incorporating the Organization-Public Relationship Assessment (OPRA) developed from public relations theory. The study targets customers/consumers and employees. For methods, Natural Language Processing (NLP) techniques, specifically text embedding and classification, are used to analyze the crawled data and three survey datasets. The results demonstrate that TF-IDF, BERT embedding, and the SVM classification model perform best. The case study outcomes using TripAdvisor and Glassdoor review data validate the previous results. This dissertation project can serve as a pioneering effort to enhance the theoretical foundation of most current data analytics tools in public relations.
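TF-IDF, one of the text representations named above, weights a term by how often it occurs in a document and how rare it is across the corpus. The sketch below assumes one common variant (relative term frequency times `log(N / df)`); the dissertation may use a different weighting, and the function name and sample reviews are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Assumed variant: tf = count / document length, idf = log(N / df).
    """
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights.append(
            {t: (c / len(tokens)) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return weights


# Hypothetical review snippets, not data from the study.
reviews = ["great staff great location", "rude staff", "great value"]
weights = tf_idf(reviews)
```

Under this variant a term that appears in every document receives weight zero, so the resulting vectors emphasize the words that distinguish one review from another before they are passed to a classifier such as an SVM.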

    A Survey on Various Sentiment Analysis Approaches and Its Challenges

    Sentiment analysis is a broad research area in both academia and business. The term sentiment refers to a person's feelings or opinions about a particular domain; hence the field is also known as opinion mining. It concerns subjective impressions of the domain, not facts. Sentiment can be expressed in terms of polarity or reviews, or, as in earlier systems, by thumbs up and thumbs down to denote positive and negative sentiment respectively. Sentiments can be analyzed using NLP, statistics or machine learning techniques. Sentiment analysis may address questions such as “customer satisfaction and dissatisfaction” or “public opinion towards the newly launched iPhone series”. In the real world, public or consumer opinions about a product or brand are very important for its sales. Hence sentiment analysis is a very important research area for real-life applications such as decision making. Although various methods have been introduced for performing sentiment analysis, they are still not efficient at extracting sentiment features from a given text. Naïve Bayes, Support Vector Machine and Maximum Entropy are machine learning algorithms used for sentiment analysis, but they offer only a limited set of sentiment classification categories, ranging between positive and negative. In particular, supervised and unsupervised algorithms have only limited accuracy in handling polarity shift and the binary classification problem. Despite advances in sentiment analysis techniques, various issues remain that keep the analysis from being accurate and efficient. This paper therefore presents a detailed survey of sentiment analysis methodologies and approaches, which should help readers gain a clear understanding of them. It describes the applications, techniques and challenges of sentiment analysis. Keywords: Sentiment Analysis, Decision Making, Opinion Mining, Machine Learning, NL
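To make the binary positive/negative setting concrete, here is a minimal sketch of a multinomial Naïve Bayes classifier with add-one (Laplace) smoothing in plain Python. It is an illustration under stated assumptions, not a method taken from the survey: the function names, the whitespace tokenizer and the four training sentences are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(samples):
    """Fit word and class counts from a list of (text, label) pairs."""
    class_docs = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in samples:
        tokens = text.lower().split()
        class_docs[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return {"class_docs": class_docs, "word_counts": word_counts, "vocab": vocab}


def predict(model, text):
    """Return the label with the highest smoothed log-posterior."""
    total_docs = sum(model["class_docs"].values())
    v = len(model["vocab"])
    best_label, best_score = None, -math.inf
    for label, n_docs in model["class_docs"].items():
        counts = model["word_counts"][label]
        total = sum(counts.values())
        score = math.log(n_docs / total_docs)  # log prior
        for tok in text.lower().split():
            if tok in model["vocab"]:  # ignore words never seen in training
                score += math.log((counts[tok] + 1) / (total + v))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy training data.
model = train_naive_bayes([
    ("good great phone", "pos"),
    ("love this phone", "pos"),
    ("bad terrible battery", "neg"),
    ("hate this battery", "neg"),
])
```

Because each word contributes independently, a bag-of-words model like this one has no way to represent negation such as "not good", which is one form of the polarity-shift problem the abstract mentions.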

    One-Class Classification: Taxonomy of Study and Review of Techniques

    One-class classification (OCC) algorithms aim to build classification models when the negative class is either absent, poorly sampled or not well defined. This unique situation constrains the learning of efficient classifiers, because the class boundary must be defined with knowledge of the positive class alone. The OCC problem has been considered and applied under many research themes, such as outlier/novelty detection and concept learning. In this paper we present a unified view of the general problem of OCC by presenting a taxonomy of study for OCC problems, which is based on the availability of training data, the algorithms used and the application domains. We further delve into each of the categories of the proposed taxonomy and present a comprehensive literature review of the OCC algorithms, techniques and methodologies with a focus on their significance, limitations and applications. We conclude our paper by discussing some open research problems in the field of OCC and present our vision for future research. Comment: 24 pages + 11 pages of references, 8 figure
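The core idea of learning a boundary from positive examples only can be sketched with a deliberately simple rule: enclose the training points in a sphere around their centroid and reject anything outside it. This is an illustrative baseline chosen for brevity, not a specific algorithm from the review; the function names and the `quantile` parameter are assumptions.

```python
import math


def fit_one_class(positives, quantile=1.0):
    """Fit a centroid and radius from positive-class points only.

    The radius is the given quantile of training distances to the centroid;
    quantile=1.0 encloses every training point.
    """
    dim = len(positives[0])
    centroid = [sum(p[i] for p in positives) / len(positives) for i in range(dim)]
    dists = sorted(math.dist(p, centroid) for p in positives)
    idx = min(len(dists) - 1, max(0, math.ceil(quantile * len(dists)) - 1))
    return centroid, dists[idx]


def is_positive(model, x):
    """Accept x if it lies within the fitted radius of the centroid."""
    centroid, radius = model
    return math.dist(x, centroid) <= radius


# Hypothetical positive-class samples; no negative examples are needed.
model = fit_one_class([(0, 0), (1, 0), (0, 1), (1, 1)])
```

Lowering `quantile` below 1.0 shrinks the sphere so that a fraction of the training points fall outside it, which trades some false rejections for robustness to outliers in the positive sample.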