Text Mining Infrastructure in R

Authors: David Meyer, Ingo Feinerer, Kurt Hornik

During the last decade text mining has become a widely used discipline utilizing statistical and machine learning methods. We present the tm package which provides a framework for text mining applications within R. We give a survey on text mining facilities in R and explain how typical application tasks can be carried out using our framework. We present techniques for count-based analysis methods, text clustering, text classification and string kernels.

Baybayin Character Instance Detection

Authors: Amoguis Adriel Isaiah V., Cordel II Macario O., Flores IV Benito Miguel D., Madrid Gian Joseph B.
Publication date: 19/04/2023

The Philippine Government recently passed the "National Writing System Act," which promotes the use of Baybayin in Philippine texts. In support of this effort, we present a computer vision system that can aid individuals who cannot easily read Baybayin script. In this paper, we survey the existing methods of identifying Baybayin script using computer vision and machine learning techniques and discuss their capabilities and limitations. Further, we propose a Baybayin Optical Character Instance Segmentation and Classification model using state-of-the-art Convolutional Neural Networks (CNNs) that detects Baybayin character instances in an image and then outputs the Latin alphabet counterpart of each character instance. Most existing systems are limited to character-level image classification and often misclassify, or do not natively support, characters with diacritics. In addition, these existing models often have specific input requirements, such as constraints on clarity and contrast, that limit them to classifying Baybayin text in a controlled setting. To our knowledge, our proposed method is the first end-to-end character instance detection model for Baybayin, achieving a mAP50 score of 93.30%, a mAP50-95 score of 80.50%, and an F1-score of 84.84%.

arXiv.org e-Print Archive

Theory-enhanced automation of the digital publics' relationship assessments

Author: Lee Hyelim
Publication date: 12/07/2023

The current dissertation aims to develop a Machine Learning (ML) method for automating the assessment of digital public relations by incorporating the Organization-Public Relationship Assessment (OPRA), which was developed from public relations theory. The study targets customers/consumers and employees. As methods, Natural Language Processing (NLP) techniques, specifically text embedding and classification, are used to analyze the crawled data and three survey datasets. The results demonstrate that TF-IDF, BERT embedding, and the SVM classification model perform best. The case study outcomes using TripAdvisor and Glassdoor review data validate the previous results. This dissertation project can serve as a pioneering effort to enhance the theoretical foundation of most current data analytics tools in public relations.

SHAREOK repository

A Survey on Various Sentiment Analysis Approaches and Its Challenges

Author: Mohbey Pratibha
Publication venue: The International Institute for Science, Technology and Education (IISTE)
Publication date: 30/07/2018

Sentiment analysis is a broad research area in both academia and business. The term sentiment refers to a person's feelings or opinions about a particular domain; hence sentiment analysis is also known as opinion mining. It concerns subjective impressions of the domain, not facts. Sentiment can be expressed in terms of polarity or reviews or, formerly, by thumbs up and thumbs down to denote positive and negative sentiment respectively. Sentiments can be analyzed using NLP, statistics, or machine learning techniques. Sentiment analysis may address questions such as “customer satisfaction and dissatisfaction” or “public opinion towards the newly launched iPhone series”. In the real world, public or consumer opinions about a product or brand are very important for its sales. Hence sentiment analysis is a very important research area for real-life applications such as decision making. Although various methods have been introduced for performing sentiment analysis, they are still not efficient at extracting sentiment features from a given text. Naïve Bayes, Support Vector Machine, and Maximum Entropy are machine learning algorithms used for sentiment analysis, but they offer only a limited set of sentiment classification categories, ranging between positive and negative. In particular, supervised and unsupervised algorithms have only limited accuracy in handling polarity shift and the binary classification problem. Despite advances in sentiment analysis techniques, various issues remain that prevent the analysis from being accurate and efficient. This paper therefore presents a detailed survey of sentiment analysis methodologies and approaches, which will help readers gain a clear understanding of them. The paper describes different applications, techniques, and challenges of sentiment analysis. Keywords: Sentiment Analysis, Decision Making, Opinion Mining, Machine Learning, NL

One-Class Classification: Taxonomy of Study and Review of Techniques

Authors: Khan Shehroz S., Madden Michael G.
Publication venue: Cambridge University Press (CUP)
Publication date: 29/11/2013

One-class classification (OCC) algorithms aim to build classification models when the negative class is either absent, poorly sampled or not well defined. This unique situation constrains the learning of efficient classifiers, because the class boundary must be defined using knowledge of the positive class alone. The OCC problem has been considered and applied under many research themes, such as outlier/novelty detection and concept learning. In this paper we present a unified view of the general problem of OCC by presenting a taxonomy of study for OCC problems, which is based on the availability of training data, the algorithms used and the application domains. We further delve into each of the categories of the proposed taxonomy and present a comprehensive literature review of the OCC algorithms, techniques and methodologies with a focus on their significance, limitations and applications. We conclude our paper by discussing some open research problems in the field of OCC and present our vision for future research. Comment: 24 pages + 11 pages of references, 8 figures

arXiv.org e-Print Archive

Access to Research at National University of Ireland, Galway