
6 research outputs found

A Machine Learning Ensemble Model for the Detection of Cyberbullying

Author: Alqahtani Abulkarim Faraj
Ilyas Mohammad
Publication date: 19/02/2024

The pervasive use of social media platforms, such as Facebook, Instagram, and X, has significantly amplified our electronic interconnectedness. Moreover, these platforms are now easily accessible from any location at any time. However, the increased popularity of social media has also led to cyberbullying. It is imperative to find, monitor, and mitigate cyberbullying posts on social media platforms. Motivated by this necessity, we present this paper to contribute to the development of an automated system for the binary classification of tweets as aggressive or non-aggressive. We employed the stacking ensemble machine learning method with four different feature extraction techniques to optimize performance within the stacking ensemble learning framework. Combining five machine learning algorithms (Decision Trees, Random Forest, Linear Support Vector Classification, Logistic Regression, and K-Nearest Neighbors) into an ensemble, we achieved superior results compared to traditional machine learning classifiers. The stacking classifier achieved an accuracy of 94.00% (0.94) in classifying tweets as aggressive or non-aggressive, outperforming traditional machine learning models and surpassing the results of prior experiments on the same dataset.
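As a rough illustration of the stacking idea described in this abstract, the sketch below combines the five named base learners with scikit-learn's `StackingClassifier`. The toy texts, the TF-IDF features, the hyperparameters, and the logistic-regression meta-learner are all assumptions made for this example; the abstract does not state which feature extractors or meta-learner the authors used.

```python
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data: 1 = aggressive, 0 = non-aggressive.
texts = [
    "you are so stupid", "nobody likes you loser", "shut up idiot",
    "you are pathetic and ugly", "go away you loser", "what an idiot you are",
    "have a nice day", "thanks for the help", "great game last night",
    "see you at lunch", "love this song", "congrats on the new job",
]
labels = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

# The five base learners named in the abstract (settings are assumptions).
base_learners = [
    ("dt", DecisionTreeClassifier(random_state=0)),
    ("rf", RandomForestClassifier(n_estimators=25, random_state=0)),
    ("svc", LinearSVC()),
    ("lr", LogisticRegression(max_iter=1000)),
    ("knn", KNeighborsClassifier(n_neighbors=3)),
]

# The meta-learner is trained on cross-validated outputs of the base learners.
# Logistic regression as the meta-learner is an assumption of this sketch.
model = make_pipeline(
    TfidfVectorizer(),
    StackingClassifier(
        estimators=base_learners,
        final_estimator=LogisticRegression(),
        cv=3,
    ),
)
model.fit(texts, labels)
predictions = model.predict(["you are a pathetic loser", "have a great day"])
print(predictions)
```

With so little data the predictions are not meaningful; the point is only the structure, in which base-learner outputs become the input features of a second-level classifier.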

arXiv.org e-Print Archive

Measuring interdisciplinary interactions using citation analysis and semantic analysis

Author: Chen X
Huang L
Ni X
Zhang Y
Publication date: 29/06/2022

Interdisciplinary interaction and integration have become a major feature of the current development of science and technology. How to measure the strength of interdisciplinary interactions between two disciplines is a crucial issue. In our study, we propose a novel measurement framework based on both citation analytics and semantic analytics, which integrates three indicators: direct citation, bibliographic coupling, and research content. In particular, an LDA model is combined with a word embedding model to create a semantic solution that effectively constructs discipline-keyword vectors from bibliometric data. Finally, the entropy method is applied to these three indicators to assess the strength of interdisciplinary interactions. The interactions between information science & library science and six other subjects are analyzed as a case study to demonstrate the reliability of the methodology, with subsequent empirical validations.
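The entropy method mentioned in this abstract is commonly implemented as the entropy weight method: an indicator whose values vary more across the compared items receives a larger weight. The sketch below shows that standard calculation in plain Python. The function names and the indicator values are hypothetical, the values are assumed to be already scaled to [0, 1] with a positive sum per column, and the paper's exact normalization may differ.

```python
import math


def entropy_weights(matrix):
    """Return one weight per indicator (column) using the entropy weight method.

    `matrix` has one row per discipline pair and one column per indicator.
    Assumes non-negative values, a positive sum per column, and at least two rows.
    """
    n_rows = len(matrix)
    n_cols = len(matrix[0])
    diversity = []
    for j in range(n_cols):
        column = [row[j] for row in matrix]
        total = sum(column)
        proportions = [value / total for value in column]
        # Shannon entropy of the column, normalized to [0, 1] by log(n_rows).
        entropy = -sum(p * math.log(p) for p in proportions if p > 0) / math.log(n_rows)
        # Clamp to guard against tiny negative values from rounding.
        diversity.append(max(0.0, 1.0 - entropy))
    total_diversity = sum(diversity)
    if total_diversity == 0:
        # Every indicator is uniform, so fall back to equal weights.
        return [1.0 / n_cols] * n_cols
    return [d / total_diversity for d in diversity]


def interaction_strength(matrix, weights):
    """Weighted sum of the indicators for each row."""
    return [sum(w * v for w, v in zip(weights, row)) for row in matrix]


# Hypothetical scaled indicators: direct citation, bibliographic coupling, content.
indicators = [
    [0.9, 0.5, 0.1],
    [0.1, 0.5, 0.2],
    [0.5, 0.5, 0.9],
]
weights = entropy_weights(indicators)
strengths = interaction_strength(indicators, weights)
print(weights, strengths)
```

In this toy input the second indicator is identical for every row, so it carries no discriminating information and its weight is effectively zero.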

OPUS - University of Technology Sydney

Tourism Review Sentiment Classification Using a Bidirectional Recurrent Neural Network with an Attention Mechanism and Topic-Enriched Word Vectors

Author: Hu Jianjun
Hu Jie
Li Qin
Li Shaobo
Zhang Sen
Publication venue: Scholar Commons
Publication date: 17/09/2018

Sentiment analysis of online tourist reviews is playing an increasingly important role in tourism. Accurately capturing the attitudes of tourists regarding different aspects of scenic sites, or the overall polarity of their online reviews, is key to tourism analysis and application. However, the performance of current document sentiment analysis methods is not satisfactory, as they either neglect the topics of the document or do not consider that not all words contribute equally to the meaning of the text. In this work, we propose a bidirectional gated recurrent unit neural network model (BiGRULA) for sentiment analysis that combines a topic model (lda2vec) and an attention mechanism. Lda2vec is used to discover the main topics of the review corpus, which are then used to enrich the word vector representations with context. The attention mechanism learns to assign different weights to words according to their contribution to the overall meaning of the text. Experiments on the 20 NewsGroup and IMDB datasets demonstrate the effectiveness of our model. Furthermore, we applied our model to hotel review data analysis, which allows us to obtain more coherent topics from these reviews and achieve good performance in sentiment classification.
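To make the attention step concrete, the sketch below shows a simplified dot-product attention pooling in plain Python: each word's hidden state is scored against a context vector, the scores are turned into weights with a softmax, and the weighted sum gives the text representation. This is a generic simplification, not the BiGRULA model itself; the function name, the toy vectors, and the fixed context vector (which would normally be learned) are assumptions.

```python
import math


def attention_pool(hidden_states, context):
    """Return (weights, pooled_vector) for a sequence of word hidden states."""
    # Score each hidden state by its dot product with the context vector.
    scores = [sum(h_k * c_k for h_k, c_k in zip(h, context)) for h in hidden_states]
    # Numerically stable softmax over the scores.
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of the hidden states, one dimension at a time.
    dim = len(hidden_states[0])
    pooled = [sum(w * h[k] for w, h in zip(weights, hidden_states)) for k in range(dim)]
    return weights, pooled


# Hypothetical 2-dimensional hidden states for the words "the", "view", "amazing".
states = [[0.1, 0.0], [0.4, 0.3], [0.9, 1.2]]
context_vector = [1.0, 1.0]  # learned during training in a real model
attention_weights, sentence_vector = attention_pool(states, context_vector)
print(attention_weights, sentence_vector)
```

The state most aligned with the context vector (here the one for "amazing") receives the largest weight, which is how such a layer lets informative words dominate the representation.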

Scholar Commons - Institutional Repository of the University of South Carolina

A Review of Text Corpus-Based Tourism Big Data Mining

Author: Hu Jianjun
Hu Jie
Li Qin
Li Shaobo
Zhang Sen
Publication venue: Scholar Commons
Publication date: 12/08/2019

With the massive growth of the Internet, text data has become one of the main formats of tourism big data. Because such text is an effective means of expressing tourists’ opinions, mining it has great potential to inspire innovations for tourism practitioners. In the past decade, a variety of text mining techniques have been proposed and applied to tourism analysis to develop tourism value analysis models, build tourism recommendation systems, create tourist profiles, and make policies for supervising tourism markets. The successes of these techniques have been further boosted by progress in natural language processing (NLP), machine learning, and deep learning. Recognizing the complexity that arises from this diverse set of techniques and tourism text data sources, this work attempts to provide a detailed and up-to-date review of text mining techniques that have been, or have the potential to be, applied to modern tourism big data analysis. We summarize and discuss different text representation strategies; text-based NLP techniques for topic extraction, text classification, sentiment analysis, and text clustering in the context of tourism text mining; and their applications in tourist profiling, destination image analysis, market demand, etc. Our work also provides guidelines for constructing new tourism big data applications and outlines promising research areas in this field for the coming years.

Scholar Commons - Institutional Repository of the University of South Carolina

A Multi-label Text Classification Framework: Using Supervised and Unsupervised Feature Selection Strategy

Author: Ma Long
Publication venue: ScholarWorks @ Georgia State University
Publication date: 08/08/2017

Text classification, the task of assigning metadata to documents, requires significant human time and effort. Since online-generated content is growing explosively, manually annotating large-scale, unstructured data has become a challenge. Recently, various state-of-the-art text mining methods have been applied to the classification process based on keyword extraction. However, when these keywords are used as features in the classification task, the number of feature dimensions is commonly large. In addition, how to select keywords from documents as features is itself a major challenge. In particular, when traditional machine learning algorithms are applied to big data, the computation time is very long. Moreover, about 80% of real-world data is unstructured and unlabeled. Conventional supervised feature selection methods cannot be directly used to select entities from such massive data. Usually, statistical strategies are used to extract features from unlabeled data for classification tasks according to their importance scores. We propose a novel method to extract key features effectively before feeding them into the classification task. Another challenge in text classification is the multi-label problem, the assignment of multiple non-exclusive labels to documents. This problem makes text classification more complicated than single-label classification. To address these issues, we develop a framework for extracting data and reducing data dimensionality to solve the multi-label problem on labeled and unlabeled datasets. To reduce data dimensionality, we develop a hybrid feature selection method that extracts meaningful features according to the importance of each feature. Word2Vec is applied to represent each document as a feature vector for document categorization on the large dataset.
The unsupervised approach is used to extract features from real online-generated data for text classification. Our unsupervised feature selection method is applied to extract depression symptoms from social media such as Twitter. In the future, these depression symptoms will be used for depression self-screening and diagnosis.
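As one simple example of scoring features from unlabeled text by importance, the sketch below ranks terms by their summed TF-IDF score and keeps the top k. TF-IDF is a stand-in chosen for illustration; the abstract does not specify the statistical strategy or the hybrid method the author actually uses, and the function name and sample posts are hypothetical.

```python
import math
from collections import Counter


def top_terms(documents, k):
    """Return the k terms with the highest summed TF-IDF score across documents."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    scores = Counter()
    for tokens in tokenized:
        counts = Counter(tokens)
        for term, count in counts.items():
            tf = count / len(tokens)
            idf = math.log(n_docs / doc_freq[term])
            scores[term] += tf * idf
    return [term for term, _ in scores.most_common(k)]


# Hypothetical unlabeled posts.
posts = [
    "i feel tired and sad",
    "i feel hopeless and tired",
    "i feel fine and happy",
]
selected = top_terms(posts, 5)
print(selected)
```

Terms that occur in every post ("i", "feel", "and") get an inverse document frequency of zero and are never preferred over distinctive terms, which is the kind of dimensionality reduction the abstract motivates.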

ScholarWorks @ Georgia State University

A Review of Text Corpus-Based Tourism Big Data Mining

Author: Hu Jianjun
Hu Jie
Li Shaobo
Li Qin
Zhang Sen
Publication venue: Scholar Commons
Publication date: 12/08/2019

Scholar Commons - Institutional Repository of the University of South Carolina