Applications of Artificial Intelligence and Graph Theory to Cyberbullying
Cyberbullying is an ongoing and devastating issue in today's online social media. Abusive users engage in cyber-harassment by using social media to send posts, private messages, tweets, or pictures to innocent social media users. Detecting and preventing cases of cyberbullying is crucial. In this work, I analyze multiple machine learning, deep learning, and graph analysis algorithms and explore their applicability and performance in pursuit of a robust system for detecting cyberbullying. First, I evaluate the performance of the machine learning algorithms Support Vector Machine, Naïve Bayes, Random Forest, Decision Tree, and Logistic Regression. These yielded positive results, reaching upwards of 86% accuracy. Further enhancements were achieved using Evolutionary Algorithms, improving the overall results of the machine learning models. Deep learning algorithms were the subject of the next experiment, in which efficiency was monitored in terms of training time and performance. Next, an analysis of Recurrent Neural Networks and Hierarchical Attention Networks was conducted, achieving 82% accuracy. The final research project used graph analysis to explore the relations among different social media users and to analyze the connectivity and communities of users who were found to have posted offensive messages.
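The classifier comparison described in this abstract can be sketched with scikit-learn. The toy texts, labels, and TF-IDF features below are illustrative placeholders, not the study's actual cyberbullying corpus or feature pipeline.

```python
# Sketch: compare the five classifiers named in the abstract via cross-validation.
# Data here is a tiny synthetic stand-in for the real corpus.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

texts = ["you are awful", "nobody likes you", "great game today",
         "have a nice day", "you idiot", "thanks for sharing",
         "go away loser", "see you tomorrow"] * 5
labels = [1, 1, 0, 0, 1, 0, 1, 0] * 5  # 1 = abusive, 0 = benign

X = TfidfVectorizer().fit_transform(texts)
models = {
    "SVM": LinearSVC(),
    "NaiveBayes": MultinomialNB(),
    "RandomForest": RandomForestClassifier(n_estimators=50, random_state=0),
    "DecisionTree": DecisionTreeClassifier(random_state=0),
    "LogReg": LogisticRegression(max_iter=1000),
}
# Mean 5-fold cross-validated accuracy per model
scores = {name: cross_val_score(m, X, labels, cv=5).mean()
          for name, m in models.items()}
```

In practice, the corpus, preprocessing, and hyperparameters would differ from this minimal sketch.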
A Combination of Lexicon-based and Distributional Representations for Classification of Indonesian Vaccine Acceptance Rates
When the COVID-19 pandemic hit, vaccines were promoted worldwide as the way out of the pandemic. However, vaccine uptake depended on the sentiments of society and individuals about the vaccine, and people's acceptance of vaccines can change with conditions and events. Social media platforms such as Twitter can be used as a source of information on the community's attitudes toward the vaccination program. By applying machine learning techniques to a COVID-19 vaccine dataset, we aim to classify these attitudes from text. This study proposes three distinct machine learning models for classifying texts about COVID-19 vaccination: first, a lexicon-based model using feature extraction; second, a model using word embeddings to exploit distributional representations; and third, a model combining distributional representations with lexicon-based feature extraction. From the evaluation carried out, we found that the combination of lexicon-based and distributional representation methods gave the best results for classifying the level of acceptance of the COVID-19 vaccine in Indonesia, with an accuracy of 71.44% and an F1-score of 71.43%.
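The combination model can be sketched by concatenating a lexicon polarity feature with a dense distributional representation. Below, LSA vectors stand in for word embeddings, and the tiny lexicon and sentences are invented for illustration; they are not the paper's Indonesian lexicon or Twitter data.

```python
# Sketch: combine a lexicon-based feature with a distributional text representation.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.linear_model import LogisticRegression

# Toy polarity lexicon (placeholder for a real sentiment lexicon)
LEXICON = {"safe": 1.0, "effective": 1.0, "dangerous": -1.0, "refuse": -1.0}

texts = ["the vaccine is safe and effective",
         "this vaccine is dangerous",
         "i refuse the dangerous shot",
         "safe effective and free"] * 10
labels = [1, 0, 0, 1] * 10  # 1 = accepting, 0 = rejecting

def lexicon_score(text):
    # Sum of polarity weights for words found in the lexicon
    return sum(LEXICON.get(w, 0.0) for w in text.split())

tfidf = TfidfVectorizer().fit_transform(texts)
# LSA as a simple distributional representation (stand-in for embeddings)
dense = TruncatedSVD(n_components=3, random_state=0).fit_transform(tfidf)
lex = np.array([[lexicon_score(t)] for t in texts])
X = np.hstack([dense, lex])  # combined feature matrix

clf = LogisticRegression().fit(X, labels)
```

The idea is simply that the two feature families are complementary, so a single classifier trained on their concatenation can outperform either alone.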
Predicting Louisiana Public High School Dropout through Imbalanced Learning Techniques
This study is motivated by the magnitude of the Louisiana high school dropout problem and its negative impacts on individual and public well-being. Our goal is to predict students who are at risk of dropping out of high school by examining a Louisiana administrative dataset. Due to the imbalanced nature of the dataset, imbalanced learning techniques including resampling, case weighting, and cost-sensitive learning were applied to enhance prediction performance on the rare class. The performance metrics used in this study are the F-measure, recall, and precision of the rare class. We compare the performance of several machine learning algorithms, such as neural networks, decision trees, and bagged trees, in combination with the imbalanced learning approaches, using an administrative dataset of 366k+ records from the Louisiana Department of Education. Experiments show that applying imbalanced learning methods produces good recall but decreases precision, whereas base classifiers that disregard the class imbalance give better precision but poor recall. Overall, the application of imbalanced learning techniques is beneficial, yet further studies are needed to improve precision.
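Two of the imbalanced-learning techniques named above, cost-sensitive learning (via class weights) and resampling (random oversampling of the rare class), can be sketched as follows. The synthetic features, imbalance ratio, and classifier are illustrative; the Louisiana administrative data is not public.

```python
# Sketch: class weighting and minority oversampling on a synthetic imbalanced dataset.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n_major, n_minor = 950, 50  # rare class is 5% of the data
X = np.vstack([rng.normal(0.0, 1.0, (n_major, 4)),
               rng.normal(1.5, 1.0, (n_minor, 4))])
y = np.array([0] * n_major + [1] * n_minor)

# Cost-sensitive learning: penalize errors on the rare class more heavily
weighted = DecisionTreeClassifier(class_weight={0: 1, 1: 10}, random_state=0)
weighted.fit(X, y)

# Resampling: duplicate minority rows until the classes are balanced
minority = X[y == 1]
X_res = np.vstack([X, np.repeat(minority, (n_major // n_minor) - 1, axis=0)])
y_res = np.concatenate([y, np.ones(len(X_res) - len(X), dtype=int)])
resampled = DecisionTreeClassifier(random_state=0).fit(X_res, y_res)
```

As the abstract notes, both tactics tend to raise recall on the rare class at some cost in precision, so both metrics should be reported.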
Recognition of physical activities with an optimal number of wearable sensors using data mining algorithms and a deep belief network
© 2017 IEEE. Monitoring of daily physical activities is benefiting the health care field in several ways, in particular with the development of wearable sensors. This paper adopts effective ways to calculate the optimal number of necessary sensors and to build a reliable, high-accuracy monitoring system. Three data mining algorithms, namely Decision Tree, Random Forest, and the PART algorithm, have been applied to the sensor selection process. Furthermore, the deep belief network (DBN) has been investigated to recognise 33 physical activities effectively. The results indicate that the proposed method is reliable, with an overall accuracy of 96.52%, while the number of sensors is reduced from nine to six.
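The sensor-selection step can be illustrated with tree-based feature importances: rank the nine sensors and keep the six most informative. The random-forest ranking below is one of the three algorithms named in the abstract; the readings are synthetic, and the 33-activity DBN classifier is not reproduced.

```python
# Sketch: rank nine simulated sensors by random-forest importance, keep the top six.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
n_samples, n_sensors = 300, 9
X = rng.normal(size=(n_samples, n_sensors))
# Make the activity label depend on only a few sensors
y = (X[:, 0] + X[:, 3] + X[:, 5] > 0).astype(int)

forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
ranked = np.argsort(forest.feature_importances_)[::-1]  # most important first
selected = sorted(ranked[:6])  # indices of the six sensors to keep
```

The downstream activity recogniser would then be trained only on the selected sensor channels.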
Effective techniques for handling incomplete data using decision trees
Decision Trees (DTs) have been recognized as one of the most successful formalisms for knowledge representation and reasoning, and they are currently applied to a variety of data mining and knowledge discovery applications, particularly classification problems. There are several efficient methods for learning a DT from data. However, these methods often rely on the assumption that the data are complete.
In this thesis, some contributions to the field of machine learning and statistics that solve the problem of extracting DTs for learning and classification tasks from incomplete databases are presented. The methodology underlying the thesis blends together well-established statistical theories with the most advanced techniques for machine learning and automated reasoning with uncertainty.
The first contribution is an extensive set of simulations studying the impact of missing data on the predictive accuracy of existing DTs that can cope with missing values, when missing values occur in both the training and test sets or in either set alone. All simulations are performed under the missing completely at random, missing at random, and informatively missing mechanisms, and for different missing-data patterns and proportions.
The next contribution is a simple, novel, yet effective procedure for training and testing decision trees in the presence of missing data. Original and simple splitting criteria for attribute selection in tree building are put forward. The proposed technique is evaluated and validated in empirical tests over many real-world application domains. In this work, the proposed algorithm maintains (and sometimes exceeds) the outstanding accuracy of multiple imputation, especially on datasets containing mixed attributes and purely nominal attributes. The proposed algorithm also greatly improves accuracy on informatively missing data. Another major advantage of this method over multiple imputation is the important saving in computational resources due to its simplicity.
The next contribution is three versions of simple probabilistic techniques for classifying incomplete vectors using decision trees built from complete data. The proposed procedure is superficially similar to the fractional-cases approach but more effective. The experimental results demonstrate that these approaches can achieve quality comparable to sophisticated algorithms such as multiple imputation and are therefore applicable to all kinds of datasets.
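One simple probabilistic treatment of a missing attribute at classification time is to average the tree's class-probability output over the values that attribute took in the training data. This is a crude stand-in for the fractional-cases idea, not the thesis's actual procedure, and the data below is synthetic.

```python
# Sketch: classify a vector with a missing attribute by averaging the tree's
# class probabilities over observed training values of that attribute.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 3))
y = (X[:, 0] > 0).astype(int)
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

def predict_with_missing(tree, x, missing_idx, train_column):
    # Substitute each observed training value for the missing attribute,
    # then average the resulting class-probability vectors.
    filled = np.tile(x, (len(train_column), 1))
    filled[:, missing_idx] = train_column
    return tree.predict_proba(filled).mean(axis=0)

x = np.array([np.nan, 0.2, -0.1])  # attribute 0 is missing
proba = predict_with_missing(tree, x, 0, X[:, 0])
```

Marginalising over the empirical distribution of the missing attribute avoids committing to a single imputed value.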
Finally, two novel ensemble procedures for handling incomplete training and test data are proposed and discussed. The algorithms combine the two best approaches either with resampling (REMIMIA) or without resampling (EMIMIA) of the training data before growing the decision trees. Empirical tests evaluate and validate the proposed ensemble methods against individual missing data techniques. EMIMIA attains the highest overall level of prediction accuracy.
Ensemble missing data techniques for software effort prediction
Constructing an accurate effort prediction model is a challenge in software engineering. The development and validation of models used for prediction tasks require good-quality data. Unfortunately, software engineering datasets tend to suffer from incompleteness, which can result in inaccurate decision making and poor project management and implementation. Recently, the use of machine learning algorithms, including ensemble (combining) classifiers, has proven to be of great practical value in solving a variety of software engineering problems, including software prediction. Research indicates that ensembles of individual classifiers lead to a significant improvement in classification performance by having the members vote for the most popular class. This paper proposes a method for improving the accuracy of software effort prediction produced by a decision tree learning algorithm, by generating an ensemble with two imputation methods as its elements. Benchmarking results on ten industrial datasets show that the proposed ensemble strategy has the potential to improve prediction accuracy compared to an individual imputation method, especially if multiple imputation is a component of the ensemble.
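The ensemble-of-imputations idea can be sketched by training one decision tree per imputation method and averaging their predictions. The mean and k-NN imputers and the synthetic effort data below are illustrative choices, not the paper's imputation methods or its ten industrial datasets.

```python
# Sketch: an ensemble whose members differ only in the imputation method
# used to complete the training data before growing each decision tree.
import numpy as np
from sklearn.impute import SimpleImputer, KNNImputer
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(3)
X = rng.normal(size=(120, 4))
y = X @ np.array([1.0, 0.5, 0.0, -0.5]) + rng.normal(0, 0.1, 120)
X[rng.random(X.shape) < 0.2] = np.nan  # inject ~20% missing values

members = []
for imputer in (SimpleImputer(strategy="mean"), KNNImputer(n_neighbors=3)):
    Xi = imputer.fit_transform(X)  # complete the data with this imputer
    members.append((imputer, DecisionTreeRegressor(random_state=0).fit(Xi, y)))

def ensemble_predict(X_new):
    # Average the per-imputer tree predictions
    return np.mean([t.predict(imp.transform(X_new)) for imp, t in members],
                   axis=0)

pred = ensemble_predict(X[:5])
```

Averaging over imputers hedges against any single imputation method systematically distorting the completed dataset.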