4,686 research outputs found

    Improved Fault Classification and Localization in Power Transmission Networks Using VAE-Generated Synthetic Data and Machine Learning Algorithms

    The reliable operation of power transmission networks depends on the timely detection and localization of faults. Fault classification and localization in electricity transmission networks can be challenging because of the complicated and dynamic nature of the system. In recent years, a variety of machine learning (ML) and deep learning (DL) algorithms have been applied to improve fault identification and classification within power transmission networks. Yet the efficacy of these ML architectures depends heavily on the abundance and quality of the training data. This paper introduces a strategy for classifying and localizing faults within power transmission networks that uses variational autoencoders (VAEs) to generate synthetic data, which is then used in conjunction with ML algorithms. The approach augments the available dataset with synthetically generated instances, contributing to a more robust and proficient fault recognition and classification system. Specifically, we train the VAE on a set of real-world power transmission data and generate synthetic fault data that capture the statistical properties of the real data. To address the difficulty of fault diagnosis in three-phase high-voltage transmission networks, a categorical boosting (CatBoost) algorithm is proposed in this work. The other standard machine learning algorithms considered in this study, including Support Vector Machine (SVM), Decision Trees (DT), Random Forest (RF), and K-Nearest Neighbors (KNN), combined with a customized version of forward feature selection (FFS), were trained on the synthetic data generated by the VAE. The results indicate exceptional performance, surpassing current state-of-the-art techniques in fault classification and localization: the approach achieves 99% accuracy in fault classification and a mean absolute error (MAE) of 0.2 in fault localization, a notable advance over the most effective existing baselines.
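    As a rough, hedged sketch of the augmentation step this abstract describes, the PyTorch fragment below shows the shape of a VAE that, once trained on real fault measurements, can be sampled to produce synthetic fault records. The feature count, layer sizes, and loss weighting are illustrative assumptions, not the authors' configuration.

    ```python
    # Minimal VAE sketch for generating synthetic fault records (PyTorch).
    # n_features, latent_dim, and layer widths are assumptions for the sketch.
    import torch
    import torch.nn as nn

    class VAE(nn.Module):
        def __init__(self, n_features=6, latent_dim=2):
            super().__init__()
            self.encoder = nn.Sequential(nn.Linear(n_features, 32), nn.ReLU())
            self.fc_mu = nn.Linear(32, latent_dim)
            self.fc_logvar = nn.Linear(32, latent_dim)
            self.decoder = nn.Sequential(
                nn.Linear(latent_dim, 32), nn.ReLU(), nn.Linear(32, n_features)
            )

        def forward(self, x):
            h = self.encoder(x)
            mu, logvar = self.fc_mu(h), self.fc_logvar(h)
            z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparameterization
            return self.decoder(z), mu, logvar

    def vae_loss(recon, x, mu, logvar):
        # Reconstruction error plus KL divergence to the standard normal prior.
        recon_err = ((recon - x) ** 2).sum()
        kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
        return recon_err + kl

    # After training, sample the prior and decode to augment the dataset:
    #   z = torch.randn(1000, latent_dim); synthetic = vae.decoder(z)
    ```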

    A Comparative Study of Sentiment Analysis Methods for Detecting Fake Reviews in E-Commerce

    The popularity of e-commerce has increased, especially under the COVID-19 scenario. Past consumer product reviews have a significant influence on consumers' purchasing decisions. Fake reviews, written by humans or generated by computers, are consequently produced to dishonestly boost product sales, harming consumers. The goal of this research is to examine and evaluate the performance of various methods for identifying fake reviews. The well-known and widely used Amazon Review Data (2018) dataset was used for this research; the data section draws on the first 10 product categories on Amazon.com with favorable feedback. Fundamental data preparation procedures such as special-character trimming, bag-of-words, and TF-IDF are then applied, and the models are trained on the resulting dataset to detect fake reviews. This research compares the performance of four models: GPT-2, NBSVM, BiLSTM, and RoBERTa, with hyperparameters tuned to find optimal values. The research concludes that the RoBERTa model performs best overall, with an accuracy of 97%; GPT-2 reaches 82%, NBSVM 95%, and BiLSTM 92%. The Area Under the Curve (AUC) was also calculated for each model: RoBERTa has an AUC of 0.9976, NBSVM 0.9888, BiLSTM 0.9753, and GPT-2 0.9226. The RoBERTa model has the highest AUC value, close to 1, so it provides the most accurate predictions for detecting fake reviews, the main focus of this research. DOI: 10.28991/HIJ-2023-04-02-08
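    As a hedged illustration of the preprocessing-plus-classification pipeline sketched in this abstract (special-character trimming, TF-IDF, accuracy and AUC), here is a minimal scikit-learn baseline; load_reviews() is a hypothetical loader for the Amazon Review Data (2018), and the logistic-regression classifier merely stands in for the paper's four models.

    ```python
    # Minimal fake-review detection baseline: clean text -> TF-IDF -> classifier.
    import re
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score, roc_auc_score
    from sklearn.model_selection import train_test_split

    def clean(text):
        # Special-character trimming, as in the paper's preparation step.
        return re.sub(r"[^a-z0-9\s]", " ", text.lower())

    # Hypothetical loader: returns review strings and 0/1 fake labels.
    raw_reviews, labels = load_reviews()
    texts = [clean(t) for t in raw_reviews]
    X_tr, X_te, y_tr, y_te = train_test_split(texts, labels, test_size=0.2, random_state=0)

    vec = TfidfVectorizer(ngram_range=(1, 2), max_features=50_000)
    clf = LogisticRegression(max_iter=1000)
    clf.fit(vec.fit_transform(X_tr), y_tr)

    probs = clf.predict_proba(vec.transform(X_te))[:, 1]
    print("accuracy:", accuracy_score(y_te, probs > 0.5))
    print("AUC:", roc_auc_score(y_te, probs))
    ```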

    An Angle-based Stochastic Gradient Descent Method for Machine Learning: Principle and Application

    In deep learning, optimization algorithms are employed to speed convergence to accurate models by calibrating the current gradient and the associated learning rate. A major shortcoming of existing methods is how these calibration terms are computed: only previous gradients are used. Because a gradient is a time-sensitive quantity computed at a specific moment, older gradients can introduce significant deviation into the calibration terms. Although most algorithms alleviate this by taking an exponential moving average of the previous gradients, we found this is not very effective in practice, as it still allows an undesirable accumulated impact on the gradients. Another shortcoming is that existing algorithms cannot incorporate the cost variance when computing the new gradient, so employing the same cost-reduction strategy under all circumstances is inherently inaccurate. In addition, we identified that some advanced algorithms employ measurements that are confiscatory, resulting in erratic new gradients in practice. With respect to evaluation, we determined that a high error rate is more likely to result from a weak ability to translate cost reduction into error-rate reduction, a circumstance that has not been addressed in research on improving the accuracy of new gradients. In this dissertation, we propose an algorithm that employs the angle between consecutive gradients as a new metric to resolve all of the aforementioned problems. The new algorithm and nine existing algorithms were implemented in a neural network and a logistic regression classifier for evaluation. The results show that the new method improves cost/error-rate reduction by 9.40%/11.11% on the MNIST dataset and 41.63%/29.58% on the NSL-KDD dataset, and its cost-to-error translating ability outperforms other optimizers by 33.06%. One of the main contributions of our work is verifying the feasibility and effectiveness of using the angle between consecutive gradients as a reliable metric for generating accurate new gradients. Angle-based measurements could be incorporated into existing algorithms to enhance their cost-reduction and translating abilities.
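    Because the dissertation's central idea is the angle between consecutive gradients, a small sketch may help. The damping rule below is an assumed illustration of how such an angle could modulate the step size; the dissertation's actual update formula may differ.

    ```python
    # Angle between consecutive gradients as an optimization signal (NumPy).
    import numpy as np

    def angle_between(g_prev, g_curr, eps=1e-12):
        cos = np.dot(g_prev, g_curr) / (
            np.linalg.norm(g_prev) * np.linalg.norm(g_curr) + eps
        )
        return np.arccos(np.clip(cos, -1.0, 1.0))  # radians, in [0, pi]

    def angle_scaled_step(w, g_prev, g_curr, lr=0.1):
        # Assumed damping rule: a small angle means consecutive gradients agree,
        # so keep the full step; a large angle suggests oscillation, so shrink it.
        theta = angle_between(g_prev, g_curr)
        scale = 0.5 * (1.0 + np.cos(theta))  # maps [0, pi] to [1, 0]
        return w - lr * scale * g_curr
    ```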

    The Effect of Epidemiological Cohort Creation on the Machine Learning Prediction of Homelessness and Police Interaction Outcomes Using Administrative Health Care Data

    Background: Mental illness can lead to adverse outcomes such as homelessness and police interaction, and understanding the events leading up to these adverse outcomes is important. Predictive models may help identify individuals at risk of such outcomes. Using a fixed observation-window cohort with logistic regression (LR) or machine learning (ML) models can result in lower performance than adaptive and parcellated windows. Method: An administrative healthcare dataset was used, comprising 240,219 individuals in Calgary, Alberta, Canada who were diagnosed with addiction or mental health (AMH) conditions between April 1, 2013 and March 31, 2018. The cohort was followed for 2 years to identify factors associated with homelessness and police interactions. To assess the benefit of flexible windows to predictive models, an alternative cohort was created, and LR and ML models, including random forests (RF) and extreme gradient boosting (XGBoost), were compared across the two cohorts. Results: Among 237,602 individuals, 0.8% (1,800) experienced first homelessness (H), while 0.32% (759) of 237,141 individuals reported an initial police interaction (P). Male sex (AORs: H=1.51, P=2.52), substance disorder (AORs: H=3.70, P=2.83), psychiatrist visits (AORs: H=1.44, P=1.49), and drug abuse (AORs: H=2.67, P=1.83) were associated with both outcomes. XGBoost showed superior performance using the flexible method (sensitivity = 91%, AUC = 90% for initial homelessness; sensitivity = 90%, AUC = 89% for initial police interaction). Conclusion: This study identified key features associated with initial homelessness and police interaction and demonstrated that flexible windows can improve predictive modeling. (To be published in Frontiers in Digital Health, Health Informatics.)
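    As a hedged sketch of the model-comparison step reported above, the fragment below fits XGBoost on cohort features and reports the two quoted metrics, sensitivity and AUC. The feature matrix X and binary outcome y (e.g. first homelessness within the follow-up window) are assumed to come from the cohort-building pipeline; the hyperparameters are illustrative, not the study's.

    ```python
    # Fit XGBoost on a rare binary outcome and report sensitivity and AUC.
    import xgboost as xgb
    from sklearn.metrics import recall_score, roc_auc_score
    from sklearn.model_selection import train_test_split

    # X, y: assumed outputs of the cohort pipeline (features, 0/1 outcome).
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

    model = xgb.XGBClassifier(
        n_estimators=300,
        # Reweight the positive class: both outcomes occur in under 1% of cases.
        scale_pos_weight=(y_tr == 0).sum() / (y_tr == 1).sum(),
        eval_metric="auc",
    )
    model.fit(X_tr, y_tr)

    probs = model.predict_proba(X_te)[:, 1]
    print("sensitivity:", recall_score(y_te, probs >= 0.5))  # recall = sensitivity
    print("AUC:", roc_auc_score(y_te, probs))
    ```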

    OmniDepth: Dense Depth Estimation for Indoors Spherical Panoramas.

    Recent work on depth estimation has focused only on projective images, ignoring 360° content, which is now increasingly and more easily produced. We show that monocular depth estimation models trained on traditional images produce sub-optimal results on omnidirectional images, showcasing the need for training directly on 360° datasets, which, however, are hard to acquire. In this work, we circumvent the challenges of acquiring high-quality 360° datasets with ground-truth depth annotations by re-using recently released large-scale 3D datasets and re-purposing them to 360° via rendering. This dataset, which is considerably larger than similar projective datasets, is publicly offered to the community to enable future research in this direction. We use this dataset to learn the task of depth estimation from 360° images in an end-to-end fashion. We show promising results on our synthesized data as well as on unseen realistic images.
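    For readers unfamiliar with re-purposing 3D scenes to 360° imagery via rendering, the sketch below shows the standard equirectangular mapping from panorama pixels to viewing directions on the sphere; this is the textbook projection, not the paper's actual rendering pipeline.

    ```python
    # Equirectangular panorama: map a pixel (u, v) to a unit view direction.
    import numpy as np

    def pixel_to_direction(u, v, width, height):
        lon = (u / width) * 2.0 * np.pi - np.pi    # longitude in [-pi, pi]
        lat = np.pi / 2.0 - (v / height) * np.pi   # latitude in [-pi/2, pi/2]
        # y-up convention; each pixel row shares a latitude, each column a longitude.
        return np.array([
            np.cos(lat) * np.sin(lon),
            np.sin(lat),
            np.cos(lat) * np.cos(lon),
        ])
    ```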