    Explained anomaly detection in text reviews: Can subjective scenarios be correctly evaluated?

    This paper presents a pipeline to detect and explain anomalous reviews on online platforms. The pipeline consists of three modules and detects reviews that provide no value to users because they are either worthless or malicious. Each classification is accompanied by a normality score and an explanation that justifies the decision. The pipeline's ability to solve the anomaly detection task was evaluated on several datasets built from a large Amazon database. In addition, a study with 241 participants compared three explainability techniques to assess the explainability module; it measured how the explanations affected respondents' ability to reproduce the classification model and how useful they perceived them to be. This work can help automate tasks on online review platforms, such as e-commerce sites, and offers inspiration for similar anomaly detection problems in textual data. We also consider it valuable to have carried out a human evaluation of different explainability techniques in a real and uncommon scenario such as anomalous review detection, and to reflect on whether tasks as humanly subjective as this one can be explained.
    Comment: The article is under review in the journal Engineering Applications of Artificial Intelligence.

    Machine Learning in Resource-constrained Devices: Algorithms, Strategies, and Applications

    The ever-increasing growth of technology is changing people's everyday lives. As a major consequence, 1) the amount of available data is growing and 2) several applications rely on battery-supplied devices that must process data in real time. In this scenario, ad-hoc strategies are becoming vital for developing low-power, low-latency intelligent systems that can learn inductive rules from data using a modest amount of computational resources. At the same time, specific methodologies are needed to handle complex patterns such as text and images. This thesis presents different approaches and techniques for developing fast learning models explicitly designed to be hosted on embedded systems. When implemented in low-cost digital devices, the proposed methods achieved state-of-the-art performance in terms of the trade-off between generalization capability and area requirements. In addition, advanced strategies for efficient sentiment analysis of text and images are proposed.

    Machine Learning based Cryptocurrency Price Prediction using historical data and Social Media Sentiment

    The purpose of this research is to investigate the impact of social media sentiment on predicting the Bitcoin price using machine learning models, with a focus on integrating on-chain data and employing a Multi Modal Fusion Model. For the experiments, crypto market data, on-chain data, and the corresponding social media data (Twitter) were collected from 2014 to 2022, amounting to over 2000 samples. We trained various models on the historical data, including K-Nearest Neighbors, Logistic Regression, Gaussian Naive Bayes, Support Vector Machine, Extreme Gradient Boosting, and a Multi Modal Fusion model. Next, we added Twitter sentiment data to the models, using the Twitter-RoBERTa and VADER models to analyse the sentiment expressed about Bitcoin on social media. Comparing the models with and without the Twitter sentiment data, we found that including the sentiment feature consistently improved performance, with Twitter-RoBERTa-based sentiment giving an average F1 score of 0.79. The best-performing model was an optimised Multi Modal Fusion classifier using Twitter-RoBERTa-based sentiment, which achieved an F1 score of 0.85. This study contributes to financial forecasting by demonstrating how social media sentiment analysis, on-chain data integration, and a Multi Modal Fusion model can improve the accuracy and robustness of machine learning models for predicting market trends, providing a valuable tool for investors, brokers, and traders seeking to make informed decisions.
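    The results above are reported as F1 scores. For reference, here is a minimal plain-Python sketch of binary F1, the harmonic mean of precision and recall for the positive class. The abstract does not say whether its F1 is binary, macro-averaged, or weighted, so treat this as the basic definition only; the function name and the labels in the usage line (1 = price up, 0 = price down) are hypothetical.

```python
def f1_score(y_true, y_pred, positive=1):
    """Binary F1: harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    if tp == 0:
        return 0.0  # precision or recall is zero (or undefined), so F1 is 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Made-up labels: 3 true positives, 1 false positive, 1 false negative.
print(f1_score([1, 0, 1, 1, 0, 1], [1, 0, 0, 1, 1, 1]))  # 0.75
```

    Comparing a model with and without the sentiment feature would simply mean computing this score on the same test labels for both sets of predictions.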

    SentiMLBench: Benchmark Evaluation of Machine Learning Algorithms for Sentiment Analysis

    Sentiment analysis has become a topic of interest for researchers because of its increasing use in industry. However, there is no clear verdict on which algorithms are best for measuring end-user sentiment in real-time scenarios. A rigorous benchmark evaluation of various algorithms across multiple datasets and hardware architectures is therefore needed to guide future researchers on their potential advantages and limitations. This paper proposes SentiMLBench, a critical evaluation of key ML algorithms as standalone classifiers and of a novel cascade feature selection (CFS) based ensemble technique, in multiple benchmark environments that each use a different Twitter dataset and processing hardware. According to the experimental results, the best trained ensemble model with the CFS enhancement surpasses current state-of-the-art models. In one study, although the ensemble model provides good accuracy, it falls short of the neural networks' accuracy by 2%. ML algorithms perform poorly as standalone classifiers across all three studies. The supremacy of neural networks is further confirmed in study three, where they outperform the other algorithms in accuracy by over 10%. Graphics processing units provide speed and higher computational power at a fraction of the cost of a normal processor, offering critical architectural insights for developing a robust expert system for sentiment analysis.
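    The abstract does not describe how the CFS-based ensemble combines its member classifiers. One common way to combine classifiers is hard majority voting, sketched below purely as an illustration; it is an assumption, not SentiMLBench's actual method, and the function name and toy predictions are hypothetical.

```python
from collections import Counter

def majority_vote(predictions_per_model):
    """Combine label predictions from several models by hard majority vote.

    predictions_per_model: list of equal-length label lists, one per model.
    Ties go to the label encountered first in model order, because
    Counter.most_common orders equal counts by first occurrence.
    """
    n = len(predictions_per_model[0])
    if any(len(p) != n for p in predictions_per_model):
        raise ValueError("all models must predict the same number of samples")
    combined = []
    for i in range(n):
        votes = Counter(p[i] for p in predictions_per_model)  # one vote per model
        combined.append(votes.most_common(1)[0][0])
    return combined

# Three hypothetical classifiers labelling four tweets.
nb = ["pos", "neg", "pos", "neg"]
svm = ["pos", "pos", "neg", "neg"]
lr = ["neg", "pos", "pos", "neg"]
print(majority_vote([nb, svm, lr]))  # ['pos', 'pos', 'pos', 'neg']
```

    Using an odd number of voters for a two-class problem avoids ties altogether; the tie rule only matters for even ensembles or multi-class labels.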

    Study of machine learning algorithms for potential stock trading strategy frameworks

    Purpose: This paper discusses major stock market trends and provides information on stock market forecasting, which is essentially an attempt to predict the future value of the stock market. Doing this manually is a strenuous task, so software and algorithms are needed to make it easier. This paper also lists several such algorithms, together with the formulas and calculations associated with them. These algorithms and models primarily revolve around Machine Learning (ML) and Deep Learning.
    Research Methodology: This study is based on a descriptive, quantitative, and cross-sectional research design. We used a multivariate algorithm model and indicators to examine stocks for investing or trading and to assess their efficiency. The study concludes with recommendations for enhancing trading strategies using machine learning algorithms.
    Results: After comparing and combining the various algorithms through experimental analysis, this study suggests that the random forest algorithm is the most suitable for forecasting a stock's market price from various data points in historical data.
    Limitations: The applicability of the study was hampered only by unforeseeable tragic events such as economic crises, market collapses, etc.
    Contribution: Successful stock prediction would substantially benefit stock market institutions and provide real-world answers to the challenges that stock investors face. As a result, gaining significant knowledge on the subject is quite beneficial for us.

    An Application of pre-Trained CNN for Image Classification

    Image classification is a branch of computer vision in which images are assigned to categories. The topic is very important today because large image databases are becoming very common. Images can be classified using supervised or unsupervised techniques. This paper investigates supervised classification and evaluates the performance of two classifiers and two feature extraction techniques. The classifiers are a Linear Support Vector Machine (SVM) and a Quadratic SVM, trained and tested with features extracted using Bag of Words and a pre-trained Convolutional Neural Network (CNN), namely AlexNet. The classifiers were observed to classify images with very high accuracy when trained with CNN features. The image categories, Binocular, Motorbikes, Watches, Airplanes, and Faces, are taken from the Caltech 265 image archive.
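    The abstract compares Bag of Words features with AlexNet features but does not describe the BoW encoding. In the usual bag-of-visual-words approach, each local descriptor from an image is assigned to its nearest "visual word" in a codebook and the assignments are counted. The plain-Python sketch below assumes the descriptors and the codebook (typically learned with k-means) have already been computed; the function name and toy values are hypothetical.

```python
def bovw_histogram(descriptors, codebook):
    """Encode one image as an L1-normalised histogram of visual words.

    descriptors: list of local descriptor vectors from one image.
    codebook:    list of codeword vectors (e.g. k-means cluster centres).
    """
    counts = [0] * len(codebook)
    for d in descriptors:
        # Hard-assign the descriptor to the codeword with the smallest
        # squared Euclidean distance.
        nearest = min(
            range(len(codebook)),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(d, codebook[j])),
        )
        counts[nearest] += 1
    total = sum(counts)
    # Normalise so images with different numbers of descriptors are comparable.
    return [c / total for c in counts]

# Toy 2-D descriptors and a 3-word codebook.
codebook = [[0, 0], [1, 1], [5, 5]]
descriptors = [[0.1, 0.0], [0.9, 1.2], [1.1, 0.8], [4.8, 5.1]]
print(bovw_histogram(descriptors, codebook))  # [0.25, 0.5, 0.25]
```

    The resulting fixed-length histogram is the kind of feature vector a linear or quadratic SVM can be trained on; with CNN features, the histogram is simply replaced by activations taken from a pre-trained network layer.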

    Analyzing digital societal interactions and sentiment classification in Twitter (X) during critical events in Chile

    This study explores the influence of social media content on societal attitudes and actions during critical events, with a special focus on events in Chile such as the COVID-19 pandemic, the 2019 protests, and the wildfires of 2017 and 2023. Using a novel tweet dataset, the study introduces new metrics for assessing sentiment, inclusivity, engagement, and impact, providing a comprehensive framework for analyzing social media dynamics. The methodology improves sentiment classification by using a Deep Random Vector Functional Link (D-RVFL) neural network, which outperforms traditional models such as Support Vector Machines (SVM), naive Bayes, and backpropagation (BP) neural networks, achieving an overall average accuracy of 78.30% (0.17). This advance is attributed to deep learning techniques with direct input–output connections that enable faster and more precise sentiment classification. The analysis differentiates the roles of influencers and press, radio, and television handlers during crises, revealing how various social media actors affect information dissemination and audience engagement. By dissecting online behaviors and classifying sentiment with the RVFL network, the study sheds light on how the digital landscape affects societal attitudes and actions during emergencies. These findings underscore the importance of understanding the nuances of social media engagement for developing more effective crisis communication strategies.
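    The abstract attributes the D-RVFL's gains to direct input–output connections. As a rough illustration of that idea, the NumPy sketch below implements a single-layer RVFL classifier, not the study's deep (stacked) D-RVFL: hidden-layer weights are drawn at random and never trained, the raw inputs are concatenated with the hidden activations (the direct links), and only the output weights are fitted, in closed form by ridge regression. The class name, layer size, tanh activation, and `reg` value are illustrative assumptions, and tweets would first have to be turned into numeric feature vectors, which is not shown.

```python
import numpy as np

class RVFL:
    """Single-hidden-layer Random Vector Functional Link classifier (sketch)."""

    def __init__(self, n_hidden=100, reg=1e-3, seed=0):
        self.n_hidden = n_hidden
        self.reg = reg                      # ridge penalty on the output weights
        self.rng = np.random.default_rng(seed)

    def _design(self, X):
        H = np.tanh(X @ self.W + self.b)    # random hidden layer, never trained
        bias = np.ones((X.shape[0], 1))
        return np.hstack([X, H, bias])      # direct input links + hidden + bias

    def fit(self, X, y):
        n_features = X.shape[1]
        self.W = self.rng.uniform(-1, 1, size=(n_features, self.n_hidden))
        self.b = self.rng.uniform(-1, 1, size=self.n_hidden)
        self.classes_ = np.unique(y)
        T = (y[:, None] == self.classes_[None, :]).astype(float)  # one-hot targets
        D = self._design(X)
        # Only the output weights are learned: closed-form ridge regression.
        A = D.T @ D + self.reg * np.eye(D.shape[1])
        self.beta = np.linalg.solve(A, D.T @ T)
        return self

    def predict(self, X):
        scores = self._design(X) @ self.beta
        return self.classes_[scores.argmax(axis=1)]

# Usage on two synthetic, well-separated 2-D clusters (toy data).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2, 0.5, (50, 2)), rng.normal(2, 0.5, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
model = RVFL(n_hidden=20).fit(X, y)
print("training accuracy:", (model.predict(X) == y).mean())
```

    Because only the output layer is solved for, training reduces to a single linear solve, which is one reason RVFL-style models are often described as fast to train.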

    The Stock Exchange Prediction using Machine Learning Techniques: A Comprehensive and Systematic Literature Review

    This literature review identifies and analyzes research topic trends, types of datasets, learning algorithms, method improvements, and frameworks used in stock exchange prediction. A total of 81 studies on stock prediction published between January 2015 and June 2020 were investigated after applying the inclusion and exclusion criteria. The review was carried out in three major phases (review planning, implementation, and report preparation), comprising nine steps from defining the systematic review requirements to presenting the results. The primary studies of stock prediction research reveal five main focuses: estimation or regression, clustering, association, classification, and preprocessing analysis of datasets. Classification accounts for 35.80% of the related studies, estimation for 56.79%, data analytics for 4.94%, and clustering and association for 1.23% each. Furthermore, 74.07% of the studies use technical indicator datasets, while the rest use combinations of datasets. In total, 48 different methods have been applied to develop stock prediction models, and the 9 most widely applied were identified. The best methods in terms of accuracy and low error rate include SVM, DNN, CNN, RNN, LSTM, bagging ensembles such as RF, boosting ensembles such as XGBoost, ensemble majority vote, and the meta-learner approach of ensemble Stacking. Several techniques are proposed to improve prediction accuracy: combining several methods, using boosting algorithms, adding feature selection, and applying parameter and hyper-parameter optimization.
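    The reported shares are consistent with whole-number study counts out of the 81 reviewed studies; for example, 29/81 ≈ 35.80% and 1/81 ≈ 1.23%. The counts below are back-calculated from the percentages (they are an assumption, not figures stated in the abstract), and the snippet simply checks that they reproduce every reported percentage and add up to 81.

```python
# Study counts back-calculated from the reported percentages (assumption:
# the abstract gives only percentages). Total = 81 reviewed studies.
TOTAL_STUDIES = 81
counts = {
    "estimation": 46,
    "classification": 29,
    "data analytics": 4,
    "clustering": 1,
    "association": 1,
}
TECHNICAL_INDICATOR_STUDIES = 60  # also back-calculated (assumption)

def share(n, total=TOTAL_STUDIES):
    """Percentage of the total, rounded to two decimals as in the abstract."""
    return round(100 * n / total, 2)

result = {name: share(n) for name, n in counts.items()}
assert sum(counts.values()) == TOTAL_STUDIES  # the categories cover every study

for name, pct in result.items():
    print(f"{name:15s} {pct:6.2f}%")
print(f"{'technical ind.':15s} {share(TECHNICAL_INDICATOR_STUDIES):6.2f}%")
```

    The check also shows why clustering and association are each 1.23%: a single study out of 81.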

    MS-TR: A Morphologically Enriched Sentiment Treebank and Recursive Deep Models for Compositional Semantics in Turkish

    Recursive Deep Models have been used as powerful models for learning compositional representations of text in many natural language processing tasks. However, they require structured input (i.e., a sentiment treebank) to encode sentences according to their tree structure, which enables them to learn the latent semantics of words through recursive composition functions. In this paper, we present our contributions and efforts toward constructing a Turkish Sentiment Treebank. We introduce MS-TR, a Morphologically Enriched Sentiment Treebank, built for training Recursive Deep Models to address compositional sentiment analysis for Turkish, a well-known Morphologically Rich Language (MRL). We propose a semi-supervised automatic annotation method, as a distant-supervision approach, that uses the morphological features of words to infer the polarity (positive or negative) of the inner nodes of MS-TR. The proposed annotation model has four annotation levels: morph-level, stem-level, token-level, and review-level. The contribution of each annotation level was tested on datasets from three different domains: product reviews, movie reviews, and essays from the Turkish National Corpus. Comparative results were obtained for the Recursive Neural Tensor Network (RNTN) model operating over MS-TR and for conventional machine learning methods. The experiments showed that RNTN outperformed the baseline methods, which cannot accurately capture aggregated sentiment information, and achieved much better accuracy.
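    The abstract names the RNTN but not its equations. As commonly defined for the original English Sentiment Treebank work (Socher et al., 2013), an RNTN computes a parent vector from its children a and b as p = tanh([a; b]^T V [a; b] + W [a; b]), where each slice of the tensor V produces one output unit, and a softmax classifier predicts sentiment at every node. The NumPy sketch below shows only that composition step with random, untrained weights; the dimension, weights, and function names are illustrative assumptions, and how MS-TR's morph-, stem-, and token-level nodes map onto the tree is not modeled.

```python
import numpy as np

def rntn_compose(a, b, V, W):
    """Compose child vectors a and b (shape (d,)) into a parent vector.

    V: tensor of shape (d, 2d, 2d); slice V[k] yields output unit k.
    W: matrix of shape (d, 2d).
    """
    c = np.concatenate([a, b])                   # [a; b], shape (2d,)
    bilinear = np.einsum("i,kij,j->k", c, V, c)  # c^T V[k] c for every k
    return np.tanh(bilinear + W @ c)

def node_sentiment(p, Ws):
    """Softmax class probabilities for one node vector p; Ws has shape (classes, d)."""
    z = Ws @ p
    e = np.exp(z - z.max())                      # subtract max for numerical stability
    return e / e.sum()

# Toy tree ((a b) c) with random, untrained weights (illustration only).
rng = np.random.default_rng(0)
d = 4
a, b, c = rng.normal(size=(3, d))
V = rng.normal(scale=0.1, size=(d, 2 * d, 2 * d))
W = rng.normal(scale=0.1, size=(d, 2 * d))
Ws = rng.normal(size=(2, d))                     # 2 classes: negative, positive
root = rntn_compose(rntn_compose(a, b, V, W), c, V, W)
print(node_sentiment(root, Ws))
```

    The same V and W are reused at every node, which is what lets the model compose phrases of any length; a treebank such as MS-TR supplies the tree shape and the node-level polarity labels used to train them.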

    Current Studies and Applications of Krill Herd and Gravitational Search Algorithms in Healthcare

    Nature-Inspired Computing (NIC) is a relatively young field that seeks new computing methods by studying how natural phenomena work in order to solve complicated problems in many contexts. As a consequence, ground-breaking research has been conducted in a variety of domains, including synthetic immune functions, neural networks, swarm intelligence, and evolutionary computing. NIC techniques are used in biology, physics, engineering, economics, and management. Meta-heuristic algorithms are successful, efficient, and resilient in real-world classification, optimization, forecasting, and clustering, as well as in engineering and scientific problems. Two active NIC paradigms are the Gravitational Search Algorithm and the Krill Herd algorithm. This publication gives a worldwide and historical review of studies using the Krill Herd Algorithm (KH) and the Gravitational Search Algorithm (GSA) in medicine and healthcare. Comprehensive surveys have been conducted on several nature-inspired algorithms, including KH and GSA; however, no survey of KH and GSA in the healthcare field had been undertaken. The present article therefore thoroughly reviews the various versions of the KH and GSA algorithms and their applications in healthcare, to assist researchers in using them in diverse domains or hybridizing them with other popular algorithms. It also provides an in-depth examination of KH and GSA in terms of application, modification, and hybridization. The goal of the study is to offer a viewpoint on GSA and KH, particularly for academics interested in investigating the capabilities and performance of these algorithms in the healthcare and medical domains.
    Comment: 35 pages
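    The review describes GSA only at a high level. For orientation, below is a compact NumPy sketch of the basic GSA loop as it is usually formulated: masses are derived from normalised fitness, the gravitational constant G decays exponentially, each agent is pulled toward the current K best agents with randomly weighted forces, and velocity and position are updated from the resulting acceleration. The parameter values (G0 = 100, alpha = 20), the linear shrinking of K, clipping to the bounds, the function name, and the toy sphere objective are assumptions for illustration; Krill Herd and any healthcare application are not shown.

```python
import numpy as np

def gsa_minimize(f, lower, upper, n_agents=30, n_iter=200, g0=100.0, alpha=20.0, seed=0):
    """Minimise f over a box with a basic Gravitational Search Algorithm (sketch)."""
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(lower, dtype=float), np.asarray(upper, dtype=float)
    X = rng.uniform(lo, hi, size=(n_agents, lo.size))  # agent positions
    V = np.zeros_like(X)                               # agent velocities
    best_x, best_f = None, np.inf
    eps = 1e-12
    for t in range(n_iter):
        fit = np.array([f(x) for x in X])
        if fit.min() < best_f:
            best_f, best_x = float(fit.min()), X[fit.argmin()].copy()
        best, worst = fit.min(), fit.max()
        # Normalised masses: best agent heaviest, worst agent gets mass 0.
        m = np.ones(n_agents) if best == worst else (fit - worst) / (best - worst)
        M = m / m.sum()
        G = g0 * np.exp(-alpha * t / n_iter)           # decaying gravitational constant
        # Only the K best agents attract; K shrinks linearly from n_agents to 1.
        k = max(1, round(n_agents - (n_agents - 1) * t / max(n_iter - 1, 1)))
        kbest = np.argsort(fit)[:k]
        diff = X[kbest][None, :, :] - X[:, None, :]     # (n_agents, k, dim)
        R = np.linalg.norm(diff, axis=2, keepdims=True) # Euclidean distances
        w = rng.random((n_agents, k, 1)) * M[kbest][None, :, None]
        # Acceleration = force / own mass; an agent's own term vanishes (diff = 0).
        A = G * (w * diff / (R + eps)).sum(axis=1)
        V = rng.random((n_agents, 1)) * V + A
        X = np.clip(X + V, lo, hi)
    return best_x, best_f

# Usage: 2-D sphere function, minimum 0 at the origin (toy objective).
x_best, f_best = gsa_minimize(lambda x: float(np.sum(x ** 2)), [-5, -5], [5, 5])
print(x_best, f_best)
```

    Dividing the force by the agent's own mass cancels that mass, which is why the acceleration depends only on the attracting agents' masses; large early values of G favour exploration and the decay toward zero favours exploitation.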