93 research outputs found

    Predicting Pulsars from Imbalanced Dataset with Hybrid Resampling Approach

    Pulsar stars, usually neutron stars, are spherical, compact objects containing a large quantity of mass. Each pulsar possesses a magnetic field and emits a slightly different pattern of electromagnetic radiation, which is used to identify potential candidates for a real pulsar. Pulsars are considered an important cosmic phenomenon, and scientists use them to study nuclear physics, gravitational waves, and collisions between black holes. Automating the detection of pulsars can accelerate their study. This study develops an accurate and efficient approach for true pulsar detection using supervised machine learning. The High Time Resolution Universe (HTRU2) dataset is used for the experiments. To resolve the data imbalance problem and overcome model overfitting, a hybrid resampling approach is presented. Experiments are performed with imbalanced and balanced datasets using well-known machine learning algorithms. Results demonstrate that the proposed hybrid resampling approach is highly effective at avoiding model overfitting and increasing prediction accuracy. With the proposed approach, the extra tree classifier achieves an accuracy score of 0.993 for true pulsar prediction.
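
    The abstract does not say which techniques the hybrid resampling combines. A common reading of "hybrid" is oversampling the minority class while undersampling the majority class, and the sketch below illustrates only that general idea with plain random sampling. The function name `hybrid_resample`, the `target_per_class` parameter, and the toy data are assumptions for illustration, not the authors' method.

```python
import random
from collections import Counter

def hybrid_resample(samples, labels, target_per_class, seed=0):
    """Bring every class to target_per_class items.

    Classes larger than the target are randomly undersampled;
    classes smaller than the target keep all items and are
    randomly oversampled with replacement. (Assumed scheme.)
    """
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)

    out_x, out_y = [], []
    for y, xs in by_class.items():
        if len(xs) >= target_per_class:
            chosen = rng.sample(xs, target_per_class)      # undersample
        else:
            extra = rng.choices(xs, k=target_per_class - len(xs))
            chosen = xs + extra                            # oversample
        out_x.extend(chosen)
        out_y.extend([y] * target_per_class)
    return out_x, out_y

# Toy imbalanced data: 90 negatives (label 0), 10 positives (label 1).
samples = list(range(100))
labels = [0] * 90 + [1] * 10
bx, by = hybrid_resample(samples, labels, target_per_class=50)
balanced_counts = Counter(by)
```

    Resampling of this kind is normally applied to the training split only, so that evaluation data keeps the original class distribution.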

    Deepfake tweets classification using stacked Bi-LSTM and words embedding

    The spread of altered media in the form of fake videos, audio, and images has increased substantially over the past few years. Advanced digital manipulation tools and techniques make it easier to generate fake content and post it on social media. In addition, tweets with deepfake content make their way to social platforms. The polarity of such tweets is significant for determining people's sentiment about deepfakes. This paper presents a deep learning model to predict the polarity of deepfake tweets. For this purpose, a stacked bi-directional long short-term memory (SBi-LSTM) network is proposed to classify the sentiment of deepfake tweets. Several well-known machine learning classifiers are investigated as well, such as support vector machine, logistic regression, Gaussian Naive Bayes, extra tree classifier, and AdaBoost classifier. These classifiers are used with term frequency-inverse document frequency and bag-of-words feature extraction approaches. In addition, the performance of deep learning models is analyzed, including long short-term memory network, gated recurrent unit, bi-directional LSTM, and convolutional neural network+LSTM. Experimental results indicate that the proposed SBi-LSTM outperforms both the machine learning and the deep learning models and achieves an accuracy of 0.92.

    Electroencephalogram Signals for Detecting Confused Students in Online Education Platforms with Probability-Based Features

    This article discusses how online education, despite its advantages, lacks face-to-face settings, which makes it very difficult to analyze students’ level of interaction, understanding, and confusion. The study proposes a novel engineering approach that uses probability-based features (PBF) to increase the efficacy of machine learning models.

    Thyroid disease prediction using selective features and machine learning techniques

    Simple Summary: The study presents a thyroid disease prediction approach which utilizes random forest-based features to obtain high accuracy. The approach can obtain a 0.99 accuracy to predict ten thyroid diseases. Thyroid disease prediction has emerged as an important task recently. Despite existing approaches for its diagnosis, the target is often binary classification, the datasets used are small, and the results are not validated. Existing approaches predominantly focus on model optimization, and the feature engineering part is less investigated. To overcome these limitations, this study presents an approach that investigates feature engineering for machine learning and deep learning models. Forward feature selection, backward feature elimination, bidirectional feature elimination, and machine learning-based feature selection using extra tree classifiers are adopted. The proposed approach can predict Hashimoto’s thyroiditis (primary hypothyroid), binding protein (increased binding protein), autoimmune thyroiditis (compensated hypothyroid), and non-thyroidal syndrome (NTIS) (concurrent non-thyroidal illness). Extensive experiments show that the extra tree classifier-based selected feature yields the best results with 0.99 accuracy and an F1 score when used with the random forest classifier. Results suggest that machine learning models are a better choice for thyroid disease detection in terms of accuracy and computational complexity. K-fold cross-validation and performance comparison with existing studies corroborate the superior performance of the proposed approach.

    Comparative analysis of TF-IDF and loglikelihood method for keywords extraction of twitter data

    Twitter has become one of the foremost social media platforms in today’s world. Over 335 million users are online monthly, and nearly 80% access it through mobile devices. Twitter now supports 35+ languages, which further broadens its usage by accommodating people who speak different languages. About 21% of users are in the US and 79% are outside it. A tweet is restricted to 140 characters; hence the information it contains is concise and valuable. An estimated five hundred million tweets are sent per day by different categories of people, including teachers, students, celebrities, officers, and musicians. This produces a huge, daily growing volume of data that needs to be categorized. A key step is finding the keywords in this data that help identify a tweet for classification. For this purpose, Term Frequency-Inverse Document Frequency (TF-IDF) and Loglikelihood methods are used to extract keywords from tweets in the music field, and a comparative analysis is performed on both sets of results. Finally, 5 users judge the relevance of the extracted keywords so that a decision on which method is best can be based on the experiments. This analysis is valuable because it gives a more accurate estimate of which method’s results are more reliable.
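
    To make the TF-IDF side of this comparison concrete, the following is a minimal sketch of TF-IDF keyword scoring. It assumes one common variant (term frequency normalized by document length, inverse document frequency as log(N / df)); the abstract does not state which variant the study used, and the example tweets and function names are invented for illustration.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Score every term of every tokenized document.

    tf  = term count / document length
    idf = log(number of documents / documents containing the term)
    """
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))          # count each term once per document

    scores = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        scores.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores

def top_keywords(doc_scores, k=3):
    """Return the k highest-scoring terms, ties broken alphabetically."""
    ranked = sorted(doc_scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]

# Invented toy tweets, already lowercased and tokenized.
tweets = [
    "new album out today".split(),
    "concert tickets on sale today".split(),
    "album review of the new album".split(),
]
scores = tf_idf(tweets)
first_tweet_keyword = top_keywords(scores[0], k=1)
```

    Terms that occur in only one tweet (such as "out" in the first one) score higher than terms shared across tweets, which is the property that makes TF-IDF usable for keyword extraction.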

    Consortium framework using blockchain for asthma healthcare in pandemics

    Asthma is a deadly disease that affects the lungs and air supply of the human body. Coronavirus and its variants also affect the airways of the lungs. Asthma patients mostly arrive at hospitals in a critical condition and require emergency treatment, which creates a burden on health institutions during pandemics. The similar symptoms of asthma and coronavirus create confusion for health workers during patient handling and treatment. The unavailability of patient history to physicians complicates proper diagnosis and treatment. Many asthma patient deaths have been reported, especially during pandemics, which necessitates an efficient framework for asthma patients. In this article, we propose a blockchain consortium healthcare framework for asthma patients. The proposed framework helps in managing asthma healthcare units, coronavirus patient records and vaccination centers, insurance companies, and government agencies, which are connected through the secure blockchain network. The framework increases data security and scalability, as it stores encrypted patient data on the InterPlanetary File System (IPFS) and keeps data hash values on the blockchain. The patient data are traceable and accessible to physicians and stakeholders, which helps in accurate diagnosis, timely treatment, and the management of patients. The smart contract ensures the execution of all business rules. The patient profile generation mechanism is also discussed. The experimental results revealed that the proposed framework has better transaction throughput, query delay, and security than existing solutions.
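
    The storage pattern described here, with bulky encrypted data kept off-chain and only its hash recorded on-chain, can be sketched as follows. This is a toy model under loud assumptions: a dictionary stands in for IPFS, a list stands in for the blockchain, a plain SHA-256 hex digest stands in for an IPFS content identifier, and the payload is assumed to be already encrypted. None of the names come from the framework itself.

```python
import hashlib

off_chain_store = {}   # stand-in for IPFS: digest -> encrypted bytes
ledger = []            # stand-in for the blockchain: append-only entries

def store_record(patient_id, encrypted_blob):
    """Store the blob off-chain and record only its hash on the ledger."""
    digest = hashlib.sha256(encrypted_blob).hexdigest()
    off_chain_store[digest] = encrypted_blob
    ledger.append({"patient_id": patient_id, "hash": digest})
    return digest

def verify_record(digest):
    """Check that the off-chain blob still matches the recorded hash."""
    blob = off_chain_store.get(digest)
    return blob is not None and hashlib.sha256(blob).hexdigest() == digest

# Hypothetical patient identifier and placeholder ciphertext.
record_hash = store_record("patient-001", b"placeholder ciphertext bytes")
intact_before = verify_record(record_hash)

# Simulate tampering with the off-chain copy.
off_chain_store[record_hash] = b"tampered bytes"
intact_after = verify_record(record_hash)
```

    Because the ledger holds only a fixed-size digest, the chain stays small while any change to the off-chain data becomes detectable.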

    Fake news detection in Urdu language using machine learning

    With the rise of social media, the dissemination of forged content and news has increased. Consequently, fake news detection has emerged as an important research problem. Several approaches have been presented to discriminate fake news from real news; however, such approaches lack robustness for multi-domain datasets, especially within the context of Urdu news. In addition, some studies use datasets machine-translated from English to Urdu with Google Translate, without manual verification. This limits the wide use of such approaches for real-world applications. This study investigates these issues and proposes a fake news classifier for Urdu news. The dataset has been collected covering nine different domains and comprises 4097 news items. Experiments are performed using term frequency-inverse document frequency (TF-IDF) and bag of words (BoW) features in combination with n-grams. The major contribution of this study is the use of feature stacking, where feature vectors of preprocessed text and verbs extracted from the preprocessed text are combined. Support vector machine, k-nearest neighbor, and ensemble models like random forest (RF) and extra tree (ET) were used for bagging, while stacking was applied with ET and RF as base learners and logistic regression as the meta learner. To check the robustness of models, fivefold and independent set testing were employed. Experimental results indicate that stacking achieves 93.39%, 88.96%, 96.33%, 86.2%, and 93.17% scores for accuracy, specificity, sensitivity, MCC, ROC, and F1 score, respectively.
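
    Most of the metrics reported above follow directly from a binary confusion matrix. The sketch below computes accuracy, specificity, sensitivity, MCC, and F1 using their standard definitions; the counts are invented for illustration and are not taken from the study.

```python
import math

def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)            # recall, true positive rate
    specificity = tn / (tn + fp)            # true negative rate
    f1 = 2 * tp / (2 * tp + fp + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when the denominator is 0.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
        "mcc": mcc,
    }

# Invented counts: 40 true positives, 10 false positives,
# 45 true negatives, 5 false negatives.
metrics = binary_metrics(tp=40, fp=10, tn=45, fn=5)
```

    MCC uses all four cells of the confusion matrix, which is why it is often reported alongside accuracy when classes are not evenly balanced.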

    Tweets Classification on the Base of Sentiments for US Airline Companies

    The use of data from social networks such as Twitter has increased during the last few years to improve political campaigns, quality of products and services, sentiment analysis, etc. Tweet classification based on user sentiments is a collaborative and important task for many organizations. This paper proposes a voting classifier (VC) to help sentiment analysis for such organizations. The VC is based on logistic regression (LR) and stochastic gradient descent classifier (SGDC) and uses a soft voting mechanism to make the final prediction. Tweets were classified into positive, negative, and neutral classes based on the sentiments they contain. In addition, a variety of machine learning classifiers were evaluated using accuracy, precision, recall, and F1 score as the performance metrics. The impact of feature extraction techniques, including term frequency (TF), term frequency-inverse document frequency (TF-IDF), and word2vec, on classification accuracy was investigated as well. Moreover, the performance of a deep long short-term memory (LSTM) network was analyzed on the selected dataset. The results show that the proposed VC performs better than the other classifiers. The VC achieves accuracies of 0.789 and 0.791 with TF and TF-IDF feature extraction, respectively. The results demonstrate that ensemble classifiers achieve higher accuracy than non-ensemble classifiers. Experiments further showed that the performance of machine learning classifiers is better when TF-IDF is used as the feature extraction method. Word2vec feature extraction performs worse than TF and TF-IDF feature extraction. The LSTM achieves a lower accuracy than the machine learning classifiers.
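
    Soft voting, as used by the VC described above, averages the class probabilities predicted by each base model and picks the class with the highest average. The sketch below shows only that combination step; the probability vectors are invented, and the helper name `soft_vote` is an assumption rather than anything from the paper.

```python
def soft_vote(prob_sets, classes):
    """Average per-class probabilities across models and pick the argmax."""
    n_models = len(prob_sets)
    avg = [
        sum(probs[i] for probs in prob_sets) / n_models
        for i in range(len(classes))
    ]
    best = max(range(len(classes)), key=avg.__getitem__)
    return classes[best], avg

classes = ["negative", "neutral", "positive"]

# Invented outputs for one tweet from the two base models.
lr_probs = [0.50, 0.30, 0.20]     # logistic regression favors "negative"
sgd_probs = [0.20, 0.25, 0.55]    # SGD classifier favors "positive"

label, avg = soft_vote([lr_probs, sgd_probs], classes)
```

    In this example the two models disagree on the top class, so a majority vote would tie; averaging the probabilities resolves the disagreement in favor of the more confident model.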