
    MUFFLE: Multi-Modal Fake News Influence Estimator on Twitter

    Predicting the popularity of fake news posts on social media is a crucial problem for alleviating the impact of fake news on society. However, most related studies on fake news emphasize detection only. In this paper, we focus on the issue of fake news influence prediction, i.e., inferring how popular a fake news post might become on social platforms. To achieve our goal, we propose a comprehensive framework, MUFFLE, which captures multi-modal dynamics by encoding the representations of news-related social networks, user characteristics, and textual content. The attention mechanism developed in the model provides explainability for social or psychological analysis. To examine the effectiveness of MUFFLE, we conducted extensive experiments on real-world datasets. The experimental results show that our proposed method outperforms both state-of-the-art popularity prediction methods and machine-based baselines in top-k NDCG and hit rate. Through the experiments, we also analyze the importance of features for predicting fake news influence via the explainability provided by MUFFLE.
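
    The framework above is evaluated with ranking metrics rather than plain accuracy. As a rough illustration (not MUFFLE itself), the sketch below computes top-k NDCG and hit rate for any popularity predictor; the data and variable names are hypothetical.

        import numpy as np

        def ndcg_at_k(true_popularity, predicted_scores, k=10):
            # NDCG@k: how well the predicted ranking recovers the true popularity order.
            true_popularity = np.asarray(true_popularity, dtype=float)
            order = np.argsort(predicted_scores)[::-1][:k]        # indices of predicted top-k
            gains = true_popularity[order]
            dcg = np.sum(gains / np.log2(np.arange(2, len(gains) + 2)))
            ideal = np.sort(true_popularity)[::-1][:k]
            idcg = np.sum(ideal / np.log2(np.arange(2, len(ideal) + 2)))
            return dcg / idcg if idcg > 0 else 0.0

        def hit_rate_at_k(true_popularity, predicted_scores, k=10):
            # Fraction of the truly top-k posts that the model also ranks in its top-k.
            top_true = set(np.argsort(true_popularity)[::-1][:k])
            top_pred = set(np.argsort(predicted_scores)[::-1][:k])
            return len(top_true & top_pred) / k

        # Example with 100 hypothetical fake-news posts: "truth" stands in for observed
        # popularity (e.g. retweet counts), "pred" for any model's predicted scores.
        rng = np.random.default_rng(0)
        truth = rng.poisson(50, size=100)
        pred = truth + rng.normal(0, 10, size=100)
        print(ndcg_at_k(truth, pred), hit_rate_at_k(truth, pred))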

    Churn Identification and Prediction from a Large-Scale Telecommunication Dataset Using NLP

    The identification of customer churn is a major issue for large telecom businesses. To manage the data of current customers as well as to acquire and manage new customers, a substantial volume of data is generated every day, so it is crucial to identify the causes of customer churn and take appropriate steps to reduce it. Numerous researchers have described efforts to combine static and dynamic approaches to reduce churn in big data sets, but these systems still have many issues when it comes to actually identifying churn. In this paper, we propose two methods, churn identification and churn prediction, applied to a vast telecommunication data set using Natural Language Processing (NLP) and machine learning techniques. The NLP process involves data pre-processing, normalization, feature extraction, and feature selection. For feature extraction, we employ techniques such as TF-IDF, Stanford NLP, and occurrence-correlation methods. A machine learning classification algorithm is then used for training and testing, and the system employs a variety of cross-validation techniques when training and evaluating the machine learning algorithms. The experimental analysis shows the system's efficacy and accuracy.
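
    As a concrete illustration of the kind of pipeline described above (a sketch, not the authors' exact system), the snippet below builds TF-IDF features from customer interaction text and cross-validates a classifier with scikit-learn; the file name and column names are assumptions.

        import pandas as pd
        from sklearn.pipeline import Pipeline
        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import cross_val_score

        # Hypothetical dataset: one row per customer, with free-text interaction
        # notes and a 0/1 churn label.
        df = pd.read_csv("telecom_interactions.csv")

        pipeline = Pipeline([
            ("tfidf", TfidfVectorizer(lowercase=True, stop_words="english",
                                      ngram_range=(1, 2), max_features=50_000)),
            ("clf", LogisticRegression(max_iter=1000)),
        ])

        # 5-fold cross-validation over the text features and churn labels.
        scores = cross_val_score(pipeline, df["notes"], df["churned"],
                                 cv=5, scoring="f1")
        print("mean F1 over folds:", scores.mean())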

    Classification of Animal Sound Using Convolutional Neural Network

    Recently, the labeling of acoustic events has emerged as an active topic covering a wide range of applications. High-level semantic inference can be conducted based on the main audio effects to facilitate various content-based applications for analysis, efficient retrieval, and content management. This paper proposes a flexible convolutional neural network-based framework for animal audio classification. The work takes inspiration from various deep neural networks recently developed for multimedia classification. The model is driven by the idea of identifying the animal sound in an audio file by forcing the network to pay attention to the core audio effects present in the audio, represented as a Mel-spectrogram. The designed framework achieves an accuracy of 98% when classifying animal audio on weakly labelled datasets. A further goal of this research is to build a framework that can run even on a basic machine and does not necessarily require high-end devices for classification.
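
    A minimal sketch of this kind of pipeline (not the paper's exact architecture): convert a clip to a log-Mel spectrogram with librosa and classify it with a small Keras CNN. The class count, clip length, and layer sizes are assumptions.

        import numpy as np
        import librosa
        import tensorflow as tf

        def log_mel(path, sr=22050, n_mels=64, duration=5.0):
            # Load a fixed-length clip and return its log-Mel spectrogram.
            y, _ = librosa.load(path, sr=sr, duration=duration)
            y = librosa.util.fix_length(y, size=int(sr * duration))   # pad or trim
            mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
            return librosa.power_to_db(mel, ref=np.max)               # shape (n_mels, frames)

        n_classes = 10                      # assumed number of animal classes
        input_shape = (64, 216, 1)          # n_mels x frames for a 5 s clip (hop length 512)

        model = tf.keras.Sequential([
            tf.keras.layers.Input(shape=input_shape),
            tf.keras.layers.Conv2D(16, 3, padding="same", activation="relu"),
            tf.keras.layers.MaxPooling2D(2),
            tf.keras.layers.Conv2D(32, 3, padding="same", activation="relu"),
            tf.keras.layers.MaxPooling2D(2),
            tf.keras.layers.GlobalAveragePooling2D(),
            tf.keras.layers.Dense(n_classes, activation="softmax"),
        ])
        model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                      metrics=["accuracy"])

        # Usage with a hypothetical file: x = log_mel("dog_bark.wav")[..., np.newaxis]
        # model.predict(x[np.newaxis]) returns class probabilities for that clip.

    The network is deliberately small, in line with the stated goal of running the classifier on basic hardware.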

    Offensive Language Detection in Arabic Social Networks Using Evolutionary-Based Classifiers Learned From Fine-Tuned Embeddings

    Social networks facilitate communication between people from all over the world. Unfortunately, the excessive use of social networks also leads to the rise of antisocial behaviors such as the spread of online offensive language, cyberbullying (CB), and hate speech (HS). Therefore, detecting abusive/offensive and hateful content has become a crucial part of combating cyberharassment. Manual detection of cyberharassment is cumbersome, slow, and not even feasible on rapidly growing data. In this study, we addressed the challenges of automatically detecting offensive tweets in the Arabic language. The main contribution of this study is to design and implement an intelligent prediction system encompassing a two-stage optimization approach to identify and classify offensive from non-offensive text. In the first stage, the proposed approach fine-tunes the pre-trained word embedding models by training them for several epochs on the training dataset; the embeddings of the vocabularies in the new dataset are trained and added to the old embeddings. In the second stage, it employs a hybrid approach of two classifiers, namely XGBoost and SVM, together with a genetic algorithm (GA) to mitigate the classifiers' drawback in finding optimal hyperparameter values. We tested the proposed approach on the Arabic Cyberbullying Corpus (ArCybC), which contains tweets collected from four Twitter domains: gaming, sports, news, and celebrities. The ArCybC dataset has four categories: sexual, racial, intelligence, and appearance. The proposed approach produced superior results, in which the SVM algorithm with the Aravec SkipGram word embedding model achieved an accuracy of 88.2% and an F1-score of 87.8%. Ministerio Espanol de Ciencia e Innovacion (DemocratAI::UGR) PID2020-115570GB-C2
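
    A rough sketch of the first stage described above, assuming the pre-trained embeddings are a gensim Word2Vec model (AraVec SkipGram models are distributed in this format): continue training on the task's own tweets so that new vocabulary also receives embeddings. The file paths, tokenization, and epoch count are assumptions.

        import numpy as np
        from gensim.models import Word2Vec

        # Hypothetical whitespace-tokenized training tweets, one per line.
        tweets = [line.split() for line in open("arcybc_train.txt", encoding="utf-8")]

        model = Word2Vec.load("aravec_skipgram.model")   # hypothetical path to the pre-trained model
        model.build_vocab(tweets, update=True)           # add unseen words to the vocabulary
        model.train(tweets, total_examples=len(tweets), epochs=5)

        def tweet_vector(tokens, m):
            # Average the word vectors of a tweet as a simple input representation.
            vecs = [m.wv[w] for w in tokens if w in m.wv]
            return np.mean(vecs, axis=0) if vecs else np.zeros(m.vector_size)

        # The resulting vectors can then feed the second stage described in the
        # abstract (SVM/XGBoost classifiers with GA-tuned hyperparameters).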

    A Comprehensive Review of Sentiment Analysis on Indian Regional Languages: Techniques, Challenges, and Trends

    Sentiment analysis (SA) is the process of understanding emotion within a text. It helps identify the opinion, attitude, and tone of a text, categorizing it as positive, negative, or neutral. SA is frequently used today as more and more people get a chance to put out their thoughts with the advent of social media. Sentiment analysis benefits industries around the globe, such as finance, advertising, marketing, travel, and hospitality. Although the majority of work in this field has been done on global languages like English, in recent years the importance of SA in local languages has also been widely recognized, leading to considerable research on Indian regional languages. This paper comprehensively reviews SA in the following major Indian regional languages: Marathi, Hindi, Tamil, Telugu, Malayalam, Bengali, Gujarati, and Urdu. Furthermore, it presents techniques, challenges, findings, recent research trends, and future scope for enhancing the accuracy of results.

    Identifying Semantically Duplicate Questions Using Data Science Approach: A Quora Case Study

    Two questions are semantically duplicate if precisely the same answer can satisfy both questions. Identifying semantically identical questions on question-and-answering (Q&A) social media platforms like Quora is exceptionally significant to ensure that the quality and quantity of content presented to users match the intent of the question, thus enriching the overall user experience. Detecting duplicate questions is a challenging problem because natural language is very expressive, and a unique intent can be conveyed using different words, phrases, and sentence structures. Machine learning and deep learning methods are known to have accomplished superior results over traditional natural language processing techniques in identifying similar texts. In this thesis, taking Quora as our case study, we explored and applied different machine learning and deep learning techniques to the task of identifying duplicate questions on Quora's question pair dataset. Using feature engineering, feature importance techniques, and experiments with seven selected machine learning classifiers, we demonstrated that our models outperformed a few of the previous studies on this task. The XGBoost model, when fed with character-level term frequency and inverse document frequency features, achieved superior results to the other machine learning models and also outperformed a few of the deep learning baseline models. We applied deep learning techniques to model four different deep neural networks of multiple layers consisting of GloVe embeddings, Long Short-Term Memory, convolution, max pooling, dense, batch normalization, activation functions, and model merging. Our deep learning models achieved better accuracy than the machine learning models. Three out of the four proposed architectures outperformed the accuracy reported in previous machine learning and deep learning work, two out of the four models outperformed the accuracy of a previous deep learning study on Quora's question pair dataset, and our best model achieved an accuracy of 85.82%, which is close to Quora's state-of-the-art accuracy.
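
    As an illustration of the character-level TF-IDF plus XGBoost setup mentioned above (a sketch, not the thesis code), the snippet below vectorizes both questions of each pair with character n-grams and trains a gradient-boosted classifier; the CSV path and column names follow the public Quora question-pairs format but are assumptions here.

        import pandas as pd
        from scipy.sparse import hstack
        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.model_selection import train_test_split
        from xgboost import XGBClassifier

        df = pd.read_csv("quora_question_pairs.csv")   # hypothetical local copy

        # Character n-gram TF-IDF fitted on both question columns together.
        vec = TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4), max_features=50_000)
        vec.fit(pd.concat([df["question1"], df["question2"]]).astype(str))
        X = hstack([vec.transform(df["question1"].astype(str)),
                    vec.transform(df["question2"].astype(str))]).tocsr()
        y = df["is_duplicate"].values

        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)
        clf = XGBClassifier(n_estimators=300, max_depth=6, learning_rate=0.1)
        clf.fit(X_tr, y_tr)
        print("held-out accuracy:", clf.score(X_te, y_te))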