Task-specific Word Identification from Short Texts Using a Convolutional Neural Network

Authors: Wu Xintao; Xiang Yang; Yuan Shuhan
Publication date: 02/06/2017

Task-specific word identification aims to choose the task-related words that best describe a short text. Existing approaches require well-defined seed words or lexical dictionaries (e.g., WordNet), which are often unavailable for many applications such as social discrimination detection and fake review detection. However, we often have a set of labeled short texts where each short text has a task-related class label, e.g., discriminatory or non-discriminatory, specified by users or learned by classification algorithms. In this paper, we focus on identifying task-specific words and phrases from short texts by exploiting their class labels rather than using seed words or lexical dictionaries. We consider the task-specific word and phrase identification as feature learning. We train a convolutional neural network over a set of labeled texts and use score vectors to localize the task-specific words and phrases. Experimental results on sentiment word identification show that our approach significantly outperforms existing methods. We further conduct two case studies to show the effectiveness of our approach. One case study on a crawled tweets dataset demonstrates that our approach can successfully capture the discrimination-related words/phrases. The other case study on fake review detection shows that our approach can identify the fake-review words/phrases. Comment: accepted by Intelligent Data Analysis, an International Journa

arXiv.org e-Print Archive

ScholarWorks@UARK

Crossref

UARK (University of Arkansas)

"Liar, Liar Pants on Fire": A New Benchmark Dataset for Fake News Detection

Author: Wang William Yang
Publication date: 01/01/2017

Automatic fake news detection is a challenging problem in deception detection, and it has tremendous real-world political and social impacts. However, statistical approaches to combating fake news have been dramatically limited by the lack of labeled benchmark datasets. In this paper, we present LIAR: a new, publicly available dataset for fake news detection. We collected a decade-long, 12.8K manually labeled short statements in various contexts from PolitiFact.com, which provides a detailed analysis report and links to source documents for each case. This dataset can be used for fact-checking research as well. Notably, this new dataset is an order of magnitude larger than the previously largest public fake news datasets of similar type. Empirically, we investigate automatic fake news detection based on surface-level linguistic patterns. We have designed a novel, hybrid convolutional neural network to integrate meta-data with text. We show that this hybrid approach can improve a text-only deep learning model. Comment: ACL 201

arXiv.org e-Print Archive

Crossref

Artificial intelligence in the cyber domain: Offense and defense

Authors: Diep Quoc Bao; Truong Thanh Cong; Zelinka Ivan
Publication venue: MDPI AG
Publication date: 01/01/2020

Artificial intelligence techniques have grown rapidly in recent years, and their applications in practice can be seen in many fields, ranging from facial recognition to image analysis. In the cybersecurity domain, AI-based techniques can provide better cyber defense tools and help adversaries improve methods of attack. However, malicious actors are aware of the new prospects too and will probably attempt to use them for nefarious purposes. This survey paper aims at providing an overview of how artificial intelligence can be used in the context of cybersecurity in both offense and defense.

Multidisciplinary Digital Publishing Institute

DSpace at VSB Technical University of Ostrava

A systematic literature review on spam content detection and classification

Authors: Chandrasekaran Ganesh; Duraisamy Jude Hemanth; Kaddoura Sanaa; Popescu Daniela Elena
Publication venue: ZU Scholars
Publication date: 20/01/2022

The presence of spam content in social media is increasing tremendously, and therefore the detection of spam has become vital. Spam content increases as people make extensive use of social media and related channels, e.g., Facebook, Twitter, YouTube, and e-mail. The time people spend on social media is growing rapidly, especially during the pandemic. Users receive many text messages through social media, and they cannot always recognize the spam content in these messages. Spam messages contain malicious links, apps, fake accounts, fake news, reviews, rumors, etc. To improve social media security, the detection and control of spam text are essential. This paper presents a detailed survey on the latest developments in spam text detection and classification in social media. The various techniques used in spam detection and classification, including machine learning, deep learning, and text-based approaches, are discussed in this paper. We also present the challenges encountered in the identification of spam, along with its control mechanisms and the datasets used in existing works on spam detection.

ZU Scholars (Zayed University)

PubMed Central

Enhancing YouTube Spam Detection

Author: Pesaru Sai Charan
Publication venue: CSUSB ScholarWorks
Publication date: 01/08/2024

This culminating experience project investigated various methods for enhancing spam detection on YouTube, a prevalent issue impacting user experience and platform integrity. The research questions addressed were: Q1) How do different spam detection methods compare regarding robustness, efficiency, and accuracy? Q2) What role do deep learning approaches like RNNs and CNNs play in improving spam comment identification? Q3) What are the unique benefits of using deep learning models for spam comment identification on YouTube? Q4) How can machine learning models be optimized for real-time spam detection on YouTube? The study gave adequate findings that explained each research question. In the case of (Q1), while algorithms like the Naïve Bayes and Logistic Regression offered precision in identifying spam emails, the models have proven ineffectual at adapting to new forms of spam and constant enhancement in spam techniques, deep learning algorithms like the CNN and RNN offered high accuracy through their robustness due to the models' abilities of feature extraction independently from the text data. The results shown in (Q2) indicate that RNNs and CNNs are critical in transforming the level of spam detection by addressing the problem of semantic meaning and temporal relationships in comments and surpassing traditional methods. Concerning (Q3), it was pointed out that deep learning models are the most accurate, scalable, and resistant to false negatives when identifying spam comments on the videos hosted on YouTube, which helps regain users' trust and enhance the platform's security as the traffic continues to grow. (Q4) was focused on advancing machine learning models for real-time processing, using methods such as model pruning and distribution. The findings were as follows: (Q1) found that although conventional approaches are efficient at meeting accurate results, deep learning models are highly effective in dealing with the changes in spam strategies. (Q2) pointed out that RNNs and CNNs contribute immensely to discovering spam in SM platforms due to their raw power in NLP and pattern recognition. (Q3) established that the deep learning models' accuracy, scalability, and adaptability, including CNN and RNN, are beneficial in identifying spam on YouTube due to their effectiveness in tackling the ever-evolving spam tactics. (Q4) It has emerged that the fine-tuning of machine learning models is imperative for scaling up the approaches by deploying high-end methodologies for real-time spam detection, which subserves the daunting task of training the algorithms to deal with the flood of user-generated content in the context of YouTube. Areas of further study include analyzing other complex natural language processing methods combined with classifiers for better spam identification, improving the computational time for multi-modal learning for spam comment detection, and considering federated learning for real-time spam identification on platforms such as YouTube. These research directions are being carried out to boost the existing permutations and improve the permeate spam detection technologies in Information Systems so that they can be efficient, effective, and highly accurate systems capable of coping with the newly emerged spam technologies in flexible, transparent, and effective ways.

CSUSB ScholarWorks