Task-specific Word Identification from Short Texts Using a Convolutional Neural Network
Task-specific word identification aims to choose the task-related words that
best describe a short text. Existing approaches require well-defined seed words
or lexical dictionaries (e.g., WordNet), which are often unavailable for many
applications such as social discrimination detection and fake review detection.
However, we often have a set of labeled short texts where each short text has a
task-related class label, e.g., discriminatory or non-discriminatory, specified
by users or learned by classification algorithms. In this paper, we focus on
identifying task-specific words and phrases from short texts by exploiting
their class labels rather than using seed words or lexical dictionaries. We
consider the task-specific word and phrase identification as feature learning.
We train a convolutional neural network over a set of labeled texts and use
score vectors to localize the task-specific words and phrases. Experimental
results on sentiment word identification show that our approach significantly
outperforms existing methods. We further conduct two case studies to show the
effectiveness of our approach. One case study on a crawled tweets dataset
demonstrates that our approach can successfully capture the
discrimination-related words/phrases. The other case study on fake review
detection shows that our approach can identify the fake-review words/phrases.
Comment: accepted by Intelligent Data Analysis, an International Journa
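The idea of scoring text windows with a convolutional filter and reading off the strongest window can be sketched in a few lines of plain Python. This is only a toy illustration, not the authors' model: the two-dimensional embeddings, the filter weights, and the helper names conv_scores and localize are all assumptions made up for the example.

```python
# Toy sketch: find the word window that most strongly activates one filter.
# Embeddings and filter weights are hypothetical, chosen only for illustration.

def conv_scores(embeddings, filt):
    """Slide a filter of width len(filt) over the token embeddings.

    Returns one score per window (valid 1-D convolution, no padding).
    """
    width = len(filt)
    scores = []
    for start in range(len(embeddings) - width + 1):
        window = embeddings[start:start + width]
        score = sum(
            w * x
            for f_row, e_row in zip(filt, window)
            for w, x in zip(f_row, e_row)
        )
        scores.append(score)
    return scores


def localize(tokens, embeddings, filt):
    """Return the token window with the highest filter score, and that score."""
    scores = conv_scores(embeddings, filt)
    best = max(range(len(scores)), key=scores.__getitem__)
    return tokens[best:best + len(filt)], scores[best]


tokens = ["the", "service", "was", "truly", "awful"]
# Hypothetical 2-dimensional embeddings: (intensity, negativity).
embeddings = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.0], [0.9, 0.1], [0.2, 1.0]]
# Hypothetical width-2 filter that responds to "intensifier + negative word".
filt = [[1.0, 0.0], [0.0, 1.0]]

phrase, score = localize(tokens, embeddings, filt)
print(phrase)  # the window with the largest activation
```

In a trained network the filter weights would be learned from the class labels rather than written by hand; the sketch only shows how a per-window score vector can point back to specific words.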
"Liar, Liar Pants on Fire": A New Benchmark Dataset for Fake News Detection
Automatic fake news detection is a challenging problem in deception
detection, and it has tremendous real-world political and social impacts.
However, statistical approaches to combating fake news have been dramatically
limited by the lack of labeled benchmark datasets. In this paper, we present
LIAR: a new, publicly available dataset for fake news detection. We collected
12.8K manually labeled short statements, spanning a decade and various contexts, from
PolitiFact.com, which provides a detailed analysis report and links to source
documents for each case. This dataset can be used for fact-checking research as
well. Notably, this new dataset is an order of magnitude larger than the previously
largest public fake news datasets of similar type. Empirically, we investigate
automatic fake news detection based on surface-level linguistic patterns. We
have designed a novel, hybrid convolutional neural network to integrate
meta-data with text. We show that this hybrid approach can improve a text-only
deep learning model.
Comment: ACL 201
Artificial intelligence in the cyber domain: Offense and defense
Artificial intelligence techniques have grown rapidly in recent years, and their applications in practice can be seen in many fields, ranging from facial recognition to image analysis. In the cybersecurity domain, AI-based techniques can provide better cyber defense tools and help adversaries improve methods of attack. However, malicious actors are aware of the new prospects too and will probably attempt to use them for nefarious purposes. This survey paper aims at providing an overview of how artificial intelligence can be used in the context of cybersecurity in both offense and defense.
Web of Science123art. no. 41
A systematic literature review on spam content detection and classification
The presence of spam content in social media is increasing rapidly, and spam detection has therefore become vital. Spam content increases as people make extensive use of social media and related channels, e.g., Facebook, Twitter, YouTube, and e-mail. The time people spend on social media keeps growing, especially during the pandemic. Users receive many text messages through social media and cannot always recognize the spam content in them. Spam messages contain malicious links, apps, fake accounts, fake news, reviews, rumors, etc. To improve social media security, the detection and control of spam text are essential. This paper presents a detailed survey of the latest developments in spam text detection and classification in social media. It discusses the techniques used for spam detection and classification, including Machine Learning, Deep Learning, and text-based approaches. We also present the challenges encountered in identifying spam, along with the control mechanisms and datasets used in existing work on spam detection.
Enhancing YouTube Spam Detection
This culminating experience project investigated various methods for enhancing spam detection on YouTube, a prevalent issue impacting user experience and platform integrity. The research questions addressed were: Q1) How do different spam detection methods compare regarding robustness, efficiency, and accuracy? Q2) What role do deep learning approaches like RNNs and CNNs play in improving spam comment identification? Q3) What are the unique benefits of using deep learning models for spam comment identification on YouTube? Q4) How can machine learning models be optimized for real-time spam detection on YouTube?
The study produced findings for each research question. For (Q1), algorithms such as Naïve Bayes and Logistic Regression offered precision in identifying spam emails but adapted poorly to new forms of spam and continually evolving spam techniques, whereas deep learning algorithms such as CNNs and RNNs offered high accuracy and robustness because they extract features from the text data on their own. The results for (Q2) indicate that RNNs and CNNs are critical to improving spam detection: they capture semantic meaning and temporal relationships in comments and surpass traditional methods. For (Q3), deep learning models were found to be the most accurate, scalable, and resistant to false negatives when identifying spam comments on YouTube videos, which helps regain users' trust and strengthen the platform's security as traffic continues to grow. (Q4) focused on adapting machine learning models for real-time processing, using methods such as model pruning and distribution.
The findings were as follows: (Q1) found that although conventional approaches produce accurate results efficiently, deep learning models are highly effective at handling changes in spam strategies. (Q2) showed that RNNs and CNNs contribute substantially to discovering spam on social media platforms because of their strength in NLP and pattern recognition. (Q3) established that the accuracy, scalability, and adaptability of deep learning models, including CNNs and RNNs, are beneficial in identifying spam on YouTube because they cope with ever-evolving spam tactics. (Q4) showed that fine-tuning machine learning models is essential for scaling real-time spam detection, since the models must be trained to handle the flood of user-generated content on YouTube.
Areas for further study include analyzing more complex natural language processing methods combined with classifiers for better spam identification, reducing the computational time of multi-modal learning for spam comment detection, and considering federated learning for real-time spam identification on platforms such as YouTube. These research directions aim to improve on existing approaches and on prevalent spam detection technologies in Information Systems so that they become efficient, effective, and highly accurate systems capable of coping with newly emerging spam techniques in flexible and transparent ways.
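To make the "conventional" side of the comparison concrete, the following is a minimal sketch of a multinomial Naïve Bayes comment classifier with add-one smoothing. It is not the project's implementation: the function names train_nb and predict_nb, the labels "spam" and "ham", and the four training comments are all made up for illustration.

```python
import math
from collections import Counter


def train_nb(samples):
    """Fit a multinomial Naive Bayes model from (text, label) pairs."""
    word_counts = {}        # label -> Counter of word frequencies
    doc_counts = Counter()  # label -> number of training comments
    vocab = set()
    for text, label in samples:
        words = text.lower().split()
        word_counts.setdefault(label, Counter()).update(words)
        doc_counts[label] += 1
        vocab.update(words)
    return word_counts, doc_counts, vocab


def predict_nb(model, text):
    """Return the label with the highest log-posterior (add-one smoothing)."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        score = math.log(doc_counts[label] / total_docs)  # log prior
        denom = sum(counts.values()) + len(vocab)
        for word in text.lower().split():
            if word in vocab:  # ignore words never seen in training
                score += math.log((counts[word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Made-up toy comments, for illustration only.
train = [
    ("subscribe to my channel for free gifts", "spam"),
    ("click this link to win free money", "spam"),
    ("great video thanks for the tutorial", "ham"),
    ("the editing in this video is great", "ham"),
]
model = train_nb(train)
print(predict_nb(model, "win free gifts now"))             # spam
print(predict_nb(model, "thanks for the great tutorial"))  # ham
```

Because the model relies only on word counts seen during training, it ignores unseen words entirely, which is one way to picture why such baselines adapt poorly when spammers change their vocabulary.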