1,261 research outputs found
Deceptive Opinions Detection Using New Proposed Arabic Semantic Features
Some users try to post false reviews to promote or to devalue other’s products and services. This action is known as deceptive opinions spam, where spammers try to gain or to profit from posting untruthful reviews. Therefore, we conducted this work to develop and to implement new semantic features to improve the Arabic deception detection. These features were inspired from the study of discourse parse and the rhetoric relations in Arabic. Looking to the importance of the phrase unit in the Arabic language and the grammatical studies, we have analyzed and selected the most used unit markers and relations to calculate the proposed features. These last were used basically to represent the reviews texts in the classification phase. Thus, the most accurate classification technique used in this area which has been proven by several previous works is the Support Vector Machine classifier (SVM). But there is always a lack concerning the Arabic annotated resources specially for deception detection area as it is considered new research area. Therefore, we used the semi supervised SVM to overcome this problem by using the unlabeled data
Modeling Analytical Streams for Social Business Intelligence
Social Business Intelligence (SBI) enables companies to capture strategic information from public social networks. Contrary to traditional Business Intelligence (BI), SBI has to face the high dynamicity of both the social network’s contents and the company’s analytical requests, as well as the enormous amount of noisy data. Effective exploitation of these continuous sources of data requires efficient processing of the streamed data to be semantically shaped into insightful facts. In this paper, we propose a multidimensional formalism to represent and evaluate social indicators directly from fact streams derived in turn from social network data. This formalism relies on two main aspects: the semantic representation of facts via Linked Open Data and the support of OLAP-like multidimensional analysis models. Contrary to traditional BI formalisms, we start the process by modeling the required social indicators according to the strategic goals of the company. From these specifications, all the required fact streams are modeled and deployed to trace the indicators. The main advantages of this approach are the easy definition of on-demand social indicators, and the treatment of changing dimensions and metrics through streamed facts. We demonstrate its usefulness by introducing a real scenario user case in the automotive sector
Macro-micro approach for mining public sociopolitical opinion from social media
During the past decade, we have witnessed the emergence of social media, which has prominence as a means for the general public to exchange opinions towards a broad range of topics. Furthermore, its social and temporal dimensions make it a rich resource for policy makers and organisations to understand public opinion. In this thesis, we present our research in understanding public opinion on Twitter along three dimensions: sentiment, topics and summary.
In the first line of our work, we study how to classify public sentiment on Twitter. We focus on the task of multi-target-specific sentiment recognition on Twitter, and propose an approach which utilises the syntactic information from parse-tree in conjunction with the left-right context of the target. We show the state-of-the-art performance on two datasets including a multi-target Twitter corpus on UK elections which we make public available for the research community. Additionally we also conduct two preliminary studies including cross-domain emotion classification on discourse around arts and cultural experiences, and social spam detection to improve the signal-to-noise ratio of our sentiment corpus.
Our second line of work focuses on automatic topical clustering of tweets. Our aim is to group tweets into a number of clusters, with each cluster representing a meaningful topic, story, event or a reason behind a particular choice of sentiment. We explore various ways of tackling this challenge and propose a two-stage hierarchical topic modelling system that is efficient and effective in achieving our goal.
Lastly, for our third line of work, we study the task of summarising tweets on common topics, with the goal to provide informative summaries for real-world events/stories or explanation underlying the sentiment expressed towards an issue/entity. As most existing tweet summarisation approaches rely on extractive methods, we propose to apply state-of-the-art neural abstractive summarisation model for tweets. We also tackle the challenge of cross-medium supervised summarisation with no target-medium training resources. To the best of our knowledge, there is no existing work on studying neural abstractive summarisation on tweets. In addition, we present a system for providing interactive visualisation of topic-entity sentiments and the corresponding summaries in chronological order.
Throughout our work presented in this thesis, we conduct experiments to evaluate and verify the effectiveness of our proposed models, comparing to relevant baseline methods. Most of our evaluations are quantitative, however, we do perform qualitative analyses where it is appropriate. This thesis provides insights and findings that can be used for better understanding public opinion in social media
Dissecting AI-Generated Fake Reviews: Detection and Analysis of GPT-Based Restaurant Reviews on Social Media
Recent advances in generative models such as GPT may be used to fabricate indistinguishable fake customer reviews at a much lower cost, posing challenges for social media platforms to detect this kind of content. This study addresses two research questions: (1) the effective detection of AI-generated restaurant reviews generated from high-quality elite authentic reviews, and (2) the comparison of out-of-sample predicted AI-generated reviews and authentic reviews across multiple dimensions of review, user, restaurant, and content characteristics. We fine-tuned a GPT text detector to predict fake reviews, significantly outperforming existing solutions. We applied the model to predict non-elite reviews that already passed the Yelp filtering system, revealing that AI-generated reviews typically score higher ratings, users posting such content have less established Yelp reputations and AI-generated reviews are more comprehensible and less linguistically complex than human-generated reviews. Notably, machine-generated reviews are more prevalent in low-traffic restaurants in terms of customer visits
Bibliometric Survey on Incremental Learning in Text Classification Algorithms for False Information Detection
The false information or misinformation over the web has severe effects on people, business and society as a whole. Therefore, detection of misinformation has become a topic of research among many researchers. Detecting misinformation of textual articles is directly connected to text classification problem. With the massive and dynamic generation of unstructured textual documents over the web, incremental learning in text classification has gained more popularity. This survey explores recent advancements in incremental learning in text classification and review the research publications of the area from Scopus, Web of Science, Google Scholar, and IEEE databases and perform quantitative analysis by using methods such as publication statistics, collaboration degree, research network analysis, and citation analysis. The contribution of this study in incremental learning in text classification provides researchers insights on the latest status of the research through literature survey, and helps the researchers to know the various applications and the techniques used recently in the field
REMARK-LLM: A Robust and Efficient Watermarking Framework for Generative Large Language Models
We present REMARK-LLM, a novel efficient, and robust watermarking framework
designed for texts generated by large language models (LLMs). Synthesizing
human-like content using LLMs necessitates vast computational resources and
extensive datasets, encapsulating critical intellectual property (IP). However,
the generated content is prone to malicious exploitation, including spamming
and plagiarism. To address the challenges, REMARK-LLM proposes three new
components: (i) a learning-based message encoding module to infuse binary
signatures into LLM-generated texts; (ii) a reparameterization module to
transform the dense distributions from the message encoding to the sparse
distribution of the watermarked textual tokens; (iii) a decoding module
dedicated for signature extraction; Furthermore, we introduce an optimized beam
search algorithm to guarantee the coherence and consistency of the generated
content. REMARK-LLM is rigorously trained to encourage the preservation of
semantic integrity in watermarked content, while ensuring effective watermark
retrieval. Extensive evaluations on multiple unseen datasets highlight
REMARK-LLM proficiency and transferability in inserting 2 times more signature
bits into the same texts when compared to prior art, all while maintaining
semantic integrity. Furthermore, REMARK-LLM exhibits better resilience against
a spectrum of watermark detection and removal attacks
- …