3 research outputs found

The Best Answers? Think Twice: Online Detection of Commercial Campaigns in the CQA Forums

Authors: Chen Cheng; R Kesav Bharadwaj; Srinivasan Venkatesh; Wu Kui
Publication date: 01/01/2012

In an emerging trend, more and more Internet users search for information on Community Question and Answer (CQA) websites, as the interactive communication on such websites gives users a rare feeling of trust. More often than not, end users are looking for instant help when they browse CQA websites for the best answers. Hence, it is imperative that they be warned of any potential commercial campaigns hidden behind the answers. However, existing research focuses more on the quality of answers and does not meet this need. In this paper, we develop a system that automatically analyzes the hidden patterns of commercial spam and raises alarms instantaneously to end users whenever a potential commercial campaign is detected. Our detection method integrates semantic analysis and posters' track records, and it exploits special features of CQA websites that differ substantially from those of other types of forums such as microblogs or news reports. Our system is adaptive and accommodates new evidence uncovered by the detection algorithms over time. Validated with real-world trace data from a popular Chinese CQA website over a period of three months, our system shows great potential for adaptive online detection of CQA spam.

Comment: 9 pages, 10 figures

arXiv.org e-Print Archive

CiteSeerX

Quality-biased ranking of short texts in microblogging services

Authors: Minlie Huang; Xiaoyan Zhu; Yi Yang
Publication venue: Asian Federation of Natural Language Processing (AFNLP)
Publication date: 01/01/2011

Meeting: 5th International Joint Conference on Natural Language Processing, Chiang Mai, Thailand, November 8-13, 2011

The abundance of user-generated content comes at a price: the quality of content may range from very high to very low. We propose a regression approach that incorporates various features to recommend short-text documents from Twitter, with a bias toward quality. The approach is built on a linear regression model that includes a regularization factor inspired by the content conformity hypothesis: documents similar in content may have similar quality. We test the system on the Edinburgh Twitter corpus. Experimental results show that the regularization factor inspired by the hypothesis improves ranking performance, and that using unlabeled data improves it further. Comparative results show that our method outperforms several baseline systems. We also perform a systematic feature analysis and find that content quality features are dominant in short-text ranking.
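
The abstract does not give the model's exact form, so the following is only a minimal sketch of one way a content-conformity regularizer could be attached to linear regression. It assumes a squared-error loss on labeled documents plus a penalty on score differences between similar documents (labeled or not), minimized by plain gradient descent. The names `fit_conformity_regression`, `sim`, `lam`, `lr`, and `steps` are hypothetical and do not come from the paper.

```python
# Hypothetical sketch, not the paper's implementation.
# Objective (assumed):
#   sum over labeled i of (w.x_i - y_i)^2
#   + lam * sum over pairs (i, j) of sim[i, j] * (w.x_i - w.x_j)^2

def predict(w, x):
    """Linear quality score for one feature vector."""
    return sum(wi * xi for wi, xi in zip(w, x))

def fit_conformity_regression(X, y, sim, lam=0.1, lr=0.01, steps=2000):
    """X: feature vectors for all documents.
    y: quality label per document, or None for unlabeled documents.
    sim: dict mapping (i, j) index pairs to a content-similarity weight."""
    d = len(X[0])
    w = [0.0] * d
    for _ in range(steps):
        grad = [0.0] * d
        # Squared-error term: only labeled documents contribute.
        for x, t in zip(X, y):
            if t is None:
                continue
            err = predict(w, x) - t
            for k in range(d):
                grad[k] += 2 * err * x[k]
        # Conformity term: similar documents are pulled toward similar scores.
        for (i, j), s in sim.items():
            diff = predict(w, X[i]) - predict(w, X[j])
            for k in range(d):
                grad[k] += 2 * lam * s * diff * (X[i][k] - X[j][k])
        w = [wk - lr * gk for wk, gk in zip(w, grad)]
    return w

# Two documents with opposite labels but high content similarity.
X = [[1.0, 0.0], [0.0, 1.0]]
y = [1.0, 0.0]
w_plain = fit_conformity_regression(X, y, sim={}, lam=0.0)
w_smooth = fit_conformity_regression(X, y, sim={(0, 1): 1.0}, lam=1.0)
```

In this toy case the unregularized fit scores the two documents far apart, while the similarity penalty draws their scores closer together. Because the penalty needs no labels, unlabeled documents can take part through `sim`, which is one plausible reading of the abstract's remark about unlabeled data.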

CiteSeerX

International Development Research Centre: IDRC Digital Library

Exploiting Social Media Network Structure to Improve User Profiles for Short-Text-Based Recommender Systems

Author: Alshammari Abdullah
Publication date: 01/05/2019

University of Brighton Research Portal