5,527 research outputs found

    Ranking News-Quality Multimedia

    Full text link
    News editors need to find the photos that best illustrate a news piece and fulfill news-media quality standards, while being pressed to also find the most recent photos of live events. Recently, it became common to use social-media content in the context of news media for its unique value in terms of immediacy and quality. Consequently, the amount of images to be considered and filtered through is now too much to be handled by a person. To aid the news editor in this process, we propose a framework designed to deliver high-quality, news-press type photos to the user. The framework, composed of two parts, is based on a ranking algorithm tuned to rank professional media highly and a visual SPAM detection module designed to filter-out low-quality media. The core ranking algorithm is leveraged by aesthetic, social and deep-learning semantic features. Evaluation showed that the proposed framework is effective at finding high-quality photos (true-positive rate) achieving a retrieval MAP of 64.5% and a classification precision of 70%.Comment: To appear in ICMR'1

    Finding Relevant Answers in Software Forums

    Get PDF
    Abstractā€”Online software forums provide a huge amount of valuable content. Developers and users often ask questions and receive answers from such forums. The availability of a vast amount of thread discussions in forums provides ample opportunities for knowledge acquisition and summarization. For a given search query, current search engines use traditional information retrieval approach to extract webpages containin

    Community based Question Answer Detection

    Get PDF
    Each day, millions of people ask questions and search for answers on the World Wide Web. Due to this, the Internet has grown to a world wide database of questions and answers, accessible to almost everyone. Since this database is so huge, it is hard to find out whether a question has been answered or even asked before. As a consequence, users are asking the same questions again and again, producing a vicious circle of new content which hides the important information. One platform for questions and answers are Web forums, also known as discussion boards. They present discussions as item streams where each item contains the contribution of one author. These contributions contain questions and answers in human readable form. People use search engines to search for information on such platforms. However, current search engines are neither optimized to highlight individual questions and answers nor to show which questions are asked often and which ones are already answered. In order to close this gap, this thesis introduces the \\emph{Effingo} system. The Effingo system is intended to extract forums from around the Web and find question and answer items. It also needs to link equal questions and aggregate associated answers. That way it is possible to find out whether a question has been asked before and whether it has already been answered. Based on these information it is possible to derive the most urgent questions from the system, to determine which ones are new and which ones are discussed and answered frequently. As a result, users are prevented from creating useless discussions, thus reducing the server load and information overload for further searches. The first research area explored by this thesis is forum data extraction. The results from this area are intended be used to create a database of forum posts as large as possible. Furthermore, it uses question-answer detection in order to find out which forum items are questions and which ones are answers and, finally, topic detection to aggregate questions on the same topic as well as discover duplicate answers. These areas are either extended by Effingo, using forum specific features such as the user graph, forum item relations and forum link structure, or adapted as a means to cope with the specific problems created by user generated content. Such problems arise from poorly written and very short texts as well as from hidden or distributed information

    Automated Crowdturfing Attacks and Defenses in Online Review Systems

    Full text link
    Malicious crowdsourcing forums are gaining traction as sources of spreading misinformation online, but are limited by the costs of hiring and managing human workers. In this paper, we identify a new class of attacks that leverage deep learning language models (Recurrent Neural Networks or RNNs) to automate the generation of fake online reviews for products and services. Not only are these attacks cheap and therefore more scalable, but they can control rate of content output to eliminate the signature burstiness that makes crowdsourced campaigns easy to detect. Using Yelp reviews as an example platform, we show how a two phased review generation and customization attack can produce reviews that are indistinguishable by state-of-the-art statistical detectors. We conduct a survey-based user study to show these reviews not only evade human detection, but also score high on "usefulness" metrics by users. Finally, we develop novel automated defenses against these attacks, by leveraging the lossy transformation introduced by the RNN training and generation cycle. We consider countermeasures against our mechanisms, show that they produce unattractive cost-benefit tradeoffs for attackers, and that they can be further curtailed by simple constraints imposed by online service providers
    • ā€¦
    corecore