
    Experimental Approach Based on Ensemble and Frequent Itemsets Mining for Image Spam Filtering

    Excessive amounts of image spam cause many problems for e-mail users. Because image spam is difficult to detect with conventional text-based spam filtering approaches, various image processing techniques have been proposed. In this paper, we present an ensemble method that uses frequent itemset mining (FIM) to filter image spam. Although FIM techniques are well established in data mining, they are not commonly used in ensemble methods. To obtain good filtering performance, we use the SIFT descriptor, which is widely known to be an effective image descriptor. K-means clustering is applied to the SIFT keypoints to produce a visual codebook. The bag-of-words (BOW) feature vector for each image is generated using a hard bag-of-features (HBOF) approach. FIM descriptors are obtained from the frequent itemsets of the BOW feature vectors. In an ensemble method, we combine BOW and FIM with three feature selection methods, namely Information Gain (IG), Symmetrical Uncertainty (SU) and Chi Square (CS), together with a Spatial Pyramid. We have performed experiments on the Dredze and SpamArchive datasets. The results show that our ensemble using frequent itemset mining significantly outperforms both the traditional BOW approach and a naive approach that concatenates all descriptors directly into a single very large input vector.
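A minimal sketch of the codebook, hard bag-of-features and frequent-itemset steps described above, assuming small toy 2-D descriptors in place of 128-dimensional SIFT vectors, plain Lloyd's k-means, and an Apriori-style miner. The function names (kmeans, hard_bof, to_transaction, frequent_itemsets, fim_vector) and the choice to treat each image's set of non-zero visual words as one "transaction" are illustrative assumptions, not the authors' implementation.

```python
import math
import random
from itertools import combinations


def nearest(point, centroids):
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda j: math.dist(point, centroids[j]))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; the k centroids form the visual codebook."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        for j, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster empties out
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return centroids


def hard_bof(descriptors, codebook):
    """Hard assignment: each descriptor votes for exactly one visual word."""
    hist = [0] * len(codebook)
    for d in descriptors:
        hist[nearest(d, codebook)] += 1
    return hist


def to_transaction(hist):
    """Assumption: an image's transaction is the set of visual words it contains."""
    return frozenset(i for i, count in enumerate(hist) if count > 0)


def frequent_itemsets(transactions, min_support, max_size=2):
    """Apriori-style mining; returns {itemset: support} for itemsets up to max_size."""
    n = len(transactions)
    result = {}
    candidates = {frozenset([i]) for t in transactions for i in t}
    size = 1
    while candidates and size <= max_size:
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        result.update(level)
        size += 1
        items = sorted({i for c in level for i in c})
        # Keep only candidates whose every (size-1)-subset is frequent.
        candidates = {
            frozenset(c)
            for c in combinations(items, size)
            if all(frozenset(s) in level for s in combinations(c, size - 1))
        }
    return result


def fim_vector(transaction, itemsets):
    """Binary FIM descriptor: 1 where the image contains the frequent itemset."""
    ordered = sorted(itemsets, key=lambda s: (len(s), sorted(s)))
    return [1 if s <= transaction else 0 for s in ordered]
```

In a full pipeline the codebook would presumably be built from keypoints pooled across training images, and the resulting FIM vectors would feed one member of the ensemble alongside the BOW and feature-selected vectors.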

    Ranking News-Quality Multimedia

    News editors need to find the photos that best illustrate a news piece and meet news-media quality standards, while also being pressed to find the most recent photos of live events. It has recently become common to use social-media content in news media for its unique value in terms of immediacy and quality. Consequently, the number of images to be considered and filtered is now too large for one person to handle. To aid news editors in this process, we propose a framework designed to deliver high-quality, news-press-type photos to the user. The framework has two parts: a ranking algorithm tuned to rank professional media highly, and a visual spam detection module designed to filter out low-quality media. The core ranking algorithm leverages aesthetic, social and deep-learning semantic features. Evaluation showed that the proposed framework is effective at finding high-quality photos (true-positive rate), achieving a retrieval MAP of 64.5% and a classification precision of 70%. Comment: To appear in ICMR'1
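The evaluation above reports retrieval MAP. As a hedged illustration, the sketch below pairs a hypothetical linear scoring rule (a weighted sum of aesthetic, social and semantic scores, applied after dropping items that a spam module flags) with the standard average-precision and MAP computation. The Photo fields, the weights and the spam threshold are assumptions for illustration only; the abstract does not specify how the authors combine their features.

```python
from dataclasses import dataclass


@dataclass
class Photo:
    photo_id: str
    aesthetic: float  # hypothetical per-photo scores in [0, 1]
    social: float
    semantic: float
    spam_prob: float  # hypothetical output of a visual spam detector


def rank_photos(photos, weights=(0.4, 0.2, 0.4), spam_threshold=0.5):
    """Drop likely spam, then sort by a weighted sum of feature scores."""
    kept = [p for p in photos if p.spam_prob < spam_threshold]
    wa, ws, wm = weights
    return sorted(
        kept,
        key=lambda p: wa * p.aesthetic + ws * p.social + wm * p.semantic,
        reverse=True,
    )


def average_precision(ranked_ids, relevant):
    """Sum of precision@k at each rank k holding a relevant item, over |relevant|."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for k, pid in enumerate(ranked_ids, start=1):
        if pid in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant)


def mean_average_precision(runs):
    """MAP over queries; each run is (ranked_ids, set_of_relevant_ids)."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)
```

For example, a ranking ["a", "b", "c"] with relevant set {"a", "c"} has precision 1/1 at rank 1 and 2/3 at rank 3, giving an average precision of 5/6.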

    Analysis of adversarial attacks against CNN-based image forgery detectors

    With the ubiquitous diffusion of social networks, images are becoming a dominant and powerful communication channel. Not surprisingly, they are also increasingly subject to manipulations aimed at distorting information and spreading fake news. In recent years, the scientific community has devoted major efforts to countering this threat, and many image forgery detectors have been proposed. Currently, due to the success of deep learning in many multimedia processing tasks, there is high interest in CNN-based detectors, and early results are already very promising. Recent studies in computer vision, however, have shown CNNs to be highly vulnerable to adversarial attacks: small perturbations of the input data that drive the network towards an erroneous classification. In this paper we analyze the vulnerability of CNN-based image forensics methods to adversarial attacks, considering several detectors and several types of attack, and testing performance on a wide range of common manipulations, both easy and hard to detect.
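To make the notion of a small adversarial perturbation concrete, here is a sketch of the fast gradient sign method (FGSM), one well-known attack, applied to a toy logistic-regression "forgery detector" rather than a CNN. The weights, the epsilon budget and the function names are hypothetical; the abstract does not say which attacks the paper evaluates, only that several types were considered.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forged_prob(w, b, x):
    """Toy linear detector: probability that image features x are 'forged'."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def _sign(g):
    return (g > 0) - (g < 0)


def fgsm(w, b, x, y, eps):
    """One FGSM step: move each feature by eps in the sign of the loss gradient.

    For logistic regression with binary cross-entropy, d(loss)/dx = (p - y) * w.
    """
    p = forged_prob(w, b, x)
    grad = [(p - y) * wi for wi in w]
    # Clip so the perturbed features stay in the valid [0, 1] pixel range.
    return [min(1.0, max(0.0, xi + eps * _sign(g))) for xi, g in zip(x, grad)]
```

With w = [2.0, -1.0, 3.0], b = -1.9 and x = [0.5, 0.5, 0.5] labelled forged (y = 1), the detector outputs a probability just above 0.5; after one FGSM step with eps = 0.1 the features become approximately [0.4, 0.6, 0.4] and the same detector scores the image below 0.5, even though no feature moved by more than 0.1.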

    An ontology enhanced parallel SVM for scalable spam filter training

    This is the post-print version of the final paper published in Neurocomputing. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms, may not be reflected in this document, and changes may have been made to this work since it was submitted for publication. Copyright © 2013 Elsevier B.V.

    Spam, in a variety of shapes and forms, continues to inflict increasing damage. Various approaches, including Support Vector Machine (SVM) techniques, have been proposed for spam filter training and classification. However, SVM training is a computationally intensive process. This paper presents a MapReduce-based parallel SVM algorithm for scalable spam filter training. By distributing, processing and optimizing subsets of the training data across multiple participating computer nodes, the parallel SVM reduces training time significantly. Ontology semantics are employed to minimize the accuracy degradation that arises when the training data are distributed among a number of SVM classifiers. Experimental results show that ontology-based augmentation raises the accuracy of the parallel SVM beyond that of its original sequential counterpart.
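The abstract does not detail how the sub-problems are distributed or recombined. As one simple MapReduce pattern, the sketch below trains a linear SVM on each data shard in a "map" step, using the Pegasos subgradient solver as a stand-in for whatever solver the paper uses, and averages the shard weight vectors in a "reduce" step. This parameter-averaging scheme, the round-robin split and all function names are assumptions, and the ontology-based augmentation is not modeled.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def pegasos(shard, lam=0.01, iters=200, seed=0):
    """Linear SVM on one shard via Pegasos (stochastic subgradient on the hinge loss)."""
    rng = random.Random(seed)
    w = [0.0] * len(shard[0][0])
    for t in range(1, iters + 1):
        x, y = shard[rng.randrange(len(shard))]
        eta = 1.0 / (lam * t)
        violates = y * dot(w, x) < 1  # margin check uses the current w
        w = [(1 - eta * lam) * wi for wi in w]
        if violates:
            w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w


def split(data, n_shards):
    """Round-robin split so every shard sees a mix of the training data."""
    return [data[i::n_shards] for i in range(n_shards)]


def map_train(shards, **kwargs):
    """'Map' step: train one local SVM per shard (threads stand in for nodes)."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda job: pegasos(job[1], seed=job[0], **kwargs),
                             enumerate(shards)))


def reduce_average(weights):
    """'Reduce' step: average the per-shard weight vectors."""
    return [sum(col) / len(weights) for col in zip(*weights)]


def predict(w, x):
    return 1 if dot(w, x) >= 0 else -1
```

Each training example is a (features, label) pair with labels in {+1, -1}; appending a constant 1 to the features lets the weight vector absorb a bias term. The thread pool only mimics separate nodes: in CPython the GIL prevents a real speedup for this pure-Python loop, so an actual deployment would run each map task in its own process or on its own machine.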