14 research outputs found

    and

    No full text
    We present a thorough investigation on using machine learning to construct effective personalized anti-spam filters. The investigation includes four learning algorithms, Naive Bayes, Flexible Bayes, LogitBoost, and Support Vector Machines, and four datasets, constructed from the mailboxes of different users. We discuss the model and search biases of the learning algorithms, along with worst-case computational complexity figures, and observe how the latter relate to experimental measurements. We study how classification accuracy is affected when using attributes that represent sequences of tokens, as opposed to single tokens, and explore the effect of the size of the attribute and training set, all within a cost-sensitive framework. Furthermore, we describe the architecture of a fully implemented learning-based anti-spam filter, and present an analysis of its behavior in real use over a period of seven months. Information is also provided on other available learning-based anti-spam filters, and alternative filtering approaches

    Filtron: A Learning-Based Anti-Spam Filter

    No full text
    We present Filtron, a prototype anti-spam filter that integrates the main empirical conclusions of our comprehensive analysis on using machine learning to construct e#ective personalized anti-spam filters. Filtron is based on the experimental results over several design parameters on four publicly available benchmark corpora. After describing Filtron's architecture, we assess its behavior in real use over a period of seven months. The results are deemed satisfactory, though they can be improved with more elaborate preprocessing and regular re-training

    Spectral counting of triangles in power-law networks via element-wise sparsification

    No full text
    Triangle counting is an important problem in graph mining. The clustering coefficient and the transitivity ratio, two commonly used measures effectively quantify the triangle density in order to quantify the fact that friends of friends tend to be friends themselves. Furthermore, several successful graph mining applications rely on the number of triangles. In this paper, we study the problem of counting triangles in large, power-law networks. Our algorithm, SPARSI-FYINGEIGENTRIANGLE, relies on the spectral properties of power-law networks and the Achlioptas-McSherry sparsification process. SPARSIFYINGEIGENTRIANGLE is easy to parallelize, fast and accurate. We verify the validity of our approach with several experiments in real-world graphs, where we achieve at the same time high accuracy and important speedup versus a straight-forward exact counting competitor.
    corecore