
    Addressing the new generation of spam (Spam 2.0) through Web usage models

    New Internet collaborative media introduce new ways of communicating that are not immune to abuse. A fake eye-catching profile on a social networking website, a promotional review, a response to a thread in an online forum with unsolicited content, or a manipulated Wiki page are examples of the new generation of spam on the web, referred to as Web 2.0 Spam or Spam 2.0. Spam 2.0 is defined as the propagation of unsolicited, anonymous, mass content to infiltrate legitimate Web 2.0 applications. The current literature does not address Spam 2.0 in depth, and the outcomes of efforts to date are inadequate. The aim of this research is to formalise a definition of Spam 2.0 and provide Spam 2.0 filtering solutions. Early detection, extendibility, robustness and adaptability are key factors in the design of the proposed method. This dissertation provides a comprehensive survey of state-of-the-art web spam and Spam 2.0 filtering methods to highlight the unresolved issues and open problems, while at the same time effectively capturing the knowledge in the domain of spam filtering. This dissertation proposes three solutions in the area of Spam 2.0 filtering: (1) characterising and profiling Spam 2.0, (2) an Early-Detection-based Spam 2.0 Filtering (EDSF) approach, and (3) an On-the-Fly Spam 2.0 Filtering (OFSF) approach. All the proposed solutions are tested against real-world datasets and their performance is compared with that of existing Spam 2.0 filtering methods. This work has coined the term ‘Spam 2.0’, provided insight into the nature of Spam 2.0, and proposed filtering mechanisms to address this new and rapidly evolving problem.
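    As a rough illustration of the kind of web-usage-based classification the abstract describes, the sketch below labels sessions as spam-like or legitimate from a few hypothetical usage features (request rate, form-submission ratio, inter-request time) with a linear SVM. The feature names, sample values and classifier choice are assumptions for illustration, not the dissertation's EDSF/OFSF methods or datasets.

        # Illustrative sketch: classifying Web 2.0 sessions as spam or legitimate
        # from simple usage features. Feature names and sample values are
        # hypothetical, not taken from the dissertation's datasets.
        from sklearn.svm import LinearSVC
        from sklearn.preprocessing import StandardScaler
        from sklearn.pipeline import make_pipeline

        # Each row: [requests_per_minute, form_submission_ratio, avg_seconds_between_requests]
        sessions = [
            [40.0, 0.90, 1.2],   # bot-like: rapid, form-heavy navigation
            [35.0, 0.85, 0.8],
            [3.0,  0.10, 25.0],  # human-like: slower, mostly reading
            [2.0,  0.05, 40.0],
        ]
        labels = [1, 1, 0, 0]    # 1 = spam session, 0 = legitimate session

        model = make_pipeline(StandardScaler(), LinearSVC())
        model.fit(sessions, labels)

        new_session = [[28.0, 0.75, 1.5]]
        print("spam" if model.predict(new_session)[0] == 1 else "legitimate")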

    INSTANT MESSAGING SPAM DETECTION IN LONG TERM EVOLUTION NETWORKS

    The lack of efficient spam detection modules for packet data communication is resulting in increased threat exposure for telecommunication network users and service providers. In this thesis, we propose a novel approach to classify spam at the server side by intercepting packet-data communication among instant messaging applications. Spam detection is performed using machine learning techniques on packet headers and contents (if unencrypted) in two phases: offline training and online classification. The contribution of this study is threefold. First, it identifies the scope of deploying a spam detection module in a state-of-the-art telecommunication architecture. Second, it compares the usefulness of various existing machine learning algorithms for intercepting and classifying data packets in the near real-time communication of instant messengers. Finally, it evaluates the accuracy and classification time of spam detection using our approach in a simulated environment of continuous packet data communication. Our research results are mainly generated by executing instances of a peer-to-peer instant messaging application prototype within a simulated Long Term Evolution (LTE) telecommunication network environment. This prototype is modeled and executed using OPNET network modeling and simulation tools. The research produces considerable knowledge on addressing unsolicited packet monitoring in instant messaging and similar applications.
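    A minimal sketch of the two-phase setup described above (offline training, then online per-packet classification), assuming a Naive Bayes classifier over made-up packet-header features; the feature set and values are placeholders, not the thesis's OPNET-derived data or its chosen algorithms.

        # Sketch of offline training followed by online (per-packet) classification.
        # Header features and sample values are hypothetical placeholders.
        from sklearn.naive_bayes import GaussianNB

        # Offline phase: train on labelled packet-header features, e.g.
        # [payload_size_bytes, packets_per_second_from_sender, distinct_recipients]
        train_X = [
            [900, 50.0, 200],   # spam-like burst to many recipients
            [850, 45.0, 150],
            [120,  0.5,   1],   # normal one-to-one chat traffic
            [200,  0.8,   2],
        ]
        train_y = [1, 1, 0, 0]  # 1 = spam, 0 = legitimate

        clf = GaussianNB()
        clf.fit(train_X, train_y)

        # Online phase: classify each intercepted packet's features as they arrive.
        def classify_packet(features):
            return "spam" if clf.predict([features])[0] == 1 else "legitimate"

        print(classify_packet([880, 48.0, 180]))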

    Tecnología, libertad y privacidad

    La tecnología garante de la privacidad, llamada PET o Privacy by design, estå llegando a las recomendaciones de los grupos de expertos que asesoran a las Agencias nacionales de protección de dato. Si nos tomamos en serio la privacidad, deberemos convenir en la complementariedad de la tecnología con el Derecho como garantes de libertades. La suma de ambas garantías coordinadas todavía sería mejor. Quizås el principio de privacidad en el diseño sea un primer paso en este sentido. En cuanto a las libertades informativas, la situación es distinta. No hay aquí ninguna llamada a la protección tecnológica, ni siquiera limitada a un caso reducido o concreto. La evolución tecnológica es en paralelo a la jurídica, y por el momento, las posibilidades tecnológicas se usan para clasificar y obtener significado relevante para un espacio complejo y que ha crecido sin orden preestablecido: la blogosfera. El jurista debería acercarse sin complejos a esta propuesta multidisciplinar de estudio de las libertades informativas, si de verdad quiere complementar la protección jurídica de derechos fundamentales con el también apasionante mundo de la tecnología garante de los derechos fundamentales

    RSS v2.0: Spamming, User Experience and Formalization

    RSS, once the most popular publish/subscribe system, is believed to have come to an end for reasons that remain largely unexplored. The aim of this thesis is to examine one such reason: spamming. The scope of this thesis is limited to spamming related to RSS v2.0. The study discusses RSS as a publish/subscribe system, investigates possible reasons for the decline in the use of such a system, and considers possible solutions to RSS spamming. The thesis introduces RSS (and its dependence on feed readers) and examines its relationship with spamming. In addition, the thesis investigates possible socio-technical influences on spamming in RSS. The author presents the idea of applying formalization (formal specification techniques) to open standards, RSS v2.0 in particular. Formal specifications are more concise, consistent and unambiguous, and are highly reusable in many cases. Merging formal specification methods with open standards allows for (i) a more concrete standard design, (ii) an improved understanding of the environment under design, (iii) enforcement of a certain level of precision in the specification, and (iv) extended property checking/verification capabilities for software engineers. The author supports and proposes the use of formalization in RSS. Based on the inferences gathered from the user experiment conducted during the course of this study, an analysis of the decline of RSS is presented. The user experiment also opens up directions for future work on the evolution of RSS v3.0, which could be supported by formalization. The thesis concludes that RSS is on the verge of discontinuation due to the adverse effects of spamming and the lack of ongoing development, which is evident from the limited amount of available research literature. RSS feeds are a telling example of what happens to software that fails to evolve with time.
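    To hint at the kind of precision a formal specification could make machine-checkable, the toy validator below checks two well-known RSS 2.0 invariants: a channel must carry title, link and description, and each item must have at least a title or a description. It is an illustrative stand-in, not the formal specification proposed in the thesis.

        # Toy check of two RSS 2.0 invariants, as a hint of what a formal
        # specification could enforce mechanically; not the thesis's spec.
        import xml.etree.ElementTree as ET

        def validate_rss(xml_text):
            errors = []
            channel = ET.fromstring(xml_text).find("channel")
            if channel is None:
                return ["<rss> must contain a <channel> element"]
            for required in ("title", "link", "description"):
                if channel.find(required) is None:
                    errors.append(f"<channel> is missing required <{required}>")
            for i, item in enumerate(channel.findall("item")):
                if item.find("title") is None and item.find("description") is None:
                    errors.append(f"<item> #{i} needs at least a <title> or <description>")
            return errors

        feed = """<rss version="2.0"><channel>
          <title>Example</title><link>http://example.org</link>
          <description>Demo feed</description>
          <item><title>First post</title></item>
        </channel></rss>"""
        print(validate_rss(feed) or "feed satisfies the checked invariants")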

    BlogForever D2.6: Data Extraction Methodology

    This report outlines an inquiry into the area of web data extraction, conducted within the context of blog preservation. The report reviews theoretical advances and practical developments for implementing data extraction. The inquiry is extended through an experiment that demonstrates the effectiveness and feasibility of implementing some of the suggested approaches. More specifically, the report discusses an approach based on unsupervised machine learning that employs the RSS feeds and HTML representations of blogs. It outlines the possibilities of extracting semantics available in blogs and demonstrates the benefits of exploiting available standards such as microformats and microdata. The report proceeds to propose a methodology for extracting and processing blog data to further inform the design and development of the BlogForever platform.
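    A minimal sketch of one idea the report discusses: using an RSS item's text to locate the matching content region in the blog's HTML page. The paragraph-level block splitting and the 0.5 similarity threshold are illustrative assumptions, not the BlogForever implementation.

        # Sketch: align an RSS item's summary with text blocks from the blog's HTML
        # to locate the post body. Splitting on <p> and the 0.5 threshold are
        # illustrative choices, not BlogForever's actual method.
        import re
        from difflib import SequenceMatcher

        def text_blocks(html):
            # Crude block extraction: take the text inside each <p>...</p>.
            return [re.sub(r"<[^>]+>", "", m).strip()
                    for m in re.findall(r"<p[^>]*>(.*?)</p>", html, re.S)]

        def locate_post_body(feed_summary, html, threshold=0.5):
            best_block, best_score = None, 0.0
            for block in text_blocks(html):
                score = SequenceMatcher(None, feed_summary.lower(), block.lower()).ratio()
                if score > best_score:
                    best_block, best_score = block, score
            return best_block if best_score >= threshold else None

        summary = "We describe a methodology for extracting blog data."
        page = ("<html><body><p>Site navigation</p>"
                "<p>We describe a methodology for extracting blog data from feeds.</p>"
                "</body></html>")
        print(locate_post_body(summary, page))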

    On Profiling Blogs with Representative Entries
