
    Addressing the new generation of spam (Spam 2.0) through Web usage models

    New Internet collaborative media introduce new ways of communicating that are not immune to abuse. A fake eye-catching profile on social networking websites, a promotional review, a response to a thread in an online forum with unsolicited content, or a manipulated Wiki page are examples of the new generation of spam on the web, referred to as Web 2.0 Spam or Spam 2.0. Spam 2.0 is defined as the propagation of unsolicited, anonymous, mass content to infiltrate legitimate Web 2.0 applications. The current literature does not address Spam 2.0 in depth, and the outcomes of efforts to date are inadequate. The aim of this research is to formalise a definition for Spam 2.0 and provide Spam 2.0 filtering solutions. Early detection, extendibility, robustness and adaptability are key factors in the design of the proposed method. This dissertation provides a comprehensive survey of state-of-the-art web spam and Spam 2.0 filtering methods to highlight the unresolved issues and open problems, while at the same time effectively capturing the knowledge in the domain of spam filtering. This dissertation proposes three solutions in the area of Spam 2.0 filtering: (1) characterising and profiling Spam 2.0, (2) an Early-Detection based Spam 2.0 Filtering (EDSF) approach, and (3) an On-the-Fly Spam 2.0 Filtering (OFSF) approach. All the proposed solutions are tested against real-world datasets and their performance is compared with that of existing Spam 2.0 filtering methods. This work has coined the term ‘Spam 2.0’, provided insight into the nature of Spam 2.0, and proposed filtering mechanisms to address this new and rapidly evolving problem.
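
    As a rough illustration of the usage-based angle described above, the sketch below flags a session as likely automated from a few behavioural signals rather than from message content. The feature names and thresholds are illustrative assumptions only and do not reproduce the EDSF or OFSF methods.

        # Hedged sketch: flag a web session as likely Spam 2.0 from usage
        # behaviour rather than message content. The features and thresholds
        # are illustrative assumptions, not the EDSF/OFSF methods themselves.
        from dataclasses import dataclass

        @dataclass
        class Session:
            seconds_on_form: float      # time between form load and submission
            pages_before_post: int      # navigation depth before posting
            requests_per_minute: float  # overall request rate

        def looks_automated(s: Session) -> bool:
            """Flag sessions whose usage pattern resembles an automated poster."""
            too_fast_submit = s.seconds_on_form < 2.0   # humans rarely submit instantly
            no_browsing = s.pages_before_post <= 1      # bots often post without reading
            bursty = s.requests_per_minute > 60         # unusually high request rate
            # require at least two indicators to reduce false positives
            return sum([too_fast_submit, no_browsing, bursty]) >= 2

        print(looks_automated(Session(0.5, 1, 120)))  # True: bot-like behaviour
        print(looks_automated(Session(45.0, 6, 4)))   # False: human-like behaviour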

    CHORUS Deliverable 2.1: State of the Art on Multimedia Search Engines

    Based on the information provided by European projects and national initiatives related to multimedia search, as well as domain experts who participated in the CHORUS Think-tanks and workshops, this document reports on the state of the art in multimedia content search from a technical and socio-economic perspective. The technical perspective includes an up-to-date view of content-based indexing and retrieval technologies, multimedia search in the context of mobile devices and peer-to-peer networks, and an overview of current evaluation and benchmark initiatives to measure the performance of multimedia search engines. From a socio-economic perspective we inventory the impact and legal consequences of these technical advances and point out future directions of research.

    Using Web Archives to Enrich the Live Web Experience Through Storytelling

    Much of our cultural discourse occurs primarily on the Web. Thus, Web preservation is a fundamental precondition for multiple disciplines. Archiving Web pages into themed collections is a method for ensuring these resources are available for posterity. Services such as Archive-It exist to allow institutions to develop, curate, and preserve collections of Web resources. Understanding the contents and boundaries of these archived collections is a challenge for most people, resulting in the paradox that the larger the collection, the harder it is to understand. Meanwhile, as the sheer volume of data on the Web grows, storytelling is becoming a popular technique in social media for selecting Web resources to support a particular narrative or story. In this dissertation, we address the problem of understanding archived collections by proposing the Dark and Stormy Archive (DSA) framework, in which we integrate storytelling social media and Web archives. In the DSA framework, we identify, evaluate, and select candidate Web pages from archived collections that summarize the holdings of these collections, arrange them in chronological order, and then visualize these pages using tools that users are already familiar with, such as Storify. To inform our work on generating stories from archived collections, we start by building a baseline for the structural characteristics of popular (i.e., receiving the most views) human-generated stories by investigating stories from Storify. Furthermore, we examined the entire population of Archive-It collections to better understand the characteristics of the collections we intend to summarize. We then filter off-topic pages from the collections using different methods to detect when an archived page in a collection has gone off-topic. We created a gold standard dataset from three Archive-It collections to evaluate the proposed methods at different thresholds. From the gold standard dataset, we identified five behaviors for the TimeMaps (a list of archived copies of a page) based on the page’s aboutness. Based on a dynamic slicing algorithm, we divide the collection and cluster the pages in each slice. We then select the best representative page from each cluster based on different quality metrics (e.g., the replay quality and the quality of the generated snippet from the page). Finally, we put the selected pages in chronological order and visualize them using Storify. For evaluating the DSA framework, we obtained a ground truth dataset of hand-crafted stories from Archive-It collections generated by expert archivists. We used Amazon’s Mechanical Turk to evaluate the automatically generated stories against the stories that were created by domain experts. The results show that the automatically generated stories by the DSA are indistinguishable from those created by human subject domain experts, while at the same time both kinds of stories (automatic and human) are easily distinguished from randomly generated stories.
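
    The select-and-order pipeline sketched in this abstract (slice, cluster, pick a representative, order chronologically) can be illustrated in miniature. The toy sketch below assumes month-sized slices, token-overlap grouping, and a single pre-computed quality score per page; it is not the dissertation's actual dynamic slicing algorithm or quality metrics.

        # Hedged sketch of the select-and-order idea behind a DSA-style summary:
        # slice a collection by time, group near-duplicate pages within a slice,
        # and keep the highest-quality page per group, in chronological order.
        # The month-sized slices, token-overlap grouping, and single quality
        # score are simplifications assumed here, not the actual DSA algorithms.
        from itertools import groupby

        pages = [  # (timestamp YYYYMMDD, title, quality score in [0, 1]) -- toy data
            (20140101, "Storm makes landfall", 0.9),
            (20140102, "Storm makes landfall - live updates", 0.6),
            (20140215, "Recovery efforts begin", 0.8),
            (20140216, "Recovery efforts begin in coastal towns", 0.7),
        ]

        def similar(a, b):
            ta, tb = set(a[1].lower().split()), set(b[1].lower().split())
            return len(ta & tb) / max(len(ta | tb), 1) >= 0.5  # token overlap

        story = []
        for _, slice_pages in groupby(sorted(pages), key=lambda p: p[0] // 100):
            clusters = []
            for page in slice_pages:            # greedy clustering within the slice
                for cluster in clusters:
                    if similar(page, cluster[0]):
                        cluster.append(page)
                        break
                else:
                    clusters.append([page])
            # best representative of each cluster = highest quality score
            story.extend(max(c, key=lambda p: p[2]) for c in clusters)

        for ts, title, _ in sorted(story):      # chronological story
            print(ts, title)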

    Quantum inspired approach for early classification of time series

    Is it possible to apply some fundamental principles of quantum computing to time series classification algorithms? This is the initial spark that became the research question I decided to chase at the very beginning of my PhD studies. The idea came accidentally after reading a note on the ability of entanglement to express the correlation between two particles, even when far away from each other. The test problem was also at hand because I was investigating possible algorithms for real-time bot detection, a challenging problem at the present day, by means of statistical approaches for sequential classification. The quantum-inspired algorithm presented in this thesis stemmed as an evolution of the statistical method mentioned above: it is a novel approach to address binary and multinomial classification of an incoming data stream, inspired by the principles of quantum computing, in order to ensure the shortest decision time with high accuracy. The proposed approach exploits the analogy between the intrinsic correlation of two or more particles and the dependence of each item in a data stream on the preceding ones. Starting from the a-posteriori probability of each item belonging to a particular class, we can assign a qubit state representing a combination of the aforesaid probabilities for all available observations of the time series. By leveraging superposition and entanglement on subsequences of growing length, it is possible to devise a measure of membership to each class, thus enabling the system to take a reliable decision when a sufficient level of confidence is met. In order to provide an extensive and thorough analysis of the problem, a well-fitting approach for bot detection was replicated on our dataset and later compared with the statistical algorithm to determine the best option. The winner was subsequently examined against the new quantum-inspired proposal, showing the superior capability of the latter in both binary and multinomial classification of data streams. The validation of the quantum-inspired approach on a synthetically generated use case completes the research framework and opens new perspectives in on-the-fly time series classification, which we have just started to explore. To name a few, the algorithm is currently being tested with encouraging results in predictive maintenance and prognostics for automotive, in collaboration with the University of Bradford (UK), and in action recognition from video streams.
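
    A loose, non-authoritative illustration of the early-decision idea follows: per-item class posteriors are accumulated into amplitude-like scores, and a decision is taken as soon as one class's normalised score passes a confidence threshold. The square-root update rule and the threshold below are assumptions made for the sake of the sketch, not the algorithm developed in the thesis.

        # Hedged sketch of early classification of a data stream: accumulate
        # per-item class posteriors into amplitude-like scores and decide as
        # soon as one class's normalised "probability" exceeds a confidence
        # threshold. This is only a loose illustration of the quantum analogy
        # described above, not the thesis algorithm; the square-root update
        # rule and the threshold are assumptions.
        import math

        def early_classify(posterior_stream, threshold=0.95):
            """posterior_stream yields dicts {class_label: P(class | item)}."""
            amplitudes = None
            for t, posteriors in enumerate(posterior_stream, start=1):
                if amplitudes is None:
                    amplitudes = {c: 0.0 for c in posteriors}
                for c, p in posteriors.items():       # amplitude grows with sqrt of posterior
                    amplitudes[c] += math.sqrt(p)
                norm = sum(a * a for a in amplitudes.values())
                probs = {c: (a * a) / norm for c, a in amplitudes.items()}
                best = max(probs, key=probs.get)
                if probs[best] >= threshold:
                    return best, t                    # early decision after t observations
            if amplitudes is None:
                raise ValueError("empty stream")
            return best, t                            # no early decision: decide at the end

        stream = [{"bot": 0.70, "human": 0.30},
                  {"bot": 0.90, "human": 0.10},
                  {"bot": 0.95, "human": 0.05},
                  {"bot": 0.95, "human": 0.05}]
        print(early_classify(stream, threshold=0.85))  # decides "bot" before the stream ends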

    Machine Tool Communication (MTComm) Method and Its Applications in a Cyber-Physical Manufacturing Cloud

    The integration of cyber-physical systems and cloud manufacturing has the potential to revolutionize existing manufacturing systems by enabling better accessibility, agility, and efficiency. To achieve this, it is necessary to establish a communication method for manufacturing services over the Internet in order to access and manage physical machines from cloud applications. Most existing industrial automation protocols utilize Ethernet-based Local Area Networks (LANs) and are not designed specifically for Internet-enabled data transmission. Recently MTConnect has been gaining popularity as a standard for monitoring the status of machine tools through RESTful web services and an XML-based messaging structure, but it is designed only for data collection and interpretation and lacks remote operation capability. This dissertation presents the design, development, optimization, and applications of a service-oriented, Internet-scale communication method named Machine Tool Communication (MTComm) for exchanging manufacturing services in a Cyber-Physical Manufacturing Cloud (CPMC) to enable manufacturing with heterogeneous, physically connected machine tools from geographically distributed locations over the Internet. MTComm uses an agent-adapter based architecture and a semantic ontology to provide both remote monitoring and operation capabilities through RESTful services and XML messages. MTComm was successfully used to develop and implement multi-purpose applications in a CPMC, including remote and collaborative manufacturing, active testing-based and edge-based fault diagnosis and maintenance of machine tools, and cross-domain interoperability between Internet-of-Things (IoT) devices and supply chain robots. To improve MTComm’s overall performance, efficiency, and acceptability in cyber manufacturing, the concept of MTComm’s edge-based middleware was introduced and three optimization strategies for data caching, transmission, and operation execution were developed and adopted at the edge. Finally, a hardware prototype of the middleware was implemented on a System-on-Chip based FPGA device to reduce computational and transmission latency. At every stage of its development, MTComm’s performance and feasibility were evaluated with experiments in a CPMC testbed with three different types of manufacturing machine tools. Experimental results demonstrated MTComm’s excellent feasibility for scalable cyber-physical manufacturing and superior performance over other existing approaches.
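
    Because MTComm exposes monitoring through RESTful services and XML messages, a cloud application would typically poll a status endpoint and parse the reply. The minimal sketch below assumes a hypothetical host, endpoint path, and XML schema; it is not the actual MTComm (or MTConnect) interface.

        # Hedged sketch of polling a machine-tool monitoring service over REST
        # and parsing an XML status reply from a cloud application. The host,
        # endpoint path, and XML tag names are hypothetical placeholders, not
        # the actual MTComm (or MTConnect) API or schema.
        import urllib.request
        import xml.etree.ElementTree as ET

        BASE_URL = "http://cpmc.example.com/mtcomm"   # hypothetical middleware host

        def read_spindle_speed(machine_id: str) -> float:
            """Fetch the current status of one machine and extract its spindle speed."""
            url = f"{BASE_URL}/machines/{machine_id}/status"
            with urllib.request.urlopen(url, timeout=5) as resp:
                xml_body = resp.read()
            root = ET.fromstring(xml_body)
            # assumed reply shape: <MachineStatus><SpindleSpeed>1200</SpindleSpeed>...</MachineStatus>
            node = root.find("SpindleSpeed")
            if node is None or node.text is None:
                raise ValueError("status reply did not include a spindle speed")
            return float(node.text)

        if __name__ == "__main__":
            try:
                print(read_spindle_speed("mill-01"))
            except OSError as err:   # network errors are expected: the host above is fictional
                print("request failed:", err)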

    Detection of unsolicited web browsing with clustering and statistical analysis

    Unsolicited web browsing denotes illegitimately accessing or processing web content. The harmful activity varies from extracting e-mail information to downloading an entire website for duplication. In addition, computer criminals prevent legitimate users from gaining access to websites by implementing denial of service attacks with high volumes of legitimate traffic. These offences are accomplished by preprogrammed machines that avoid rate-dependent intrusion detection systems. Therefore, it is assumed in this thesis that the only difference between a legitimate and a malicious web session lies in the intention rather than in physical characteristics or network-layer information. As a result, the main aim of this research has been to provide a method of malicious intention detection. This has been accomplished by a two-fold process. First, to discover the most recent and popular transitions of lawful users, a clustering method based on entropy minimisation has been introduced. In principle, by following popular transitions among web objects, legitimate users are placed in low-entropy clusters, as opposed to undesired hosts whose transitions are uncommon and lead to placement in high-entropy clusters. In addition, by comparing the distributions of sequences of requests generated by actual and malicious users across the clusters, it is possible to discover whether or not a website is under attack. Second, a set of statistical measurements has been tested to detect the actual intention of browsing hosts. Intention classification based on Bayes factors and likelihood analysis has provided the best results. The combined approach has been validated against actual web traces (i.e. datasets) and generated promising results.
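
    The Bayes-factor intention test mentioned above can be illustrated with a toy example: the likelihood of a session's page transitions is evaluated under a legitimate-browsing model and under a bot model, and the ratio of the two decides the intention. The transition probabilities below are assumed values, not the models fitted in the thesis.

        # Hedged sketch of the Bayes-factor intention test described above: the
        # likelihood of a session's page transitions is computed under a model
        # of legitimate browsing and under a bot model, and their ratio decides
        # the intention. The transition tables and the unseen-transition floor
        # are illustrative assumptions, not the thesis models.
        import math

        # P(next_page | current_page) estimated from legitimate traffic (assumed values)
        LEGIT = {("home", "products"): 0.60, ("products", "item"): 0.70,
                 ("home", "sitemap"): 0.01, ("sitemap", "item"): 0.05}
        # the same transitions under a crawler/bot model (assumed values)
        BOT = {("home", "products"): 0.20, ("products", "item"): 0.20,
               ("home", "sitemap"): 0.50, ("sitemap", "item"): 0.50}

        def log_likelihood(session, model, floor=1e-3):
            """Sum of log transition probabilities, with a floor for unseen transitions."""
            return sum(math.log(model.get(t, floor)) for t in session)

        def bayes_factor(session):
            """Greater than 1 favours legitimate browsing, less than 1 favours a bot."""
            return math.exp(log_likelihood(session, LEGIT) - log_likelihood(session, BOT))

        human_like = [("home", "products"), ("products", "item")]
        bot_like = [("home", "sitemap"), ("sitemap", "item")]
        print(bayes_factor(human_like))  # well above 1: common, legitimate transitions
        print(bayes_factor(bot_like))    # well below 1: uncommon, bot-like transitions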

    Denial of Service in Web-Domains: Building Defenses Against Next-Generation Attack Behavior

    The existing state of the art in the field of application-layer Distributed Denial of Service (DDoS) protection is generally designed, and thus effective, only for static web domains. To the best of our knowledge, our work is the first that studies the problem of application-layer DDoS defense in web domains of dynamic content and organization, and for next-generation bot behaviour. In the first part of this thesis, we focus on the following research tasks: 1) we identify the main weaknesses of the existing application-layer anti-DDoS solutions proposed in the research literature and in industry, 2) we obtain a comprehensive picture of current-day as well as next-generation application-layer attack behaviour, and 3) we propose novel techniques, based on a multidisciplinary approach that combines offline machine learning algorithms and statistical analysis, for the detection of suspicious web visitors in static web domains. Then, in the second part of the thesis, we propose and evaluate a novel anti-DDoS system that detects a broad range of application-layer DDoS attacks, in both static and dynamic web domains, through the use of advanced data mining techniques. The key advantage of our system relative to other systems that resort to challenge-response tests (such as CAPTCHAs) in combating malicious bots is that our system minimizes the number of these tests presented to valid human visitors while succeeding in preventing most malicious attackers from accessing the web site. The results of the experimental evaluation of the proposed system demonstrate effective detection of current and future variants of application-layer DDoS attacks.
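
    A minimal sketch of the "challenge only when suspicious" idea follows: each session receives a suspicion score, and a challenge-response test is presented only above a threshold, so most human visitors never see one. The feature names, weights, and threshold are assumptions for illustration and merely stand in for the data-mining models evaluated in the thesis.

        # Hedged sketch of the "challenge only when suspicious" idea: score each
        # session with a simple suspicion score and present a CAPTCHA-style test
        # only above a threshold, so most human visitors never see one. The
        # feature names, weights, and threshold are illustrative assumptions,
        # not the data-mining models evaluated in the thesis.
        def suspicion_score(session):
            """session: dict of per-session features (names assumed for illustration)."""
            score = 0.0
            score += 0.5 * (session["requests_per_sec"] > 10)      # very high request rate
            score += 0.3 * (session["avg_think_time_sec"] < 0.5)   # no human "think time"
            score += 0.2 * (session["heavy_page_requests"] > 20)   # hammering expensive pages
            return score

        def handle(session, challenge_threshold=0.5):
            if suspicion_score(session) >= challenge_threshold:
                return "present challenge-response test"
            return "serve request normally"

        print(handle({"requests_per_sec": 40, "avg_think_time_sec": 0.1, "heavy_page_requests": 50}))
        print(handle({"requests_per_sec": 0.4, "avg_think_time_sec": 6.0, "heavy_page_requests": 0}))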

    A closer look at Intrusion Detection System for web applications

    An Intrusion Detection System (IDS) is one of the security measures used as an additional defence mechanism to prevent security breaches on the web. It has been a well-known methodology for detecting network-based attacks but is still immature in the domain of securing web applications. The objective of this paper is to thoroughly understand the design methodology of detection systems with respect to web applications. In this paper, we discuss in detail several specific aspects of a web application that make it challenging for a developer to build an efficient web IDS. The paper also provides a comprehensive overview of the existing detection systems exclusively designed to observe web traffic. Furthermore, we identify various dimensions for comparing IDSs from different perspectives based on their design and functionalities. We also provide a conceptual framework of an IDS with a prevention mechanism to offer systematic guidance for the implementation of a system specific to web applications. We compare its features with five existing detection systems, namely AppSensor, PHPIDS, ModSecurity, Shadow Daemon and AQTRONIX WebKnight. The paper provides interested groups with up-to-date information to understand the strengths and weaknesses of web IDSs and establishes a firm foundation for developing an intelligent and efficient system.
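
    As a small illustration of the signature-style inspection performed by web IDSs such as those compared in the paper, the sketch below scans request parameters against two crude patterns. The patterns are illustrative only and are not taken from PHPIDS, ModSecurity, or any of the other tools mentioned.

        # Hedged sketch of the signature-style inspection that web IDSs such as
        # those compared above perform on incoming requests. The two patterns
        # are deliberately crude illustrations, not real PHPIDS or ModSecurity
        # rules, and would be far too weak (and too noisy) for production use.
        import re

        SIGNATURES = {
            "sql_injection": re.compile(r"('|\")\s*or\s+1\s*=\s*1", re.IGNORECASE),
            "xss": re.compile(r"<\s*script\b", re.IGNORECASE),
        }

        def inspect(params: dict) -> list:
            """Return the names of signatures matched by any request parameter."""
            hits = []
            for value in params.values():
                for name, pattern in SIGNATURES.items():
                    if pattern.search(str(value)):
                        hits.append(name)
            return hits

        print(inspect({"user": "alice", "q": "shoes"}))                      # []
        print(inspect({"user": "x' OR 1=1 --", "comment": "<script>x()"}))   # both signatures match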