4 research outputs found

    How Phishing Pages Look Like?

    Get PDF
    Recent phishing campaigns are increasingly targeted to specific, small population of users and last for increasingly shorter life spans. There is thus an urgent need for developing defense mechanisms that do not rely on any forms of blacklisting or reputation: there is simply no time for detecting novel phishing campaigns and notify all interested organizations quickly enough. Such mechanisms should be close to browsers and based solely on the visual appearance of the rendered page. One of the major impediments to research in this area is the lack of systematic knowledge about how phishing pages actually look like. In this work we describe the technical challenges in collecting a large and diverse collection of screenshots of phishing pages and propose practical solutions. We also analyze systematically the visual similarity between phishing pages and pages of targeted organizations, from the point of view of a similarity metric that has been proposed as a foundation for visual phishing detection and from the point of view of a human operator

    A Security-Oriented Analysis of Web Inclusions in the Italian Public Administration

    Get PDF
    Modern web sites serve content that browsers fetch automatically from a number of different web servers that may be placed anywhere in the world. Such content is essential for defining the appearance and behavior of a web site and is thus a potential target for attacks. Many public administrations offer services on the web, thus we have entered a world in which web sites of public interest are continuously and systematically depending on web servers that may be located anywhere in the world and are potentially under control of other governments. In this work we focus on these issues by investigating the content included by almost 10.000 web sites of the Italian Public Administration. We analyze the nature of such content, its quantity, its geographical location, the amount of dynamic variations over time. Our analyses demonstrate that the perimeter of trust of the Italian Public Administration collectively includes countries that are well beyond the control of the Italian government and provides several insights useful for implementing a centralized monitoring service aimed at detecting anomalies

    Genetic Programming Techniques in Engineering Applications

    Get PDF
    2012/2013Machine learning is a suite of techniques that allow developing algorithms for performing tasks by generalizing from examples. Machine learning systems, thus, may automatically synthesize programs from data. This approach is often feasible and cost-effective where manual programming or manual algorithm design is not. In the last decade techniques based on machine learning have spread in a broad range of application domains. In this thesis, we will present several novel applications of a specific machine Learning technique, called Genetic Programming, to a wide set of engineering applications grounded in real world problems. The problems treated in this work range from the automatic synthesis of regular expressions, to the generation of electricity price forecast, to the synthesis of a model for the tracheal pressure in mechanical ventilation. The results demonstrate that Genetic Programming is indeed a suitable tool for solving complex problems of practical interest. Furthermore, several results constitute a significant improvement over the existing state-of-the-art. The main contribution of this thesis is the design and implementation of a framework for the automatic inference of regular expressions from examples based on Genetic Programming. First, we will show the ability of such a framework to cope with the generation of regular expressions for solving text-extraction tasks from examples. We will experimentally assess our proposal comparing our results with previous proposals on a collection of real-world datasets. The results demonstrate a clear superiority of our approach. We have implemented the approach in a web application that has gained considerable interest and has reached peaks of more 10000 daily accesses. Then, we will apply the framework to a popular "regex golf" challenge, a competition for human players that are required to generate the shortest regular expression solving a given set of problems. Our results rank in the top 10 list of human players worldwide and outperform those generated by the only existing algorithm specialized to this purpose. Hence, we will perform an extensive experimental evaluation in order to compare our proposal to the state-of-the-art proposal in a very close and long-established research field: the generation of a Deterministic Finite Automata (DFA) from a labelled set of examples. Our results demonstrate that the existing state-of-the-art in DFA learning is not suitable for text extraction tasks. We will also show a variant of our framework designed for solving text processing tasks of the search-and-replace form. A common way to automate search-and-replace is to describe the region to be modified and the desired changes through a regular expression and a replacement expression. We will propose a solution to automatically produce both those expressions based only on examples provided by user. We will experimentally assess our proposal on real-word search-and-replace tasks. The results indicate that our proposal is indeed feasible. Finally, we will study the applicability of our framework to the generation of schema based on a sample of the eXtensible Markup Language documents. The eXtensible Markup Language documents are largely used in machine-to-machine interactions and such interactions often require that some constraints are applied to the contents of the documents. These constraints are usually specified in a separate document which is often unavailable or missing. In order to generate a missing schema, we will apply and will evaluate experimentally our framework to solve this problem. In the final part of this thesis we will describe two significant applications from different domains. We will describe a forecasting system for producing estimates of the next day electricity price. The system is based on a combination of a predictor based on Genetic Programming and a classifier based on Neural Networks. Key feature of this system is the ability of handling outliers-i.e., values rarely seen during the learning phase. We will compare our results with a challenging baseline representative of the state-of-the-art. We will show that our proposal exhibits smaller prediction error than the baseline. Finally, we will move to a biomedical problem: estimating tracheal pressure in a patient treated with high-frequency percussive ventilation. High-frequency percussive ventilation is a new and promising non-conventional mechanical ventilatory strategy. In order to avoid barotrauma and volutrauma in patience, the pressure of air insufflated must be monitored carefully. Since measuring the tracheal pressure is difficult, a model for accurately estimating the tracheal pressure is required. We will propose a synthesis of such model by means of Genetic Programming and we will compare our results with the state-of-the-art.XXVI Ciclo198

    Automatic Face Annotation in News Images by Mining the Web

    No full text
    We consider the automatic annotation of faces of people mentioned in news. News stories provide a constant flow of potentially useful image indexing information, due to their huge diffusion on the web and to the involvement of human operators in selecting relevant images for the stories. In this work we investigate the possibility of actually exploiting this wealth of information. We propose and evaluate a system for automatic face annotation of image news that is fully unsupervised and does not require any prior knowledge about topic or people involved. Key feature of our proposal is that it attempts to identify the essential piece of information---how a person with a given name looks like---by querying popular image search engines. Mining the web allows overcoming intrinsic limitations of approaches built above a predefined collection of stories: our system can potentially annotate people never handled before since its knowledge base is constantly expanded, as long as search engines keep on indexing the web. On the other hand, leveraging on image search engines forces to cope with the substantial amount of noise in search engine results. Our contribution shows experimentally that automatic face annotation may indeed be achieved based entirely on knowledge that lives in the web
    corecore