27 research outputs found

    Antidefacement

    The Internet connects around three billion users worldwide, a number that increases every day. Thanks to this technology, people, companies and devices perform many tasks, such as broadcasting information through websites. Because of the large volumes of sensitive information these websites hold and their frequent lack of security, the number of attacks on them has increased significantly. Attacks on websites have different purposes; one of them is the introduction of unauthorized modifications (defacement). Defacement is an issue that affects both system users and the company's image, so the research community has been working on solutions to reduce security risks. This paper presents an introduction to the state of the art of techniques, methodologies and solutions proposed by both the research community and the computer security industry.
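As a point of reference for the techniques such surveys cover, the most basic antidefacement check compares a monitored page against a stored baseline checksum. A minimal sketch in Python; the feature choice and function names are illustrative, not taken from the paper:

```python
import hashlib

def fingerprint(content: bytes) -> str:
    """SHA-256 digest of a page snapshot, used as its integrity baseline."""
    return hashlib.sha256(content).hexdigest()

def is_defaced(current_snapshot: bytes, baseline_digest: str) -> bool:
    """Flag any deviation from the stored baseline as a possible defacement."""
    return fingerprint(current_snapshot) != baseline_digest
```

In practice this kind of exact-match check breaks on legitimately dynamic content, which is one reason the literature moves toward anomaly-based approaches.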

    New strategies for efficient and practical genetic programming.

    2006/2007
    In recent decades, engineers and decision makers have expressed a growing interest in the development of effective modeling and simulation methods to understand or predict the behavior of many phenomena in science and engineering. Many of these phenomena are translated into mathematical models for convenience and ease of interpretation. Methods commonly employed for this purpose include, for example, Neural Networks, Simulated Annealing, Genetic Algorithms, Tabu search, and so on. These methods all seek the optimal or near-optimal values of a predefined set of parameters of a model built a priori, so a suitable model must be known beforehand. When the form of this model cannot be found, the problem can be viewed at another level, where the goal is to find a program or a mathematical representation that can solve the problem. Following this idea, the modeling step is performed automatically, driven by a quality criterion that guides the building process. In this thesis, we focus on the Genetic Programming (GP) approach as an automatic method for creating computer programs by means of artificial evolution, based upon the original contributions of Darwin and Mendel. While GP has proven to be a powerful means of coping with problems in which finding a solution and its representation is difficult, its practical applicability is still severely limited by several factors. First, the GP approach is inherently stochastic: there is no guarantee of obtaining a satisfactory solution at the end of the evolutionary loop. Second, performance on a given problem may depend strongly on a broad range of parameters, including the number of variables involved, the quantity of data for each variable, the size and composition of the initial population, the number of generations, and so on.
    Accordingly, when one uses Genetic Programming to solve a problem, one has two expectations: on the one hand, to maximize the probability of obtaining an acceptable solution, and on the other, to minimize the amount of computational resources needed to obtain it. We first present innovative and challenging applications in several fields of science (computer science and mechanical engineering) which contributed greatly to the experience gained in the GP field. We then propose new strategies for improving the performance of the GP approach in terms of efficiency and accuracy, and evaluate them on a large set of benchmark problems in three different domains. Furthermore, we introduce a new GP-based approach dedicated to symbolic regression of multivariate data sets where the underlying phenomenon is best characterized by a discontinuous function. These contributions aim to provide a better understanding of the key features and underlying relationships that make such enhancements successful in improving the original algorithm.
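To make the GP loop described above concrete, here is a minimal sketch of tree-based symbolic regression in Python: random expression trees over a toy function set are selected by their error on training cases and varied by subtree mutation. All names, parameter values and operator choices are illustrative; the thesis's actual strategies are not reproduced here.

```python
import operator
import random

# Toy function and terminal sets for a symbolic-regression GP (illustrative only).
OPS = [(operator.add, '+'), (operator.sub, '-'), (operator.mul, '*')]
TERMINALS = ['x', 1.0, 2.0]

def random_tree(depth=3):
    """Grow a random expression tree up to the given depth."""
    if depth == 0 or random.random() < 0.3:
        return random.choice(TERMINALS)
    op = random.choice(OPS)
    return (op, random_tree(depth - 1), random_tree(depth - 1))

def evaluate(tree, x):
    """Evaluate an expression tree at a given value of the variable x."""
    if tree == 'x':
        return x
    if isinstance(tree, float):
        return tree
    (fn, _), left, right = tree
    return fn(evaluate(left, x), evaluate(right, x))

def fitness(tree, cases):
    """Sum of absolute errors over the training cases; lower is better."""
    return sum(abs(evaluate(tree, x) - y) for x, y in cases)

def mutate(tree, depth=2):
    """Replace a randomly chosen subtree with a freshly grown one."""
    if isinstance(tree, tuple) and random.random() < 0.5:
        op, left, right = tree
        if random.random() < 0.5:
            return (op, mutate(left, depth), right)
        return (op, left, mutate(right, depth))
    return random_tree(depth)

def evolve(cases, pop_size=200, generations=40):
    """Truncation selection plus mutation; returns the best tree found."""
    population = [random_tree() for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda t: fitness(t, cases))
        survivors = population[:pop_size // 4]
        population = survivors + [mutate(random.choice(survivors))
                                  for _ in range(pop_size - len(survivors))]
    return min(population, key=lambda t: fitness(t, cases))
```

The stochastic nature discussed in the abstract is visible here: nothing guarantees the evolved expression fits the data, and the outcome depends on the population size, tree depth and number of generations.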

    Techniques for large-scale automatic detection of web site defacements.

    2006/2007
    Web site defacement, the process of introducing unauthorized modifications to a web site, is a very common form of attack. This thesis describes the design and experimental evaluation of a framework that may constitute the basis for a defacement detection service capable of monitoring thousands of remote web sites systematically and automatically. With this framework, an organization may join the service by simply providing the URL of the resource to be monitored along with the contact point of an administrator. The monitored organization may thus take advantage of the service with just a few mouse clicks, without installing any software locally or changing its daily operational processes. The main proposed approach is based on anomaly detection and makes it possible to monitor the integrity of many remote web resources automatically while remaining fully decoupled from them, in particular without requiring any prior knowledge about those resources. During a preliminary learning phase, a profile of the monitored resource is built automatically. Then, while monitoring, the remote resource is retrieved periodically and an alert is generated whenever something "unusual" shows up. The thesis discusses the effectiveness of the approach in terms of detection accuracy, i.e., missed detections and false alarms. The thesis also considers the problem of misclassified readings in the learning set. The effectiveness of the anomaly detection approach, and hence of the proposed framework, rests on the assumption that the profile is computed from a learning set that is not corrupted by attacks; this assumption is often taken for granted. The influence of learning set corruption on the framework's effectiveness is assessed, and a procedure aimed at discovering whether a given unknown learning set is corrupted by positive readings is proposed and evaluated experimentally.
    An approach to automatic defacement detection based on Genetic Programming (GP), an automatic method for creating computer programs by means of artificial evolution, is proposed and evaluated experimentally. Moreover, a set of techniques that have been used in the literature for designing host-based or network-based Intrusion Detection Systems is considered and evaluated experimentally in comparison with the proposed approach. Finally, the thesis presents the findings of a large-scale study on reaction time to web site defacement. Several statistics indicate the number of incidents of this sort, but a crucial piece of information is still lacking: the typical duration of a defacement. A two-month monitoring activity was performed over more than 62,000 defacements in order to determine whether and when a reaction to the defacement is taken. It is shown that this time tends to be unacceptably long, in the order of several days, and to follow a long-tailed distribution.
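The learning-then-monitoring cycle described above can be sketched as follows. Here a resource snapshot is reduced to a single feature, its size in bytes, purely for illustration; the actual framework builds a far richer profile:

```python
import statistics

def build_profile(learning_readings):
    """Learn a toy profile from attack-free snapshots of a resource.

    Each reading is reduced to one feature: its length in bytes.
    """
    sizes = [len(r) for r in learning_readings]
    return {
        'mean': statistics.mean(sizes),
        # Guard against a zero deviation when all readings are identical.
        'stdev': statistics.pstdev(sizes) or 1.0,
    }

def is_anomalous(reading, profile, threshold=3.0):
    """Alert when a snapshot's size deviates too far from the learned profile."""
    z = abs(len(reading) - profile['mean']) / profile['stdev']
    return z > threshold
```

Corruption of the learning set, as studied in the thesis, would correspond here to attack snapshots silently skewing the learned mean and standard deviation, so that later defacements no longer look anomalous.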

    Wide spectrum attribution: Using deception for attribution intelligence in cyber attacks

    Modern cyber attacks have evolved considerably. The skill level required to conduct a cyber attack is low, computing power is cheap, and targets are diverse and plentiful. Point-and-click crimeware kits are widely circulated in the underground economy, while source code for sophisticated malware such as Stuxnet is available for all to download and repurpose. Despite decades of research into defensive techniques, such as firewalls, intrusion detection systems, anti-virus, and code auditing, the quantity of successful cyber attacks continues to increase, as does the number of vulnerabilities identified. Measures to identify perpetrators, known as attribution, have existed for as long as there have been cyber attacks. The most actively researched technical attribution techniques involve the marking and logging of network packets. These techniques are performed by network devices along the packet's journey, which most often requires modification of existing router hardware and/or software, or the inclusion of additional devices. These modifications require wide-scale infrastructure changes that are not only complex and costly, but also raise legal, ethical and governance issues. The usefulness of these techniques is also often questioned, as attack actors use multiple stepping stones, often innocent systems that have been compromised, to mask the true source. As such, this thesis observes that no publicly known previous work has been deployed on a wide-scale basis in the Internet infrastructure. This research investigates the use of an often overlooked tool for attribution: cyber deception. The main contribution of this work is a significant advancement in the field of deception and honeypots as technical attribution techniques.
    Specifically, it presents the design and implementation of two novel honeypot approaches: i) the Deception Inside Credential Engine (DICE), which uses policy and honeytokens to identify adversaries returning from different origins, and ii) the Adaptive Honeynet Framework (AHFW), an introspection-based, adaptive honeynet framework that uses actor-dependent triggers to modify the honeynet environment and engage the adversary, increasing the quantity and diversity of interactions. The two approaches are based on a systematic review of the technical attribution literature, which was used to derive a set of requirements for honeypots as technical attribution techniques. Both approaches lead the way for further research in this field.
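To illustrate the honeytoken idea behind the first approach, the following toy registry ties each decoy credential to the channel through which it was exposed, so that a later use of that credential points back to the adversary's origin. This is a sketch of the general technique only, not of the DICE design; all names are hypothetical:

```python
import secrets

class HoneytokenRegistry:
    """Toy honeytoken store for attribution.

    Each decoy credential is bound to the channel through which it was
    deliberately leaked; presenting that credential later reveals the
    adversary's point of origin. (Illustrative; not the DICE design.)
    """

    def __init__(self):
        self._tokens = {}

    def issue(self, channel: str) -> str:
        """Mint a fresh decoy credential and record its leak channel."""
        token = secrets.token_hex(8)
        self._tokens[token] = channel
        return token

    def attribute(self, presented_token: str):
        """Return the leak channel for a presented credential, or None."""
        return self._tokens.get(presented_token)
```

The attribution value comes from the fact that a honeytoken has no legitimate use: any appearance of it in an authentication attempt is, by construction, evidence of the adversary having passed through the recorded channel.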

    Genetic Programming Techniques in Engineering Applications

    2012/2013
    Machine learning is a suite of techniques for developing algorithms that perform tasks by generalizing from examples. Machine learning systems may thus automatically synthesize programs from data, an approach that is often feasible and cost-effective where manual programming or manual algorithm design is not. In the last decade, techniques based on machine learning have spread across a broad range of application domains. In this thesis, we will present several novel applications of a specific machine learning technique, Genetic Programming, to a wide set of engineering applications grounded in real-world problems. The problems treated in this work range from the automatic synthesis of regular expressions, to the generation of electricity price forecasts, to the synthesis of a model for the tracheal pressure in mechanical ventilation. The results demonstrate that Genetic Programming is indeed a suitable tool for solving complex problems of practical interest, and several results constitute a significant improvement over the existing state of the art. The main contribution of this thesis is the design and implementation of a framework, based on Genetic Programming, for the automatic inference of regular expressions from examples. First, we will show the ability of this framework to generate regular expressions that solve text-extraction tasks from examples. We will experimentally assess our proposal against previous proposals on a collection of real-world datasets; the results demonstrate a clear superiority of our approach. We have implemented the approach in a web application that has gained considerable interest and has reached peaks of more than 10,000 daily accesses. Then, we will apply the framework to a popular "regex golf" challenge, a competition in which human players are required to generate the shortest regular expression solving a given set of problems.
    Our results rank in the top-10 list of human players worldwide and outperform those generated by the only existing algorithm specialized for this purpose. We will then perform an extensive experimental evaluation comparing our proposal to the state of the art in a closely related and long-established research field: the generation of Deterministic Finite Automata (DFA) from a labelled set of examples. Our results demonstrate that the existing state of the art in DFA learning is not suitable for text-extraction tasks. We will also show a variant of our framework designed for solving text-processing tasks of the search-and-replace form. A common way to automate search-and-replace is to describe the region to be modified and the desired changes through a regular expression and a replacement expression. We will propose a solution that automatically produces both expressions based only on examples provided by the user, and we will experimentally assess it on real-world search-and-replace tasks. The results indicate that our proposal is indeed feasible. Finally, we will study the applicability of our framework to the generation of a schema from a sample of eXtensible Markup Language (XML) documents. XML documents are widely used in machine-to-machine interactions, and such interactions often require that constraints be applied to the contents of the documents. These constraints are usually specified in a separate document, which is often unavailable or missing; we will apply our framework to the generation of such a missing schema and evaluate it experimentally. In the final part of this thesis we will describe two significant applications from different domains. We will describe a forecasting system for producing estimates of the next-day electricity price, based on a combination of a predictor based on Genetic Programming and a classifier based on Neural Networks.
    A key feature of this system is its ability to handle outliers, i.e., values rarely seen during the learning phase. We will compare our results with a challenging baseline representative of the state of the art and show that our proposal exhibits a smaller prediction error than the baseline. Finally, we will move to a biomedical problem: estimating tracheal pressure in patients treated with high-frequency percussive ventilation, a new and promising non-conventional mechanical ventilation strategy. In order to avoid barotrauma and volutrauma in patients, the pressure of the insufflated air must be monitored carefully. Since measuring the tracheal pressure directly is difficult, a model for accurately estimating it is required. We will propose the synthesis of such a model by means of Genetic Programming and compare our results with the state of the art.
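The core of a GP approach to regex inference is a fitness function that scores candidate patterns against labelled examples. A deliberately simplified Python version follows; the scoring scheme is an assumption for illustration, not the thesis's actual fitness:

```python
import re

def extraction_fitness(pattern: str, examples):
    """Score a candidate regex on (text, wanted_snippets) pairs.

    +1 for each extracted snippet that was wanted, -1 for each spurious
    extraction; syntactically invalid patterns get the worst score.
    (Missed snippets are ignored here for brevity.)
    """
    try:
        compiled = re.compile(pattern)
    except re.error:
        return float('-inf')
    score = 0
    for text, wanted in examples:
        for snippet in compiled.findall(text):
            score += 1 if snippet in wanted else -1
    return score
```

A GP search would then evolve pattern strings so as to maximize such a score over the user-provided examples, which is where the framework's contribution lies.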