22 research outputs found

    Detecting Wikipedia Vandalism via Spatio-Temporal Analysis of Revision Metadata

    Full text link

    Maximum Likelihood Associative Memories

    Full text link
    Associative memories are structures that store data in such a way that it can later be retrieved given only a part of its content -- a sort-of error/erasure-resilience property. They are used in applications ranging from caches and memory management in CPUs to database engines. In this work we study associative memories built on the maximum likelihood principle. We derive minimum residual error rates when the data stored comes from a uniform binary source. Second, we determine the minimum amount of memory required to store the same data. Finally, we bound the computational complexity for message retrieval. We then compare these bounds with two existing associative memory architectures: the celebrated Hopfield neural networks and a neural network architecture introduced more recently by Gripon and Berrou

    Multi-key Security: The Even-Mansour Construction Revisited

    Get PDF
    International audienceAt ASIACRYPT 1991, Even and Mansour introduced a block cipher construction based on a single permutation. Their construction has since been lauded for its simplicity, yet also criticized for not providing the same security as other block ciphers against generic attacks. In this paper, we prove that if a small number of plaintexts are encrypted under multiple independent keys, the Even-Mansour construction surprisingly offers similar security as an ideal block cipher with the same block and key size. Note that this multi-key setting is of high practical relevance, as real-world implementations often allow frequent rekeying. We hope that the results in this paper will further encourage the use of the Even-Mansour construction, especially when a secure and efficient implementation of a key schedule would result in significant overhead

    Human-artificial intelligence approaches for secure analysis in CAPTCHA codes

    Get PDF
    CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) has long been used to keep automated bots from misusing web services by leveraging human-artificial intelligence (HAI) interactions to distinguish whether the user is a human or a computer program. Various CAPTCHA schemes have been proposed over the years, principally to increase usability and security against emerging bots and hackers performing malicious operations. However, automated attacks have effectively cracked all common conventional schemes, and the majority of present CAPTCHA methods are also vulnerable to human-assisted relay attacks. Invisible reCAPTCHA and some approaches have not yet been cracked. However, with the introduction of fourth-generation bots accurately mimicking human behavior, a secure CAPTCHA would be hardly designed without additional special devices. Almost all cognitive-based CAPTCHAs with sensor support have not yet been compromised by automated attacks. However, they are still compromised to human-assisted relay attacks due to having a limited number of challenges and can be only solved using trusted devices. Obviously, cognitive-based CAPTCHA schemes have an advantage over other schemes in the race against security attacks. In this study, as a strong starting point for creating future secure and usable CAPTCHA schemes, we have offered an overview analysis of HAI between computer users and computers under the security aspects of open problems, difficulties, and opportunities of current CAPTCHA schemes.Web of Science20221art. no.

    Machine Learning and Security of Non-Executable Files

    Get PDF
    Computer malware is a well-known threat in security which, despite the enormous time and effort invested in fighting it, is today more prevalent than ever. Recent years have brought a surge in one particular type: malware embedded in non-executable file formats, e.g., PDF, SWF and various office file formats. The result has been a massive number of infections, owed primarily to the trust that ordinary computer users have in these file formats. In addition, their feature-richness and implementation complexity have created enormous attack surfaces in widely deployed client software, resulting in regular discoveries of new vulnerabilities. The traditional approach to malware detection – signature matching, heuristics and behavioral profiling – has from its inception been a labor-intensive manual task, always lagging one step behind the attacker. With the exponential growth of computers and networks, malware has become more diverse, wide-spread and adaptive than ever, scaling much faster than the available talent pool of human malware analysts. An automated and scalable approach is needed to fill the gap between automated malware adaptation and manual malware detection, and machine learning is emerging as a viable solution. Its branch called adversarial machine learning studies the security of machine learning algorithms and the special conditions that arise when machine learning is applied for security. This thesis is a study of adversarial machine learning in the context of static detection of malware in non-executable file formats. It evaluates the effectiveness, efficiency and security of machine learning applications in this context. To this end, it introduces 3 data-driven detection methods developed using very large, high quality datasets. PJScan detects malicious PDF files based on lexical properties of embedded JavaScript code and is the fastest method published to date. SL2013 extends its coverage to all PDF files, regardless of JavaScript presence, by analyzing the hierarchical structure of PDF logical building blocks and demonstrates excellent performance in a novel long-term realistic experiment. Finally, Hidost generalizes the hierarchical-structure-based feature set to become the first machine-learning-based malware detector operating on multiple file formats. In a comprehensive experimental evaluation on PDF and SWF, it outperforms other academic methods and commercial antivirus systems in detection effectiveness. Furthermore, the thesis presents a framework for security evaluation of machine learning classifiers in a case study performed on an independent PDF malware detector. The results show that the ability to manipulate a part of the classifier’s feature set allows a malicious adversary to disguise malware so that it appears benign to the classifier with a high success rate. The presented methods are released as open-source software.Schadsoftware ist eine gut bekannte Sicherheitsbedrohung. Trotz der enormen Zeit und des Aufwands die investiert werden, um sie zu beseitigen, ist sie heute weiter verbreitet als je zuvor. In den letzten Jahren kam es zu einem starken Anstieg von Schadsoftware, welche in nicht-ausfĂŒhrbaren Dateiformaten, wie PDF, SWF und diversen Office-Formaten, eingebettet ist. Die Folge war eine massive Anzahl von Infektionen, ermöglicht durch das Vertrauen, das normale Rechnerbenutzer in diese Dateiformate haben. Außerdem hat die KomplexitĂ€t und Vielseitigkeit dieser Dateiformate große AngriffsflĂ€chen in weitverbreiteter Klient-Software verursacht, und neue SicherheitslĂŒcken werden regelmĂ€ĂŸig entdeckt. Der traditionelle Ansatz zur Erkennung von Schadsoftware – Mustererkennung, Heuristiken und Verhaltensanalyse – war vom Anfang an eine Ă€ußerst mĂŒhevolle Handarbeit, immer einen Schritt hinter den Angreifern zurĂŒck. Mit dem exponentiellen Wachstum von Rechenleistung und Netzwerkgeschwindigkeit ist Schadsoftware diverser, zahlreicher und schneller-anpassend geworden als je zuvor, doch die VerfĂŒgbarkeit von menschlichen Schadsoftware-Analysten kann nicht so schnell skalieren. Ein automatischer und skalierbarer Ansatz ist gefragt, und maschinelles Lernen tritt als eine brauchbare Lösung hervor. Ein Bereich davon, Adversarial Machine Learning, untersucht die Sicherheit von maschinellen Lernverfahren und die besonderen VerhĂ€ltnisse, die bei der Anwendung von machinellem Lernen fĂŒr Sicherheit entstehen. Diese Arbeit ist eine Studie von Adversarial Machine Learning im Kontext statischer Schadsoftware-Erkennung in nicht-ausfĂŒhrbaren Dateiformaten. Sie evaluiert die Wirksamkeit, LeistungsfĂ€higkeit und Sicherheit von maschinellem Lernen in diesem Kontext. Zu diesem Zweck stellt sie 3 datengesteuerte Erkennungsmethoden vor, die alle auf sehr großen und diversen DatensĂ€tzen entwickelt wurden. PJScan erkennt bösartige PDF-Dateien anhand lexikalischer Eigenschaften von eingebettetem JavaScript-Code und ist die schnellste bisher veröffentliche Methode. SL2013 erweitert die Erkennung auf alle PDF-Dateien, unabhĂ€ngig davon, ob sie JavaScript enthalten, indem es die hierarchische Struktur von logischen PDF-Bausteinen analysiert. Es zeigt hervorragende Leistung in einem neuen, langfristigen und realistischen Experiment. Schließlich generalisiert Hidost den auf hierarchischen Strukturen basierten Merkmalsraum und wurde zum ersten auf maschinellem Lernen basierten Schadsoftware-Erkennungssystem, das auf mehreren Dateiformaten anwendbar ist. In einer umfassenden experimentellen Evaulierung auf PDF- und SWF-Formaten schlĂ€gt es andere akademische Methoden und kommerzielle Antiviren-Lösungen bezĂŒglich Erkennungswirksamkeit. Überdies stellt diese Doktorarbeit ein Framework fĂŒr Sicherheits-Evaluierung von auf machinellem Lernen basierten Klassifikatoren vor und wendet es in einer Fallstudie auf eine unabhĂ€ngige akademische Schadsoftware-Erkennungsmethode an. Die Ergebnisse zeigen, dass die FĂ€higkeit, nur einen Teil von Features, die ein Klasifikator verwendet, zu manipulieren, einem Angreifer ermöglicht, Schadsoftware in Dateien so einzubetten, dass sie von der Erkennungsmethode mit hoher Erfolgsrate als gutartig fehlklassifiziert wird. Die vorgestellten Methoden wurden als Open-Source-Software veröffentlicht

    Data-Driven Anomaly Detection in Industrial Networks

    Get PDF
    Since the conception of the first Programmable Logic Controllers (PLCs) in the 1960s, Industrial Control Systems (ICSs) have evolved vastly. From the primitive isolated setups, ICSs have become increasingly interconnected, slowly forming the complex networked environments, collectively known as Industrial Networks (INs), that we know today. Since ICSs are responsible for a wide range of physical processes, including those belonging to Critical Infrastructures (CIs), securing INs is vital for the well-being of modern societies. Out of the many research advances on the field, Anomaly Detection Systems (ADSs) play a prominent role. These systems monitor IN and/or ICS behavior to detect abnormal events, known or unknown. However, as the complexity of INs has increased, monitoring them in the search of anomalous trends has effectively become a Big Data problem. In other words, IN data has become too complex to process it by traditional means, due to its large scale, diversity and generation speeds. Nevertheless, ADSs designed for INs have not evolved at the same pace, and recent proposals are not designed to handle this data complexity, as they do not scale well or do not leverage the majority of the data types created in INs. This thesis aims to fill that gap, by presenting two main contributions: (i) a visual flow monitoring system and (ii) a multivariate ADS that is able to tackle data heterogeneity and to scale efficiently. For the flow monitor, we propose a system that, based on current flow data, builds security visualizations depicting network behavior while highlighting anomalies. For the multivariate ADS, we analyze the performance of Multivariate Statistical Process Control (MSPC) for detecting and diagnosing anomalies, and later we present a Big Data, MSPCinspired ADS that monitors field and network data to detect anomalies. The approaches are experimentally validated by building INs in test environments and analyzing the data created by them. Based on this necessity for conducting IN security research in a rigorous and reproducible environment, we also propose the design of a testbed that serves this purpose
    corecore