30 research outputs found

    Feasible, Robust and Reliable Automation and Control for Autonomous Systems

    This Special Issue book highlights current research and developments in automation and control for autonomous systems and showcases state-of-the-art control strategies for autonomous platforms. The book is co-edited by distinguished international control systems experts based in Sweden, the United States of America, and the United Kingdom, with contributions from reputable researchers from China, Austria, France, the United States of America, Poland, and Hungary, among many others. The editors believe the ten articles published within this Special Issue will appeal strongly to control systems researchers working on ground, aerial, and maritime vehicles and on robotics, as well as to industrial audiences.

    Machine Learning and Security of Non-Executable Files

    Computer malware is a well-known security threat that, despite the enormous time and effort invested in fighting it, is more prevalent today than ever. Recent years have brought a surge in one particular type: malware embedded in non-executable file formats, e.g., PDF, SWF, and various office file formats. The result has been a massive number of infections, owed primarily to the trust that ordinary computer users place in these file formats. In addition, their feature richness and implementation complexity have created enormous attack surfaces in widely deployed client software, resulting in regular discoveries of new vulnerabilities. The traditional approach to malware detection – signature matching, heuristics, and behavioral profiling – has from its inception been a labor-intensive manual task, always lagging one step behind the attacker. With the exponential growth of computers and networks, malware has become more diverse, widespread, and adaptive than ever, scaling much faster than the available talent pool of human malware analysts. An automated and scalable approach is needed to fill the gap between automated malware adaptation and manual malware detection, and machine learning is emerging as a viable solution. Its subfield, adversarial machine learning, studies the security of machine learning algorithms and the special conditions that arise when machine learning is applied for security. This thesis is a study of adversarial machine learning in the context of static detection of malware in non-executable file formats. It evaluates the effectiveness, efficiency, and security of machine learning applications in this context. To this end, it introduces three data-driven detection methods developed using very large, high-quality datasets. PJScan detects malicious PDF files based on lexical properties of embedded JavaScript code and is the fastest method published to date. SL2013 extends its coverage to all PDF files, regardless of JavaScript presence, by analyzing the hierarchical structure of PDF logical building blocks, and demonstrates excellent performance in a novel long-term realistic experiment. Finally, Hidost generalizes the hierarchical-structure-based feature set to become the first machine-learning-based malware detector operating on multiple file formats. In a comprehensive experimental evaluation on PDF and SWF, it outperforms other academic methods and commercial antivirus systems in detection effectiveness. Furthermore, the thesis presents a framework for the security evaluation of machine learning classifiers in a case study performed on an independent PDF malware detector. The results show that the ability to manipulate a part of the classifier's feature set allows a malicious adversary to disguise malware so that it appears benign to the classifier with a high success rate. The presented methods are released as open-source software.
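
    The structural approach shared by SL2013 and Hidost can be pictured with a short sketch: every root-to-leaf path in a document's object hierarchy becomes a feature, and a standard classifier is trained on the resulting vectors. The toy PDF trees, the path encoding, and the random forest below are illustrative assumptions, not the thesis's actual pipeline.

```python
# Illustrative sketch of hierarchical-structure features, in the spirit of
# SL2013/Hidost: every root-to-leaf path in a document tree becomes a feature.
# The toy trees, path encoding, and classifier are assumptions for exposition.
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction import DictVectorizer

def structural_paths(node, prefix=""):
    """Flatten a parsed document tree into root-to-leaf path counts."""
    if not isinstance(node, dict):          # leaf: record the path, not the value
        return {prefix: 1}
    counts = {}
    for key, child in node.items():
        for path, n in structural_paths(child, prefix + "/" + key).items():
            counts[path] = counts.get(path, 0) + n
    return counts

# Hypothetical parsed PDF object trees (structure only, heavily simplified).
benign    = {"Root": {"Pages": {"Count": 3}, "Info": {"Title": "report"}}}
malicious = {"Root": {"OpenAction": {"JS": "eval(...)"}, "Pages": {"Count": 1}}}

vectorizer = DictVectorizer()
X = vectorizer.fit_transform([structural_paths(t) for t in (benign, malicious)])
y = [0, 1]                                  # 0 = benign, 1 = malicious
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
```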

    A Framework for Semantic Similarity Measures to enhance Knowledge Graph Quality

    Precisely determining similarity values among real-world entities is a building block for data-driven tasks, e.g., ranking, relation discovery, or integration. Semantic Web and Linked Data initiatives have promoted the publication of large semi-structured datasets in the form of knowledge graphs. Knowledge graphs encode semantics that describes resources in terms of several aspects or resource characteristics, e.g., neighbors, class hierarchies, or attributes. Existing similarity measures take these aspects into account only in isolation, which may prevent them from delivering accurate similarity values. In this thesis, the resource characteristics relevant for accurately determining similarity values are identified and considered cumulatively in a framework of four similarity measures. Additionally, the impact of considering these resource characteristics during the computation of similarity values is analyzed in three data-driven tasks for the enhancement of knowledge graph quality. First, according to the identified resource characteristics, new similarity measures able to combine two or more of them are described. In total, four similarity measures are presented in an evolutionary order. While the first three similarity measures, OnSim, IC-OnSim, and GADES, combine the resource characteristics according to a human-defined aggregation function, the last one, GARUM, uses a machine learning regression approach to determine the relevance of each resource characteristic during the computation of the similarity. Second, the suitability of each measure for real-time applications is studied by means of a theoretical and an empirical comparison. The theoretical comparison consists of a study of the worst-case computational complexity of each similarity measure. The empirical comparison is based on the execution times of the different similarity measures in two third-party benchmarks involving the comparison of semantically annotated entities. Ultimately, the impact of the described similarity measures is shown in three data-driven tasks for the enhancement of knowledge graph quality: relation discovery, dataset integration, and evolution analysis of annotation datasets. Empirical results show that relation discovery and dataset integration tasks obtain better results when the semantics encoded in semantic similarity measures is considered. Further, using semantic similarity measures in the evolution analysis task allows for defining new informative metrics that give an overview of the evolution of the whole annotation set, rather than of individual annotations as in state-of-the-art evolution analysis frameworks.
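
    As a rough illustration of GARUM's learned aggregation (in contrast to the human-defined functions of OnSim, IC-OnSim, and GADES), the sketch below computes one similarity per resource characteristic and fits a regression over them; the component measures, toy resources, and gold scores are assumptions, not the thesis's definitions.

```python
# Minimal sketch of GARUM's aggregation idea: compute one similarity per
# resource characteristic, then learn their relative relevance by regression.
# The component measures, toy resources, and gold scores are assumptions.
import numpy as np
from sklearn.linear_model import LinearRegression

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 1.0

def aspect_similarities(r1, r2):
    """One value per characteristic: neighbors, class hierarchy, attributes."""
    return [
        jaccard(r1["neighbors"], r2["neighbors"]),
        1.0 if r1["class"] == r2["class"] else 0.0,   # crude hierarchy proxy
        jaccard(r1["attrs"], r2["attrs"]),
    ]

drug_a = {"neighbors": {"p1", "p2"}, "class": "Drug", "attrs": {"label"}}
drug_b = {"neighbors": {"p2", "p3"}, "class": "Drug", "attrs": {"label", "mass"}}

# Hypothetical gold-standard judgments for two training pairs.
X = np.array([aspect_similarities(drug_a, drug_b),
              aspect_similarities(drug_a, drug_a)])
y = np.array([0.7, 1.0])
model = LinearRegression().fit(X, y)        # learned per-characteristic weights
print(model.predict(X))
```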

    SUMMARIZATION AND VISUALIZATION OF DIGITAL CONVERSATIONS

    Digital conversations are all around us: recorded meetings, television debates, instant messaging, blogs, and discussion forums. With this work, we present solutions for the condensation and distillation of content from digital conversations based on advanced language technology. At the core of this technology is argumentative analysis, which allows us to produce high-quality text summaries and intuitive graphical visualizations of conversational content, enabling easier and faster access to digital conversations.
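
    As a loose illustration of how argumentative analysis can drive condensation, the sketch below tags each turn with an argumentative role and keeps only the salient ones; the roles and keyword cues are invented stand-ins, not the system's actual analyzer.

```python
# Purely illustrative: condense a conversation by keeping turns whose
# argumentative role tends to carry content; the role labels and keyword
# cues are assumptions, not the actual argumentative analyzer.
ROLE_CUES = {
    "PROPOSAL": ("we should", "i suggest", "let's"),
    "DECISION": ("agreed", "we will", "decided"),
    "OBJECTION": ("i disagree", "however"),
}

def argumentative_role(turn):
    text = turn.lower()
    for role, cues in ROLE_CUES.items():
        if any(cue in text for cue in cues):
            return role
    return "OTHER"

def condense(turns, keep=("PROPOSAL", "DECISION")):
    """Keep only the turns whose role is salient enough for a summary."""
    return [t for t in turns if argumentative_role(t) in keep]

meeting = [
    "We should move the release to May.",
    "I disagree, QA needs more time.",
    "Agreed, we will target mid-May then.",
]
print(condense(meeting))
```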

    Monitoring oriented agile based web applications development methodology for small software firms in Jordan

    Small software firms (SSF) are vital to the software industry in many countries, as they contribute substantial growth to their economies. In Jordan, most software companies involved in developing Web applications are small firms. However, the extent to which best practices for Web application development and management are applied in these firms is limited. Besides, existing software development methods still lack monitoring of the quality of the process and product. As a result, the Web applications being developed exceed deadlines and budgets and do not meet user requirements. Therefore, this research aims to construct a new methodology, referred to as the Monitoring Oriented Agile Based Web Applications Development (MOGWD) Methodology, for SSF. This study introduced an Extended Agile Method by extending the Scrum method with Extreme Programming (XP) elements. The Extended Agile Method was improved by combining common steps of Web design methods and incorporating the Goal Oriented Monitoring Method (GOMM). The GOMM defines twenty goals. Each goal has one or more questions, and the questions are answered through the defined metrics: 101 qualitative metrics for monitoring process quality and 37 quantitative metrics for monitoring process and product quality. Moreover, the proposed MOGWD methodology defines four phases: Plan, Do, Check, and Act. The MOGWD methodology was evaluated using expert review and a case study. The evaluation results show that the MOGWD methodology gained SSF practitioners' satisfaction and was found to be practical in a real environment. This study contributes to the field of Agile-based development and Web application measurement. It also provides SSF practitioners with a development methodology that monitors the quality of the process and product for Web development.
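
    The goal-question-metric chain of the GOMM can be pictured as a small data structure; a minimal sketch follows, in which the concrete goal, question, and metric are invented for illustration rather than drawn from the methodology's twenty goals.

```python
# Sketch of the Goal -> Question -> Metric chain behind the GOMM; the
# concrete goal, question, and metric below are invented for illustration.
from dataclasses import dataclass, field

@dataclass
class Metric:
    name: str
    kind: str                 # "qualitative" or "quantitative"
    value: object = None

@dataclass
class Question:
    text: str
    metrics: list = field(default_factory=list)

@dataclass
class Goal:
    statement: str
    questions: list = field(default_factory=list)

goal = Goal("Monitor sprint delivery quality")
q = Question("Are sprint deliverables completed on schedule?")
q.metrics.append(Metric("on-time story ratio", "quantitative", 0.8))
goal.questions.append(q)

for question in goal.questions:
    for metric in question.metrics:
        print(goal.statement, "->", question.text, "->",
              metric.name, "=", metric.value)
```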

    A personalized information retrieval system based on multidimensional user modeling

    Since the explosion of the Web, Information Retrieval (IR) has broadened in scope and Web search engines have emerged. Classical IR methods, designed mainly for simple textual searches, were suddenly confronted with documents of many formats and with rich content. In response to this evolution, users have become more demanding about the results returned by IR systems. Personalization addresses these demands: its main objective is to improve the results returned to the user according to the user's perception, interests, and preferences. This thesis sits at the intersection of these aspects and addresses this problem, with the main objective of proposing new and effective solutions. To this end, a system for personalized spatial and semantic search on the Web, integrating user modeling, is proposed. The system comprises two parts: 1) user modeling; 2) implicit collaboration among users through the construction of a network of user models, built iteratively over the successive searches performed online. A prototype supporting the proposed system was developed in order to experiment with and evaluate the proposal as a whole. We thus carried out a set of evaluations, the main ones being: a) evaluation of the quality of the user model; b) evaluation of information retrieval effectiveness; c) evaluation of information retrieval effectiveness when integrating spatial information; d) evaluation of search exploiting the network of users. The experiments show an improvement in the personalization of the returned results compared with those obtained by other search engines.
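
    As a toy picture of personalization, the sketch below re-ranks candidate results by their similarity to a user-model vector; the model's dimensions and the scoring are assumptions, not the thesis's actual multidimensional model or network of user models.

```python
# Toy sketch: re-rank search results by cosine similarity to a user-model
# vector. The two illustrated dimensions (topical and spatial interests)
# stand in for the thesis's multidimensional user model.
import math

def cosine(u, v):
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

user_model = {"cycling": 0.9, "paris": 0.7, "finance": 0.1}

results = [
    {"title": "Stock market report", "terms": {"finance": 1.0}},
    {"title": "Bike lanes in Paris", "terms": {"cycling": 1.0, "paris": 1.0}},
]

ranked = sorted(results, key=lambda r: cosine(user_model, r["terms"]),
                reverse=True)
print([r["title"] for r in ranked])   # personalized order first
```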