
    Efficient, Scalable, and Accurate Program Fingerprinting in Binary Code

    Why was this binary written? Which compiler was used? Which free software packages did the developer use? Which sections of the code were borrowed? Who wrote the binary? These questions are of paramount importance to security analysts and reverse engineers, and binary fingerprinting approaches may provide valuable insights that can help answer them. This thesis advances the state of the art by addressing some of the most fundamental problems in program fingerprinting for binary code, notably reusable binary code discovery, fingerprinting free open source software packages, and authorship attribution.

    First, to tackle the problem of discovering reusable binary code, we employ a technique that identifies reused functions by matching traces of a novel representation of binary code known as the semantic integrated graph. This graph enhances the control flow graph, the register flow graph, and the function call graph, key concepts from classical program analysis, and merges them with other structural information into a joint data structure.

    Second, we approach the problem of fingerprinting free open source software (FOSS) packages by proposing a novel, resilient, and efficient system that incorporates three components. The first extracts the syntactic features of functions by considering opcode frequencies and performing a hidden Markov model statistical test. The second applies a neighborhood hash graph kernel to random walks derived from control flow graphs, with the goal of extracting the semantics of the functions. The third applies the z-score to normalized instructions to extract the behavior of the instructions in a function. The components are then integrated using a Bayesian network model, which synthesizes the results to determine the FOSS function, making it possible to detect user-related functions.

    Third, with these elements in place, we present a framework capable of decoupling binary program functionality from the coding habits of authors. To capture coding habits, the framework leverages a set of features based on collections of functionality-independent choices made by authors during coding.

    Finally, it is well known that techniques such as refactoring and code transformation can significantly alter the structure of code, even for simple programs. Applying such techniques or changing the compiler and compilation settings can significantly affect the accuracy of available binary analysis tools, which severely limits their practicability, especially when they are applied to malware. To address these issues, we design a technique that extracts the semantics of binary code in terms of both data and control flow. The proposed technique allows more robust binary analysis because the extracted semantics are generally immune to code transformation, refactoring, and changes of compiler or compilation settings. Specifically, it employs data-flow analysis to extract the semantic flow of the registers as well as the semantic components of the control flow graph, which are then synthesized into a novel representation called the semantic flow graph (SFG).

    We evaluate the framework on large-scale datasets extracted from selected open source C++ projects on GitHub, Google Code Jam events, Planet Source Code contests, and students' programming projects, and find that it outperforms existing methods in several respects. First, it detects reused functions. Second, it identifies FOSS packages in real-world projects and reused binary functions with high precision. Third, it decouples authorship from functionality, so it can be applied to real malware binaries to automatically generate evidence of similar coding habits. Fourth, compared to existing research contributions, it successfully attributes a larger number of authors with significantly higher accuracy. Finally, the new framework is more robust than previous methods in the sense that there is no significant drop in accuracy when the code is subjected to refactoring techniques, code transformation methods, or different compilers.
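    To make the flavor of the FOSS-fingerprinting components concrete, the following is a minimal Python sketch of opcode-frequency features standardized by z-score against corpus statistics, loosely in the spirit of the first and third components. The toy disassembly format, function names, and corpus numbers are hypothetical illustrations, not the thesis implementation.

        from collections import Counter
        import numpy as np

        def opcode_frequencies(instructions):
            """Relative frequency of each opcode mnemonic in one function."""
            counts = Counter(op for op, *_ in instructions)
            total = sum(counts.values())
            return {op: n / total for op, n in counts.items()}

        def zscore_profile(freqs, corpus_mean, corpus_std):
            """Standardize a function's opcode frequencies against corpus statistics."""
            ops = sorted(corpus_mean)
            vec = np.array([freqs.get(op, 0.0) for op in ops])
            mu = np.array([corpus_mean[op] for op in ops])
            sd = np.array([max(corpus_std[op], 1e-9) for op in ops])  # avoid /0
            return (vec - mu) / sd

        # Toy usage: a "function" as (opcode, operands...) tuples; the corpus
        # statistics below are made up for illustration.
        func = [("mov", "eax", "1"), ("add", "eax", "ebx"),
                ("mov", "ecx", "eax"), ("ret",)]
        profile = zscore_profile(opcode_frequencies(func),
                                 corpus_mean={"mov": 0.40, "add": 0.20, "ret": 0.10},
                                 corpus_std={"mov": 0.10, "add": 0.10, "ret": 0.05})
        print(profile)  # one standardized feature per opcode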

    Software similarity and classification

    This thesis analyses software programs in the context of their similarity to other software programs. The applications proposed and implemented include detecting malicious software and discovering security vulnerabilities.

    Survey on representation techniques for malware detection system

    Malicious programs are malignant software designed by hackers or cyber offenders with the harmful intent of disrupting computer operation. Across the research literature, we find that striking a balance between designing an accurate architecture that can detect malware and keeping pace with the many advanced techniques that malware creators apply to produce new variants remains difficult. The study of malware detection techniques has therefore become more important and challenging within the security field. This review paper provides a detailed discussion and full reviews of the various types of malware, malware detection techniques and the research on them, malware analysis methods, and the different dynamic programming-based tools that can be used to represent malware samples. We also provide a comprehensive bibliography on malware detection, its techniques, and its analysis methods for malware researchers.

    A Geometric Approach for Deciphering Protein Structure from Cryo-EM Volumes

    Electron cryo-microscopy, or cryo-EM, is an area that has received much attention in the recent past. Compared to the traditional methods of X-ray crystallography and NMR spectroscopy, cryo-EM can be used to image much larger complexes, in many different conformations, and under a wide range of biochemical conditions, because it does not require the complex to be crystallisable. However, cryo-EM reconstructions are limited to intermediate resolutions, with the state of the art being 3.6 Å, at which secondary structure elements can be visually identified but individual amino acid residues cannot. This lack of atomic-level resolution creates new computational challenges for protein structure identification. In this dissertation, we present a suite of geometric algorithms that address several aspects of protein modeling using cryo-EM density maps. Specifically, we develop novel methods to capture the shape of density volumes as geometric skeletons. We then use these skeletons to find the secondary structure elements (SSEs) of a given protein, to identify the correspondence between these SSEs and those predicted from the primary sequence, and to register high-resolution protein structures onto the density volume. In addition, we design and develop Gorgon, an interactive molecular modeling system that integrates the above methods with other interactive routines to generate reliable and accurate protein backbone models.
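    As a rough illustration of the skeleton-extraction step described above, the following Python sketch thresholds a synthetic density volume, keeps its largest connected component, and thins it to a voxel skeleton. It is a minimal stand-in for the dissertation's geometric algorithms, assuming a scikit-image version whose skeletonize supports 3-D arrays.

        import numpy as np
        from scipy import ndimage
        from skimage.morphology import skeletonize  # 3-D capable in recent releases

        # Synthetic stand-in for a density map: a blurred, bent tube.
        vol = np.zeros((48, 48, 48))
        zs = np.arange(8, 40)
        vol[zs, (24 + 6 * np.sin(zs / 6.0)).astype(int), 24] = 1.0
        vol = ndimage.gaussian_filter(vol, sigma=2.0)

        mask = vol > 0.5 * vol.max()                 # density threshold
        labels, n = ndimage.label(mask)              # connected components
        sizes = ndimage.sum(mask, labels, range(1, n + 1))
        largest = labels == (1 + int(np.argmax(sizes)))
        skeleton = skeletonize(largest)              # voxel-thin curve skeleton
        print(int(skeleton.sum()), "skeleton voxels")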

    Report on shape analysis and matching and on semantic matching

    In GRAVITATE, two disparate specialities come together in one working platform for the archaeologist: the fields of shape analysis and of metadata search. These fields are relatively disjoint at the moment, and the research and development challenge of GRAVITATE is precisely to merge them for our chosen tasks. As shown in chapter 7, the small amount of literature that already attempts to join 3D geometry and semantics is not related to the cultural heritage domain. Therefore, after the project is done, there should be a clear ‘before-GRAVITATE’ and ‘after-GRAVITATE’ split in how these two aspects of a cultural heritage artefact are treated.

    This state of the art report (SOTA) is ‘before-GRAVITATE’. Shape analysis and metadata description are described separately, as they currently are in the literature, and we end the report with common recommendations in chapter 8 on possible or plausible cross-connections that suggest themselves. These considerations will be refined for the Roadmap for Research deliverable.

    Within the project, a jargon is developing in which ‘geometry’ stands for the physical properties of an artefact (not only its shape, but also its colour and material) and ‘metadata’ is used as a general shorthand for the semantic description of the provenance, location, ownership, classification, use, etc. of the artefact. As we proceed in the project, we will find a need to refine those broad divisions and to find intermediate classes (such as a semantic description of certain colour patterns), but for now the terminology is convenient, not least because it highlights the interesting area where both aspects meet.

    On the ‘geometry’ side, the GRAVITATE partners are UVA, Technion and CNR/IMATI; on the metadata side, IT Innovation, the British Museum and the Cyprus Institute, the latter two of course also playing the role of internal users and representatives of the Cultural Heritage (CH) data and target user group. CNR/IMATI’s experience in shape analysis and similarity will be an important bridge between the two worlds of geometry and metadata. The authorship and styles of this SOTA reflect these specialisms: the first part (chapters 3 and 4) is purely by the geometry partners (mostly IMATI and UVA), the second part (chapters 5 and 6) by the metadata partners, especially IT Innovation, while the joint overview of 3D geometry and semantics is mainly by IT Innovation and IMATI. The common section on Perspectives was written with the contribution of all.

    Exploring novel designs of NLP solvers: Architecture and Implementation of WORHP

    Mathematical Optimization in general, and Nonlinear Programming in particular, are applied by many scientific disciplines, such as the automotive sector, the aerospace industry, and space agencies. With some established NLP solvers having been available for decades, and with the mathematical community being rather conservative in this respect, many of their programming standards are severely outdated. It is safe to assume that such usability shortcomings impede the wider use of NLP methods; a representative example is the use of static workspaces by legacy FORTRAN codes. This dissertation gives an account of the construction of the European NLP solver WORHP by using and combining software standards and techniques that have not previously been applied to mathematical software to this extent. Examples include automatic code generation, a consistent reverse communication architecture, and the elimination of static workspaces. The result is a novel, industrial-grade NLP solver that overcomes many technical weaknesses of established NLP solvers and other mathematical software.
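    The reverse-communication architecture mentioned above can be illustrated with a small, hypothetical Python sketch: rather than accepting callbacks, the solver returns control to the caller whenever it needs a new problem evaluation, so the caller keeps ownership of its data and evaluation code. The class below is a toy gradient-descent loop, not WORHP's actual interface.

        class ReverseCommSolver:
            """Toy reverse-communication loop: the solver never calls user code;
            it asks the caller for evaluations and is advanced step by step."""
            NEED_GRADIENT, DONE = "need_gradient", "done"

            def __init__(self, x0, step=0.1, tol=1e-8, max_iter=500):
                self.x, self.step, self.tol, self.max_iter = x0, step, tol, max_iter
                self.action, self.it = self.NEED_GRADIENT, 0

            def advance(self, gradient):
                """Caller supplies the requested gradient; solver states its next need."""
                if abs(gradient) < self.tol or self.it >= self.max_iter:
                    self.action = self.DONE
                else:
                    self.x -= self.step * gradient   # one gradient-descent step
                    self.it += 1
                return self.action

        solver = ReverseCommSolver(x0=3.0)
        while solver.action == solver.NEED_GRADIENT:
            g = 2.0 * (solver.x - 1.0)               # caller-side: f(x) = (x - 1)^2
            solver.advance(g)
        print(round(solver.x, 4))                    # converges towards 1.0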

    Elastic phone: towards detecting and mitigating computation and energy inefficiencies in mobile apps

    Mobile devices have become ubiquitous, and their ever-evolving capabilities are bringing them closer to personal computers. Nonetheless, due to their mobility and small form factor constraints, they still present many hardware and software challenges. Their limited battery lifetime has led to the design of mobile networks that are inherently different from previous networks (e.g., WiFi) and to more restrictive task scheduling. Additionally, mobile device ecosystems are more susceptible to hardware heterogeneity and to the conflicting interests of distributors, internet service providers, manufacturers, developers, etc. The high number of stakeholders ultimately responsible for the performance of a device results in inconsistent behavior and makes it very challenging to build a solution that improves resource usage in most cases.

    The focus of this thesis is the study and development of techniques to detect and mitigate computation and energy inefficiencies in mobile apps. It follows a bottom-up approach, starting from the challenges behind detecting inefficient execution scheduling by looking only at apps’ implementations. It shows that scheduling APIs are largely misused and have a great impact on device wake-up frequency and on the efficiency of existing energy-saving techniques (e.g., batching scheduled executions; see the sketch below). It then addresses several challenges of app testing in the dynamic analysis field, more specifically, how to scale mobile app testing with realistic user input and how to analyze closed-source apps’ code at runtime, showing that introducing humans into the app testing loop improves the coverage of app code and the volume of network traffic generated. Finally, using the combined knowledge of static and dynamic analysis, it focuses on the challenges of identifying the resource-hungry sections of apps and of improving their execution via offloading, with a special focus on non-intrusive offloading that is transparent to existing apps and on in-network computation offloading and distribution. It shows that, even without a custom OS or app modifications, in-network offloading is possible and greatly improves execution times and energy consumption while reducing both end-user experienced latency and request drop rates. The thesis concludes with a measurement study of real apps, showing that a good portion of the most popular apps’ code can indeed be offloaded, and proposes future directions for the app testing and computation offloading fields.
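    The batching of scheduled executions mentioned above can be pictured as interval merging: each task exposes a wake-up window, and tasks with overlapping windows share one device wake-up. The Python function below is an illustrative toy under that assumption, not the thesis's scheduler or any real platform API.

        def batch_wakeups(tasks):
            """tasks: (name, earliest_s, latest_s) wake-up windows.
            Greedily merges tasks whose windows intersect into shared wake-ups."""
            groups = []
            for name, lo, hi in sorted(tasks, key=lambda t: t[1]):
                for g in groups:
                    new_lo, new_hi = max(g["lo"], lo), min(g["hi"], hi)
                    if new_lo <= new_hi:             # windows overlap: share a wake-up
                        g["lo"], g["hi"] = new_lo, new_hi
                        g["tasks"].append(name)
                        break
                else:
                    groups.append({"lo": lo, "hi": hi, "tasks": [name]})
            return groups

        tasks = [("sync_mail", 60, 90), ("poll_feed", 65, 70),
                 ("upload_logs", 70, 110), ("refresh_ads", 300, 360)]
        for g in batch_wakeups(tasks):
            print(f"wake in [{g['lo']}, {g['hi']}] s -> {g['tasks']}")
        # Two device wake-ups instead of four: the first three tasks share one.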

    Pattern-recognition-based defense against targeted attacks

    The speed at which everything and everyone is being connected considerably outstrips the rate at which effective security mechanisms are introduced to protect them. This has created an opportunity for resourceful threat actors who have specialized in conducting low-volume persistent attacks through sophisticated techniques tailored to specific valuable targets. Consequently, traditional approaches are rendered ineffective against targeted attacks, creating an acute need for innovative defense mechanisms. This thesis aims at supporting the security practitioner in bridging this gap by introducing a holistic strategy against targeted attacks that addresses key challenges encountered during the phases of detection, analysis, and response. The structure of this thesis is therefore aligned to these three phases, with each of its central chapters taking on a particular problem and proposing a solution built on a strong foundation in pattern recognition and machine learning. In particular, we propose a detection approach that, in the absence of additional authentication mechanisms, makes it possible to identify spear-phishing emails without relying on their content. Next, we introduce an analysis approach for malware triage based on the structural characterization of malicious code. Finally, we introduce MANTIS, an open-source platform for authoring, sharing and collecting threat intelligence, whose data model unifies threat intelligence standards in a novel representation based on attributed graphs. We evaluate our approaches in a series of experiments that demonstrate their potential benefit in real-world scenarios. As a whole, these ideas open new avenues for research on defense mechanisms and represent an attempt to counteract the imbalance between resourceful actors and society at large.
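    The attributed-graph idea behind such a unified threat-intelligence data model can be pictured with a short Python sketch: heterogeneous objects become typed nodes carrying attribute dictionaries, references between them become labelled edges, and analyses reduce to graph queries. The node kinds, attributes, and relations below are hypothetical examples, not the actual MANTIS schema.

        import networkx as nx

        g = nx.DiGraph()
        g.add_node("campaign:alpha", kind="Campaign", first_seen="2015-03-01")
        g.add_node("email:spear-1", kind="EmailMessage", subject="Invoice Q1")
        g.add_node("file:dropper.exe", kind="File", sha256="…")  # digest elided
        g.add_node("ip:203.0.113.7", kind="Address", category="c2")

        g.add_edge("campaign:alpha", "email:spear-1", rel="delivers")
        g.add_edge("email:spear-1", "file:dropper.exe", rel="attaches")
        g.add_edge("file:dropper.exe", "ip:203.0.113.7", rel="connects_to")

        # A typical query: every object reachable from a campaign node.
        for node in nx.descendants(g, "campaign:alpha"):
            print(node, g.nodes[node]["kind"])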