69 research outputs found

The Effect of Code Obfuscation on Authorship Attribution of Binary Computer Files

Author: Hendrikse Steven
Publication venue: NSUWorks
Publication date: 01/01/2017

In many forensic investigations, questions linger regarding the identity of the authors of the software specimen. Research has identified methods for the attribution of binary files that have not been obfuscated, but a significant percentage of malicious software has been obfuscated in an effort to hide both the details of its origin and its true intent. Little research has been done on analyzing obfuscated code for attribution. In part, the reason for this gap in the research is that deobfuscation of an unknown program is a challenging task. Further, the additional transformation of the executable file introduced by the obfuscator modifies or removes features from the original executable that would have been used in the author attribution process. Existing research has demonstrated good success in attributing the authorship of an executable file of unknown provenance using methods based on static analysis of the specimen file. With the addition of file obfuscation, static analysis of files becomes difficult and time-consuming, and in some cases may lead to inaccurate findings. This paper presents a novel process for authorship attribution using dynamic analysis methods. A software-emulated system was fully instrumented to become a test harness for a specimen of unknown provenance, allowing for supervised control, monitoring, and trace data collection during execution. This trace data was used as input to a supervised machine learning algorithm trained to identify stylometric differences in the specimen under test and to predict who wrote the specimen. The specimen files were also analyzed for authorship using static analysis methods, so that their prediction accuracies could be compared with those of the new dynamic-analysis-based method. Experiments indicate that this new method can provide better accuracy of author attribution for files of unknown provenance, especially where the specimen file has been obfuscated.

NSU Works

Air Force Institute of Technology Research Report 2010

Author: Office of Research and Sponsored Programs Graduate School of Engineering and Management, AFIT
Publication venue: AFIT Scholar
Publication date: 15/02/2011

This report summarizes the research activities of the Air Force Institute of Technology’s Graduate School of Engineering and Management. It describes research interests and faculty expertise; lists student theses/dissertations; identifies research sponsors and contributions; and outlines the procedures for contacting the school. Included in the report are faculty publications, conference presentations, consultations, and funded research projects. Research was conducted in the areas of Aeronautical and Astronautical Engineering, Electrical Engineering and Electro-Optics, Computer Engineering and Computer Science, Systems and Engineering Management, Operational Sciences, Mathematics, Statistics and Engineering Physics.

AFIT Scholar (Air Force Institute of Technology)

Self-learning Anomaly Detection in Industrial Production

Author: Meshram Ankush
Publication venue: KIT-Bibliothek, Karlsruhe
Publication date: 07/11/2022

KITopen

Software similarity and classification

Author: Cesare Silvio
Publication venue: Deakin University, Faculty of Science, Engineering and Built Environment, School of Information Technology
Publication date: 01/06/2013

This thesis analyses software programs in the context of their similarity to other software programs. Applications proposed and implemented include detecting malicious software and discovering security vulnerabilities.

Deakin Research Online

Generative Non-Markov Models for Information Extraction

Author: Andrews Nicholas Oliver
Publication venue: 'The Busan Gyeongnam Mathematical Society'
Publication date: 15/04/2019

Learning from unlabeled data is a long-standing challenge in machine learning. A principled solution involves modeling the full joint distribution over inputs and the latent structure of interest, and imputing the missing data via marginalization. Unfortunately, such marginalization is expensive for most non-trivial problems, which places practical limits on the expressiveness of generative models. As a result, joint models often encode strict assumptions about the underlying process, such as fixed-order Markovian assumptions, and employ simple count-based features of the inputs. In contrast, conditional models, which do not directly model the observed data, are free to incorporate rich overlapping features of the input in order to predict the latent structure of interest. It would be desirable to develop expressive generative models that retain tractable inference. This is the topic of this thesis. In particular, we explore joint models which relax fixed-order Markov assumptions, and investigate the use of recurrent neural networks for automatic feature induction in the generative process. We focus on two structured prediction problems: (1) imputing labeled segmentations of input character sequences, and (2) imputing directed spanning trees relating strings in text corpora. These problems arise in many applications of practical interest, but we are primarily concerned with named-entity recognition and cross-document coreference resolution in this work. For named-entity recognition, we propose a generative model in which the observed characters originate from a latent non-Markov process over words, and where the characters are themselves produced via a non-Markov process: a recurrent neural network (RNN). We propose a sampler for the proposed model in which sequential Monte Carlo is used as a transition kernel for a Gibbs sampler. The kernel is amenable to a fast parallel implementation, and results in fast mixing in practice.
For cross-document coreference resolution, we move beyond sequence modeling to consider string-to-string transduction. We stipulate a generative process for a corpus of documents in which entity names arise from copying, and optionally transforming, previous names of the same entity. Our proposed model is sensitive both to the context in which the names occur and to their spelling. The string-to-string transformations correspond to systematic linguistic processes such as abbreviation, typos, and nicknaming, and by analogy to biology, we think of them as mutations along the edges of a phylogeny. We propose a novel block Gibbs sampler for this problem that alternates between sampling an ordering of the mentions and a spanning tree relating all mentions in the corpus.

JScholarship

Stinging the Predators: A collection of papers that should never have been published

Author: Faulkes Zen
Publication venue: ScholarWorks @ UTRGV
Publication date: 20/05/2022

This ebook collects academic papers and conference abstracts that were meant to be so terrible that nobody in their right mind would publish them. All were submitted to journals and conferences to expose weak or non-existent peer review and other exploitative practices. Each paper has a brief introduction. Short essays round out the collection.

Scholarworks@UTRGV Univ. of Texas Rio Grande Valley

UMSL Bulletin 2020-2021

Author: University of Missouri-St. Louis
Publication venue: IRL @ UMSL
Publication date: 01/07/2020

The 2020-2021 Bulletin and Course Catalog for the University of Missouri St. Louis. https://irl.umsl.edu/bulletin/1084/thumbnail.jp

University of Missouri, St. Louis

UMSL Bulletin 2021-2022

Author: University of Missouri-St. Louis
Publication venue: IRL @ UMSL
Publication date: 01/07/2021

The 2021-2022 Bulletin and Course Catalog for the University of Missouri St. Louis. This is the July 1, 2021 PDF snapshot version of the University Bulletin and Course Catalog. https://irl.umsl.edu/bulletin/1086/thumbnail.jp

University of Missouri, St. Louis