The Effect of Code Obfuscation on Authorship Attribution of Binary Computer Files
In many forensic investigations, questions linger regarding the identity of the authors of the software specimen. Research has identified methods for the attribution of binary files that have not been obfuscated, but a significant percentage of malicious software has been obfuscated in an effort to hide both the details of its origin and its true intent. Little research has been done on analyzing obfuscated code for attribution. In part, the reason for this gap in the research is that deobfuscation of an unknown program is a challenging task. Further, the additional transformation of the executable file introduced by the obfuscator modifies or removes features from the original executable that would have been used in the author attribution process. Existing research has demonstrated good success in attributing the authorship of an executable file of unknown provenance using methods based on static analysis of the specimen file. With the addition of file obfuscation, static analysis of files becomes difficult and time-consuming, and in some cases may lead to inaccurate findings. This paper presents a novel process for authorship attribution using dynamic analysis methods. A software-emulated system was fully instrumented to become a test harness for a specimen of unknown provenance, allowing for supervised control, monitoring, and trace data collection during execution. This trace data was used as input to a supervised machine learning algorithm trained to identify stylometric differences in the specimen under test and to predict who wrote the specimen. The specimen files were also analyzed for authorship using static analysis methods, so that the prediction accuracies of the static approach could be compared with those of the new dynamic-analysis-based method. Experiments indicate that this new method can provide better accuracy of author attribution for files of unknown provenance, especially when the specimen file has been obfuscated.
Air Force Institute of Technology Research Report 2010
This report summarizes the research activities of the Air Force Institute of Technology’s Graduate School of Engineering and Management. It describes research interests and faculty expertise; lists student theses/dissertations; identifies research sponsors and contributions; and outlines the procedures for contacting the school. Included in the report are faculty publications, conference presentations, consultations, and funded research projects. Research was conducted in the areas of Aeronautical and Astronautical Engineering, Electrical Engineering and Electro-Optics, Computer Engineering and Computer Science, Systems and Engineering Management, Operational Sciences, Mathematics, Statistics and Engineering Physics.
Software similarity and classification
This thesis analyses software programs in the context of their similarity to other software programs. Applications proposed and implemented include detecting malicious software and discovering security vulnerabilities.
Generative Non-Markov Models for Information Extraction
Learning from unlabeled data is a long-standing challenge in machine learning. A
principled solution involves modeling the full joint distribution over inputs
and the latent structure of interest, and imputing the missing data via
marginalization. Unfortunately, such marginalization is expensive for most
non-trivial problems, which places practical limits on the expressiveness of
generative models. As a result, joint models often encode strict assumptions
about the underlying process such as fixed-order Markovian assumptions and
employ simple count-based features of the inputs. In contrast, conditional
models, which do not directly model the observed data, are free to incorporate
rich overlapping features of the input in order to predict the latent structure
of interest. It would be desirable to develop expressive generative models that
retain tractable inference. This is the topic of this thesis. In particular, we
explore joint models which relax fixed-order Markov assumptions, and investigate
the use of recurrent neural networks for automatic feature induction in the
generative process.
We focus on two structured prediction problems: (1) imputing labeled segmentations
of input character sequences, and (2) imputing directed spanning trees relating
strings in text corpora. These problems arise in many applications of practical
interest, but we are primarily concerned with named-entity recognition and
cross-document coreference resolution in this work.
For named-entity recognition, we propose a generative model in which the
observed characters originate from a latent non-Markov process over words, and
where the characters are themselves produced via a non-Markov process: a
recurrent neural network (RNN). We propose a sampler for the proposed model in
which sequential Monte Carlo is used as a transition kernel for a Gibbs sampler.
The kernel is amenable to a fast parallel implementation, and results in fast
mixing in practice.
For cross-document coreference resolution, we move beyond sequence modeling to
consider string-to-string transduction. We stipulate a generative process for a
corpus of documents in which entity names arise from copying (and optionally
transforming) previous names of the same entity. Our proposed model is
sensitive to both the context in which the names occur as well as their
spelling. The string-to-string transformations correspond to systematic
linguistic processes such as abbreviation, typos, and nicknaming, and by analogy
to biology, we think of them as mutations along the edges of a phylogeny. We
propose a novel block Gibbs sampler for this problem that alternates between
sampling an ordering of the mentions and a spanning tree relating all mentions
in the corpus.
Stinging the Predators: A collection of papers that should never have been published
This ebook collects academic papers and conference abstracts that were meant to be so terrible that nobody in their right mind would publish them. All were submitted to journals and conferences to expose weak or non-existent peer review and other exploitative practices. Each paper has a brief introduction. Short essays round out the collection.
UMSL Bulletin 2020-2021
The 2020-2021 Bulletin and Course Catalog for the University of Missouri St. Louis.
UMSL Bulletin 2021-2022
The 2021-2022 Bulletin and Course Catalog for the University of Missouri St. Louis.
This is the July 1, 2021 PDF snapshot version of the University Bulletin and Course Catalog.
UMSL Bulletin 2018-2019
The University Bulletin/Course Catalog 2018-2019 Edition.
UMSL Bulletin 2019-2020
The University Bulletin/Course Catalog 2019-2020 Edition.