
Digital forensic analysis of internet history using principal component analysis

Abstract

A modern Digital Forensic examination, even of a small-scale home computer, typically involves searching large hard disk drive storage, a variety of host and web-based applications that may or may not be known to the investigator, and a proliferation of web-based Internet history artefacts that may be highly significant in showing the motivation of a suspect. Faster keyword searching and larger, more accurate sets of file hashes may point the investigator to relevant artefacts, but when dealing with the new or the unknown, or when there is a need to holistically profile the activity of the computer, the investigator is left with a manual and labour-intensive investigation. This paper proposes using an unsupervised statistical learning technique, Principal Component Analysis, as a novel approach to the analysis of Digital Forensic Internet history. The approach groups and analyses artefacts to produce a high-level contextual view of the timeline data. The selection of the appropriate number of Principal Components is described using the Scree test method. A case study of the approach is shown, first using a simulated test data set comprising 820 Mozilla Internet History artefacts and then using a set of 5900 Internet Explorer history artefacts from real-world browser data. The results of the analysis are presented in a tabular format that provides an accessible overall view of the activity within the timeline. They show a promising approach to representing large quantities of timeline data effectively and simply at a high level, where basic patterns of usage can be determined. Further work on enhancing the proposed approach to include low-level pattern rules is discussed.
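As an illustration of the general technique named in the abstract, the following minimal sketch runs Principal Component Analysis on a small synthetic activity matrix and picks a number of components from the eigenvalue profile. It is not the paper's implementation. The matrix layout (rows as time slots, columns as browsing-activity features), the function names `pca_scree` and `scree_elbow`, and the synthetic data are all assumptions made for the example. The Scree test is normally a visual inspection of the eigenvalue plot; the "largest drop" rule used here is only a crude numeric stand-in for it.

```python
import numpy as np


def pca_scree(activity):
    """PCA via eigendecomposition of the covariance matrix.

    `activity` is a hypothetical samples-by-features matrix.
    Returns eigenvalues (descending), explained-variance ratios,
    and the matching component vectors as columns.
    """
    X = np.asarray(activity, dtype=float)
    X = X - X.mean(axis=0)                  # centre each feature column
    cov = np.cov(X, rowvar=False)           # feature-by-feature covariance
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]       # reorder to descending
    eigvals = eigvals[order]
    eigvecs = eigvecs[:, order]
    ratio = eigvals / eigvals.sum()
    return eigvals, ratio, eigvecs


def scree_elbow(eigvals):
    """Crude stand-in for the visual Scree test: keep the components
    that precede the largest drop between successive eigenvalues."""
    drops = eigvals[:-1] - eigvals[1:]
    return int(np.argmax(drops)) + 1


# Synthetic data (an assumption, not the paper's data set): two latent
# usage patterns drive five observed features, plus a little noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 2))
loadings = np.array([[5.0, 5.0, 0.0, 0.0, 0.0],
                     [0.0, 0.0, 4.0, 4.0, 4.0]])
X = latent @ loadings + rng.normal(scale=0.1, size=(500, 5))

eigvals, ratio, components = pca_scree(X)
k = scree_elbow(eigvals)
print("eigenvalues:", np.round(eigvals, 3))
print("components kept:", k)
```

Projecting the centred data onto the first `k` columns of `components` gives the reduced representation. On real history artefacts the elbow is rarely this clean, so the eigenvalue plot should still be inspected by eye.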
