
    Stylistics versus Statistics: A corpus linguistic approach to combining techniques in forensic authorship analysis using Enron emails

    This thesis empirically investigates how a corpus linguistic approach can address the main theoretical and methodological challenges facing the field of forensic authorship analysis. Linguists approach the problem of questioned authorship from the theoretical position that each person has their own distinctive idiolect (Coulthard 2004: 431). However, the notion of idiolect has come under scrutiny in forensic linguistics over recent years for being too abstract to be of practical use (Grant 2010; Turell 2010). At the same time, two competing methodologies have developed in authorship analysis. On the one hand, there are qualitative stylistic approaches, and on the other there are statistical ‘stylometric’ techniques. This study uses a corpus of over 60,000 emails and 2.5 million words written by 176 employees of the former American company Enron to tackle these issues in the contexts of both authorship attribution (identifying authors using linguistic evidence) and author profiling (predicting authors’ social characteristics using linguistic evidence). Analyses reveal that even in shared communicative contexts, and when using very common lexical items, individual Enron employees produce distinctive collocation patterns and lexical co-selections. In turn, these idiolectal elements of linguistic output can be captured and quantified by word n-grams (strings of n words). An attribution experiment is performed using word n-grams to identify the authors of anonymised email samples. Results of the experiment are encouraging, and it is argued that the approach developed here offers a means by which stylistic and statistical techniques can complement each other. Finally, quantitative and qualitative analyses are combined in the sociolinguistic profiling of Enron employees by gender and occupation. Current author profiling research is exclusively statistical in nature. However, the findings here demonstrate that when statistical results are augmented by qualitative evidence, the complex relationship between language use and author identity can be more accurately observed
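
    To make the word n-gram idea concrete, the following is a minimal Python sketch of how candidate authors could be ranked against a questioned email by the overlap of their word n-gram profiles. The function names, the use of Jaccard overlap and the default n = 3 are illustrative assumptions for this sketch, not the exact measures used in the thesis.

        from collections import Counter

        def word_ngrams(text, n=3):
            """Extract word n-grams (strings of n words) from an email sample."""
            tokens = text.lower().split()
            return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

        def jaccard_similarity(ngrams_a, ngrams_b):
            """Proportion of shared n-gram types between two profiles (illustrative measure)."""
            a, b = set(ngrams_a), set(ngrams_b)
            return len(a & b) / len(a | b) if (a | b) else 0.0

        def attribute(questioned_text, candidate_profiles, n=3):
            """Rank candidate authors by similarity between their known-email
            n-gram profiles and the n-grams of a questioned (anonymised) sample."""
            questioned = word_ngrams(questioned_text, n)
            scores = {author: jaccard_similarity(questioned, profile)
                      for author, profile in candidate_profiles.items()}
            return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

        # candidate_profiles would map each candidate author to a Counter of word
        # n-grams built from their undisputed emails, e.g.
        # profiles = {"employee_a": word_ngrams(text_a), "employee_b": word_ngrams(text_b)}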

    Advancing security information and event management frameworks in managed enterprises using geolocation

    Security Information and Event Management (SIEM) technology supports security threat detection and response through real-time and historical analysis of security events from a range of data sources. Through the retrieval of mass feedback from many components and security systems within a computing environment, SIEMs are able to correlate and analyse events with a view to incident detection. The hypothesis of this study is that existing Security Information and Event Management techniques and solutions can be complemented by location-based information provided by feeder systems. In addition, and associated with the introduction of location information, it is hypothesised that privacy-enforcing procedures on geolocation data in SIEMs and meta-systems alike are necessary and enforceable. The method for the study was to augment a SIEM, established for the collection of events in an enterprise service management environment, with geolocation data. Through introducing the location dimension, it was possible to expand the correlation rules of the SIEM with location attributes and to see how this improved security confidence. An important co-consideration is the effect on privacy, where the location information of an individual or system is propagated to a SIEM. With a theoretical consideration of the current privacy directives and regulations (specifically as promulgated in the European Union), privacy-supporting techniques are introduced to diminish the accuracy of the location information while still enabling enhanced security analysis. In the context of a European Union FP7 project relating to next-generation SIEMs, the results of this work have been implemented based on the systems, data, techniques and resilient features of the MASSIF project. In particular, AlienVault has been used as the platform for augmentation of a SIEM, and an event set of several million events, collected over a three-month period, has formed the basis for the implementation and experimentation. A "brute-force attack" misuse case scenario was selected to highlight the benefits of geolocation information as an enhancement to SIEM detection (and false-positive prevention). With respect to privacy, a privacy model is introduced for SIEM frameworks. This model takes as its basis the existing privacy legislation that is most stringent in terms of privacy. An analysis of the implementation and testing is conducted, focusing equally on data security and privacy: that is, assessing location-based information in enhancing SIEM capability for advanced security detection, and determining whether privacy-enforcing procedures on geolocation in SIEMs and other meta-systems are achievable and enforceable. Opportunities for geolocation to enhance various security techniques are considered, specifically for solving misuse cases identified as existing problems in enterprise environments. In summary, the research shows that additional security confidence and insight can be achieved through the augmentation of SIEM event information with geolocation information. Through the use of spatial cloaking it is also possible to incorporate location information without compromising individual privacy. Overall, the research reveals that there are significant benefits for SIEMs in making use of geolocation in their analysis calculations, and that this can be done in ways which are acceptable from a privacy standpoint when considered against prevailing privacy legislation and guidelines.
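
    As an illustration of the spatial cloaking idea described above, the sketch below (Python) snaps coordinates to a coarse grid cell before they are attached to a SIEM event, so correlation rules can reason about regions without storing precise positions. The cell size, field names and the brute-force example in the closing comment are assumptions made for this sketch rather than details taken from the thesis.

        def cloak_location(lat, lon, cell_degrees=0.5):
            """Spatially cloak a coordinate by snapping it to a coarse grid cell,
            reducing accuracy while keeping regional information for correlation."""
            return (round(lat / cell_degrees) * cell_degrees,
                    round(lon / cell_degrees) * cell_degrees)

        def enrich_event(event, lat, lon):
            """Attach only the cloaked geolocation to a SIEM event record."""
            enriched = dict(event)
            enriched["geo_cell"] = cloak_location(lat, lon)
            return enriched

        # A correlation rule could then flag, for example, repeated failed logins
        # for one account arriving from many distinct geo_cell values within a
        # short time window (a brute-force indicator), without the SIEM ever
        # storing precise coordinates.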

    Investigating serial murder: case linkage methods employed by the South African Police Service

    The aim of this descriptive research was to determine the methods of case linkage (methods to link murder cases to each other as well as to link the murder series to one offender) employed by the South African Police Service (SAPS) to investigate serial murder in South Africa, and to explain them comprehensively. A qualitative approach was employed, with a multi-method data collection process that included case studies, interviews and a literature review, in order to gain an in-depth understanding of the subject. The methods of case linkage are explained within three phases of a serial murder investigation: the identification phase, the investigation and apprehension phase, and the trial and sentencing phase. The main findings of the study revealed the need for further training of SAPS members and the need for a Standing Operating Procedure to be implemented to specifically govern the system of investigation for a serial murder case. Criminology and Security Science, M.A. (Criminology).

    Data visualisation in digital forensics

    As digital crimes have risen, so has the need for digital forensics. Numerous state-of-the-art tools have been developed to assist digital investigators in conducting proper investigations into digital crimes. However, digital investigations are becoming increasingly complex and time-consuming due to the amount of data involved, and digital investigators can find themselves unable to conduct them in an appropriately efficient and effective manner. This situation has prompted the need for new tools capable of handling such large, complex investigations. Data mining is one such potential tool. It is still relatively unexplored from a digital forensics perspective, but the purpose of data mining is to discover new knowledge from data where the dimensionality, complexity or volume of the data is prohibitively large for manual analysis. This study assesses the self-organising map (SOM), a neural network model and data mining technique that could potentially offer tremendous benefits to digital forensics. The focus of this study is to demonstrate how the SOM can help digital investigators to make better decisions and conduct the forensic analysis process more efficiently and effectively during a digital investigation. The SOM’s visualisation capabilities can not only be used to reveal interesting patterns, but can also serve as a platform for further, interactive analysis. Dissertation (MSc (Computer Science)), University of Pretoria, 2007.
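
    A self-organising map projects high-dimensional data onto a two-dimensional grid while preserving neighbourhood structure, which is what makes it useful for visualising large evidence sets. The following is a minimal, self-contained Python/NumPy sketch of the technique in general; the grid size, the decay schedule and the idea of using file metadata as input features are assumptions for illustration, not the specific configuration used in this dissertation.

        import numpy as np

        def train_som(data, grid=(8, 8), epochs=20, lr0=0.5, sigma0=2.0, seed=0):
            """Train a small self-organising map on numeric feature vectors
            (for example, file size, timestamps and type codes extracted from
            evidence files). Returns the trained weight grid."""
            data = np.asarray(data, dtype=float)
            rows, cols = grid
            rng = np.random.default_rng(seed)
            weights = rng.random((rows, cols, data.shape[1]))
            # Grid coordinates of every map unit, used for neighbourhood distances.
            coords = np.dstack(np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"))
            total_steps = epochs * len(data)
            step = 0
            for _ in range(epochs):
                for x in data[rng.permutation(len(data))]:
                    lr = lr0 * np.exp(-step / total_steps)        # decaying learning rate
                    sigma = sigma0 * np.exp(-step / total_steps)  # shrinking neighbourhood
                    # Best-matching unit: the map cell whose weights are closest to x.
                    bmu = np.unravel_index(
                        np.argmin(np.linalg.norm(weights - x, axis=2)), (rows, cols))
                    # Pull the BMU and its neighbours towards the sample.
                    grid_dist = np.linalg.norm(coords - np.array(bmu), axis=2)
                    h = np.exp(-(grid_dist ** 2) / (2 * sigma ** 2))[..., None]
                    weights += lr * h * (x - weights)
                    step += 1
            return weights

        def map_samples(weights, data):
            """Return the winning cell for each sample, ready for hit-map plotting."""
            return [tuple(np.unravel_index(
                        np.argmin(np.linalg.norm(weights - x, axis=2)), weights.shape[:2]))
                    for x in np.asarray(data, dtype=float)]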

    E-mail forensic authorship attribution

    E-mails have become the standard for business as well as personal communication. The inherent security risks within e-mail communication present the problem of anonymity. If the author of an e-mail is not known, the digital forensic investigator needs to determine the authorship of the e-mail using a process that has not been standardised in the e-mail forensic field. This research project examines many problems associated with e-mail communication and the digital forensic domain; more specifically, e-mail forensic investigations and the recovery of legally admissible evidence to be presented in a court of law. The research methodology utilised a comprehensive literature review in combination with Design Science, which resulted in the development of an artifact through intensive research. The Proposed E-mail Forensic Methodology is based on the most current digital forensic investigation process, and further validation of the process was established via expert reviews. The opinions of the digital forensic experts were an integral part of the validation process, which adds to the credibility of the study; this was performed with the aid of the Delphi technique. This Proposed E-mail Forensic Methodology adopts a standardised investigation process applied to an e-mail investigation and takes into account the South African perspective by incorporating various checks against the relevant laws and legislation. By following the Proposed E-mail Forensic Methodology, e-mail forensic investigators can produce evidence that is legally admissible in a court of law.

    Assessing the evidential value of artefacts recovered from the cloud

    Cloud computing offers users low-cost access to computing resources that are scalable and flexible. However, it is not without its challenges, especially in relation to security. Cloud resources can be leveraged for criminal activities, and the architecture of the ecosystem makes digital investigation difficult in terms of evidence identification, acquisition and examination. However, these same resources can be leveraged for the purposes of digital forensics, providing facilities for evidence acquisition, analysis and storage. Alternatively, existing forensic capabilities can be used in the Cloud as a step towards achieving forensic readiness. Tools can be added to the Cloud which can recover artefacts of evidential value. This research investigates whether artefacts that have been recovered from the Xen Cloud Platform (XCP) using existing tools have evidential value. To determine this, the research is broken into three distinct areas: adding existing tools to a Cloud ecosystem, recovering artefacts from that system using those tools, and then determining the evidential value of the recovered artefacts. From these experiments, three key steps for adding existing tools to the Cloud were determined: the identification of the specific Cloud technology being used, the identification of existing tools, and the building of a testbed. Stemming from this, three key components of artefact recovery are identified: the user, the audit log and the Virtual Machine (VM), along with two methodologies for artefact recovery in XCP. In terms of evidential value, this research proposes a set of criteria for the evaluation of digital evidence, stating that it should be authentic, accurate, reliable and complete. In conclusion, this research demonstrates the use of these criteria in the context of digital investigations in the Cloud and how each is met. This research shows that it is possible to recover artefacts of evidential value from XCP.
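
    As a small illustration of how one of the proposed criteria (authenticity) can be supported in practice, the Python sketch below records a SHA-256 digest and acquisition metadata for a recovered artefact and re-verifies it later. The record fields and function names are assumptions made for this sketch; they are not drawn from the tools or procedures used in the research.

        import datetime
        import hashlib
        import pathlib

        def record_artefact(path, case_id, collector):
            """Record a recovered artefact with a SHA-256 digest and acquisition
            metadata so its integrity can be re-verified later in the investigation."""
            p = pathlib.Path(path)
            return {
                "case_id": case_id,
                "artefact": str(p),
                "sha256": hashlib.sha256(p.read_bytes()).hexdigest(),
                "collected_by": collector,
                "collected_at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
            }

        def verify_artefact(record):
            """Re-hash the artefact and compare against the recorded digest; a match
            supports the claim that the evidence is unchanged since acquisition."""
            current = hashlib.sha256(
                pathlib.Path(record["artefact"]).read_bytes()).hexdigest()
            return current == record["sha256"]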

    A multi-disciplinary framework for cyber attribution

    Effective cyber security is critical to the prosperity of any nation in the modern world. We have become dependent upon this interconnected network of systems for a number of critical functions within society. As our reliance upon this technology has increased, so have the prospective gains for malicious actors who would abuse these systems for their own personal benefit, at the cost of legitimate users. The result has been an explosion of cyber attacks, or cyber-enabled crimes. The threat from hackers, organised criminals and even nation states is ever increasing. One of the critical enablers of our cyber security is cyber attribution, the ability to tell who is acting against our systems. A purely technical approach to cyber attribution has been found to be ineffective in the majority of cases, taking too narrow an approach to the attribution problem. A purely technical approach will provide Indicators of Compromise (IOCs), which are suitable for the immediate recovery and clean-up after a cyber event. It fails, however, to ask the deeper questions about the origin of the attack, which can be derived from a wider set of analyses and additional sources of data. Unfortunately, due to the wide range of data types and the highly specialist skills required to perform this deeper analysis, there is currently no common framework within which analysts can work together towards resolving the attribution problem. This is further exacerbated by a communication barrier between the highly specialised fields and a lack of obviously compatible data types. The aim of the project is to develop a common framework through which experts from a number of disciplines can add to the overall attribution picture. These experts add their input in the form of a library. Firstly, a process was developed to enable the creation of compatible libraries in different specialist fields. A series of libraries can then be used by an analyst to create an overarching attribution picture. The framework highlights any intelligence gaps, and an analyst can additionally use the list of libraries to suggest a tool or method to fill a given gap. By the end of the project a working framework had been developed, with a number of libraries from a wide range of technical attribution disciplines. These libraries were used to feed real-time intelligence to both technical and non-technical analysts, who were then able to use this information to perform in-depth attribution analysis. The pictorial format of the framework was found to assist in breaking down the communication barrier between disciplines and was suitable as an intelligence product in its own right, providing a useful visual aid for briefings. The simplicity of the library-based system meant that the process was easy to learn, with only a short introduction to the framework required.
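
    To illustrate what a set of compatible, discipline-specific libraries feeding one framework might look like, the Python sketch below defines a shared indicator record, a common library interface and a merge step that highlights intelligence gaps. The class names, fields and category labels are illustrative assumptions for this sketch, not the data model used in the project.

        from dataclasses import dataclass
        from typing import List, Protocol

        @dataclass
        class Indicator:
            """A single attribution observation in a shared, library-neutral format."""
            category: str        # e.g. "infrastructure", "malware", "linguistic"
            value: str
            confidence: float    # 0.0 (speculative) to 1.0 (confirmed)
            source_library: str

        class AttributionLibrary(Protocol):
            """Interface every discipline-specific library implements so that its
            output is compatible with the shared framework."""
            name: str
            def analyse(self, event_data: dict) -> List[Indicator]: ...

        def build_attribution_picture(libraries, event_data, expected_categories):
            """Merge indicators from all libraries and highlight intelligence gaps
            (expected categories for which no library produced an indicator)."""
            indicators = [ind for lib in libraries for ind in lib.analyse(event_data)]
            covered = {ind.category for ind in indicators}
            gaps = sorted(set(expected_categories) - covered)
            return {"indicators": indicators, "gaps": gaps}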

    Automated Digital Forensics and Computer Crime Profiling

    Over the past two decades, technology has developed tremendously, at an almost exponential rate. While this development has served the nation in numerous positive ways, negatives have also emerged, one of which is computer crime. This criminality has grown so fast as to leave current digital forensic tools lagging behind in terms of development and their capability to manage such increasing and sophisticated types of crime. In essence, the time taken to analyse a case is huge and increasing, and cases are not fully or properly investigated. This results in an ever-increasing number of pending and unsolved cases pertaining to computer crime. Digital forensics has become an essential tool in the fight against computer crime, providing both procedures and tools for the acquisition, examination and analysis of digital evidence. However, the use of technology is expanding at an ever-increasing rate, with the number of devices a single user might engage with increasing from a single device to three or more, the data capacity of those devices reaching far into the terabytes, and the nature of the underlying technology evolving (for example, the use of cloud services). This presents an incredible challenge for forensic examiners seeking to process and analyse cases in an efficient and effective manner. This thesis focuses upon the examination and analysis phases of the investigative process and considers whether automation of the process is possible. The investigation begins by researching the current state of the art and illustrates a wide range of challenges facing digital forensics investigators when analysing a case. Supported by a survey of forensic researchers and practitioners, key challenges were identified and prioritised. It was found that 95% of participants believed that the number of forensic investigations would increase in the coming years, with 75% of participants believing that the time consumed by such cases would increase. With regard to sophistication, 95% of the participants expected a rise in the complexity level and sophistication of digital forensics. To this end, an automated intelligent system that could be used to reduce the investigator’s time and cognitive load was found to be a promising solution. A series of experiments are devised around the use of Self-Organising Maps (SOMs), a technique well known for unsupervised clustering of objects. The analysis is performed on a range of file system and application-level objects (e.g. email, internet activity) across four forensic cases. Experimental evaluations revealed that SOMs are able to successfully cluster forensic artefacts apart from the remaining files. Having established that SOMs are capable of clustering wanted artefacts within a case, a novel algorithm, referred to as the Automated Evidence Profiler (AEP), is proposed to encapsulate the process and provide further refinement of the artefact identification process. The algorithm achieved identification rates of 100% in two of the examined cases and 94% in a third. A novel architecture is proposed to support the algorithm in an operational capacity, incorporating standard forensic techniques such as hashing of known files, file signature analysis and application-level analysis. This provides a mechanism that is capable of utilising the AEP alongside several other components that are able to filter, prioritise and visualise artefacts of interest to the investigator. The approach, known as the Automated Forensic Examiner (AFE), is capable of identifying potential evidence in a more efficient and effective manner. The approach was evaluated by a number of experts in the field, and it was unanimously agreed that the chosen research problem was one with great validity. Further to this, the experts all showed support for the Automated Forensic Examiner based on the results of the cases analysed.
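
    The architecture above mentions hashing of known files and file signature analysis as standard filtering techniques. The Python sketch below shows those two checks in their simplest generic form; the magic-byte table, the hash algorithm and the function names are illustrative assumptions and are not taken from the AEP/AFE implementation.

        import hashlib
        import pathlib

        # A few well-known file signatures (magic bytes); an illustrative subset only.
        MAGIC_BYTES = {
            b"\xff\xd8\xff": ".jpg",
            b"\x89PNG\r\n\x1a\n": ".png",
            b"%PDF": ".pdf",
            b"PK\x03\x04": ".zip",
        }

        def is_known_file(path, known_hashes):
            """True if the file's SHA-256 appears in a set of known-file hashes
            (e.g. operating system or application files), so it can be filtered out."""
            digest = hashlib.sha256(pathlib.Path(path).read_bytes()).hexdigest()
            return digest in known_hashes

        def signature_mismatch(path):
            """True if the file's magic bytes disagree with its extension, a common
            sign of deliberately renamed files that merit closer examination."""
            header = pathlib.Path(path).read_bytes()[:8]
            for magic, expected_ext in MAGIC_BYTES.items():
                if header.startswith(magic):
                    return pathlib.Path(path).suffix.lower() != expected_ext
            return False  # unknown signature: leave for deeper analysis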

    A study on plagiarism detection and plagiarism direction identification using natural language processing techniques

    Ever since we entered the digital communication era, the ease of information sharing through the internet has encouraged online literature searching. With this comes the potential risk of a rise in academic misconduct and intellectual property theft. As concerns over plagiarism grow, more attention has been directed towards automatic plagiarism detection. This is a computational approach which assists humans in judging whether pieces of text are plagiarised. However, most existing plagiarism detection approaches are limited to superficial, brute-force string-matching techniques. If the text has undergone substantial semantic and syntactic changes, string-matching approaches do not perform well. In order to identify such changes, linguistic techniques which are able to perform a deeper analysis of the text are needed. To date, very limited research has been conducted on the topic of utilising linguistic techniques in plagiarism detection. This thesis provides novel perspectives on the plagiarism detection and plagiarism direction identification tasks. The hypothesis is that original texts and rewritten texts exhibit significant but measurable differences, and that these differences can be captured through statistical and linguistic indicators. To investigate this hypothesis, four main research objectives are defined. First, a novel framework for plagiarism detection is proposed. It involves the use of Natural Language Processing techniques, rather than relying only on the traditional string-matching approaches. The objective is to investigate and evaluate the influence of text pre-processing, and of statistical, shallow and deep linguistic techniques, using a corpus-based approach. This is achieved by evaluating the techniques in two main experimental settings. Second, the role of machine learning in this novel framework is investigated. The objective is to determine whether the application of machine learning to the plagiarism detection task is helpful. This is achieved by comparing a threshold-setting approach against a supervised machine learning classifier. Third, the prospect of applying the proposed framework in a large-scale scenario is explored. The objective is to investigate the scalability of the proposed framework and algorithms. This is achieved by experimenting with a large-scale corpus in three stages. The first two stages are based on longer text lengths and the final stage is based on segments of texts. Finally, the plagiarism direction identification problem is explored as supervised machine learning classification and ranking tasks. Statistical and linguistic features are investigated individually or in various combinations. The objective is to introduce a new perspective on the traditional brute-force pair-wise comparison of texts. Instead of comparing original texts against rewritten texts, features are drawn from the traits of the texts to build a pattern for original and rewritten texts. Thus, the classification or ranking task is to fit a piece of text into a pattern. The framework is tested by empirical experiments, and the results from initial experiments show that deep linguistic analysis contributes to solving the problems we address in this thesis. Further experiments show that combining shallow and deep techniques helps improve the classification of plagiarised texts by reducing the number of false negatives. In addition, the experiment on plagiarism direction detection shows that rewritten texts can be identified by statistical and linguistic traits. The conclusions of this study offer ideas for further research directions and potential applications to tackle the challenges that lie ahead in detecting text reuse.
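
    As a point of reference for the string-matching baseline and the threshold-setting approach mentioned above, the Python sketch below computes a character n-gram containment score between a suspicious text and a source text and applies a simple cut-off. The n-gram length, the containment measure and the threshold value are illustrative assumptions for this sketch rather than the settings evaluated in the thesis.

        def char_ngrams(text, n=5):
            """Character n-grams: a shallow, string-matching representation of a text."""
            text = " ".join(text.lower().split())
            return {text[i:i + n] for i in range(len(text) - n + 1)}

        def containment(suspicious, source, n=5):
            """Proportion of the suspicious text's n-grams that also occur in the
            source text: a common baseline measure for detecting text reuse."""
            susp, src = char_ngrams(suspicious, n), char_ngrams(source, n)
            return len(susp & src) / len(susp) if susp else 0.0

        def threshold_detector(suspicious, source, threshold=0.6):
            """Simple threshold-setting decision: flag as reuse when containment
            exceeds the cut-off. Heavily rewritten text tends to fall below such a
            threshold, which is the limitation that deeper linguistic analysis and
            supervised classifiers over richer features aim to address."""
            return containment(suspicious, source, n=5) >= threshold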
