
    Data reduction and data mining framework for digital forensic evidence: storage, intelligence, review and archive

    With the volume of digital forensic evidence rapidly increasing and leading to large backlogs, this paper proposes a Digital Forensic Data Reduction and Data Mining Framework that reduces data volume by focusing on a subset of information. Initial research applied the proposed framework to sample data from the South Australia Police Electronic Crime Section and to Digital Corpora forensic images. It resulted in a significant reduction in storage requirements: the reduced subsets are only 0.196 percent and 0.75 percent, respectively, of the original data volume. The framework is not intended to replace full analysis. Rather, it provides a rapid triage, collection, intelligence analysis, review and storage methodology to support the various stages of digital forensic examinations. Agencies that can rapidly assess seized data can target specific criminal matters more effectively. The framework may also offer greater potential intelligence gain from timely analysis of current and historical data, as well as the ability to research trends over time.
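
    The core idea of reducing the volume to review by focusing on a subset of information can be illustrated with a minimal Python sketch. The selection rule (a fixed set of file extensions in the hypothetical `KEEP_EXTENSIONS`) is an assumption for illustration only and is not the paper's criterion. `FileEntry` and `reduce_subset` are likewise hypothetical names. The sketch only shows how a retained subset and its share of the original volume might be computed.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath

# Hypothetical selection rule for illustration only: keep file types assumed
# to be of high investigative value. The paper's actual criteria may differ.
KEEP_EXTENSIONS = {".doc", ".docx", ".pdf", ".xls", ".xlsx", ".jpg", ".eml"}


@dataclass(frozen=True)
class FileEntry:
    path: str
    size_bytes: int


def reduce_subset(entries, keep=KEEP_EXTENSIONS):
    """Select the files to retain and report their share of the total volume (percent)."""
    subset = [e for e in entries if PurePosixPath(e.path).suffix.lower() in keep]
    total = sum(e.size_bytes for e in entries)
    kept = sum(e.size_bytes for e in subset)
    percent = 100.0 * kept / total if total else 0.0
    return subset, percent


if __name__ == "__main__":
    # Toy, made-up file listing; not data from the paper.
    evidence = [
        FileEntry("Users/alice/report.docx", 50_000),
        FileEntry("Windows/System32/large.dll", 9_950_000),
    ]
    subset, percent = reduce_subset(evidence)
    print([e.path for e in subset], f"{percent:.3f}%")  # ['Users/alice/report.docx'] 0.500%
```

    A real triage tool would likely combine several criteria, such as file type, location and timestamps, rather than extensions alone.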

    Designing a Data Warehouse for Cyber Crimes

    One of the greatest challenges facing modern society is the rising tide of cyber crimes. Because these crimes rarely fit the model of conventional crimes, they are difficult to investigate, analyze and prosecute. Collecting data in a unified framework is an essential step that helps the investigator sort through mountains of data. In this paper, we explore the design of a dimensional model for a data warehouse that can be used to analyze cyber crime data. We also present example queries and the types of cyber crime analysis that the data warehouse supports, and we discuss several ways of using it with OLAP and data mining technologies. Finally, we discuss legal issues and data population issues for the data warehouse.
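
    A dimensional model of this kind can be sketched as a small star schema using Python's standard `sqlite3` module. This is not the paper's design: the table and column names (`fact_incident`, `dim_crime_type`, `dim_date`, `loss_amount`) and the toy rows are hypothetical. The query shows the kind of OLAP-style roll-up, aggregating facts along two dimensions, that such a warehouse supports.

```python
import sqlite3


def build_demo_warehouse():
    """Create a tiny in-memory star schema: one fact table and two dimension tables."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE dim_crime_type (type_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
        CREATE TABLE fact_incident (
            incident_id INTEGER PRIMARY KEY,
            type_id INTEGER REFERENCES dim_crime_type(type_id),
            date_id INTEGER REFERENCES dim_date(date_id),
            loss_amount REAL
        );
        -- Toy rows for illustration only.
        INSERT INTO dim_crime_type VALUES (1, 'phishing'), (2, 'ransomware');
        INSERT INTO dim_date VALUES (1, 2023, 1), (2, 2023, 2), (3, 2024, 1);
        INSERT INTO fact_incident VALUES
            (1, 1, 1, 100.0), (2, 1, 2, 250.0), (3, 2, 2, 5000.0), (4, 2, 3, 1200.0);
    """)
    return con


def losses_by_type_and_year(con):
    """Roll up the fact table by crime type and year (count and total loss)."""
    return con.execute("""
        SELECT t.name, d.year, COUNT(*) AS incidents, SUM(f.loss_amount) AS total_loss
        FROM fact_incident f
        JOIN dim_crime_type t ON f.type_id = t.type_id
        JOIN dim_date d ON f.date_id = d.date_id
        GROUP BY t.name, d.year
        ORDER BY t.name, d.year
    """).fetchall()


if __name__ == "__main__":
    for row in losses_by_type_and_year(build_demo_warehouse()):
        print(row)
```

    Keeping descriptive attributes in dimension tables and measures in the fact table is what makes such roll-ups (and drill-downs, by grouping on `month` as well) straightforward to express.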

    Reverse Image Search Using Deep Unsupervised Generative Learning and Deep Convolutional Neural Network

    Reverse image search is a vital and emerging research area in information retrieval. One of the primary research goals of information retrieval is to improve space and computational efficiency by converting a large image database into an efficiently computed feature database. This paper proposes a novel deep learning-based methodology that captures channel-wise, low-level details of each image. In the first phase, a sparse auto-encoder (SAE), a deep generative model, is applied to the RGB channels of each image for unsupervised representation learning. In the second phase, transfer learning is applied using VGG-16, a variant of the deep convolutional neural network (CNN). The output of the SAE, combined with the original RGB channels, is forwarded to VGG-16, producing a more effective feature database through the ensemble of the two models. The proposed method provides an information-rich feature space that is a reduced-dimensionality representation of the image database. Experiments are performed on a hybrid dataset built by combining three standard, publicly available datasets. Using a cosine similarity measure between the query image and the image database, and without using image metadata, the proposed approach achieves a retrieval accuracy (precision) of 98.46%. To further validate the methodology's effectiveness, image quality was degraded by adding 5% noise (speckle, Gaussian and salt-and-pepper) to the hybrid dataset. Retrieval accuracy was generally found to be 97% for the different noise variants.
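
    The retrieval step, ranking database images by cosine similarity to a query, can be sketched with NumPy. The sketch assumes feature vectors have already been extracted (the SAE and VGG-16 pipeline is not reproduced). `cosine_rank` and `precision_at_k` are hypothetical helper names. Precision is computed here as top-k label agreement, which is an assumption: the abstract does not specify exactly how it was measured.

```python
import numpy as np


def cosine_rank(query, database):
    """Rank database rows by cosine similarity to the query vector, highest first."""
    q = query / np.linalg.norm(query)
    db = database / np.linalg.norm(database, axis=1, keepdims=True)
    scores = db @ q                 # cosine similarity of every row with the query
    order = np.argsort(-scores)     # indices sorted by descending similarity
    return order, scores[order]


def precision_at_k(retrieved_labels, query_label, k):
    """Fraction of the top-k retrieved items that share the query's class label."""
    top = np.asarray(retrieved_labels[:k])
    return float(np.mean(top == query_label))


if __name__ == "__main__":
    # Toy 2-D "features"; real features would be high-dimensional CNN outputs.
    database = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]])
    labels = np.array(["cat", "dog", "cat"])
    order, scores = cosine_rank(np.array([1.0, 0.0]), database)
    print(order, precision_at_k(labels[order], "cat", 2))
```

    Normalizing the vectors once and using a matrix product keeps the search to a single pass over the feature database. This is one reason compact feature representations improve efficiency.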

    Fire Pattern Analysis, Junk Science, Old Wives Tales, and Ipse Dixit: Emerging Forensic 3D Imaging Technologies to the Rescue?

    Forensic science is undergoing a period of transformation as legal and scientific forces converge and push older forensic sciences toward a new scientific paradigm. Fire investigation is no exception to this trend. One catalyst of this revolution is skeptical defense attorneys who routinely formulate astute Daubert challenges to contest the scientific validity and reliability of every major forensic science discipline. Furthermore, a steady influx of novel scientific advances makes it possible to formulate consistent, scientifically based, quantitative analyses of forensic evidence, overcoming the “undervalidated and oversold” problems affecting many areas of forensic science.

    Application of modern statistical methods in worldwide health insurance

    With the increasing availability of internal and external data in the (health) insurance industry, demand for new insights from analytical methods is growing. This dissertation presents four applications of advanced regression-based prediction techniques to claims and network management in health insurance:
    - patient segmentation for disease management programs;
    - economic evaluation of disease management programs;
    - fraud and abuse detection;
    - medical quality assessment.
    Using different health insurance datasets, it shows that tailored models and newly developed algorithms, such as Bayesian latent variable models, can optimize the business steering of health insurance companies. By incorporating and structuring medical and insurance knowledge, these tailored regression approaches can at least compete with machine learning and artificial intelligence methods while being more transparent and interpretable for business users. In all four examples, the methodology and outcomes of the applied approaches are discussed extensively from an academic perspective. Various comparisons with analytical and market best-practice methods also allow the added value of the applied approaches to be judged from an economic perspective.

    Data visualisation in digital forensics

    As digital crime has risen, so has the need for digital forensics. Numerous state-of-the-art tools have been developed to assist digital investigators in conducting proper investigations into digital crimes. However, digital investigations are becoming increasingly complex and time-consuming because of the amount of data involved, and investigators can find themselves unable to conduct them efficiently and effectively. This has prompted the need for new tools capable of handling such large, complex investigations. Data mining is one such potential tool. Although it is still relatively unexplored from a digital forensics perspective, its purpose is to discover new knowledge from data whose dimensionality, complexity or volume is prohibitively large for manual analysis. This study assesses the self-organising map (SOM), a neural network model and data mining technique that could offer substantial benefits to digital forensics. The focus of the study is to demonstrate how the SOM can help digital investigators make better decisions and conduct the forensic analysis process more efficiently and effectively during a digital investigation. The SOM’s visualisation capabilities can be used not only to reveal interesting patterns but also as a platform for further interactive analysis. Dissertation (MSc (Computer Science)), University of Pretoria, 2007.
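
    The standard SOM training rule can be sketched in NumPy: find the best-matching unit (BMU), then pull it and its grid neighbours toward the input with a decaying learning rate and a shrinking Gaussian neighbourhood. This is a generic textbook SOM, not the dissertation's implementation. The grid size, epochs, learning rate and radius are arbitrary assumptions, and forensic records are assumed to have been encoded as numeric feature vectors beforehand. `train_som` and `best_matching_unit` are hypothetical names.

```python
import numpy as np


def best_matching_unit(weights, x):
    """Grid position (row, col) of the node whose weight vector is closest to x."""
    dists = np.linalg.norm(weights - x, axis=-1)
    return np.unravel_index(np.argmin(dists), dists.shape)


def train_som(data, rows=5, cols=5, epochs=20, lr0=0.5, sigma0=2.0, seed=0):
    """Train a rectangular SOM; returns weights of shape (rows, cols, n_features)."""
    rng = np.random.default_rng(seed)
    weights = rng.random((rows, cols, data.shape[1]))
    # Grid coordinates of every node, used by the neighbourhood function.
    grid = np.stack(np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1)
    n_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in rng.permutation(data):              # visit samples in random order
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)                  # linearly decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 1e-3     # shrinking neighbourhood radius
            bmu = np.array(best_matching_unit(weights, x))
            d2 = np.sum((grid - bmu) ** 2, axis=-1)  # squared grid distance to the BMU
            h = np.exp(-d2 / (2.0 * sigma ** 2))     # Gaussian neighbourhood weights
            weights += lr * h[..., None] * (x - weights)
            step += 1
    return weights


if __name__ == "__main__":
    # Two toy clusters; after training they should map to different grid nodes.
    data = np.vstack([np.zeros((10, 3)), np.ones((10, 3))])
    w = train_som(data)
    print(best_matching_unit(w, np.zeros(3)), best_matching_unit(w, np.ones(3)))
```

    Because neighbouring nodes are updated together, similar inputs end up on nearby nodes. That topology preservation is what makes the trained map useful as a visual overview of the data.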

    Optimum parameter machine learning classification and prediction of Internet of Things (IoT) malwares using static malware analysis techniques

    Applying machine learning to malware analysis is not a new concept; much research has been done on classifying malware in Android and Windows environments. Malware analysis for the Internet of Things (IoT), however, still requires work. IoT was not designed with security and privacy in mind, so this area is full of research challenges. This study evaluates important machine learning classifiers, such as Support Vector Machines, Neural Networks, Random Forests, Decision Trees, Naive Bayes and Bayesian Networks. It proposes a framework that uses static feature extraction and selection, and it highlights issues such as overfitting and classifier generalization, in order to obtain an optimized algorithm with better performance. As background, we used a systematic literature review to identify research gaps in IoT. We then presented malware as a major challenge for IoT, along with the reasons for targeting malware analysis at IoT devices, and finally performed classification on a malware dataset. The classification process was applied to three different datasets, using file header, program header and section header fields as features. Preliminary results show accuracy of over 90% on the file header, program header and section header datasets. This document presents these as initial results only; some remaining issues, which may affect the performance measures, still need to be addressed.
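
    Static "file header" features of the kind described can be sketched with Python's standard `struct` module. The abstract does not name the binary format. The mention of program and section headers suggests ELF, the usual executable format on Linux-based IoT devices, so ELF is an assumption here. The sketch parses only the ELF file header into a flat dict of numeric fields named after the ELF specification. It does not reproduce the paper's feature selection or classifiers. `elf_header_features` is a hypothetical name.

```python
import struct

# ELF file-header fields that follow the 16-byte e_ident array, in order.
ELF_FIELDS = ("e_type", "e_machine", "e_version", "e_entry", "e_phoff", "e_shoff",
              "e_flags", "e_ehsize", "e_phentsize", "e_phnum",
              "e_shentsize", "e_shnum", "e_shstrndx")


def elf_header_features(blob: bytes) -> dict:
    """Parse an ELF file header into a dict of numeric features for a classifier."""
    if len(blob) < 16 or blob[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    ei_class, ei_data = blob[4], blob[5]            # 1/2 = 32/64-bit; 1/2 = little/big endian
    if ei_class not in (1, 2) or ei_data not in (1, 2):
        raise ValueError("unsupported ELF class or data encoding")
    endian = "<" if ei_data == 1 else ">"
    # 32-bit ELF uses 4-byte addresses/offsets; 64-bit ELF uses 8-byte ones.
    fmt = endian + ("HHIIIIIHHHHHH" if ei_class == 1 else "HHIQQQIHHHHHH")
    if len(blob) < 16 + struct.calcsize(fmt):
        raise ValueError("truncated ELF header")
    features = dict(zip(ELF_FIELDS, struct.unpack_from(fmt, blob, 16)))
    features.update(ei_class=ei_class, ei_data=ei_data, ei_osabi=blob[7])
    return features


if __name__ == "__main__":
    # Synthetic 32-bit little-endian ARM header (e_machine 40), built for illustration.
    ident = b"\x7fELF" + bytes([1, 1, 1, 0]) + bytes(8)
    header = ident + struct.pack("<HHIIIIIHHHHHH", 2, 40, 1, 0x8000, 52, 0,
                                 0x5000000, 52, 32, 1, 40, 0, 0)
    print(elf_header_features(header))
```

    One such dict per sample could then be turned into a feature matrix for any of the classifiers listed above. Program-header and section-header features would be extracted similarly by following `e_phoff` and `e_shoff`.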