
    A Framework for Identifying Host-based Artifacts in Dark Web Investigations

    The dark web is the hidden part of the internet that is not indexed by search engines and is accessible only with a specific browser such as The Onion Router (Tor). Tor was originally developed as a means of secure communication and is still used worldwide by individuals seeking privacy or wanting to circumvent restrictive regimes. The dark web has become synonymous with nefarious and illicit content, which manifests itself in underground marketplaces containing illegal goods such as drugs, stolen credit cards, stolen user credentials, child pornography, and more (Kohen, 2017). Dark web marketplaces contribute to both illegal drug use and the distribution of child pornography. Given the fundamental goal of privacy and anonymity, there are limited techniques for finding forensic artifacts and evidence files when investigating misuse and criminal activity on the dark web. Previous studies of digital forensics frameworks reveal a common theme of collection, examination, analysis, and reporting. The existence and frequency of proposed frameworks demonstrate the acceptance and utility of these frameworks in the field of digital forensics. Previous studies of dark web forensics have focused on network forensics rather than host-based forensics. macOS is the second most popular operating system after Windows (Net Marketshare, n.d.); however, previous research has focused on the Windows operating system, with little attention given to macOS forensics. This research uses design science methodology to develop a framework for identifying host-based artifacts during a digital forensic investigation involving suspected dark web use. Both the Windows operating system and macOS are included, with the expected result being a reusable, comprehensive framework that is easy to follow and assists investigators in finding artifacts that are designed to be hidden or otherwise hard to find.
The contribution of this framework will assist investigators in identifying evidence in cases where the user is suspected of accessing the dark web with criminal intent and little or no other evidence of a crime is present. The artifact produced for this research, the Dark Web Artifact Framework, was evaluated using three different methods to ensure that it met the stated goals of being easy to follow, considering both the Windows and macOS operating systems, considering multiple ways of accessing the dark web, and being adaptable to future platforms. The methods of evaluation included experimental evaluation conducted using a simulation of the framework, comparison of a previously worked dark web case against the created framework, and the expert opinion of members of the South Dakota Internet Crimes Against Children (ICAC) task force and the Division of Criminal Investigation (DCI). A digital component can be found in nearly every crime committed today. The Dark Web Artifact Framework is a reusable, paperless, comprehensive framework that provides investigators with a map to follow to locate the artifacts necessary to determine whether the system being investigated has been used to access the dark web for the purpose of committing a crime. In the creation of this framework, a process was itself created that will contribute to future work. The yes/no, if/then structure of the framework is adaptable to fit workflows in any area that would benefit from a recurring process.
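A yes/no, if/then framework of this kind can be sketched as a small decision tree that an investigator walks by answering questions. The questions and artifact names below are hypothetical illustrations, not the actual contents of the Dark Web Artifact Framework:

```python
# Minimal sketch of a yes/no, if/then investigative decision tree.
# The questions and artifact names are hypothetical examples only.

class Node:
    def __init__(self, question=None, yes=None, no=None, artifacts=None):
        self.question = question  # None marks a leaf
        self.yes = yes
        self.no = no
        self.artifacts = artifacts or []

def walk(node, answers):
    """Follow yes/no answers down the tree; return the artifacts to collect."""
    while node.question is not None:
        node = node.yes if answers[node.question] else node.no
    return node.artifacts

# Hypothetical fragment of such a tree:
tree = Node(
    question="Is the system running Windows?",
    yes=Node(
        question="Is the Tor Browser installed?",
        yes=Node(artifacts=["prefetch entries", "Tor Browser profile folder"]),
        no=Node(artifacts=["registry run keys", "browser download history"]),
    ),
    no=Node(  # assume the alternative is macOS
        question="Is the Tor Browser installed?",
        yes=Node(artifacts=["quarantine events", "application bundle"]),
        no=Node(artifacts=["unified logs", "browser download history"]),
    ),
)

answers = {"Is the system running Windows?": True,
           "Is the Tor Browser installed?": True}
print(walk(tree, answers))  # → ['prefetch entries', 'Tor Browser profile folder']
```

Because each node is a plain data object, the same walk function serves any recurring yes/no workflow, which is the adaptability the abstract describes.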

    OCR Report

    Optical Character Recognition (OCR) is the most commonly known method of text extraction from digitised documents used in the cultural heritage sector. It is a process that transforms images of text into a machine-readable format. Traditionally, OCR digitally scans text and identifies letters individually, recognising one character at a time. Over time, advancements have introduced aspects of machine learning into OCR, changing this dynamic slightly; these are explored in more detail later in this report. This report explores OCR software options broadly, in addition to past, current and proposed future OCR processes and workflows that the University of Edinburgh library may introduce.

    Multimedia Retrieval


    Multiple Media Correlation: Theory and Applications

    This thesis introduces multiple media correlation, a new technology for the automatic alignment of multiple media objects such as text, audio, and video. This research began with the question: what can be learned when multiple multimedia components are analyzed simultaneously? Most ongoing research in computational multimedia has focused on queries, indexing, and retrieval within a single media type. Video is compressed and searched independently of audio; text is indexed without regard to the temporal relationships it may have to other media data. Multiple media correlation provides a framework for locating and exploiting correlations between multiple, potentially heterogeneous, media streams. The goal is computed synchronization: the determination of temporal and spatial alignments that optimize a correlation function and indicate commonality and synchronization between media objects. The model also provides a basis for comparison of media in unrelated domains. There are many real-world applications for this technology, including speaker localization, musical score alignment, and degraded media realignment. Two applications, text-to-speech alignment and parallel text alignment, are described in detail with experimental validation. Text-to-speech alignment computes the alignment between a textual transcript and speech-based audio. The presented solutions are effective for a wide variety of content and are useful not only for retrieval of content but also in support of automatic captioning of movies and video. Parallel text alignment provides a tool for the comparison of alternative translations of the same document that is particularly useful to the classics scholar interested in comparing translation techniques or styles.
The results presented in this thesis include (a) new media models more useful in analysis applications, (b) a theoretical model for multiple media correlation, (c) two practical application solutions with widespread applicability, and (d) Xtrieve, a multimedia database retrieval system that demonstrates this new technology and its application to information retrieval. This thesis demonstrates that computed alignment of media objects is practical and can provide immediate solutions to many information retrieval and content presentation problems. It also introduces a new area for research in media data analysis.
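The core idea of computed synchronization, finding a temporal alignment that optimizes a cost or correlation function, has a familiar small-scale analogue in dynamic time warping. The sketch below is only an illustrative instance of alignment by cost optimization, not the thesis's actual correlation model:

```python
# Dynamic time warping: a simple instance of computing a temporal
# alignment between two streams by optimizing a cost function.
# Illustrative analogue only, not the thesis's model.

def dtw(a, b):
    """Return the minimal alignment cost between sequences a and b."""
    INF = float("inf")
    n, m = len(a), len(b)
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])              # local dissimilarity
            cost[i][j] = d + min(cost[i - 1][j],      # step in a only
                                 cost[i][j - 1],      # step in b only
                                 cost[i - 1][j - 1])  # step in both
    return cost[n][m]

# A time-stretched copy of a signal aligns perfectly:
print(dtw([1, 2, 3, 4], [1, 1, 2, 2, 3, 3, 4, 4]))  # → 0.0
```

The same dynamic-programming recurrence extends to transcript-to-audio alignment once the local dissimilarity is replaced by an acoustic or textual distance.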

    Phraseology in Corpus-Based Translation Studies: A Stylistic Study of Two Contemporary Chinese Translations of Cervantes's Don Quijote

    The present work sets out to investigate the stylistic profiles of two modern Chinese versions of Cervantes’s Don Quijote (I): by Yang Jiang (1978), the first direct translation from Castilian to Chinese, and by Liu Jingsheng (1995), which is one of the most commercially successful versions of the Castilian literary classic. This thesis focuses on a detailed linguistic analysis carried out with the help of the latest textual analytical tools, natural language processing applications and statistical packages. The type of linguistic phenomenon singled out for study is four-character expressions (FCEXs), which are a very typical category of Chinese phraseology. The work opens with the creation of a descriptive framework for the annotation of linguistic data extracted from the parallel corpus of Don Quijote. Subsequently, the classified and extracted data are put through several statistical tests. The results of these tests prove to be very revealing regarding the different use of FCEXs in the two Chinese translations. The computational modelling of the linguistic data would seem to indicate that among other findings, while Liu’s use of archaic idioms has followed the general patterns of the original and also of Yang’s work in the first half of Don Quijote I, noticeable variations begin to emerge in the second half of Liu’s more recent version. Such an idiosyncratic use of archaisms by Liu, which may be defined as style shifting or style variation, is then analyzed in quantitative terms through the application of the proposed context-motivated theory (CMT). The results of applying the CMT-derived statistical models show that the detected stylistic variation may well point to the internal consistency of the translator in rendering the second half of Part I of the novel, which reflects his freer, more creative and experimental style of translation. 
Through the introduction and testing of quantitative research methods adapted from corpus linguistics and textual statistics, this thesis has made a major contribution to methodological innovation in the study of style within the context of corpus-based translation studies.
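Extraction of four-character expressions (FCEXs) from a parallel corpus can be sketched as a sliding-window lookup against a phraseological lexicon. The lexicon entries and sample sentence below are illustrative; the thesis used a far richer annotation scheme and statistical apparatus:

```python
# Sketch of counting four-character expressions (FCEXs) in Chinese text
# against a small lexicon. Lexicon and sentence are illustrative only.
import re

FCEX_LEXICON = {"一心一意", "四面八方", "不可思议"}  # hypothetical entries

def count_fcexs(text, lexicon=FCEX_LEXICON):
    """Slide a 4-character window over CJK runs and tally lexicon hits."""
    counts = {}
    for run in re.findall(r"[\u4e00-\u9fff]+", text):
        for i in range(len(run) - 3):
            chunk = run[i:i + 4]
            if chunk in lexicon:
                counts[chunk] = counts.get(chunk, 0) + 1
    return counts

sample = "他一心一意地工作，消息传遍四面八方。"
print(count_fcexs(sample))  # → {'一心一意': 1, '四面八方': 1}
```

Per-translation counts of this kind are the raw material for the frequency comparisons and statistical tests the abstract describes.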

    Comparison of Neurological Activation Patterns of Children with and without Autism Spectrum Disorders When Verbally Responding to a Pragmatic Task

    This study examined the neurological activation of children with autism spectrum disorders (ASD) while performing a pragmatic judgment task. In this study, children between the ages of 9 and 15 years responded to questions regarding a social situation, taken from the Comprehensive Assessment of Spoken Language, while concurrently having their brain activity measured. We targeted four brain regions for analysis: dorsolateral prefrontal cortex (DLPFC), orbitofrontal cortex (OFC), superior temporal gyrus (STG), and the inferior parietal lobule (IPL). Ten children with ASD and 20 typically developing (TD) children participated. Matching occurred in a bracketing manner, with each child in the ASD group being matched to two control children to account for natural variability. Neuroimaging was conducted using functional near-infrared spectroscopy (fNIRS). Oxygenated and deoxygenated blood concentration levels were measured through a near-infrared light cap with 44 channels. The cap was placed over the frontal lobe and the left lateral cortex, and its placement was spatially registered using the Polhemus system. Analysis indicated that children in the ASD group performed significantly more poorly than their controls on the pragmatic judgment task. A mixed repeated-measures analysis of variance of the neurological data indicated that the children with ASD had lower concentration levels of oxygenated and total hemoglobin across the four regions. There were significantly higher concentration levels for oxygenated and total hemoglobin in the STG. Analysis of correct and incorrect responses revealed significantly more activation in the OFC when responses were correct. Additionally, there was a significant interaction of Accuracy and Group in the left DLPFC: children with ASD presented higher oxygenated hemoglobin concentration values when responding correctly, while children in the control group presented higher oxygenated hemoglobin concentration values for the incorrect items.
Statistical Parametric Mapping was performed for each triad to assess the diffusion of neural activation across the frontal cortex and the left lateral cortex. Individual comparisons revealed that 7 out of 10 children with ASD demonstrated patterns consistent with more diffuse brain activation than their TD controls. Findings from this study suggest that an fNIRS study can provide important information about the level and diffusion of neural processing of verbal children and adolescents with ASD.

    WiFi-Based Human Activity Recognition Using Attention-Based BiLSTM

    Recently, significant efforts have been made to explore human activity recognition (HAR) techniques that use information gathered by existing indoor wireless infrastructures through WiFi signals, without requiring the monitored subject to carry a dedicated device. The key intuition is that different activities introduce different multi-paths in WiFi signals and generate different patterns in the time series of channel state information (CSI). In this paper, we propose and evaluate a full pipeline for a CSI-based human activity recognition framework covering 12 activities in three different spatial environments, using two deep learning models: ABiLSTM and CNN-ABiLSTM. Evaluation experiments have demonstrated that the proposed models outperform state-of-the-art models. The experiments also show that the proposed models can be applied to other environments with different configurations, albeit with some caveats. The proposed ABiLSTM model achieves overall accuracies of 94.03%, 91.96%, and 92.59% across the three target environments, while the proposed CNN-ABiLSTM model reaches accuracies of 98.54%, 94.25%, and 95.09% across those same environments.
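The attention component of such a model pools the BiLSTM's per-time-step hidden states into a single vector for classification, weighting the time steps that matter most for an activity. The sketch below shows only that pooling step in NumPy; the BiLSTM itself is omitted, the random `hidden` array stands in for its outputs, and all shapes are illustrative assumptions rather than the paper's configuration:

```python
# Sketch of attention pooling over BiLSTM hidden states.
# `hidden` stands in for BiLSTM outputs; shapes are illustrative.
import numpy as np

def attention_pool(hidden, w):
    """hidden: (T, d) per-step states; w: (d,) learned scoring vector."""
    scores = hidden @ w                             # (T,) unnormalized scores
    scores = scores - scores.max()                  # stabilize the softmax
    alpha = np.exp(scores) / np.exp(scores).sum()   # (T,) attention weights
    context = alpha @ hidden                        # (d,) weighted context vector
    return alpha, context

rng = np.random.default_rng(0)
hidden = rng.normal(size=(50, 128))  # e.g. 50 CSI time steps, 128-dim states
w = rng.normal(size=128)
alpha, context = attention_pool(hidden, w)
print(alpha.shape, context.shape)
```

The context vector would then feed a dense softmax layer over the 12 activity classes; in the CNN variant, convolutional features would precede the recurrent layer.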

    Forensic computing strategies for ethical academic writing.

    Thesis (M.Com.)-University of KwaZulu-Natal, Westville, 2009. This study resulted in the creation of a conceptual framework for ethical academic writing that can be applied to cases of authorship identification. The framework is the culmination of research into various other forensic frameworks and aspects related to cyber forensics, in order to ensure maximum effectiveness of this newly developed methodology. The research shows how synergies between forensic linguistics and electronic forensics (computer forensics) create the conceptual space for a new, interdisciplinary cyber forensic linguistics, along with forensic auditing procedures and tools for authorship identification. The research also shows that an individual’s unique word-pattern usage can be used to determine document authorship, and that in other instances authorship can be attributed with a significant degree of probability using the identified process. The importance of this fact cannot be overstated, because accusations of plagiarism have to be based on facts that will withstand cross-examination in a court of law. Therefore, forensic auditing procedures are required when attributing authorship in cases of suspected plagiarism, which is regarded as one of the most serious problems facing any academic institution. This study identifies and characterises various forms of plagiarism, as well as the responses that can be implemented to prevent and deter it. A number of online and offline tools for the detection and prevention of plagiarism are identified, over and above the more commonly used popular tools that, in the author’s view, are overrated because they are based on mechanistic identification of word similarities in source and target texts rather than on proper grammatical and semantic principles. Linguistic analysis is a field not well understood and often underestimated, yet it is a critical field of inquiry in determining specific cases of authorship.
The research identifies the various methods of linguistic analysis that could be applied to help establish authorship identity, as well as how they can be applied within a forensic environment. Various software tools that could be used to identify and analyse plagiarised source documents are identified and briefly characterised. Concordance, function word analysis and other methods of corpus analysis are explained, along with some of their related software packages. Given the availability of computerised analysis tools, corpus analysis that in the past would have taken months to perform manually can now take a matter of hours using the correct programs. This research integrates the strengths of these tools within a structurally sound forensic auditing framework, the result of which is a conceptual framework that encompasses all the pertinent factors and ensures admissibility in a court of law by adhering to strict rules and features that are characteristic of the legal requirements for a forensic investigation.
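Function word analysis of the kind mentioned above is often implemented by building a frequency profile over a fixed list of function words and comparing profiles between documents. The word list and documents below are tiny illustrative assumptions; real analyses use hundreds of function words and proper statistical testing rather than a single similarity score:

```python
# Sketch of function-word frequency profiling for authorship comparison.
# The word list and documents are tiny illustrative examples only.
import math
from collections import Counter

FUNCTION_WORDS = ["the", "of", "and", "to", "in", "that", "it", "is"]

def profile(text):
    """Relative frequency of each function word per 1,000 tokens."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    n = max(len(tokens), 1)
    return [1000 * counts[w] / n for w in FUNCTION_WORDS]

def cosine(u, v):
    """Cosine similarity between two frequency profiles."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

doc_a = "the cat sat in the hall and it is the case that ..."
doc_b = "the dog ran in the yard and it is the fact that ..."
print(round(cosine(profile(doc_a), profile(doc_b)), 3))
```

Because function words are largely topic-independent, similar profiles across documents with different content words are what makes them useful stylometric markers.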