
    Forensics Writer Identification using Text Mining and Machine Learning

    Constant technological growth has increased the danger and severity of cyber-attacks, which have recently become unmistakable in institutions with complex Information Technology (IT) infrastructure. Over the last three years, some of the worst instances of cybercrime were observed globally, including massive data breaches, fake-news spreading, cyberbullying, crypto-jacking, and attacks on cloud computing services. Various agencies have devised techniques to curb this vice and bring perpetrators, both real and perceived, to book for such serious cybersecurity issues. Forensic Writer Identification, an application of stylometry, was introduced as one of the most effective remedies. It is a forensic science technology that uses Artificial Intelligence (AI) to safeguard, identify, extract, and document computer or digital evidence that can be used in a court of law, particularly by investigative officers in criminal cases, or simply for data analytics. This research's fundamental objective was to examine Forensic Writer Identification in Twitter authorship analysis of various users globally, to apply it to reduce the time needed to find criminals by providing the police with an accurate methodology, and to compare the accuracy of different techniques. The report follows a logical literature review of the vital text-analysis techniques, and the research applied an agile text-mining methodology to extract and analyze texts from various Twitter users. Digital searches for appropriate academic and scholarly artifacts were carried out in various online and offline databases to support this research.
Forensic Writer Identification for text extraction and analytics has recently enjoyed renewed attention, with extremely encouraging outcomes. This research presents an overall foundation and rationale for text and author identification techniques. The scope of current techniques and applications is given, additionally addressing the issue of performance assessment. Results of various strategies are summarized, and a more in-depth illustration of two combined methodologies is presented. By combining textural, algorithmic, and allographic features, emerging technologies are beginning to show useful performance levels. Nevertheless, user acceptance will play a vital role in the future of the technology. To this end, the goal of the project proposal was to build an analytical system that automates authorship identification across various Web 2.0 technologies globally, hence addressing contemporary cybercrime issues.
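The abstract above does not specify the exact features used for Twitter authorship analysis; a minimal sketch of one common stylometric approach, character n-gram frequency profiles compared with cosine similarity, is shown below. The author names and sample texts are hypothetical illustrations, not data from the study.

```python
from collections import Counter
import math

def char_ngrams(text, n=3):
    """Frequency profile of character n-grams (a common stylometric feature)."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def cosine(p, q):
    """Cosine similarity between two n-gram frequency profiles."""
    shared = set(p) & set(q)
    dot = sum(p[g] * q[g] for g in shared)
    norm = (math.sqrt(sum(v * v for v in p.values()))
            * math.sqrt(sum(v * v for v in q.values())))
    return dot / norm if norm else 0.0

def attribute(unknown, author_samples):
    """Attribute an unknown text to the author with the most similar profile."""
    profile = char_ngrams(unknown)
    return max(author_samples,
               key=lambda a: cosine(profile, char_ngrams(author_samples[a])))
```

In practice, each author's profile would be built from a large corpus of their tweets rather than a single sample, and the similarity threshold for a positive identification would need calibration.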

    The Role of Social Media Forensics in Digital Forensics

    Social media forensics collects evidence from social media sites such as Facebook, WhatsApp, TikTok, and Snapchat to identify criminals. This paper discusses social media crimes such as hacking, photo morphing, shopping scams, cyberbullying, and link baiting. It describes social media forensics techniques, including evidence collection, storage, analysis, and preservation, and outlines the process of a forensic examination in social media forensics. The paper examines social media forensics tools such as WebPreserver, Website Hub, Pipl Search, TinEye, and TweetBeaver and discusses the applications of each tool. It concludes by discussing the future of social media forensics.

    Who’s Blogging Now? Linguistic Features and Authorship Analysis in Sports Blogs

    The field of authorship determination, previously falling largely under the umbrella of literary analysis but recently becoming a large subfield of forensic linguistics, has grown substantially over the last two decades. As its body of research and its record of successful forensic application continue to grow, so does the demand for its application. However, methods that have undergone rigorous testing to show their reliability and replicability, allowing them to meet the strict Daubert criteria put forth by the US court system, have not truly been established. In this study, I investigate how a list of parameters, many commonly used in the methodologies of previous researchers, performs when used to test documents by bloggers from a sports blog, Winging It in Motown. Three prolific bloggers were chosen from the site, and a corpus of posts was created for each blogger, which was then examined for each of the chosen parameters. One test document for each of the three bloggers, not included in that blogger's corpus, was then chosen from the blog and examined for each parameter via the same methodologies as were used for the corpora. Once data for the corpora and all three test documents had been obtained, the results were compared for similarity, and an author determination was made for each test document along each parameter. The findings indicated that overall the parameters were quite unsuccessful in determining authorship for these test documents based on the author corpora developed for the study. Only two parameters identified the authors of the test documents at a rate higher than chance, and other factors may be driving these successful identifications, demanding further research to confirm their validity as parameters for authorship work.
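The abstract does not enumerate the parameters tested; as a hedged illustration, two parameters frequently used in authorship studies of this kind, type-token ratio and mean word length, can be computed per corpus and compared against a test document. The corpus names and the nearest-value decision rule are assumptions for the sketch, not the thesis's method.

```python
def ttr(text):
    """Type-token ratio: distinct words divided by total words."""
    words = text.lower().split()
    return len(set(words)) / len(words)

def avg_word_length(text):
    """Mean word length in characters."""
    words = text.split()
    return sum(len(w) for w in words) / len(words)

def closest_author(test_text, corpora, feature):
    """Assign the author whose corpus value is nearest the test document's."""
    target = feature(test_text)
    return min(corpora, key=lambda a: abs(feature(corpora[a]) - target))
```

A real study would compute each parameter over large corpora and assess whether between-author differences exceed within-author variation before making any determination.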

    Post-authorship attribution using regularized deep neural network

    Post-authorship attribution is a scientific process of using stylometric features to identify the genuine writer of an online text snippet such as an email, blog, forum post, or chat log. It has useful applications in manifold domains, for instance, in a verification process to proactively detect misogynistic, misandrist, xenophobic, and abusive posts on the internet or social networks. The process assumes that texts can be characterized by sequences of words that capture the function and content vocabulary of a writer. However, defining an appropriate characterization of text to capture the unique writing style of an author is a complex endeavor in the discipline of computational linguistics. Moreover, posts are typically short texts with obfuscating vocabularies that might impact the accuracy of authorship attribution. These vocabularies include idioms, onomatopoeias, homophones, phonemes, synonyms, acronyms, anaphora, and polysemy. The method of the regularized deep neural network (RDNN) is introduced in this paper to circumvent the intrinsic challenges of post-authorship attribution. It is based on a convolutional neural network, a bidirectional long short-term memory encoder, and a distributed highway network. The convolutional network extracts lexical stylometric features that are fed into the bidirectional encoder to produce a syntactic feature-vector representation. The feature vector is then supplied as input to the distributed highway network for regularization to minimize the network-generalization error, and the regularized feature vector is ultimately passed to the bidirectional decoder to learn the writing style of an author. The feature-classification layer consists of a fully connected network and a SoftMax function to make the prediction. The RDNN method was tested against thirteen state-of-the-art methods on four benchmark experimental datasets to validate its performance.
Experimental results demonstrated the effectiveness of the method compared to the existing state-of-the-art methods on three datasets, with comparable results on the fourth. Funded by the Department of Science and Technology (DST) and the Council for Scientific and Industrial Research (CSIR). https://www.mdpi.com/journal/applsci
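The full RDNN (convolutional network, bidirectional encoder/decoder, highway network) is beyond a short sketch, but the classification head the abstract describes, a fully connected layer followed by SoftMax, can be illustrated in plain Python. The weights below are stand-in values, not a trained model, and the layer shapes are assumptions for illustration only.

```python
import math

def softmax(z):
    """SoftMax over raw scores (logits) -> a probability per candidate author."""
    m = max(z)                         # subtract max for numerical stability
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]

def dense_softmax(features, weights, biases):
    """Fully connected layer followed by SoftMax: one logit per author class."""
    logits = [sum(w * x for w, x in zip(row, features)) + b
              for row, b in zip(weights, biases)]
    return softmax(logits)
```

The predicted author is simply the index of the largest probability; in the paper's pipeline the `features` vector would be the regularized output of the highway network rather than raw inputs.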

    Implicit emotion detection in text

    In text, emotion can be expressed explicitly, using emotion-bearing words (e.g. happy, guilty), or implicitly, without emotion-bearing words. Existing approaches focus on the detection of explicitly expressed emotion in text. However, there are various ways to express and convey emotions without the use of these emotion-bearing words. For example, given the two sentences "The outcome of my exam makes me happy" and "I passed my exam", both express happiness, the first explicitly and the second implicitly. In this thesis, we investigate implicit emotion detection in text. We propose a rule-based approach for implicit emotion detection, which can be used without labeled corpora for training. Our results show that our approach consistently outperforms the lexicon-matching method and gives competitive performance in comparison to supervised classifiers. Given that emotions such as guilt and admiration often require the identification of blameworthiness and praiseworthiness, we also propose an approach for the detection of blame and praise in text, using an adapted psychology model, the Path model to blame. The lack of a benchmark dataset led us to construct a corpus containing comments on individuals' emotional experiences, annotated as blame, praise, or other. Since implicit emotion detection might be useful for conflict-of-interest (CoI) detection in Wikipedia articles, we built a CoI corpus and explored various features, including linguistic and stylometric, presentation, bias, and emotion features. Our results show that emotion features are important when using Naïve Bayes, but the best performance is obtained with SVM on linguistic and stylometric features only.
Overall, we show that a rule-based approach can be used to detect implicit emotion in the absence of labelled data; it is feasible to adapt the psychology path model to blame for blame/praise detection from text; and implicit emotion detection is beneficial for CoI detection in Wikipedia articles.
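The lexicon-matching baseline that the thesis compares against can be sketched in a few lines, using the abstract's own example sentences. The toy lexicon below is an assumption for illustration; real systems use much larger resources, and the sketch shows exactly why the baseline misses implicitly expressed emotion.

```python
# A toy emotion lexicon; real lexicons are far larger.
EMOTION_LEXICON = {"happy": "joy", "glad": "joy",
                   "guilty": "guilt", "proud": "pride"}

def lexicon_match(sentence):
    """Explicit detection only: report an emotion if an emotion-bearing
    word from the lexicon appears in the sentence; otherwise report None,
    so any implicitly expressed emotion is missed."""
    for word in sentence.lower().replace(".", "").split():
        if word in EMOTION_LEXICON:
            return EMOTION_LEXICON[word]
    return None
```

"The outcome of my exam makes me happy" matches on "happy", but "I passed my exam" returns nothing, which is the gap the thesis's rule-based approach is designed to close.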

    A Likelihood Ratio Based Forensic Text Comparison with Multiple Types of Features

    This study aims at further improving forensic text comparison (FTC) under the likelihood ratio (LR) framework. While the use of the LR framework to state the strength of evidence is well recognised in forensic science, studies on forensic text evidence within the LR framework are limited, and this study is an attempt to alleviate this situation. There have already been initiatives to obtain LRs for textual evidence by adopting various approaches and using different sets of stylometric features (Carne & Ishihara, 2020; Ishihara, 2014, 2017a, 2017b, 2021). However, only a few features have been tested in the similarity-only score-based approach (Ishihara, 2021), and many features remain to be investigated. To achieve the aim of the study, we investigate some of these features in LR-based FTC and demonstrate how they contribute to the further improvement of the LR-based FTC system. Statistical, word n-gram (n = 1, 2, 3), character n-gram (n = 1, 2, 3, 4), and part-of-speech (POS) n-gram (n = 1, 2, 3) features were first tested separately, and the separately estimated LRs were then fused into overall LRs. The database used was prepared by Ishihara (2021), and the documents under comparison were modelled as feature vectors using a bag-of-words model. Two groups of documents, each containing documents of 700, 1,400, and 2,100 words, were concatenated for each author, resulting in a total of 719 same-author comparisons and 516,242 different-author comparisons. Cosine similarity was used to measure the similarity of texts, and the similarity-only score-based approach was used to estimate LRs from the similarity scores (Hepler et al., 2012; Bolck et al., 2015). The log-likelihood-ratio cost (Cllr) and its composites, Cllrmin and Cllrcal, were used as assessment metrics.
Findings indicate that (a) when the LRs of all the feature types are fused, the fused Cllr values are 0.56, 0.30, and 0.19 for 700, 1,400, and 2,100 words, respectively, and (b) feature selection depending on the nature of an FTC task matters to the performance of the FTC system and can contribute to the improvement of LR-based FTC.
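The two quantities named in the abstract, cosine similarity over bag-of-words vectors and the log-likelihood-ratio cost, can be sketched directly. This is a minimal illustration of the metrics, not the study's own code; the score-to-LR calibration step (the similarity-only score-based approach itself) is omitted here.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two bag-of-words feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = (math.sqrt(sum(a * a for a in u))
            * math.sqrt(sum(b * b for b in v)))
    return dot / norm if norm else 0.0

def cllr(same_author_lrs, diff_author_lrs):
    """Log-likelihood-ratio cost: penalises misleading LRs on both
    same-author (LR should be high) and different-author (LR should be
    low) comparisons; an uninformative system (all LRs = 1) scores 1.0."""
    ss = sum(math.log2(1 + 1 / lr) for lr in same_author_lrs) / len(same_author_lrs)
    ds = sum(math.log2(1 + lr) for lr in diff_author_lrs) / len(diff_author_lrs)
    return 0.5 * (ss + ds)
```

Under this metric the fused Cllr values the abstract reports (0.56, 0.30, 0.19) are all well below the uninformative baseline of 1.0, and improve as the documents get longer.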

    The Rise of Marvel and DC's Transmedia Superheroes: Comic Book Adaptations, Fanboy Auteurs, and Guiding Fan Reception

    This thesis highlights the industrial strategy of Marvel Studios and DC Entertainment in adapting their comic book properties to the screen, engaging in an analysis of how these studios appeal to a mainstream audience by harnessing the enthusiasm of comic book fans. It proposes that the studios' branding strategies were based on establishing their products as authentic representations of the source texts, strategically employing what Suzanne Scott calls "fanboy auteurs" – filmmakers with strong connections to the comic material – in order to lend credibility to their franchises. Situating the comic book films of Joss Whedon and Christopher Nolan as exemplary case studies, it proposes that these figures mediate fan interests and studio authority. Finally, this thesis traces how this industrial strategy has changed to accommodate unofficial modes of fan activity inherent to participatory culture.

    Civic Engagement 2.0: A Blended Pedagogy of Multiliteracies and Activism

    This study looks at the practice of teaching civic engagement through digital and Web 2.0 tools and examines the impact on the agency and self-efficacy of first-year writing students. The primary focus is student attitudes toward the use of these tools, civic engagement in general, and the perceived value of engaging civically through these tools, with the hope of better understanding the value of this work and its impact on future civic, community, and political engagement. Based on the findings of a triad of studies published in 2012 – a CIRCLE study ("That's Not Democracy"), Giovanna Mascheroni's study of Italian youth and political uses of the web, and a study conducted by DoSomething.org – the researcher designed a first-year composition course that asked students to choose a cause or issue for the duration of the semester and take on the roles of informer, reformer, advocate, and activist on three fronts: Twitter (microblogging), WordPress (blogging), and YouTube (digital advocacy videos). A feminist methodology was used for this study, the participatory nature of the research being an essential part of the ethos of the researcher. Qualitative data was collected through analysis of student work, reflection essays, and semi-structured focus group conversations. Through the focus group discussions, the student participants and the researcher worked collaboratively to create knowledge. The findings of this study echoed those of the three studies mentioned above. In addition to showing that instruction and experience with digital civic engagement are linked to an increased likelihood of engaging in the future, the study showed that there are numerous benefits to teaching new media, civic, and academic literacies through an activist lens in writing studies. Students acquire a host of academic and professional skills that will help them succeed in the classroom and in their future careers.
Beyond the acquisition of research and 21st-century writing skills, teaching digital activism empowers students, increases agency, and helps them grasp the value of disrupting existing, outdated, or oppressive power dynamics in effective ways. Finally, it helps develop lifelong learners who are self-motivated.