5,060 research outputs found

    Text stylometry for chat bot identification and intelligence estimation.

    Authorship identification is a technique for identifying the author of an unclaimed document by finding traits that match those of the original author. It has great potential for applications in forensics. It can also be used to identify chat bots, a form of intelligent software created to mimic human conversation, by their unique style. The online criminal community is using chat bots as a new way to steal private information and commit fraud and identity theft, so identifying chat bots by their style is becoming essential to counter online criminal activity. Researchers have recognized the need to advance the understanding of chat bots and to design programs that prevent criminal activities, whether identity theft or even a terrorist threat. The more research advances chat bots’ ability to perceive humans, the greater the research community’s duty to confront those threats. This research went further by studying whether chat bots exhibit behavioral drift. Stylometric analysis of text has been the goal of many researchers, who have experimented with many features and combinations of features. A novel feature was proposed that adapts Term Frequency-Inverse Document Frequency (TF-IDF) to byte-level n-grams. This feature, Term Frequency-Inverse Token Frequency (TF-ITF), is built from those terms. Initial experiments on collected data demonstrated the feasibility of this approach. Additional versions of the feature were created and tested for authorship identification. Results showed that the feature successfully identified the authors of text, and additional experiments showed that it is language independent: it successfully identified the authors of a German text. Furthermore, the feature was used to measure text similarity at the book level and the paragraph level.
Finally, a selective combination of features was used to classify text ranging from kindergarten level to scientific research and novels. The feature combination measured the Quality of Writing (QoW) and the complexity of text, a first step toward the future goal of correlating these measures with the author’s IQ
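
As a rough illustration of the general idea behind weighting byte-level n-grams, the sketch below computes ordinary TF-IDF vectors over byte n-grams and compares two texts by cosine similarity. It is not the TF-ITF feature described in the abstract, whose exact definition is not given here. The function names (`byte_ngrams`, `tfidf_vectors`, `cosine`), the choice of n = 3, and the smoothed IDF formula are all assumptions made for this example.

```python
import math
from collections import Counter


def byte_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping byte-level n-grams of the UTF-8 encoding of text."""
    data = text.encode("utf-8")
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))


def tfidf_vectors(docs, n: int = 3):
    """Return one sparse TF-IDF vector (dict: n-gram -> weight) per document.

    Assumption: a smoothed IDF, log((1 + N) / (1 + df)) + 1, is used so that
    n-grams occurring in every document still get a non-zero weight.
    """
    counts = [byte_ngrams(d, n) for d in docs]
    doc_freq = Counter()
    for c in counts:
        doc_freq.update(c.keys())  # each n-gram counted once per document
    n_docs = len(docs)
    vectors = []
    for c in counts:
        total = sum(c.values()) or 1  # avoid division by zero for short texts
        vectors.append({
            gram: (k / total) * (math.log((1 + n_docs) / (1 + doc_freq[gram])) + 1)
            for gram, k in c.items()
        })
    return vectors


def cosine(a: dict, b: dict) -> float:
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(gram, 0.0) for gram, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


if __name__ == "__main__":
    # Hypothetical toy texts, used only to show the call pattern.
    samples = [
        "Der Hund läuft schnell durch den Wald.",
        "Der Hund läuft langsam durch den Park.",
        "Stock prices fell sharply on Monday.",
    ]
    vecs = tfidf_vectors(samples, n=3)
    print(cosine(vecs[0], vecs[1]), cosine(vecs[0], vecs[2]))
```

Because the n-grams are taken over raw bytes rather than words, the sketch needs no tokenizer or language-specific preprocessing, which is one plausible reason a byte-level feature could behave in a language-independent way.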

    Innovation processes for inference

    In this letter, we introduce a new approach to quantify the closeness of symbolic sequences and test it in the framework of the authorship attribution problem. The method, based on a recently discovered urn representation of the Pitman-Yor process, is highly accurate compared to other state-of-the-art methods, featuring a substantial gain in computational efficiency and theoretical transparency. Our work establishes a clear connection between urn models critical in interpreting innovation processes and nonparametric Bayesian inference. It opens the way to design more efficient inference methods in the presence of complex correlation patterns and non-stationary dynamics

    Drawing Elena Ferrante's Profile. Workshop Proceedings, Padova, 7 September 2017

    Elena Ferrante is an internationally acclaimed Italian novelist whose real identity has been kept secret by E/O publishing house for more than 25 years. Owing to her popularity, major Italian and foreign newspapers have long tried to discover her real identity. However, only a few attempts have been made to foster a scientific debate on her work. In 2016, Arjuna Tuzzi and Michele Cortelazzo led an Italian research team that conducted a preliminary study and collected a well-founded, large corpus of Italian novels comprising 150 works published in the last 30 years by 40 different authors. Moreover, they shared their data with a select group of international experts on authorship attribution, profiling, and analysis of textual data: Maciej Eder and Jan Rybicki (Poland), Patrick Juola (United States), Vittorio Loreto and his research team, Margherita Lalli and Francesca Tria (Italy), George Mikros (Greece), Pierre Ratinaud (France), and Jacques Savoy (Switzerland). The chapters of this volume report the results of this endeavour that were first presented during the international workshop Drawing Elena Ferrante's Profile in Padua on 7 September 2017 as part of the 3rd IQLA-GIAT Summer School in Quantitative Analysis of Textual Data. The fascinating research findings suggest that Elena Ferrante’s work definitely deserves “many hands” as well as an extensive effort to understand her distinct writing style and the reasons for her worldwide success

    Dealing with temporal inconsistency in automated computer forensic profiling

    Computer profiling is the automated forensic examination of a computer system in order to provide a human investigator with a characterisation of the activities that have taken place on that system. As part of this process, the logical components of the computer system – components such as users, files and applications – are enumerated and the relationships between them discovered and reported. This information is enriched with traces of historical activity drawn from system logs and from evidence of events found in the computer file system. A potential problem with the use of such information is that some of it may be inconsistent and contradictory, thus compromising its value. This work examines the impact of temporal inconsistency in such information and discusses two types of temporal inconsistency that may arise – inconsistency arising out of the normal errant behaviour of a computer system, and inconsistency arising out of deliberate tampering by a suspect – and techniques for dealing with inconsistencies of the latter kind. We examine the impact of deliberate tampering through experiments conducted with prototype computer profiling software. Based on the results of these experiments, we discuss techniques that can be employed in computer profiling to deal with such temporal inconsistencies

    Public awareness of the scientific consensus on climate

    Questions about climate change elicit some of the widest political divisions of any items on recent U.S. surveys. Severe polarization affects even basic questions about the reality of anthropogenic climate change (ACC), or whether most scientists agree that humans are changing the Earth’s climate. Statements about scientific consensus have been contentious among social scientists, with some arguing for consensus awareness as a “gateway cognition” that leads to greater public acceptance of ACC, but others characterizing consensus messaging (deliberate communication about the level of scientific agreement) as a counterproductive tactic that exacerbates polarization. A series of statewide surveys, with nationwide benchmarks, repeated questions about the reality of ACC and scientific consensus many times over 2010 to 2016. These data permit tests for change in beliefs and polarization. ACC and consensus beliefs have similar trends and individual background predictors. Both rose gradually by about 10 points over 2010 to 2016, showing no abrupt shifts that might correspond to events such as scientific reports, leadership statements, or weather. Growing awareness of the scientific consensus, whether from deliberate messaging or the cumulative impact of many studies and publicly engaged scientists, provides the most plausible explanation for this rise in both series. In state-level data, the gap between liberal and conservative views on the reality of ACC did not widen over this period, whereas the liberal–conservative gap regarding existence of a scientific consensus narrowed

    Identifying Authorship Style in Malicious Binaries: Techniques, Challenges & Datasets

    Attributing a piece of malware to its creator typically requires threat intelligence. Binary attribution increases the level of difficulty, as it mostly relies upon the ability to disassemble binaries to identify authorship style. Our survey explores malicious author style and the adversarial techniques that malware authors use to remain anonymous. We examine the adversarial impact on the state-of-the-art methods. We identify key findings and explore the open research challenges. To mitigate the lack of ground truth datasets in this domain, we publish alongside this survey the largest and most diverse meta-information dataset of 15,660 malware samples labeled to 164 threat actor groups