377 research outputs found

    Text stylometry for chat bot identification and intelligence estimation.

    Get PDF
    Authorship identification is a technique used to identify the author of an unclaimed document, by attempting to find traits that will match those of the original author. Authorship identification has a great potential for applications in forensics. It can also be used in identifying chat bots, a form of intelligent software created to mimic the human conversations, by their unique style. The online criminal community is utilizing chat bots as a new way to steal private information and commit fraud and identity theft. The need for identifying chat bots by their style is becoming essential to overcome the danger of online criminal activities. Researchers realized the need to advance the understanding of chat bots and design programs to prevent criminal activities, whether it was an identity theft or even a terrorist threat. The more research work to advance chat bots’ ability to perceive humans, the more duties needed to be followed to confront those threats by the research community. This research went further by trying to study whether chat bots have behavioral drift. Studying text for Stylometry has been the goal for many researchers who have experimented many features and combinations of features in their experiments. A novel feature has been proposed that represented Term Frequency Inverse Document Frequency (TFIDF) and implemented that on a Byte level N-Gram. Term Frequency-Inverse Token Frequency (TF-ITF) used these terms and created the feature. The initial experiments utilizing collected data demonstrated the feasibility of this approach. Additional versions of the feature were created and tested for authorship identification. Results demonstrated that the feature was successfully used to identify authors of text, and additional experiments showed that the feature is language independent. The feature successfully identified authors of a German text. Furthermore, the feature was used in text similarities on a book level and a paragraph level. Finally, a selective combination of features was used to classify text that ranges from kindergarten level to scientific researches and novels. The feature combination measured the Quality of Writing (QoW) and the complexity of text, which were the first step to correlate that with the author’s IQ as a future goal

    A systematic survey of online data mining technology intended for law enforcement

    Get PDF
    As an increasing amount of crime takes on a digital aspect, law enforcement bodies must tackle an online environment generating huge volumes of data. With manual inspections becoming increasingly infeasible, law enforcement bodies are optimising online investigations through data-mining technologies. Such technologies must be well designed and rigorously grounded, yet no survey of the online data-mining literature exists which examines their techniques, applications and rigour. This article remedies this gap through a systematic mapping study describing online data-mining literature which visibly targets law enforcement applications, using evidence-based practices in survey making to produce a replicable analysis which can be methodologically examined for deficiencies

    Illicit Activity Detection in Large-Scale Dark and Opaque Web Social Networks

    Get PDF
    Many online chat applications live in a grey area between the legitimate web and the dark net. The Telegram network in particular can aid criminal activities. Telegram hosts “chats” which consist of varied conversations and advertisements. These chats take place among automated “bots” and human users. Classifying legitimate activity from illegitimate activity can aid law enforcement in finding criminals. Social network analysis of Telegram chats presents a difficult problem. Users can change their username or create new accounts. Users involved in criminal activity often do this to obscure their identity. This makes establishing the unique identity behind a given username challenging. Thus we explored classifying users from their language usage in their chat messages.The volume and velocity of Telegram chat data place it well within the domain of big data. Machine learning and natural language processing (NLP) tools are necessary to classify this chat data. We developed NLP tools for classifying users and the chat group to which their messages belong. We found that legitimate and illegitimate chat groups could be classified with high accuracy. We also were able to classify bots, humans, and advertisements within conversations

    Automatic IQ estimation using stylometry methods.

    Get PDF
    Stylometry is a study of text linguistic properties that brings together various field of research such as statistics, linguistics, computer science and more. Stylometry methods have been used for historic investigation, as forensic evidence and educational tool. This thesis presents a method to automatically estimate individual’s IQ based on quality of writing and discusses challenges associated with it. The method utilizes various text features and NLP techniques to calculate metrics which are used to estimate individual’s IQ. The results show a high degree of correlation between expected and estimated IQs in cases when IQ is within the average range. Obtaining good estimation for IQs on the high and low ends of the spectrum proves to be more challenging and this work offers several reasons for that. Over the years stylometry benefitted from wide exposure and interest among researches, however it appears that there aren’t many studies that focus on using stylometry methods to estimate individual’s intelligence. Perhaps this work presents the first in-depth attempt to do s

    Detecting AI generated text using neural networks

    Get PDF
    For humans, distinguishing machine generated text from human written text is men- tally taxing and slow. NLP models have been created to do this more effectively and faster. But, what if some adversarial changes have been added to the machine generated text? This thesis discusses this issue and text detectors in general. The primary goal of this thesis is to describe the current state of text detectors in research and to discuss a key adversarial issue in modern NLP transformers. To describe the current state of text detectors a Systematic Literature Review was done on 50 relevant papers to machine-centric detection in chapter 2. As for the key ad- versarial issue, chapter 3 describes an experiment where RoBERTa was used to test transformers against simple mutations which cause mislabelling. The state of the literature was written at length in the 2nd chapter, showing how viable text detection as a subject has become. Lastly, RoBERTa was shown to be vulnerable to mutation attacks. The solution was found to be fine-tuning it to some heuristics, as long as the mutations can be predicted the model can be fine tuned to detect them

    Modeling Human Group Behavior In Virtual Worlds

    Get PDF
    Virtual worlds and massively-multiplayer online games are rich sources of information about large-scale teams and groups, offering the tantalizing possibility of harvesting data about group formation, social networks, and network evolution. They provide new outlets for human social interaction that differ from both face-to-face interactions and non-physically-embodied social networking tools such as Facebook and Twitter. We aim to study group dynamics in these virtual worlds by collecting and analyzing public conversational patterns of users grouped in close physical proximity. To do this, we created a set of tools for monitoring, partitioning, and analyzing unstructured conversations between changing groups of participants in Second Life, a massively multi-player online user-constructed environment that allows users to construct and inhabit their own 3D world. Although there are some cues in the dialog, determining social interactions from unstructured chat data alone is a difficult problem, since these environments lack many of the cues that facilitate natural language processing in other conversational settings and different types of social media. Public chat data often features players who speak simultaneously, use jargon and emoticons, and only erratically adhere to conversational norms. Humans are adept social animals capable of identifying friendship groups from a combination of linguistic cues and social network patterns. But what is more important, the content of what people say or their history of social interactions? Moreover, is it possible to identify whether iii people are part of a group with changing membership merely from general network properties, such as measures of centrality and latent communities? These are the questions that we aim to answer in this thesis. The contributions of this thesis include: 1) a link prediction algorithm for identifying friendship relationships from unstructured chat data 2) a method for identifying social groups based on the results of community detection and topic analysis. The output of these two algorithms (links and group membership) are useful for studying a variety of research questions about human behavior in virtual worlds. To demonstrate this we have performed a longitudinal analysis of human groups in different regions of the Second Life virtual world. We believe that studies performed with our tools in virtual worlds will be a useful stepping stone toward creating a rich computational model of human group dynamics

    Good bot vs. bad bot: Opportunities and consequences of using automated software in corporate communications

    Get PDF
    The paper attempts to lay a foundation for research on the use of bots in corporate communications. The first step is to identify opportunities and challenges that may offer starting points for future regulations. In this research project, expert interviews were conducted in the form of guideline-based telephone interviews. A total of ten experts from the scientific community and experts from the practical field were interviewed. Following this, a qualitative-reductive content analysis was conducted with the aim of building categories and hypotheses based on them. The results show that experts from the scientific community and practical field clearly see advantages for corporate communications, but also highlight hurdles and ethical challenges that are currently seen as a major barrier to the use of bots. In this context, experts mention, among other things, the assumption of structured routine tasks, ensuring efficiency and quality in corporate communications, cost efficiency and relieving employees. On the other hand, weaknesses, like the lack of transparency, data protection and loss of control arise. Results clearly show that the ethical perspective has to be taken into account. In this context, data protection, the question of responsibility and possible manipulation intentions are particularly worth mentioning

    The Impact of Artificial Intelligence on the Evolution of Digital Education: A Comparative Study of OpenAI Text Generation Tools including ChatGPT, Bing Chat, Bard, and Ernie

    Full text link
    In the digital era, the integration of artificial intelligence (AI) in education has ushered in transformative changes, redefining teaching methodologies, curriculum planning, and student engagement. This review paper delves deep into the rapidly evolving landscape of digital education by contrasting the capabilities and impact of OpenAI's pioneering text generation tools like Bing Chat, Bard, Ernie with a keen focus on the novel ChatGPT. Grounded in a typology that views education through the lenses of system, process, and result, the paper navigates the multifaceted applications of AI. From decentralizing global education and personalizing curriculums to digitally documenting competence-based outcomes, AI stands at the forefront of educational modernization. Highlighting ChatGPT's meteoric rise to one million users in just five days, the study underscores its role in democratizing education, fostering autodidacticism, and magnifying student engagement. However, with such transformative power comes the potential for misuse, as text-generation tools can inadvertently challenge academic integrity. By juxtaposing the promise and pitfalls of AI in education, this paper advocates for a harmonized synergy between AI tools and the educational community, emphasizing the urgent need for ethical guidelines, pedagogical adaptations, and strategic collaborations
    • …
    corecore