2 research outputs found

    Computational behavioral analytics: estimating psychological traits in foreign languages.

    The proliferation of technology in the workplace has increased the threat of loss of intellectual property (IP) and of classified and proprietary information for companies, governments, and academics. This can cause economic damage to the creators of new IP, to companies, and to whole economies. The same proliferation has also helped terror groups and lone wolf actors push their message to a larger audience or find similar tribal groups that share common, sometimes flawed, beliefs across various social media platforms. These challenges have prompted numerous studies in psycholinguistics, as well as commercial tools, that aim to identify potential threats before they have an opportunity to conduct malicious acts. This has led to an area of study that this dissertation defines as "Computational Behavioral Analytics." A common practice in Natural Language Processing (NLP) studies, both commercial and academic, conducted on foreign-language text is to apply Machine Translation (MT) systems before conducting NLP tasks. In this dissertation, we explore three psycholinguistic tasks conducted on foreign-language text. We explore the effects (and failures) of MT systems in these tasks in order to push the field in a direction that will greatly improve the efficacy of such systems. Given the results of the experimentation in this dissertation, it is highly recommended to avoid the use of translations whenever the greatest levels of accuracy are necessary, such as for National Security and Law Enforcement purposes. If translations must be used for any reason, scientists should conduct a full analysis of the impact of their chosen translation system on their estimates to determine which traits are more significantly affected. This will help ensure that analysts and scientists are better informed of the potential inaccuracies and can adjust any resulting decisions accordingly.
Chapter I introduces psycholinguistics and the benefits of using Machine Learning technologies to estimate various psychological traits, and briefly discusses the potential privacy and legal issues that should be addressed in order to avoid the abuse of such systems. Chapter II outlines the datasets used during the experimentation and evaluation of the algorithms. Chapter III discusses the implementations of the algorithms used in the three psycholinguistic tasks: Affect Analysis, Authorship Attribution, and Personality Estimation. Chapter IV discusses the experiments that were run to understand the effects of MT on the psycholinguistic tasks and how these tasks can be accomplished in the face of MT limitations, including the rationale for the selection of the MT system used in this study. The dissertation concludes with Chapter V, which discusses the findings and speculates on future experimentation that should be done.

    The Stylometric Processing of Sensory Open Source Data

    The end goal of this research project concerns the Lone Wolf Terrorist. The project takes an exploratory approach to the self-radicalisation problem by creating a stylistic fingerprint of a person's personality, or self, from subtle characteristics hidden in their writing style. It separates the identity of one person from another based on their writing style. It also separates the writings of suicide attackers from 'normal' bloggers using critical slowing down, a dynamical property used to develop early warning signs of tipping points. It identifies changes in a person's moods, or shifts from one state to another, that might indicate a tipping point for self-radicalisation. Research into authorship identity using personality is a relatively new area in the field of neurolinguistics. There are very few methods that model how an individual's cognitive functions present themselves in writing. Here, we develop a novel algorithm, RPAS, which draws on cognitive functions such as aging, sensory processing, abstract or concrete thinking through referential activity emotional experiences, and a person's internal gender for identity. We use well-known techniques such as Principal Component Analysis, Linear Discriminant Analysis, and the Vector Space Method to cluster multiple anonymous-authored works. We also introduce a new approach that uses seriation with noise to separate subtle features in individuals. We conduct time series analysis using modified variants of 1-lag autocorrelation and the coefficient of skewness, two statistical metrics that change near a tipping point, to track serious life events in an individual through cognitive linguistic markers. In our journey of discovery, we uncover secrets about the Elizabethan playwrights hidden for over 400 years. We uncover markers for depression and anxiety in modern-day writers and identify linguistic cues for Alzheimer's disease much earlier than other studies using sensory processing.
In using these techniques on the Lone Wolf, we can separate their writing style used before their attacks that differs from other writing
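
The abstract above mentions lag-1 autocorrelation and the coefficient of skewness as indicators that change near a tipping point. As a rough illustration only, the sketch below computes the textbook versions of both metrics over a sliding window. It is not the dissertation's "modified variants", which the abstract does not specify. The function names and the idea of feeding in a per-document numeric score (for example, one stylistic measurement per blog post, in time order) are assumptions made for this example.

```python
from statistics import fmean


def lag1_autocorrelation(window):
    """Textbook lag-1 autocorrelation (not the modified variant in the abstract)."""
    mean = fmean(window)
    deviations = [x - mean for x in window]
    denominator = sum(d * d for d in deviations)
    if denominator == 0:
        return 0.0  # constant window: treat as no measurable correlation
    numerator = sum(a * b for a, b in zip(deviations, deviations[1:]))
    return numerator / denominator


def skewness(window):
    """Population coefficient of skewness, m3 / m2**1.5."""
    mean = fmean(window)
    m2 = fmean([(x - mean) ** 2 for x in window])
    m3 = fmean([(x - mean) ** 3 for x in window])
    if m2 == 0:
        return 0.0  # constant window: skewness is undefined, report 0
    return m3 / m2 ** 1.5


def rolling_indicators(series, window_size):
    """Return one (autocorrelation, skewness) pair per sliding window.

    `series` is a hypothetical time-ordered list of numeric scores,
    one per document by the same author.
    """
    return [
        (lag1_autocorrelation(series[i:i + window_size]),
         skewness(series[i:i + window_size]))
        for i in range(len(series) - window_size + 1)
    ]
```

In the early-warning literature, a sustained rise in lag-1 autocorrelation or a drift in skewness across successive windows is read as a possible sign of critical slowing down; the window size is a tuning choice that this sketch leaves to the caller.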