Computational behavioral analytics: estimating psychological traits in foreign languages.
The spread of technology into the workplace has increased the threat of loss of intellectual property (IP) and of classified and proprietary information for companies, governments, and academics. Such losses can cause economic damage to the creators of new IP, to companies, and to whole economies. The same proliferation of technology has also helped terror groups and lone wolf actors push their message to a larger audience, or find similar tribal groups that share common, sometimes flawed, beliefs across various social media platforms. These challenges have prompted numerous studies in psycholinguistics, as well as commercial tools, that aim to identify potential threats before they have an opportunity to conduct malicious acts. This has led to an area of study that this dissertation defines as "Computational Behavioral Analytics". A common practice in Natural Language Processing (NLP) studies, both commercial and academic, conducted on foreign language text is to apply Machine Translation (MT) systems before performing NLP tasks. In this dissertation, we explore the estimation of three psycholinguistic traits from foreign language text. We explore the effects (and failures) of MT systems in these psycholinguistic tasks in order to help push the field in a direction that will greatly improve the efficacy of such systems. Given the results of the experimentation in this dissertation, it is highly recommended to avoid the use of translations whenever the greatest levels of accuracy are necessary, such as for National Security and Law Enforcement purposes. If translations must be used for any reason, scientists should conduct a full analysis of the impact of their chosen translation system on their estimates to determine which traits are more significantly affected. This will help ensure that analysts and scientists are better informed of the potential inaccuracies and can adjust any resulting decisions accordingly.
Chapter I of this dissertation introduces psycholinguistics and the benefits of using Machine Learning technologies to estimate various psychological traits, and briefly discusses the potential privacy and legal issues that should be addressed in order to avoid the abuse of such systems. Chapter II outlines the datasets that are used during the experimentation and evaluation of the algorithms. Chapter III discusses the implementations of the algorithms used in the three psycholinguistic tasks: Affect Analysis, Authorship Attribution, and Personality Estimation. Chapter IV discusses the experiments that were run to understand the effects of MT on the psycholinguistic tasks and how these tasks can be accomplished in the face of MT limitations, including the rationale for the selection of the MT system used in this study. The dissertation concludes with Chapter V, which discusses the findings and speculates on future experimentation that should be done.
The Stylometric Processing of Sensory Open Source Data
The ultimate focus of this research project is the Lone Wolf Terrorist.
The project uses an exploratory approach to the
self-radicalisation problem by creating a stylistic fingerprint
of a person's personality, or self, from subtle characteristics
hidden in a person's writing style. It separates the identity of
one person from another based on their writing style. It also
separates the writings of suicide attackers from 'normal'
bloggers using critical slowing down, a dynamical property used to
develop early warning signs of tipping points. It identifies
changes in a person's moods, or shifts from one state to another,
that might indicate a tipping point for self-radicalisation.
Research into authorship identity using personality is a
relatively new area in the field of neurolinguistics. There are
very few methods that model how an individual's cognitive
functions present themselves in writing. Here, we develop a
novel algorithm, RPAS, which draws on cognitive functions such as
aging, sensory processing, abstract or concrete thinking through
referential activity, emotional experiences, and a person's
internal gender for identity. We use well-known techniques such
as Principal Component Analysis, Linear Discriminant Analysis,
and the Vector Space Method to cluster multiple
anonymous-authored works. We also introduce a new approach that uses
seriation with noise to separate subtle features in individuals.
We conduct time series analysis using modified variants of lag-1
autocorrelation and the coefficient of skewness, two statistical
metrics that change near a tipping point, to track serious life
events in an individual through cognitive linguistic markers.
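
The two early-warning indicators named above can be illustrated with a minimal sketch. The abstract does not describe the modified variants used in the thesis, so the code below uses the standard textbook definitions of lag-1 autocorrelation and the coefficient of skewness, computed over a sliding window. The function names, the window size, and the idea of feeding in one score per dated text are all assumptions made for illustration.

```python
import numpy as np


def lag1_autocorrelation(x):
    """Standard lag-1 autocorrelation; returns 0.0 for a constant series."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    denom = np.sum(d * d)
    if denom == 0.0:
        return 0.0
    return float(np.sum(d[:-1] * d[1:]) / denom)


def skewness(x):
    """Coefficient of skewness (third standardized moment); 0.0 if constant."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    m2 = np.mean(d ** 2)
    if m2 == 0.0:
        return 0.0
    return float(np.mean(d ** 3) / m2 ** 1.5)


def rolling_indicators(series, window):
    """Compute both indicators over every full sliding window of the series.

    `series` is assumed (hypothetically) to hold one linguistic-marker score
    per text, ordered by date. Returns two arrays of length
    len(series) - window + 1.
    """
    series = np.asarray(series, dtype=float)
    if window < 3 or window > len(series):
        raise ValueError("window must be between 3 and len(series)")
    ac, sk = [], []
    for start in range(len(series) - window + 1):
        w = series[start:start + window]
        ac.append(lag1_autocorrelation(w))
        sk.append(skewness(w))
    return np.array(ac), np.array(sk)


if __name__ == "__main__":
    # Synthetic stand-in data only; not results from the thesis.
    rng = np.random.default_rng(0)
    scores = rng.normal(size=200)
    ac, sk = rolling_indicators(scores, window=50)
    print(ac.shape, sk.shape)
```

In the critical-slowing-down literature, a sustained rise in windowed autocorrelation or a drift in skewness is read as a possible warning sign of an approaching transition; how the thesis thresholds or modifies these metrics is not stated here.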
In our journey of discovery, we uncover secrets about the
Elizabethan playwrights hidden for over 400 years. We uncover
markers for depression and anxiety in modern-day writers and
identify linguistic cues for Alzheimer's disease much earlier
than other studies using sensory processing. In using these
techniques on the Lone Wolf, we can separate their writing style
used before their attacks that differs from other writing