Cyber-crime targeting children, such as online pedophile activity, is a major
and growing concern to society. A deep understanding of predatory chat
conversations on the Internet has implications for designing effective solutions
that automatically distinguish malicious conversations from regular ones.
We believe that a deeper understanding of the pedophile conversation can result
in more sophisticated and robust surveillance systems than the majority of
current systems, which rely only on shallow processing such as simple word
counting or keyword spotting.
In this paper, we study pedophile conversations from the perspective of
online grooming theory and perform a series of linguistics-based empirical
analyses on several pedophile chat conversations to gain useful insights and
identify patterns. We manually annotated 75 pedophile chat conversations with
six stages of online grooming and tested several hypotheses on them. The results of our
experiments reveal that relationship forming, rather than the sexual stage, is
the most dominant online grooming stage. We use LIWC, a widely used
word-counting program, to create psycho-linguistic profiles for each of
the six online grooming stages and to discover textual patterns that
improve our understanding of the online pedophile phenomenon. Furthermore,
we present empirical results that shed light on various aspects of a pedophile
conversation, such as the probability of state transitions from one stage to
another, the distribution of a pedophile chat conversation across the online
grooming stages, and correlations between pre-defined word categories and online
grooming stages.
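
As an illustration of the two stage-level statistics mentioned above, the
following minimal Python sketch estimates stage-to-stage transition
probabilities and the overall share of lines per stage from per-line stage
annotations. It is not the procedure used in this paper: the function names,
the placeholder stage labels "A", "B", "C", and the toy sequences are all
assumptions made for illustration, and the estimate shown is a plain
first-order relative-frequency count.

```python
from collections import Counter, defaultdict


def transition_probabilities(conversations):
    """Estimate P(next stage | current stage) by relative frequency.

    `conversations` is a list of stage-label sequences, one label per
    annotated chat line (hypothetical input format).
    """
    counts = defaultdict(Counter)
    for stages in conversations:
        # Pair each annotated line with the line that follows it.
        for current, following in zip(stages, stages[1:]):
            counts[current][following] += 1
    return {
        current: {
            nxt: n / sum(followers.values()) for nxt, n in followers.items()
        }
        for current, followers in counts.items()
    }


def stage_distribution(conversations):
    """Fraction of all annotated lines that fall into each stage."""
    totals = Counter(stage for stages in conversations for stage in stages)
    n_lines = sum(totals.values())
    return {stage: count / n_lines for stage, count in totals.items()}


# Toy, made-up annotations with placeholder stage labels (not real data).
toy = [
    ["A", "A", "B", "C"],
    ["A", "B", "B", "C"],
]
probs = transition_probabilities(toy)
dist = stage_distribution(toy)
print(probs)
print(dist)
```

With real annotations, each inner list would hold the stage label assigned to
every line of one conversation; stages that are never followed by another line
simply have no outgoing row in the transition table.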