A Language Modelling approach to linking criminal styles with offender characteristics

By Richard Bache, Fabio Crestani, David V. Canter and Donna E. Youngs


Inferring the characteristics of offenders from their criminal behaviour (‘offender profiling’) has been only partially successful, since it has relied on subjective judgments based on limited data. Words and structured data used in crime descriptions recorded by the police relate to behavioural features. Language Modelling was therefore applied to an existing police archive to link behavioural features with significant characteristics of offenders. Both multinomial and multiple Bernoulli models were used. Although the categories selected are gender, age group, ethnic appearance and broad occupation (employed or not), in principle the approach can be applied to any recorded characteristic. Results indicate that statistically significant relationships exist between behavioural features and each of these characteristics for many types of crime. Bernoulli models tend to perform better than multinomial ones. It is also possible to identify automatically specific terms which, when taken together, give insight into the style of offending related to a particular group.
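
The abstract does not give the estimation details, so the following is only a minimal sketch of how a multinomial and a multiple Bernoulli language model might each score a crime description against an offender group. The add-alpha smoothing, the uniform prior over groups, the function names, and the toy descriptions and labels (`group_a`, `group_b`) are all assumptions made for illustration and are not taken from the paper.

```python
import math
from collections import Counter


def train(docs_by_class, vocab, alpha=1.0):
    """Estimate per-class term statistics with add-alpha smoothing (an assumption)."""
    models = {}
    for label, docs in docs_by_class.items():
        # Token occurrences across all documents of the class.
        term_counts = Counter(t for d in docs for t in d if t in vocab)
        # Number of documents of the class in which each term appears.
        doc_counts = Counter(t for d in docs for t in set(d) if t in vocab)
        total_terms = sum(term_counts.values())
        models[label] = {
            # Multinomial: P(term | class), a distribution over the vocabulary.
            "multinomial": {
                t: (term_counts[t] + alpha) / (total_terms + alpha * len(vocab))
                for t in vocab
            },
            # Multiple Bernoulli: P(term present | class), one per term.
            "bernoulli": {
                t: (doc_counts[t] + alpha) / (len(docs) + 2 * alpha)
                for t in vocab
            },
        }
    return models


def log_likelihood(tokens, model, kind):
    """Log-probability of a description under one class model."""
    probs = model[kind]
    if kind == "multinomial":
        # Every token occurrence contributes; absent terms are ignored.
        return sum(math.log(probs[t]) for t in tokens if t in probs)
    # Bernoulli: every vocabulary term contributes, present or absent.
    present = set(tokens)
    return sum(
        math.log(probs[t] if t in present else 1.0 - probs[t]) for t in probs
    )


def classify(tokens, models, kind):
    """Pick the class with the highest likelihood (uniform prior assumed)."""
    return max(models, key=lambda label: log_likelihood(tokens, models[label], kind))


# Invented toy descriptions, already tokenised; not real police data.
docs_by_class = {
    "group_a": [["forced", "window", "night"], ["forced", "door", "night"]],
    "group_b": [["open", "window", "day"], ["open", "door", "day"]],
}
vocab = {"forced", "window", "door", "night", "open", "day"}
models = train(docs_by_class, vocab)

for kind in ("multinomial", "bernoulli"):
    print(kind, classify(["forced", "night"], models, kind))
```

The practical difference is that the multinomial model scores only the terms that occur in a description, while the multiple Bernoulli model also takes into account which vocabulary terms are absent.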

Topics: BF
Publisher: Elsevier BV
Year: 2010
OAI identifier: oai:eprints.hud.ac.uk:8052

