
    Complex network tools to enable identification of a criminal community

    Retrieving criminal ties and mining evidence from an organised crime incident, for example money laundering, has been a difficult task for crime investigators due to the involvement of different groups of people and their complex relationships. Extracting criminal associations from an enormous amount of raw data and representing them explicitly is tedious and time-consuming. A study of the complex networks literature reveals that graph-based detection methods have not, as yet, been used for money laundering detection. In this research, I explore the use of complex network analysis to identify the communication associations of money laundering criminals, that is, the important people who communicate between known criminals and the reliance of the known criminals on the other individuals in a communication path. For this purpose, I use the publicly available Enron email database, which happens to contain the communications of 10 criminals who were convicted of a money laundering crime. I show that my new shortest paths network search algorithm (SPNSA), which combines shortest paths and network centrality measures, is better able to isolate and identify criminals’ connections than existing community detection algorithms and k-neighbourhood detection. The SPNSA is validated using three different investigative scenarios; in each scenario, the criminal network graphs formed are small and sparse and hence suitable for further investigation. My research starts by isolating emails that have at least two ‘BCC’ recipients. ‘BCC’ recipients are inherently secretive, and the email connections imply a trust relationship between the sender and the ‘BCC’ recipients. There are no studies that use only emails with ‘BCC’ recipients to form a trust network, which leads me to analyse the ‘BCC’ email group separately. SPNSA is able to identify the group of criminals and their active intermediaries in this ‘BCC’ trust network.
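The ‘BCC’ filtering step and the shortest-path part of SPNSA can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's algorithm: the email record layout, the function names, and the rule of keeping every node that lies on any shortest path between two known criminals are all hypothetical, and the centrality measures that SPNSA combines with shortest paths are omitted.

```python
from collections import deque
from itertools import combinations

# Hypothetical email record layout: (sender, to_list, cc_list, bcc_list).


def bcc_trust_edges(emails, min_bcc=2):
    """Build an undirected trust graph from emails with at least min_bcc BCC recipients.

    Each kept email links its sender to every BCC recipient; TO/CC recipients
    are ignored, mirroring the 'BCC'-only trust network.
    """
    adj = {}
    for sender, _to, _cc, bcc in emails:
        bcc_set = set(bcc)
        if len(bcc_set) < min_bcc:
            continue
        for r in bcc_set - {sender}:
            adj.setdefault(sender, set()).add(r)
            adj.setdefault(r, set()).add(sender)
    return adj


def hop_distances(adj, src):
    """Breadth-first search: unweighted shortest-path length from src to each reachable node."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def shortest_path_subgraph(adj, known):
    """Return the known nodes present in the graph plus every node lying on
    some shortest path between two known nodes.

    On an undirected graph, v is on a shortest s-t path exactly when
    dist(s, v) + dist(v, t) == dist(s, t).
    """
    dists = {k: hop_distances(adj, k) for k in known}
    keep = {k for k in known if k in adj}
    for s, t in combinations(known, 2):
        ds, dt = dists[s], dists[t]
        if t not in ds:
            continue  # s and t are in different components
        for v, d_sv in ds.items():
            if v in dt and d_sv + dt[v] == ds[t]:
                keep.add(v)
    return keep
```

For example, with known criminals `a` and `c` linked only through a BCC intermediary `m`, the returned node set is `{a, m, c}`; emails with a single BCC recipient never enter the graph.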
Corroborating this information with published information about the crimes that led to the collapse of Enron yields the discovery of persons of interest who were hidden between criminals and could have contributed to the money laundering activity. For validation, larger email datasets that comprise all ‘BCC’ and ‘TO/CC’ email transactions are used. Compared with existing community detection algorithms, SPNSA performs much better at isolating the sub-networks that contain criminals. I have adapted the betweenness centrality measure to develop a reliance measure. This measure calculates the reliance of a criminal on an intermediate node and ranks the importance of each intermediate node by this reliance value. Both SPNSA and the reliance measure could be used as primary tools for investigating connections between criminals in a complex network.
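The abstract does not give the reliance formula, so the sketch below assumes one: it borrows the pair-dependency term of betweenness centrality, σ_st(v)/σ_st (the fraction of shortest s-t paths passing through v), and averages it over the other known criminals t reachable from criminal s. The function names and the averaging step are hypothetical choices made for illustration only.

```python
from collections import deque


def build_adjacency(edges):
    """Undirected adjacency sets from (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def bfs_counts(adj, src):
    """Hop distances and number of shortest paths (sigma) from src, as in Brandes' BFS."""
    dist, sigma = {src: 0}, {src: 1}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:
                sigma[v] += sigma[u]
    return dist, sigma


def reliance(adj, criminals, s):
    """Hypothetical reliance of criminal s on each intermediate node v,
    ranked from most to least relied upon.

    reliance(s, v) = mean over reachable criminals t != s of sigma_st(v) / sigma_st,
    where sigma_st(v) = sigma_sv * sigma_vt when v lies on a shortest s-t path.
    """
    ds, ss = bfs_counts(adj, s)
    targets = [t for t in criminals if t != s and t in ds]
    if not targets:
        return []
    totals = {}
    for t in targets:
        dt, st = bfs_counts(adj, t)
        for v in ds:
            if v in (s, t) or v not in dt:
                continue
            if ds[v] + dt[v] == ds[t]:  # v is on a shortest s-t path
                totals[v] = totals.get(v, 0.0) + ss[v] * st[v] / ss[t]
    scores = {v: total / len(targets) for v, total in totals.items()}
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
```

In a small graph where criminal `s` reaches criminal `t` through either `a` or `b`, but reaches criminal `u` only through `a`, this sketch ranks `a` (0.75) above `b` (0.25).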

    Automatic refinement of large-scale cross-domain knowledge graphs

    Knowledge graphs are a way to represent complex structured and unstructured information integrated into an ontology, with which one can reason about the existing information to deduce new information or highlight inconsistencies. Knowledge graphs are divided into the terminology box (TBox), also known as the ontology, and the assertion box (ABox). The former consists of a set of schema axioms defining the classes and properties that describe the data domain, whereas the ABox consists of a set of facts describing instances in terms of the TBox vocabulary. In recent years, there have been several initiatives for creating large-scale cross-domain knowledge graphs, both free and commercial, with DBpedia, YAGO, and Wikidata being amongst the most successful free datasets. These graphs are often constructed by extracting information from semi-structured knowledge, such as Wikipedia, or from unstructured text on the web using NLP methods. It is unlikely, in particular when heuristic methods are applied and unreliable sources are used, that the knowledge graph is fully correct or complete. There is a tradeoff between completeness and correctness, which each knowledge graph's construction approach addresses differently. There is a wide variety of applications for knowledge graphs, e.g. semantic search and discovery, question answering, recommender systems, expert systems and personal assistants. The quality of a knowledge graph is crucial for its applications. In order to further increase the quality of such large-scale knowledge graphs, various automatic refinement methods have been proposed. These methods try to infer and add missing knowledge to the graph, or to detect erroneous pieces of information. In this thesis, we investigate the problem of automatic knowledge graph refinement and propose methods that address the problem from two directions: automatic refinement of the TBox and of the ABox. In Part I we address the ABox refinement problem.
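To make the TBox/ABox split concrete, here is a toy sketch; every class, property, and entity name is invented for illustration. Subclass axioms in the TBox let us deduce types that are not asserted in the ABox, and a domain/range declaration is checked against the facts to highlight a suspicious assertion. Note the assumption: the check reads domain and range closed-world, as a SHACL-style constraint would; under OWL's open-world semantics the same axioms would instead entail new types, and a contradiction would only arise with, for example, a disjointness axiom.

```python
# Hypothetical toy TBox: subclass axioms (class -> direct superclass)
# and a (domain, range) declaration for one property.
SUBCLASS = {"City": "Place", "Place": "Thing", "Person": "Thing"}
DOMAIN_RANGE = {"birthPlace": ("Person", "Place")}

# Hypothetical toy ABox: type assertions and relation assertions.
TYPES = {"Ada": {"Person"}, "London": {"City"}, "Bob": {"Person"}}
FACTS = [("Ada", "birthPlace", "London"), ("Bob", "birthPlace", "Ada")]


def superclasses(cls):
    """cls together with every class above it in the subclass hierarchy."""
    out = {cls}
    while cls in SUBCLASS:
        cls = SUBCLASS[cls]
        out.add(cls)
    return out


def entailed_types(entity):
    """Asserted types of entity plus the types deduced from subclass axioms."""
    types = set()
    for cls in TYPES.get(entity, ()):
        types |= superclasses(cls)
    return types


def constraint_violations():
    """Relation assertions whose subject or object lacks the declared domain or range type."""
    bad = []
    for s, p, o in FACTS:
        if p not in DOMAIN_RANGE:
            continue
        dom, rng = DOMAIN_RANGE[p]
        if dom not in entailed_types(s) or rng not in entailed_types(o):
            bad.append((s, p, o))
    return bad
```

Here `London` is deduced to be a `Place` and a `Thing` although only `City` is asserted, while `("Bob", "birthPlace", "Ada")` is flagged because `Ada` is not known to be a `Place`.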
We propose a method for predicting missing type assertions using hierarchical multilabel classifiers and ingoing/outgoing links as features. We also present an approach to the detection of relation assertion errors which exploits type and path patterns in the graph. Moreover, we propose an approach to the correction of relation errors originating from confusions between entities. Also in the ABox refinement direction, we propose a knowledge graph model and process for synthesizing knowledge graphs for benchmarking ABox completion methods. In Part II we address the TBox refinement problem. We propose methods for inducing flexible relation constraints from the ABox, which are expressed using SHACL. We introduce an ILP refinement step which exploits correlations between numerical attributes and relations in order to efficiently learn Horn rules with numerical attributes. Finally, we investigate the introduction of lexical information from textual corpora into the ILP algorithm in order to improve the quality of induced class expressions.
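The idea of predicting types from ingoing and outgoing links can be illustrated with a small sketch. It is not the thesis's method: in place of hierarchical multilabel classifiers it substitutes a simple 1-nearest-neighbour rule over the Jaccard similarity of link features, and it keeps predictions consistent with the class hierarchy by adding every superclass of a predicted type. The data layout, the `parent` map, and the function names are hypothetical.

```python
from collections import defaultdict


def link_features(triples):
    """Binary link features per entity: 'out:p' if it is the subject of a
    p-triple, 'in:p' if it is the object of one."""
    feats = defaultdict(set)
    for s, p, o in triples:
        feats[s].add("out:" + p)
        feats[o].add("in:" + p)
    return feats


def jaccard(a, b):
    """Jaccard similarity of two feature sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def predict_types(entity, feats, known_types, parent):
    """Copy the types of the most similar typed entity, then close them upward
    through the hypothetical `parent` map (class -> direct superclass)."""
    candidates = [e for e in known_types if e != entity]
    if not candidates:
        return set()
    x = feats.get(entity, set())
    best = max(candidates, key=lambda e: (jaccard(x, feats.get(e, set())), e))
    predicted = set()
    for cls in known_types[best]:
        while cls is not None:
            predicted.add(cls)
            cls = parent.get(cls)
    return predicted
```

For instance, if both `Paris` and `London` are objects of `birthPlace` and subjects of `country`, and `London` is typed `City` with `City` below `Place` below `Thing`, the sketch predicts `{City, Place, Thing}` for `Paris`. A single nearest neighbour is a crude stand-in; it only shows that entities with similar link profiles tend to share types.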

    Similarity measures and algorithms for cartographic schematization


    Advances in knowledge discovery and data mining Part II

    19th Pacific-Asia Conference, PAKDD 2015, Ho Chi Minh City, Vietnam, May 19-22, 2015, Proceedings, Part II