6 research outputs found

    Social Network Analysis Using a Multi-agent System: A School System Case

    The quality of K-12 education has been a major concern in the nation for years. School systems, like many other social networks, appear to have a hierarchical structure, and understanding this structure could be key to better evaluating student performance and improving school quality. Much prior research has focused on detecting hierarchical structure using hierarchical clustering algorithms. In contrast to existing methods, we design an interaction-based similarity measure to drive hierarchical clustering and detect hierarchical structures in social networks (e.g., school district networks). The method uses a multi-agent system because it is based on agent interactions. With the network structure detected, we also build a model, inspired by the MAXQ algorithm, that decomposes the funding-policy task into subtasks and then evaluates these subtasks using funding distribution policies from past years, looking for possible relationships between student performance and funding policy. For our experiments, we use real school data from Bexar County's 15 school districts. The first result shows that our interaction-based method generates meaningful clusterings and dendrograms for social networks. Our policy evaluation model evaluates funding policies from the past three years in Bexar County and concludes that increased funding does not necessarily have a positive impact on student performance; it is generally not the case that more spending is better.
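
    The abstract does not give the exact form of the interaction-based similarity, but a minimal sketch of the clustering step might look like the following, assuming a hypothetical pairwise interaction-count matrix that is converted into a distance matrix and fed to standard agglomerative clustering (the district names and counts are purely illustrative):

```python
# Minimal sketch: agglomerative clustering driven by an interaction-based
# similarity matrix. The similarity definition (normalized pairwise
# interaction counts) and the district names are illustrative assumptions;
# the thesis derives similarity from agent interactions in a multi-agent
# system.
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram
from scipy.spatial.distance import squareform

districts = ["D1", "D2", "D3", "D4"]       # hypothetical districts
interactions = np.array([                  # hypothetical interaction counts
    [0, 9, 2, 1],
    [9, 0, 3, 1],
    [2, 3, 0, 8],
    [1, 1, 8, 0],
], dtype=float)

# More interactions means more similar; convert similarity to distance.
distance = 1.0 - interactions / interactions.max()
np.fill_diagonal(distance, 0.0)

# Average-linkage clustering on the condensed distance matrix, then the
# dendrogram that exposes the detected hierarchical structure.
Z = linkage(squareform(distance, checks=False), method="average")
dendrogram(Z, labels=districts)
plt.show()
```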

    Large-scale clique cover of real-world networks

    The edge clique cover (ECC) problem deals with discovering a set of (possibly overlapping) cliques in a given graph that together cover all of the graph's edges. The problem finds applications ranging from social networks to compiler optimization and stringology. We consider several variants of the ECC problem, using classical quality measures (such as the number of cliques) as well as new ones. We describe efficient heuristic algorithms, the fastest of which runs in O(m·dG) time for a graph with m edges and degeneracy dG (also known as the k-core number). For large real-world networks with millions of nodes, such as social networks, an algorithm should have (almost) linear running time to be practical. Our algorithm for finding ECCs of large networks achieves linear-time performance in practice because dG is small on real-world networks, as our experiments on networks with thousands to several million nodes show.
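
    The abstract does not spell out the heuristic itself. A common greedy baseline for edge clique cover, which the O(m·dG) algorithm described above presumably refines, extends each uncovered edge to a maximal clique by repeatedly intersecting neighborhoods. A minimal sketch, where the graph representation and tie-breaking are assumptions and this is not claimed to be the paper's exact algorithm:

```python
# Minimal sketch of a greedy edge clique cover heuristic: take each
# uncovered edge, grow it into a maximal clique by intersecting the
# neighborhoods of current members, and mark the clique's edges covered.
# This is a generic greedy baseline, not the paper's exact algorithm.
def greedy_ecc(adj):
    """adj: dict mapping node -> set of neighbors (undirected graph)."""
    covered = set()          # frozensets of already-covered edges
    cliques = []
    for u in adj:
        for v in adj[u]:
            edge = frozenset((u, v))
            if edge in covered:
                continue
            clique = {u, v}
            candidates = adj[u] & adj[v]   # nodes adjacent to all members
            while candidates:
                w = candidates.pop()
                clique.add(w)
                candidates &= adj[w]
            # Mark every edge inside the new clique as covered.
            members = list(clique)
            for i in range(len(members)):
                for j in range(i + 1, len(members)):
                    covered.add(frozenset((members[i], members[j])))
            cliques.append(clique)
    return cliques

# Example: two triangles sharing an edge are covered by two cliques.
graph = {1: {2, 3}, 2: {1, 3, 4}, 3: {1, 2, 4}, 4: {2, 3}}
print(greedy_ecc(graph))   # e.g. [{1, 2, 3}, {2, 3, 4}]
```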

    Complex network tools to enable identification of a criminal community

    Retrieving criminal ties and mining evidence from an organised crime incident, for example money laundering, has been a difficult task for crime investigators due to the involvement of different groups of people and their complex relationships. Extracting criminal associations from an enormous amount of raw data and representing them explicitly is tedious and time-consuming. A study of the complex networks literature reveals that graph-based detection methods have not, as yet, been used for money laundering detection. In this research, I explore the use of complex network analysis to identify the money laundering criminals' communication associations, that is, the important people who communicate between known criminals and the reliance of the known criminals on the other individuals in a communication path. For this purpose, I use the publicly available Enron email database, which happens to contain the communications of 10 criminals who were convicted of a money laundering crime. I show that my new shortest paths network search algorithm (SPNSA), which combines shortest paths with network centrality measures, is better able to isolate and identify criminals' connections than existing community detection algorithms and k-neighbourhood detection. The SPNSA is validated using three different investigative scenarios; in each scenario, the criminal network graphs formed are small and sparse and hence suitable for further investigation. My research starts by isolating emails with 'BCC' recipients, requiring at least two BCC'd recipients per email. 'BCC' recipients are inherently secretive, and the email connections imply a trust relationship between the sender and the 'BCC' recipients. No previous studies have used only emails with 'BCC' recipients to form a trust network, which leads me to analyse the 'BCC' email group separately. SPNSA is able to identify the group of criminals and their active intermediaries in this 'BCC' trust network. Corroborating this information with published accounts of the crimes that led to the collapse of Enron yields the discovery of persons of interest who were hidden among the criminals and could have contributed to the money laundering activity. For validation, larger email datasets that comprise all 'BCC' and 'TO/CC' email transactions are used. In comparison with existing community detection algorithms, SPNSA performs much better at isolating the sub-networks that contain criminals. I have also adapted the betweenness centrality measure to develop a reliance measure, which calculates the reliance of a criminal on an intermediate node and ranks the importance of each intermediate node by this reliance value. Both SPNSA and the reliance measure could be used as primary investigation tools to investigate connections between criminals in a complex network.
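
    The exact SPNSA procedure is detailed in the thesis; a minimal sketch of the underlying idea, using networkx, extracts the union of shortest paths between known criminals from a communication graph and ranks the intermediaries that appear on them by betweenness centrality. The edge list and criminal names below are placeholders, and the plain betweenness ranking stands in for the thesis's dedicated reliance measure:

```python
# Minimal sketch in the spirit of SPNSA: take all shortest paths between
# pairs of known criminals in an email communication graph, keep the
# union as a small subgraph, and rank intermediate nodes by betweenness
# centrality within it. Edges and the criminal list are hypothetical.
import itertools
import networkx as nx

G = nx.Graph()
G.add_edges_from([
    ("crimA", "aide1"), ("aide1", "crimB"),   # short path through aide1
    ("crimA", "aide2"), ("aide2", "aide3"),
    ("aide3", "crimB"), ("crimB", "clerk"),
])
criminals = {"crimA", "crimB"}                # known convicted actors

# Union of all shortest paths between every pair of known criminals.
nodes = set()
for s, t in itertools.combinations(criminals, 2):
    for path in nx.all_shortest_paths(G, s, t):
        nodes.update(path)
sub = G.subgraph(nodes)

# Intermediaries ranked by betweenness within the extracted subgraph;
# the thesis adapts this idea into its reliance measure.
ranking = nx.betweenness_centrality(sub)
for node, score in sorted(ranking.items(), key=lambda kv: -kv[1]):
    if node not in criminals:
        print(node, round(score, 3))
```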

    Stylistics versus Statistics: A corpus linguistic approach to combining techniques in forensic authorship analysis using Enron emails

    This thesis empirically investigates how a corpus linguistic approach can address the main theoretical and methodological challenges facing the field of forensic authorship analysis. Linguists approach the problem of questioned authorship from the theoretical position that each person has their own distinctive idiolect (Coulthard 2004: 431). However, the notion of idiolect has come under scrutiny in forensic linguistics over recent years for being too abstract to be of practical use (Grant 2010; Turell 2010). At the same time, two competing methodologies have developed in authorship analysis: on the one hand, qualitative stylistic approaches, and on the other, statistical 'stylometric' techniques. This study uses a corpus of over 60,000 emails and 2.5 million words written by 176 employees of the former American company Enron to tackle these issues in the contexts of both authorship attribution (identifying authors using linguistic evidence) and author profiling (predicting authors' social characteristics using linguistic evidence). Analyses reveal that even in shared communicative contexts, and when using very common lexical items, individual Enron employees produce distinctive collocation patterns and lexical co-selections. In turn, these idiolectal elements of linguistic output can be captured and quantified by word n-grams (strings of n words). An attribution experiment is performed using word n-grams to identify the authors of anonymised email samples. Results of the experiment are encouraging, and it is argued that the approach developed here offers a means by which stylistic and statistical techniques can complement each other. Finally, quantitative and qualitative analyses are combined in the sociolinguistic profiling of Enron employees by gender and occupation. Current author profiling research is exclusively statistical in nature. However, the findings here demonstrate that when statistical results are augmented by qualitative evidence, the complex relationship between language use and author identity can be more accurately observed.
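
    A minimal sketch of the core attribution idea, comparing word n-gram profiles of candidate authors against a disputed sample with a simple Jaccard overlap. The texts, the n-gram size, and the similarity function are illustrative assumptions, not the thesis's exact experimental protocol:

```python
# Minimal sketch of word-n-gram authorship attribution: build a set of
# word n-grams per candidate author, then attribute a disputed sample
# to the author whose profile overlaps it most (Jaccard similarity).
# Texts, n, and the similarity measure are illustrative assumptions.
def word_ngrams(text, n=2):
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

known = {
    "author1": "please call me when you get a chance thanks",
    "author2": "let me know if you have any questions regards",
}
disputed = "call me when you get a chance"

profiles = {name: word_ngrams(text) for name, text in known.items()}
query = word_ngrams(disputed)
best = max(profiles, key=lambda name: jaccard(profiles[name], query))
print(best)   # -> author1 for these toy texts
```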