11 research outputs found

    Detecting fraud: Utilizing new technology to advance the audit profession


    Modeling Suspicious Email Detection using Enhanced Feature Selection

    The paper presents a suspicious email detection model that incorporates enhanced feature selection. We propose the use of feature selection strategies alongside classification techniques for detecting emails with terrorist content. The model focuses on the evaluation of machine learning algorithms such as decision tree (ID3), logistic regression, Naïve Bayes (NB), and Support Vector Machine (SVM) for detecting emails containing suspicious content. In the literature, various algorithms have achieved good accuracy on this task; however, their results can be further improved with appropriate feature selection mechanisms. We identify a specific feature selection scheme that improves the performance of the existing algorithms.
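    As a rough illustration of the kind of pipeline the abstract describes, the sketch below pairs chi-squared feature selection with the four classifiers named above, using scikit-learn; the toy emails, labels, and k=10 cutoff are placeholders, not the paper's data or configuration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier

# Toy data: 1 = suspicious, 0 = benign (placeholders, not the paper's corpus).
emails = [
    "please review the attached meeting agenda",
    "transfer the funds before the deadline and tell no one",
    "lunch is moved to 1pm tomorrow",
    "destroy the documents after reading this message",
]
labels = [0, 1, 0, 1]

classifiers = [
    ("decision tree (ID3-style)", DecisionTreeClassifier(criterion="entropy")),
    ("logistic regression", LogisticRegression()),
    ("naive Bayes", MultinomialNB()),
    ("linear SVM", LinearSVC()),
]

for name, clf in classifiers:
    pipe = Pipeline([
        ("tfidf", TfidfVectorizer()),         # turn text into term weights
        ("select", SelectKBest(chi2, k=10)),  # keep the 10 most class-informative terms
        ("clf", clf),
    ])
    pipe.fit(emails, labels)
    print(f"{name}: training accuracy {pipe.score(emails, labels):.2f}")
```

    The point of the selection step is that each classifier sees only the terms most associated with the class labels, which is the general mechanism by which feature selection can lift the accuracy of otherwise unchanged algorithms.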

    Continuous Auditing: Technology Involved

    This study concentrates on the latest factor driving change in accountancy: technological advances. With a great deal of creativity and ingenuity, accountants around the world have found solutions to one of the problems that arose, increased fraudulent behavior, although those solutions at times involved a level of technology not fully understood by all of its users. This paper focuses on one way technology has been applied in response: continuous auditing and monitoring. The idea of continuously auditing and monitoring a company's events and transactions is not new, but innovations in technology have redefined it. Through explanation and demonstration of three continuous auditing models, this paper attempts to shed light on the topic and give insight into the technology required to carry out such a practice effectively. Possible drawbacks and obstacles to incorporating such a system into a company's day-to-day activities are also examined, and recommendations are made.
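    As a loose illustration of what continuous monitoring can mean in practice (not one of the paper's three models), the sketch below applies simple audit rules to every transaction as it arrives, rather than to a sample after year-end; the record layout and thresholds are assumptions.

```python
from dataclasses import dataclass

@dataclass
class Transaction:
    txn_id: str
    amount: float
    approver: str
    submitter: str

def audit_rules(txn: Transaction, seen_ids: set) -> list:
    """Return the names of any rules this transaction violates."""
    flags = []
    if txn.txn_id in seen_ids:
        flags.append("duplicate transaction id")
    if txn.amount > 10_000:             # assumed materiality threshold
        flags.append("amount above approval threshold")
    if txn.approver == txn.submitter:   # segregation-of-duties check
        flags.append("self-approved transaction")
    seen_ids.add(txn.txn_id)
    return flags

seen = set()
for txn in [Transaction("T1", 250.0, "alice", "bob"),
            Transaction("T2", 50_000.0, "carol", "carol")]:
    for flag in audit_rules(txn, seen):
        print(txn.txn_id, "->", flag)
```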

    E-Mail Management: A Techno-Managerial Research Perspective

    A panel session on e-mail management was organized at ICIS 2005 in Las Vegas, Nevada. The panelists provided perspectives from industry as well as academia and discussed various problems in e-mail management, research methodologies to address these problems, research opportunities, and an integrative framework for research on e-mail management. This paper succinctly summarizes the presentations made by the panelists during the session and the issues raised by the audience. A rich bibliography and Web links are provided at the end for researchers interested in this area.

    Fool’s Errand: Looking at April Fools Hoaxes as Disinformation through the Lens of Deception and Humour

    Every year on April 1st, people play practical jokes on one another and news websites fabricate false stories with the goal of making fools of their audience. In an age of disinformation, with Facebook under fire for allowing “Fake News” to spread on its platform, every day can feel like April Fools’ Day. We create a dataset of April Fools’ hoax news articles and build a set of features based on past research examining deception, humour, and satire. Analysis of our dataset and features suggests that the structural complexity and level of detail in a text are the most important types of feature for characterising April Fools’ hoaxes. We propose that these features are also very useful for understanding Fake News and disinformation more widely.
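    A hedged sketch of the kind of structural-complexity and level-of-detail features described above; the published feature set differs, and the measures below are only rough proxies.

```python
import re

def structural_features(text: str) -> dict:
    """Compute simple structure and detail proxies for one article."""
    sentences = [s for s in re.split(r"[.!?]+\s*", text) if s]
    words = re.findall(r"[A-Za-z']+", text)
    return {
        "num_sentences": len(sentences),
        "avg_sentence_len": len(words) / max(len(sentences), 1),
        "avg_word_len": sum(map(len, words)) / max(len(words), 1),
        # crude proxies for level of detail:
        "num_numbers": len(re.findall(r"\d+", text)),
        "num_capitalised": sum(1 for w in words if w[0].isupper()),
    }

print(structural_features(
    "NASA announced on 1 April that the Moon would be painted blue. "
    "Officials declined to comment."))
```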

    A systematic survey of online data mining technology intended for law enforcement

    As an increasing amount of crime takes on a digital aspect, law enforcement bodies must tackle an online environment that generates huge volumes of data. With manual inspection becoming increasingly infeasible, law enforcement bodies are optimising online investigations through data-mining technologies. Such technologies must be well designed and rigorously grounded, yet no survey of the online data-mining literature exists that examines their techniques, applications, and rigour. This article remedies that gap through a systematic mapping study of online data-mining literature that visibly targets law enforcement applications, using evidence-based survey practices to produce a replicable analysis that can be methodologically examined for deficiencies.

    Let’s lie together: Co-presence effects on children’s deceptive skills


    Identity Resolution in Email Collections

    Access to historically significant email collections poses challenges that arise less often in personal collections. Most notably, people exploring a large collection of emails that they did not send or receive may not be very familiar with the discussions it contains. They not only need to understand the topical content of those discussions, but would also find it useful to understand who the people sending, receiving, or mentioned in them were. This dissertation tackles the problem of resolving personal identity in the context of large email collections. In such collections, a common name (e.g., John) might easily refer to any one of several hundred people; when one of these people is mentioned in an email, the question arises: "who is that John?" To resolve the identity of people in an email collection, two problems need to be solved: (1) modeling the identity of the participants in that collection, and (2) resolving name-mentions (appearing in the body of messages) to those identities.

    To tackle the first problem, a simple computational model of identity is presented, built on extracting unambiguous references to people (e.g., full names from headers, or nicknames from free-text signatures) from the whole collection. To tackle the second problem, a generative probabilistic approach that leverages the model of identity to resolve mentions is presented. The approach is motivated by intuitions about the way people refer to others in email; it expands the context surrounding a mention in four directions: the message where the mention was observed, the thread that includes that message, topically related messages, and messages sent or received by the original communicating parties. It relies on less ambiguous references (e.g., email addresses or full names) observed in some context of a given mention to rank potential referents of that mention. In order to jointly resolve all mentions in the collection, a parallel implementation using the MapReduce distributed-programming framework is presented. The implementation decomposes the resolution process into subcomponents that fit the MapReduce task model well. At the heart of that implementation, a parallel algorithm for efficiently computing pairwise document similarity in large collections is proposed as a general solution that can be used for scalable context expansion of all mentions, as well as for other applications.

    The resolution approach compares favorably with previously reported techniques on the small test collections (sets of mention-queries that were manually resolved beforehand) used to evaluate the task in the literature. However, the mention-queries in those collections, besides being relatively few in number, all refer to people for whom a substantial amount of evidence would be expected to be available in the collection, thus omitting the "long tail" of the identity distribution for which less evidence is available. This motivated the development of a new test collection that is now the largest and best-balanced test collection available for the task. To build this collection, a user study was conducted that also provided insight into the difficulty of the task, how time-consuming it is when humans perform it, and how reliably they perform it.

    The study revealed that at least 80% of the 584 annotated mentions were resolvable to people who had sent or received email within the same collection. The new test collection was used to experimentally evaluate the resolution system. The results highlight the importance of the social context (messages sent or received by the original communicating parties) when resolving mentions in email. Moreover, they show that combining evidence from multiple types of context yields better resolution than any individual context alone. The one-best selection is correct 74% of the time when tested on the full set of mention-queries, and 51% of the time on the mention-queries labeled "hard" by the annotators. Experiments with iterative reformulation of the resolution algorithm produced modest gains, and only in the second iteration of the social context expansion.
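    As a minimal, single-machine illustration of the MapReduce-style pairwise similarity computation described above, the sketch below maps each term to the documents containing it, then accumulates one partial inner product per co-occurring document pair; the messages and raw-count weights are toy stand-ins.

```python
from collections import Counter, defaultdict
from itertools import combinations

docs = {
    "msg1": "john sent the budget report",
    "msg2": "forward the budget report to john",
    "msg3": "lunch on friday",
}

# "Map" phase: build term -> [(doc, weight)] postings (raw counts as weights).
postings = defaultdict(list)
for doc_id, text in docs.items():
    for term, count in Counter(text.split()).items():
        postings[term].append((doc_id, count))

# "Reduce" phase: each term contributes a partial product to every pair of
# documents it appears in; summing the partials gives pairwise inner products.
similarity = defaultdict(float)
for term, posting in postings.items():
    for (d1, w1), (d2, w2) in combinations(posting, 2):
        similarity[tuple(sorted((d1, d2)))] += w1 * w2

for pair, score in sorted(similarity.items(), key=lambda kv: -kv[1]):
    print(pair, score)
```

    Decomposing the computation by term is what makes the algorithm parallelise well: each term's posting list can be processed independently, which fits the MapReduce task model the dissertation builds on.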

    Complex network tools to enable identification of a criminal community

    Retrieving criminal ties and mining evidence from an organised crime incident, for example money laundering, is a difficult task for crime investigators because of the involvement of different groups of people and their complex relationships. Extracting criminal associations from an enormous amount of raw data and representing them explicitly is tedious and time-consuming. A study of the complex networks literature reveals that graph-based detection methods have not yet been used for money laundering detection. In this research, I explore the use of complex network analysis to identify the communication associations of money laundering criminals, that is, the important people who communicate between known criminals and the reliance of the known criminals on the other individuals in a communication path. For this purpose, I use the publicly available Enron email database, which contains the communications of 10 criminals who were convicted of money laundering. I show that my new shortest paths network search algorithm (SPNSA), which combines shortest paths and network centrality measures, is better able to isolate and identify criminals' connections than existing community detection algorithms and k-neighbourhood detection. The SPNSA is validated using three different investigative scenarios; in each scenario, the criminal network graphs formed are small and sparse and hence suitable for further investigation.

    My research starts by isolating emails with 'BCC' recipients, keeping those with at least two recipients bcc-ed. 'BCC' recipients are inherently secretive, and the email connections imply a trust relationship between the sender and the 'BCC' recipients. No previous study has used only emails with 'BCC' recipients to form a trust network, which leads me to analyse the 'BCC' email group separately. SPNSA is able to identify the group of criminals and their active intermediaries in this 'BCC' trust network. Corroborating this information with published information about the crimes that led to the collapse of Enron yields the discovery of persons of interest who were hidden between criminals and could have contributed to the money laundering activity. For validation, larger email datasets comprising all 'BCC' and 'TO/CC' email transactions are used. In comparison with existing community detection algorithms, SPNSA performs much better at isolating the sub-networks that contain criminals. Finally, I have adapted the betweenness centrality measure to develop a reliance measure, which calculates the reliance of a criminal on an intermediate node and ranks the importance of each intermediate node by this reliance value. Both SPNSA and the reliance measure could be used as primary investigation tools for examining connections between criminals in a complex network.
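    A simplified sketch of the SPNSA idea using networkx: connect known criminal nodes via shortest paths in the email graph, keep only the nodes on those paths, and rank the intermediaries by betweenness centrality as a stand-in for the thesis's reliance measure. The toy graph is illustrative, not the Enron data.

```python
from itertools import combinations
import networkx as nx

email_graph = nx.Graph()
email_graph.add_edges_from([
    ("criminal_a", "broker"), ("broker", "criminal_b"),
    ("criminal_b", "assistant"), ("assistant", "criminal_c"),
    ("criminal_a", "outsider"), ("outsider", "bystander"),
])
known_criminals = ["criminal_a", "criminal_b", "criminal_c"]

# Collect every node lying on a shortest path between two known criminals.
nodes_on_paths = set(known_criminals)
for src, dst in combinations(known_criminals, 2):
    for path in nx.all_shortest_paths(email_graph, src, dst):
        nodes_on_paths.update(path)

# The induced subgraph is small and sparse, suitable for investigation.
subnet = email_graph.subgraph(nodes_on_paths)
reliance = nx.betweenness_centrality(subnet)
for node, score in sorted(reliance.items(), key=lambda kv: -kv[1]):
    print(node, round(score, 3))
```

    In this toy example the broker and assistant surface with high centrality while the unconnected outsider and bystander are pruned away, which is the behaviour the thesis relies on to expose intermediaries hidden between criminals.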

    Exploring Novel Datasets and Methods for the Study of False Information

    False information has increasingly become a subject of much discussion. Recently, disinformation has been linked to massive social harm, the decline of democracy, and the hindering of global efforts in an international health crisis. In computing, and specifically in Natural Language Processing (NLP), much effort has been put into tackling this problem, leading to an increase in research on automated fact-checking and the language of disinformation. However, current research suffers from looking at a limited variety of sources. Much focus has, understandably, been given to platforms such as Twitter, Facebook, and WhatsApp, as well as to traditional news articles online. Few works in NLP have looked at the specific communities where false information ferments, and there has also been something of a topical constraint, with most examples of “Fake News” relating to current political issues. This thesis contributes to this rapidly growing research area by looking more widely for new sources of data and developing methods to analyse them. Specifically, it introduces two new datasets to the field and performs analyses on both. The first, a corpus of April Fools’ hoaxes, is analysed with a feature-driven approach to examine how well different features generalise in the classification of false information. This is the first corpus of April Fools’ news articles, and it is publicly available to researchers. The second dataset, a corpus of online Flat Earth communities, is also the first of its kind. In addition to performing the first NLP analysis of the language of Flat Earth fora, the thesis explores the existence of sub-groups within these communities and analyses language change. To support this analysis, language change methods are surveyed, and a new method for comparing the language change of groups over time is developed. The methods used, brought together from both NLP and Corpus Linguistics, provide new insight into the language of false information and the way communities discuss it.
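    As one hedged illustration of comparing a group's language change over time (not the thesis's actual method), the sketch below measures how far a community's word distribution drifts between two time slices using Jensen-Shannon divergence; the two-post corpora are placeholders.

```python
from collections import Counter
from math import log2

def distribution(texts):
    """Unigram probability distribution over a list of documents."""
    counts = Counter(w for t in texts for w in t.lower().split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def js_divergence(p, q):
    """Jensen-Shannon divergence between two word distributions."""
    vocab = set(p) | set(q)
    m = {w: (p.get(w, 0) + q.get(w, 0)) / 2 for w in vocab}
    def kl(a):
        return sum(a[w] * log2(a[w] / m[w]) for w in a if a[w] > 0)
    return (kl(p) + kl(q)) / 2

group_2015 = ["the earth is flat", "look at the horizon"]
group_2019 = ["nasa lies about the globe", "the horizon is always flat"]
drift = js_divergence(distribution(group_2015), distribution(group_2019))
print(f"language drift for this group: {drift:.3f}")
```

    Computing the same drift score per group lets the scores be compared across communities, which is the general shape of the comparison the thesis sets out to make.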