
    Exploring Text Mining and Analytics for Applications in Public Security: An in-depth dive into a systematic literature review

    Text mining and related analytics have emerged as a technological approach to support human activities by extracting useful knowledge from texts in several formats. From a managerial point of view, they can help organizations in planning and decision-making processes, providing information that was not previously evident in textual materials produced internally or externally. In this context, within the public/governmental scope, public security agencies are major beneficiaries of the tools associated with text mining, in several respects, from applications in the criminal area to the collection of people's opinions and sentiments about the actions taken to promote their welfare. This article reports the details of a systematic literature review focused on identifying the main areas of text mining application in public security, the most recurrent technological tools, and future research directions. The searches covered four major article databases (Scopus, Web of Science, IEEE Xplore, and ACM Digital Library), selecting 194 works published between 2014 and the first half of 2021, across journals, conferences, and book chapters. Several findings concerning the targets of the literature review are presented in the results of this article.

    Imbalanced learning in assessing the risk of corruption in public administration

    This research aims to identify corruption among civil servants in the Federal District of the Brazilian Public Administration. For this purpose, a predictive model was created by integrating data from eight different systems and applying logistic regression to real datasets that, by their nature, contain a low percentage of examples of the class of interest, a situation known as class imbalance. In this study, the class imbalance was considered extreme, at a ratio of 1:707 or, in percentage terms, 0.14% of the class of interest relative to the population. Two approaches were used: balancing the data with the resampling-based Synthetic Minority Oversampling Technique (SMOTE), and applying algorithms with specific parameterization to capture the patterns of the minority class without introducing bias from the dominant class. The best modeling result was obtained with the second approach, yielding an area under the ROC curve of around 0.69. Based on sixty-eight features, the respective coefficients corresponding to the risk factors for corruption were found. A subset of twenty features is discussed in order to find practical utility after the discovery process. L. Cavique would like to thank the FCT Projects of Scientific Research and Technological Development in Data Science and Artificial Intelligence in Public Administration, 2018–2022 (DSAIPA/DS/0039/2018), for their support.
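    The abstract contrasts two ways of handling extreme class imbalance: resampling with SMOTE versus parameterizing the learner itself. The sketch below is a hypothetical illustration of that contrast using scikit-learn and imbalanced-learn on synthetic data; the feature matrix, the 1:707-style construction, and any resulting scores are placeholders, not the study's real systems or results.

```python
import numpy as np
from imblearn.over_sampling import SMOTE
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n, n_pos = 20_000, 28                 # roughly the extreme minority ratio reported
y = np.zeros(n, dtype=int)
y[:n_pos] = 1
rng.shuffle(y)
X = rng.normal(size=(n, 68))          # 68 candidate risk features (synthetic)
X[y == 1] += 0.5                      # give the minority class some signal

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Approach 1: rebalance the training data with SMOTE, then fit normally.
X_res, y_res = SMOTE(random_state=0).fit_resample(X_tr, y_tr)
m1 = LogisticRegression(max_iter=1000).fit(X_res, y_res)

# Approach 2: keep the original data and reweight the classes instead.
m2 = LogisticRegression(class_weight="balanced", max_iter=1000).fit(X_tr, y_tr)

for name, m in [("SMOTE", m1), ("class-weight", m2)]:
    print(name, roc_auc_score(y_te, m.predict_proba(X_te)[:, 1]))
# The fitted coefficients (m2.coef_) can then be read as candidate risk factors.
```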

    The forensic accounting and corporate fraud

    This study analyzes the characteristics of forensic accounting services performed by accounting firms in Brazil, using an exploratory approach. At the end of the study, there is a discourse analysis of a speech made by the CEO of one of the key players in forensic accounting services (Kroll) in Brazil. In order to guide this reflection, we pose the following question: what characteristic of forensic accounting substantiates professional accountants' innovation to curb corporate accounting malpractices? To this end, we accept the premise that the bone of contention in some unhealthy business environments is the inability of an auditor to track fraud. We used the icons (categories and/or nodes) that dynamically represent formalism in the theory of self-reproduction to explain the patterns found in the speech. Our findings lead us to conclude that the idea that fraud has been least detected by auditors begins to gain shape, as auditors are more adequately trained to detect fraud instead of emphasizing the traditional segregation of duties and safeguarding of assets.

    The role of technology in improving the Customer Experience in the banking sector: a systematic mapping study

    Information Technology (IT) has revolutionized the way we manage our money. The adoption of innovative technologies in banking scenarios allows customers to access old and new financial services in a faster and more secure, comfortable, rewarding, and engaging way. The number, performance, and seamless integration of these innovations are a driver for banks to retain their customers and avoid costly defections. The literature is rich in works reporting on the use of technology with a direct or indirect impact on the experience of banking customers. Some mapping studies about the adoption of technologies in the field exist, but they are either specific to particular technologies (e.g., only Artificial Intelligence) or, conversely, too generic (e.g., reviewing the adoption of technologies to support any kind of banking process). A specific research effort on the combined domain of technology and Customer Experience (CX) is therefore missing. This paper aims to overcome the following gaps: the lack of a comprehensive map of the research conducted in the field over the past decade, the absence of a discussion of current research trends in top publications and journals, and the identification of the next research challenges. To address these limitations, we designed and submitted 7 different queries to pull papers out of 4 popular scientific databases. From an initial set of 6,756 results, we identified a set of 89 primary studies that we thoroughly analyzed. A selection of the top 20% of works allowed us to identify the best-performing technologies as well as other promising ones that have not yet been experimented with in the field. The main results show that the combined study of technology and CX in the banking sector is not approached systematically, and thus the development of a new, specific research line is needed.

    Deep learning and explainable artificial intelligence techniques applied for detecting money laundering – a critical review

    Money laundering has been a global issue for decades and is one of the major threats to the economy and society. Government, regulatory, and financial institutions are combating it together in their respective capacities; however, billions of dollars in fines imposed by authorities still make the headlines. High-speed internet services have enabled financial institutions to deliver a better customer experience through multi-channel engagement, which has led to exponential growth in transactions and new avenues for fraudsters to launder money. The literature shows the use of statistical methods, data mining, and Machine Learning (ML) techniques for money laundering detection, but limited research on Deep Learning (DL) techniques, primarily due to the lack of model interpretability and explainability of the decisions made. Several studies have been conducted on the application of ML to Anti-Money Laundering (AML) and on Explainable Artificial Intelligence (XAI) techniques in general, but studies on the use of DL techniques together with XAI are lacking. This paper aims to review the current state-of-the-art literature on DL together with XAI for identifying suspicious money laundering transactions and to identify future research areas. Key findings of the review are: researchers have preferred variants of Convolutional Neural Networks and AutoEncoders; graph deep learning together with natural language processing is emerging as an important technology for AML; XAI is not yet used in the AML domain; 51% of the ML methods used in AML are non-interpretable and 58% of studies used samples of old real data; and the key challenges for researchers are access to recent real transaction data, the scarcity of labelled training data, and highly imbalanced data. Future research directions include the application of XAI techniques to provide explainability, graph deep learning using natural language processing (NLP), unsupervised and reinforcement learning to handle the lack of labelled data, and joint research programs between the research community and industry to benefit from domain knowledge and controlled access to data.
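    Since the review names AutoEncoders as a preferred DL architecture, the sketch below shows, under simple assumptions, how an autoencoder can score transactions by reconstruction error; the per-feature error shown at the end is only a crude stand-in for a real XAI attribution (e.g. SHAP or LIME), and the PyTorch model, data, and layer sizes are illustrative, not drawn from any reviewed study.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
n_features = 12
x = torch.randn(5_000, n_features)           # stand-in transaction features

# Small fully connected autoencoder: compress, then reconstruct.
model = nn.Sequential(
    nn.Linear(n_features, 6), nn.ReLU(),
    nn.Linear(6, 3), nn.ReLU(),
    nn.Linear(3, 6), nn.ReLU(),
    nn.Linear(6, n_features),
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for _ in range(200):                          # short full-batch training loop
    opt.zero_grad()
    loss = loss_fn(model(x), x)
    loss.backward()
    opt.step()

with torch.no_grad():
    err = (model(x) - x) ** 2                 # per-feature squared error
    score = err.mean(dim=1)                   # per-transaction anomaly score
    top = score.topk(5).indices               # most suspicious transactions
    # crude "explanation": which feature drives each flagged score
    print(err[top].argmax(dim=1))
```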

    Unsupervised learning for anomaly detection in Australian medical payment data

    Fraudulent or wasteful medical insurance claims made by health care providers are costly for insurers. Typically, OECD healthcare organisations lose 3-8% of total expenditure due to fraud. As Australia’s universal public health insurer, Medicare Australia, spends approximately A$34 billion per annum on the Medicare Benefits Schedule (MBS) and Pharmaceutical Benefits Scheme, wasted spending of A$1–2.7 billion could be expected. However, fewer than 1% of claims to Medicare Australia are detected as fraudulent, below international benchmarks. Variation is common in medicine, and health conditions, along with their presentation and treatment, are heterogeneous by nature. Increasing volumes of data and rapidly changing patterns bring challenges which require novel solutions. Machine learning and data mining are becoming commonplace in this field, but no gold standard is yet available. In this project, requirements are developed for real-world application to compliance analytics at the Australian Government Department of Health and Aged Care (DoH), covering: unsupervised learning; problem generalisation; human interpretability; context discovery; and cost prediction. Three novel methods are presented which rank providers by potentially recoverable costs. These methods used association analysis, topic modelling, and sequential pattern mining to provide interpretable, expert-editable models of typical provider claims. Anomalous providers are identified through comparison to the typical models, using metrics based on the costs of excess or upgraded services. Domain knowledge is incorporated in a machine-friendly way in two of the methods through the use of the MBS as an ontology. Validation by subject-matter experts and comparison to existing techniques shows that the methods perform well. The methods are implemented in a software framework which enables rapid prototyping and quality assurance. The code is implemented at the DoH, and further applications as decision-support systems are in progress. The developed requirements will apply to future work in this field.
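    To illustrate the general idea of ranking providers by potentially recoverable costs against a "typical" peer model, the toy sketch below compares each provider's item counts to a peer-median profile and fee-weights the excess. It uses made-up claims and fees and a deliberately simple profile; the thesis's actual methods are based on association analysis, topic modelling, and sequential pattern mining.

```python
import pandas as pd

# Made-up claims: one row per billed service item.
claims = pd.DataFrame({
    "provider": ["A", "A", "A", "B", "B", "C", "C", "C", "C"],
    "item":     ["23", "23", "36", "23", "36", "23", "36", "36", "44"],
    "fee":      [39.1, 39.1, 75.05, 39.1, 75.05, 39.1, 75.05, 75.05, 105.55],
})

# Per-provider counts of each item.
counts = claims.pivot_table(index="provider", columns="item",
                            values="fee", aggfunc="size", fill_value=0)

typical = counts.median()                     # peer-median "typical" profile
fees = claims.groupby("item")["fee"].first()  # unit fee per item

# Potentially recoverable cost = fee-weighted billing above the typical profile.
excess = (counts - typical).clip(lower=0)
recoverable = (excess * fees).sum(axis=1).sort_values(ascending=False)
print(recoverable)
```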

    Systematic Literature Review: Current Products, Topic, and Implementation of Graph Database

    Planning, developing, and updating software cannot be separated from the role of the database. Among the various types of databases, graph databases (GDB) are considered to have several advantages over their predecessor, the relational database. Graph databases have thus become the latest trend in the software and data science industry, alongside the development of graph theory itself. The proliferation of research on GDB in the last decade raises questions about which topics are associated with GDB, which industries use GDB in their data processing, what the GDB models are, and which types of GDB have been used most frequently in the last few years. This article aims to answer these questions through a Literature Review, carried out by determining objectives, setting the limits of review coverage, defining inclusion and exclusion criteria, and performing data retrieval, data extraction, and quality assessment. Based on a review of 60 studies, the main research topics related to GDB are the Semantic Web, Big Data, and parallel computing. A total of 19 (30%) studies used Neo4j as their database. Apart from social networks, the industries that implement GDB the most are the transportation sector, scientific article networks, and general sectors such as enterprise data, biological data, and historical data. This Literature Review concludes that research on the topic of graph databases will continue to develop. This is shown by the breadth of application and the variety of new derivatives of GDB products offered by researchers to address existing problems.
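    As Neo4j is the most frequently used GDB in the reviewed studies, the minimal sketch below shows the kind of usage the review surveys via the official Neo4j Python driver; the URI, credentials, and Person/KNOWS schema are placeholders, not drawn from any reviewed study.

```python
from neo4j import GraphDatabase

# Placeholder connection details for a local Neo4j instance.
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

with driver.session() as session:
    # Create a tiny social graph.
    for a, b in [("Alice", "Bob"), ("Bob", "Carol")]:
        session.run(
            "MERGE (x:Person {name: $a}) "
            "MERGE (y:Person {name: $b}) "
            "MERGE (x)-[:KNOWS]->(y)",
            a=a, b=b,
        )
    # Query friends-of-friends: a multi-join in SQL, a single traversal here.
    result = session.run(
        "MATCH (p:Person {name: $name})-[:KNOWS*2]->(fof) "
        "RETURN DISTINCT fof.name AS name",
        name="Alice",
    )
    print([record["name"] for record in result])

driver.close()
```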

    Analyzing the Production and Use of Fossil Fuels: A Case for Data Mining and GIS

    As technology progresses and data grows both larger and more complex, techniques are being developed to keep up with the exponential growth of information. The term “data mining” is a blanket term used to describe an approach to finding anomalies and correlations in a large dataset. This approach involves leveraging data mining software to manipulate and prepare data, applying statistics to quantify trends and characteristics in the data from a high level, and potentially applying advanced techniques like machine learning to identify patterns that wouldn’t be apparent otherwise. In this case study, data mining aided a GIS in displaying substantial amounts of oil, gas, and coal data to make observations regarding two groups: OPEC and the largest non-OPEC fossil fuel producers from 1980 to 2020. To make more sophisticated observations and apply additional context to the trends observed in the data, population and GDP data for the same period were included in the analysis to enrich the hydrocarbon production and consumption data and to help explain how these valuable resources are traded and consumed. This case study applies appropriate data mining methods to feed data to a GIS, showcases trends that wouldn’t be apparent otherwise, and additionally identifies topics for further research.
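    A hypothetical sketch of the enrichment step described above: joining fossil-fuel production figures with population and GDP so a GIS layer can display per-capita intensities. The column names, country values, and figures are illustrative stand-ins, not the case study's actual data.

```python
import pandas as pd

production = pd.DataFrame({
    "country": ["Saudi Arabia", "United States", "Norway"],
    "year": [2020, 2020, 2020],
    "oil_kbd": [9264, 11283, 2001],            # thousand barrels per day (illustrative)
})
context = pd.DataFrame({
    "country": ["Saudi Arabia", "United States", "Norway"],
    "year": [2020, 2020, 2020],
    "population_m": [34.8, 331.5, 5.4],        # millions (illustrative)
    "gdp_busd": [703, 20937, 362],             # billion USD (illustrative)
})

enriched = production.merge(context, on=["country", "year"])
enriched["oil_bbl_per_capita_yr"] = (
    enriched["oil_kbd"] * 1_000 * 365 / (enriched["population_m"] * 1_000_000)
)
# The enriched table can then be joined to country geometries (e.g. with
# geopandas) and fed to the GIS for mapping.
print(enriched)
```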

    Combining Network Visualization and Data Mining for Tax Risk Assessment

    This paper presents a novel approach, called MALDIVE, to support tax administrations in tax risk assessment for discovering tax evasion and tax avoidance. MALDIVE relies on a network model describing several kinds of relationships among taxpayers. Our approach suitably combines various data mining and visual analytics methods to support public officers in identifying risky taxpayers. MALDIVE consists of a 4-step pipeline: (i) a social network is built from the taxpayer data, and several features of this network are extracted by computing both classical social network indexes and domain-specific indexes; (ii) an initial set of risky taxpayers is identified by applying machine learning algorithms; (iii) the set of risky taxpayers is possibly enlarged by means of an information diffusion strategy, and the output is shown to the analyst through a network visualization system; (iv) a visual inspection of the network is performed by the analyst in order to validate and refine the set of risky taxpayers. We discuss the effectiveness of the MALDIVE approach through both quantitative analyses and case studies performed on real data in collaboration with the Italian Revenue Agency.
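    A compressed, hypothetical sketch of steps (i)-(iii) of such a pipeline using networkx and scikit-learn; the random graph, the two node features, the placeholder labels, and the one-hop "diffusion" are simplifications for illustration only, not the paper's actual indexes, models, or data.

```python
import networkx as nx
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# (i) Build a taxpayer network and extract classical network indexes.
G = nx.erdos_renyi_graph(200, 0.03, seed=1)      # stand-in for taxpayer links
degree = dict(G.degree())
pagerank = nx.pagerank(G)
X = np.array([[degree[n], pagerank[n]] for n in G.nodes()])

# (ii) Flag an initial set of risky taxpayers with a supervised model
# (labels here are random placeholders for known audit outcomes).
rng = np.random.default_rng(1)
y = rng.integers(0, 2, size=len(G))
preds = RandomForestClassifier(random_state=1).fit(X, y).predict(X)
risky = {n for n, p in zip(G.nodes(), preds) if p == 1}

# (iii) Enlarge the set with a simple one-hop diffusion over the network.
enlarged = risky | {nb for n in risky for nb in G.neighbors(n)}

# (iv) The enlarged set would then be inspected visually by an analyst,
# e.g. by drawing the induced subgraph.
print(len(risky), len(enlarged))
```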

    Distributed detection of anomalous internet sessions

    Financial service providers are moving many services online, reducing their costs and facilitating customers' interaction. Unfortunately, criminals have quickly found several ways to avoid most security measures applied to browsers and banking sites. The use of highly dangerous malware has become the most significant threat, and traditional signature-detection methods are nowadays easily circumvented due to the amount of new samples and the use of sophisticated evasion techniques. Antivirus vendors and malware experts are pushed to seek new methodologies to improve the identification and understanding of malicious applications' behavior and their targets. Financial institutions are now playing an important role by deploying their own detection tools against malware that specifically affects their customers. However, most detection approaches tend to be based on byte sequences in order to create new signatures. This thesis is instead based on new sources of information: the web logs generated from each banking session, normal browser execution, and customers' mobile phone behavior.

    The thesis can be divided into four parts. The first part introduces the thesis, along with the presentation of the problems and the methodology used to perform the experimentation. The second part describes our contributions to the research, which fall into two areas:

    * Server side: web log analysis. We first focus on the real-time detection of anomalies through the analysis of web logs and the challenges introduced by the amount of information generated daily. We propose different techniques to detect multiple threats by deploying per-user and global models in a graph-based environment that increases performance over a set of highly related data.

    * Customer side: browser analysis. We deal with the detection of malicious behavior from the other side of a banking session: the browser. Malware samples must interact with the browser in order to retrieve or add information, and this interaction interferes with the normal behavior of the browser. We propose to develop models capable of detecting unusual patterns of function calls in order to determine whether a given sample is targeting a specific financial entity.

    In the third part, we propose to adapt our approaches to mobile phones and critical-infrastructure environments. The latest online banking attack techniques circumvent protection schemes such as password verification codes sent via SMS. Man-in-the-Mobile attacks are capable of compromising mobile devices and gaining access to SMS traffic; once the Transaction Authentication Number is obtained, criminals are free to make fraudulent transfers. We propose to model the behavior of applications related to messaging services in order to automatically detect suspicious actions. Real-time detection of unwanted SMS forwarding can improve the effectiveness of second-channel authentication and build on the detection techniques applied to browsers and web servers. Finally, we describe possible adaptations of our techniques to another area outside the scope of online banking: critical infrastructures, an environment with similar features, since the applications involved can also be profiled. Just like financial entities, critical infrastructures are experiencing an increase in the number of cyber attacks, and the sophistication of the malware samples used calls for new detection approaches. The aim of the last proposal is to demonstrate the validity of our approach in different scenarios.

    Finally, we conclude with a summary of our findings and directions for future work.
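    A toy sketch of the server-side idea: build a per-user baseline from web logs and score a new banking session by how far it deviates from that baseline. The log representation, the rarity threshold, and the scoring rule are illustrative assumptions, not the thesis's real per-user or global models.

```python
from collections import Counter

# Per-user counts of previously requested resources (placeholder history).
history = {
    "user1": Counter({"/login": 40, "/balance": 35, "/transfer": 5}),
}

def session_anomaly(user, session_paths, history):
    """Fraction of requests in a session that this user has rarely made before."""
    base = history.get(user, Counter())
    total = sum(base.values()) or 1
    rare = sum(1 for p in session_paths if base[p] / total < 0.01)
    return rare / len(session_paths)

score = session_anomaly("user1",
                        ["/balance", "/transfer", "/addpayee", "/addpayee"],
                        history)
print(f"anomaly score: {score:.2f}")   # high scores would be flagged for review
```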