14 research outputs found

    Artificial Intelligence in Banking Industry: A Review on Fraud Detection, Credit Management, and Document Processing

    Get PDF
    AI is likely to alter the banking industry over the next several years. It is progressively being utilized by banks for analyzing and executing credit applications and examining vast volumes of data. This helps to avoid fraud and enables resource-heavy, repetitive procedures and client operations to be automated without any sacrifice in quality. This study reviews how the three most promising AI applications can make the banking sector robust and efficient. Specifically, we review AI fraud detection and prevention, AI credit management, and intelligent document processing. Since the majority of transactions have become digital, there is a great need for enhanced fraud detection algorithms and fraud prevention systems in banking. We argue that the conventional strategy for identifying bank fraud may be inadequate to combat complex fraudulent activity; artificial intelligence algorithms might instead be very useful. Credit management is time-consuming and expensive in terms of resources. Furthermore, because of the number of phases involved, these processes require a significant amount of work involving many laborious tasks. Banks can assess new clients for credit services, calculate loan amounts and pricing, and decrease the risk of fraud by using strong AI/ML models to assess these large and varied data sets in real time. Documents perform critical functions in the financial system and have a substantial influence on day-to-day operations. Currently, a large percentage of this data is preserved in email messages, online forms, PDFs, scanned images, and other digital formats. Using such a massive dataset is a difficult undertaking for any bank. We discuss artificial intelligence techniques that automatically pull critical data from all documents received by the bank, regardless of format, and feed it to the bank's existing portals/systems while maintaining consistency.
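
    As a rough illustration of the kind of AI/ML fraud model the review discusses, the sketch below trains a standard scikit-learn classifier on synthetic transaction records. The features, labels, and thresholds are invented for the example and are not taken from the paper.

        import numpy as np
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.model_selection import train_test_split
        from sklearn.metrics import classification_report

        rng = np.random.default_rng(0)
        n = 5000
        amount = rng.lognormal(3, 1, n)       # transaction amount
        hour = rng.integers(0, 24, n)         # hour of day
        card_present = rng.integers(0, 2, n)  # 1 = card present at terminal
        recent = rng.poisson(2, n)            # transactions in the last hour
        X = np.column_stack([amount, hour, card_present, recent])

        # Synthetic ground truth: large card-not-present transactions are sometimes fraud.
        y = ((amount > 60) & (card_present == 0) & (rng.random(n) < 0.5)).astype(int)

        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=0.3, stratify=y, random_state=0)
        clf = RandomForestClassifier(n_estimators=200, class_weight="balanced", random_state=0)
        clf.fit(X_train, y_train)
        print(classification_report(y_test, clf.predict(X_test), zero_division=0))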

    Big data, factor clave para la sociedad del conocimiento

    Get PDF
    We are currently in an era of information explosion that affects our lives in one way or another. Because of this, the transformation of huge databases into knowledge has become one of the tasks of greatest interest to society in general. Big Data was born as an instrument for knowledge in the face of the inability of current computer systems to store and process large volumes of data. The knowledge society arises from the use of technologies such as Big Data. The purpose of this article is to analyze the influence of Big Data on the knowledge society through a review of the state of the art supported by research articles and books published in the last 15 years, which allows us to put these two terms into context, understand their relationship, and highlight the influence of Big Data as a generator of knowledge for today's society. The concept of Big Data and its main applications to society are defined. The concept of the Information Society is addressed and its main challenges are established. The relationship between both concepts is determined, and finally the conclusions are drawn. In order to reduce the digital divide, it is imperative to make profound long-term changes in educational models and in public policies on investment, technology, and employment that allow the inclusion of all social classes. In this sense, knowledge societies, with the help of Big Data, are called to be integrative elements and to transform the way people teach and learn, the way research is conducted, the way new social and economic scenarios are simulated, the way decisions are made in companies, and the way knowledge is shared.

    Jekyll: Attacking Medical Image Diagnostics using Deep Generative Models

    Full text link
    Advances in deep neural networks (DNNs) have shown tremendous promise in the medical domain. However, the deep learning tools that are helping the domain can also be used against it. Given the prevalence of fraud in the healthcare domain, it is important to consider the adversarial use of DNNs in manipulating sensitive data that is crucial to patient healthcare. In this work, we present the design and implementation of a DNN-based image translation attack on biomedical imagery. More specifically, we propose Jekyll, a neural style transfer framework that takes as input a biomedical image of a patient and translates it to a new image that indicates an attacker-chosen disease condition. The potential for fraudulent claims based on such generated 'fake' medical images is significant, and we demonstrate successful attacks on both X-ray and retinal fundus image modalities. We show that these attacks manage to mislead both medical professionals and algorithmic detection schemes. Lastly, we also investigate defensive measures based on machine learning to detect images generated by Jekyll. Published in the proceedings of the 5th European Symposium on Security and Privacy (EuroS&P '20).
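
    The toy model below sketches the general encoder-decoder shape of an image-to-image translation network of the kind Jekyll builds on; it is not the paper's architecture, and the random 64x64 inputs merely stand in for biomedical images.

        import torch
        import torch.nn as nn

        class ToyTranslator(nn.Module):
            """Toy encoder-decoder that maps an image to another image of the same size."""
            def __init__(self):
                super().__init__()
                self.encoder = nn.Sequential(
                    nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
                    nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
                )
                self.decoder = nn.Sequential(
                    nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
                    nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1), nn.Sigmoid(),
                )

            def forward(self, x):
                return self.decoder(self.encoder(x))

        # Random grayscale tensors stand in for X-ray or fundus images here.
        model = ToyTranslator()
        translated = model(torch.rand(4, 1, 64, 64))
        print(translated.shape)  # torch.Size([4, 1, 64, 64])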

    Big Data Analytics Framework for Childhood Infectious Disease Surveillance and Response System using Modified MapReduce Algorithm

    Get PDF
    This research article was published in the International Journal of Advanced Computer Science and Applications, Vol. 12, No. 3, 2021. Tanzania, like most East African countries, faces a great burden from the spread of preventable infectious childhood diseases. Diarrhea, acute respiratory infections (ARI), pneumonia, malnutrition, hepatitis, and measles are responsible for the majority of deaths among children aged 0-5 years. Infectious disease surveillance and response is the foundation of public healthcare practice, and it is increasingly being undertaken using information technology. Tanzania, however, due to challenges in information technology infrastructure and public health resources, still relies on paper-based disease surveillance. Thus, only traditional clinical patient data is used; nontraditional and pre-diagnostic infectious disease case reports are excluded. In this paper, the development of a Big Data Analytics Framework for a Childhood Infectious Disease Surveillance and Response System is presented. The framework was designed to guide healthcare professionals to track, monitor, and analyze infectious disease case reports from sources such as social media for the prevention and control of infectious diseases affecting children. The proposed framework was validated through use-case scenarios and a performance-based comparison.
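
    The snippet below illustrates, in plain Python, the map/reduce style of aggregation such a surveillance pipeline performs: mapping social-media case reports to (region, disease) pairs and reducing them to counts. The records and disease list are fabricated for the example; the paper's modified MapReduce algorithm is not reproduced here.

        from collections import defaultdict
        from functools import reduce

        # Fabricated pre-diagnostic case reports, e.g. harvested from social media.
        records = [
            {"region": "Dar es Salaam", "text": "Several children with diarrhea reported"},
            {"region": "Mwanza", "text": "Measles outbreak suspected at local clinic"},
            {"region": "Dar es Salaam", "text": "Pneumonia cases rising among infants"},
        ]
        DISEASES = ["diarrhea", "measles", "pneumonia", "malnutrition", "hepatitis"]

        def mapper(record):
            # Emit a ((region, disease), 1) pair for every disease mentioned in the text.
            return [((record["region"], d), 1) for d in DISEASES if d in record["text"].lower()]

        def reducer(counts, pair):
            key, value = pair
            counts[key] += value
            return counts

        mapped = [pair for record in records for pair in mapper(record)]
        print(dict(reduce(reducer, mapped, defaultdict(int))))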

    Interactive Learning in Decision Support

    Get PDF
    According to the Priberam dictionary of the Portuguese language, the concept of fraud can be defined as an "illicit action, punishable by law, that seeks to deceive someone or some entity or to escape legal obligations." This topic has been gaining relevance in recent times, with new cases frequently becoming public. There is therefore a continuous search for solutions that allow, in a first stage, fraud to be prevented or, if it has already occurred, to be detected as quickly as possible. This represents a great challenge: first, technological evolution allows ever more complex and effective fraudulent schemes to be devised, which are therefore harder to detect and stop. In addition, data, and the information that can be drawn from it, are seen as increasingly important in the social context. Consequently, individuals and companies have begun to collect and store large amounts of data of all kinds. This is the concept of Big Data: large amounts of data of different types, with different degrees of complexity, produced at different rates and coming from different sources. This, in turn, has made traditional fraud detection technologies and algorithms unviable, since they lack the capacity to process such a large and diverse set of data. It is in this context that the field of Machine Learning has been increasingly explored in the search for solutions to this problem. Machine Learning systems are usually seen as fully autonomous. In recent years, however, interactive systems in which human experts actively contribute to the learning process have shown superior performance when compared with fully automated systems. This can be observed in scenarios where there is a large set of data of diverse types and origins (Big Data), scenarios in which the input is a data stream, or when there is a change in the context in which the data is embedded, a phenomenon known as concept drift. With this in mind, this document describes a project on the use of interactive learning in decision support, addressing digital audits and, more specifically, the case of tax fraud detection. The proposed solution is the development of an interactive and dynamic Machine Learning system: one of the main goals is to allow a human domain expert not only to contribute their knowledge to the system's learning process, but also to contribute new knowledge, by suggesting a new variable or a new value for an existing variable, at any time. The system must then be able to integrate the new knowledge autonomously and continue its normal operation. This is, in fact, the main innovative feature of the proposed solution, since it is not possible in traditional Machine Learning systems, which require a rigid dataset structure and in which any such change would force the entire model training process to be restarted with the new dataset.
    Machine Learning has been evolving rapidly over the past years, with new algorithms and approaches being devised to solve the challenges that the new properties of data pose. Specifically, algorithms must now learn continuously and in real time, from very large and possibly distributed datasets. Usually, Machine Learning systems are seen as something fully automatic. Recently, however, interactive systems in which human experts actively contribute towards the learning process have shown improved performance when compared to fully automated ones. This may be so in scenarios of Big Data, scenarios in which the input is a data stream, or when there is concept drift. In this paper, we present a system that learns and adapts in real time by continuously incorporating user feedback, in a fully autonomous way. Moreover, it allows users to manage variables (e.g., add, edit, remove), reflecting these changes on the fly in the Machine Learning pipeline. This paper describes the main functionalities of the system, which, despite being general-purpose, is being developed in the context of a project in the domain of financial fraud detection.
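
    A minimal sketch of the incremental, feedback-driven learning loop described above, using scikit-learn's SGDClassifier with partial_fit; the handling of a newly added variable (padding older rows with a default value and refitting on the widened representation) is an illustrative simplification, not the system's actual mechanism.

        import numpy as np
        from sklearn.linear_model import SGDClassifier

        classes = np.array([0, 1])  # 0 = legitimate, 1 = suspected fraud

        # Initial stream of expert-labeled instances with two variables each.
        clf = SGDClassifier(random_state=0)
        X_old = np.array([[100.0, 1], [2500.0, 0], [80.0, 1], [9000.0, 0]])
        y_old = np.array([0, 1, 0, 1])
        clf.partial_fit(X_old, y_old, classes=classes)

        # A domain expert adds a third variable; here we simply widen the old rows
        # with a default value of 0 and restart the incremental model on them.
        X_widened = np.hstack([X_old, np.zeros((len(X_old), 1))])
        clf = SGDClassifier(random_state=0)
        clf.partial_fit(X_widened, y_old, classes=classes)

        # Later feedback from the expert already carries the third variable.
        X_new = np.array([[50.0, 1, 0.2], [7000.0, 0, 0.9]])
        y_new = np.array([0, 1])
        clf.partial_fit(X_new, y_new)
        print(clf.predict([[6000.0, 0, 0.8]]))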

    Fraud detection for online banking for scalable and distributed data

    Get PDF
    Online fraud causes billions of dollars in losses for banks. Therefore, online banking fraud detection is an important field of study. However, there are many challenges in conducting research in fraud detection. One constraint is the unavailability of bank datasets for research, or that the required characteristics of the data attributes are not available. Numeric data usually provides better performance for machine learning algorithms, yet most transaction data also have categorical or nominal features. Moreover, some platforms, such as Apache Spark, recognize only numeric data. So there is a need for techniques such as one-hot encoding (OHE) to transform categorical features into numerical features; however, OHE has challenges, including the sparseness of the transformed data and the fact that the distinct values of an attribute are not always known in advance. Efficient feature engineering can improve an algorithm's performance but usually requires detailed domain knowledge to identify the correct features. Techniques like Ripple Down Rules (RDR) are suitable for fraud detection because of their low maintenance and incremental learning features. However, achieving high classification accuracy on mixed datasets, especially for scalable data, is challenging, and evaluating RDR on distributed platforms is also difficult as it is not available on them. The thesis proposes the following solutions to these challenges:
    • We developed a technique, Highly Correlated Rule Based Uniformly Distribution (HCRUD), to generate highly correlated, rule-based, uniformly distributed synthetic data.
    • We developed a technique, One-hot Encoded Extended Compact (OHE-EC), to transform categorical features into numeric features by compacting sparse data even if not all distinct values are known.
    • We developed a technique, Feature Engineering and Compact Unified Expressions (FECUE), to improve model efficiency through feature engineering where the domain of the data is not known in advance.
    • A Unified Expression RDR fraud detection technique (UE-RDR) for Big Data has been proposed and evaluated on the Spark platform.
    Empirical tests were executed on a multi-node Hadoop cluster using well-known classifiers on bank data, synthetic bank datasets, and publicly available datasets from the UCI repository. These evaluations demonstrated substantial improvements in classification accuracy, ruleset compactness, and execution speed.
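
    For context, the snippet below shows the standard one-hot encoding baseline the thesis improves on: scikit-learn's OneHotEncoder with handle_unknown="ignore" tolerates category values that were not seen at fit time, although the output is still sparse. The proposed OHE-EC technique itself is not reproduced here, and the feature values are made up.

        import numpy as np
        from sklearn.preprocessing import OneHotEncoder

        train = np.array([["mobile", "transfer"],
                          ["web",    "payment"],
                          ["mobile", "payment"]])
        enc = OneHotEncoder(handle_unknown="ignore")
        enc.fit(train)

        # "atm" was never seen at fit time; it encodes to all zeros instead of raising an error.
        test = np.array([["atm", "transfer"]])
        print(enc.transform(test).toarray())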

    Strategies to Prevent Medicare Claims Fraud Committed by Speech Pathologists

    Get PDF
    Fraudulent business activities have the potential for adverse business outcomes. Speech pathology business owners who lack a robust fraud prevention strategy may subject their practices to severe civil and criminal penalties. Grounded in Wilhelm's eight-stage fraud management lifecycle theory, the purpose of this qualitative multiple case study was to explore strategies speech pathologist business owners used to prevent Medicare claims fraud. The participants were five speech pathologist business owners from three practices in a western U.S. state who successfully implemented fraud prevention strategies. Data were collected through semistructured interviews and a review of organization policy and procedure manuals. Three themes emerged using Yin's five-step data analysis process: ethics policies, fraud prevention training, and techniques. One key recommendation is for speech pathologist business owners to cross-reference billing codes with therapy notes to ensure the codes billed match the services rendered. The implications for positive social change include the potential to increase access to needed care for the Medicare population and reduce potential physical harm to patients.
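
    The recommendation to cross-reference billing codes with therapy notes can be pictured as a simple consistency check like the one below; the codes, keywords, and claims are hypothetical.

        # Hypothetical mapping from billing code to a keyword expected in the therapy note.
        CODE_KEYWORDS = {"92507": "speech-language treatment", "92523": "speech sound evaluation"}

        claims = [
            {"code": "92507", "note": "Individual speech-language treatment, 30 minutes"},
            {"code": "92523", "note": "Routine articulation drill only"},
        ]
        for claim in claims:
            expected = CODE_KEYWORDS.get(claim["code"], "")
            status = "ok" if expected and expected in claim["note"].lower() else "review"
            print(claim["code"], status)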

    Detecção de indícios de fraudes no Programa Farmácia Popular do Brasil

    Get PDF
    This Master's thesis aims to develop a method for detecting evidence of fraud in the Programa Farmácia Popular do Brasil. In this research, interventions used to combat fraud in health care were evaluated, and results related to factors of importance to professionals in the area, such as greater reliability, were identified. For this purpose, the study takes advantage of machine learning methods. The study begins with a brief review of articles related to outlier detection and an analysis of techniques found in the specialized literature in this context. Subsequently, unsupervised outlier detection techniques are applied to empirical data. The results are compared and show that the Mahalanobis method performs best at detecting evidence of potential fraud.
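
    A minimal sketch of Mahalanobis-distance outlier scoring of the kind the dissertation evaluates, on synthetic two-feature records; the features, covariance, and chi-squared cut-off are illustrative assumptions, not values from the study.

        import numpy as np
        from scipy.stats import chi2

        rng = np.random.default_rng(0)
        # Synthetic (quantity dispensed, amount reimbursed) records plus two injected anomalies.
        X = rng.multivariate_normal([50, 200], [[100, 40], [40, 900]], size=500)
        X = np.vstack([X, [[300, 5000], [280, 4500]]])

        mean = X.mean(axis=0)
        cov_inv = np.linalg.inv(np.cov(X, rowvar=False))
        diff = X - mean
        d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)  # squared Mahalanobis distance

        threshold = chi2.ppf(0.999, df=X.shape[1])  # cut-off under a Gaussian assumption
        print(np.where(d2 > threshold)[0])          # indices flagged as potential fraud evidence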

    Big Data fraud detection using multiple medicare data sources

    No full text
    In the United States, advances in technology and medical sciences continue to improve the general well-being of the population. With this continued progress, programs such as Medicare are needed to help manage the high costs associated with quality healthcare. Unfortunately, there are individuals who commit fraud for nefarious reasons and personal gain, limiting Medicare's ability to effectively provide for the healthcare needs of the elderly and other qualifying people. To minimize fraudulent activities, the Centers for Medicare and Medicaid Services (CMS) released a number of "Big Data" datasets for different parts of the Medicare program. In this paper, we focus on the detection of Medicare fraud using the following CMS datasets: (1) Medicare Provider Utilization and Payment Data: Physician and Other Supplier (Part B), (2) Medicare Provider Utilization and Payment Data: Part D Prescriber (Part D), and (3) Medicare Provider Utilization and Payment Data: Referring Durable Medical Equipment, Prosthetics, Orthotics and Supplies (DMEPOS). Additionally, we create a fourth dataset which is a combination of the three primary datasets. We discuss data processing for all four datasets and the mapping of real-world provider fraud labels using the List of Excluded Individuals and Entities (LEIE) from the Office of the Inspector General. Our exploratory analysis on Medicare fraud detection involves building and assessing three learners on each dataset. Based on the Area under the Receiver Operating Characteristic (ROC) Curve performance metric, our results show that the Combined dataset with the Logistic Regression (LR) learner yielded the best overall score at 0.816, closely followed by the Part B dataset with LR at 0.805. Overall, the Combined and Part B datasets produced the best fraud detection performance with no statistical difference between these datasets, over all the learners. Therefore, based on our results and the assumption that there is no way to know within which part of Medicare a physician will commit fraud, we suggest using the Combined dataset for detecting fraudulent behavior when a physician has submitted payments through any or all Medicare parts evaluated in our study.
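
    The sketch below reproduces the shape of the paper's best-performing setup, a Logistic Regression learner scored by the area under the ROC curve, but on a synthetic imbalanced dataset rather than the CMS Part B, Part D, or DMEPOS claims.

        from sklearn.datasets import make_classification
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import train_test_split
        from sklearn.metrics import roc_auc_score

        # Imbalanced synthetic data standing in for labeled provider claims (~1% fraud).
        X, y = make_classification(n_samples=5000, n_features=20, weights=[0.99, 0.01],
                                   random_state=0)
        X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

        lr = LogisticRegression(max_iter=1000, class_weight="balanced")
        lr.fit(X_train, y_train)
        scores = lr.predict_proba(X_test)[:, 1]
        print("ROC AUC:", round(roc_auc_score(y_test, scores), 3))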