
    Credit Card Fraud Detection Using Machine Learning As Data Mining Technique

    The rapid growth of online transactions has increased fraud worldwide and causes tremendous losses to individuals and the financial industry. Although many kinds of criminal activity occur in the financial industry, credit card fraud is among the most prevalent and the most worrying to online customers. Countering fraud through data mining and machine learning is therefore one of the prominent approaches scholars have introduced to prevent the losses caused by these illegal acts. Primarily, data mining techniques were employed to study the patterns and characteristics of suspicious and non-suspicious transactions based on normalized and anomalous data. Machine learning (ML) techniques, on the other hand, were employed to predict suspicious and non-suspicious transactions automatically by using classifiers. The combination of machine learning and data mining techniques was therefore able to distinguish genuine from non-genuine transactions by learning the patterns in the data. This paper discusses supervised classification using the Bayesian network classifiers K2, Tree Augmented Naïve Bayes (TAN), and Naïve Bayes, together with logistic regression and J48 classifiers. After preprocessing the dataset using normalization and Principal Component Analysis, all the classifiers achieved more than 95.0% accuracy, compared to the results attained before preprocessing the dataset.

    Data mining and predictive analytics application on cellular networks to monitor and optimize quality of service and customer experience

    This research study focuses on application models of Data Mining and Machine Learning for cellular network traffic, with the objective of giving Mobile Network Operators a full view of the performance branches (Services, Devices, Subscribers). The purpose is to minimize the time needed to detect patterns in service and subscriber behaviour. Different data mining techniques and predictive algorithms have been applied to real cellular network datasets to uncover different data usage patterns using specific Key Performance Indicators (KPIs) and Key Quality Indicators (KQIs). The following tools are used to develop the concept: RStudio for Machine Learning and process visualization, Apache Spark and SparkSQL for data and big data processing, and clicData for service visualization. Two use cases have been studied during this research. In the first, data and predictive analytics are applied in the field of Telecommunications to address users’ experience, with the goal of increasing customer loyalty and decreasing churn (customer attrition). Using real cellular network transactions, predictive analytics are used to identify customers who are likely to churn, which can result in revenue loss. Prediction algorithms and models including Classification Tree, Random Forest, Neural Networks and Gradient Boosting have been used together with an exploratory data analysis that determines the relationships between the predictor variables. The data is split into two sets: a training set to train the model and a testing set to test it. The best performing model is selected based on prediction accuracy, sensitivity, specificity and the confusion matrix on the test set.
The second use case analyses Service Quality Management using modern data mining techniques and the advantages of in-memory big data processing with Apache Spark and SparkSQL to save cost on tool investment; thus, a low-cost Service Quality Management model is proposed and analyzed. With the increase in smartphone adoption and access to mobile internet services, applications such as streaming and interactive chat require a certain service level to ensure customer satisfaction. As a result, an SQM framework is developed with a Service Quality Index (SQI) and a Key Performance Index (KPI). The research concludes with recommendations and future studies around modern technology applications in Telecommunications, including the Internet of Things (IoT), Cloud and recommender systems. Cellular networks have evolved, and are still evolving, from traditional circuit-switched GSM (Global System for Mobile Communication), which supported only voice services and extremely low data rates, to all-packet LTE networks accommodating high-speed data for service applications such as video streaming, video conferencing and heavy torrent downloads; in the near future, the roll-out of fifth generation (5G) cellular networks is intended to support complex technologies such as IoT (Internet of Things) and high-definition video streaming, and is projected to carry massive amounts of data. With high demand for network services and easy access to mobile phones, billions of transactions are performed by subscribers. The transactions appear in the form of SMSs, handovers, voice calls, web browsing activities, video and audio streaming, and heavy downloads and uploads. Nevertheless, the explosive growth in data traffic and the high requirements of new services pose bigger challenges to Mobile Network Operators (MNOs) in analysing the big data traffic flowing in the network. Therefore, Quality of Service (QoS) and Quality of Experience (QoE) become a challenge.
Inefficiency in mining and analysing data and in applying predictive intelligence to network traffic can produce a high rate of unhappy customers or subscribers, loss of revenue and a negative perception of services. Researchers and Service Providers are investing in Data Mining, Machine Learning and AI (Artificial Intelligence) methods to manage services and experience. Electrical and Mining Engineering. M. Tech (Electrical Engineering)
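The evaluation criteria named in this abstract (accuracy, sensitivity, specificity and the confusion matrix on a test set) can be illustrated with a minimal sketch. The study itself reports using RStudio; the Python below is only an illustration, and the function names, the label encoding (1 = churn, 0 = no churn) and the example labels are all assumptions, not taken from the study.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn) for a binary classifier.

    `positive` is the label treated as the positive class
    (assumed here to mean "customer churns").
    """
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def evaluate(y_true, y_pred, positive=1):
    """Compute accuracy, sensitivity and specificity from the confusion counts."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        # Sensitivity: share of actual positives that were predicted positive.
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        # Specificity: share of actual negatives that were predicted negative.
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical test-set labels and predictions, for illustration only.
    actual = [1, 1, 1, 0, 0, 0, 0, 0]
    predicted = [1, 1, 0, 0, 0, 0, 1, 0]
    print(evaluate(actual, predicted))
```

Sensitivity matters most when missing a likely churner is costly, while specificity limits the number of loyal customers flagged by mistake; comparing models on all three figures avoids choosing one that only looks good on accuracy.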

    A data mining approach for cardiovascular diagnosis

    The large amounts of data generated by healthcare transactions are too complex and voluminous to be processed and analysed by traditional methods. Data mining can improve decision-making by discovering patterns and trends in large amounts of complex data. In the healthcare industry specifically, data mining can be used to decrease costs by increasing efficiency, improve patient quality of life, and perhaps most importantly, save the lives of more patients. The main goal of this project is to apply data mining techniques to predict the degree of disability that patients will present when they are discharged from hospital. The clinical data that compose the data set were obtained from a single hospital and contain information about patients who were hospitalized in the Cardiovascular Disease (CVD) unit in 2016 after suffering a cardiovascular accident. The project will be developed with the Waikato Environment for Knowledge Analysis (WEKA) machine learning workbench, since it allows users to quickly try out and compare different machine learning methods on new data sets. This work has been supported by Compete: POCI-01-0145-FEDER-007043 and FCT within the Project Scope UID/CEC/00319/2013.

    A TOOL FOR EFFECTIVE DETECTION OF FRAUD IN CREDIT CARD SYSTEM

    Due to the rise and rapid growth of E-Commerce, the use of credit cards for online purchases has increased dramatically, causing an explosion in credit card fraud. Fraud is one of the major ethical issues in the credit card industry. As credit cards become the most popular mode of payment for both online and regular purchases, cases of fraud associated with them are also rising. In real life, fraudulent transactions are scattered among genuine transactions, and simple pattern matching techniques are often not sufficient to detect those frauds accurately. Implementation of efficient fraud detection systems has thus become imperative for all credit card issuing banks to minimize their losses. Many modern techniques based on Artificial Intelligence, Data Mining, Fuzzy Logic, Machine Learning, Sequence Alignment, Genetic Programming, etc., have evolved for detecting various credit card fraudulent transactions.

    Screening of Murabaha business process through Quran and hadith: a text mining analysis

    © 2020, Emerald Publishing Limited. Purpose: This paper applies data analytics to the Qur’an and Hadith through a new text mining technique to answer the main research question of whether the activities and data flows of the Murabaha financing contract are compatible with Sharia law. The purpose of this paper is to provide a thorough and comprehensive database that will be used to examine existing practices in Islamic banks and improve compliance with Islamic financial law (Sharia). Design/methodology/approach: To design a Sharia-compliant Murabaha business process based on text mining, the authors start by identifying the factors deemed necessary in their text mining of both texts, using a four-step strategy to analyze the results; they then list the three basic approaches in text mining used for new knowledge discovery in databases: the co-occurrence approach, based on the recursive co-occurrence algorithm; the machine learning or statistical approach; and the knowledge-based approach. They identify any variation and association between the Murabaha business process produced using text mining and the one developed through data collection. Findings: The main finding of this paper confirms the compatibility of all activities and data flows in the Murabaha financing contract produced using data analytics of the Quran and Hadith texts with the Murabaha business process that was developed based on data collection. Another key finding reveals some shortcomings in Islamic banks' business process compliance with Sharia law. Practical implications: Given that Murabaha is the most popular mode of Islamic financing, accounting for more than 75% of total transactions, this research touches on an area of interest to the vast majority of those dealing with Islamic finance instruments.
By reaching findings that could improve the existing Islamic Murabaha business process and by drawing conclusions on its Sharia compliance, this research is relevant and could be used in practice as well as to influence public policy. In fact, Islamic Sharia law experts, Islamic finance professionals and Islamic banks may find the results of this study very useful in improving at least one aspect of Islamic finance transactions. Originality/value: By using novel text mining methods built on the recursive occurrence of synonymous words from the Qur’an and Hadith to enrich Islamic finance, this research study can claim to be the first of its kind in using machine learning to mine the Quran and Hadith and in extracting valuable knowledge to support and consolidate Islamic financial business processes and make them more compliant with the i

    Predicting fraud behaviour in online betting

    Project Work presented as the partial requirement for obtaining a Master's degree in Statistics and Information Management, specialization in Information Analysis and Management. Fraud is not a new issue; it has been discussed since the beginning of commerce. With the advance of the Internet, fraud gained strength and became a billion-dollar business. There are many different types of online financial fraud: account takeover, identity theft, chargeback, credit card transactions, etc. Online betting is one of the markets where fraud is increasing every day. In Portugal, the regulation of gambling and online betting was approved in April 2015. On the one hand, this legislation made it possible to operate this business in a controlled and regulated environment; on the other hand, it encouraged customers to develop new forms of fraud. Traditional data analysis used to detect fraud involved different domains such as economics, finance and law. These complex investigations soon became obsolete. Because fraud is an adaptive crime, different areas such as Data Mining and Machine Learning were developed to identify and prevent it. The main goal of this Project is to develop a predictive model, using a data mining approach and machine learning methods, able to identify and prevent online financial fraud in the Portuguese betting market, a new but already strong business.

    Counting Causal Paths in Big Time Series Data on Networks

    Graph or network representations are an important foundation for data mining and machine learning tasks in relational data. Many tools of network analysis, like centrality measures, information ranking, or cluster detection, rest on the assumption that links capture direct influence, and that paths represent possible indirect influence. This assumption is invalidated in time-stamped network data capturing, e.g., dynamic social networks, biological sequences or financial transactions. In such data, for two time-stamped links (A,B) and (B,C), the chronological ordering and timing determine whether a causal path from node A via B to C exists. A number of works have shown that, for this reason, network analysis cannot be directly applied to time-stamped network data. Existing methods to address this issue require statistics on causal paths, which are computationally challenging to obtain for big data sets. Addressing this problem, we develop an efficient algorithm to count causal paths in time-stamped network data. Applying it to empirical data, we show that our method is more efficient than a baseline method implemented in an open-source data analytics package. Our method works efficiently for different values of the maximum time difference between consecutive links of a causal path and supports streaming scenarios. With it, we are closing a gap that hinders an efficient analysis of big time series data on complex networks. Comment: 10 pages, 2 figures
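The notion of a causal path described in this abstract can be made concrete with a small sketch. It assumes that two time-stamped links (A, B, t1) and (B, C, t2) form a causal path when 0 < t2 - t1 <= delta, where delta is the maximum time difference between consecutive links. This is a straightforward quadratic-time illustration, not the efficient algorithm the paper develops; the function name and the tuple layout are assumptions.

```python
from collections import defaultdict


def count_causal_paths(links, delta, max_len):
    """Count causal paths with 1..max_len links in time-stamped link data.

    links   : iterable of (source, target, time) tuples
    delta   : maximum allowed time difference between consecutive links
    max_len : longest path length (in links) to count

    Paths are counted as time-respecting walks, so nodes may repeat.
    """
    ordered = sorted(links, key=lambda link: link[2])
    # incoming[v] lists (time, counts) for every link arriving at v, where
    # counts[k] is the number of causal paths of k links ending with that link.
    incoming = defaultdict(list)
    totals = defaultdict(int)
    for src, dst, t in ordered:
        counts = [0] * (max_len + 1)
        counts[1] = 1  # the link by itself is a path of length one
        for t_prev, prev_counts in incoming[src]:
            # Extend only paths that end strictly earlier and within delta.
            if 0 < t - t_prev <= delta:
                for k in range(1, max_len):
                    counts[k + 1] += prev_counts[k]
        incoming[dst].append((t, counts))
        for k in range(1, max_len + 1):
            totals[k] += counts[k]
    return dict(totals)


if __name__ == "__main__":
    # Hypothetical data: the last link arrives too late to extend A -> B.
    example = [("A", "B", 1), ("B", "C", 2), ("C", "D", 4), ("B", "C", 10)]
    print(count_causal_paths(example, delta=2, max_len=3))
```

Reversing the order of two links, for example (B, C) at time 2 followed by (A, B) at time 5, yields no path from A to C, which is exactly why static path counts overestimate indirect influence in such data.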

    Data mining for detecting Bitcoin Ponzi schemes

    Soon after its introduction in 2009, Bitcoin was adopted by cyber-criminals, who rely on its pseudonymity to implement virtually untraceable scams. One of the typical scams that operate on Bitcoin is the so-called Ponzi scheme. These are fraudulent investments which repay users with the funds invested by new users that join the scheme, and implode when it is no longer possible to find new investments. Despite being illegal in many countries, Ponzi schemes are now proliferating on Bitcoin, and they keep alluring new victims, who are plundered of millions of dollars. We apply data mining techniques to detect Bitcoin addresses related to Ponzi schemes. Our starting point is a dataset of features of real-world Ponzi schemes, which we construct by analysing, on the Bitcoin blockchain, the transactions used to perform the scams. We use this dataset to experiment with various machine learning algorithms, and we assess their effectiveness through standard validation protocols and performance metrics. The best of the classifiers we have experimented with can identify most of the Ponzi schemes in the dataset, with a low number of false positives.

    Perspectives On Data-Driven failure diagnosis : With a case study on failure diagnosis at a Payment Service Provider

    Data-driven failure diagnosis aims to extract relevant information from a dataset automatically. This paper proposes a data-driven model for classifying the transactions of a Payment Service Provider based on relevant shared characteristics, giving business users relevant insights into the data analyzed. The proposed solution aims to mimic processes applied in industrial organizations. However, the methods from these organizations discussed in this paper do not deal directly with the human component in information systems. Therefore, the proposed solution aims to offer the relevant error paths to help business users in their daily tasks while dealing with the human factor in IT systems. The artifact is built in the following steps: • Categorization of variables using data mining techniques. • Assignment of importance to the variables affecting the transaction process using a predictive machine learning method. • Classification of transactions into groups with similar characteristics. The solution effectively and consistently classifies more than 90% of the faults in the database by grouping them into paths with shared characteristics and a relevant failure rate. The artifact does not depend on any predefined fault distribution and deals satisfactorily with highly correlated input variables. Therefore, the artifact has the potential to scale, provided a data mining categorization of variables is performed beforehand, especially in companies that deal with rigid processes.
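The last step listed in this abstract, grouping transactions into paths with shared characteristics and a failure rate, might look roughly like the sketch below. The abstract does not describe the actual artifact in this detail, so the field names (`channel`, `card`, `failed`), the function name and the sample records are all hypothetical.

```python
from collections import defaultdict


def failure_rate_by_path(transactions, keys):
    """Group transactions by the values of `keys` and compute failure rates.

    transactions : iterable of dicts, each with the categorical fields in
                   `keys` plus a boolean "failed" flag (hypothetical schema)
    keys         : ordered field names that define a path

    Returns {path_tuple: (failures, total, failure_rate)}.
    """
    stats = defaultdict(lambda: [0, 0])  # path -> [failures, total]
    for tx in transactions:
        path = tuple(tx[k] for k in keys)
        stats[path][1] += 1
        if tx["failed"]:
            stats[path][0] += 1
    return {
        path: (failures, total, failures / total)
        for path, (failures, total) in stats.items()
    }


if __name__ == "__main__":
    # Hypothetical, already categorized transactions.
    sample = [
        {"channel": "web", "card": "visa", "failed": True},
        {"channel": "web", "card": "visa", "failed": False},
        {"channel": "pos", "card": "visa", "failed": False},
    ]
    for path, (failures, total, rate) in failure_rate_by_path(
        sample, ["channel", "card"]
    ).items():
        print(path, failures, total, rate)
```

Sorting the resulting paths by failure rate, and ignoring those with very few transactions, would surface the combinations of characteristics most worth a business user's attention.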