3,171 research outputs found
Learning From Labeled And Unlabeled Data: An Empirical Study Across Techniques And Domains
There has been increased interest in devising learning techniques that
combine unlabeled data with labeled data ? i.e. semi-supervised learning.
However, to the best of our knowledge, no study has been performed across
various techniques and different types and amounts of labeled and unlabeled
data. Moreover, most of the published work on semi-supervised learning
techniques assumes that the labeled and unlabeled data come from the same
distribution. It is possible for the labeling process to be associated with a
selection bias such that the distributions of data points in the labeled and
unlabeled sets are different. Not correcting for such bias can result in biased
function approximation with potentially poor performance. In this paper, we
present an empirical study of various semi-supervised learning techniques on a
variety of datasets. We attempt to answer various questions such as the effect
of independence or relevance amongst features, the effect of the size of the
labeled and unlabeled sets and the effect of noise. We also investigate the
impact of sample-selection bias on the semi-supervised learning techniques
under study and implement a bivariate probit technique particularly designed to
correct for such bias
An academic review: applications of data mining techniques in finance industry
With the development of Internet techniques, data volumes are doubling every two years, faster than predicted by Moore’s Law. Big Data Analytics becomes particularly important for enterprise business. Modern computational technologies will provide effective tools to help understand hugely accumulated data and leverage this information to get insights into the finance industry. In order to get actionable insights into the business, data has become most valuable asset of financial organisations, as there are no physical products in finance industry to manufacture. This is where data mining techniques come to their rescue by allowing access to the right information at the right time. These techniques are used by the finance industry in various areas such as fraud detection, intelligent forecasting, credit rating, loan management, customer profiling, money laundering, marketing and prediction of price movements to name a few. This work aims to survey the research on data mining techniques applied to the finance industry from 2010 to 2015.The review finds that Stock prediction and Credit rating have received most attention of researchers, compared to Loan prediction, Money Laundering and Time Series prediction. Due to the dynamics, uncertainty and variety of data, nonlinear mapping techniques have been deeply studied than linear techniques. Also it has been proved that hybrid methods are more accurate in prediction, closely followed by Neural Network technique. This survey could provide a clue of applications of data mining techniques for finance industry, and a summary of methodologies for researchers in this area. Especially, it could provide a good vision of Data Mining Techniques in computational finance for beginners who want to work in the field of computational finance
EC3: Combining Clustering and Classification for Ensemble Learning
Classification and clustering algorithms have been proved to be successful
individually in different contexts. Both of them have their own advantages and
limitations. For instance, although classification algorithms are more powerful
than clustering methods in predicting class labels of objects, they do not
perform well when there is a lack of sufficient manually labeled reliable data.
On the other hand, although clustering algorithms do not produce label
information for objects, they provide supplementary constraints (e.g., if two
objects are clustered together, it is more likely that the same label is
assigned to both of them) that one can leverage for label prediction of a set
of unknown objects. Therefore, systematic utilization of both these types of
algorithms together can lead to better prediction performance. In this paper,
We propose a novel algorithm, called EC3 that merges classification and
clustering together in order to support both binary and multi-class
classification. EC3 is based on a principled combination of multiple
classification and multiple clustering methods using an optimization function.
We theoretically show the convexity and optimality of the problem and solve it
by block coordinate descent method. We additionally propose iEC3, a variant of
EC3 that handles imbalanced training data. We perform an extensive experimental
analysis by comparing EC3 and iEC3 with 14 baseline methods (7 well-known
standalone classifiers, 5 ensemble classifiers, and 2 existing methods that
merge classification and clustering) on 13 standard benchmark datasets. We show
that our methods outperform other baselines for every single dataset, achieving
at most 10% higher AUC. Moreover our methods are faster (1.21 times faster than
the best baseline), more resilient to noise and class imbalance than the best
baseline method.Comment: 14 pages, 7 figures, 11 table
A Review of Algorithms for Credit Risk Analysis
The interest collected by the main borrowers is collected to pay back the principal borrowed from the depositary bank. In financial risk management, credit risk assessment is becoming a significant sector. For the credit risk assessment of client data sets, many credit risk analysis methods are used. The assessment of the credit risk datasets leads to the choice to cancel the customer\u27s loan or to dismiss the customer\u27s request is a challenging task involving a profound assessment of the information set or client information. In this paper, we survey diverse automatic credit risk analysis methods used for credit risk assessment. Data mining approach, as the most often used approach for credit risk analysis was described with the focus to various algorithms, such as neural networks.
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.</p
Development of a Machine Learning-Based Financial Risk Control System
With the gradual end of the COVID-19 outbreak and the gradual recovery of the economy, more and more individuals and businesses are in need of loans. This demand brings business opportunities to various financial institutions, but also brings new risks. The traditional loan application review is mostly manual and relies on the business experience of the auditor, which has the disadvantages of not being able to process large quantities and being inefficient. Since the traditional audit processing method is no longer suitable some other method of reducing the rate of non-performing loans and detecting fraud in applications is urgently needed by financial institutions.
In this project, a financial risk control model is built by using various machine learning algorithms. The model is used to replace the traditional manual approach to review loan applications. It improves the speed of review as well as the accuracy and approval rate of the review. Machine learning algorithms were also used in this project to create a loan user scorecard system that better reflects changes in user information compared to the credit card systems used by financial institutions today. In this project, the data imbalance problem and the performance improvement problem are also explored
An Overview of the Use of Neural Networks for Data Mining Tasks
In the recent years the area of data mining has experienced a considerable demand for technologies that extract knowledge from large and complex data sources. There is a substantial commercial interest as well as research investigations in the area that aim to develop new and improved approaches for extracting information, relationships, and patterns from datasets. Artificial Neural Networks (NN) are popular biologically inspired intelligent methodologies, whose classification, prediction and pattern recognition capabilities have been utilised successfully in many areas, including science, engineering, medicine, business, banking, telecommunication, and many other fields. This paper highlights from a data mining perspective the implementation of NN, using supervised and unsupervised learning, for pattern recognition, classification, prediction and cluster analysis, and focuses the discussion on their usage in bioinformatics and financial data analysis tasks
Risk Assessment Approaches in Banking Sector –A Survey
Prediction analysis is a method that makes predictions based on the data currently available. Bank loans come with a lot of risks to both the bank and the borrowers. One of the most exciting and important areas of research is data mining, which aims to extract information from vast amounts of accumulated data sets. The loan process is one of the key processes for the banking industry, and this paper examines various prior studies that used data mining techniques to extract all served entities and attributes necessary for analytical purposes, categorize these attributes, and forecast the future of their business using historical data, using a model, banks\u27 business, and strategic goals
Network based scoring models to improve credit risk management in peer to peer lending platforms
Financial intermediation has changed extensively over the course of the last two decades. One of the most significant change has been the emergence of FinTech. In the context of credit services, fintech peer to peer lenders have introduced many opportunities, among which improved speed, better customer experience, and reduced costs. However, peer-to-peer lending platforms lead to higher risks, among which higher credit risk: not owned by the lenders, and systemic risks: due to the high interconnectedness among borrowers generated by the platform. This calls for new and more accurate credit risk models to protect consumers and preserve financial stability. In this paper we propose to enhance credit risk accuracy of peer-to-peer platforms by leveraging topological information embedded into similarity networks, derived from borrowers' financial information. Topological coefficients describing borrowers' importance and community structures are employed as additional explanatory variables, leading to an improved predictive performance of credit scoring models
Data mining for detecting Bitcoin Ponzi schemes
Soon after its introduction in 2009, Bitcoin has been adopted by
cyber-criminals, which rely on its pseudonymity to implement virtually
untraceable scams. One of the typical scams that operate on Bitcoin are the
so-called Ponzi schemes. These are fraudulent investments which repay users
with the funds invested by new users that join the scheme, and implode when it
is no longer possible to find new investments. Despite being illegal in many
countries, Ponzi schemes are now proliferating on Bitcoin, and they keep
alluring new victims, who are plundered of millions of dollars. We apply data
mining techniques to detect Bitcoin addresses related to Ponzi schemes. Our
starting point is a dataset of features of real-world Ponzi schemes, that we
construct by analysing, on the Bitcoin blockchain, the transactions used to
perform the scams. We use this dataset to experiment with various machine
learning algorithms, and we assess their effectiveness through standard
validation protocols and performance metrics. The best of the classifiers we
have experimented can identify most of the Ponzi schemes in the dataset, with a
low number of false positives
- …