Predicting Takeover Success Using Machine Learning Techniques
A takeover success prediction model aims to predict the probability that a takeover attempt will succeed, using publicly available information at the time of the announcement. We perform a thorough study using machine learning techniques to predict takeover success. Specifically, we model takeover success prediction as a binary classification problem, which has been widely studied in the machine learning community. Motivated by recent advances in machine learning, we empirically evaluate and analyze many state-of-the-art classifiers, including logistic regression, artificial neural networks, support vector machines with different kernels, decision trees, random forests, and AdaBoost. The experiments validate the effectiveness of applying machine learning to takeover success prediction, and we find that the support vector machine with a linear kernel and AdaBoost with stump weak classifiers perform best for the task. This result is consistent with general observations about these two approaches.
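The comparison described in this abstract can be sketched with scikit-learn. The classifiers (logistic regression, a linear-kernel SVM, and AdaBoost, whose default weak learner is a depth-1 decision stump) follow the abstract; the data is synthetic, standing in for real deal-level features, so the scores are purely illustrative:

```python
# Illustrative sketch: takeover success framed as binary classification,
# comparing three of the classifiers named in the abstract. Synthetic data.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "linear-kernel SVM": SVC(kernel="linear"),
    # AdaBoost's default weak learner is a decision stump (max_depth=1).
    "AdaBoost (stumps)": AdaBoostClassifier(n_estimators=100),
}
for name, model in models.items():
    acc = cross_val_score(model, X, y, cv=5).mean()  # 5-fold CV accuracy
    print(f"{name}: {acc:.3f}")
```

On real announcement-time data one would of course also handle class imbalance and report more than raw accuracy.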
Using neural networks and support vector machines for default prediction in South Africa
A thesis submitted to the Faculty of Computer Science and Applied Mathematics,
University of the Witwatersrand,
in fulfillment of the requirements for the
Master of Science (MSc)
Johannesburg
Feb 2017

This is a thesis on credit risk and in particular bankruptcy prediction. It investigates the application of machine learning techniques such as support vector machines and neural networks for this purpose. This is not a thesis on support vector machines and neural networks; it simply uses these functions as tools to perform the analysis.
Neural networks are a type of machine learning algorithm. They are nonlinear models inspired by the biological networks of neurons found in the human central nervous system. They involve a cascade of simple nonlinear computations that, when aggregated, can implement robust and complex nonlinear functions. Neural networks can approximate most nonlinear functions, making them a quite powerful class of models.
Support vector machines (SVMs) are a more recent development from the machine learning community. In machine learning, SVMs are supervised learning algorithms that analyze data and recognize patterns, used for classification and regression analysis. An SVM takes a set of input data and predicts, for each given input, which of two possible classes the input belongs to, making the SVM a non-probabilistic binary linear classifier. A support vector machine constructs a hyperplane or set of hyperplanes in a high- or infinite-dimensional space, which can be used for classification into the two different data classes.
Traditional bankruptcy prediction modelling has been criticised because it makes certain assumptions about the underlying data. For instance, a frequent requirement for multivariate analysis is a joint normal distribution and independence of the variables. Support vector machines (and neural networks) are a useful tool for default analysis because they make far fewer assumptions about the underlying data.
In this framework support vector machines are used as a classifier to discriminate between defaulting and non-defaulting companies in a South African context. The input data required is a set of financial ratios constructed from each company's historic financial statements. The data is then divided into two groups: companies that have defaulted and companies that are healthy (non-default). The final data sample used for this thesis consists of 23 financial ratios from 67 companies listed on the JSE. Furthermore, for each company the probability of default is predicted.
The results are benchmarked against more classical methods that are commonly used for bankruptcy prediction, such as linear discriminant analysis and logistic regression. The results of the support vector machines, neural networks, linear discriminant analysis and logistic regression are then assessed via their receiver operating characteristic curves and profitability ratios to determine which model is more successful at predicting default.
Estimation of cellularity in tumours treated with Neoadjuvant therapy: A comparison of Machine Learning algorithms
This paper describes a method for residual tumour cellularity (TC) estimation in neoadjuvant treatment (NAT) of advanced breast cancer. TC is normally determined manually by visual inspection by a radiologist, so an automated computation would help reduce the time workload and increase precision and accuracy. TC is estimated as the ratio of tumour area to total image area after NAT. The proposed method computes TC by using machine learning techniques, trained on morphological parameters of segmented nuclei, to classify regions of the image as tumour or normal. The data is provided by the 2019 SPIE Breast challenge, which was proposed to develop automated TC computation algorithms. Three algorithms were implemented: support vector machines, nearest K-means, and adaptive boosting (AdaBoost) decision trees. Performance based on accuracy was compared and evaluated, and the best result was obtained with support vector machines. Results obtained by the implemented methods were submitted during the ongoing challenge, with a maximum prediction probability of success of 0.76.
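The pipeline in this abstract (classify regions from nuclei morphology, then take the tumour-area fraction as TC) can be sketched as below. The feature names, values, and class separation are hypothetical stand-ins, not the SPIE challenge data:

```python
# Hedged sketch: SVM classifies image regions as tumour vs normal from
# morphological features of segmented nuclei; TC is then the fraction of
# total area classified as tumour. All numbers here are synthetic.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Per-region features: e.g. mean nucleus area, eccentricity, nucleus density.
tumour = rng.normal([90.0, 0.8, 0.6], [9.0, 0.08, 0.06], (50, 3))
normal = rng.normal([40.0, 0.5, 0.2], [4.0, 0.05, 0.02], (50, 3))
X = np.vstack([tumour, normal])
y = np.array([1] * 50 + [0] * 50)  # 1 = tumour region

clf = SVC(kernel="rbf").fit(X, y)

# A new image: classify its regions and take the tumour-area ratio as TC.
regions = rng.normal([70.0, 0.7, 0.5], [7.0, 0.07, 0.05], (20, 3))
region_area = np.full(20, 1.0)            # equal-area tiles for simplicity
is_tumour = clf.predict(regions) == 1
tc = region_area[is_tumour].sum() / region_area.sum()
print(f"estimated tumour cellularity: {tc:.2f}")
```

In the real method the features come from a nuclei-segmentation stage, which this sketch omits entirely.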
Hedging predictions in machine learning
Recent advances in machine learning make it possible to design efficient
prediction algorithms for data sets with huge numbers of parameters. This paper
describes a new technique for "hedging" the predictions output by many such
algorithms, including support vector machines, kernel ridge regression, kernel
nearest neighbours, and by many other state-of-the-art methods. The hedged
predictions for the labels of new objects include quantitative measures of
their own accuracy and reliability. These measures are provably valid under the
assumption of randomness, traditional in machine learning: the objects and
their labels are assumed to be generated independently from the same
probability distribution. In particular, it becomes possible to control (up to
statistical fluctuations) the number of erroneous predictions by selecting a
suitable confidence level. Validity being achieved automatically, the remaining
goal of hedged prediction is efficiency: taking full account of the new
objects' features and other available information to produce as accurate
predictions as possible. This can be done successfully using the powerful
machinery of modern machine learning.

Comment: 24 pages; 9 figures; 2 tables; a version of this paper (with discussion and rejoinder) is to appear in "The Computer Journal".
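The hedging idea in this abstract can be sketched as a split (inductive) conformal predictor: nonconformity scores from a calibration set turn any underlying classifier into prediction sets that are valid at a chosen confidence level under the randomness (i.i.d.) assumption. The choice of classifier and nonconformity measure below is one simple option among many:

```python
# Split conformal prediction sketch: p-values computed from calibration
# nonconformity scores yield prediction sets with guaranteed coverage
# (up to statistical fluctuations) under the i.i.d. assumption.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_tr, X_cal, y_tr, y_cal = train_test_split(X, y, test_size=0.5,
                                            random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

# Nonconformity: one minus the predicted probability of the true label.
cal_scores = 1.0 - clf.predict_proba(X_cal)[np.arange(len(y_cal)), y_cal]

def prediction_set(x, epsilon=0.05):
    """All labels whose conformal p-value exceeds significance epsilon."""
    probs = clf.predict_proba(x.reshape(1, -1))[0]
    p_values = [(np.sum(cal_scores >= 1.0 - p) + 1) / (len(cal_scores) + 1)
                for p in probs]
    return [label for label, pv in enumerate(p_values) if pv > epsilon]

print(prediction_set(X[0]))  # a small set of candidate labels at 95% conf.
```

Smaller epsilon gives higher confidence but larger (less efficient) prediction sets, which is exactly the validity/efficiency trade-off the abstract describes.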
Support Vector Hazards Machine: A Counting Process Framework for Learning Risk Scores for Censored Outcomes
Learning risk scores to predict dichotomous or continuous outcomes using machine learning approaches has been studied extensively. However, how to learn risk scores for time-to-event outcomes subject to right censoring has received little attention until recently. Existing approaches rely on inverse probability weighting or rank-based regression, which may be inefficient. In this paper, we develop a new support vector hazards machine (SVHM) approach to predict censored outcomes. Our method is based on predicting the counting process associated with the time-to-event outcomes among subjects at risk via a series of support vector machines. Introducing counting processes to represent time-to-event data leads to a connection between support vector machines in supervised learning and hazards regression in standard survival analysis. To account for the different at-risk populations at the observed event times, a time-varying offset is used in estimating risk scores. The resulting optimization is a convex quadratic programming problem that can easily incorporate non-linearity using the kernel trick. We demonstrate an interesting link from the profiled empirical risk function of SVHM to the Cox partial likelihood. We then formally show that SVHM is optimal in discriminating the covariate-specific hazard function from the population average hazard function, and establish the consistency and learning rate of the predicted risk using the estimated risk scores. Simulation studies show improved prediction accuracy of the event times using SVHM compared to existing machine learning methods and standard conventional approaches. Finally, we analyze data from two real-world biomedical studies in which clinical markers and neuroimaging biomarkers are used to predict the age-at-onset of a disease, and demonstrate the superiority of SVHM in distinguishing high-risk versus low-risk subjects.
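The counting-process reduction at the heart of this abstract can be sketched, in heavily simplified form, as follows: at each observed event time, the subjects still at risk define a binary task (did this subject have the event now?), and the pooled tasks are fed to a single SVM. The real SVHM adds the time-varying offset and censoring handling, both omitted here; the data is synthetic:

```python
# Heavily simplified sketch of the SVHM idea: pool risk sets at each event
# time into one binary classification problem and fit a linear SVM; its
# decision values serve as risk scores. Omits the time-varying offset and
# censoring of the actual method. Synthetic survival data.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n = 80
X = rng.normal(size=(n, 3))                  # covariates / biomarkers
risk = X @ np.array([1.0, -0.5, 0.3])        # true (unobserved) risk score
times = rng.exponential(np.exp(-risk))       # higher risk -> earlier event

rows, labels = [], []
for j in np.argsort(times):                  # each observed event time
    at_risk = times >= times[j]              # risk set at that time
    for i in np.flatnonzero(at_risk):
        rows.append(X[i])
        labels.append(1 if i == j else 0)    # counting-process increment

svm = SVC(kernel="linear", class_weight="balanced")
svm.fit(np.array(rows), np.array(labels))
scores = svm.decision_function(X)            # learned risk scores
# Higher learned scores should track the true risk direction.
print(f"correlation with true risk: {np.corrcoef(scores, risk)[0, 1]:.2f}")
```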
Data-driven uncertainty quantification for predictive subsurface flow and transport modeling
Specification of hydraulic conductivity as a model parameter in groundwater flow and transport equations is an essential step in predictive simulations. It is often infeasible in practice to characterize this model parameter at all points in space due to complex hydrogeological environments leading to significant parameter uncertainties. Quantifying these uncertainties requires the formulation and solution of an inverse problem using data corresponding to observable model responses. Several types of inverse problems may be formulated under various physical and statistical assumptions on the model parameters, model response, and the data. Solutions to most types of inverse problems require large numbers of model evaluations. In this study, we incorporate the use of surrogate models based on support vector machines to increase the number of samples used in approximating a solution to an inverse problem at a relatively low computational cost. To test the global capabilities of this type of surrogate model for quantifying uncertainties, we use a framework for constructing pullback and push-forward probability measures to study the data-to-parameter-to-prediction propagation of uncertainties under minimal statistical assumptions. Additionally, we demonstrate that it is possible to build a support vector machine using relatively low-dimensional representations of the hydraulic conductivity to propagate distributions. The numerical examples further demonstrate that we can make reliable probabilistic predictions of contaminant concentration at spatial locations corresponding to data not used in the solution to the inverse problem.
This dissertation is based on the article entitled "Data-driven uncertainty quantification for predictive flow and transport modeling using support vector machines" by Jiachuan He, Steven Mattis, Troy Butler and Clint Dawson [32]. This material is based upon work supported by the U.S. Department of Energy Office of Science, Office of Advanced Scientific Computing Research, Applied Mathematics program under Award Number DE-SC0009286 as part of the DiaMonD Multifaceted Mathematics Integrated Capability Center.
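The surrogate idea in this abstract can be sketched with support vector regression: fit the surrogate to a handful of expensive forward-model runs, then evaluate many more parameter samples cheaply. The "forward model" below is a stand-in analytic function, not a flow and transport solver:

```python
# Sketch of an SVM-based surrogate: a few expensive forward-model runs
# train an SVR, which then supplies cheap evaluations for many samples
# (as needed when approximating solutions to an inverse problem).
import numpy as np
from sklearn.svm import SVR

def forward_model(k):
    """Stand-in for an expensive simulation: maps a (log-)conductivity
    parameter to an observable model response."""
    return np.exp(-0.5 * k) * np.sin(k)

rng = np.random.default_rng(0)
k_train = rng.uniform(0.0, 3.0, 30)           # few expensive evaluations
surrogate = SVR(kernel="rbf", C=10.0, epsilon=0.01)
surrogate.fit(k_train.reshape(-1, 1), forward_model(k_train))

k_samples = rng.uniform(0.0, 3.0, 10_000)     # many cheap evaluations
q_samples = surrogate.predict(k_samples.reshape(-1, 1))
err = np.abs(q_samples - forward_model(k_samples)).max()
print(f"max surrogate error over 10,000 samples: {err:.3f}")
```

In the dissertation's setting the parameter is a (reduced-dimension) hydraulic conductivity field rather than a scalar, but the workflow is the same.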
Regional prediction of landslide hazard using probability analysis of intense rainfall in the Hoa Binh province, Vietnam.
The main objective of this study is to assess regional landslide hazards in the Hoa Binh province of Vietnam. A landslide inventory map was constructed from various sources, with data mainly for a period of 21 years from 1990 to 2010. The historic inventory of these failures shows that rainfall is the main triggering factor in this region. The probability of the occurrence of episodes of rainfall and the rainfall threshold were deduced from records of rainfall for the aforementioned period. The rainfall threshold model was generated based on daily and cumulative values of antecedent rainfall of the landslide events. The result shows that 15-day antecedent rainfall gives the best fit for the existing landslides in the inventory. The threshold model was validated using the rainfall and landslide events that occurred in 2010, which were not considered in building the model. The result was used for estimating the temporal probability of landslide occurrence using a Poisson probability model. Prior to this work, five landslide susceptibility maps were constructed for the study area using support vector machines, logistic regression, evidential belief functions, Bayesian-regularized neural networks, and neuro-fuzzy models. These susceptibility maps provide information on the spatial probability of landslide occurrence in the area. Finally, landslide hazard maps were generated by integrating the spatial and temporal probabilities of landslides. A total of 15 specific landslide hazard maps were generated, considering three time periods of 1, 3, and 5 years.
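The temporal-probability step above can be sketched with the standard Poisson exceedance formula: if the inventory implies a mean recurrence interval mu (in years) for threshold-exceeding rainfall at a location, the probability of at least one triggering event in t years is 1 - exp(-t/mu). The recurrence interval below is an illustrative value, not one from the study:

```python
# Poisson temporal probability of landslide occurrence, as used to combine
# with spatial susceptibility in the hazard maps described above.
import math

def exceedance_probability(mu_years: float, t_years: float) -> float:
    """P(at least one event within t years) under a Poisson process
    with mean recurrence interval mu_years."""
    return 1.0 - math.exp(-t_years / mu_years)

mu = 7.0  # hypothetical mean recurrence interval in years
for t in (1, 3, 5):  # the study's three hazard-map time periods
    print(f"P(event within {t} yr) = {exceedance_probability(mu, t):.2f}")
```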