Predicting Takeover Success Using Machine Learning Techniques
A takeover success prediction model aims to predict the probability that a takeover attempt will succeed, using publicly available information at the time of the announcement. We perform a thorough study using machine learning techniques to predict takeover success. Specifically, we model takeover success prediction as a binary classification problem, which has been widely studied in the machine learning community. Motivated by recent advances in machine learning, we empirically evaluate and analyze many state-of-the-art classifiers, including logistic regression, artificial neural networks, support vector machines with different kernels, decision trees, random forests, and AdaBoost. The experiments validate the effectiveness of applying machine learning to takeover success prediction, and we find that the support vector machine with a linear kernel and AdaBoost with stump weak classifiers perform best for the task. This result is consistent with general observations about these two approaches.
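The comparison described in this abstract can be sketched with scikit-learn. The classifiers (logistic regression, a linear-kernel SVM, and AdaBoost, whose default weak learner is a depth-1 decision stump) follow the abstract; the data is synthetic, standing in for real deal-level features, so the scores are purely illustrative:

```python
# Illustrative sketch: takeover success framed as binary classification,
# comparing three of the classifiers named in the abstract. Synthetic data.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "linear-kernel SVM": SVC(kernel="linear"),
    # AdaBoost's default weak learner is a decision stump (max_depth=1).
    "AdaBoost (stumps)": AdaBoostClassifier(n_estimators=100),
}
for name, model in models.items():
    acc = cross_val_score(model, X, y, cv=5).mean()  # 5-fold CV accuracy
    print(f"{name}: {acc:.3f}")
```

On real announcement-time data one would of course also handle class imbalance and report more than raw accuracy.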
Using neural networks and support vector machines for default prediction in South Africa
A thesis submitted to the Faculty of Computer Science and Applied Mathematics,
University of the Witwatersrand,
in fulfillment of the requirements for the
Master of Science (MSc)
Johannesburg
Feb 2017

This is a thesis on credit risk and in particular bankruptcy prediction. It investigates the application of machine learning techniques such as support vector machines and neural networks for this purpose. This is not a thesis on support vector machines and neural networks; it simply uses these functions as tools to perform the analysis.
Neural networks are a type of machine learning algorithm. They are nonlinear models inspired by the biological networks of neurons found in the human central nervous system. They involve a cascade of simple nonlinear computations that, when aggregated, can implement robust and complex nonlinear functions. Neural networks can approximate most nonlinear functions, making them a quite powerful class of models.
Support vector machines (SVMs) are a more recent development from the machine learning community. In machine learning, SVMs are supervised learning algorithms that analyze data and recognize patterns, used for classification and regression analysis. An SVM takes a set of input data and predicts, for each given input, which of two possible classes the input belongs to, making the SVM a non-probabilistic binary linear classifier. A support vector machine constructs a hyperplane or set of hyperplanes in a high- or infinite-dimensional space, which can be used for classification into the two different data classes.
Traditional bankruptcy prediction modelling has been criticised because it makes certain assumptions about the underlying data. For instance, a frequent requirement for multivariate analysis is a joint normal distribution and independence of the variables. Support vector machines (and neural networks) are a useful tool for default analysis because they make far fewer assumptions about the underlying data.
In this framework support vector machines are used as a classifier to discriminate between defaulting and non-defaulting companies in a South African context. The input data required is a set of financial ratios constructed from each company's historic financial statements. The data is then divided into two groups: companies that have defaulted and companies that are healthy (non-default). The final data sample used for this thesis consists of 23 financial ratios from 67 companies listed on the JSE. Furthermore, for each company the probability of default is predicted.
The results are benchmarked against more classical methods that are commonly used for bankruptcy prediction, such as linear discriminant analysis and logistic regression. The results of the support vector machines, neural networks, linear discriminant analysis and logistic regression are then assessed via their receiver operating characteristic curves and profitability ratios to determine which model is more successful at predicting default.
Estimation of cellularity in tumours treated with Neoadjuvant therapy: A comparison of Machine Learning algorithms
This paper describes a method for residual tumour cellularity (TC) estimation in neoadjuvant treatment (NAT) of advanced breast cancer. TC is normally determined manually by visual inspection by a radiologist, so an automated computation would help reduce the time workload and increase precision and accuracy. TC is estimated as the ratio of tumour area to total image area after NAT. The proposed method computes TC by using machine learning techniques, trained on morphological parameters of segmented nuclei, to classify regions of the image as tumour or normal. The data is provided by the 2019 SPIE Breast challenge, which was proposed to develop automated TC computation algorithms. Three algorithms were implemented: support vector machines, nearest K-means, and adaptive boosting (AdaBoost) decision trees. Performance based on accuracy was compared and evaluated, and the best result was obtained with support vector machines. Results obtained by the implemented methods were submitted during the ongoing challenge, with a maximum prediction probability of success of 0.76.
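The pipeline in this abstract (classify regions from nuclei morphology, then take the tumour-area fraction as TC) can be sketched as below. The feature names, values, and class separation are hypothetical stand-ins, not the SPIE challenge data:

```python
# Hedged sketch: SVM classifies image regions as tumour vs normal from
# morphological features of segmented nuclei; TC is then the fraction of
# total area classified as tumour. All numbers here are synthetic.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Per-region features: e.g. mean nucleus area, eccentricity, nucleus density.
tumour = rng.normal([90.0, 0.8, 0.6], [9.0, 0.08, 0.06], (50, 3))
normal = rng.normal([40.0, 0.5, 0.2], [4.0, 0.05, 0.02], (50, 3))
X = np.vstack([tumour, normal])
y = np.array([1] * 50 + [0] * 50)  # 1 = tumour region

clf = SVC(kernel="rbf").fit(X, y)

# A new image: classify its regions and take the tumour-area ratio as TC.
regions = rng.normal([70.0, 0.7, 0.5], [7.0, 0.07, 0.05], (20, 3))
region_area = np.full(20, 1.0)            # equal-area tiles for simplicity
is_tumour = clf.predict(regions) == 1
tc = region_area[is_tumour].sum() / region_area.sum()
print(f"estimated tumour cellularity: {tc:.2f}")
```

In the real method the features come from a nuclei-segmentation stage, which this sketch omits entirely.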
Hedging predictions in machine learning
Recent advances in machine learning make it possible to design efficient
prediction algorithms for data sets with huge numbers of parameters. This paper
describes a new technique for "hedging" the predictions output by many such
algorithms, including support vector machines, kernel ridge regression, kernel
nearest neighbours, and by many other state-of-the-art methods. The hedged
predictions for the labels of new objects include quantitative measures of
their own accuracy and reliability. These measures are provably valid under the
assumption of randomness, traditional in machine learning: the objects and
their labels are assumed to be generated independently from the same
probability distribution. In particular, it becomes possible to control (up to
statistical fluctuations) the number of erroneous predictions by selecting a
suitable confidence level. Validity being achieved automatically, the remaining
goal of hedged prediction is efficiency: taking full account of the new
objects' features and other available information to produce as accurate
predictions as possible. This can be done successfully using the powerful
machinery of modern machine learning.

Comment: 24 pages; 9 figures; 2 tables; a version of this paper (with discussion and rejoinder) is to appear in "The Computer Journal".
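The hedging idea in this abstract can be sketched as a split (inductive) conformal predictor: nonconformity scores from a calibration set turn any underlying classifier into prediction sets that are valid at a chosen confidence level under the randomness (i.i.d.) assumption. The choice of classifier and nonconformity measure below is one simple option among many:

```python
# Split conformal prediction sketch: p-values computed from calibration
# nonconformity scores yield prediction sets with guaranteed coverage
# (up to statistical fluctuations) under the i.i.d. assumption.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_tr, X_cal, y_tr, y_cal = train_test_split(X, y, test_size=0.5,
                                            random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

# Nonconformity: one minus the predicted probability of the true label.
cal_scores = 1.0 - clf.predict_proba(X_cal)[np.arange(len(y_cal)), y_cal]

def prediction_set(x, epsilon=0.05):
    """All labels whose conformal p-value exceeds significance epsilon."""
    probs = clf.predict_proba(x.reshape(1, -1))[0]
    p_values = [(np.sum(cal_scores >= 1.0 - p) + 1) / (len(cal_scores) + 1)
                for p in probs]
    return [label for label, pv in enumerate(p_values) if pv > epsilon]

print(prediction_set(X[0]))  # a small set of candidate labels at 95% conf.
```

Smaller epsilon gives higher confidence but larger (less efficient) prediction sets, which is exactly the validity/efficiency trade-off the abstract describes.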
Support Vector Hazards Machine: A Counting Process Framework for Learning Risk Scores for Censored Outcomes
Learning risk scores to predict dichotomous or continuous outcomes using machine learning approaches has been studied extensively. However, how to learn risk scores for time-to-event outcomes subject to right censoring has received little attention until recently. Existing approaches rely on inverse probability weighting or rank-based regression, which may be inefficient. In this paper, we develop a new support vector hazards machine (SVHM) approach to predict censored outcomes. Our method is based on predicting the counting process associated with the time-to-event outcomes among subjects at risk via a series of support vector machines. Introducing counting processes to represent time-to-event data leads to a connection between support vector machines in supervised learning and hazards regression in standard survival analysis. To account for the different at-risk populations at the observed event times, a time-varying offset is used in estimating risk scores. The resulting optimization is a convex quadratic programming problem that can easily incorporate non-linearity using the kernel trick. We demonstrate an interesting link from the profiled empirical risk function of SVHM to the Cox partial likelihood. We then formally show that SVHM is optimal in discriminating the covariate-specific hazard function from the population average hazard function, and establish the consistency and learning rate of the predicted risk using the estimated risk scores. Simulation studies show improved prediction accuracy of the event times using SVHM compared to existing machine learning methods and standard conventional approaches. Finally, we analyze data from two real-world biomedical studies in which clinical markers and neuroimaging biomarkers are used to predict the age-at-onset of a disease, and demonstrate the superiority of SVHM in distinguishing high-risk versus low-risk subjects.
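The counting-process reduction at the heart of this abstract can be sketched, in heavily simplified form, as follows: at each observed event time, the subjects still at risk define a binary task (did this subject have the event now?), and the pooled tasks are fed to a single SVM. The real SVHM adds the time-varying offset and censoring handling, both omitted here; the data is synthetic:

```python
# Heavily simplified sketch of the SVHM idea: pool risk sets at each event
# time into one binary classification problem and fit a linear SVM; its
# decision values serve as risk scores. Omits the time-varying offset and
# censoring of the actual method. Synthetic survival data.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n = 80
X = rng.normal(size=(n, 3))                  # covariates / biomarkers
risk = X @ np.array([1.0, -0.5, 0.3])        # true (unobserved) risk score
times = rng.exponential(np.exp(-risk))       # higher risk -> earlier event

rows, labels = [], []
for j in np.argsort(times):                  # each observed event time
    at_risk = times >= times[j]              # risk set at that time
    for i in np.flatnonzero(at_risk):
        rows.append(X[i])
        labels.append(1 if i == j else 0)    # counting-process increment

svm = SVC(kernel="linear", class_weight="balanced")
svm.fit(np.array(rows), np.array(labels))
scores = svm.decision_function(X)            # learned risk scores
# Higher learned scores should track the true risk direction.
print(f"correlation with true risk: {np.corrcoef(scores, risk)[0, 1]:.2f}")
```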
Data-driven uncertainty quantification for predictive subsurface flow and transport modeling
Specification of hydraulic conductivity as a model parameter in groundwater flow and transport equations is an essential step in predictive simulations. It is often infeasible in practice to characterize this model parameter at all points in space due to complex hydrogeological environments leading to significant parameter uncertainties. Quantifying these uncertainties requires the formulation and solution of an inverse problem using data corresponding to observable model responses. Several types of inverse problems may be formulated under various physical and statistical assumptions on the model parameters, model response, and the data. Solutions to most types of inverse problems require large numbers of model evaluations. In this study, we incorporate the use of surrogate models based on support vector machines to increase the number of samples used in approximating a solution to an inverse problem at a relatively low computational cost. To test the global capabilities of this type of surrogate model for quantifying uncertainties, we use a framework for constructing pullback and push-forward probability measures to study the data-to-parameter-to-prediction propagation of uncertainties under minimal statistical assumptions. Additionally, we demonstrate that it is possible to build a support vector machine using relatively low-dimensional representations of the hydraulic conductivity to propagate distributions. The numerical examples further demonstrate that we can make reliable probabilistic predictions of contaminant concentration at spatial locations corresponding to data not used in the solution to the inverse problem.
This dissertation is based on the article entitled "Data-driven uncertainty quantification for predictive flow and transport modeling using support vector machines" by Jiachuan He, Steven Mattis, Troy Butler and Clint Dawson [32]. This material is based upon work supported by the U.S. Department of Energy Office of Science, Office of Advanced Scientific Computing Research, Applied Mathematics program under Award Number DE-SC0009286 as part of the DiaMonD Multifaceted Mathematics Integrated Capability Center.
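The surrogate idea in this abstract can be sketched with support vector regression: fit the surrogate to a handful of expensive forward-model runs, then evaluate many more parameter samples cheaply. The "forward model" below is a stand-in analytic function, not a flow and transport solver:

```python
# Sketch of an SVM-based surrogate: a few expensive forward-model runs
# train an SVR, which then supplies cheap evaluations for many samples
# (as needed when approximating solutions to an inverse problem).
import numpy as np
from sklearn.svm import SVR

def forward_model(k):
    """Stand-in for an expensive simulation: maps a (log-)conductivity
    parameter to an observable model response."""
    return np.exp(-0.5 * k) * np.sin(k)

rng = np.random.default_rng(0)
k_train = rng.uniform(0.0, 3.0, 30)           # few expensive evaluations
surrogate = SVR(kernel="rbf", C=10.0, epsilon=0.01)
surrogate.fit(k_train.reshape(-1, 1), forward_model(k_train))

k_samples = rng.uniform(0.0, 3.0, 10_000)     # many cheap evaluations
q_samples = surrogate.predict(k_samples.reshape(-1, 1))
err = np.abs(q_samples - forward_model(k_samples)).max()
print(f"max surrogate error over 10,000 samples: {err:.3f}")
```

In the dissertation's setting the parameter is a (reduced-dimension) hydraulic conductivity field rather than a scalar, but the workflow is the same.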
Regional prediction of landslide hazard using probability analysis of intense rainfall in the Hoa Binh province, Vietnam.
The main objective of this study is to assess regional landslide hazards in the Hoa Binh province of Vietnam. A landslide inventory map was constructed from various sources, with data mainly for a period of 21 years from 1990 to 2010. The historic inventory of these failures shows that rainfall is the main triggering factor in this region. The probability of the occurrence of episodes of rainfall and the rainfall threshold were deduced from records of rainfall for the aforementioned period. The rainfall threshold model was generated based on daily and cumulative values of antecedent rainfall of the landslide events. The result shows that 15-day antecedent rainfall gives the best fit for the existing landslides in the inventory. The threshold model was validated using the rainfall and landslide events that occurred in 2010, which were not considered in building the model. The result was used for estimating the temporal probability of landslide occurrence using a Poisson probability model. Prior to this work, five landslide susceptibility maps were constructed for the study area using support vector machines, logistic regression, evidential belief functions, Bayesian-regularized neural networks, and neuro-fuzzy models. These susceptibility maps provide information on the spatial probability of landslide occurrence in the area. Finally, landslide hazard maps were generated by integrating the spatial and temporal probabilities of landslides. A total of 15 specific landslide hazard maps were generated, considering three time periods of 1, 3, and 5 years.
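The temporal-probability step above can be sketched with the standard Poisson exceedance formula: if the inventory implies a mean recurrence interval mu (in years) for threshold-exceeding rainfall at a location, the probability of at least one triggering event in t years is 1 - exp(-t/mu). The recurrence interval below is an illustrative value, not one from the study:

```python
# Poisson temporal probability of landslide occurrence, as used to combine
# with spatial susceptibility in the hazard maps described above.
import math

def exceedance_probability(mu_years: float, t_years: float) -> float:
    """P(at least one event within t years) under a Poisson process
    with mean recurrence interval mu_years."""
    return 1.0 - math.exp(-t_years / mu_years)

mu = 7.0  # hypothetical mean recurrence interval in years
for t in (1, 3, 5):  # the study's three hazard-map time periods
    print(f"P(event within {t} yr) = {exceedance_probability(mu, t):.2f}")
```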