
    Predicting Takeover Success Using Machine Learning Techniques

    A takeover success prediction model aims to predict the probability that a takeover attempt will succeed, using information that is publicly available at the time of the announcement. We perform a thorough study using machine learning techniques to predict takeover success. Specifically, we model takeover success prediction as a binary classification problem, which has been widely studied in the machine learning community. Motivated by recent advances in machine learning, we empirically evaluate and analyze many state-of-the-art classifiers, including logistic regression, artificial neural networks, support vector machines with different kernels, decision trees, random forests, and AdaBoost. The experiments validate the effectiveness of applying machine learning to takeover success prediction, and we find that the support vector machine with a linear kernel and AdaBoost with stump weak classifiers perform best for the task. The result is consistent with the general observations of these two approaches.
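To make the "AdaBoost with stump weak classifiers" idea concrete, here is a minimal pure-Python sketch of discrete AdaBoost over one-threshold decision stumps. It is an illustration under stated assumptions, not the authors' implementation: the function names (`fit_stump`, `adaboost`, `predict`), the number of rounds, and the one-feature toy data are all hypothetical, and the labels +1/-1 merely stand in for "succeeds"/"fails".

```python
import math

def fit_stump(X, y, w):
    """Return the stump (feature, threshold, polarity, weighted_error) with the
    lowest weighted error. A stump predicts `polarity` when x[feature] <= threshold
    and `-polarity` otherwise."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for polarity in (1, -1):
                err = sum(
                    wi for wi, row, yi in zip(w, X, y)
                    if (polarity if row[j] <= thr else -polarity) != yi
                )
                if best is None or err < best[3]:
                    best = (j, thr, polarity, err)
    return best

def stump_predict(stump, row):
    j, thr, polarity, _ = stump
    return polarity if row[j] <= thr else -polarity

def adaboost(X, y, rounds=40):
    """Discrete AdaBoost: reweight the examples after each stump so that the
    next stump concentrates on the examples misclassified so far."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        stump = fit_stump(X, y, w)
        # Clamp the error so the logarithm below stays finite.
        err = min(max(stump[3], 1e-10), 1.0 - 1e-10)
        alpha = 0.5 * math.log((1.0 - err) / err)
        ensemble.append((alpha, stump))
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for wi, row, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, row):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1

# Hypothetical toy data (NOT takeover data): no single threshold separates
# the classes, so several stumps must be combined.
X = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
y = [1, 1, -1, -1, 1, 1]

model = adaboost(X, y)
print([predict(model, row) for row in X])
```

A single stump cannot classify this toy set correctly, but the weighted vote of several stumps can, which is the property that makes such weak learners useful inside a boosting loop.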

    Using neural networks and support vector machines for default prediction in South Africa

    A thesis submitted to the Faculty of Computer Science and Applied Mathematics, University of Witwatersrand, in fulfillment of the requirements for the Master of Science (MSc), Johannesburg, Feb 2017. This is a thesis on credit risk and in particular bankruptcy prediction. It investigates the application of machine learning techniques such as support vector machines and neural networks for this purpose. This is not a thesis on support vector machines and neural networks; it simply uses them as tools to perform the analysis. Neural networks are a type of machine learning algorithm. They are nonlinear models inspired by the biological networks of neurons found in the human central nervous system. They involve a cascade of simple nonlinear computations that, when aggregated, can implement robust and complex nonlinear functions. Neural networks can approximate most nonlinear functions, making them a quite powerful class of models. Support vector machines (SVMs) are the most recent development from the machine learning community. In machine learning, SVMs are supervised learning algorithms that analyze data and recognize patterns, used for classification and regression analysis. An SVM takes a set of input data and predicts, for each given input, which of two possible classes the input belongs to, making the SVM a non-probabilistic binary linear classifier. A support vector machine constructs a hyperplane or set of hyperplanes in a high- or infinite-dimensional space, which can be used for classification into the two different data classes. Traditional bankruptcy prediction modelling has been criticised because it makes certain assumptions about the underlying data. For instance, a frequent requirement for multivariate analysis is a joint normal distribution and independence of variables.
    Support vector machines (and neural networks) are a useful tool for default analysis because they make far fewer assumptions about the underlying data. In this framework support vector machines are used as a classifier to discriminate between defaulting and non-defaulting companies in a South African context. The input data required is a set of financial ratios constructed from the company's historic financial statements. The data is then divided into two groups: companies that have defaulted and companies that are healthy (non-default). The final data sample used for this thesis consists of 23 financial ratios from 67 companies listed on the JSE. Furthermore, for each company the probability of default is predicted. The results are benchmarked against more classical methods that are commonly used for bankruptcy prediction, such as linear discriminant analysis and logistic regression. The results of the support vector machines, neural networks, linear discriminant analysis and logistic regression are then assessed via their receiver operating characteristic curves and profitability ratios to determine which model is more successful at predicting default. MT 201
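Comparing classifiers by their receiver operating characteristic curves usually comes down to comparing the area under each curve (AUC). The sketch below computes AUC as the fraction of (defaulting, healthy) pairs that a model ranks correctly, with ties counting one half. The function name `roc_auc` and the scores are hypothetical; nothing here is taken from the thesis data.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise ranking.

    scores: predicted default probabilities (higher = more likely to default)
    labels: 1 for a defaulting company, 0 for a healthy one
    """
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # defaulter ranked above healthy company
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))

# Hypothetical scores from two models on the same four companies.
labels = [1, 1, 0, 0]
model_a = [0.9, 0.8, 0.3, 0.1]   # ranks every defaulter above every healthy firm
model_b = [0.9, 0.4, 0.6, 0.1]   # gets one of the four pairs wrong

print(roc_auc(model_a, labels), roc_auc(model_b, labels))
```

The pairwise form is quadratic in the sample size, which is acceptable for a sample on the order of the 67 companies mentioned above; larger samples would normally use a sort-based computation.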

    Hedging predictions in machine learning

    Recent advances in machine learning make it possible to design efficient prediction algorithms for data sets with huge numbers of parameters. This paper describes a new technique for "hedging" the predictions output by many such algorithms, including support vector machines, kernel ridge regression, kernel nearest neighbours, and by many other state-of-the-art methods. The hedged predictions for the labels of new objects include quantitative measures of their own accuracy and reliability. These measures are provably valid under the assumption of randomness, traditional in machine learning: the objects and their labels are assumed to be generated independently from the same probability distribution. In particular, it becomes possible to control (up to statistical fluctuations) the number of erroneous predictions by selecting a suitable confidence level. Validity being achieved automatically, the remaining goal of hedged prediction is efficiency: taking full account of the new objects' features and other available information to produce as accurate predictions as possible. This can be done successfully using the powerful machinery of modern machine learning. Comment: 24 pages; 9 figures; 2 tables; a version of this paper (with discussion and rejoinder) is to appear in "The Computer Journal"
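One common way to realise this kind of hedged prediction is a conformal predictor. The sketch below is a simplified split (calibration-set) variant and is not necessarily the exact procedure of the paper: it assumes that some underlying algorithm has already produced a nonconformity score (higher = stranger) for each calibration example and for each candidate label of the new object. The function names and all numbers are hypothetical.

```python
def p_value(calibration_scores, test_score):
    """Fraction of examples (calibration plus the new one) whose nonconformity
    score is at least as large as the new example's score."""
    n = len(calibration_scores)
    at_least = sum(1 for s in calibration_scores if s >= test_score)
    return (at_least + 1) / (n + 1)

def prediction_set(candidate_scores, calibration_scores, significance):
    """Keep every candidate label whose p-value exceeds the significance level.
    Under the i.i.d. (randomness) assumption, the true label is excluded with
    probability at most `significance`."""
    return {
        label
        for label, score in candidate_scores.items()
        if p_value(calibration_scores, score) > significance
    }

# Hypothetical nonconformity scores of nine held-out calibration examples.
calibration = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]

# Hypothetical scores of a new object under each candidate label.
candidates = {"A": 0.25, "B": 0.95}

print(prediction_set(candidates, calibration, significance=0.20))
print(prediction_set(candidates, calibration, significance=0.05))
```

Lowering the significance level (that is, raising the confidence level) can only enlarge the prediction set, which is how the trade-off between reliability and informativeness shows up in practice.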

    Support Vector Hazards Machine: A Counting Process Framework for Learning Risk Scores for Censored Outcomes

    Learning risk scores to predict dichotomous or continuous outcomes using machine learning approaches has been studied extensively. However, how to learn risk scores for time-to-event outcomes subject to right censoring has received little attention until recently. Existing approaches rely on inverse probability weighting or rank-based regression, which may be inefficient. In this paper, we develop a new support vector hazards machine (SVHM) approach to predict censored outcomes. Our method is based on predicting the counting process associated with the time-to-event outcomes among subjects at risk via a series of support vector machines. Introducing counting processes to represent time-to-event data leads to a connection between support vector machines in supervised learning and hazards regression in standard survival analysis. To account for different at-risk populations at observed event times, a time-varying offset is used in estimating risk scores. The resulting optimization is a convex quadratic programming problem that can easily incorporate non-linearity using the kernel trick. We demonstrate an interesting link from the profiled empirical risk function of SVHM to the Cox partial likelihood. We then formally show that SVHM is optimal in discriminating the covariate-specific hazard function from the population average hazard function, and establish the consistency and learning rate of the predicted risk using the estimated risk scores. Simulation studies show improved prediction accuracy of the event times using SVHM compared to existing machine learning methods and standard conventional approaches. Finally, we analyze data from two real-world biomedical studies, where we use clinical markers and neuroimaging biomarkers to predict age-at-onset of a disease, and demonstrate the superiority of SVHM in distinguishing high-risk from low-risk subjects.

    Regional prediction of landslide hazard using probability analysis of intense rainfall in the Hoa Binh province, Vietnam.

    The main objective of this study is to assess regional landslide hazards in the Hoa Binh province of Vietnam. A landslide inventory map was constructed from various sources with data mainly for a period of 21 years from 1990 to 2010. The historic inventory of these failures shows that rainfall is the main triggering factor in this region. The probability of the occurrence of episodes of rainfall and the rainfall threshold were deduced from records of rainfall for the aforementioned period. The rainfall threshold model was generated based on daily and cumulative values of antecedent rainfall of the landslide events. The result shows that 15-day antecedent rainfall gives the best fit for the existing landslides in the inventory. The rainfall threshold model was validated using the rainfall and landslide events that occurred in 2010 that were not considered in building the threshold model. The result was used to estimate the temporal probability of landslide occurrence using a Poisson probability model. Prior to this work, five landslide susceptibility maps were constructed for the study area using support vector machines, logistic regression, evidential belief functions, Bayesian-regularized neural networks, and neuro-fuzzy models. These susceptibility maps provide information on the spatial prediction probability of landslide occurrence in the area. Finally, landslide hazard maps were generated by integrating the spatial and the temporal probability of landslide occurrence. A total of 15 specific landslide hazard maps were generated considering three time periods of 1, 3, and 5 years.
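The temporal step can be illustrated with the usual Poisson exceedance formula, P(at least one event in t years) = 1 - exp(-rate * t), where the rate is the number of threshold exceedances per year of record. The sketch below is a rough illustration only: the event count, the spatial probability, and the choice to combine spatial and temporal probabilities by multiplication are assumptions made here, not values or methods confirmed by the abstract. Only the 21-year record length, the five model names, and the three time periods come from the text.

```python
import math

def temporal_probability(events, record_years, horizon_years):
    """Poisson probability of at least one triggering event within the horizon."""
    rate = events / record_years          # mean number of events per year
    return 1.0 - math.exp(-rate * horizon_years)

# From the abstract: five susceptibility models and three time periods.
MODELS = [
    "support vector machines",
    "logistic regression",
    "evidential belief functions",
    "Bayesian-regularized neural networks",
    "neuro-fuzzy",
]
HORIZONS = (1, 3, 5)

# Hypothetical inputs: 7 threshold exceedances in the 21-year record, and a
# spatial (susceptibility) probability of 0.6 for one map cell in every model.
EVENTS, RECORD_YEARS = 7, 21
SPATIAL = 0.6

# Assumed combination rule: hazard = spatial probability * temporal probability.
hazard = {
    (model, horizon): SPATIAL * temporal_probability(EVENTS, RECORD_YEARS, horizon)
    for model in MODELS
    for horizon in HORIZONS
}

for horizon in HORIZONS:
    print(horizon, round(temporal_probability(EVENTS, RECORD_YEARS, horizon), 3))
print(len(hazard), "model/horizon combinations")
```

Five models crossed with three time periods gives the 15 combinations that correspond to the 15 hazard maps mentioned in the abstract.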