Financial crises and bank failures: a review of prediction methods
In this article we analyze the financial and economic circumstances associated with the U.S. subprime mortgage crisis and the global financial turmoil that has led to severe crises in many countries. We suggest that the level of cross-border holdings of long-term securities between the United States and the rest of the world may indicate a direct link between the turmoil in the securitized market that originated in the United States and the turmoil in other countries. We provide a summary of empirical results obtained in several economics and operations research papers that attempt to explain, predict, or suggest remedies for financial crises or banking defaults, and we extensively outline the methodologies used in them. The intent of this article is to promote future empirical research for preventing financial crises.
Keywords: subprime mortgage; financial crises
Scalable Algorithms for High-Dimensional Structured Data
Emerging technologies and digital devices provide increasingly large volumes of data, in terms of both sample size and number of features. To exploit the benefits of massive data sets, scalable statistical models and machine learning algorithms are increasingly important across research disciplines. For robust and accurate prediction, prior knowledge about dependency structures within the data needs to be formulated appropriately in these models. On the other hand, the scalability and computational complexity of existing algorithms may not meet the needs of analyzing massive high-dimensional data. This dissertation presents several novel methods for scaling up sparse learning models to analyze massive data sets. We first present our novel safe active incremental feature (SAIF) selection algorithm for LASSO (least absolute shrinkage and selection operator), with a time complexity analysis showing its advantages over existing state-of-the-art methods. As SAIF targets general convex loss functions, it can potentially be extended to many learning models and big-data applications, and we show how support vector machines (SVMs) can be scaled up based on the idea behind SAIF. Secondly, we propose screening methods for the generalized LASSO (GL), which specifically considers the dependency structure among features. We also propose a scalable feature selection method for non-parametric, non-linear models based on sparse structures and kernel methods. Theoretical analysis and experimental results in this dissertation show that model complexity can be significantly reduced under the sparsity and structure assumptions.
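Safe screening of the kind SAIF builds on is easiest to see in its simplest, static form. The sketch below is a hedged illustration of the classic static SAFE-style test for the Lasso, not the SAIF algorithm itself, whose incremental active-set maintenance is more involved; the function name and toy data are ours.

```python
import numpy as np

def safe_screen_lasso(X, y, lam):
    """Static SAFE-style screening for the Lasso
        min_b 0.5 * ||y - X b||^2 + lam * ||b||_1.
    Returns a boolean mask; features with mask == False are provably
    zero at the optimum, so a solver can drop them up front."""
    corr = np.abs(X.T @ y)                # |x_j^T y| per feature
    lam_max = corr.max()                  # smallest lam giving an all-zero solution
    col_norms = np.linalg.norm(X, axis=0)
    slack = lam - col_norms * np.linalg.norm(y) * (lam_max - lam) / lam_max
    return corr >= slack                  # keep only features passing the test

# toy usage: near lam_max the static rule already removes most features
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 1000))
y = rng.standard_normal(200)
lam = 0.95 * np.abs(X.T @ y).max()
keep = safe_screen_lasso(X, y, lam)
print(f"{keep.sum()} of {keep.size} features survive screening")
```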
Screening for Sparse Online Learning
Sparsity-promoting regularizers are widely used to impose low-complexity structure (e.g. the l1-norm for sparsity) on the regression coefficients of supervised learning. In the realm of deterministic optimization, the sequence generated by iterative algorithms (such as proximal gradient descent) exhibits "finite activity identification": it identifies the low-complexity structure in a finite number of iterations. However, most online algorithms (such as proximal stochastic gradient descent) lack this property, owing to their vanishing step sizes and non-vanishing variance. In this paper we show how, by combining online algorithms with a screening rule, useless features can be eliminated from the iterates they generate, thereby enforcing finite activity identification. One consequence is that, when combined with any convergent online algorithm, the sparsity properties imposed by the regularizer can be exploited for computational gains. Numerically, significant acceleration can be obtained.
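A minimal sketch of the mechanism follows. It is our toy version, not the paper's construction: the paper derives safe tests from the online iterates themselves, whereas this sketch simply re-runs a static batch SAFE test every few epochs and freezes the eliminated coordinates, which is enough to show how screening restores exact sparsity to a proximal SGD run.

```python
import numpy as np

def soft_threshold(v, t):
    # proximal operator of t * ||.||_1
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def prox_sgd_screened(X, y, lam, epochs=20, screen_every=5, rng=None):
    """Proximal SGD for (1/n) * sum_i 0.5*(x_i^T b - y_i)^2 + lam*||b||_1,
    with periodic screening that freezes provably-zero coordinates."""
    rng = rng or np.random.default_rng(0)
    n, p = X.shape
    b = np.zeros(p)
    active = np.ones(p, dtype=bool)
    for epoch in range(epochs):
        step = 1.0 / (epoch + 1)          # vanishing step size, as in the abstract
        for i in rng.permutation(n):
            g = (X[i, active] @ b[active] - y[i]) * X[i, active]
            b[active] = soft_threshold(b[active] - step * g, step * lam)
        if (epoch + 1) % screen_every == 0:
            # static SAFE test on the equivalent batch problem
            #   0.5*||X b - y||^2 + (n * lam)*||b||_1
            lam_b = n * lam
            corr = np.abs(X.T @ y)
            lam_max = corr.max()
            slack = (lam_b - np.linalg.norm(X, axis=0) * np.linalg.norm(y)
                     * (lam_max - lam_b) / lam_max)
            active &= corr >= slack
            b[~active] = 0.0              # screened coordinates stay exactly zero
    return b
```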
ν-SVM solutions of constrained lasso and elastic net
Many important linear sparse models have at their core the Lasso problem, for which the GLMNet algorithm is often considered the current state of the art. Recently, M. Jaggi observed that the Constrained Lasso (CL) can be reduced to an SVM-like problem, for which the LIBSVM library provides very efficient algorithms; this suggests that LIBSVM could also be used advantageously to solve CL. In this work we refine Jaggi's arguments to reduce CL, as well as the constrained Elastic Net, to a Nearest Point Problem, which in turn can be rewritten as an appropriate ν-SVM problem solvable by LIBSVM. We also show experimentally that the well-known LIBSVM library yields faster convergence than GLMNet for small problems and, if properly adapted, for larger ones as well. Screening is another ingredient for speeding up the solution of the Lasso; shrinking can be seen as SVM's simpler alternative to screening, and we discuss how it, too, may in some cases reduce the cost of an SVM-based CL solution.
With partial support from Spanish government grants TIN2013-42351-P, TIN2016-76406-P, TIN2015-70308-REDT and S2013/ICE-2845 CASI-CAM-CM; work also supported by project FACIL–Ayudas Fundación BBVA a Equipos de Investigación Científica 2016 and the UAM–ADIC Chair for Data Science and Machine Learning. The first author is also supported by the FPU–MEC grant AP-2012-5163. We gratefully acknowledge the use of the facilities of Centro de Computación Científica (CCC) at UAM and thank Red Eléctrica de España for kindly supplying wind energy data.
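The first step of the reduction can be sketched compactly; the notation below is ours, and the paper's full route to a ν-SVM (and the constrained Elastic Net case) carries further details.

```latex
% Constrained Lasso (CL):
\min_{w\in\mathbb{R}^p} \|Aw-y\|_2^2 \quad\text{s.t.}\quad \|w\|_1 \le t.
% Split w = t(u-v) with u, v \ge 0 and \mathbf{1}^\top(u+v) = 1, so that
% \alpha = (u; v) lies on the unit simplex \Delta_{2p} and \|w\|_1 \le t holds.
% Since \mathbf{1}^\top\alpha = 1, we may absorb y into the data matrix,
% writing Aw - y = Z\alpha with
Z = t\,[A,\,-A] - y\,\mathbf{1}^\top,
% and CL becomes the Nearest Point Problem
\min_{\alpha\in\Delta_{2p}} \|Z\alpha\|_2^2,
% i.e. finding the point of \mathrm{conv}\{z_1,\dots,z_{2p}\} closest to the
% origin, the geometric form that \nu-SVM solvers such as LIBSVM handle.
```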
Financial crises and bank failures: a review of prediction methods
In this article we provide a summary of empirical results obtained in several economics and operations research papers that attempt to explain, predict, or suggest remedies for financial crises or banking defaults, as well as outlines of the methodologies they use. We analyze the financial and economic circumstances associated with the US subprime mortgage crisis and the global financial turmoil that has led to severe crises in many countries. The intent of the article is to promote future empirical research that might help to prevent bank failures and financial crises.
Keywords: financial crises; banking failures; operations research; early warning methods; leading indicators; subprime markets
Reliability analysis of discrete-state performance functions via adaptive sequential sampling with detection of failure surfaces
The paper presents a new efficient and robust method of rare event probability estimation for computational models of an engineering product or process that return only categorical information, for example, either success or failure. For such models, most of the methods designed for estimating the failure probability, which use the numerical value of the outcome to compute gradients or to estimate proximity to the failure surface, cannot be applied. Even if the performance function provides more than just binary output, the state of the system may be a non-smooth or even discontinuous function defined on the domain of continuous input variables, and in these cases the classical gradient-based methods usually fail. We propose a simple yet efficient algorithm that performs a sequential adaptive selection of points from the input domain of random variables to extend and refine a simple distance-based surrogate model. Two different tasks can be accomplished at any stage of the sequential sampling: (i) estimation of the failure probability, and (ii) selection of the best possible candidate for the subsequent model evaluation if further improvement is necessary. The proposed criterion for selecting the next point for model evaluation maximizes the expected probability classified by using the candidate; a balance between global exploration and local exploitation is therefore maintained automatically. The method can estimate the probabilities of multiple failure types. Moreover, when the numerical value of a model evaluation can be used to build a smooth surrogate, the algorithm can accommodate this information to increase the accuracy of the estimated probabilities. Lastly, we define a new simple yet general geometrical measure of the global sensitivity of the rare-event probability to individual variables, which is obtained as a by-product of the proposed algorithm.
Comment: Manuscript CMAME-D-22-00532R1 (Computer Methods in Applied Mechanics and Engineering)
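A stripped-down sketch of the idea follows: classify Monte Carlo points by their nearest evaluated neighbour (a 1-NN stand-in for the paper's distance-based surrogate) and evaluate next the candidate whose nearest success and nearest failure are most nearly equidistant, i.e. the most ambiguous point. The paper's actual criterion (maximising the expected probability classified by the candidate) and its multi-failure-type and smooth-surrogate extensions are more refined; everything named here is our own illustration.

```python
import numpy as np

def estimate_failure_probability(model, sampler, n_init=16, n_add=64,
                                 n_pool=20_000, rng=None):
    """model(x) -> bool (True = failure); sampler(n, rng) -> (n, dim) draws
    from the input density. Returns the pool fraction classified as failure."""
    rng = rng or np.random.default_rng(0)
    X = sampler(n_init, rng)                         # initial design
    y = np.array([model(x) for x in X], dtype=bool)  # categorical outcomes
    pool = sampler(n_pool, rng)                      # Monte Carlo pool
    for _ in range(n_add):
        d = np.linalg.norm(pool[:, None, :] - X[None, :, :], axis=2)
        d_fail = np.where(y, d, np.inf).min(axis=1)  # distance to nearest failure
        d_safe = np.where(y, np.inf, d).min(axis=1)  # distance to nearest success
        if np.isinf(d_fail).all() or np.isinf(d_safe).all():
            idx = d.min(axis=1).argmax()             # one class seen so far: explore
        else:
            idx = np.abs(d_fail - d_safe).argmin()   # most ambiguous candidate
        X = np.vstack([X, pool[idx]])
        y = np.append(y, model(pool[idx]))
    d = np.linalg.norm(pool[:, None, :] - X[None, :, :], axis=2)
    return y[d.argmin(axis=1)].mean()                # 1-NN classification of the pool

# toy usage: failure when a 2-D standard normal input leaves a disk of radius 3;
# the true probability is exp(-4.5), roughly 0.011
sampler = lambda n, rng: rng.standard_normal((n, 2))
model = lambda x: bool(x @ x > 9.0)
print(estimate_failure_probability(model, sampler))
```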
Predictive Modelling Approach to Data-Driven Computational Preventive Medicine
This thesis contributes novel predictive modelling approaches to data-driven computational preventive medicine and offers an alternative framework to statistical analysis in preventive medicine research. In its early parts, the thesis proposes a synergy of machine learning methods for detecting patterns in healthcare data and developing inexpensive predictive models from it to classify the potential occurrence of adverse health events. In particular, the data-driven methodology is founded upon a heuristic-systematic assessment of several machine learning methods, data preprocessing techniques, model training and optimisation, and performance evaluation, yielding a novel computational data-driven framework, Octopus.
Midway through this research, the thesis advances preventive medicine and data mining by proposing several new extensions to data preparation and preprocessing. It offers new recommendations for data quality assessment checks, a novel multimethod imputation (MMI) process for mitigating missing data, and a novel imbalanced-data resampling approach, minority pattern reconstruction (MPR), guided by information theory. The thesis also extends the area of model performance evaluation with a novel classification performance ranking metric called XDistance.
The experimental results show that building predictive models with the methods guided by the new framework (Octopus) yields models whose reliability earns domain experts' approval. Performing the data quality checks and applying the MMI process led healthcare practitioners to prioritise predictive reliability over interpretability. The application of MPR and its hybrid resampling strategies achieved better performance, in line with the experts' success criteria, than traditional imbalanced-data resampling techniques. Finally, the XDistance performance ranking metric proved more effective at ranking the performance of several classifiers while, unlike existing performance metrics, offering an indication of class bias.
The overall contributions of this thesis can be summarised as follows. First, several data mining techniques were thoroughly assessed to formulate the new Octopus framework for producing new, reliable classifiers; in addition, we offer a further understanding of the impact of the newly engineered features, the physical activity index (PAI) and the biologically effective dose (BED). Second, new methods were developed within the framework: the MMI imputation process, the MPR resampling approach and the XDistance ranking metric. Finally, the newly developed and expert-approved predictive models help detect adverse health events, namely visceral fat-associated diseases and toxicity side effects of advanced breast cancer radiotherapy. These contributions could be used to guide future theories, experiments and healthcare interventions in preventive medicine and data mining.