8,546 research outputs found
Sequences of regressions and their independences
Ordered sequences of univariate or multivariate regressions provide
statistical models for analysing data from randomized, possibly sequential
interventions, from cohort or multi-wave panel studies, but also from
cross-sectional or retrospective studies. Conditional independences are
captured by what we name regression graphs, provided the generated distribution
shares some properties with a joint Gaussian distribution. Regression graphs
extend purely directed, acyclic graphs by two types of undirected graph, one
type for components of joint responses and the other for components of the
context vector variable. We review the special features and the history of
regression graphs, derive criteria to read all implied independences of a
regression graph and prove criteria for Markov equivalence that is to judge
whether two different graphs imply the same set of independence statements.
Knowledge of Markov equivalence provides alternative interpretations of a given
sequence of regressions, is essential for machine learning strategies and
permits to use the simple graphical criteria of regression graphs on graphs for
which the corresponding criteria are in general more complex. Under the known
conditions that a Markov equivalent directed acyclic graph exists for any given
regression graph, we give a polynomial time algorithm to find one such graph.Comment: 43 pages with 17 figures The manuscript is to appear as an invited
discussion paper in the journal TES
COMPARISON OF BANKRUPTCY PREDICTION MODELS WITH PUBLIC RECORDS AND FIRMOGRAPHICS
Many business operations and strategies rely on bankruptcy prediction. In this paper, we aim to study the impacts of public records and firmographics and predict the bankruptcy in a 12-month-ahead period with using different classification models and adding values to traditionally used financial ratios. Univariate analysis shows the statistical association and significance of public records and firmographics indicators with the bankruptcy. Further, seven statistical models and machine learning methods were developed, including Logistic Regression, Decision Tree, Random Forest, Gradient Boosting, Support Vector Machine, Bayesian Network, and Neural Network. The performance of models were evaluated and compared based on classification accuracy, Type I error, Type II error, and ROC curves on the hold-out dataset. Moreover, an experiment was set up to show the importance of oversampling for rare event prediction. The result also shows that Bayesian Network is comparatively more robust than other models without oversampling
- …