Detecting reliable gene interactions by a hierarchy of Bayesian network classifiers
The main purpose of a gene interaction network is to map relationships among genes that are not directly visible when a genomic study is undertaken. DNA microarrays allow the expression of thousands of genes to be measured at the same time. These data constitute the numeric seed for the induction of the gene networks. In this paper, we propose a new approach to build gene networks by means of Bayesian classifiers, variable selection and bootstrap resampling. The interactions induced by the Bayesian classifiers are based both on the expression levels and on the phenotype information of the supervised variable. Feature selection and bootstrap resampling add reliability and robustness to the overall process by removing false positive findings. The consensus among all the induced models produces a hierarchy of dependences and, thus, of variables. Biologists can define the depth level of the model hierarchy, so the set of interactions and genes involved can vary from sparse to dense. Experimental results show that these networks perform well on classification tasks. The biological validation matches previous biological findings and opens new hypotheses for future studies.
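The bootstrap-and-consensus idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' method: the edge inducer here is a plain absolute-correlation cutoff standing in for the Bayesian classifiers, the phenotype variable and the feature selection step are omitted, and every name (`induce_edges`, `bootstrap_edge_frequencies`, `edges_at_level`, the toy genes `g1` to `g3`) is hypothetical. What it shows is how edge frequencies across resamples give a ranking that can be cut at different levels to obtain a sparser or denser network.

```python
import random
from collections import Counter
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if degenerate)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def induce_edges(samples, names, cutoff=0.6):
    """Stand-in edge inducer (assumption): link genes whose |correlation| >= cutoff.
    The paper induces edges with Bayesian classifiers instead."""
    cols = {name: [row[i] for row in samples] for i, name in enumerate(names)}
    return {
        (a, b)
        for a, b in combinations(names, 2)
        if abs(pearson(cols[a], cols[b])) >= cutoff
    }


def bootstrap_edge_frequencies(samples, names, n_boot=200, seed=0):
    """Fraction of bootstrap resamples in which each edge was induced."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_boot):
        # Resample the rows (arrays/patients) with replacement.
        resample = [rng.choice(samples) for _ in samples]
        counts.update(induce_edges(resample, names))
    return {edge: c / n_boot for edge, c in counts.items()}


def edges_at_level(freqs, min_frequency):
    """Cut the consensus ranking: a higher threshold gives a sparser network."""
    return sorted(e for e, f in freqs.items() if f >= min_frequency)


def make_toy_data(n=60, seed=1):
    """Synthetic expression values: g2 follows g1, g3 is independent."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        g1 = rng.gauss(0, 1)
        g2 = g1 + rng.gauss(0, 0.3)
        g3 = rng.gauss(0, 1)
        rows.append((g1, g2, g3))
    return rows


names = ["g1", "g2", "g3"]
freqs = bootstrap_edge_frequencies(make_toy_data(), names)
sparse = edges_at_level(freqs, 0.9)  # only very stable edges
dense = edges_at_level(freqs, 0.1)   # also admits less stable edges
```

Lowering `min_frequency` plays the role of descending the hierarchy: the sparse set is always contained in the dense one.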
Supervised machine learning algorithms for the estimation of the probability of default in corporate credit risk
This thesis investigates the application of non-linear supervised machine learning algorithms for estimating the Probability of Default (PD) of corporate clients. To achieve this, the thesis is separated into three different experiments: 1. The first experiment investigates a wrapper feature selection method and its application to support vector machines (SVMs) and logistic regression (LR). The logistic regression model is the most popular approach used for estimating PD in a rich default portfolio. However, other alternatives to PD estimation are available. The SVM method is compared to the logistic regression model using the proposed feature selection method. 2. The second experiment investigates the application of artificial neural networks (ANNs) for estimating PD of corporate clients. In particular, ANNs are regularized and trained with both classical and Bayesian approaches. Furthermore, different network architectures are explored, and specifically Bayesian estimation and regularization are compared to classical estimation and regularization. 3. The third experiment investigates the k-Nearest Neighbours algorithm (KNNs). This algorithm is trained using both Bayesian and classical methods. KNNs could be efficiently applied to estimating PD. In addition, other supervised machine learning algorithms such as Decision trees (DTs), Linear discriminant analysis (LDA) and Naive Bayes (NB) were applied and their performance summarized and compared to that of the SVMs, ANNs, KNNs and logistic regression. The contribution of this thesis to science is to provide efficient and at the same time applicable methods for estimating PD of corporate clients. This thesis contributes to the existing literature in a number of ways. 1. First, this research proposes an innovative feature selection method for SVMs. 2. Second, this research proposes innovative Bayesian estimation methods to regularize ANNs. 3. Third, this research proposes an innovative Bayesian approach to the estimation of KNNs. Nonetheless, the objective of the research is to promote the use of the Bayesian non-linear supervised machine learning methods that are currently not heavily applied in the industry for PD estimation of corporate clients.
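As a rough illustration of what a wrapper feature selection method does, the sketch below runs greedy forward selection around an arbitrary classifier, scoring each candidate feature subset by cross-validated accuracy. It is not the thesis's proposed method: the classifier is a nearest-centroid rule standing in for SVMs or logistic regression, the data are synthetic, and all names (`forward_select`, `cv_accuracy`, `make_toy_portfolio`) are hypothetical.

```python
import random


def nearest_centroid_predict(train_X, train_y, test_X):
    """Stand-in classifier (assumption): assign each row to the closest class mean."""
    centroids = {}
    for label in set(train_y):
        rows = [x for x, y in zip(train_X, train_y) if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def closest(x):
        return min(
            centroids,
            key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])),
        )

    return [closest(x) for x in test_X]


def cv_accuracy(X, y, features, predict, k=5):
    """k-fold cross-validated accuracy using only the given feature indices."""
    idx = list(range(len(X)))
    correct = 0
    for fold in range(k):
        test = idx[fold::k]
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        proj = lambda i: [X[i][f] for f in features]
        preds = predict(
            [proj(i) for i in train], [y[i] for i in train], [proj(i) for i in test]
        )
        correct += sum(p == y[i] for p, i in zip(preds, test))
    return correct / len(X)


def forward_select(X, y, predict):
    """Greedy wrapper: add the feature that most improves CV accuracy, stop when none does."""
    remaining = list(range(len(X[0])))
    chosen, best = [], 0.0
    while remaining:
        scored = [(cv_accuracy(X, y, chosen + [f], predict), f) for f in remaining]
        score, feat = max(scored, key=lambda t: t[0])
        if score <= best:
            break
        chosen.append(feat)
        remaining.remove(feat)
        best = score
    return chosen, best


def make_toy_portfolio(n=100, seed=0):
    """Synthetic clients: feature 0 separates defaulters, features 1 and 2 are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        default = 1 if rng.random() < 0.3 else 0
        X.append([rng.gauss(3.0 * default, 1.0), rng.gauss(0, 1), rng.gauss(0, 1)])
        y.append(default)
    return X, y


X, y = make_toy_portfolio()
chosen, best = forward_select(X, y, nearest_centroid_predict)
```

Because the selector only needs a `predict(train_X, train_y, test_X)` callable, the same loop could wrap any model, which is the defining property of a wrapper method as opposed to a filter.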
Automated supervised classification of variable stars I. Methodology
The fast classification of new variable stars is an important step in making
them available for further research. Selection of science targets from large
databases is much more efficient if they have been classified first. Defining
the classes in terms of physical parameters is also important to get an
unbiased statistical view on the variability mechanisms and the borders of
instability strips. Our goal is twofold: to provide an overview of the stellar
variability classes that are presently known, in terms of some relevant stellar
parameters, and to use the class descriptions obtained as the basis for an automated
'supervised classification' of large databases. Such automated classification
will compare and assign new objects to a set of pre-defined variability
training classes. For every variability class, a literature search was
performed to find as many well-known member stars as possible, or a
considerable subset if too many were present. Next, we searched on-line and
private databases for their light curves in the visible band and performed
period analysis and harmonic fitting. The derived light curve parameters are
used to describe the classes and define the training classifiers. We compared
the performance of different classifiers in terms of percentage of correct
identification, of confusion among classes and of computation time. We describe
how well the classes can be separated using the proposed set of parameters and
how future improvements can be made, based on new large databases such as the
light curves to be assembled by the CoRoT and Kepler space missions.
Comment: This paper has been accepted for publication in Astronomy and
Astrophysics (reference AA/2007/7638) Number of pages: 27 Number of figures:
1
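The harmonic fitting step mentioned in the abstract above can be sketched as a linear least-squares fit of a truncated Fourier series to a light curve. This is a minimal illustration under stated assumptions, not the paper's pipeline: the frequency is taken as already known (no period search is performed), the light curve is synthetic, and the function names (`fit_harmonics`, `design_row`, `solve_linear`) are hypothetical. The fitted mean magnitude and harmonic amplitudes are examples of the kind of light curve parameters that could feed a classifier.

```python
import math
import random


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def design_row(t, freq, n_harmonics):
    """Basis values [1, sin, cos, sin, cos, ...] at time t for harmonics of freq."""
    row = [1.0]
    for k in range(1, n_harmonics + 1):
        phase = 2 * math.pi * k * freq * t
        row += [math.sin(phase), math.cos(phase)]
    return row


def fit_harmonics(times, mags, freq, n_harmonics=2):
    """Least-squares fit via the normal equations; returns (mean, amplitudes)."""
    rows = [design_row(t, freq, n_harmonics) for t in times]
    p = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    atb = [sum(r[i] * m for r, m in zip(rows, mags)) for i in range(p)]
    coef = solve_linear(ata, atb)
    # Amplitude of harmonic k combines its sine and cosine coefficients.
    amplitudes = [
        math.hypot(coef[2 * k - 1], coef[2 * k]) for k in range(1, n_harmonics + 1)
    ]
    return coef[0], amplitudes


# Synthetic, unevenly sampled light curve (assumed values, for illustration only).
rng = random.Random(0)
true_freq = 0.5  # cycles per day
times = sorted(rng.uniform(0, 30) for _ in range(200))
mags = [
    12.0
    + 0.4 * math.sin(2 * math.pi * true_freq * t)
    + 0.1 * math.cos(2 * math.pi * 2 * true_freq * t)
    + rng.gauss(0, 0.01)
    for t in times
]
mean_mag, amplitudes = fit_harmonics(times, mags, true_freq)
```

In practice the frequency would first come from a period analysis, and the amplitude ratios and phase differences between harmonics are the kind of quantities that help tell variability classes apart.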