Detecting reliable gene interactions by a hierarchy of Bayesian network classifiers

Author: Rubén Armañanzas, Iñaki Inza, Pedro Larrañaga
Publication venue: 'Elsevier BV'
Publication date: 01/01/2008

The main purpose of a gene interaction network is to map relationships among genes that would otherwise remain hidden in a genomic study. DNA microarrays measure the expression of thousands of genes at the same time, and these data are the numeric seed from which gene networks are induced. In this paper, we propose a new approach to building gene networks by means of Bayesian classifiers, variable selection and bootstrap resampling. The interactions induced by the Bayesian classifiers are based both on the expression levels and on the phenotype information of the supervised variable. Feature selection and bootstrap resampling add reliability and robustness to the overall process by removing false positive findings. The consensus among all the induced models produces a hierarchy of dependences and, thus, of variables. Biologists can define the depth level of the model hierarchy, so the set of interactions and genes involved can vary from sparse to dense. Experimental results show that these networks perform well on classification tasks. The biological validation matches previous biological findings and opens new hypotheses for future studies.
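The resampling-and-consensus idea in this abstract can be illustrated with a minimal sketch. It is not the authors' method: the paper induces Bayesian network classifiers that also use the phenotype label, whereas this example uses a plain correlation threshold as a stand-in for model induction and ignores the class variable. All function names and the default thresholds are assumptions made for illustration.

```python
import random
from collections import Counter
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def induce_edges(samples, threshold=0.8):
    """Stand-in for model induction (assumption): link strongly correlated genes."""
    n_genes = len(samples[0])
    columns = [[row[g] for row in samples] for g in range(n_genes)]
    return {
        (i, j)
        for i, j in combinations(range(n_genes), 2)
        if abs(pearson(columns[i], columns[j])) >= threshold
    }


def bootstrap_edge_frequencies(samples, n_boot=200, seed=0):
    """Fraction of bootstrap resamples in which each edge is induced."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_boot):
        # Resample the arrays (rows) with replacement, keeping the sample size.
        resample = [rng.choice(samples) for _ in samples]
        counts.update(induce_edges(resample))
    return {edge: c / n_boot for edge, c in counts.items()}


def consensus_network(frequencies, level):
    """Keep edges found in at least `level` of the resamples (higher level = sparser)."""
    return {edge for edge, f in frequencies.items() if f >= level}
```

Lowering `level` in `consensus_network` admits edges that appear in fewer resamples, which loosely mirrors how the depth of the hierarchy moves the network from sparse to dense.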

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Archivo Digital UPM

Supervised machine learning algorithms for the estimation of the probability of default in corporate credit risk

Author: Sariev Eduard
Publication venue: UCL (University College London)
Publication date: 28/02/2021

This thesis investigates the application of non-linear supervised machine learning algorithms to estimating the Probability of Default (PD) of corporate clients. The thesis is organized into three experiments: 1. The first experiment investigates a wrapper feature selection method and its application to support vector machines (SVMs) and logistic regression (LR). Logistic regression is the most popular approach for estimating PD in a rich default portfolio, but alternatives are available; SVMs are compared to logistic regression using the proposed feature selection method. 2. The second experiment investigates the application of artificial neural networks (ANNs) to estimating the PD of corporate clients. In particular, ANNs are regularized and trained with both classical and Bayesian approaches. Different network architectures are explored, and Bayesian estimation and regularization are compared to their classical counterparts. 3. The third experiment investigates the k-Nearest Neighbours (KNN) algorithm, which is trained using both Bayesian and classical methods. KNN could be efficiently applied to estimating PD. In addition, other supervised machine learning algorithms such as decision trees (DTs), linear discriminant analysis (LDA) and Naive Bayes (NB) were applied, and their performance was summarized and compared to that of SVMs, ANNs, KNN and logistic regression. The contribution of this thesis is to provide methods for estimating the PD of corporate clients that are both efficient and applicable in practice. It contributes to the existing literature in three ways. First, it proposes an innovative feature selection method for SVMs. Second, it proposes innovative Bayesian estimation methods to regularize ANNs. Third, it proposes innovative Bayesian approaches to the estimation of KNN. More broadly, the objective of the research is to promote the use of Bayesian non-linear supervised machine learning methods, which are currently not heavily applied in the industry for PD estimation of corporate clients.
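As a rough illustration of the third experiment, a classical k-nearest-neighbours estimate of PD can be sketched as the share of defaulted firms among the k closest training firms. This is only the textbook, non-Bayesian variant, not the thesis's proposed estimator; the function name, the Euclidean distance and the feature layout are assumptions.

```python
import math


def knn_default_probability(train_features, train_defaulted, query, k=5):
    """Estimate PD as the share of defaulted firms among the k nearest neighbours.

    train_features: list of numeric feature tuples, one per firm (hypothetical ratios).
    train_defaulted: list of 0/1 flags, 1 meaning the firm defaulted.
    """
    if not 1 <= k <= len(train_features):
        raise ValueError("k must be between 1 and the number of training firms")
    # Sort firms by Euclidean distance to the query firm.
    distances = sorted(
        (math.dist(features, query), defaulted)
        for features, defaulted in zip(train_features, train_defaulted)
    )
    nearest = distances[:k]
    return sum(defaulted for _, defaulted in nearest) / k
```

In practice the features would usually be standardized first, since unscaled financial ratios let the largest-valued variable dominate the distance.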

UCL Discovery

Automated supervised classification of variable stars I. Methodology

Author: Aerts C., Cuypers J., Debosscher J., Garrido R., Sarro L. M., Solano E., Vandenbussche B.
Publication venue: 'EDP Sciences'
Publication date: 01/01/2007

The fast classification of new variable stars is an important step in making them available for further research. Selection of science targets from large databases is much more efficient if they have been classified first. Defining the classes in terms of physical parameters is also important to get an unbiased statistical view of the variability mechanisms and the borders of instability strips. Our goal is twofold: to provide an overview of the stellar variability classes that are presently known, in terms of some relevant stellar parameters; and to use the class descriptions obtained as the basis for an automated 'supervised classification' of large databases. Such automated classification will compare and assign new objects to a set of pre-defined variability training classes. For every variability class, a literature search was performed to find as many well-known member stars as possible, or a considerable subset if too many were present. Next, we searched on-line and private databases for their light curves in the visible band and performed period analysis and harmonic fitting. The derived light curve parameters are used to describe the classes and define the training classifiers. We compared the performance of different classifiers in terms of percentage of correct identification, of confusion among classes and of computation time. We describe how well the classes can be separated using the proposed set of parameters and how future improvements can be made, based on new large databases such as the light curves to be assembled by the CoRoT and Kepler space missions.
Comment: This paper has been accepted for publication in Astronomy and Astrophysics (reference AA/2007/7638). Number of pages: 27. Number of figures: 1.
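The period analysis and harmonic fitting step mentioned in the abstract can be sketched in a simplified form: fit a single sinusoid to a light curve at each trial frequency by least squares and keep the frequency with the smallest residual. The paper's actual procedure is not given in this listing, so the single-harmonic model, the grid search and all function names below are assumptions.

```python
import math


def solve_linear(a, b):
    """Solve a small dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_harmonic(times, mags, frequency):
    """Least-squares fit of mag = c + a*sin(2*pi*f*t) + b*cos(2*pi*f*t).

    Returns (c, a, b, residual_sum_of_squares).
    """
    w = 2 * math.pi * frequency
    rows = [(1.0, math.sin(w * t), math.cos(w * t)) for t in times]
    # Normal equations (X^T X) p = X^T y for the three parameters.
    normal = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    rhs = [sum(r[i] * y for r, y in zip(rows, mags)) for i in range(3)]
    c, a, b = solve_linear(normal, rhs)
    rss = sum((y - (c + a * r[1] + b * r[2])) ** 2 for r, y in zip(rows, mags))
    return c, a, b, rss


def best_frequency(times, mags, candidates):
    """Pick the trial frequency whose single-harmonic fit leaves the smallest residual."""
    return min(candidates, key=lambda f: fit_harmonic(times, mags, f)[3])
```

The fitted frequency and the amplitude `math.hypot(a, b)` are examples of the kind of light curve parameters that could then be passed to a classifier as features.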

arXiv.org e-Print Archive

EDP Sciences OAI-PMH repository (1.2.0)

Radboud Repository