
    A cDNA Microarray Gene Expression Data Classifier for Clinical Diagnostics Based on Graph Theory

    Despite great advances in discovering cancer molecular profiles, the proper application of microarray technology to routine clinical diagnostics is still a challenge. Current practices in the classification of microarray data show two main limitations: the reliability of the training data sets used to build the classifiers, and the classifiers' performance, especially when the sample to be classified does not belong to any of the available classes. In this case, state-of-the-art algorithms usually produce a high rate of false positives, which is unacceptable in real diagnostic applications. To address this problem, this paper presents a new cDNA microarray data classification algorithm that is based on graph theory and is able to overcome most of the limitations of known classification methodologies. The classifier works by analyzing gene expression data organized in an innovative graph-based data structure, where vertices correspond to genes and edges to gene expression relationships. To demonstrate the novelty of the proposed approach, the authors present an experimental performance comparison between the proposed classifier and several state-of-the-art classification algorithms.
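The abstract does not say how edges are defined or how a sample is assigned to a class. A minimal sketch of the kind of data structure it describes, assuming (hypothetically) that an edge joins two genes whose expression profiles across samples have an absolute Pearson correlation of at least a chosen threshold. `build_gene_graph`, `graph_similarity` and the threshold value are illustrative names and settings, not the authors' API. The Jaccard index over edge sets is a swapped-in choice showing one way two such graphs could be compared; a minimum-similarity cutoff on a score like this is one conceivable way to return "none of the classes", again an assumption.

```python
import numpy as np


def build_gene_graph(expr, genes, threshold=0.8):
    """Build an undirected gene graph (hypothetical construction rule).

    expr:  array of shape (n_samples, n_genes), e.g. cDNA log-ratios.
    genes: gene names, one per column of expr.
    An edge joins two genes when |Pearson correlation| of their profiles
    across samples is >= threshold (assumed rule, not from the paper).
    Returns an adjacency dict: gene -> set of neighbouring genes.
    """
    corr = np.corrcoef(expr, rowvar=False)  # (n_genes, n_genes)
    graph = {g: set() for g in genes}
    n = len(genes)
    for i in range(n):
        for j in range(i + 1, n):
            if abs(corr[i, j]) >= threshold:
                graph[genes[i]].add(genes[j])
                graph[genes[j]].add(genes[i])
    return graph


def edge_set(graph):
    """Return the graph's edges as a set of unordered gene pairs."""
    return {frozenset((u, v)) for u, nbrs in graph.items() for v in nbrs}


def graph_similarity(g1, g2):
    """Jaccard index of two graphs' edge sets (1.0 = identical edges)."""
    e1, e2 = edge_set(g1), edge_set(g2)
    if not e1 and not e2:
        return 1.0
    return len(e1 & e2) / len(e1 | e2)


# Toy data: g1 is exactly 2 * g0, g2 is only weakly (negatively) related.
expr = np.array([[1, 2, 4], [2, 4, 1], [3, 6, 3], [4, 8, 2]], dtype=float)
g = build_gene_graph(expr, ["g0", "g1", "g2"], threshold=0.8)
print(g)  # {'g0': {'g1'}, 'g1': {'g0'}, 'g2': set()}
```

An adjacency dict of sets keeps neighbour lookups and edge-set comparisons cheap; for thousands of genes a sparse matrix would be the more practical container.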

    Feature selection in a credit scoring model

    This article belongs to the Special Issue "Mathematics and Mathematical Physics Applied to Financial Markets". This paper uses four classification algorithms (logistic regression, support vector machine, K-nearest neighbors, and random forest) to identify which candidates are likely to default in a credit scoring model. Three feature selection methods are used to mitigate the overfitting that the curse of dimensionality causes in these classification algorithms: one filter method (Chi-squared test and correlation coefficients) and two wrapper methods (forward stepwise selection and backward stepwise selection). The performance of these three methods is discussed using two measures: the mean absolute error and the number of selected features. The methodology is applied to a valuable database from Taiwan. The results suggest that forward stepwise selection yields superior performance with each of the classification algorithms used. The conclusions obtained are related to those in the literature, and their managerial implications are analyzed.
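The abstract names forward stepwise selection as the best-performing wrapper method but gives no implementation details. A minimal sketch of the greedy idea, under stated assumptions: it uses a hand-rolled k-nearest-neighbors classifier (one of the four algorithms listed) and scores each candidate feature subset by leave-one-out mean absolute error on 0/1 labels, which for hard predictions equals the misclassification rate. The function names, k = 3, the leave-one-out scheme and the stopping rule (stop when no remaining feature lowers the error) are assumptions, not the authors' settings.

```python
import numpy as np


def loo_mae_knn(X, y, k=3):
    """Leave-one-out MAE of a k-NN classifier with 0/1 labels (assumed scorer).

    With binary labels and hard predictions, MAE equals the error rate.
    """
    n = len(y)
    # Pairwise squared Euclidean distances, shape (n, n).
    d = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d, np.inf)  # a sample may not vote for itself
    errors = 0.0
    for i in range(n):
        nn = np.argsort(d[i])[:k]               # indices of k nearest others
        pred = 1 if y[nn].mean() > 0.5 else 0   # majority vote
        errors += abs(pred - y[i])
    return errors / n


def forward_stepwise(X, y, score=loo_mae_knn):
    """Greedy forward selection (hypothetical helper).

    Start from no features; at each round add the feature whose inclusion
    gives the lowest score; stop when no remaining feature improves it.
    Returns (selected feature indices, best score).
    """
    selected, remaining = [], list(range(X.shape[1]))
    best = np.inf
    while remaining:
        trial = {j: score(X[:, selected + [j]], y) for j in remaining}
        j_best = min(trial, key=trial.get)
        if trial[j_best] >= best:
            break  # no candidate lowers the error any further
        best = trial[j_best]
        selected.append(j_best)
        remaining.remove(j_best)
    return selected, best


# Toy data: column 0 separates the classes, column 1 is noise.
X = np.array([[0.0, 5], [0.1, 3], [0.2, 1], [0.3, 7],
              [1.0, 2], [1.1, 6], [1.2, 4], [1.3, 0]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
selected, err = forward_stepwise(X, y)
print(selected)  # [0]
```

Backward stepwise selection is the mirror image: start from all features and repeatedly drop the one whose removal lowers the score most, stopping when no removal helps.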

    A neural network architecture for data editing in the Bank of Italy's business surveys

    This paper presents an application of neural network models to predictive classification for data quality control. Our aim is to identify data affected by measurement error in the Bank of Italy's business surveys. We build an architecture consisting of three feed-forward networks, for variables related to employment, sales and investment respectively. The networks are trained on input matrices that are extracted from the error-free final survey database for the 2003 wave and then subjected to stochastic transformations reproducing known error patterns. A binary indicator of unit perturbation is used as the output variable. The networks are trained with the Resilient Propagation learning algorithm. On the training and validation sets, correct predictions occur in about 90 per cent of the records for employment, 94 per cent for sales, and 75 per cent for investment. On independent test sets, the respective rates average 92, 80 and 70 per cent. On our data, neural networks perform much better as classifiers than logistic regression, one of the most popular competing methods. They appear to provide a valid means of improving the efficiency of the quality control process and, ultimately, the reliability of survey data.
    Keywords: data quality, data editing, binary classification, neural networks, measurement error
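The abstract says only that the networks were trained with Resilient Propagation. A minimal sketch of the Rprop weight update, assuming the iRprop- variant (no weight backtracking) and commonly used default constants (initial step 0.1, increase factor 1.2, decrease factor 0.5, step bounds 1e-6 and 50); the paper may have used a different variant or settings. `rprop_minimize` and `grad_fn` are hypothetical names: in a network, `grad_fn` would return the backpropagated gradient of the training error with respect to all weights. Here it is exercised on a simple quadratic so the behaviour is easy to check.

```python
import numpy as np


def rprop_minimize(grad_fn, w0, n_iter=200, step0=0.1, eta_plus=1.2,
                   eta_minus=0.5, step_min=1e-6, step_max=50.0):
    """Minimise a function with the iRprop- variant of Resilient Propagation.

    Only the sign of each partial derivative is used. Every weight keeps its
    own step size: enlarged while the sign repeats, shrunk when it flips.
    """
    w = np.asarray(w0, dtype=float).copy()
    step = np.full_like(w, step0)
    g_prev = np.zeros_like(w)
    for _ in range(n_iter):
        g = grad_fn(w)
        same = g * g_prev > 0  # sign unchanged: accelerate
        flip = g * g_prev < 0  # sign changed: we overshot, slow down
        step[same] = np.minimum(step[same] * eta_plus, step_max)
        step[flip] = np.maximum(step[flip] * eta_minus, step_min)
        g = np.where(flip, 0.0, g)  # after a flip, skip this weight's update
        w -= np.sign(g) * step
        g_prev = g
    return w


# Usage: minimise f(w) = sum((w - target)**2), whose gradient is 2 * (w - target).
target = np.array([3.0, -1.5])
w = rprop_minimize(lambda v: 2 * (v - target), w0=[0.0, 0.0])
print(w)  # close to [3.0, -1.5]
```

Because only the sign of each derivative is used, the update is insensitive to the gradient's magnitude, which is the usual motivation for Rprop in batch training of feed-forward networks.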