
    Integrated smoothed location model and data reduction approaches for multi variables classification

    The smoothed location model is a classification rule that handles mixtures of continuous and binary variables simultaneously. The rule discriminates between groups in a parametric form using the conditional distribution of the continuous variables given each pattern of the binary variables. To conduct a practical classification analysis, the objects must first be sorted into the cells of a multinomial table generated from the binary variables, and the parameters in each cell are then estimated from the sorted objects. However, in many situations the estimated parameters are poor when the number of binary variables is large relative to the sample size. Many binary variables create a large number of empty multinomial cells, leading to a severe sparsity problem and, in turn, exceedingly poor performance of the constructed rule; in the worst case, the rule cannot be constructed at all. To overcome these shortcomings, this study proposes new strategies to extract an adequate set of variables that yields optimum performance of the rule. Combinations of two extraction techniques are introduced, namely 2PCA and PCA+MCA, with new cut-points for the eigenvalue and total variance explained, to determine the extracted variables that lead to the minimum misclassification rate. The outputs of these extraction techniques are used to construct smoothed location models, producing two new classification approaches called 2PCALM and 2DLM. Numerical evidence from simulation studies demonstrates no significant difference in misclassification rate between the extraction techniques for normal and non-normal data. Nevertheless, both proposed approaches are slightly affected by non-normal data and severely affected by highly overlapping groups. Investigations on several real data sets show that the two approaches are competitive with, and better than, other existing classification methods. The overall findings reveal that both proposed approaches can be considered improvements to the location model and alternatives to other classification methods, particularly in handling mixed variables with a large number of binary variables.
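
    As a rough illustration of the two-block extraction idea behind 2PCA, the sketch below (Python, scikit-learn) applies a separate PCA to the continuous and binary blocks and feeds the stacked components to a plain LDA classifier standing in for the smoothed location model; the synthetic data, the 80% variance cut-off, and the LDA stand-in are assumptions, not the authors' settings.

    # Illustrative sketch only: separate PCA extractions for the continuous and
    # binary blocks (in the spirit of 2PCA), followed by a plain LDA classifier
    # standing in for the smoothed location model. Data, cut-offs and classifier
    # choice are assumptions, not the authors' settings.
    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    rng = np.random.default_rng(0)
    n = 200
    X_cont = rng.normal(size=(n, 10))          # continuous variables
    X_bin = rng.integers(0, 2, size=(n, 15))   # binary variables
    y = rng.integers(0, 2, size=n)             # group labels

    # Extract components from each block separately, keeping enough components
    # to reach an assumed 80% total-variance-explained cut-point.
    pca_cont = PCA(n_components=0.80, svd_solver="full").fit(X_cont)
    pca_bin = PCA(n_components=0.80, svd_solver="full").fit(X_bin)
    Z = np.hstack([pca_cont.transform(X_cont), pca_bin.transform(X_bin)])

    # LDA stands in here for the smoothed location model built on the scores.
    clf = LinearDiscriminantAnalysis().fit(Z, y)
    print("training accuracy:", clf.score(Z, y))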

    Adapting image processing and clustering methods to productive efficiency analysis and benchmarking: A cross disciplinary approach

    This dissertation explores interdisciplinary applications of computational methods in quantitative economics. In particular, it focuses on problems in productive efficiency analysis and benchmarking that are hardly approachable or solvable with conventional methods. In productive efficiency analysis, null or zero efficiency estimates are often produced when the data exhibit the wrong skewness or low kurtosis relative to the distributional assumption on the inefficiency term. This thesis uses the deconvolution technique, traditionally used for noise removal in image processing, to develop a fully non-parametric method for efficiency estimation. Publications 1 and 2 are devoted to this topic, focusing on the cross-sectional and panel cases, respectively. Through Monte Carlo simulations and empirical applications to Finnish electricity distribution network data and Finnish banking data, the results show that the Richardson-Lucy blind deconvolution method is insensitive to the distributional assumptions and robust to data noise levels and heteroscedasticity in efficiency estimation. In benchmarking, which can be the next step after productive efficiency analysis, the 'best practice' target may not operate under the same operational environment as the DMU under study. This renders the benchmarks impractical to follow and hampers managers in making correct decisions on performance improvement of a DMU. This dissertation therefore proposes a clustering-based benchmarking framework in Publication 3. The empirical study on Finnish electricity distribution networks reveals that the proposed framework is novel not only in its consideration of differences in the operational environment among DMUs, but also in its extreme flexibility. A comparison of different combinations of clustering and efficiency estimation techniques was conducted using computational simulations and empirical applications to Finnish electricity distribution network data, and based on it Publication 4 specifies an efficient combination for benchmarking in energy regulation. This dissertation endeavours to solve problems in quantitative economics using interdisciplinary approaches; the methods developed benefit this field, and the way the problems are approached opens a new perspective.
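
    The deconvolution idea can be illustrated with a minimal, generic Richardson-Lucy iteration (not the blind variant used in the thesis); the 1D signal, Gaussian point-spread function, and iteration count below are assumptions for demonstration only.

    # Minimal 1D Richardson-Lucy deconvolution (the generic algorithm, not the
    # blind variant used in the thesis): recover a signal blurred by a known
    # point-spread function. Signal, PSF and noise level are synthetic assumptions.
    import numpy as np

    def richardson_lucy_1d(observed, psf, num_iter=50):
        estimate = np.full_like(observed, 0.5)
        psf_mirror = psf[::-1]
        for _ in range(num_iter):
            blurred = np.convolve(estimate, psf, mode="same")
            blurred[blurred == 0] = 1e-12         # guard against division by zero
            estimate *= np.convolve(observed / blurred, psf_mirror, mode="same")
        return estimate

    rng = np.random.default_rng(1)
    true = np.zeros(100)
    true[[20, 50, 75]] = 1.0                      # spiky "true" signal
    psf = np.exp(-0.5 * (np.arange(-5, 6) / 2.0) ** 2)
    psf /= psf.sum()                              # Gaussian blur kernel
    observed = np.convolve(true, psf, mode="same") + 0.01 * rng.normal(size=100)
    recovered = richardson_lucy_1d(np.clip(observed, 1e-6, None), psf)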

    Data Envelopment Analysis may Obfuscate Corporate Financial Data: Using Support Vector Machine and Data Envelopment Analysis to Predict Corporate Failure for Nonmanufacturing Firms

    This is an Accepted Manuscript of an article published by Taylor & Francis in INFOR: Information Systems and Operational Research in 2017, available online: https://doi.org/10.1080/03155986.2017.1282290

    Corporate failure prediction has drawn numerous scholars' attention because of its usefulness in corporate risk management, as well as in regulating corporate operational status. Most research on this topic focuses on manufacturing companies and relies heavily on corporate assets. The asset size of manufacturing companies plays a vital role in traditional research methods; Altman's Z-score model is one such traditional method. However, only a limited number of researchers have studied corporate failure prediction for nonmanufacturing companies, as the operational status of such companies is not solely correlated with their assets. In this paper we use support vector machines (SVMs) and data envelopment analysis (DEA) to provide a new method for predicting corporate failure of nonmanufacturing firms. We show that using only DEA scores yields better corporate failure predictions than using the original raw data for the provided dataset. To determine the DEA scores, we first generate efficiency scores with a slack-based measure (SBM) DEA model, using the most recent three years of historical data of nonmanufacturing firms; we then use SVMs to classify bankrupt and non-bankrupt firms. We show that using DEA scores as the only inputs to the SVMs predicts corporate failure more accurately than using all of the available raw data.

    Natural Sciences and Engineering Research Council of Canada
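
    A minimal sketch of the two-stage comparison, assuming the SBM-DEA efficiency scores have already been computed (the linear programs themselves are not reproduced) and using synthetic placeholder data:

    # Sketch of the two-stage idea: train an SVM once on raw financial variables
    # and once on (assumed precomputed) SBM-DEA efficiency scores, then compare
    # cross-validated accuracy. All data below are synthetic placeholders.
    import numpy as np
    from sklearn.svm import SVC
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(2)
    n = 300
    X_raw = rng.normal(size=(n, 12))               # raw financial variables
    dea_scores = rng.uniform(0, 1, size=(n, 3))    # efficiency scores, 3 years
    y = rng.integers(0, 2, size=n)                 # 1 = bankrupt, 0 = survivor

    for name, X in [("raw data", X_raw), ("DEA scores", dea_scores)]:
        clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
        acc = cross_val_score(clf, X, y, cv=5).mean()
        print(f"{name}: mean CV accuracy = {acc:.3f}")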

    An efficiency curve for evaluating imbalanced classifiers considering intrinsic data characteristics: Experimental analysis

    Balancing the accuracy rates of the majority and minority classes is challenging in imbalanced classification. Furthermore, data characteristics have a significant impact on the performance of imbalanced classifiers, which existing evaluation methods generally neglect. The objective of this study is to introduce a new criterion for comprehensively evaluating imbalanced classifiers. Specifically, we introduce an efficiency curve, established using data envelopment analysis without explicit inputs (DEA-WEI), to determine the trade-off between the benefit of improved minority-class accuracy and the cost of reduced majority-class accuracy. We then analyze the impact of the imbalance ratio and typical imbalanced data characteristics on the efficiency of the classifiers. Empirical analyses using 68 imbalanced data sets reveal that traditional classifiers such as C4.5 and the k-nearest neighbor are more effective on disjunct data, whereas ensemble and undersampling techniques are more effective for overlapping and noisy data. The efficiency of cost-sensitive classifiers decreases dramatically as the imbalance ratio increases. Finally, we investigate the reasons for the differing efficiencies of classifiers on imbalanced data and recommend steps for selecting appropriate classifiers for imbalanced data based on data characteristics.

    National Natural Science Foundation of China (NSFC) 71874023 71725001 71771037 7197104
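
    The raw ingredients of such an efficiency curve are per-class accuracy pairs; the sketch below computes the minority-benefit/majority-cost pair for a few off-the-shelf classifiers on a synthetic imbalanced set. The DEA-WEI frontier itself is not reproduced, and the classifiers and data are illustrative assumptions rather than the study's experimental setup.

    # Per-class accuracy pairs (majority cost vs. minority benefit) for a few
    # off-the-shelf classifiers on a synthetic imbalanced set; these pairs are
    # the inputs a DEA-WEI-style frontier would be built from.
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import recall_score
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.ensemble import RandomForestClassifier

    X, y = make_classification(n_samples=2000, weights=[0.95, 0.05], random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

    for name, clf in [("C4.5-like tree", DecisionTreeClassifier(random_state=0)),
                      ("kNN", KNeighborsClassifier()),
                      ("random forest", RandomForestClassifier(random_state=0))]:
        pred = clf.fit(X_tr, y_tr).predict(X_te)
        maj = recall_score(y_te, pred, pos_label=0)   # majority-class accuracy
        mino = recall_score(y_te, pred, pos_label=1)  # minority-class accuracy
        print(f"{name}: majority={maj:.3f}, minority={mino:.3f}")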

    Breast Cancer Diagnosis from Perspective of Class Imbalance

    Introduction: Breast cancer is the second leading cause of mortality among women. Early detection is the only way to reduce the risk of breast cancer mortality. Traditional methods cannot effectively diagnose tumors since they are based on the assumption of a well-balanced dataset. However, a hybrid method can help alleviate the two-class imbalance problem in the diagnosis of breast cancer and establish a more accurate diagnosis. Material and Methods: The proposed hybrid approach, called LS-KNN, is based on an improved Laplacian score (LS) and the K-nearest neighbor (KNN) algorithm. The improved LS algorithm was used to obtain the optimal feature subset. A KNN with automatically selected K was used to classify the data, which guaranteed the effectiveness of the proposed method by reducing the computational effort and making classification faster. The effectiveness of LS-KNN was examined on two biased-representative breast cancer datasets using classification accuracy, sensitivity, specificity, G-mean, and the Matthews correlation coefficient. Results: Applying the proposed algorithm to the two breast cancer datasets indicated that the efficiency of the new method was higher than that of previously introduced methods. The obtained values of accuracy, sensitivity, specificity, G-mean, and Matthews correlation coefficient were 99.27%, 99.12%, 99.51%, 99.42%, respectively. Conclusion: Experimental results showed that the proposed approach works well with breast cancer datasets and can be a good alternative to well-known machine learning methods.
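
    A rough sketch of the overall pipeline follows, using the standard Laplacian score (He et al.'s formulation, not the paper's improved variant) to rank features and cross-validated grid search as a stand-in for the automatic choice of K; the data set and the number of retained features are assumptions for illustration.

    # Standard Laplacian score to rank features, then a KNN whose K is picked by
    # cross-validated grid search as a stand-in for "automatic K".
    import numpy as np
    from sklearn.datasets import load_breast_cancer
    from sklearn.neighbors import kneighbors_graph, KNeighborsClassifier
    from sklearn.preprocessing import StandardScaler
    from sklearn.model_selection import GridSearchCV

    def laplacian_scores(X, n_neighbors=5):
        # Similarity graph: RBF weights on a symmetrised kNN graph.
        W = kneighbors_graph(X, n_neighbors, mode="distance").toarray()
        edges = W > 0
        W[edges] = np.exp(-W[edges] ** 2)
        W = np.maximum(W, W.T)
        D = np.diag(W.sum(axis=1))
        L = D - W
        ones = np.ones(X.shape[0])
        scores = []
        for f in X.T:
            f_tilde = f - (f @ D @ ones) / (ones @ D @ ones)
            scores.append((f_tilde @ L @ f_tilde) / (f_tilde @ D @ f_tilde))
        return np.array(scores)        # lower score = more locality-preserving

    X, y = load_breast_cancer(return_X_y=True)
    X = StandardScaler().fit_transform(X)
    keep = np.argsort(laplacian_scores(X))[:10]       # keep the 10 best features
    knn = GridSearchCV(KNeighborsClassifier(),
                       {"n_neighbors": list(range(1, 16))}, cv=5)
    knn.fit(X[:, keep], y)
    print("chosen K:", knn.best_params_["n_neighbors"])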

    Performance of small and medium enterprises and the impact of environmental variables: evidence from Vietnam

    This thesis is developed from a real-life application: performance evaluation of small and medium-sized enterprises (SMEs) in Vietnam. The thesis presents two main methodological developments for evaluating the impact of dichotomous environmental variables on technical efficiency. Taking selection bias into account, the thesis proposes a revised frontier separation approach for the seminal Data Envelopment Analysis (DEA) model developed by Charnes, Cooper, and Rhodes (1981). The revised frontier separation approach is based on nearest-neighbour propensity score matching, pairing treated SMEs with their counterfactuals on the propensity score. The thesis also develops an order-m frontier conditioned on the propensity score, building on the conditional order-m approach proposed by Cazals, Florens, and Simar (2002) and advocated by Daraio and Simar (2005). This development allows the conditional order-m approach to be applied with a dichotomous environmental variable while accounting for the self-selection problem in impact evaluation. Monte Carlo style simulations are built to examine the effectiveness of these developments, as sketched after this abstract. The methodological developments are applied in empirical studies to evaluate the impact of training programmes on the performance of food processing SMEs and the impact of exporting on the technical efficiency of textile and garment SMEs in Vietnam. The analysis shows that training programmes have no significant impact on the technical efficiency of food processing SMEs. Moreover, the analysis confirms the conclusion of the export literature that exporters self-select into the sector. The thesis finds no significant impact of exporting activities on the technical efficiency of textile and garment SMEs; however, a large bias is eliminated by the proposed approach. The results of the empirical studies contribute to the understanding of the impact of different environmental variables on the performance of SMEs and help policy makers design proper policies supporting the development of Vietnamese SMEs.
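
    The nearest-neighbour propensity score matching step can be sketched as follows; the covariates and treatment indicator are synthetic, and the DEA / order-m efficiency estimation that follows the matching is not reproduced.

    # Minimal nearest-neighbour propensity-score matching sketch, illustrating the
    # pairing step the revised frontier separation approach builds on.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.neighbors import NearestNeighbors

    rng = np.random.default_rng(3)
    n = 500
    X = rng.normal(size=(n, 4))                           # firm characteristics
    treated = rng.random(n) < 1 / (1 + np.exp(-X[:, 0]))  # e.g. trained / exporter

    # Step 1: estimate propensity scores P(treated | X).
    ps = LogisticRegression().fit(X, treated).predict_proba(X)[:, 1]

    # Step 2: match each treated SME to the nearest untreated SME on the score.
    treated_idx = np.where(treated)[0]
    control_idx = np.where(~treated)[0]
    nn = NearestNeighbors(n_neighbors=1).fit(ps[control_idx].reshape(-1, 1))
    _, pos = nn.kneighbors(ps[treated_idx].reshape(-1, 1))
    matches = control_idx[pos.ravel()]        # counterfactual for each treated SME
    print("first matched pairs:", list(zip(treated_idx[:5], matches[:5])))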

    Nonparametric production and frontier analysis: applications in economics
