373 research outputs found

    Analytical study of computer vision-based pavement crack quantification using machine learning techniques

    Get PDF
    Image-based techniques are a promising non-destructive approach for road pavement condition evaluation. The main objective of this study is to extract, quantify and evaluate important surface defects, such as cracks, using an automated computer vision-based system to provide a better understanding of the pavement deterioration process. To achieve this objective, an automated crack-recognition software was developed, employing a series of image processing algorithms of crack extraction, crack grouping, and crack detection. Bottom-hat morphological technique was used to remove the random background of pavement images and extract cracks, selectively based on their shapes, sizes, and intensities using a relatively small number of user-defined parameters. A technical challenge with crack extraction algorithms, including the Bottom-hat transform, is that extracted crack pixels are usually fragmented along crack paths. For de-fragmenting those crack pixels, a novel crack-grouping algorithm is proposed as an image segmentation method, so called MorphLink-C. Statistical validation of this method using flexible pavement images indicated that MorphLink-C not only improves crack-detection accuracy but also reduces crack detection time. Crack characterization was performed by analysing imagerial features of the extracted crack image components. A comprehensive statistical analysis was conducted using filter feature subset selection (FSS) methods, including Fischer score, Gini index, information gain, ReliefF, mRmR, and FCBF to understand the statistical characteristics of cracks in different deterioration stages. Statistical significance of crack features was ranked based on their relevancy and redundancy. The statistical method used in this study can be employed to avoid subjective crack rating based on human visual inspection. Moreover, the statistical information can be used as fundamental data to justify rehabilitation policies in pavement maintenance. Finally, the application of four classification algorithms, including Artificial Neural Network (ANN), Decision Tree (DT), k-Nearest Neighbours (kNN) and Adaptive Neuro-Fuzzy Inference System (ANFIS) is investigated for the crack detection framework. The classifiers were evaluated in the following five criteria: 1) prediction performance, 2) computation time, 3) stability of results for highly imbalanced datasets in which, the number of crack objects are significantly smaller than the number of non-crack objects, 4) stability of the classifiers performance for pavements in different deterioration stages, and 5) interpretability of results and clarity of the procedure. Comparison results indicate the advantages of white-box classification methods for computer vision based pavement evaluation. Although black-box methods, such as ANN provide superior classification performance, white-box methods, such as ANFIS, provide useful information about the logic of classification and the effect of feature values on detection results. Such information can provide further insight for the image-based pavement crack detection application

    Predicting dental implant failures by integrating multiple classifiers

    Get PDF
    El campo de la ciencia de datos ha tenido muchos avances respecto a la aplicación y desarrollo de técnicas en el sector de la salud. Estos avances se ven reflejados en la predicción de enfermedades, clasificación de imágenes, identificación y reducción de riesgos, así como muchos otros. Este trabajo tiene por objetivo investigar el beneficio de la utilización de múltiples algoritmos de clasificación, para la predicción de fracasos en implantes dentales de la provincia de Misiones, Argentina y proponer un procedimiento validado por expertos humanos. El modelo abarca la combinación de los clasificadores: Random Forest, C-Support Vector, K-Nearest Neighbors, Multinomial Naive Bayes y Multi-layer Perceptron. La integración de los modelos se realiza con el weighted soft voting method. La experimentación es realizada con cuatro conjuntos de datos, un conjunto de implantes dentales confeccionado para el estudio de caso, un conjunto generado artificialmente y otros dos conjuntos obtenidos de distintos repositorios de datos. Los resultados arrojados del enfoque propuesto sobre el conjunto de datos de implantes dentales, es validado con el desempeño en la clasificación por expertos humanos. Nuestro enfoque logra un porcentaje de acierto del 93% de casos correctamente identificados, mientras que los expertos humanos consiguen un 87% de precisión.The field of data science has made many advances in the application and development of techniques in several aspects of the health sector, such as in disease prediction, image classification, risk identification and risk reduction. Based on this, the objectives of this work were to investigate the benefit of using multiple classification algorithms to predict dental implant failures in patients from Misiones province, Argentina, and to propose a procedure validated by human experts. The model used the integration of several types of classifiers.The experimentation was performed with four data sets: a data set of dental implants made for the case study, an artificially generated data set, and two other data sets obtained from different data repositories. The results of the approach proposed were validated by the performance in classification made by human experts. Our approach achieved a success rate of 93% of correctly identified cases, whereas human experts achieved 87% accuracy. Based on this, we can argue that multi-classifier systems are a good approach to predict dental implant failures.Fil: Ganz, Nancy Beatriz. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Nordeste. Instituto de Materiales de Misiones. Universidad Nacional de Misiones. Facultad de Ciencias Exactas Químicas y Naturales. Instituto de Materiales de Misiones; ArgentinaFil: Ares, Alicia Esther. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Nordeste. Instituto de Materiales de Misiones. Universidad Nacional de Misiones. Facultad de Ciencias Exactas Químicas y Naturales. Instituto de Materiales de Misiones; ArgentinaFil: Kuna, Horacio Daniel. Universidad Nacional de Misiones. Facultad de Cs.exactas Quimicas y Naturales. Instituto de Investigacion Desarrollo E Innovacion En Informatica.; Argentin

    Improved Feature Weight Algorithm and Its Application to Text Classification

    Get PDF
    Text preprocessing is one of the key problems in pattern recognition and plays an important role in the process of text classification. Text preprocessing has two pivotal steps: feature selection and feature weighting. The preprocessing results can directly affect the classifiers’ accuracy and performance. Therefore, choosing the appropriate algorithm for feature selection and feature weighting to preprocess the document can greatly improve the performance of classifiers. According to the Gini Index theory, this paper proposes an Improved Gini Index algorithm. This algorithm constructs a new feature selection and feature weighting function. The experimental results show that this algorithm can improve the classifiers’ performance effectively. At the same time, this algorithm is applied to a sensitive information identification system and has achieved a good result. The algorithm’s precision and recall are higher than those of traditional ones. It can identify sensitive information on the Internet effectively

    Procedimiento para mejorar la precisión en el acierto de los fracasos en implantes dentales mediante técnicas de ciencia de datos

    Get PDF
    Nowadays, the prediction about dental implant failure is determined through clinical and radiological evaluation. For this reason, predictions are highly dependent on the Implantologists’ experience. In addition, it is extremely crucial to detect in time if a dental implant is going to fail, due to time, cost, trauma to the patient, postoperative problems, among others. This paper proposes a procedure using multiple feature selection methods and classification algorithms to improve the accuracy of dental implant failures in the province of Misiones, Argentina, validated by human experts. The experimentation is performed with two data sets, a set of dental implants made for the case study and an artificially generated set. The proposed approach allows to know the most relevant features and improve the accuracy in the classification of the target class (dental implant failure), to avoid biasing the decision making based on the application and results of individual methods. The proposed approach achieves an accuracy of 79% of failures, while individual classifiers achieve a maximum of 72%.Hoy en día, la predicción del fracaso de un implante dental está determinado a través de una evaluación clínica y radiológica. Por esta razón, las predicciones dependen en gran medida de la experiencia del implantólogo. Además, es extremadamente crucial detectar a tiempo si un implante dental va a fallar, por cuestiones de tiempo, costo, traumas al paciente, problemas postoperatorios, entre otros. En este trabajo se propone un procedimiento mediante la utilización de múltiples métodos de selección de características y algoritmos de clasificación, para mejorar la precisión en el acierto de los fracasos en implantes dentales de la provincia de Misiones, Argentina validado por expertos humanos. La experimentación es realizada con cuatro conjuntos de datos, un conjunto de implantes dentales confeccionado para el estudio de caso, un conjunto generado artificialmente y otros dos conjuntos obtenidos de distintos repositorios de datos. El procedimiento propuesto permitió conocer las características más relevantes y mejoró la precisión en la clasificación de la clase objetivo (fracaso del implante dental), permitiendo no sesgar la toma de decisión en base a la aplicación y resultados de método individuales. El procedimiento propuesto consigue una precisión del 79% de los fracasos, mientras que los clasificadores individuales alcanzan un máximo del 72%.Fil: Ganz, Nancy Beatriz. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Nordeste. Instituto de Materiales de Misiones. Universidad Nacional de Misiones. Facultad de Ciencias Exactas Químicas y Naturales. Instituto de Materiales de Misiones; ArgentinaFil: Ares, Alicia Esther. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Nordeste. Instituto de Materiales de Misiones. Universidad Nacional de Misiones. Facultad de Ciencias Exactas Químicas y Naturales. Instituto de Materiales de Misiones; ArgentinaFil: Kuna, Horacio Daniel. Universidad Nacional de Misiones; Argentin

    A new unsupervised feature selection method for text clustering based on genetic algorithms

    Get PDF
    Nowadays a vast amount of textual information is collected and stored in various databases around the world, including the Internet as the largest database of all. This rapidly increasing growth of published text means that even the most avid reader cannot hope to keep up with all the reading in a field and consequently the nuggets of insight or new knowledge are at risk of languishing undiscovered in the literature. Text mining offers a solution to this problem by replacing or supplementing the human reader with automatic systems undeterred by the text explosion. It involves analyzing a large collection of documents to discover previously unknown information. Text clustering is one of the most important areas in text mining, which includes text preprocessing, dimension reduction by selecting some terms (features) and finally clustering using selected terms. Feature selection appears to be the most important step in the process. Conventional unsupervised feature selection methods define a measure of the discriminating power of terms to select proper terms from corpus. However up to now the valuation of terms in groups has not been investigated in reported works. In this paper a new and robust unsupervised feature selection approach is proposed that evaluates terms in groups. In addition a new Modified Term Variance measuring method is proposed for evaluating groups of terms. Furthermore a genetic based algorithm is designed and implemented for finding the most valuable groups of terms based on the new measure. These terms then will be utilized to generate the final feature vector for the clustering process . In order to evaluate and justify our approach the proposed method and also a conventional term variance method are implemented and tested using corpus collection Reuters-21578. For a more accurate comparison, methods have been tested on three corpuses and for each corpus clustering task has been done ten times and results are averaged. Results of comparing these two methods are very promising and show that our method produces better average accuracy and F1-measure than the conventional term variance method

    An efficient emotion classification system using EEG

    Get PDF
    Emotion classification via Electroencephalography (EEG) is used to find the relationships between EEG signals and human emotions. There are many available channels, which consist of electrodes capturing brainwave activity. Some applications may require a reduced number of channels and frequency bands to shorten the computation time, facilitate human comprehensibility, and develop a practical wearable. In prior research, different sets of channels and frequency bands have been used. In this study, a systematic way of selecting the set of channels and frequency bands has been investigated, and results shown that by using the reduced number of channels and frequency bands, it can achieve similar accuracies. The study also proposed a method used to select the appropriate features using the Relief F method. The experimental results of this study showed that the method could reduce and select appropriate features confidently and efficiently. Moreover, the Fuzzy Support Vector Machine (FSVM) is used to improve emotion classification accuracy, as it was found from this research that it performed better than the Support Vector Machine (SVM) in handling the outliers, which are typically presented in the EEG signals. Furthermore, the FSVM is treated as a black-box model, but some applications may need to provide comprehensible human rules. Therefore, the rules are extracted using the Classification and Regression Trees (CART) approach to provide human comprehensibility to the system. The FSVM and rule extraction experiments showed that The FSVM performed better than the SVM in classifying the emotion of interest used in the experiments, and rule extraction from the FSVM utilizing the CART (FSVM-CART) had a good trade-off between classification accuracy and human comprehensibility

    An empirical study on credit evaluation of SMEs based on detailed loan data

    Get PDF
    Small and micro-sized Enterprises (SMEs) are an important part of Chinese economic system.The establishment of credit evaluating model of SMEs can effectively help financial intermediaries to reveal credit risk of enterprises and reduce the cost of enterprises information acquisition. Besides it can also serve as a guide to investors which also helps companies with good credit. This thesis conducts an empirical study based on loan data from a Chinese bank of loans granted to SMEs. The study aims to develop a data-driven model that can accurately predict if a given loan has an acceptable risk from the bank’s perspective, or not. Furthermore, we test different methods to deal with the problem of unbalanced class and uncredible sample. Lastly, the importance of variables is analyzed. Remaining Unpaid Principal, Floating Interest Rate, Time Until Maturity Date, Real Interest Rate, Amount of Loan all have significant effects on the final result of the prediction.The main contribution of this study is to build a credit evaluation model of small and micro enterprises, which not only helps commercial banks accurately identify the credit risk of small and micro enterprises, but also helps to overcome creditdifficulties of small and micro enterprises.As pequenas e microempresas constituem uma parte importante do sistema económico chinês. A definição de um modelo de avaliação de crédito para estas empresas pode ajudar os intermediários financeiros a revelarem o risco de crédito das empresas e a reduzirem o custo de aquisição de informação das empresas. Além disso, pode igualmente servir como guia para os investidores, auxiliando também empresas com bom crédito. Na presente tese apresenta-se um estudo empírico baseado em dados de um banco chinês relativos a empréstimos concedidos a pequenas e microempresas. O estudo visa desenvolver um modelo empírico que possa prever com precisão se um determinado empréstimo tem um risco aceitável do ponto de vista do banco, ou não. Além disso, são efetuados testes com diferentes métodos que permitem lidar com os problemas de classes de dados não balanceadas e de amostras que não refletem o problema real a modelar. Finalmente, é analisada a importância relativa das variáveis. O montante da dívida por pagar, a taxa de juro variável, o prazo até a data de vencimento, a taxa de juro real, o montante do empréstimo, todas têm efeitos significativos no resultado final da previsão. O principal contributo deste estudo é, assim, a construção de um modelo de avaliação de crédito que permite apoiar os bancos comerciais a identificarem com precisão o risco de crédito das pequenas e micro empresas e ajudar também estas empresas a superarem as suas dificuldades de crédito

    Machine learning based data pre-processing for the purpose of medical data mining and decision support

    Get PDF
    Building an accurate and reliable model for prediction for different application domains, is one of the most significant challenges in knowledge discovery and data mining. Sometimes, improved data quality is itself the goal of the analysis, usually to improve processes in a production database and the designing of decision support. As medicine moves forward there is a need for sophisticated decision support systems that make use of data mining to support more orthodox knowledge engineering and Health Informatics practice. However, the real-life medical data rarely complies with the requirements of various data mining tools. It is often inconsistent, noisy, containing redundant attributes, in an unsuitable format, containing missing values and imbalanced with regards to the outcome class label.Many real-life data sets are incomplete, with missing values. In medical data mining the problem with missing values has become a challenging issue. In many clinical trials, the medical report pro-forma allow some attributes to be left blank, because they are inappropriate for some class of illness or the person providing the information feels that it is not appropriate to record the values for some attributes. The research reported in this thesis has explored the use of machine learning techniques as missing value imputation methods. The thesis also proposed a new way of imputing missing value by supervised learning. A classifier was used to learn the data patterns from a complete data sub-set and the model was later used to predict the missing values for the full dataset. The proposed machine learning based missing value imputation was applied on the thesis data and the results are compared with traditional Mean/Mode imputation. Experimental results show that all the machine learning methods which we explored outperformed the statistical method (Mean/Mode).The class imbalance problem has been found to hinder the performance of learning systems. In fact, most of the medical datasets are found to be highly imbalance in their class label. The solution to this problem is to reduce the gap between the minority class samples and the majority class samples. Over-sampling can be applied to increase the number of minority class sample to balance the data. The alternative to over-sampling is under-sampling where the size of majority class sample is reduced. The thesis proposed one cluster based under-sampling technique to reduce the gap between the majority and minority samples. Different under-sampling and over-sampling techniques were explored as ways to balance the data. The experimental results show that for the thesis data the new proposed modified cluster based under-sampling technique performed better than other class balancing techniques.In further research it is found that the class imbalance problem not only affects the classification performance but also has an adverse effect on feature selection. The thesis proposed a new framework for feature selection for class imbalanced datasets. The research found that, using the proposed framework the classifier needs less attributes to show high accuracy, and more attributes are needed if the data is highly imbalanced.The research described in the thesis contains the flowing four novel main contributions.a) Improved data mining methodology for mining medical datab) Machine learning based missing value imputation methodc) Cluster Based semi-supervised class balancing methodd) Feature selection framework for class imbalance datasetsThe performance analysis and comparative study show that the use of proposed method of missing value imputation, class balancing and feature selection framework can provide an effective approach to data preparation for building medical decision support
    • …
    corecore