
    Classification of oncologic data with genetic programming

    Discovering the models explaining the hidden relationship between genetic material and tumor pathologies is one of the most important open challenges in biology and medicine. Given the large amount of data made available by the DNA Microarray technique, Machine Learning is becoming a popular tool for this kind of investigation. In the last few years, we have been particularly involved in the study of Genetic Programming for mining large sets of biomedical data. In this paper, we present a comparison between four variants of Genetic Programming for the classification of two different oncologic datasets: the first contains data from healthy colon tissues and colon tissues affected by cancer; the second contains data from patients affected by two kinds of leukemia (acute myeloid leukemia and acute lymphoblastic leukemia). We report experimental results obtained using two different fitness criteria: the receiver operating characteristic and the percentage of correctly classified instances. These results, and their comparison with the ones obtained by three non-evolutionary Machine Learning methods (Support Vector Machines, MultiBoosting, and Random Forests) on the same data, suggest that Genetic Programming is a promising technique for this kind of classification.
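    The abstract mentions two fitness criteria, the receiver operating characteristic and the percentage of correctly classified instances. The following is a minimal sketch, not the authors' implementation, of how such a fitness function for a GP individual could look; `program` is a stand-in for a compiled GP expression, and the decision threshold is an illustrative assumption.

```python
# Hypothetical sketch: scoring one GP individual on labelled microarray data
# with either an ROC-AUC or an accuracy criterion, as named in the abstract.
import numpy as np
from sklearn.metrics import roc_auc_score, accuracy_score

def fitness(program, X, y, criterion="roc"):
    """Evaluate a GP individual (a callable mapping a feature vector to a score)."""
    scores = np.array([program(x) for x in X])        # raw program outputs
    if criterion == "roc":
        return roc_auc_score(y, scores)               # threshold-free criterion
    preds = (scores > 0.0).astype(int)                # illustrative fixed threshold
    return accuracy_score(y, preds)                   # fraction correctly classified

# Toy example with a linear expression of the kind a GP run might evolve:
rng = np.random.default_rng(0)
X_toy, y_toy = rng.normal(size=(20, 5)), rng.integers(0, 2, size=20)
toy_program = lambda x: x[0] - 0.5 * x[3]
print(fitness(toy_program, X_toy, y_toy, criterion="acc"))
```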

    Genetic programming and serial processing for time series classification

    This work describes an approach devised by the authors for time series classification. In our approach, genetic programming is used in combination with a serial processing of data, where the last output is the result of the classification. The use of genetic programming for classification, although still a field where more research is needed, is not new. However, the application of genetic programming to classification tasks is normally done by considering the input data as a feature vector. That is, to the best of our knowledge, there are no examples in the genetic programming literature of approaches where the time series data are processed serially and the last output is considered as the classification result. The serial processing approach presented here fills a gap in the existing literature. This approach was tested on three different problems. Two of them are real-world problems whose data were gathered for online or conference competitions. As there are published results for these two problems, this gives us the chance to compare the performance of our approach against top-performing methods. The serial processing of data in combination with genetic programming obtained competitive results in both competitions, showing its potential for solving time series classification problems. The main advantage of our serial processing approach is that it can easily handle very large datasets.
    Alfaro-Cid, E., Sharman, K. C., & Esparcia-Alcázar, A. I. (2014). Genetic programming and serial processing for time series classification. Evolutionary Computation, 22(2), 265-285. doi:10.1162/EVCO_a_00110
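    As a rough illustration of the serial-processing idea described above, the sketch below applies an update rule to the time series one sample at a time, carrying internal state, and takes the output after the last sample as the class decision. The feedback scheme, the threshold, and the toy "evolved" rule are assumptions for illustration, not the paper's exact design.

```python
# Minimal sketch: serial processing of a time series, last output = classification.
import numpy as np

def classify_serially(update, series, threshold=0.0):
    state = 0.0
    for sample in series:
        state = update(sample, state)   # evolved program combines new sample with carried state
    return int(state > threshold)       # last output decides the class

# A toy stand-in for an evolved rule: leaky accumulation of the signal.
toy_update = lambda x, s: 0.9 * s + 0.1 * x
series = np.sin(np.linspace(0, 6 * np.pi, 200)) + 0.5   # biased toy series
print(classify_serially(toy_update, series))
```

    Because the series is consumed one sample at a time rather than materialized as a feature vector, a scheme like this streams naturally over very large datasets, which matches the advantage claimed in the abstract.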

    Automatic programming methodologies for electronic hardware fault monitoring

    This paper presents three variants of Genetic Programming (GP) approaches for intelligent online performance monitoring of electronic circuits and systems. Reliability modeling of electronic circuits can be best performed with the stressor-susceptibility interaction model: a circuit or system is considered failed once the stressor has exceeded the susceptibility limits. For online prediction, validated stressor vectors may be obtained by direct measurements or sensors, which, after pre-processing and standardization, are fed into the GP models. Empirical results are compared with artificial neural networks trained using the backpropagation algorithm and with classification and regression trees. The performance of the proposed method is evaluated by comparing the experimental results with the actual failure model values. The developed model reveals that GP could play an important role in future fault monitoring systems. This research was supported by the International Joint Research Grant of the IITA (Institute of Information Technology Assessment) foreign professor invitation program of the MIC (Ministry of Information and Communication), Korea.
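    A small sketch of the stressor-susceptibility failure criterion described in the abstract: a unit is labelled as failed once any standardized stressor component exceeds its susceptibility limit, and such labels could then serve as targets for GP (or ANN/CART) monitors. Variable names, limits, and data are hypothetical.

```python
# Illustrative failure labelling from standardized stressor vectors.
import numpy as np

def failure_labels(stressors, limits):
    """stressors: (n_samples, n_stressors) raw sensor readings; limits: per-stressor susceptibility."""
    mu, sigma = stressors.mean(axis=0), stressors.std(axis=0)
    z = (stressors - mu) / sigma                 # pre-processing / standardization step
    return (z > limits).any(axis=1).astype(int)  # failed if any stressor exceeds its limit

rng = np.random.default_rng(1)
readings = rng.normal(size=(100, 3))             # e.g. temperature, voltage ripple, vibration
print(failure_labels(readings, limits=np.array([2.0, 2.5, 2.0])).sum(), "failures flagged")
```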

    Semantic variation operators for multidimensional genetic programming

    Multidimensional genetic programming represents candidate solutions as sets of programs, and thereby provides an interesting framework for exploiting building block identification. Towards this goal, we investigate the use of machine learning as a way to bias which components of programs are promoted, and propose two semantic operators to choose where useful building blocks are placed during crossover. A forward stagewise crossover operator we propose leads to significant improvements on a set of regression problems, and produces state-of-the-art results in a large benchmark study. We discuss this architecture and others in terms of their propensity for allowing heuristic search to utilize information during the evolutionary process. Finally, we look at the collinearity and complexity of the data representations that result from these architectures, with a view towards disentangling factors of variation in application. Comment: 9 pages, 8 figures, GECCO 201
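    To give a flavour of the forward-stagewise idea mentioned above, the sketch below treats candidate subprograms as features and greedily promotes the one most correlated with the current residual. This is a generic forward-stagewise-style selection under assumed names, not the authors' crossover operator.

```python
# Hedged sketch: ranking candidate subprograms by stagewise residual fitting.
import numpy as np

def stagewise_rank(program_outputs, y, n_steps=None):
    """program_outputs: (n_samples, n_programs) outputs of candidate subprograms."""
    residual = y.astype(float).copy()
    remaining = list(range(program_outputs.shape[1]))
    order = []
    for _ in range(n_steps or program_outputs.shape[1]):
        # pick the remaining subprogram most correlated with the residual
        corrs = [abs(np.corrcoef(program_outputs[:, j], residual)[0, 1]) for j in remaining]
        best = remaining.pop(int(np.argmax(corrs)))
        order.append(best)
        beta = np.linalg.lstsq(program_outputs[:, [best]], residual, rcond=None)[0]
        residual -= program_outputs[:, best] * beta[0]
    return order   # placement priority for building blocks during crossover

rng = np.random.default_rng(2)
P = rng.normal(size=(50, 4))
y = 2.0 * P[:, 1] - P[:, 3] + 0.1 * rng.normal(size=50)
print(stagewise_rank(P, y))   # columns 1 and 3 should be ranked first
```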

    PMLB: A Large Benchmark Suite for Machine Learning Evaluation and Comparison

    The selection, development, or comparison of machine learning methods in data mining can be a difficult task, depending on the target problem and the goals of a particular study. Numerous publicly available real-world and simulated benchmark datasets have emerged from different sources, but their organization and adoption as standards have been inconsistent. As such, selecting and curating specific benchmarks remains an unnecessary burden on machine learning practitioners and data scientists. The present study introduces an accessible, curated, and developing public benchmark resource to facilitate identification of the strengths and weaknesses of different machine learning methodologies. We compare meta-features among the current set of benchmark datasets in this resource to characterize the diversity of available data. Finally, we apply a number of established machine learning methods to the entire benchmark suite and analyze how datasets and algorithms cluster in terms of performance. This work is an important first step towards understanding the limitations of popular benchmarking suites and developing a resource that connects existing benchmarking standards to more diverse and efficient standards in the future. Comment: 14 pages, 5 figures, submitted for review to JML
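    A short usage sketch of the kind of workflow the resource enables, assuming the pmlb Python package's fetch_data interface and that the 'adult' benchmark is included in the suite; both are assumptions, not guarantees about the current API or dataset list.

```python
# Fetch one curated benchmark and run a quick cross-validated baseline.
from pmlb import fetch_data, classification_dataset_names
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

print(len(classification_dataset_names), "classification benchmarks available")

X, y = fetch_data("adult", return_X_y=True)           # one benchmark dataset
clf = RandomForestClassifier(n_estimators=100, random_state=0)
print(cross_val_score(clf, X, y, cv=5).mean())         # baseline score for comparison
```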

    Accelerated Particle Swarm Optimization and Support Vector Machine for Business Optimization and Applications

    Business optimization is becoming increasingly important because all business activities aim to maximize the profit and performance of products and services under limited resources and appropriate constraints. Recent developments in support vector machines and metaheuristics show many advantages of these techniques. In particular, particle swarm optimization is now widely used in solving tough optimization problems. In this paper, we use a combination of a recently developed Accelerated PSO and a nonlinear support vector machine to form a framework for solving business optimization problems. We first apply the proposed APSO-SVM to production optimization, and then use it for income prediction and project scheduling. We also carry out some parametric studies and discuss the advantages of the proposed metaheuristic SVM. Comment: 12 page
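    The sketch below illustrates the general shape of such a framework: a simplified accelerated-PSO-style search (particles pulled toward the global best plus a decaying random perturbation) tunes the C and gamma parameters of an RBF support vector machine. The update rule, parameter ranges, and dataset are illustrative assumptions, not the paper's exact algorithm.

```python
# Hedged sketch: PSO-style hyperparameter search wrapped around an SVM.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

def objective(p):                                    # p = (log10 C, log10 gamma)
    clf = SVC(C=10 ** p[0], gamma=10 ** p[1], kernel="rbf")
    return cross_val_score(clf, X, y, cv=3).mean()

rng = np.random.default_rng(3)
swarm = rng.uniform([-2, -5], [3, 0], size=(10, 2))  # particles in log-parameter space
scores = np.array([objective(p) for p in swarm])
best_score, g_best = scores.max(), swarm[scores.argmax()].copy()

alpha, beta = 0.3, 0.5                               # perturbation and attraction weights
for t in range(15):
    swarm = (1 - beta) * swarm + beta * g_best + alpha * (0.97 ** t) * rng.normal(size=swarm.shape)
    scores = np.array([objective(p) for p in swarm])
    if scores.max() > best_score:
        best_score, g_best = scores.max(), swarm[scores.argmax()].copy()

print("best log10(C), log10(gamma):", g_best, "score:", best_score)
```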

    Overview of Random Forest Methodology and Practical Guidance with Emphasis on Computational Biology and Bioinformatics

    The Random Forest (RF) algorithm by Leo Breiman has become a standard data analysis tool in bioinformatics. It has shown excellent performance in settings where the number of variables is much larger than the number of observations, can cope with complex interaction structures as well as highly correlated variables, and returns measures of variable importance. This paper synthesizes ten years of RF development with an emphasis on applications to bioinformatics and computational biology. Special attention is given to practical aspects such as the selection of parameters, available RF implementations, and important pitfalls and biases of RF and its variable importance measures (VIMs). The paper surveys recent developments in the methodology relevant to bioinformatics, as well as some representative examples of RF applications in this context, and possible directions for future research.
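    A minimal sketch of the setting the survey emphasizes: far more variables than observations, with permutation-based variable importance used instead of impurity-based importance, whose biases the paper discusses. Data and parameters are synthetic and purely illustrative.

```python
# RF in a p >> n setting with permutation-based variable importance.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(4)
n, p = 60, 500                                  # far fewer observations than variables
X = rng.normal(size=(n, p))
y = (X[:, 0] + X[:, 1] > 0).astype(int)         # only the first two variables matter

rf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X, y)
imp = permutation_importance(rf, X, y, n_repeats=10, random_state=0)
print("top variables:", np.argsort(imp.importances_mean)[::-1][:5])
```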