Search CORE

124,819 research outputs found

Feature Selection with Mutual Information for Regression Problems

Author: Jane Labadin
Muhammad Aliyu Sulaiman
Publication date: 01/01/2015

Selecting relevant features for machine learning modeling improves the performance of the learning methods. Mutual information (MI) is a known relevance criterion for selecting feature subsets from an input dataset that has a nonlinear relationship with the predicted attribute. However, the mutual information estimator suffers from the following limitations: it depends on smoothing parameters; the greedy feature selection methods lack theoretically justified stopping criteria; and although in theory it can be used for both classification and regression problems, in practice its formulation is more often limited to classification problems. This paper investigates a proposed improvement on these three limitations of the mutual information estimator through the use of resampling techniques and a formulation of mutual information based on differential entropy for regression problems.
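The abstract does not specify the estimator or the resampling scheme. As a hedged illustration of the two ideas it names, the sketch below assumes the Kraskov-Stögbauer-Grassberger (KSG) k-nearest-neighbour estimator of MI for a continuous (regression) target, and a permutation test as one possible resampling-based stopping rule for greedy forward selection. The function names, `k`, `n_perm`, and `alpha` are hypothetical choices, not the authors' method; note that `k` still acts as a smoothing-like parameter.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def digamma_int(n):
    """psi(n) for a positive integer n: -gamma + sum_{k=1}^{n-1} 1/k."""
    return -EULER_GAMMA + sum(1.0 / k for k in range(1, n))


def ksg_mi(xs, y, k=3):
    """KSG estimator 1 of I(X; Y) in nats (Kraskov et al., 2004).

    xs: list of tuples (feature values per sample); y: list of floats.
    Distances use the max-norm in the joint space.
    """
    n = len(xs)
    if n != len(y) or n <= k:
        raise ValueError("need equal-length samples with more than k points")
    total = 0.0
    for i in range(n):
        dx = [max(abs(a - b) for a, b in zip(xs[i], xs[j])) for j in range(n)]
        dy = [abs(y[i] - y[j]) for j in range(n)]
        joint = sorted(max(dx[j], dy[j]) for j in range(n) if j != i)
        eps = joint[k - 1]  # distance to the k-th nearest joint neighbour
        # Marginal neighbour counts strictly inside eps.
        nx = sum(1 for j in range(n) if j != i and dx[j] < eps)
        ny = sum(1 for j in range(n) if j != i and dy[j] < eps)
        total += digamma_int(nx + 1) + digamma_int(ny + 1)
    return digamma_int(k) + digamma_int(n) - total / n


def columns(data, idx):
    """Project each sample (a list of feature values) onto feature indices idx."""
    return [tuple(row[c] for c in idx) for row in data]


def forward_select(data, y, k=3, n_perm=19, alpha=0.05, seed=0):
    """Greedy forward selection that stops when a permutation test rejects the best candidate."""
    rng = random.Random(seed)
    n_features = len(data[0])
    selected = []
    while len(selected) < n_features:
        candidates = [f for f in range(n_features) if f not in selected]
        best = max(candidates, key=lambda f: ksg_mi(columns(data, selected + [f]), y, k))
        observed = ksg_mi(columns(data, selected + [best]), y, k)
        # Null distribution: shuffle only the candidate column, keep the others intact.
        null = []
        for _ in range(n_perm):
            shuffled = [row[best] for row in data]
            rng.shuffle(shuffled)
            perm_data = [list(row) for row in data]
            for row, v in zip(perm_data, shuffled):
                row[best] = v
            null.append(ksg_mi(columns(perm_data, selected + [best]), y, k))
        p_value = (1 + sum(v >= observed for v in null)) / (n_perm + 1)
        if p_value > alpha:
            break  # the best remaining feature adds no detectable information
        selected.append(best)
    return selected
```

With `n_perm=19` the smallest attainable p-value is 0.05, so the rule only adds a feature whose observed MI beats every permuted copy; a larger `n_perm` gives a finer test at a higher cost.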

Unimas Institutional Repository

Data mining for vehicle telemetry

Author: Anand Sarabjot Singh
Bhalerao Abhir
Gelencser Adam
Griffiths Nathan
Popham T. J.
Taylor Phillip M.
Xu Zhou
Publication venue: 'Informa UK Limited'
Publication date: 15/03/2016

This article presents a data mining methodology for driving-condition monitoring via CAN-bus data that is based on the general data mining process. The approach is applicable to many driving condition problems, and the example of road type classification without the use of location information is investigated. Location information from Global Positioning Satellites and related map data are often not available (for business reasons), or cannot represent the full dynamics of road conditions. In this work, Controller Area Network (CAN)-bus signals are used instead as inputs to models produced by machine learning algorithms. Road type classification is formulated as two related labeling problems: Road Type (A, B, C, and Motorway) and Carriageway Type (Single or Dual). An investigation is presented into preprocessing steps required prior to applying machine learning algorithms, that is, signal selection, feature extraction, and feature selection. The selection methods used include principal components analysis (PCA) and mutual information (MI), which are used to determine the relevance and redundancy of extracted features and are performed in various combinations. Finally, because there is an inherent bias toward certain road and carriageway labelings, the issue of class imbalance in classification is explained and investigated. A system is produced, which is demonstrated to successfully ascertain road type from CAN-bus data, and it is shown that the classification correlates well with input signals such as vehicle speed, steering wheel angle, and suspension height
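The abstract says MI is used to judge both the relevance and the redundancy of extracted features. One common way to combine the two is the minimum-redundancy maximum-relevance (mRMR) difference criterion; the sketch below assumes features have already been discretized (for example, binned CAN-bus statistics) and uses a plug-in MI estimate. It is not the authors' pipeline, which also involves PCA, and the feature names in the usage example are hypothetical.

```python
import math
from collections import Counter


def mutual_information(a, b):
    """Plug-in estimate of I(A; B) in nats for two equal-length discrete sequences."""
    n = len(a)
    joint = Counter(zip(a, b))
    ca, cb = Counter(a), Counter(b)
    # sum over (x, y) of p(x,y) * log(p(x,y) / (p(x) p(y))), written with counts
    return sum(c / n * math.log(c * n / (ca[x] * cb[y])) for (x, y), c in joint.items())


def mrmr(features, labels, m):
    """Greedy mRMR, difference form: relevance minus mean redundancy with the selected set.

    features: dict name -> list of discrete values; labels: list of class labels.
    """
    relevance = {f: mutual_information(v, labels) for f, v in features.items()}
    selected = [max(relevance, key=relevance.get)]
    while len(selected) < min(m, len(features)):
        def score(f):
            redundancy = sum(mutual_information(features[f], features[s]) for s in selected)
            return relevance[f] - redundancy / len(selected)
        remaining = [f for f in features if f not in selected]
        selected.append(max(remaining, key=score))
    return selected


# Usage with toy road-type labels: an exact copy of a selected feature scores zero.
road_type = [0, 1, 2, 3] * 10
feats = {
    "speed_band": [v // 2 for v in road_type],
    "speed_band_copy": [v // 2 for v in road_type],
    "steering_band": [v % 2 for v in road_type],
}
print(mrmr(feats, road_type, 2))
```

The second pick skips `speed_band_copy` even though it is as relevant as `steering_band`, because its redundancy with the first selection cancels its relevance.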

Crossref

Warwick Research Archives Portal Repository

A Novel Feature Selection Scheme and a Diversified-Input SVM-Based Classifier for Sensor Fault Classification

Author: Jan Sana Ullah
Koo Insoo
Publication venue: Hindawi
Publication date: 01/01/2018

The efficiency of a binary support vector machine (SVM)-based classifier depends on the combination and the number of input features extracted from raw signals. Sometimes a combination of individually good features does not discriminate a class well, because those features are also highly relevant to a second class. Moreover, in most cases, increasing the dimension of the input vector also degrades classifier performance. To obtain efficient results, a classifier should be given a combination of the smallest possible number of discriminating features. In this paper, we propose a framework to improve the performance of an SVM-based classifier for sensor fault classification in two ways: first, by selecting the best combination of features for a target class from a feature pool and, second, by minimizing the dimensionality of input vectors. To obtain the best combination of features, we propose a novel feature selection algorithm that selects m out of M features having the maximum mutual information (or relevance) with a target class and the minimum mutual information with nontarget classes. This ensures that the selected features are sensitive exclusively to the target class. Furthermore, we propose a diversified-input SVM (DI-SVM) model for multiclass classification problems to achieve our second objective, which is to reduce the dimensions of the input vector. In this model, the number of SVM-based classifiers equals the number of classes in the dataset. However, each classifier is fed a unique combination of features selected by the feature selection scheme for its target class. The efficiency of the proposed feature selection algorithm is shown by comparing the results obtained from experiments performed with and without feature selection.
Furthermore, the experimental results in terms of accuracy, receiver operating characteristics (ROC), and the area under the ROC curve (AUC-ROC) show that the proposed DI-SVM model outperforms the conventional SVM model, the neural network, and the k-nearest neighbor algorithm for sensor fault detection and classification.
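The abstract states the goal (maximum MI with the target class, minimum MI with nontarget classes) but not how the two terms are combined. The sketch below assumes one simple combination, the MI with the one-vs-rest target indicator minus the mean MI with each nontarget indicator, on already-discretized features. `per_class_feature_sets` mirrors the DI-SVM idea of one feature subset per class-specific classifier; the SVM training itself is omitted, and all names are hypothetical.

```python
import math
from collections import Counter


def mutual_information(a, b):
    """Plug-in I(A; B) in nats for two equal-length discrete sequences."""
    n = len(a)
    joint = Counter(zip(a, b))
    ca, cb = Counter(a), Counter(b)
    return sum(c / n * math.log(c * n / (ca[x] * cb[y])) for (x, y), c in joint.items())


def class_specific_scores(features, labels, target):
    """Score = I(f; y == target) - mean over other classes c of I(f; y == c)."""
    others = [c for c in sorted(set(labels)) if c != target]
    scores = {}
    for name, values in features.items():
        relevance = mutual_information(values, [y == target for y in labels])
        leakage = sum(mutual_information(values, [y == c for y in labels]) for c in others)
        scores[name] = relevance - leakage / len(others)
    return scores


def per_class_feature_sets(features, labels, m):
    """Top-m features per class, one subset for each one-vs-rest classifier."""
    subsets = {}
    for target in sorted(set(labels)):
        scores = class_specific_scores(features, labels, target)
        subsets[target] = sorted(scores, key=scores.get, reverse=True)[:m]
    return subsets


# Usage: "fA" fires only for class A; "fAB" fires for A and B, so it best isolates C.
labels = ["A", "B", "C"] * 10
feats = {
    "fA": [1 if lab == "A" else 0 for lab in labels],
    "fAB": [1 if lab in ("A", "B") else 0 for lab in labels],
}
print(per_class_feature_sets(feats, labels, 1))
```

A feature that separates class A but also carries information about B is penalized for A, which is the "sensitive to the target class exclusively" behaviour the abstract describes.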

Directory of Open Access Journals

Repository@Napier

Feature selection and hierarchical classifier design with applications to human motion recognition

Author: Freeman Cecille
Publication venue: 'University of Waterloo'
Publication date: 01/01/2014

The performance of a classifier is affected by a number of factors including classifier type, the input features and the desired output. This thesis examines the impact of feature selection and classification problem division on classification accuracy and complexity. Proper feature selection can reduce classifier size and improve classifier performance by minimizing the impact of noisy, redundant and correlated features. Noisy features can cause false association between the features and the classifier output. Redundant and correlated features increase classifier complexity without adding information. Output selection or classification problem division describes the division of a large classification problem into a set of smaller problems. Problem division can improve accuracy by allocating more resources to more difficult class divisions and enabling the use of more specific feature sets for each sub-problem. The first part of this thesis presents two methods for creating feature-selected hierarchical classifiers. The feature-selected hierarchical classification method jointly optimizes the features and the classification tree design using genetic algorithms. The multi-modal binary tree (MBT) method performs the class division and feature selection sequentially and tolerates misclassifications in the higher nodes of the tree. This yields a piecewise separation for classes that cannot be fully separated with a single classifier. Experiments show that the accuracy of MBT is comparable to other multi-class extensions, but with lower test time. Furthermore, the accuracy of MBT is significantly higher on multi-modal data sets. The second part of this thesis focuses on input feature selection measures. A number of filter-based feature subset evaluation measures are evaluated with the goal of assessing their performance with respect to specific classifiers.
Although many feature selection measures have been proposed in the literature, it is unclear which feature selection measures are appropriate for use with different classifiers. Sixteen common filter-based measures are tested on 20 real and 20 artificial data sets, the latter designed to probe for specific feature selection challenges. The strengths and weaknesses of each measure are discussed with respect to the specific feature selection challenges in the artificial data sets, correlation with classifier accuracy and their ability to identify known informative features. The results indicate that the best filter measure is classifier-specific. K-nearest neighbour classifiers work well with subset-based RELIEF, correlation feature selection or conditional mutual information maximization, whereas Fisher's interclass separability criterion and conditional mutual information maximization work better for support vector machines. Based on the results of the feature selection experiments, two new filter-based measures are proposed based on conditional mutual information maximization, which performs well but cannot identify dependent features in a set and does not include a check for correlated features. Both new measures explicitly check for dependent features, and the second measure also includes a term to discount correlated features. Both measures correctly identify known informative features in the artificial data sets and correlate well with classifier accuracy. The final part of this thesis examines the use of feature selection for time-series data by using feature selection to determine important individual time windows or key frames in the series. Time-series feature selection is used with the MBT algorithm to create classification trees for time-series data. The feature-selected MBT algorithm is tested on two human motion recognition tasks: full-body human motion recognition from joint angle data and hand gesture recognition from electromyography data.
Results indicate that the feature-selected MBT is able to achieve high classification accuracy on the time-series data while maintaining a short test time.
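The two new measures in the thesis extend conditional mutual information maximization (CMIM). For reference, the sketch below shows only the baseline CMIM greedy rule of Fleuret (2004) on discrete features: each candidate is scored by the smallest conditional MI $I(Y; f \mid s)$ over the already selected features $s$. The thesis's additional checks for dependent and correlated features are not reproduced, and the feature names are hypothetical.

```python
import math
from collections import Counter


def entropy(*columns):
    """Plug-in joint entropy (nats) of one or more aligned discrete columns."""
    rows = list(zip(*columns))
    n = len(rows)
    return -sum(c / n * math.log(c / n) for c in Counter(rows).values())


def mi(y, f):
    """I(Y; F) = H(Y) + H(F) - H(Y, F)."""
    return entropy(y) + entropy(f) - entropy(y, f)


def cond_mi(y, f, s):
    """I(Y; F | S) = H(Y, S) + H(F, S) - H(Y, F, S) - H(S)."""
    return entropy(y, s) + entropy(f, s) - entropy(y, f, s) - entropy(s)


def cmim(features, y, m):
    """Greedy CMIM: pick the feature whose worst-case conditional MI with Y is largest."""
    # partial[f] holds min over selected s of I(Y; f | s), starting from I(Y; f).
    partial = {name: mi(y, col) for name, col in features.items()}
    selected = []
    while len(selected) < min(m, len(features)):
        best = max((f for f in features if f not in selected), key=partial.get)
        selected.append(best)
        for f in features:
            if f not in selected:
                partial[f] = min(partial[f], cond_mi(y, features[f], features[best]))
    return selected


# Usage: a duplicated feature adds nothing once its twin is selected.
y = [0, 1, 2, 3] * 10
feats = {"f1": [v // 2 for v in y], "f1_dup": [v // 2 for v in y], "f2": [v % 2 for v in y]}
print(cmim(feats, y, 2))
```

Because `f1_dup` carries no information about `y` once `f1` is known, its conditional MI drops to zero, and the complementary feature `f2` is chosen instead.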

University of Waterloo's Institutional Repository

High-Dimensional Feature Selection by Feature-Wise Kernelized Lasso

Author: Makoto Yamada
Wittawat Jitkrittum
Leonid Sigal
Eric P. Xing
Masashi Sugiyama
Publication venue: 'MIT Press - Journals'
Publication date: 03/01/2019

The goal of supervised feature selection is to find a subset of input features that are responsible for predicting output values. The least absolute shrinkage and selection operator (Lasso) allows computationally efficient feature selection based on linear dependency between input features and output values. In this paper, we consider a feature-wise kernelized Lasso for capturing non-linear input-output dependency. We first show that, with particular choices of kernel functions, non-redundant features with strong statistical dependence on output values can be found in terms of kernel-based independence measures. We then show that the globally optimal solution can be efficiently computed; this makes the approach scalable to high-dimensional problems. The effectiveness of the proposed method is demonstrated through feature selection experiments with thousands of features. Comment: 18 pages
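A feature-wise kernelized Lasso of this kind (often called HSIC Lasso) can be written as $\min_{\alpha \ge 0} \tfrac{1}{2}\lVert \bar L - \sum_k \alpha_k \bar K^{(k)} \rVert_F^2 + \lambda \lVert \alpha \rVert_1$, where $\bar K^{(k)}$ and $\bar L$ are centered Gram matrices of feature $k$ and of the output. The sketch below assumes Gaussian kernels of width 1 on standardized data, Gram matrices scaled to unit Frobenius norm, and a plain non-negative coordinate-descent solver; these are illustrative choices, not necessarily the solver or settings used in the paper.

```python
import math


def standardize(v):
    """Zero-mean, unit-variance copy of a list (all zeros if the list is constant)."""
    n = len(v)
    mean = sum(v) / n
    sd = math.sqrt(sum((a - mean) ** 2 for a in v) / n)
    return [(a - mean) / sd if sd > 0 else 0.0 for a in v]


def centered_gram(v, sigma=1.0):
    """Gaussian Gram matrix of a 1-D sample, double-centered, flattened, unit Frobenius norm."""
    n = len(v)
    K = [[math.exp(-((a - b) ** 2) / (2 * sigma ** 2)) for b in v] for a in v]
    row = [sum(r) / n for r in K]  # K is symmetric, so row means equal column means
    grand = sum(row) / n
    flat = [K[i][j] - row[i] - row[j] + grand for i in range(n) for j in range(n)]
    norm = math.sqrt(sum(x * x for x in flat))
    return [x / norm for x in flat] if norm > 0 else flat


def hsic_lasso(columns, y, lam=0.01, n_sweeps=100):
    """Non-negative Lasso on vectorized centered Gram matrices via coordinate descent."""
    kernels = [centered_gram(standardize(c)) for c in columns]
    target = centered_gram(standardize(y))
    alpha = [0.0] * len(kernels)
    residual = list(target)  # vec(L) - sum_k alpha_k vec(K_k)
    for _ in range(n_sweeps):
        for k, Kk in enumerate(kernels):
            sq = sum(v * v for v in Kk)
            if sq == 0.0:
                continue  # constant feature: nothing to fit
            # Inner product of K_k with the residual that excludes feature k.
            rho = sum(kv * (r + alpha[k] * kv) for kv, r in zip(Kk, residual))
            new = max(0.0, (rho - lam) / sq)  # soft-threshold, clipped at zero
            delta = new - alpha[k]
            if delta != 0.0:
                residual = [r - delta * kv for r, kv in zip(residual, Kk)]
                alpha[k] = new
    return alpha
```

Features with a non-zero weight are the selected ones. Because each feature enters through its own kernel, a purely non-linear dependence such as $y = x^2$, which a linear Lasso would miss, can still receive a large weight.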

arXiv.org e-Print Archive

Crossref

Application of mutual information-based sequential feature selection to ISBSG mixed data

Author: Fernández-Diego Marta
González-Ladrón-de-Guevara Fernando
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 27/10/2017

There is still little research on feature selection (FS) techniques that handle both categorical and continuous features in the Software Development Effort Estimation (SDEE) literature. This paper addresses the problem of selecting the most relevant features from the ISBSG (International Software Benchmarking Standards Group) dataset for use in SDEE. The aim is to show the usefulness of splitting the ranked list of features provided by a mutual information-based sequential FS approach into two lists, one for categorical and one for continuous features. These lists are later recombined according to the accuracy of a case-based reasoning model. Four FS algorithms are compared using a complete dataset of 621 projects and 12 features from ISBSG. On the one hand, two algorithms consider only relevance, while the other two follow the criterion of maximizing relevance while also minimizing redundancy between any independent feature and the already selected features. On the other hand, the algorithms that do not discriminate between continuous and categorical features consider just one list, whereas those that differentiate them use two lists that are later combined. As a result, the algorithms that use two lists perform better than those that use one list. Thus, it is meaningful to consider two different lists of features so that categorical features may be selected more frequently. We also suggest promoting the usage of the Application Group, Project Elapsed Time, and First Data Base System features in preference to the more frequently used Development Type, Language Type, and Development Platform.
Fernández-Diego, M.; González-Ladrón-De-Guevara, F. (2018). Application of mutual information-based sequential feature selection to ISBSG mixed data. Software Quality Journal. 26(4):1299-1325. https://doi.org/10.1007/s11219-017-9391-5
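The two-list idea can be sketched as follows, under stated assumptions: continuous features and effort are discretized by equal-frequency binning before a plug-in MI ranking, categorical and continuous features are ranked separately, and the lists are merged greedily by taking whichever list head most improves a caller-supplied `evaluate(subset)` score. That callback is a hypothetical stand-in for the case-based reasoning accuracy used in the paper, and the merge rule and toy data are illustrative only.

```python
import math
from collections import Counter


def mutual_information(a, b):
    """Plug-in I(A; B) in nats for two equal-length discrete sequences."""
    n = len(a)
    joint = Counter(zip(a, b))
    ca, cb = Counter(a), Counter(b)
    return sum(c / n * math.log(c * n / (ca[x] * cb[y])) for (x, y), c in joint.items())


def equal_frequency_bins(values, n_bins=3):
    """Bin index per value so bins hold roughly equal counts (ties may be split)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins


def ranked(features, target):
    """Feature names sorted by decreasing MI with the discretized target."""
    scores = {name: mutual_information(col, target) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)


def two_list_selection(categorical, continuous, effort, evaluate, n_bins=3):
    target = equal_frequency_bins(effort, n_bins)
    cat_list = ranked(categorical, target)
    cont_list = ranked({k: equal_frequency_bins(v, n_bins) for k, v in continuous.items()}, target)
    selected, best_score = [], float("-inf")
    while cat_list or cont_list:
        heads = [lst[0] for lst in (cat_list, cont_list) if lst]
        score, head = max((evaluate(selected + [h]), h) for h in heads)
        if score <= best_score:
            break  # neither list head improves the downstream model
        selected.append(head)
        best_score = score
        for lst in (cat_list, cont_list):
            if lst and lst[0] == head:
                lst.pop(0)
    return selected


# Toy usage with made-up data and a made-up evaluator.
effort = [float(i) for i in range(30)]
categorical = {"app_group": ["small"] * 10 + ["mid"] * 10 + ["large"] * 10,
               "dev_type": ["new", "enh"] * 15}
continuous = {"elapsed": [2 * e for e in effort], "noise": [(7 * i) % 30 for i in range(30)]}
useful = {"app_group", "elapsed"}
print(two_list_selection(categorical, continuous, effort,
                         lambda s: sum(f in useful for f in s) - 0.01 * len(s)))
```

Keeping two lists guarantees that a highly ranked categorical feature is always offered to the evaluator, instead of being pushed down a single mixed ranking.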

Crossref

RiuNet

One-class classifiers based on entropic spanning graphs

Author: Alippi Cesare
Livi Lorenzo
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 12/08/2016

One-class classifiers offer valuable tools for assessing the presence of outliers in data. In this paper, we propose a design methodology for one-class classifiers based on entropic spanning graphs. Our approach also allows non-numeric data to be processed by means of an embedding procedure. The spanning graph is learned on the embedded input data, and the resulting partition of vertices defines the classifier. The final partition is derived by exploiting a criterion based on mutual information minimization. Here, we compute the mutual information using a convenient formulation provided in terms of the $\alpha$-Jensen difference. Once training is completed, a graph-based fuzzy model is constructed in order to associate a confidence level with the classifier decision. The fuzzification process is based only on topological information of the vertices of the entropic spanning graph. As such, the proposed one-class classifier is also suitable for data characterized by complex geometric structures. We provide experiments on well-known benchmarks containing both feature vectors and labeled graphs. In addition, we apply the method to the protein solubility recognition problem by considering several representations for the input samples. Experimental results demonstrate the effectiveness and versatility of the proposed method with respect to other state-of-the-art approaches. Comment: Extended and revised version of the paper "One-Class Classification Through Mutual Information Minimization" presented at the 2016 IEEE IJCNN, Vancouver, Canada
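For background on the $\alpha$-Jensen difference, the sketch below uses a standard entropic-spanning-graph estimator: the Rényi $\alpha$-entropy of $n$ points in $d$ dimensions is estimated from the length $L_\gamma$ of a Euclidean minimum spanning tree with edge weights raised to $\gamma = d(1-\alpha)$, as $\hat H_\alpha = \frac{1}{1-\alpha}\log\left(L_\gamma / n^\alpha\right)$ up to an additive constant. The Jensen difference $H_\alpha(\text{pooled}) - [p H_\alpha(X) + q H_\alpha(Y)]$ uses weights with $p + q = 1$, so that constant cancels. This is only the generic estimator, not the authors' full classifier; function names and parameters are assumptions.

```python
import math


def mst_length(points, gamma):
    """Sum of Euclidean MST edge lengths raised to gamma (Prim's algorithm, O(n^2))."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge from each vertex to the growing tree
    best[0] = 0.0
    total = 0.0
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        total += best[u] ** gamma  # the root contributes 0
        for v in range(n):
            if not in_tree[v]:
                best[v] = min(best[v], math.dist(points[u], points[v]))
    return total


def renyi_entropy_mst(points, alpha):
    """MST estimate of the Renyi alpha-entropy, up to an additive constant (0 < alpha < 1)."""
    n, dim = len(points), len(points[0])
    gamma = dim * (1 - alpha)
    return math.log(mst_length(points, gamma) / n ** alpha) / (1 - alpha)


def alpha_jensen_difference(x, y, alpha=0.5):
    """H_a(pooled) - [p H_a(x) + q H_a(y)] with p = |x| / (|x| + |y|), q = 1 - p."""
    p = len(x) / (len(x) + len(y))
    pooled = renyi_entropy_mst(x + y, alpha)
    return pooled - (p * renyi_entropy_mst(x, alpha) + (1 - p) * renyi_entropy_mst(y, alpha))


# Usage: well-separated groups give a much larger Jensen difference than interleaved ones.
grid = [(0.3 * i, 0.3 * j) for i in range(5) for j in range(6)]
far = [(a + 10.0, b + 10.0) for a, b in grid]
near = [(a + 0.15, b + 0.15) for a, b in grid]
print(alpha_jensen_difference(grid, far), alpha_jensen_difference(grid, near))
```

A large difference indicates that the two groups are informative about membership, which is why this quantity can serve as a mutual-information-style criterion for evaluating a partition.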

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Politecnico di Milano

Crossref

Open Research Exeter

Dimension Reduction by Mutual Information Discriminant Analysis

Author: Shadvar Ali
Publication date: 01/01/2012

In the past few decades, researchers have proposed many discriminant analysis (DA) algorithms for the study of high-dimensional data in a variety of problems. Most DA algorithms for feature extraction are based on transformations that simultaneously maximize the between-class scatter and minimize the within-class scatter matrices. This paper presents a novel DA algorithm for feature extraction using mutual information (MI). However, it is not always easy to obtain an accurate estimate of high-dimensional MI. In this paper, we propose an efficient method for feature extraction that is based on one-dimensional MI estimates. We refer to this algorithm as mutual information discriminant analysis (MIDA). The performance of the proposed method was evaluated using UCI databases. The results indicate that MIDA provides robust performance over different data sets with different characteristics and that MIDA always performs better than, or at least comparably to, the best performing algorithms. Comment: 13 pages, 3 tables, International Journal of Artificial Intelligence & Applications
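To show why one-dimensional MI estimates are enough to score a candidate projection, the sketch below estimates $I(w^\top x; C)$ with an equal-width histogram and brute-force scans unit directions in 2-D. This angle scan is a stand-in chosen for illustration; it is not the MIDA update rule, and the binning, number of angles, and function names are assumptions.

```python
import math
from collections import Counter


def binned_mi(values, labels, n_bins=8):
    """Plug-in I(Z; C) in nats after equal-width binning of the scalar feature Z."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # avoid division by zero for constant Z
    z = [min(int((v - lo) / width), n_bins - 1) for v in values]
    n = len(z)
    joint, cz, cc = Counter(zip(z, labels)), Counter(z), Counter(labels)
    return sum(c / n * math.log(c * n / (cz[a] * cc[b])) for (a, b), c in joint.items())


def best_projection_2d(points, labels, n_angles=180):
    """Scan w = (cos t, sin t) over [0, pi) and keep the direction maximizing I(w^T x; C)."""
    best_score, best_w = -1.0, None
    for i in range(n_angles):
        t = math.pi * i / n_angles
        w = (math.cos(t), math.sin(t))
        z = [w[0] * p[0] + w[1] * p[1] for p in points]
        score = binned_mi(z, labels)
        if score > best_score:
            best_score, best_w = score, w
    return best_score, best_w


# Usage: classes differ only along the second axis, while the first axis is wide noise.
c0 = [((i * 8) % 21 - 10.0, -2.0 + 0.1 * (i % 5 - 2)) for i in range(20)]
c1 = [((i * 5) % 21 - 10.0, 2.0 + 0.1 * (i % 5 - 2)) for i in range(20)]
print(best_projection_2d(c0 + c1, [0] * 20 + [1] * 20))
```

Each evaluation only needs a one-dimensional histogram, regardless of the input dimension, which is the practical advantage of working with one-dimensional MI estimates.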

arXiv.org e-Print Archive

CiteSeerX