373,794 research outputs found

    The Extended Edit Distance Metric

    Full text link
    Similarity search is an important problem in information retrieval. This similarity is based on a distance. Symbolic representation of time series has attracted many researchers recently, since it reduces the dimensionality of these high dimensional data objects. We propose a new distance metric that is applied to symbolic data objects and we test it on time series data bases in a classification task. We compare it to other distances that are well known in the literature for symbolic data objects. We also prove, mathematically, that our distance is metric.Comment: Technical repor

    SVM Parameter Optimization using Grid Search and Genetic Algorithm to Improve Classification Performance

    Get PDF
    Machine Learning algorithms have been widely used to solve various kinds of data classification problems. Classification problem especially for high dimensional datasets have attracted many researchers in order to find efficient approaches to address them. However, the classification problem has become very complicated and computationally expensive, especially when the number of possible different combinations of variables is so high. Support Vector Machine (SVM) has been proven to perform much better when dealing with high dimensional datasets and numerical features. Although SVM works well with default value, the performance of SVM can be improved significantly using parameter optimization. We applied two methods which are Grid Search and Genetic Algorithm (GA) to optimize the SVM parameters. Our experiment showed that SVM parameter optimization using grid search always finds near optimal parameter combination within the given ranges. However, grid search was very slow; therefore it was very reliable only in low dimensional datasets with few parameters. SVM parameter optimization using GA can be used to solve the problem of grid search. GA has proven to be more stable than grid search. Based on average running time on 9 datasets, GA was almost 16 times faster than grid search. Futhermore, the GAā€™s results were slighlty better than the grid search in 8 of 9 datasets

    Variable selection and updating in model-based discriminant analysis for high dimensional data with food authenticity applications

    Get PDF
    Food authenticity studies are concerned with determining if food samples have been correctly labelled or not. Discriminant analysis methods are an integral part of the methodology for food authentication. Motivated by food authenticity applications, a model-based discriminant analysis method that includes variable selection is presented. The discriminant analysis model is fitted in a semi-supervised manner using both labeled and unlabeled data. The method is shown to give excellent classification performance on several high-dimensional multiclass food authenticity datasets with more variables than observations. The variables selected by the proposed method provide information about which variables are meaningful for classification purposes. A headlong search strategy for variable selection is shown to be efficient in terms of computation and achieves excellent classification performance. In applications to several food authenticity datasets, our proposed method outperformed default implementations of Random Forests, AdaBoost, transductive SVMs and Bayesian Multinomial Regression by substantial margins
    • ā€¦
    corecore