34 research outputs found

    Improving supervised music classification by means of multi-objective evolutionary feature selection

    In this work, several strategies are developed to reduce the impact of two limitations of most current studies in supervised music classification: the classification rules and music features often have low interpretability, and algorithms and feature subsets are almost always evaluated with respect to only one or a few common evaluation criteria separately. Although music classification is in most cases user-centered, and a good understanding of the properties of the related music categories is desirable, many current approaches are based on low-level characteristics of the audio signal. We have designed a large set of more meaningful and interpretable high-level features, which may completely replace the baseline low-level feature set and can even significantly outperform it for the categorisation into three music styles. These features provide a comprehensible insight into the properties of music genres and styles: instrumentation, moods, harmony, and temporal and melodic characteristics. A crucial advantage of audio high-level features is that they can be extracted from any digitally available music piece, independently of its popularity, the availability of the corresponding score, or an Internet connection for downloading metadata and community features, which are sometimes erroneous and incomplete. Some of the high-level features, which are particularly successful for classification into genres and styles, have been developed using a novel approach called sliding feature selection. Here, high-level features are estimated from low-level and other high-level ones during a sequence of supervised classification steps, and an integrated evolutionary feature selection helps to search for the most relevant features in each step of this sequence. Another drawback of many related state-of-the-art studies is that the algorithms and feature sets are almost always compared using only one or a few evaluation criteria separately. However, different evaluation criteria are often in conflict: an algorithm optimised only with respect to classification quality may be slow, have high storage demands, perform worse on imbalanced data, or require considerable user effort for labelling songs. The simultaneous optimisation of multiple conflicting criteria has so far remained almost unexplored in music information retrieval, and it was applied to feature selection in music classification for the first time in this thesis, apart from several of our own preliminary publications. As an example of a multi-objective approach to optimising feature selection, we simultaneously minimise the classification error and the number of features used for classification. The sets with more features lead to a higher classification quality. On the other hand, the sets with fewer features and a lower classification performance may help to strongly decrease the demands for storage and computing time and to reduce the risk of overly complex and overfitted classification models. Further, we describe several groups of evaluation criteria and discuss other reasonable multi-objective optimisation scenarios for music data analysis.
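    The trade-off between classification error and number of features described in this abstract can be illustrated with a small sketch. It only shows how non-dominated (Pareto-optimal) feature subsets would be identified once each subset has been evaluated; it is not the evolutionary search used in the thesis, and the (error, feature count) pairs are hypothetical values chosen for illustration.

```python
from typing import List, Tuple

# (classification error, number of features); both objectives are minimised.
Candidate = Tuple[float, int]


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b in both objectives and strictly better in one."""
    return a[0] <= b[0] and a[1] <= b[1] and (a[0] < b[0] or a[1] < b[1])


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Return the non-dominated candidates, sorted by number of features."""
    front = [c for c in candidates
             if not any(dominates(other, c) for other in candidates)]
    return sorted(set(front), key=lambda c: c[1])


# Hypothetical evaluation results, not taken from the thesis.
subsets = [(0.30, 5), (0.22, 12), (0.25, 12), (0.18, 40), (0.19, 60), (0.30, 8)]
front = pareto_front(subsets)
print(front)
```

    Each subset left on the front is a different compromise: no other evaluated subset has both a lower error and fewer features.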

    Evolution strategies based coefficient of TSK fuzzy forecasting engine

    Forecasting is a method of predicting future values from past and current data, most often by pattern analysis. A previous study using a fuzzy Takagi-Sugeno-Kang (TSK) model predicted Indonesia's inflation rate, but with too high an error. This study proposes an accuracy improvement based on Evolution Strategies (ES), a specific evolutionary algorithm that performs well on optimization problems. The ES algorithm is used to determine the best coefficient values in the consequents of the fuzzy rules. This research uses the same Bank Indonesia time-series data as the previous study. A popSize test is used to determine the number of initial chromosomes that produces the best solution for this problem. Increasing popSize yields a better fitness value because the ES covers a broader search area. The RMSE of ES-TSK is 0.637, which outperforms the baseline approach. Overall, this research shows that ES may reduce the repetitive experiments caused by setting the fuzzy coefficients manually. The algorithm's complexity may increase the computing time, but it delivers higher performance.
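    As a rough illustration of the idea, the sketch below uses a simple elitist (mu + lambda) evolution strategy to tune the coefficients of one linear rule consequent by minimising RMSE. Several things here are assumptions rather than details from the paper: the single-rule model `y = c0 + c1*x1 + c2*x2`, the fixed mutation step `sigma`, the parameter names, and the synthetic data. The actual ES-TSK system combines several fuzzy rules and uses Bank Indonesia data.

```python
import math
import random


def rmse(actual, predicted):
    """Root mean squared error between two equally long sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def predict(coeffs, inputs):
    """Linear consequent of a single first-order rule: y = c0 + c1*x1 + c2*x2."""
    return [coeffs[0] + coeffs[1] * x1 + coeffs[2] * x2 for x1, x2 in inputs]


def evolve(inputs, targets, pop_size=10, offspring=40, generations=100,
           sigma=0.3, seed=0):
    """Elitist (mu + lambda) ES with a fixed mutation step (a simplification)."""
    rng = random.Random(seed)

    def fitness(coeffs):
        return rmse(targets, predict(coeffs, inputs))

    population = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(pop_size)]
    initial_best = min(fitness(c) for c in population)
    for _ in range(generations):
        # Each child is a Gaussian-mutated copy of a randomly chosen parent.
        children = [[gene + rng.gauss(0, sigma) for gene in rng.choice(population)]
                    for _ in range(offspring)]
        # Parents and children compete; the best pop_size individuals survive.
        population = sorted(population + children, key=fitness)[:pop_size]
    best = population[0]
    return best, initial_best, fitness(best)


# Synthetic data generated from known coefficients (hypothetical, for illustration).
inputs = [(0.1 * i, 0.05 * i * i) for i in range(20)]
targets = [0.5 + 0.8 * x1 - 0.2 * x2 for x1, x2 in inputs]

best_coeffs, initial_rmse, final_rmse = evolve(inputs, targets)
print(best_coeffs, initial_rmse, final_rmse)
```

    Because parents survive alongside their offspring, the best RMSE can never get worse from one generation to the next. A larger `pop_size` keeps more candidate coefficient vectors alive at the cost of more fitness evaluations.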

    Integrated bio-search approaches with multi-objective algorithms for optimization and classification problem

    Optimal selection of features is difficult yet crucial, particularly for classification tasks. The difficulty arises because traditional feature selection methods operate independently and generate collections of irrelevant features, which degrades classification accuracy. The goal of this paper is to leverage the potential of bio-inspired search algorithms, together with a wrapper, in optimizing the multi-objective algorithms ENORA and NSGA-II to generate an optimal set of features. The main steps are to identify the ideal combination of ENORA and NSGA-II with suitable bio-search algorithms, where multiple subset generation has been implemented, and then to validate the optimum feature set by conducting a subset evaluation. Eight comparison datasets of various sizes were deliberately selected for testing. Results show that the ideal combination of the multi-objective algorithms ENORA and NSGA-II with the selected bio-inspired search algorithm is promising for achieving a better optimal solution (i.e. the best features with higher classification accuracy) for the selected datasets. This finding implies that bio-inspired wrapper/filter algorithms can boost the efficiency of ENORA and NSGA-II for the task of selecting and classifying features.

    Adversarial Sample Generation using the Euclidean Jacobian-based Saliency Map Attack (EJSMA) and Classification for IEEE 802.11 using the Deep Deterministic Policy Gradient (DDPG)

    One of today's most promising developments is wireless networking, as it enables people across the globe to stay connected. As the wireless networks' transmission medium is open, there are potential issues in safeguarding the privacy of the information. Though several security protocols exist in the literature for the preservation of information, most cases fail with a simple spoof attack. So, intrusion detection systems are vital in wireless networks as they help in the identification of harmful traffic. One of the challenges that exist in wireless intrusion detection systems (WIDS) is finding a balance between accuracy and false alarm rate. The purpose of this study is to provide a practical classification scheme for newer forms of attack. The AWID dataset is used in the experiment, which proposes a feature selection strategy using a combination of Elastic Net and recursive feature elimination. The best feature subset is obtained with 22 features, and a deep deterministic policy gradient learning algorithm is then used to classify attacks based on those features. Samples are generated using the Euclidean Jacobian-based Saliency Map Attack (EJSMA) to evaluate classification outcomes using adversarial samples. The meta-analysis reveals improved results in terms of feature production (22 features), classification accuracy (98.75% for testing samples and 85.24% for adversarial samples), and false alarm rates (0.35%).

    Probabilistic Value Selection for Space Efficient Model

    An alternative to current mainstream preprocessing methods is proposed: Value Selection (VS). Unlike existing methods such as feature selection, which removes features, and instance selection, which eliminates instances, value selection eliminates values (with respect to each feature) in the dataset with two purposes: reducing the model size and preserving its accuracy. Two probabilistic methods based on an information-theoretic metric are proposed: PVS and P + VS. Extensive experiments on benchmark datasets of various sizes are reported. The results are compared with existing preprocessing methods such as feature selection, feature transformation, and instance selection. Experimental results show that value selection can achieve a balance between accuracy and model size reduction. Comment: Accepted in the 21st IEEE International Conference on Mobile Data Management (July 2020)

    Multivariate feature ranking of gene expression data

    Gene expression datasets are usually of high dimensionality and therefore require efficient and effective methods for identifying the relative importance of their attributes. Because the search space of possible solutions is huge, feature selection methods based on attribute subset evaluation tend not to be applicable, so feature ranking methods are used in these scenarios. Most of the feature ranking methods described in the literature are univariate, so they do not detect interactions between factors. In this paper we propose two new multivariate feature ranking methods based on pairwise correlation and pairwise consistency, which we have applied to three gene expression classification problems. We statistically prove that the proposed methods outperform the state-of-the-art feature ranking methods Clustering Variation, Chi Squared, Correlation, Information Gain, ReliefF and Significance, as well as feature selection methods of attribute subset evaluation based on correlation and consistency with a multi-objective evolutionary search strategy.

    Hierarchical forecast reconciliation with machine learning

    Hierarchical forecasting methods have been widely used to support aligned decision-making by providing coherent forecasts at different aggregation levels. Traditional hierarchical forecasting approaches, such as the bottom-up and top-down methods, focus on a particular aggregation level to anchor the forecasts. During the past decades, these have been replaced by a variety of linear combination approaches that exploit information from the complete hierarchy to produce more accurate forecasts. However, the performance of these combination methods depends on the particularities of the examined series and their relationships. This paper proposes a novel hierarchical forecasting approach based on machine learning that deals with these limitations in three important ways. First, the proposed method allows for a non-linear combination of the base forecasts, thus being more general than the linear approaches. Second, it structurally combines the objectives of improved post-sample empirical forecasting accuracy and coherence. Finally, due to its non-linear nature, our approach selectively combines the base forecasts in a direct and automated way, without requiring that the complete information be used to produce reconciled forecasts for each series and level. The proposed method is evaluated in terms of both accuracy and bias using two different data sets from the tourism and retail industries. Our results suggest that the proposed method gives more accurate point forecasts than existing approaches, especially when the series comprising the hierarchy are not characterized by the same patterns.
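    To make the notion of coherence concrete, the sketch below shows the two traditional baselines named in the abstract for a two-level hierarchy with one total and two bottom-level series. It does not implement the paper's machine-learning method. The series names, forecast values and historical proportions are hypothetical.

```python
def bottom_up(bottom_forecasts):
    """Anchor on the bottom level: the total is the sum of the bottom forecasts."""
    return {"total": sum(bottom_forecasts.values()), **bottom_forecasts}


def top_down(total_forecast, proportions):
    """Anchor on the top level: split the total using fixed proportions."""
    split = {name: total_forecast * share for name, share in proportions.items()}
    return {"total": total_forecast, **split}


def is_coherent(forecasts, tol=1e-9):
    """A forecast set is coherent if the bottom series add up to the total."""
    bottom_sum = sum(v for name, v in forecasts.items() if name != "total")
    return abs(forecasts["total"] - bottom_sum) <= tol


# Hypothetical base forecasts produced independently for each series.
base = {"total": 100.0, "A": 60.0, "B": 50.0}

bu = bottom_up({"A": base["A"], "B": base["B"]})
td = top_down(base["total"], {"A": 0.75, "B": 0.25})  # assumed historical shares

print(is_coherent(base), is_coherent(bu), is_coherent(td))
```

    Both baselines restore coherence but each discards part of the base forecasts: bottom-up ignores the total forecast, and top-down ignores the bottom-level forecasts. That loss of information is what the combination approaches described above aim to avoid.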