
    New Method for Optimal Feature Set Reduction

    This paper considers the problem of finding a minimum-size feature set for assigning multidimensional objects to classes, for instance by means of classification trees. The problem is important for building fast and accurate classification systems. A short comparative review of existing approaches is given. Formally, the problem is stated as finding a minimum-size (or minimum-weighted-sum) cover of a discriminating 0,1-matrix, which represents the ability of each feature to distinguish between each pair of objects belonging to different classes. A way to build the discriminating 0,1-matrix is given. On the basis of a common solution principle, called the group resolution principle, the following problems are formulated and solved: finding an exact minimum-size feature set; finding a feature set with minimum total weight among all minimum-size feature sets (the feature weights may be defined by known methods, e.g. the RELIEF method and its modifications); finding an optimal feature set for fuzzy data, with discriminating matrix elements in the range [0,1]; and finding a statistically optimal solution, especially in the case of big data. The statistically optimal algorithm bounds the computation time by a polynomial in the problem size and the density of ones in the discriminating matrix, and finds an exact solution with probability close to 1. Thus, the paper proposes a common approach to finding a minimum-size feature set, with particular features in the problem formulation that distinguish it from known approaches. The paper contains many illustrations for clarification. Some theoretical statements in the paper are based on previously published work. The concluding part presents the experimental results, as well as information on dimensionality reduction of the covering problem for big datasets. Some promising directions for the outlined approach are noted, including working with incomplete and categorical data and integrating the control model into the data classification system.
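The covering formulation in this abstract can be illustrated with a small sketch. It assumes crisp (non-fuzzy) features that discriminate a pair of objects exactly when their values differ, and it finds the minimum cover by exhaustive search rather than by the group resolution principle, which the abstract does not specify. The function names `discriminating_matrix` and `minimum_cover` are hypothetical.

```python
from itertools import combinations


def discriminating_matrix(objects, labels):
    """Build one row per pair of objects from different classes.

    Entry j of a row is 1 when feature j takes different values on the
    two objects (an assumed, simple notion of "discriminates").
    """
    rows = []
    for a, b in combinations(range(len(objects)), 2):
        if labels[a] != labels[b]:
            rows.append([int(x != y) for x, y in zip(objects[a], objects[b])])
    return rows


def minimum_cover(matrix, n_features):
    """Return a smallest set of columns that hits every row with a 1.

    Exhaustive search by increasing size: exponential in the worst case,
    so this is only meant for small illustrations.
    """
    for size in range(n_features + 1):
        for cols in combinations(range(n_features), size):
            if all(any(row[j] for j in cols) for row in matrix):
                return list(cols)
    return None  # some pair is not separated by any feature


# Four objects with three binary features and two classes.
objects = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 1)]
labels = [0, 1, 1, 0]
matrix = discriminating_matrix(objects, labels)
print(matrix)                    # [[1, 0, 0], [0, 1, 0], [0, 1, 1], [1, 0, 1]]
print(minimum_cover(matrix, 3))  # [0, 1]
```

In this example no single feature separates all four cross-class pairs, so the smallest feature set has two features. Weighted or fuzzy variants would need a different objective and are not covered by this sketch.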

    A transfer learning-based feature reduction method to improve classification accuracy

    The need for efficient use of data grows as machine learning algorithms are applied to datasets with larger feature sets. Feature selection is the process of selecting a minimum set of features that fully represents the learning problem. Transfer learning is useful in scenarios where we train a model on a common problem and use it to identify the important features needed to build a model for the target problem. In this thesis, we propose a transfer learning algorithm, combined with or without features suggested by experts, that learns from the source dataset and identifies the important feature sets needed to train models on the target dataset. We also compare this algorithm with classical machine learning algorithms, with and without the features recommended by the experts. A series of experiments shows that our method is adequate for finding minimum feature sets, which also outperform using only the features suggested by the experts. Furthermore, the experiments show that the transfer learning method with a reduced number of features performs better than, or almost the same as, using all the features of the dataset. We performed our experiments on a heart disease dataset, a readmission dataset, and a BMI dataset.

    Robust Branch-Cut-and-Price for the Capacitated Minimum Spanning Tree Problem over a Large Extended Formulation

    This paper presents a robust branch-cut-and-price algorithm for the Capacitated Minimum Spanning Tree Problem (CMST). The variables are associated to q-arbs, a structure that arises from a relaxation of the capacitated prize-collecting arborescence problem in order to make it solvable in pseudo-polynomial time. Traditional inequalities over the arc formulation, like Capacity Cuts, are also used. Moreover, a novel feature is introduced in such kind of algorithms. Powerful new cuts expressed over a very large set of variables could be added, without increasing the complexity of the pricing subproblem or the size of the LPs that are actually solved. Computational results on benchmark instances from the OR-Library show very significant improvements over previous algorithms. Several open instances could be solved to optimality.

    Polynomial-Time Fence Insertion for Structured Programs

    To enhance performance, common processors feature relaxed memory models that reorder instructions. However, the correctness of concurrent programs often depends on preserving the program order of certain instructions. Thus, instruction set architectures offer memory fences. Using fences is a subtle task with performance and correctness implications: using too few can compromise correctness and using too many can hinder performance. Thus, fence insertion algorithms that, given the required program orders, automatically find the optimum fencing can enhance the ease of programming, reliability, and performance of concurrent programs. In this paper, we consider the class of programs with structured branch and loop statements and present a greedy, polynomial-time optimum fence insertion algorithm. The algorithm incrementally reduces fence insertion for a control-flow graph to fence insertion for a set of paths. In addition, we show that the minimum fence insertion problem with multiple types of fence instructions is NP-hard even for straight-line programs.
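To make the problem concrete, the simplest special case can be sketched as follows: a straight-line program with a single fence type. The sketch assumes that a fence placed after instruction `p` preserves every required order `(i, j)` with `i <= p < j`. Under that assumption the problem becomes classical interval stabbing, which a greedy scan by right endpoint solves optimally. This is not the paper's algorithm for structured programs, and the name `min_fences` is hypothetical.

```python
def min_fences(orders):
    """Return a minimum list of fence positions for a straight-line program.

    orders: iterable of (i, j) pairs with i < j, meaning instruction i must
    stay ordered before instruction j. A fence at position p sits between
    instructions p and p + 1 and is assumed to preserve every pair with
    i <= p < j (a single, full fence type).
    """
    fences = []
    last = None  # position of the most recently placed fence
    for i, j in sorted(orders, key=lambda o: o[1]):
        if i >= j:
            raise ValueError("each order must satisfy i < j")
        # Pairs are visited by increasing j, so last < j always holds;
        # the pair is already preserved exactly when last >= i.
        if last is None or last < i:
            last = j - 1  # latest position that still separates i from j
            fences.append(last)
    return fences


print(min_fences([(0, 3), (2, 5), (4, 6), (1, 2)]))  # [1, 4]
```

Placing each fence as late as possible lets it serve the largest number of later pairs, which is why the greedy choice is optimal in this one-fence-type setting. The abstract's NP-hardness result concerns multiple fence types, which this sketch does not model.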

    Kernel Ellipsoidal Trimming

    Ellipsoid estimation is an issue of primary importance in many practical areas such as control, system identification, visual/audio tracking, experimental design, data mining, robust statistics and novelty/outlier detection. This paper presents a new method of kernel information matrix ellipsoid estimation (KIMEE) that finds an ellipsoid in a kernel-defined feature space based on a centered information matrix. Although the method is very general and can be applied to many of the aforementioned problems, the main focus in this paper is the problem of novelty or outlier detection associated with fault detection. A simple iterative algorithm based on Titterington's minimum volume ellipsoid method is proposed for practical implementation. The KIMEE method demonstrates very good performance on a set of real-life and simulated datasets compared with support vector machine methods.