
    Feature Selection Based on Sequential Orthogonal Search Strategy

    This thesis introduces three new feature selection methods based on a sequential orthogonal search strategy, each addressing a different context of the feature selection problem. The first method is a supervised feature selection method called maximum relevance–minimum multicollinearity (MRmMC), which can overcome some shortcomings of existing methods that apply the same form of feature selection criterion, especially those based on mutual information. In the proposed method, feature relevance is measured by correlation characteristics based on conditional variance, while redundancy is eliminated through multiple correlation assessment using an orthogonal projection scheme. The second method is an unsupervised feature selection method based on Locality Preserving Projection (LPP), which is incorporated in a sequential orthogonal search (SOS) strategy. The locality preserving criterion has proved a successful measure of feature importance in many feature selection methods, but most of them ignore feature correlation and therefore fail to account for redundant features. This problem motivated the second method, which evaluates feature importance jointly rather than individually. In this method, the first LPP component, which contains the information of the local largest structure (LLS), is used as a reference variable to guide the search for significant features. This method is referred to as sequential orthogonal search for local largest structure (SOS-LLS). The third method is also an unsupervised feature selection method with essentially the same SOS strategy, but it is specifically designed to be robust to noisy data. As limited work has been reported on feature selection in the presence of attribute noise, the third method attempts to address this gap by extending the second proposed method.
The third method is designed to deal with attribute noise in the search for significant features: kernel pre-images (KPI) based on kernel PCA replace the first LPP component as the reference variable used in the second method. This feature selection scheme is referred to as the sequential orthogonal search for kernel pre-images (SOS-KPI) method. The performance of the three feature selection methods is demonstrated through comprehensive analysis on public real datasets of different characteristics and comparative studies with a number of state-of-the-art methods. Results show that each of the proposed methods is able to select more efficient feature subsets than the other feature selection methods in the comparative studies.
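The general idea of a sequential orthogonal search guided by a reference variable can be illustrated with a minimal sketch. This is not the thesis's MRmMC, SOS-LLS, or SOS-KPI algorithm: the squared-correlation score, the function name `sequential_orthogonal_search`, and the tolerance `tol` are assumptions chosen for illustration. At each step the candidate most correlated with the reference is selected, and the remaining candidates are projected onto the orthogonal complement of the selected one, so a redundant feature ends up with a near-zero residual and is skipped.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def center(v):
    m = sum(v) / len(v)
    return [a - m for a in v]

def sequential_orthogonal_search(features, reference, k, tol=1e-12):
    """Greedy sketch: features maps a name to a list of values.

    Assumes the reference variable is not constant (hypothetical helper,
    not the thesis's exact criterion).
    """
    y = center(reference)
    yy = dot(y, y)
    residuals = {name: center(col) for name, col in features.items()}
    selected = []
    for _ in range(k):
        best_name, best_score = None, -1.0
        for name, w in residuals.items():
            ww = dot(w, w)
            if ww <= tol:
                continue  # numerically a linear combination of selected features
            score = dot(w, y) ** 2 / (ww * yy)  # squared correlation with the reference
            if score > best_score:
                best_name, best_score = name, score
        if best_name is None:
            break  # every remaining candidate is redundant
        q = residuals.pop(best_name)
        qq = dot(q, q)
        selected.append(best_name)
        # Remove the component along q from every remaining candidate.
        for name, w in residuals.items():
            c = dot(w, q) / qq
            residuals[name] = [a - c * b for a, b in zip(w, q)]
    return selected
```

In this sketch a feature that is an exact multiple of an already selected one is never chosen, which is the redundancy-handling behaviour the abstract attributes to evaluating features jointly rather than individually.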

    Adaptive sequential feature selection in visual perception and pattern recognition

    In the human visual system, one of the most prominent functions of the extensive feedback from higher brain areas within and outside the visual cortex is attentional modulation. The feedback helps the brain to concentrate its resources on visual features that are relevant for recognition, i.e. it iteratively selects certain aspects of the visual scene for refined processing by the lower areas until the inference process in the higher areas converges to a single hypothesis about the scene. To minimize the number of required selection-refinement iterations, one has to find a short sequence of maximally informative portions of the visual input. Since the feedback is not static, the selection process adapts to the scene to be recognized. To find a scene-specific subset of informative features, the adaptive selection process uses, at every iteration, the results of previous processing to reduce the remaining uncertainty about the visual scene. This phenomenon inspired us to develop a computational algorithm for visual classification that incorporates this principle of adaptive feature selection. It is especially interesting because feature selection methods are usually not adaptive: they define a single set of informative features for a task and use it to classify all objects. An adaptive algorithm, by contrast, selects the features that are most informative for the particular input. Thus, the selection process should be driven by statistics of the environment concerning the current task and the object to be classified. Applied to a classification task, our adaptive feature selection algorithm favors features that maximally reduce the current class uncertainty, which is iteratively updated with the values of the previously selected features observed on the test sample.
In information-theoretic terms, the selection criterion is the mutual information between the class variable and a candidate feature, conditioned on the already selected features taking the values observed on the current test sample. The main question investigated in this thesis is whether, and in which situations, the proposed adaptive way of selecting features is advantageous over conventional feature selection. Further, we studied whether the proposed adaptive information-theoretic selection scheme, which is a computationally complex algorithm, is used by humans while they perform a visual classification task. For this, we constructed a psychophysical experiment in which people had to select the image parts that they considered relevant for classifying the images. We present an analysis of the behavioral data in which we investigate whether human strategies of task-dependent selective attention can be explained by a simple ranker based on mutual information, by a more complex feature selection algorithm based on conventional static mutual information, or by the adaptive feature selector proposed here, which mimics a mechanism of iterative hypothesis refinement. The main contribution of this work is the adaptive feature selection criterion based on conditional mutual information. It is also shown that such an adaptive selection strategy is indeed used by people while performing visual classification. Contents: 1. Introduction 2. Conventional feature selection 3. Adaptive feature selection 4. Experimental investigations of ACMIFS 5. Information-theoretical strategies of selective attention 6. Discussion Appendix Bibliography
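The selection criterion described above can be sketched for discrete features as follows. This is an illustrative reading, not the thesis's ACMIFS implementation: the function names, the plain empirical counts without smoothing, and the tie-breaking by lowest feature index are all assumptions. Conditioning on the observed values is done by keeping only the training samples that agree with the test sample on the features selected so far, then computing the empirical mutual information between class and each remaining feature on that subset.

```python
import math
from collections import Counter

def mutual_information(pairs):
    """Empirical mutual information (bits) of (feature_value, label) pairs."""
    n = len(pairs)
    joint = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    return sum(c / n * math.log2(c * n / (px[x] * py[y]))
               for (x, y), c in joint.items())

def adaptive_select(train_X, train_y, sample, n_steps):
    """Return the order in which features would be queried for `sample`."""
    subset = list(range(len(train_X)))   # training rows consistent with observations
    remaining = set(range(len(sample)))  # feature indices not yet selected
    order = []
    for _ in range(n_steps):
        if not remaining or not subset:
            break
        scores = {
            j: mutual_information([(train_X[i][j], train_y[i]) for i in subset])
            for j in remaining
        }
        best = max(sorted(scores), key=scores.get)  # ties go to the lowest index
        order.append(best)
        remaining.remove(best)
        # Condition on the value actually observed on the test sample.
        subset = [i for i in subset if train_X[i][best] == sample[best]]
    return order
```

With a toy training set in which the second most useful feature depends on the value of the first, two different test samples yield two different selection orders, which is the adaptivity the abstract contrasts with static feature selection.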

    Optimisation based approaches for machine learning

    Machine learning has attracted a lot of attention in recent years and has become an integral part of many commercial and research projects, with a wide range of applications. With current developments in technology, more data is generated and stored than ever before. Identifying patterns, trends and anomalies in these datasets and summarising them with simple quantitative models is a vital task. This thesis focuses on the development of machine learning algorithms based on mathematical programming for datasets that are relatively small in size. The first topic of this doctoral thesis is piecewise regression, where a dataset is partitioned into multiple regions and a regression model is fitted to each one. This work takes an existing algorithm from the literature and extends its mathematical formulation to include information criteria. Including such criteria aims to address overfitting, a common problem in supervised learning tasks, by finding a balance between predictive performance and model complexity. The improvement in overall performance is demonstrated by testing and comparing the proposed method with various algorithms from the literature on various regression datasets. Extending the topic of regression, a decision tree regressor is also proposed. Decision trees are powerful and easy-to-understand structures that can be used for both regression and classification. In this work, an optimisation model is used for the binary splitting of nodes. A statistical test is introduced to check whether the partitioning of nodes is statistically meaningful and thereby control the tree generation process. Additionally, a novel mathematical formulation is proposed to perform feature selection and ultimately identify the appropriate variable for the splitting of nodes.
The performance of the proposed algorithm is again compared with a number of algorithms from the literature, and it is shown that the variable selection model reduces the training time of the algorithm without major sacrifices in performance. Lastly, a novel decision tree classifier is proposed. This algorithm is based on a mathematical formulation that identifies the optimal splitting variable and break value, applies a linear transformation to the data and then assigns the samples to a class while minimising the number of misclassified samples. The linear transformation step reduces the dimensionality of the examined dataset to a single variable, aiding the classification accuracy of the algorithm on more complex datasets. Popular classifiers from the literature have been used to compare the accuracy of the proposed algorithm on both synthetic and publicly available classification datasets.

    Quantum-Inspired Particle Swarm Optimization for Feature Selection and Parameter Optimization in Evolving Spiking Neural Networks for Classification Tasks

    Introduction: Particle Swarm Optimization (PSO) was introduced in 1995 by Russell Eberhart and James Kennedy (Eberhart & Kennedy, 1995). PSO is a biologically-inspired technique based on the study of collective behaviour in decentralized and self-organized animal society systems. The systems are typically made up of a population of candidates (particles) interacting with one another within their environment (swarm) to solve a given problem. Because of its efficiency and simplicity, PSO has been successfully applied as an optimizer in many applications such as function optimization, artificial neural network training, and fuzzy system control. However, despite recent research and development, there is still room to find the most effective methods for parameter optimization and feature selection tasks. This chapter deals with the problem of feature (variable) and parameter optimization for neural network models, utilising a proposed Quantum-inspired PSO (QiPSO) method. In this method the features of the model are represented probabilistically as a quantum bit (qubit) vector and the model parameter values as real numbers. The principles of quantum superposition and quantum probability are used to accelerate the search for an optimal set of features which, combined through co-evolution with a set of optimised parameter values, will result in a more accurate computational neural network model. The method has been applied to the problem of feature and parameter optimization in Evolving Spiking Neural Network (ESNN) for classification. A swarm of particles is used to find the most accurate classification model for a given classification task. The QiPSO is integrated within ESNN, where features and parameters are optimized simultaneously and more efficiently. A hybrid particle structure is required for the qubit and real number data types.
In addition, an improved search strategy has been introduced to find the most relevant features and eliminate the irrelevant ones on a synthetic dataset. The method is tested on a benchmark classification problem. The proposed method results in the design of faster and more accurate neural network classification models than those optimised using standard evolutionary optimization algorithms. This chapter is organized as follows. Section 2 introduces PSO with quantum information principles and an improved feature search strategy used later in the developed method. Section 3 is an overview of ESNN, while Section 4 gives details of the integrated structure and the experimental results. Finally, Section 5 concludes this chapter.
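A probabilistic qubit-style feature representation can be sketched roughly as follows. The chapter's actual QiPSO update is not given in the abstract, so everything here is an assumption made for illustration: each feature is stored as an angle `theta`, its selection probability is taken as sin²(theta), a binary mask is obtained by sampling ("observing" the qubits), and a simple rotation step nudges the angles towards the best mask found so far. The function names and the step size `delta` are hypothetical.

```python
import math
import random

def observe(thetas, rng):
    """Collapse each qubit angle into a 0/1 feature flag.

    Feature i is selected with probability sin(theta_i)^2.
    """
    return [1 if rng.random() < math.sin(t) ** 2 else 0 for t in thetas]

def rotate_towards(thetas, best_mask, delta=0.05 * math.pi):
    """Rotate each angle towards the best mask, clamped to [0, pi/2]."""
    out = []
    for t, b in zip(thetas, best_mask):
        t = t + delta if b else t - delta
        out.append(min(max(t, 0.0), math.pi / 2))
    return out
```

Starting all angles at π/4 gives every feature a selection probability of 0.5. Repeated rotation towards a good mask concentrates the probability mass on it, while intermediate angles keep some exploration.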

    Bandwidth selection for kernel estimation in mixed multi-dimensional spaces

    Kernel estimation techniques, such as mean shift, suffer from one major drawback: the kernel bandwidth selection. The bandwidth can be fixed for the whole data set or can vary at each point. Automatic bandwidth selection becomes a real challenge in the case of multidimensional heterogeneous features. This paper presents a solution to this problem. It is an extension of \cite{Comaniciu03a}, which was based on the fundamental property of normal distributions regarding the bias of the normalized density gradient. The selection is done iteratively for each type of feature, by looking for the stability of local bandwidth estimates across a predefined range of bandwidths. A pseudo balloon mean shift filtering and partitioning are introduced. The validity of the method is demonstrated in the context of color image segmentation based on a 5-dimensional space.
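To see why the bandwidth matters, here is a minimal one-dimensional mean shift sketch with a fixed Gaussian bandwidth. It does not implement the paper's bandwidth selection or its pseudo balloon variant. The function name, the Gaussian kernel, and the stopping tolerance are assumptions for illustration.

```python
import math

def mean_shift_mode(points, start, bandwidth, tol=1e-9, max_iter=1000):
    """Follow the mean shift iterations from `start` to a density mode.

    Assumes `start` is close enough to the data that the kernel weights
    do not all underflow to zero.
    """
    x = start
    for _ in range(max_iter):
        weights = [math.exp(-0.5 * ((p - x) / bandwidth) ** 2) for p in points]
        new_x = sum(w * p for w, p in zip(weights, points)) / sum(weights)
        if abs(new_x - x) < tol:
            return new_x
        x = new_x
    return x
```

For two clusters centred near 0 and 10, a bandwidth of 1 yields two separate modes, whereas a bandwidth of 10 merges them into a single mode near 5. The same data therefore partitions differently depending on a parameter that has to be chosen somehow, which is the problem the paper addresses.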

    Feature and Variable Selection in Classification

    The amount of information in the form of features and variables available to machine learning algorithms is ever increasing. This can lead to classifiers that are prone to overfitting in high dimensions; high-dimensional models do not lend themselves to interpretable results, and the CPU and memory resources necessary to run on high-dimensional datasets severely limit the applications of the approaches. Variable and feature selection aim to remedy this by finding a subset of features that in some way best captures the information provided. In this paper we present the general methodology and highlight some specific approaches. Comment: Part of master seminar in document analysis held by Marcus Eichenberger-Liwick

    Massively-Parallel Feature Selection for Big Data

    We present the Parallel, Forward-Backward with Pruning (PFBP) algorithm for feature selection (FS) in Big Data settings (high dimensionality and/or sample size). To tackle the challenges of Big Data FS, PFBP partitions the data matrix both in terms of rows (samples, training examples) and columns (features). By employing the concepts of p-values of conditional independence tests and meta-analysis techniques, PFBP manages to rely only on computations local to a partition while minimizing communication costs. Then, it employs powerful and safe (asymptotically sound) heuristics to make early, approximate decisions, such as Early Dropping of features from consideration in subsequent iterations, Early Stopping of consideration of features within the same iteration, or Early Return of the winner in each iteration. PFBP provides asymptotic guarantees of optimality for data distributions faithfully representable by a causal network (Bayesian network or maximal ancestral graph). Our empirical analysis confirms a super-linear speedup of the algorithm with increasing sample size and linear scalability with respect to the number of features and processing cores, while dominating other competitive algorithms in its class.
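One way local p-values can be merged without moving the underlying data is Fisher's combined probability test, a standard meta-analysis tool. The abstract does not say which combination rule PFBP uses, so Fisher's method here is an assumption, as is the function name. The sketch takes one p-value per data partition for the same independence test and returns a single combined p-value, using the closed form of the chi-square survival function for an even number of degrees of freedom.

```python
import math

def fisher_combine(p_values):
    """Combine independent p-values (each in (0, 1]) with Fisher's method.

    The statistic -2 * sum(log p_i) follows a chi-square distribution with
    2k degrees of freedom under the null, whose survival function is
    exp(-x/2) * sum_{i<k} (x/2)^i / i!.
    """
    k = len(p_values)
    half = -sum(math.log(p) for p in p_values)  # equals statistic / 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return math.exp(-half) * total
```

Only the k local p-values need to be communicated, not the rows or columns of each partition, which matches the abstract's emphasis on computations local to a partition.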

    Computer-Aided Detection of Pathologically Enlarged Lymph Nodes On Non-Contrast CT In Cervical Cancer Patients For Low-Resource Settings

    Cervical cancer causes approximately 266,000 deaths each year, and 70% of the burden occurs in Low- and Middle-Income Countries (LMICs). Radiation therapy is the primary modality for treatment of locally advanced cervical cancer cases. In the absence of the high quality diagnostic imaging needed to identify nodal metastasis, many LMIC sites treat standard pelvic fields, failing to include node metastasis outside of the field and/or to boost lymph nodes in the abdomen and pelvis. The first goal of this project was to create a program which automatically identifies positive cervical cancer lymph nodes on non-contrast daily CT images, which are widely available in LMICs(1). A region of interest which is likely to contain the nodal volumes relevant for cervical cancer was defined on a single patient CT(2). This region was deformed onto new patients using in-house, demons-based deformation software. Edge detection and erosion filtering were used to distinguish potential positive nodes from normal structures. Regions on adjacent slices were then connected into a potential nodal 3D structure. To differentiate these 3D structures from normal tissues, eighty-six features were generated based on the shape and mean pixel values of the structures, and four classification ensemble methods were tested. A cohort of fifty-eight MD Anderson cervical cancer patients with pathologically enlarged lymph nodes was used as a training-test set. Similarly, twenty MD Anderson cervical cancer patients were used as a validation set. The two sets contained 154 and 35 pathologically enlarged lymph nodes, respectively. Model comparison led to the selection of the AdaBoost ensemble model, utilizing 17 features. In the validation set, 60% of the clinically significant positive cervical cancer nodes were identified, with a false-to-true positive ratio of ~4:1.
The entire process takes approximately 10 minutes divided by the number of cores. Our findings demonstrate that our computer-aided detection model can assist in the identification of metastatic nodal disease where high quality diagnostic imaging is not readily available. By identifying these nodes, radiation treatment fields can be modified to include pathologically enlarged lymph nodes, which is essential to providing potentially curative radiotherapy for cervical cancer.