6 research outputs found

    Adaptive feature selection based on the most informative graph-based features

    In this paper, we propose a novel method for adaptively selecting the most informative and least redundant feature subset, one with strong discriminating power with respect to the target label. Unlike most traditional methods, which use vectorial features, our approach is based on graph-based features and thus incorporates the relationships between feature samples into the selection process. To efficiently encapsulate the main characteristics of the graph-based features, we probe each graph structure using a steady-state random walk and compute the probability distribution of the walk visiting the vertices. Furthermore, we propose a new information-theoretic criterion that measures the joint relevance of different pairwise feature combinations with respect to the target feature, through the Jensen-Shannon divergence between the probability distributions obtained from the random walks on different graphs. By solving a quadratic programming problem, we use the new measure to automatically locate the subset of the most informative features, which has both low redundancy and strong discriminating power. Unlike most existing state-of-the-art feature selection methods, the proposed information-theoretic method can accommodate both continuous and discrete target features. Experiments on data from P2P lending platforms in China demonstrate the effectiveness of the proposed method.
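
    A minimal sketch of the pipeline above, assuming NumPy and SciPy: the stationary distribution of a random walk on a connected undirected graph is proportional to vertex degree, the Jensen-Shannon divergence compares the resulting distributions, and a QPFS-style quadratic program trades relevance against redundancy. The function names and the exact relevance/redundancy terms are illustrative, not the paper's.

        import numpy as np
        from scipy.optimize import minimize

        def steady_state_distribution(adj):
            """Stationary distribution of a random walk on a connected,
            undirected graph: proportional to vertex degree."""
            deg = adj.sum(axis=1)
            return deg / deg.sum()

        def jsd(p, q, eps=1e-12):
            """Jensen-Shannon divergence between two discrete distributions
            (base-2 logs, so the value lies in [0, 1])."""
            p = np.asarray(p, float) + eps
            q = np.asarray(q, float) + eps
            p, q = p / p.sum(), q / q.sum()
            m = 0.5 * (p + q)
            kl = lambda a, b: float(np.sum(a * np.log2(a / b)))
            return 0.5 * kl(p, m) + 0.5 * kl(q, m)

        def rank_features_qp(relevance, redundancy, alpha=0.5):
            """QPFS-style relaxation (illustrative, not the paper's exact QP):
            min_x (1 - alpha) * x'Qx - alpha * f'x  s.t.  x >= 0, sum(x) = 1,
            where f holds per-feature relevance and Q pairwise redundancy."""
            n = len(relevance)
            obj = lambda x: (1 - alpha) * x @ redundancy @ x - alpha * relevance @ x
            res = minimize(obj, np.full(n, 1.0 / n), method='SLSQP',
                           bounds=[(0.0, 1.0)] * n,
                           constraints=({'type': 'eq', 'fun': lambda x: x.sum() - 1.0},))
            return np.argsort(res.x)[::-1]  # feature indices, most informative first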

    Minimising Entropy Changes in Dynamic Network Evolution


    Graphs from features: tree-based graph layout for feature analysis

    Feature analysis has become a critical task in data analysis and visualization. Graph structures are flexible representations that can encode important information about features, but producing layouts adequate for analysis tasks is challenging. In this study, we propose and develop similarity-based graph layouts for locating relevant patterns in sets of features, thus supporting feature analysis and selection. In the first step of the strategy, we apply a tree layout to accomplish node placement and overview based on feature similarity. By drawing the remainder of the graph edges on demand, further groupings and relationships among features are revealed. We evaluate these groups and relationships in terms of their effectiveness in exploring feature sets for data analysis. Correlation of features with a target categorical attribute and feature ranking are added to support the task. Multidimensional projections are employed to plot the dataset based on selected attributes, revealing the effectiveness of the feature set. Our results show that the tree-graph layout framework supports a number of observations that are important in user-centric feature selection and hard to obtain with other available tools: it provides a way of finding relevant and irrelevant features, spurious sets of noisy features, groups of similar features, and opposite features, all of which are essential in different scenarios of data analysis. Case studies in application areas centered on documents, images, and sound data demonstrate the ability of the framework to quickly reach a satisfactory compact representation from a larger feature set.
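
    A small sketch of the first step, under the assumption that feature similarity is measured by absolute correlation: a minimum spanning tree over correlation distances gives a tree backbone for node placement, and the remaining graph edges can be revealed on demand by thresholding the same distances. The paper's own tree construction and layout algorithm may differ.

        import numpy as np
        from scipy.sparse.csgraph import minimum_spanning_tree

        def feature_tree_backbone(X):
            """Tree over features: nodes are the columns of X, edge weights are
            correlation distances, and the MST keeps each feature attached to
            its most similar neighbours for placement and overview."""
            corr = np.corrcoef(X, rowvar=False)   # feature-feature correlation
            dist = 1.0 - np.abs(corr)             # similar features -> short edges
            np.fill_diagonal(dist, 0.0)
            return minimum_spanning_tree(dist).toarray(), dist

        def on_demand_edges(dist, threshold=0.2):
            """Remaining graph edges, revealed on demand: pairs of features
            whose correlation distance falls below the chosen threshold."""
            i, j = np.where(np.triu(dist < threshold, k=1))
            return list(zip(i.tolist(), j.tolist()))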

    Feature Selection from Higher Order Correlations

    This thesis addresses problems in feature selection, focusing in particular on selecting features from higher order correlations. To this end, we present two supervised feature selection approaches, named Graph-based Information-theoretic Feature Selection and Hypergraph-based Information-theoretic Feature Selection, which are capable of considering third or even higher order dependencies between the relevant features and of capturing the optimal size of the relevant feature subset. Furthermore, we develop two unsupervised feature selection methods which evaluate features jointly rather than individually, so that larger feature combinations are considered. The reason for this is that although an individual feature may have limited relevance to a particular class, when taken in combination with other features it may be strongly relevant to that class.

    In Chapter 2, we thoroughly review the literature on classifier-independent (filter-based) feature selection methods. One dominant direction of research in this area is exemplified by the so-called information-theoretic feature selection criteria, which measure the mutual dependence between variables. Another influential direction is the graph-based feature selection methods, which select the features that best preserve the data similarity or a manifold structure derived from the entire feature set. We observe that most existing feature selection methods evaluate features individually, or at best consider pairwise feature interactions, and hence cannot handle redundant features. Another shortcoming of existing methods is that most of them select features greedily and do not provide a direct measure for judging whether to add further features; instead, they require the user to supply the number of selected features in advance. In real applications, however, it is hard to estimate the number of useful features before the selection process. This thesis addresses these weaknesses and fills a gap in the literature on selecting features from higher order correlations.

    In Chapter 3 we propose a graph-based information-theoretic approach to feature selection with three novel ingredients. First, by using mutual information (MI) as a pairwise feature similarity measure, we establish a feature graph framework that characterizes the informativeness between pairs of features. Secondly, we locate the relevant feature subset (RFS) on the feature graph by maximizing the features' average pairwise relevance; the RFS is expected to have little redundancy and strong discriminating power. This strategy reduces the search space from the original feature set to the smaller relevant feature subset, and thus enables efficient computation. Finally, based on the RFS, we evaluate the importance of the unselected features using a new information-theoretic criterion referred to as multidimensional interaction information (MII). The advantage of MII is that it can go beyond pairwise interactions and consider third or higher order feature interactions; as a result, we can evaluate features jointly and avoid the redundancies that arise when features are combined individually. Experimental results demonstrate the effectiveness of our feature selection method on a number of standard datasets.
    In Chapter 4, we find that in some situations the graph representation of relational patterns leads to a substantial loss of information, because in real-world problems objects and their features tend to exhibit multiple relationships rather than simple pairwise ones. This motivates us to establish a feature hypergraph (rather than a feature graph) to characterize the multiple relationships among features. We draw on recent work on hypergraph clustering to select the most informative feature subset (mIFS) from a set of objects using high-order (rather than pairwise) similarities. There are two novel ingredients. First, we use MII to measure the significance of different feature combinations with respect to the class labels. Secondly, we use hypergraph clustering to extract the mIFS, which has both low redundancy and strong discriminating power, and whose size is determined automatically. The advantage of MII is that it incorporates third or higher order feature interactions. Experimental results demonstrate the effectiveness of our feature selection method on a number of standard datasets.

    In addition to the supervised feature selection methods, we present two novel unsupervised feature selection methods in Chapters 5 and 6. Specifically, in Chapter 5 we propose a new two-step spectral regression technique for unsupervised feature selection. In the first step, we use kernel entropy component analysis (kECA) to transform the data into a lower-dimensional space so as to improve class separation. In the second step, we use ℓ1-norm regularization to select the features that best align with the data embedding resulting from kECA. The advantage of kECA is that the dimensionality-reducing transformation maximally preserves entropy estimates for the input data whilst also preserving its cluster structure. Using ℓ1-norm regularization, we cast feature discriminant analysis into a regression framework that accommodates the correlations among features; as a result, we can evaluate joint feature combinations rather than being confined to considering features individually. Experimental results demonstrate the effectiveness of our feature selection method on a number of standard face datasets.

    In Chapter 6, by using MII as a higher order similarity measure, we establish a novel hypergraph framework for characterizing the multiple relationships within a set of samples (e.g. face samples under varying illumination conditions), so that the structural information latent in the data can be modeled more effectively. We then explore a strategy for selecting the discriminating feature subset on the basis of the hypergraph representation; the strategy is an unsupervised method that derives a hypergraph-embedding view of feature selection. We evaluate the strategy on a number of standard image datasets, and the results demonstrate the effectiveness of our feature selection method.

    We summarize the contributions of this thesis in Chapter 7 and analyze the developed methods. Finally, we offer suggestions for future work in feature selection.
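
    As a rough illustration of the Chapter 3 ingredients, assuming discretised features and scikit-learn's mutual_info_score: the first function builds the MI-weighted feature graph, and the second approximates the RFS step greedily by average pairwise relevance. The MII criterion for third and higher order interactions is not reproduced here.

        import numpy as np
        from sklearn.metrics import mutual_info_score

        def mi_feature_graph(X):
            """Feature graph: nodes are features, edge (i, j) weighted by the
            mutual information between discretised columns i and j of X."""
            n = X.shape[1]
            W = np.zeros((n, n))
            for i in range(n):
                for j in range(i + 1, n):
                    W[i, j] = W[j, i] = mutual_info_score(X[:, i], X[:, j])
            return W

        def relevant_feature_subset(W, k):
            """Greedy stand-in for locating the RFS: keep the k features with
            the highest average pairwise relevance on the feature graph."""
            avg_rel = W.sum(axis=1) / (W.shape[0] - 1)
            return np.argsort(avg_rel)[::-1][:k]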
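
    To make the Chapter 5 two-step concrete, a hedged sketch assuming an RBF kernel and scikit-learn's Lasso as the ℓ1-regularized regressor: kECA keeps the kernel eigenpairs with the largest entropy contribution (the eigenvalue times the squared sum of the eigenvector's entries) rather than simply the largest eigenvalues, and the second step scores each feature by its Lasso coefficients against the embedding.

        import numpy as np
        from sklearn.linear_model import Lasso
        from sklearn.metrics.pairwise import rbf_kernel

        def keca_embedding(X, n_components=2, gamma=1.0):
            """Kernel entropy component analysis (sketch): rank kernel
            eigenpairs by their contribution lambda_i * (1' e_i)^2 to the
            Renyi entropy estimate, and keep the top contributors."""
            K = rbf_kernel(X, gamma=gamma)
            lam, E = np.linalg.eigh(K)            # ascending eigenvalues
            contrib = lam * (E.sum(axis=0) ** 2)  # entropy contribution
            idx = np.argsort(contrib)[::-1][:n_components]
            return E[:, idx] * np.sqrt(np.clip(lam[idx], 0.0, None))

        def l1_feature_scores(X, embedding, alpha=0.01):
            """Step two: Lasso-regress each embedding dimension on the
            original features; score each feature by its largest absolute
            coefficient (features with all-zero coefficients are dropped)."""
            scores = np.zeros(X.shape[1])
            for j in range(embedding.shape[1]):
                coef = Lasso(alpha=alpha).fit(X, embedding[:, j]).coef_
                scores = np.maximum(scores, np.abs(coef))
            return scores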