
    A Graph-Based Semi-Supervised k Nearest-Neighbor Method for Nonlinear Manifold Distributed Data Classification

    k Nearest Neighbors (kNN) is one of the most widely used supervised learning algorithms for classifying Gaussian-distributed data, but it does not achieve good results when applied to nonlinear manifold-distributed data, especially when only a very limited amount of labeled samples is available. In this paper, we propose a new graph-based kNN algorithm which can effectively handle both Gaussian-distributed and nonlinear manifold-distributed data. To achieve this goal, we first propose a constrained Tired Random Walk (TRW) by constructing an R-level nearest-neighbor strengthened tree over the graph, and then compute a TRW matrix for similarity measurement purposes. After this, the nearest neighbors are identified according to the TRW matrix, and the class label of a query point is determined by the sum of the TRW weights of its nearest neighbors. To deal with online situations, we also propose a new algorithm to handle sequential samples based on local neighborhood reconstruction. Comparison experiments are conducted on both synthetic and real-world data sets to demonstrate the validity of the proposed kNN algorithm and its improvements over other versions of the kNN algorithm. Given the widespread appearance of manifold structures in real-world problems and the popularity of the traditional kNN algorithm, the proposed manifold version of kNN shows promising potential for classifying manifold-distributed data.
    Comment: 32 pages, 12 figures, 7 tables
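    The classification rule lends itself to a compact numerical sketch. The Python snippet below is not the authors' implementation: it assumes a plain RBF affinity graph and a tiring factor alpha, and omits the R-level nearest-neighbor strengthened tree; the TRW matrix then falls out of the geometric series of an attenuated random walk.

```python
import numpy as np

def rbf_affinity(X, sigma=1.0):
    # Pairwise Gaussian (RBF) affinities used as graph edge weights.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)  # no self-loops
    return W

def tired_random_walk_matrix(W, alpha=0.9):
    # A walker that "tires" by a factor alpha at each step yields the
    # geometric series sum_t (alpha * P)^t = (I - alpha * P)^(-1),
    # with P the row-normalized transition matrix of the graph.
    P = W / W.sum(axis=1, keepdims=True)
    return np.linalg.inv(np.eye(len(P)) - alpha * P)

def trw_knn_predict(T, y, labeled, query, k=5):
    # Take the k labeled points with the largest TRW similarity to the
    # query; each neighbor votes for its class with its TRW weight.
    sims = T[query, labeled]
    top = np.argsort(sims)[::-1][:k]
    votes = {}
    for i in top:
        c = y[labeled[i]]
        votes[c] = votes.get(c, 0.0) + sims[i]
    return max(votes, key=votes.get)
```

    Here T = tired_random_walk_matrix(rbf_affinity(X)) is computed once over all labeled and unlabeled points together, which is where the semi-supervised benefit enters.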

    Hierarchical Multi-resolution Mesh Networks for Brain Decoding

    We propose a new framework, called Hierarchical Multi-resolution Mesh Networks (HMMNs), which establishes a set of brain networks at multiple time resolutions of the fMRI signal to represent the underlying cognitive process. The suggested framework first decomposes the fMRI signal into various frequency subbands using wavelet transforms. Then, a brain network, called a mesh network, is formed at each subband by ensembling a set of local meshes. The locality around each anatomic region is defined with respect to a neighborhood system based on functional connectivity. The arc weights of a mesh are estimated by ridge regression formed among the average region time series. In the final step, the adjacency matrices of the mesh networks obtained at different subbands are ensembled for brain decoding under a hierarchical learning architecture, called fuzzy stacked generalization (FSG). Our results on the Human Connectome Project task-fMRI dataset show that the suggested HMMN model can successfully discriminate tasks by extracting complementary information from the mesh arc weights of multiple subbands. We study the topological properties of the mesh networks at different resolutions using the network measures node degree, node strength, betweenness centrality, and global efficiency, and investigate the connectivity of anatomic regions during a cognitive task. We observe significant variations among the network topologies obtained for different subbands. We also analyze the diversity properties of the classifier ensemble trained on the mesh networks in multiple subbands and observe that the classifiers in the ensemble collaborate with each other to fuse the complementary information extracted at each subband. We conclude that the fMRI data recorded during a cognitive task embed diverse information across the anatomic regions at each resolution.
    Comment: 18 pages
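    As a rough sketch of the mesh-construction step only (assuming PyWavelets and scikit-learn as tooling; the FSG stacking stage is omitted and the functional-connectivity neighborhoods are approximated here by plain correlation of the region time series), one arc-weight matrix per wavelet subband might be formed as follows.

```python
import numpy as np
import pywt
from sklearn.linear_model import Ridge

def mesh_arc_weights(ts, n_neighbors=5, wavelet='db4', level=2, alpha=1.0):
    """ts: (time, regions) array of average ROI time series.
    Returns one (regions, n_neighbors) arc-weight matrix per subband."""
    n_regions = ts.shape[1]
    # Wavelet decomposition of each regional signal into subbands.
    coeffs = [pywt.wavedec(ts[:, r], wavelet, level=level)
              for r in range(n_regions)]
    subbands = [np.column_stack([coeffs[r][s] for r in range(n_regions)])
                for s in range(level + 1)]
    # Correlation as a stand-in for the functional-connectivity
    # neighborhood system of the paper.
    corr = np.corrcoef(ts.T)
    weights = []
    for band in subbands:
        W = np.zeros((n_regions, n_neighbors))
        for r in range(n_regions):
            nbrs = np.argsort(-np.abs(corr[r]))[1:n_neighbors + 1]
            # Ridge regression of region r on its neighbors in this subband;
            # the coefficients are the arc weights of r's local mesh.
            W[r] = Ridge(alpha=alpha).fit(band[:, nbrs], band[:, r]).coef_
        weights.append(W)
    return weights
```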

    Deep Over-sampling Framework for Classifying Imbalanced Data

    Class imbalance is a challenging issue in practical classification problems for deep learning models as well as traditional models. Traditionally successful countermeasures such as synthetic over-sampling have had limited success with the complex, structured data handled by deep learning models. In this paper, we propose Deep Over-sampling (DOS), a framework for extending the synthetic over-sampling method to exploit the deep feature space acquired by a convolutional neural network (CNN). Its key feature is an explicit, supervised representation learning in which each raw training sample is paired with a synthetic embedding target in the deep feature space, sampled from the linear subspace of its in-class neighbors. We implement an iterative process of training the CNN and updating the targets, which induces smaller in-class variance among the embeddings and increases the discriminative power of the deep representation. We present an empirical study using public benchmarks, which shows that the DOS framework not only counteracts class imbalance better than existing methods, but also improves the performance of the CNN in standard, balanced settings.
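    The target-sampling step can be illustrated in isolation. This is a minimal sketch, not the paper's code: given deep features Z already extracted by the CNN, each minority sample receives a synthetic embedding target drawn from the convex hull of its k in-class nearest neighbors; the CNN training loop that alternates with this step is omitted.

```python
import numpy as np

def synthetic_targets(Z, y, minority_class, k=5, rng=None):
    """Z: (n, d) deep features from the CNN; returns the minority indices
    and one synthetic target per minority sample, drawn from the convex
    span of its k in-class nearest neighbors."""
    rng = np.random.default_rng(rng)
    idx = np.where(y == minority_class)[0]
    Zc = Z[idx]
    # Pairwise squared distances within the minority class.
    d = ((Zc[:, None] - Zc[None, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d, np.inf)  # exclude self from the neighborhood
    targets = np.empty_like(Zc)
    for i in range(len(Zc)):
        nbrs = np.argsort(d[i])[:k]
        w = rng.dirichlet(np.ones(k))   # random convex weights
        targets[i] = w @ Zc[nbrs]       # synthetic target embedding
    return idx, targets
```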

    Reliable ABC model choice via random forests

    Approximate Bayesian computation (ABC) methods provide an elaborate approach to Bayesian inference on complex models, including model choice. Both theoretical arguments and simulation experiments indicate, however, that model posterior probabilities may be poorly evaluated by standard ABC techniques. We propose a novel approach based on a machine learning tool named random forests to conduct selection among the highly complex models covered by ABC algorithms. We thus modify the way Bayesian model selection is both understood and operated, in that we rephrase the inferential goal as a classification problem, first predicting the model that best fits the data with random forests and postponing the approximation of the posterior probability of the predicted MAP model to a second stage, also relying on random forests. Compared with earlier implementations of ABC model choice, the ABC random forest approach offers several potential improvements: (i) it often has greater discriminative power among the competing models, (ii) it is more robust to the number and choice of statistics summarizing the data, (iii) the computing effort is drastically reduced (with a gain in computational efficiency of at least a factor of fifty), and (iv) it includes an approximation of the posterior probability of the selected model. The use of random forests should extend the range of dataset sizes and model complexities that ABC can handle. We illustrate the power of this novel methodology by analyzing controlled experiments as well as genuine population genetics datasets. The proposed methodologies are implemented in the R package abcrf, available on CRAN.
    Comment: 39 pages, 15 figures, 6 tables
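    A simplified rendering of the two-stage scheme with scikit-learn (the abcrf R package implements the real method; function and variable names here are illustrative): a random-forest classifier picks the MAP model from a reference table of simulated summary statistics, and a second forest regresses the out-of-bag misclassification indicator on the statistics to approximate the posterior probability of the selected model.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

def abc_rf_model_choice(summaries, model_idx, observed, n_trees=500, seed=0):
    """summaries: (n_sim, n_stats) statistics of the simulated reference
    table; model_idx: (n_sim,) index of the model that generated each row;
    observed: (n_stats,) statistics of the observed data."""
    # Stage one: classify the generating model from the summary statistics.
    clf = RandomForestClassifier(n_estimators=n_trees, oob_score=True,
                                 random_state=seed)
    clf.fit(summaries, model_idx)
    map_model = clf.predict(observed.reshape(1, -1))[0]
    # Stage two: regress the out-of-bag error indicator on the statistics;
    # one minus the prediction at the observed data approximates the
    # posterior probability of the selected model.
    oob_pred = clf.classes_[np.argmax(clf.oob_decision_function_, axis=1)]
    wrong = (oob_pred != model_idx).astype(float)
    reg = RandomForestRegressor(n_estimators=n_trees, random_state=seed)
    reg.fit(summaries, wrong)
    post_prob = 1.0 - reg.predict(observed.reshape(1, -1))[0]
    return map_model, post_prob
```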

    k-nearest neighbors prediction and classification for spatial data

    We propose a nonparametric predictor and a supervised classification method based on the regression-function estimate of a spatial real-valued variable, using the k-nearest neighbors (k-NN) method. Under some assumptions, we establish the almost complete (or sure) convergence of the proposed estimates, which incorporate a spatial proximity between observations. Numerical results on simulated data and real fish data illustrate the behavior of the proposed predictor and classification method.
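    One plausible reading of "incorporating a spatial proximity" is sketched below under assumptions (the paper's exact construction may differ): the neighbor ranking blends the covariate distance with the distance between sampling sites.

```python
import numpy as np

def spatial_knn_predict(X, S, y, x0, s0, k=5, lam=0.5):
    # X: (n, p) covariates, S: (n, 2) site coordinates, y: (n,) responses;
    # x0, s0: covariates and coordinates of the point to predict.
    # Neighbors are ranked by a convex blend of covariate distance and
    # spatial distance; lam weighs the spatial term.
    d_feat = np.linalg.norm(X - x0, axis=1)
    d_spat = np.linalg.norm(S - s0, axis=1)
    d = (1.0 - lam) * d_feat + lam * d_spat
    nbrs = np.argsort(d)[:k]
    # Mean of neighbor responses gives the prediction; a majority vote
    # over neighbor labels gives the corresponding classifier.
    return y[nbrs].mean()
```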