
    Multi-Task Kernel Null-Space for One-Class Classification

    The one-class kernel spectral regression (OC-KSR), the regression-based formulation of the kernel null-space approach, has been found to be an effective Fisher criterion-based methodology for one-class classification (OCC), achieving state-of-the-art performance while providing relatively high robustness against data corruption. This work extends the OC-KSR methodology to a multi-task setting where multiple one-class problems share information for improved performance. Viewing multi-task structure learning as a problem of compositional function learning, the OC-KSR method is first extended to learn the structure of multiple tasks \textit{linearly}, by posing it as an instantiation of the separable kernel learning problem in a vector-valued reproducing kernel Hilbert space, where an output kernel encodes the tasks' structure while another kernel captures input similarities. Next, a non-linear structure learning mechanism is proposed which captures the relationships among multiple tasks \textit{non-linearly} via an output kernel. The non-linear structure learning method is then extended to a sparse setting where different tasks compete in an output composition mechanism, leading to a sparse non-linear structure among multiple problems. Through extensive experiments on different data sets, the merits of the proposed multi-task kernel null-space techniques are verified against the baseline as well as other existing multi-task one-class learning techniques.
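
    The regression-based view of the kernel null-space approach can be made concrete with a small single-task sketch: all training (target-class) observations are regressed onto a common target, and a test point is scored by how far its regressed output lands from that target. The RBF kernel, the ridge term `lam`, and the unit target below are illustrative assumptions, not the paper's exact OC-KSR formulation.

```python
# A minimal single-task sketch of regression-based one-class scoring in the
# spirit of OC-KSR (assumed simplification, not the paper's exact method).
import numpy as np
from scipy.spatial.distance import cdist

def rbf_kernel(A, B, gamma=0.5):
    """Gaussian RBF kernel matrix between rows of A and rows of B."""
    return np.exp(-gamma * cdist(A, B, "sqeuclidean"))

def fit_one_class(X_train, lam=1e-3, gamma=0.5):
    """Solve (K + lam*I) alpha = 1 so every training point regresses to 1."""
    n = X_train.shape[0]
    K = rbf_kernel(X_train, X_train, gamma)
    return np.linalg.solve(K + lam * np.eye(n), np.ones(n))

def novelty_score(X_test, X_train, alpha, gamma=0.5):
    """Distance of the regressed output from the common target."""
    return np.abs(rbf_kernel(X_test, X_train, gamma) @ alpha - 1.0)

rng = np.random.default_rng(0)
X_in = rng.normal(0, 1, size=(200, 5))        # nominal training data
X_out = rng.normal(4, 1, size=(20, 5))        # outliers at test time
alpha = fit_one_class(X_in)
print(novelty_score(X_in[:5], X_in, alpha))   # small scores
print(novelty_score(X_out[:5], X_in, alpha))  # large scores
```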

    A Benchmark to Select Data Mining Based Classification Algorithms For Business Intelligence And Decision Support Systems

    Decision support systems (DSS) serve the management, operations, and planning levels of an organization and help to make decisions that may be rapidly changing and not easily specified in advance. Data mining has a vital role in extracting the information needed for decision making in a decision support system. Integration of data mining and decision support systems can lead to improved performance and can enable the tackling of new types of problems. Artificial intelligence methods are improving the quality of decision support and have become embedded in many applications, ranging from anti-lock automobile brakes to today's interactive search engines. Machine learning provides various techniques to support data mining. Classification is one of the main and most valuable tasks of data mining. Several types of classification algorithms have been suggested, tested, and compared to predict future trends based on unseen data, but no single algorithm has been found to be superior over all others for all data sets. The objective of this paper is to compare various classification algorithms that have been frequently used in data mining for decision support systems. Three decision-tree-based algorithms, one artificial neural network, one statistical method, one support vector machine (with and without AdaBoost), and one clustering algorithm are tested and compared on four data sets from different domains in terms of predictive accuracy, error rate, classification index, comprehensibility, and training time. Experimental results demonstrate that the Genetic Algorithm (GA) and support-vector-machine-based algorithms are better in terms of predictive accuracy. SVM without AdaBoost should be the first choice when both speed and predictive accuracy matter; AdaBoost improves the accuracy of SVM, but at the cost of a large training time.
    Comment: 18 pages, 11 figures, 6 tables, Journal
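
    A minimal sketch of the kind of benchmark described above, comparing several classifier families on predictive accuracy and training time. The dataset and the exact algorithm roster are illustrative stand-ins for the paper's four domain data sets.

```python
# Benchmark loop over several classifier families (illustrative roster).
import time
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.ensemble import AdaBoostClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

models = {
    "decision tree": DecisionTreeClassifier(random_state=0),
    "neural net": MLPClassifier(max_iter=2000, random_state=0),
    "naive Bayes (statistical)": GaussianNB(),
    "SVM": SVC(),
    "AdaBoost": AdaBoostClassifier(random_state=0),
}

for name, model in models.items():
    t0 = time.perf_counter()
    model.fit(X_tr, y_tr)                     # measure training time only
    elapsed = time.perf_counter() - t0
    acc = model.score(X_te, y_te)             # predictive accuracy on held-out data
    print(f"{name:28s} accuracy={acc:.3f} train_time={elapsed:.3f}s")
```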

    High Dimensional Linear Regression using Lattice Basis Reduction

    We consider a high-dimensional linear regression problem where the goal is to efficiently recover an unknown vector $\beta^*$ from $n$ noisy linear observations $Y = X\beta^* + W \in \mathbb{R}^n$, for known $X \in \mathbb{R}^{n \times p}$ and unknown $W \in \mathbb{R}^n$. Unlike most of the literature on this model, we make no sparsity assumption on $\beta^*$. Instead, we adopt a regularization based on assuming that the underlying vectors $\beta^*$ have rational entries with the same denominator $Q \in \mathbb{Z}_{>0}$. We call this the $Q$-rationality assumption. We propose a new polynomial-time algorithm for this task, based on the seminal Lenstra-Lenstra-Lovász (LLL) lattice basis reduction algorithm. We establish that under the $Q$-rationality assumption, our algorithm recovers exactly the vector $\beta^*$ for a large class of distributions of the i.i.d. entries of $X$ and non-zero noise $W$. We prove that it is successful under small noise, even when the learner has access to only one observation ($n=1$). Furthermore, we prove that in the case of Gaussian white noise $W$, $n = o\left(p/\log p\right)$, and $Q$ sufficiently large, our algorithm tolerates a nearly optimal information-theoretic level of noise.
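
    The lattice idea can be sketched on a small noiseless instance with $Q$ known, a simplification of the paper's setting (where $Q$ need not be known and noise is tolerated): scale the observations so that $QY = Xz$ holds over the integers, embed the columns of $X$ into a lattice basis, and let LLL expose a short vector carrying $z$. The knapsack-style encoding and scaling constant `N` below are standard choices assumed for illustration; the sketch requires the fpylll package.

```python
# Recovering beta* = z/Q from noiseless observations via LLL (illustrative).
from fpylll import IntegerMatrix, LLL
import numpy as np

rng = np.random.default_rng(1)
p, n, Q, N = 8, 6, 7, 10**6
X = rng.integers(-10, 11, size=(n, p))
z = rng.integers(-3, 4, size=p)              # beta* = z / Q, small integers
s = X @ z                                    # = Q * Y, integer in the noiseless case

# Basis rows: (e_i | N * X[:, i]) for each coefficient, plus (0 | 1 | -N * s).
# The combination z . rows + last row gives the short vector (z, 1, 0...0).
B = IntegerMatrix(p + 1, p + 1 + n)
for i in range(p):
    B[i, i] = 1
    for j in range(n):
        B[i, p + 1 + j] = N * int(X[j, i])
B[p, p] = 1
for j in range(n):
    B[p, p + 1 + j] = -N * int(s[j])

LLL.reduction(B)

# On small instances like this, the planted vector typically appears intact
# among the reduced rows: look for a row with +/-1 marker and zero tail.
for r in range(p + 1):
    row = [B[r, c] for c in range(p + 1 + n)]
    if abs(row[p]) == 1 and all(v == 0 for v in row[p + 1:]):
        z_hat = [row[p] * v for v in row[:p]]
        print("recovered beta* =", [v / Q for v in z_hat])
        break
```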

    Online Hyperparameter-Free Sparse Estimation Method

    In this paper, we derive an online estimator for sparse parameter vectors which, unlike the LASSO approach, does not require the tuning of any hyperparameters. The algorithm is based on a covariance-matching approach and is equivalent to a weighted version of the square-root LASSO. The computational complexity of the estimator is of the same order as that of the online versions of regularized least squares (RLS) and the LASSO. We provide a numerical comparison with feasible and infeasible implementations of the LASSO and RLS to illustrate the advantage of the proposed online hyperparameter-free estimator.
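
    Since the proposed estimator is equivalent to a weighted square-root LASSO, the batch square-root LASSO is the natural reference point. Below is a minimal cvxpy sketch of it; the pivotal choice of `lam` is a standard one assumed for illustration, and the paper's point is precisely that its online covariance-matching estimator avoids tuning such a weight.

```python
# Batch square-root LASSO as a reference point (illustrative sketch).
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n, p, k = 100, 50, 5
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:k] = rng.normal(0, 3, size=k)     # sparse ground truth
y = X @ beta_true + 0.5 * rng.normal(size=n)

beta = cp.Variable(p)
# The residual enters via its l2 norm (not its square), which is what makes a
# noise-level-free regularization weight possible.
lam = 1.1 * np.sqrt(2 * np.log(p) / n)       # standard pivotal choice
cp.Problem(cp.Minimize(cp.norm(y - X @ beta, 2) / np.sqrt(n)
                       + lam * cp.norm1(beta))).solve()
print("support found:", np.flatnonzero(np.abs(beta.value) > 1e-3))
```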

    Predicting the Future Behavior of a Time-Varying Probability Distribution

    We study the problem of predicting the future, though only in the probabilistic sense of estimating a future state of a time-varying probability distribution. This is not only an interesting academic problem; solving this extrapolation problem also has many practical applications, e.g. for training classifiers that have to operate under time-varying conditions. Our main contribution is a method for predicting the next step of the time-varying distribution from a given sequence of sample sets from earlier time steps. For this we rely on two recent machine learning techniques: embedding probability distributions into a reproducing kernel Hilbert space, and learning operators by vector-valued regression. We illustrate the working principles and the practical usefulness of our method by experiments on synthetic and real data. We also highlight an exemplary application: training a classifier in a domain adaptation setting without having access to examples from the test-time distribution at training time.
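
    A minimal sketch of the two ingredients: each sample set is embedded as a kernel mean embedding (here approximated with random Fourier features to stay finite-dimensional), and a linear operator mapping each embedding to its successor is learned by plain ridge regression, a simplification of the paper's vector-valued regression.

```python
# Extrapolating a drifting distribution via mean embeddings (illustrative).
import numpy as np

rng = np.random.default_rng(0)
D, gamma = 200, 1.0
W = rng.normal(0, np.sqrt(2 * gamma), size=(1, D))   # RFF frequencies (1-d data)
b = rng.uniform(0, 2 * np.pi, size=D)

def embed(sample):
    """Empirical kernel mean embedding under random Fourier features."""
    return np.sqrt(2.0 / D) * np.cos(sample[:, None] * W + b).mean(axis=0)

# A drifting distribution: N(0.3 * t, 1) at time step t.
T = 20
mus = np.stack([embed(rng.normal(0.3 * t, 1.0, size=500)) for t in range(T)])

# Ridge regression of mu_{t+1} on mu_t: A = Y X^T (X X^T + lam I)^-1.
Xe, Ye, lam = mus[:-1].T, mus[1:].T, 1e-3
A = Ye @ Xe.T @ np.linalg.inv(Xe @ Xe.T + lam * np.eye(D))
mu_pred = A @ mus[-1]                                # predicted next embedding

# Sanity check: the prediction should typically be closer to the true
# next-step embedding than simply reusing the last observed one.
mu_true = embed(rng.normal(0.3 * T, 1.0, size=5000))
print(np.linalg.norm(mu_pred - mu_true), np.linalg.norm(mus[-1] - mu_true))
```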

    Kernels on Sample Sets via Nonparametric Divergence Estimates

    Most machine learning algorithms, such as those for classification or regression, treat the individual data point as the object of interest. Here we consider extending machine learning algorithms to operate on groups of data points. We suggest treating a group of data points as an i.i.d. sample set from an underlying feature distribution for that group. Our approach employs kernel machines with a kernel on i.i.d. sample sets of vectors. We define certain kernel functions on pairs of distributions, and then use a nonparametric estimator to consistently estimate those functions based on sample sets. The projection of the estimated Gram matrix onto the cone of symmetric positive semi-definite matrices enables us to use kernel machines for classification, regression, anomaly detection, and low-dimensional embedding in the space of distributions. We present several numerical experiments, on both real and simulated datasets, to demonstrate the advantages of our new approach.
    Comment: Substantially updated version as submitted to T-PAMI. 15 pages including appendix.
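
    A minimal sketch of the pipeline: estimate a divergence between sample sets (here the empirical squared MMD, a simple stand-in for the paper's nonparametric divergence estimators), turn it into a similarity, and project the resulting Gram matrix onto the PSD cone before handing it to a kernel machine.

```python
# Kernel on sample sets: divergence estimate -> similarity -> PSD projection.
import numpy as np
from scipy.spatial.distance import cdist

def mmd2(A, B, gamma=0.5):
    """Biased empirical squared MMD between sample sets A and B."""
    k = lambda U, V: np.exp(-gamma * cdist(U, V, "sqeuclidean"))
    return k(A, A).mean() + k(B, B).mean() - 2 * k(A, B).mean()

def psd_project(G):
    """Clip negative eigenvalues: nearest PSD matrix in Frobenius norm."""
    w, V = np.linalg.eigh((G + G.T) / 2)
    return V @ np.diag(np.clip(w, 0, None)) @ V.T

rng = np.random.default_rng(0)
groups = [rng.normal(m, 1, size=(100, 3)) for m in (0, 0, 2, 2)]
D = np.array([[mmd2(a, b) for b in groups] for a in groups])
G = psd_project(np.exp(-5.0 * D))   # Gram matrix over sample sets
print(np.round(G, 2))               # block structure: first two vs last two
```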

    Distance-based analysis of variance: approximate inference and an application to genome-wide association studies

    In several modern applications, ranging from genetics to genomics and neuroimaging, there is a need to compare observations across different populations, such as groups of healthy and diseased individuals. The interest is in detecting a group effect. When the observations are vectorial, real-valued, and follow a multivariate normal distribution, multivariate analysis of variance (MANOVA) tests are routinely applied. However, such traditional procedures are not suitable when dealing with more complex data structures such as functional (e.g. curves) or graph-structured (e.g. trees and networks) objects, where the required distributional assumptions may be violated. In this paper, we discuss a distance-based MANOVA-like approach, the DBF test, for detecting differences between groups for a wider range of data types. The test statistic, analogously to other distance-based statistics, relies only on a suitably chosen distance measure that captures the pairwise dissimilarity among all available samples. An approximate null probability distribution of the DBF statistic is proposed, thus allowing inferences to be drawn without the need for costly permutation procedures. Through extensive simulations we provide evidence that the proposed methodology works well for a range of data types and distances, and generalizes the traditional MANOVA tests. We also report on an application of the proposed methodology to the analysis of a multi-locus genome-wide association study of Alzheimer's disease, carried out using several genetic distance measures.
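
    The distance-only construction of such a statistic can be sketched as a PERMANOVA-style pseudo-F: Gower-centre the squared distance matrix to recover inner products, then compare between-group to within-group trace terms. The sketch below shows only the statistic; the paper's contribution, the approximate null distribution that avoids permutations, is not reproduced here.

```python
# Distance-based group-effect statistic from a pairwise distance matrix alone.
import numpy as np

def pseudo_f(D, labels):
    """PERMANOVA-style pseudo-F computed from distances only."""
    n = len(labels)
    groups = np.unique(labels)
    k = len(groups)
    Z = (labels[:, None] == groups[None, :]).astype(float)   # n x k indicators
    H = Z @ np.linalg.inv(Z.T @ Z) @ Z.T                     # group hat matrix
    J = np.eye(n) - np.ones((n, n)) / n
    G = -0.5 * J @ (D ** 2) @ J                              # Gower centring
    between = np.trace(H @ G @ H) / (k - 1)
    within = np.trace((np.eye(n) - H) @ G @ (np.eye(n) - H)) / (n - k)
    return between / within

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (30, 4)), rng.normal(1, 1, (30, 4))])
labels = np.array([0] * 30 + [1] * 30)
D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)         # Euclidean distances
print(pseudo_f(D, labels))   # markedly above ~1 under a real group effect
```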

    Vector-Valued Graph Trend Filtering with Non-Convex Penalties

    This work studies the denoising of piecewise smooth graph signals that exhibit inhomogeneous levels of smoothness over a graph, where the value at each node can be vector-valued. We extend the graph trend filtering framework to the denoising of vector-valued graph signals with a family of non-convex regularizers, which exhibit superior recovery performance over existing convex regularizers. Using an oracle inequality, we establish the statistical error rates of first-order stationary points of the proposed non-convex method for generic graphs. Furthermore, we present an ADMM-based algorithm to solve the proposed method and establish its convergence. Numerical experiments are conducted on both synthetic and real-world data for denoising, support recovery, event detection, and semi-supervised classification.
    Comment: The first two authors contributed equally.
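
    A minimal scalar-valued sketch of the ADMM scheme on a path graph, with the non-convex MCP penalty standing in for the paper's family of regularizers. The penalty constants, step size, and first-order difference operator are illustrative assumptions; the vector-valued extension is omitted.

```python
# ADMM for graph trend filtering with a non-convex MCP penalty (illustrative).
import numpy as np

def mcp_prox(t, lam, gam, tau):
    """Proximal operator of the MCP penalty with step tau (needs gam > tau)."""
    soft = np.sign(t) * np.maximum(np.abs(t) - tau * lam, 0.0)
    return np.where(np.abs(t) <= gam * lam, soft / (1.0 - tau / gam), t)

def gtf_admm(y, D, lam=1.0, gam=3.0, rho=1.0, iters=300):
    """min_x 0.5||y - x||^2 + sum_e MCP(|(Dx)_e|), via scaled ADMM."""
    m, n = D.shape
    x, z, u = y.copy(), np.zeros(m), np.zeros(m)
    A = np.eye(n) + rho * D.T @ D                 # x-update system matrix
    for _ in range(iters):
        x = np.linalg.solve(A, y + rho * D.T @ (z - u))
        z = mcp_prox(D @ x + u, lam, gam, 1.0 / rho)
        u += D @ x - z
    return x

rng = np.random.default_rng(0)
n = 100
signal = np.repeat([0.0, 3.0, -1.0, 2.0], n // 4)  # piecewise constant signal
y = signal + 0.4 * rng.normal(size=n)
D = np.diff(np.eye(n), axis=0)                     # path-graph difference operator
x_hat = gtf_admm(y, D)
print(np.round(np.abs(x_hat - signal).mean(), 3))  # small average error
```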

    A Unified SVM Framework for Signal Estimation

    This paper presents a unified framework for tackling estimation problems in Digital Signal Processing (DSP) using Support Vector Machines (SVMs). The use of SVMs in estimation problems has traditionally been limited to their mere use as a black-box model. Noting such limitations in the literature, we take advantage of several properties of Mercer's kernels and of functional analysis to develop a family of SVM methods for estimation in DSP. Three types of signal model equations are analyzed. First, when a specific time-signal structure is assumed to model the underlying system that generated the data, the linear signal model (the so-called Primal Signal Model formulation) is stated and analyzed. Then, non-linear versions of the signal structure can readily be developed by following two different approaches. On the one hand, the signal model equation is written in reproducing kernel Hilbert spaces (RKHS) using the well-known RKHS Signal Model formulation, and Mercer's kernels are readily used in SVM non-linear algorithms. On the other hand, in the alternative and less common Dual Signal Model formulation, a signal expansion is made by using an auxiliary signal model equation given by a non-linear regression of each time instant in the observed time series. These building blocks can be used to generate different novel SVM-based methods for signal estimation problems, and we deal with several of the most important ones in DSP. We illustrate the usefulness of this methodology by defining SVM algorithms for linear and non-linear system identification, spectral analysis, nonuniform interpolation, sparse deconvolution, and array processing. The performance of the developed SVM methods is compared to standard approaches in all these settings. The experimental results illustrate the generality, simplicity, and capabilities of the proposed SVM framework for DSP.
    Comment: 22 pages, 13 figures. Digital Signal Processing, 201
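
    To make the setting concrete, here is one of the listed applications, non-linear system identification, in its plain black-box form: an SVR fed with lagged inputs and outputs (a NARX-style regressor with assumed model orders). The paper's contribution is to replace exactly this black-box usage with structured primal, RKHS, and dual signal-model formulations.

```python
# Black-box SVR system identification with lagged regressors (illustrative).
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
T = 600
u = rng.uniform(-1, 1, size=T)                   # input signal
y = np.zeros(T)
for t in range(2, T):                            # a mildly non-linear system
    y[t] = 0.6 * y[t-1] - 0.2 * y[t-2] + np.tanh(u[t-1]) + 0.05 * rng.normal()

# Regressor at time t: [y_{t-1}, y_{t-2}, u_{t-1}] (assumed model orders).
Z = np.column_stack([y[1:-1], y[:-2], u[1:-1]])
target = y[2:]
model = SVR(kernel="rbf", C=10.0, epsilon=0.01).fit(Z[:400], target[:400])
pred = model.predict(Z[400:])
print("test RMSE:", np.sqrt(np.mean((pred - target[400:]) ** 2)))
```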

    Multi-view Vector-valued Manifold Regularization for Multi-label Image Classification

    In computer vision, image datasets used for classification are naturally associated with multiple labels and comprised of multiple views, because each image may contain several objects (e.g. pedestrian, bicycle, and tree) and is properly characterized by multiple visual features (e.g. color, texture, and shape). Currently available tools ignore either the label relationship or the view complementarity. Motivated by the success of vector-valued functions that construct matrix-valued kernels to explore the multi-label structure in the output space, we introduce multi-view vector-valued manifold regularization (MV$^3$MR) to integrate multiple features. MV$^3$MR exploits the complementary properties of different features and discovers the intrinsic local geometry of the compact support shared by different features, under the theme of manifold regularization. We conducted extensive experiments on two challenging but popular datasets, PASCAL VOC'07 (VOC) and MIR Flickr (MIR), and validated the effectiveness of the proposed MV$^3$MR for image classification.
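
    A minimal single-view, single-label sketch of manifold-regularized kernel least squares (Laplacian RLS), the building block that MV$^3$MR extends to multiple views and vector-valued multi-label outputs. The two-moons data, kernel bandwidth, and regularization weights are illustrative assumptions.

```python
# Laplacian RLS: kernel least squares plus a graph-Laplacian manifold term.
import numpy as np
from sklearn.datasets import make_moons
from sklearn.neighbors import kneighbors_graph
from scipy.spatial.distance import cdist

X, y = make_moons(n_samples=200, noise=0.08, random_state=0)
y = 2.0 * y - 1.0                                  # labels in {-1, +1}
n = X.shape[0]

K = np.exp(-2.0 * cdist(X, X, "sqeuclidean"))      # RBF Gram matrix
W = kneighbors_graph(X, n_neighbors=8, mode="connectivity").toarray()
W = np.maximum(W, W.T)                             # symmetrize the kNN graph
L = np.diag(W.sum(axis=1)) - W                     # graph Laplacian

lam_A, lam_I = 1e-3, 1e-2
# Stationarity condition: (K + lam_A*n*I + lam_I*n*L@K) alpha = y
alpha = np.linalg.solve(K + lam_A * n * np.eye(n) + lam_I * n * L @ K, y)
pred = np.sign(K @ alpha)
print("training accuracy:", (pred == y).mean())
```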