    Rank-based Decomposable Losses in Machine Learning: A Survey

    Recent works have revealed an essential paradigm in designing loss functions that differentiates between individual losses and aggregate losses. The individual loss measures the quality of the model on a single sample, while the aggregate loss combines the individual losses/scores over the training samples. Both follow a common procedure that aggregates a set of individual values into a single numerical value. The ranking order reflects the most fundamental relation among individual values in designing losses. In addition, decomposability, in which a loss can be decomposed into an ensemble of individual terms, becomes a significant property for organizing losses/scores. This survey provides a systematic and comprehensive review of rank-based decomposable losses in machine learning. Specifically, we provide a new taxonomy of loss functions that follows the perspectives of aggregate loss and individual loss. We identify the aggregator that forms such losses, which are examples of set functions. We organize the rank-based decomposable losses into eight categories. Following these categories, we review the literature on rank-based aggregate losses and rank-based individual losses. We describe general formulas for these losses and connect them with existing research topics. We also suggest future research directions spanning unexplored, remaining, and emerging issues in rank-based decomposable losses.
    Comment: Accepted by IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI).
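    The contrast between a plain aggregate loss and a rank-based aggregator can be made concrete with a small sketch. The following Python snippet is illustrative only and not taken from the survey; the function names and the choice of k are assumptions. It compares the usual average loss with the average top-k loss, a standard example of a rank-based aggregate loss that depends only on the sorted order of the individual values.

        import numpy as np

        def average_loss(individual_losses):
            # Classical aggregate loss: the mean over all individual losses.
            return np.mean(individual_losses)

        def average_top_k_loss(individual_losses, k):
            # Rank-based aggregate loss: the mean of the k largest individual
            # losses.  Sorting makes the aggregator depend only on the ranking
            # of the individual values.
            sorted_desc = np.sort(individual_losses)[::-1]
            return np.mean(sorted_desc[:k])

        losses = np.array([0.1, 2.3, 0.05, 1.7, 0.4])
        print(average_loss(losses))           # approximately 0.91
        print(average_top_k_loss(losses, 2))  # (2.3 + 1.7) / 2 = 2.0

    Setting k = 1 recovers the maximum loss and k = n the average loss, so aggregators of this kind interpolate between worst-case and average-case training objectives.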

    High Dimensional Covariance Estimation for Spatio-Temporal Processes

    High dimensional time series and array-valued data are ubiquitous in signal processing, machine learning, and science. Due to the additional (temporal) direction, the total dimensionality of the data is often extremely high, requiring large numbers of training examples to learn the distribution using unstructured techniques. However, due to difficulties in sampling, small population sizes, and/or rapid system changes in time, it is often the case that very few relevant training samples are available, necessitating the imposition of structure on the data if learning is to be done. The mean and covariance are useful tools to describe high dimensional distributions because (via the Gaussian likelihood function) they are a data-efficient way to describe a general multivariate distribution, and allow for simple inference, prediction, and regression via classical techniques. In this work, we develop various forms of multidimensional covariance structure that explicitly exploit the array structure of the data, in a way analogous to the widely used low rank modeling of the mean. This allows dramatic reductions in the number of training samples required, in some cases to a single training sample. Covariance models of this form have been increasing in interest recently, and statistical performance bounds for high dimensional estimation in sample-starved scenarios are of great relevance. This thesis focuses on the high-dimensional covariance estimation problem, exploiting spatio-temporal structure to reduce sample complexity. Contributions are made in the following areas: (1) development of a variety of rich Kronecker product-based covariance models allowing the exploitation of spatio-temporal and other structure, with applications to sample-starved real data problems, (2) strong performance bounds for high-dimensional estimation of covariances under each model, and (3) a strongly adaptive online method for estimating changing optimal low-dimensional metrics (inverse covariances) for high-dimensional data from a series of similarity labels.
    PhD thesis, Electrical Engineering: Systems, University of Michigan, Horace H. Rackham School of Graduate Studies. https://deepblue.lib.umich.edu/bitstream/2027.42/137082/1/greenewk_1.pd
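    To make the Kronecker product idea concrete, here is a minimal Python sketch of the classical "flip-flop" maximum-likelihood iteration for a matrix-normal (single Kronecker factor) covariance, one of the simplest models in this family. It illustrates the general approach rather than the thesis's own estimators; the shapes, iteration count, and zero-mean assumption are choices made for the example.

        import numpy as np

        def flip_flop(samples, n_iter=50):
            # samples: array of shape (n, p, q), zero-mean matrix-valued data.
            # Estimates a row covariance A (p x p) and a column covariance
            # B (q x q) such that Cov(vec(X)) is modelled as a Kronecker
            # product of B and A.  A and B are identifiable only up to a
            # scalar factor traded between them.
            n, p, q = samples.shape
            A, B = np.eye(p), np.eye(q)
            for _ in range(n_iter):
                B_inv = np.linalg.inv(B)
                A = sum(X @ B_inv @ X.T for X in samples) / (n * q)
                A_inv = np.linalg.inv(A)
                B = sum(X.T @ A_inv @ X for X in samples) / (n * p)
            return A, B

        rng = np.random.default_rng(0)
        X = rng.standard_normal((5, 4, 6))   # 5 samples, each 4 x 6
        A_hat, B_hat = flip_flop(X)

    The structured model has on the order of p^2 + q^2 parameters instead of (pq)^2 for an unstructured covariance, which is the source of the reduction in required training samples described above.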

    Algorithms for learning parsimonious context trees

    Parsimonious context trees, PCTs, provide a sparse parameterization of conditional probability distributions. They are particularly powerful for modeling context-specific independencies in sequential discrete data. Learning PCTs from data is computationally hard due to the combinatorial explosion of the space of model structures as the number of predictor variables grows. Under the score-and-search paradigm, the fastest algorithm for finding an optimal PCT prior to the present work is based on dynamic programming. While that algorithm handles small instances quickly, it becomes infeasible already with half a dozen four-state predictor variables. Here, we show that common scoring functions enable the use of new algorithmic ideas, which can significantly expedite the dynamic programming algorithm on typical data. Specifically, we introduce a memoization technique, which exploits regularities within the predictor variables by equating different contexts associated with the same data subset, and a bound-and-prune technique, which exploits regularities within the response variable by pruning parts of the search space based on score upper bounds. On real-world data from recent applications of PCTs within computational biology, these ideas are shown to reduce the traversed search space and the computation time by several orders of magnitude in typical cases.
    Peer reviewed
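    The memoization idea, keying the dynamic program on the data subset a context selects rather than on the context itself, can be sketched as follows. This Python toy is an assumption-laden illustration and not the paper's algorithm: the partition enumeration is brute force and the scoring function is left abstract. It caches the optimal subtree score per (level, set of data rows), so different contexts that select the same rows reuse one cached result.

        from functools import lru_cache

        def learn_pct_score(data, alphabets, leaf_score):
            # data: list of (predictor_tuple, response) rows.
            # alphabets: list of value sets, one per predictor (toy-sized).
            # leaf_score: maps a tuple of responses to a local score.
            d = len(alphabets)

            def partitions(values):
                # Enumerate all set partitions of a small alphabet.
                values = list(values)
                if not values:
                    yield []
                    return
                first, rest = values[0], values[1:]
                for sub in partitions(rest):
                    for i in range(len(sub)):
                        yield sub[:i] + [sub[i] | {first}] + sub[i + 1:]
                    yield sub + [{first}]

            @lru_cache(maxsize=None)
            def best(level, rows):
                # 'rows' (a frozenset of row indices) is the memoization key:
                # contexts that induce the same data subset share this value.
                if level == d:
                    return leaf_score(tuple(sorted(data[r][1] for r in rows)))
                return max(
                    sum(best(level + 1,
                             frozenset(r for r in rows
                                       if data[r][0][level] in block))
                        for block in part)
                    for part in partitions(alphabets[level]))

            return best(0, frozenset(range(len(data))))

    The paper's bound-and-prune technique would additionally skip candidate partitions whose score upper bound cannot beat the best score found so far; that part is omitted in this sketch.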

    Information Processing Equalities and the Information-Risk Bridge

    We introduce two new classes of measures of information for statistical experiments which generalise and subsume ϕ-divergences, integral probability metrics, 𝔑-distances (MMD), and (f,Γ)-divergences between two or more distributions. This enables us to derive a simple geometrical relationship between measures of information and the Bayes risk of a statistical decision problem, thus extending the variational ϕ-divergence representation to multiple distributions in an entirely symmetric manner. The new families of divergences are closed under the action of Markov operators, which yields an information processing equality that refines and generalises the classical data processing inequality. This equality gives insight into the significance of the choice of the hypothesis class in classical risk minimization.
    Comment: 48 pages; corrected some typos and added a few additional explanations
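    For orientation, the two classical facts that this abstract generalises and refines are the variational representation of a ϕ-divergence and the data processing inequality under a Markov operator K. The LaTeX below is a standard recap, not the paper's new results, and the notation is an assumption made here.

        % Fenchel-Legendre (variational) representation of a phi-divergence,
        % with phi^* the convex conjugate of phi and g ranging over measurable
        % functions:
        \[
          D_\phi(P \,\|\, Q)
            = \sup_{g} \Big( \mathbb{E}_{P}\big[g(X)\big]
                             - \mathbb{E}_{Q}\big[\phi^{*}(g(X))\big] \Big)
        \]
        % Classical data processing inequality for a Markov operator K:
        \[
          D_\phi(PK \,\|\, QK) \;\le\; D_\phi(P \,\|\, Q)
        \]

    According to the abstract, the closure of the new divergence families under Markov operators yields an information processing equality that refines this inequality and connects it to the choice of hypothesis class in risk minimization.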