
    Estimation of Markov Chain via Rank-Constrained Likelihood

    This paper studies the estimation of low-rank Markov chains from empirical trajectories. We propose a non-convex estimator based on rank-constrained likelihood maximization. Statistical upper bounds are provided for the Kullback-Leibler divergence and the ℓ2 risk between the estimator and the true transition matrix. The estimator reveals a compressed state space of the Markov chain. We also develop a novel DC (difference of convex functions) programming algorithm to tackle the rank-constrained non-smooth optimization problem. Convergence results are established. Experiments show that the proposed estimator achieves better empirical performance than other popular approaches. Comment: Accepted at ICML 2018
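
    As a rough illustration of the low-rank idea (not the paper's rank-constrained likelihood maximization or its DC algorithm), the sketch below estimates an empirical transition matrix from a trajectory, truncates it to a given rank via SVD, and renormalizes the rows; all names and parameters here are hypothetical.

        import numpy as np

        def lowrank_transition_estimate(trajectory, num_states, rank):
            # Count empirical transitions; states are assumed to be integer labels 0..num_states-1.
            counts = np.zeros((num_states, num_states))
            for s, t in zip(trajectory[:-1], trajectory[1:]):
                counts[s, t] += 1.0
            row_sums = counts.sum(axis=1, keepdims=True)
            row_sums[row_sums == 0] = 1.0            # avoid division by zero for unvisited states
            P_hat = counts / row_sums                # empirical transition matrix

            # Simple spectral surrogate for a rank constraint: keep the top-`rank`
            # singular components, then restore nonnegativity and unit row sums.
            U, S, Vt = np.linalg.svd(P_hat, full_matrices=False)
            P_low = U[:, :rank] @ np.diag(S[:rank]) @ Vt[:rank, :]
            P_low = np.clip(P_low, 0.0, None) + 1e-12
            return P_low / P_low.sum(axis=1, keepdims=True)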

    Data Cube Approximation and Mining using Probabilistic Modeling

    On-line Analytical Processing (OLAP) techniques commonly used in data warehouses allow the exploration of data cubes according to different analysis axes (dimensions) and under different abstraction levels in a dimension hierarchy. However, such techniques are not aimed at mining multidimensional data. Since data cubes are nothing but multi-way tables, we propose to analyze the potential of two probabilistic modeling techniques, namely non-negative multi-way array factorization and log-linear modeling, with the ultimate objective of compressing and mining aggregate and multidimensional values. With the first technique, we compute the set of components that best fit the initial data set and whose superposition coincides with the original data; with the second technique, we identify a parsimonious model (i.e., one with a reduced set of parameters), highlight strong associations among dimensions and discover possible outliers in data cells. A real-life example is used to (i) discuss the potential benefits of the modeling output on cube exploration and mining, (ii) show how OLAP queries can be answered in an approximate way, and (iii) illustrate the strengths and limitations of these modeling approaches.
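
    To make the first technique concrete, here is a minimal non-negative CP-style factorization of a 3-way data cube using multiplicative updates; it is a generic sketch of non-negative multi-way array factorization, not the specific model or fitting procedure used in the paper, and the function and variable names are hypothetical.

        import numpy as np

        def nonneg_cp(cube, rank, n_iter=200, eps=1e-9):
            # Factor a 3-way array as a sum of `rank` non-negative rank-one components.
            I, J, K = cube.shape
            rng = np.random.default_rng(0)
            A, B, C = rng.random((I, rank)), rng.random((J, rank)), rng.random((K, rank))
            for _ in range(n_iter):
                # Multiplicative updates keep the factor matrices non-negative.
                A *= np.einsum('ijk,jr,kr->ir', cube, B, C) / (A @ ((B.T @ B) * (C.T @ C)) + eps)
                B *= np.einsum('ijk,ir,kr->jr', cube, A, C) / (B @ ((A.T @ A) * (C.T @ C)) + eps)
                C *= np.einsum('ijk,ir,jr->kr', cube, A, B) / (C @ ((A.T @ A) * (B.T @ B)) + eps)
            # The reconstruction acts as a compressed, approximate data cube.
            return A, B, C, np.einsum('ir,jr,kr->ijk', A, B, C)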

    One-class classifiers based on entropic spanning graphs

    One-class classifiers offer valuable tools to assess the presence of outliers in data. In this paper, we propose a design methodology for one-class classifiers based on entropic spanning graphs. Our approach also handles non-numeric data by means of an embedding procedure. The spanning graph is learned on the embedded input data, and the resulting partition of vertices defines the classifier. The final partition is derived by exploiting a criterion based on mutual information minimization. Here, we compute the mutual information by using a convenient formulation provided in terms of the α-Jensen difference. Once training is completed, a graph-based fuzzy model is constructed in order to associate a confidence level with the classifier decision. The fuzzification process is based only on topological information about the vertices of the entropic spanning graph. As such, the proposed one-class classifier is also suitable for data characterized by complex geometric structures. We provide experiments on well-known benchmarks containing both feature vectors and labeled graphs. In addition, we apply the method to the protein solubility recognition problem by considering several representations for the input samples. Experimental results demonstrate the effectiveness and versatility of the proposed method with respect to other state-of-the-art approaches. Comment: Extended and revised version of the paper "One-Class Classification Through Mutual Information Minimization" presented at the 2016 IEEE IJCNN, Vancouver, Canada
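
    A much-simplified sketch of a spanning-graph one-class classifier follows: it builds the Euclidean minimum spanning tree of the training data and accepts a test point if it falls within a radius derived from the tree's edge lengths. The paper's classifier instead partitions the entropic spanning graph by minimizing an α-Jensen estimate of mutual information and adds a fuzzy confidence model, so this is only an assumed toy stand-in.

        import numpy as np
        from scipy.spatial.distance import cdist, pdist, squareform
        from scipy.sparse.csgraph import minimum_spanning_tree

        def fit_mst_radius(X_train, quantile=0.95):
            # Build the Euclidean minimum spanning tree of the training set and
            # use a quantile of its edge lengths as an acceptance radius.
            mst = minimum_spanning_tree(squareform(pdist(X_train))).toarray()
            return np.quantile(mst[mst > 0], quantile)

        def predict_inlier(X_train, X_test, radius):
            # A test point is an inlier if some training point lies within `radius`.
            return cdist(X_test, X_train).min(axis=1) <= radius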

    Fast and Robust Rank Aggregation against Model Misspecification

    In rank aggregation, preferences from different users are summarized into a total order under the homogeneous data assumption. In practice this assumption is frequently violated, so model misspecification arises, and existing rank aggregation methods cope with it by incorporating specific noise models. However, they all rely on certain noise model assumptions and cannot handle agnostic noise in the real world. In this paper, we propose CoarsenRank, which rectifies the underlying data distribution directly and aligns it with the homogeneous data assumption without involving any noise model. To this end, we define a neighborhood of the data distribution over which Bayesian inference of CoarsenRank is performed; the resulting posterior therefore enjoys robustness against model misspecification. Further, we derive a tractable closed-form solution for CoarsenRank, making it computationally efficient. Experiments on real-world datasets show that CoarsenRank is fast and robust, achieving consistent improvement over baseline methods.
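
    The robustness mechanism rests on coarsened Bayesian inference, in which the exact likelihood is replaced by a tempered one so the data only pull the posterior as far as the assumed neighborhood allows. The toy below applies that idea to a conjugate Dirichlet count model; CoarsenRank's actual ranking model, prior, and closed-form solution differ, and the names and the neighborhood parameter here are hypothetical.

        import numpy as np

        def coarsened_dirichlet_posterior(counts, prior=1.0, alpha=50.0):
            # Coarsened (tempered) update: scale the likelihood's influence by
            # zeta = alpha / (alpha + n), so misspecified data cannot drag the
            # posterior arbitrarily far from the prior.
            counts = np.asarray(counts, dtype=float)
            zeta = alpha / (alpha + counts.sum())
            return prior + zeta * counts               # Dirichlet posterior parameters

        wins = [120, 60, 20]                           # e.g. observed "wins" for three items
        print(coarsened_dirichlet_posterior(wins))     # tempered, robust update
        print(1.0 + np.asarray(wins, dtype=float))     # standard conjugate update, for contrast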

    Generalised Mixability, Constant Regret, and Bayesian Updating

    Mixability of a loss is known to characterise when constant regret bounds are achievable in games of prediction with expert advice through the use of Vovk's aggregating algorithm. We provide a new interpretation of mixability via convex analysis that highlights the role of the Kullback-Leibler divergence in its definition. This naturally generalises to what we call Φ-mixability, where the Bregman divergence D_Φ replaces the KL divergence. We prove that losses that are Φ-mixable also enjoy constant regret bounds via a generalised aggregating algorithm that is similar to mirror descent. Comment: 12 pages
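
    For orientation, one standard way to state mixability (the paper's exact notation may differ) makes the KL divergence explicit: a loss ℓ is η-mixable if for every prior π over experts and any expert predictions (p_θ) there is a single prediction p such that, for every outcome y,

        \[
          \ell(p, y) \;\le\; -\tfrac{1}{\eta}\,\log \sum_{\theta} \pi_\theta\, e^{-\eta\, \ell(p_\theta, y)}
          \;=\; \inf_{q \in \Delta} \Big( \mathbb{E}_{\theta \sim q}\, \ell(p_\theta, y) \;+\; \tfrac{1}{\eta}\, \mathrm{KL}(q \,\|\, \pi) \Big).
        \]

    Replacing (1/η)·KL(q‖π) in the right-hand infimum with a general Bregman divergence D_Φ(q, π) is one way to read the Φ-mixability generalisation described in the abstract.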

    Generalized Mixability via Entropic Duality

    Mixability is a property of a loss which characterizes when fast convergence is possible in the game of prediction with expert advice. We show that a key property of mixability generalizes, and the exp and log operations present in the usual theory are not as special as one might have thought. In doing this we introduce a more general notion of Φ-mixability, where Φ is a general entropy (i.e., any convex function on probabilities). We show how a property shared by the convex dual of any such entropy yields a natural algorithm (the minimizer of a regret bound) which, analogous to the classical aggregating algorithm, is guaranteed a constant regret when used with Φ-mixable losses. We characterize precisely which Φ have Φ-mixable losses and put forward a number of conjectures about the optimality and relationships between different choices of entropy. Comment: 20 pages, 1 figure. Supersedes the work in arXiv:1403.2433 [cs.LG]
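
    As a concrete reference point for the classical case the abstract generalises, the sketch below runs Vovk's aggregating algorithm for the 1-mixable log loss, where the weighted mixture of the experts' distributions attains the mixability bound and the regret is at most ln(number of experts). It is a minimal illustration with assumed input shapes, not the paper's generalised, entropy-parameterised algorithm.

        import numpy as np

        def aggregating_algorithm_logloss(expert_probs, outcomes):
            # expert_probs[t, k]: expert k's predicted probability of outcome 1 at round t.
            # outcomes[t]: the realised binary outcome (0 or 1).
            T, K = expert_probs.shape
            log_w = np.zeros(K)                        # log-weights, uniform prior
            total_loss = 0.0
            for t in range(T):
                w = np.exp(log_w - log_w.max())
                w /= w.sum()
                p = float(w @ expert_probs[t])         # mixture prediction (eta = 1)
                y = outcomes[t]
                total_loss += -np.log(p if y == 1 else 1.0 - p)
                expert_losses = -np.log(np.where(y == 1, expert_probs[t], 1.0 - expert_probs[t]))
                log_w -= expert_losses                 # exponential-weights update
            return total_loss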

    Representation learning for very short texts using weighted word embedding aggregation

    Short text messages such as tweets are very noisy and sparse in their use of vocabulary. Traditional textual representations, such as tf-idf, have difficulty grasping the semantic meaning of such texts, which is important in applications such as event detection, opinion mining, news recommendation, etc. We constructed a method based on semantic word embeddings and frequency information to arrive at low-dimensional representations for short texts designed to capture semantic similarity. For this purpose we designed a weight-based model and a learning procedure based on a novel median-based loss function. This paper discusses the details of our model and the optimization methods, together with the experimental results on both Wikipedia and Twitter data. We find that our method outperforms the baseline approaches in the experiments, and that it generalizes well on different word embeddings without retraining. Our method is therefore capable of retaining most of the semantic information in the text, and is applicable out-of-the-box. Comment: 8 pages, 3 figures, 2 tables, appears in Pattern Recognition Letters
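
    A bare-bones version of the aggregation idea, assuming the caller supplies a word-vector lookup and per-word weights: represent a short text as a weighted average of its word embeddings. The paper learns frequency-based weights with a median-based loss rather than using the fixed idf weights assumed here, and every name in the sketch is hypothetical.

        import numpy as np

        def weighted_text_embedding(tokens, word_vectors, idf, dim=300):
            # Weighted average of the embeddings of the words in a short text.
            vecs, weights = [], []
            for w in tokens:
                if w in word_vectors:
                    vecs.append(word_vectors[w])
                    weights.append(idf.get(w, 1.0))    # fall back to weight 1 if idf is unknown
            if not vecs:
                return np.zeros(dim)                   # empty or fully out-of-vocabulary text
            weights = np.asarray(weights)
            return (weights[:, None] * np.asarray(vecs)).sum(axis=0) / weights.sum()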