    Multi-Level Network Embedding with Boosted Low-Rank Matrix Approximation

    As opposed to manual feature engineering, which is tedious and difficult to scale, network representation learning has attracted a surge of research interest because it automates feature learning on graphs. The learned low-dimensional node vector representation is generalizable and eases knowledge discovery on graphs by allowing various off-the-shelf machine learning tools to be applied directly. Recent research has shown that the network embedding approaches of the past decade either explicitly factorize a carefully designed matrix to obtain the low-dimensional node vector representation or are closely related to implicit matrix factorization, with the fundamental assumption that the factorized node connectivity matrix is low-rank. Nonetheless, the global low-rank assumption does not necessarily hold, especially when the factorized matrix encodes complex node interactions, and the resulting single low-rank embedding matrix is insufficient to capture all the observed connectivity patterns. In this regard, we propose a novel multi-level network embedding framework, BoostNE, which can learn multiple network embedding representations of different granularity, from coarse to fine, without imposing the prevalent global low-rank assumption. The proposed BoostNE method is also in line with the successful gradient boosting method in ensemble learning, as multiple weak embeddings lead to a stronger and more effective one. We assess the effectiveness of the proposed BoostNE framework by comparing it with existing state-of-the-art network embedding methods on various datasets, and the experimental results corroborate the superiority of the proposed BoostNE network embedding framework.
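
    The coarse-to-fine idea described above can be sketched as repeatedly fitting a low-rank factor to whatever the previous levels left unexplained. The sketch below is a minimal illustration under stated assumptions, not the BoostNE algorithm itself: it uses rank-1 alternating least squares in pure Python as the "weak" learner, and the names `rank1_fit` and `boosted_embeddings` are hypothetical.

```python
def frobenius_norm(R):
    """Frobenius norm of a matrix stored as a list of rows."""
    return sum(x * x for row in R for x in row) ** 0.5


def rank1_fit(R, iters=50):
    """Fit R ~ u v^T by alternating least squares (assumed weak learner)."""
    n, m = len(R), len(R[0])
    # Start from the largest-norm row so that R v is nonzero whenever R is.
    v = list(max(R, key=lambda row: sum(x * x for x in row)))
    u = [0.0] * n
    for _ in range(iters):
        vv = sum(x * x for x in v)
        if vv == 0.0:
            break
        u = [sum(R[i][j] * v[j] for j in range(m)) / vv for i in range(n)]
        uu = sum(x * x for x in u)
        if uu == 0.0:
            break
        v = [sum(R[i][j] * u[i] for i in range(n)) / uu for j in range(m)]
    return u, v


def boosted_embeddings(M, levels=3):
    """Fit one rank-1 embedding per level to the current residual.

    Returns the per-level node vectors and the residual norm after each level.
    """
    R = [list(map(float, row)) for row in M]
    embeddings, norms = [], [frobenius_norm(R)]
    for _ in range(levels):
        u, v = rank1_fit(R)
        embeddings.append(u)
        # Subtract what this level explained; the next level sees the rest.
        R = [[R[i][j] - u[i] * v[j] for j in range(len(v))]
             for i in range(len(u))]
        norms.append(frobenius_norm(R))
    return embeddings, norms


if __name__ == "__main__":
    M = [[2, 1, 0], [1, 2, 0], [0, 0, 1]]  # toy connectivity matrix
    _, norms = boosted_embeddings(M, levels=3)
    print(norms)  # residual norm shrinks level by level
```

    Each level's vector could then be concatenated per node to form a multi-level representation; how the levels are combined in practice is a design choice this sketch does not settle.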

    Online Machine Learning in Big Data Streams

    The area of online machine learning in big data streams covers algorithms that (1) are distributed and (2) work from data streams with only a limited possibility to store past data. The first requirement mostly concerns software architectures and efficient algorithms. The second one also imposes nontrivial theoretical restrictions on the modeling methods: in the data stream model, older data is no longer available to revise earlier suboptimal modeling decisions as fresh data arrives. In this article, we provide an overview of distributed software architectures and libraries as well as machine learning models for online learning. We highlight the most important ideas for classification, regression, recommendation, and unsupervised modeling from streaming data, and we show how they are implemented in various distributed data stream processing systems. This article is reference material and not a survey. We do not attempt to be comprehensive in describing all existing methods and solutions; rather, we give pointers to the most important resources in the field. All related sub-fields (online algorithms, online learning, and distributed data processing) are highly active in current research and development, with conceptually new research results and software components emerging at the time of writing. In this article, we refer to several survey results, both for distributed data processing and for online machine learning. Compared to past surveys, our article differs in that we discuss recommender systems in extended detail.

    Factorization tricks for LSTM networks

    We present two simple ways of reducing the number of parameters and accelerating the training of large Long Short-Term Memory (LSTM) networks: the first is "matrix factorization by design" of the LSTM matrix into the product of two smaller matrices, and the second is partitioning of the LSTM matrix, its inputs, and its states into independent groups. Both approaches allow us to train large LSTM networks significantly faster to near state-of-the-art perplexity while using significantly fewer RNN parameters. Comment: accepted to ICLR 2017 Workshop
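
    As a rough illustration of why both tricks shrink the model, the sketch below counts weights under assumed shapes: an LSTM matrix that maps a concatenated input and state of size 2p to four gate pre-activations of size 4n. The shapes, the function names, the rank `r`, and the group count `k` are illustrative assumptions rather than values from the abstract, and biases are ignored.

```python
def full_lstm_params(n, p):
    """Weights in one dense LSTM matrix of assumed shape (2p) x (4n)."""
    return (2 * p) * (4 * n)


def factorized_lstm_params(n, p, r):
    """Weights when the matrix is a product of (2p x r) and (r x 4n) factors."""
    return r * (2 * p) + r * (4 * n)


def grouped_lstm_params(n, p, k):
    """Weights when inputs, states, and the matrix are split into k groups."""
    if (2 * p) % k or (4 * n) % k:
        raise ValueError("k must divide both 2p and 4n")
    # k independent blocks, each of shape (2p/k) x (4n/k).
    return k * ((2 * p // k) * (4 * n // k))


if __name__ == "__main__":
    n = p = 1024
    print("dense     :", full_lstm_params(n, p))
    print("factorized:", factorized_lstm_params(n, p, r=512))
    print("grouped   :", grouped_lstm_params(n, p, k=4))
```

    Under these assumptions the factorized form is smaller whenever r is below (2p)(4n) / (2p + 4n), and the grouped form is smaller by a factor of k.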

    Topic Compositional Neural Language Model

    We propose a Topic Compositional Neural Language Model (TCNLM), a novel method designed to simultaneously capture both the global semantic meaning and the local word ordering structure in a document. The TCNLM learns the global semantic coherence of a document via a neural topic model, and the probability of each learned latent topic is further used to build a Mixture-of-Experts (MoE) language model, where each expert (corresponding to one topic) is a recurrent neural network (RNN) that accounts for learning the local structure of a word sequence. In order to train the MoE model efficiently, a matrix factorization method is applied, by extending each weight matrix of the RNN to be an ensemble of topic-dependent weight matrices. The degree to which each member of the ensemble is used is tied to the document-dependent probability of the corresponding topics. Experimental results on several corpora show that the proposed approach outperforms both a pure RNN-based model and other topic-guided language models. Further, our model yields sensible topics, and also has the capacity to generate meaningful sentences conditioned on given topics. Comment: To appear in AISTATS 2018, updated version

    Latent Variable Modeling with Diversity-Inducing Mutual Angular Regularization

    Latent Variable Models (LVMs) are a large family of machine learning models providing a principled and effective way to extract underlying patterns, structure, and knowledge from observed data. Due to the dramatic growth in the volume and complexity of data, several new challenges have emerged that cannot be effectively addressed by existing LVMs: (1) How to capture long-tail patterns that carry crucial information when the popularity of patterns is distributed in a power-law fashion? (2) How to reduce model complexity and computational cost without compromising the modeling power of LVMs? (3) How to improve the interpretability and reduce the redundancy of discovered patterns? To address these three challenges, we develop a novel regularization technique for LVMs, which controls the geometry of the latent space during learning so that the learned latent components are diverse, in the sense that they are encouraged to be mutually different from each other, in order to achieve long-tail coverage, low redundancy, and better interpretability. We propose a mutual angular regularizer (MAR) to encourage the components in LVMs to have larger mutual angles. The MAR is non-convex and non-smooth, which poses great challenges for optimization. To cope with this issue, we derive a smooth lower bound of the MAR and optimize the lower bound instead. We show that the monotonicity of the lower bound is closely aligned with that of the MAR, which qualifies the lower bound as a desirable surrogate of the MAR. Using neural networks (NNs) as an instance, we analyze how the MAR affects the generalization performance of NNs. On two popular latent variable models, the restricted Boltzmann machine and distance metric learning, we demonstrate that the MAR can effectively capture long-tail patterns, reduce model complexity without sacrificing expressivity, and improve interpretability.
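
    To make "larger mutual angles" concrete, the sketch below computes pairwise angles between component vectors and their mean. The abstract does not give the exact form of the MAR, so this is only an assumed diversity score: the use of the non-obtuse angle (via the absolute cosine) and the names `mutual_angle` and `mean_mutual_angle` are illustrative choices, not the paper's definition.

```python
import math
from itertools import combinations


def mutual_angle(a, b):
    """Non-obtuse angle in radians between two nonzero vectors.

    The absolute cosine treats a and -a as the same direction (an assumption).
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("components must be nonzero")
    # Clamp to guard against rounding slightly above 1.
    cosine = min(1.0, abs(dot) / (norm_a * norm_b))
    return math.acos(cosine)


def mean_mutual_angle(components):
    """Average pairwise mutual angle; larger means more diverse components."""
    angles = [mutual_angle(a, b) for a, b in combinations(components, 2)]
    if not angles:
        raise ValueError("need at least two components")
    return sum(angles) / len(angles)


if __name__ == "__main__":
    redundant = [[1.0, 0.0], [1.0, 0.1], [1.0, -0.1]]
    diverse = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print(mean_mutual_angle(redundant), mean_mutual_angle(diverse))
```

    A regularizer built on such a score would reward the second set of components over the first, since near-parallel components contribute angles close to zero.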

    Hybrid Clustering based on Content and Connection Structure using Joint Nonnegative Matrix Factorization

    We present a hybrid method for latent information discovery on data sets containing both text content and connection structure, based on constrained low-rank approximation. The new method jointly optimizes the Nonnegative Matrix Factorization (NMF) objective function for text clustering and the Symmetric NMF (SymNMF) objective function for graph clustering. We propose an effective algorithm for the joint NMF objective function, based on a block coordinate descent (BCD) framework. The proposed hybrid method discovers content associations via latent connections found using SymNMF. The method can also be applied with a natural conversion of the problem when a hypergraph formulation is used or the content is associated with hypergraph edges. Experimental results show that by simultaneously utilizing both content and connection structure, our hybrid method produces higher quality clustering results than other NMF clustering methods that use content alone (standard NMF) or connection structure alone (SymNMF). We also present some interesting applications to several types of real-world data, such as citation recommendations for papers. The hybrid method proposed in this paper can also be applied to general data expressed with both feature space vectors and pairwise similarities, and can be extended to the case with multiple feature spaces or multiple similarity measures. Comment: 9 pages, Submitted to a conference, Feb. 201

    Sparse convolutional coding for neuronal ensemble identification

    Cell ensembles, originally proposed by Donald Hebb in 1949, are subsets of synchronously firing neurons that were proposed to explain basic firing behavior in the brain. Although they have been studied for many years, no conclusive evidence has yet been presented for their existence and involvement in information processing, so their identification is still a topic of modern research, especially since simultaneous recordings of large neuronal populations have become possible in the past three decades. These large recordings pose a challenge for methods that aim to identify the individual neurons forming cell ensembles, and their time course of activity, within the vast number of recorded spikes. Related work has so far focused on the identification of purely simultaneously firing neurons using techniques such as Principal Component Analysis. In this paper we propose a new algorithm based on sparse convolutional coding which is also able to find ensembles with temporal structure. Application of our algorithm to synthetically generated datasets shows that it outperforms previous work and is able to accurately identify temporal cell ensembles even when those contain overlapping neurons or when strong background noise is present. Comment: 12 pages, 6 figures

    Image Retrieval using Histogram Factorization and Contextual Similarity Learning

    Image retrieval has long been a prominent topic in both computer vision and machine learning. Content-based image retrieval, which tries to retrieve images from a database that are visually similar to a query image, has attracted much attention. The two most important issues in image retrieval are the representation and the ranking of the images. Recently, bag-of-words based methods have shown their power as representation methods. Moreover, nonnegative matrix factorization is also a popular way to represent data samples. In addition, contextual similarity learning has been studied and proven to be an effective method for the ranking problem. However, these technologies have never been used together. In this paper, we develop an effective image retrieval system that represents each image as a histogram using the bag-of-words method, then applies nonnegative matrix factorization to factorize the histograms, and finally learns the ranking scores using the contextual similarity learning method. The proposed system is evaluated on a large-scale image database, and its effectiveness is shown. Comment: 4 pages

    Divide-and-Conquer Learning by Anchoring a Conical Hull

    We reduce a broad class of machine learning problems, usually addressed by EM or sampling, to the problem of finding the $k$ extremal rays spanning the conical hull of a data point set. These $k$ "anchors" lead to a global solution and a more interpretable model that can even outperform EM and sampling on generalization error. To find the $k$ anchors, we propose a novel divide-and-conquer learning scheme, "DCA", that distributes the problem to $\mathcal{O}(k \log k)$ same-type sub-problems on different low-D random hyperplanes, each of which can be solved by any solver. For the 2D sub-problem, we present a non-iterative solver that only needs to compute an array of cosine values and its max/min entries. DCA also provides a faster subroutine for other methods to check whether a point is covered in a conical hull, which improves algorithm design in multiple dimensions and brings significant speedup to learning. We apply our method to GMM, HMM, LDA, NMF and subspace clustering, then show its competitive performance and scalability over other methods on rich datasets. Comment: 26 pages, long version, in updating
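
    For intuition about the 2D sub-problem, the sketch below finds the two extremal rays of a planar point set in a single non-iterative pass. It is not the solver from the paper: it swaps the cosine array for signed angles from `math.atan2`, and it assumes the points span a pointed cone with an opening angle below 180 degrees. The name `extreme_rays_2d` is hypothetical.

```python
import math


def extreme_rays_2d(points):
    """Return indices of the two points on the boundary rays of the cone.

    Assumes all points lie in a cone with opening angle below pi, so signed
    angles measured from the summed direction never wrap around.
    """
    mx = sum(x for x, _ in points)
    my = sum(y for _, y in points)
    if mx == 0 and my == 0:
        raise ValueError("points do not span a pointed cone")
    # Signed angle of each point relative to the summed direction (mx, my):
    # atan2(cross product, dot product).
    angles = [math.atan2(mx * y - my * x, mx * x + my * y) for x, y in points]
    lo = min(range(len(points)), key=angles.__getitem__)
    hi = max(range(len(points)), key=angles.__getitem__)
    return lo, hi


if __name__ == "__main__":
    pts = [(1, 0), (1, 1), (0, 1), (2, 1)]
    print(extreme_rays_2d(pts))  # indices of the clockwise-most and counterclockwise-most points
```

    Every other point is then a nonnegative combination of the two returned points, which is what makes them "anchors" of the 2D conical hull.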

    Collaborative Recommendation with Auxiliary Data: A Transfer Learning View

    Intelligent recommendation technology has been playing an increasingly important role in various industry applications such as e-commerce product promotion and Internet advertisement display. Besides user feedback on items (e.g., numerical ratings), which is usually exploited by typical recommendation algorithms, there is often additional data such as users' social circles and other behaviors. Such auxiliary data is usually related to the users' preferences on items that lie behind the numerical ratings. Collaborative recommendation with auxiliary data (CRAD) aims to leverage such additional information to improve personalization services, and has received much attention from both researchers and practitioners. Transfer learning (TL) is proposed to extract and transfer knowledge from some auxiliary data in order to assist the learning task on some target data. In this paper, we consider the CRAD problem from a transfer learning view, especially how to achieve knowledge transfer from auxiliary data. First, we give a formal definition of transfer learning for CRAD (TL-CRAD). Second, we extend the existing categorization of TL techniques (i.e., adaptive, collective, and integrative knowledge transfer algorithm styles) with three knowledge transfer strategies (i.e., prediction rule, regularization, and constraint). Third, we propose a novel generic knowledge transfer framework for TL-CRAD. Fourth, we describe in detail some representative works for each knowledge transfer strategy of each algorithm style, which are expected to inspire further work. Finally, we conclude the paper with some summary discussions and several future directions.