Predicting response in mobile advertising with Hierarchical Importance-Aware Factorization Machine
Mobile advertising has recently seen dramatic growth, fueled by the global proliferation of mobile phones and devices. The task of predicting ad response is thus crucial for maximizing business revenue. However, ad response data change dynamically over time, and are subject to cold-start situations in which limited history hinders reliable prediction.
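The hierarchical importance-aware model here extends the standard second-order factorization machine. As a point of reference, a minimal NumPy sketch of the plain FM prediction follows (illustrative only: the variable names are ours, and this shows the textbook FM with the O(nk) pairwise identity, not the paper's hierarchical extension):

```python
import numpy as np

def fm_predict(x, w0, w, V):
    """Second-order factorization machine prediction.

    x  : (n,) feature vector
    w0 : scalar bias
    w  : (n,) linear weights
    V  : (n, k) latent factors, one k-dimensional vector per feature
    """
    linear = w0 + w @ x
    # Pairwise term sum_{i<j} <V_i, V_j> x_i x_j via the standard identity:
    # 0.5 * sum_f [(sum_i V_{i,f} x_i)^2 - sum_i V_{i,f}^2 x_i^2]
    s = V.T @ x               # (k,)
    s2 = (V ** 2).T @ (x ** 2)  # (k,)
    return linear + 0.5 * np.sum(s ** 2 - s2)
```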
End-to-End Differentiable Proving
We introduce neural networks for end-to-end differentiable proving of queries
to knowledge bases by operating on dense vector representations of symbols.
These neural networks are constructed recursively by taking inspiration from
the backward chaining algorithm as used in Prolog. Specifically, we replace
symbolic unification with a differentiable computation on vector
representations of symbols using a radial basis function kernel, thereby
combining symbolic reasoning with learning subsymbolic vector representations.
By using gradient descent, the resulting neural network can be trained to infer
facts from a given incomplete knowledge base. It learns to (i) place
representations of similar symbols in close proximity in a vector space, (ii)
make use of such similarities to prove queries, (iii) induce logical rules, and
(iv) use provided and induced logical rules for multi-hop reasoning. We
demonstrate that this architecture outperforms ComplEx, a state-of-the-art
neural link prediction model, on three out of four benchmark knowledge bases
while at the same time inducing interpretable function-free first-order logic
rules. Comment: NIPS 2017 camera-ready.
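As a rough illustration of the differentiable unification step described above, the sketch below scores the match between two symbol embeddings with an RBF kernel and folds it into a running proof success score (a minimal NumPy sketch; the bandwidth parameter and the min-combination are our assumptions based on the abstract, not the paper's exact parameterization):

```python
import numpy as np

def soft_unify(u, v, proof_score, mu=1.0):
    """Differentiable ("soft") unification of two symbol embeddings.

    Instead of testing symbolic equality, score the match with an RBF
    kernel on the embeddings and combine it with the proof score
    accumulated so far along this branch of backward chaining.
    """
    similarity = np.exp(-np.sum((u - v) ** 2) / (2.0 * mu ** 2))
    return min(proof_score, similarity)
```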
Accelerating Iterative Computations for Large-Scale Data Processing
Recent advances in sensing, storage, and networking technologies are creating massive amounts of data at an unprecedented scale and pace. Large-scale data processing is commonly leveraged to make sense of these data, which enables companies, governments, and organizations to make better decisions and brings convenience to our daily lives. However, the massive amount of data involved makes it challenging to perform data processing in a timely manner. On the one hand, huge volumes of data might not even fit into the disk of a single machine. On the other hand, the data mining and machine learning algorithms usually involved in large-scale data processing typically require time-consuming iterative computations. Therefore, it is imperative to efficiently perform iterative computations on large computer clusters or in the cloud using highly parallel, shared-nothing distributed systems.
This research aims to explore new forms of iterative computations that reduce unnecessary computations so as to accelerate large-scale data processing in a distributed environment. We propose the iterative computation transformation for well-known data mining and machine learning algorithms, such as expectation-maximization, nonnegative matrix factorization, belief propagation, and graph algorithms (e.g., PageRank). These algorithms have been used in a wide range of application domains. First, we show how to accelerate expectation-maximization algorithms with frequent updates in a distributed environment. Then, we illustrate the way of efficiently scaling distributed nonnegative matrix factorization with block-wise updates. Next, our approach of scaling distributed belief propagation with prioritized block updates is presented. Last, we illustrate how to efficiently perform distributed incremental computation on evolving graphs.
We elaborate on how to implement these transformed iterative computations on existing distributed programming models, such as the MapReduce-based model, and develop new scalable and efficient distributed programming models and frameworks where necessary. The goal of these supporting distributed frameworks is to relieve programmers of the burden of specifying the transformation of iterative computations and the communication mechanisms, and to automatically optimize the execution of the computation. Our techniques are evaluated extensively to demonstrate their efficiency. While the techniques we propose are presented in the context of specific algorithms, they address challenges commonly faced in many other algorithms.
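For concreteness, the kind of iterative computation being accelerated looks like the single-machine PageRank power iteration below (a baseline sketch in NumPy; the thesis's contribution lies in distributed, incremental, and prioritized variants of such loops, not in this code):

```python
import numpy as np

def pagerank(P, d=0.85, tol=1e-8, max_iter=100):
    """Plain power-iteration PageRank over a column-stochastic
    transition matrix P; iterates until the rank vector converges."""
    n = P.shape[0]
    r = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        r_next = (1.0 - d) / n + d * (P @ r)
        if np.abs(r_next - r).sum() < tol:  # L1 convergence check
            return r_next
        r = r_next
    return r
```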
Scalable and distributed constrained low rank approximations
Low rank approximation is the problem of finding two low rank factors W and H such that rank(WH) << rank(A) and A ≈ WH. These low rank factors W and H can be constrained for meaningful physical interpretation, in which case the problem is referred to as Constrained Low Rank Approximation (CLRA). Like most constrained optimization problems, CLRA can be more computationally expensive than its unconstrained counterpart. A widely used CLRA is Non-negative Matrix Factorization (NMF), which enforces non-negativity constraints on each of its low rank factors W and H. In this thesis, I focus on scalable/distributed CLRA algorithms with constraints such as boundedness and non-negativity for large real-world matrices arising in text, High Definition (HD) video, social networks, and recommender systems. First, I begin with the Bounded Matrix Low Rank Approximation (BMA), which imposes a lower and an upper bound on every element of the low rank approximation. BMA is more challenging than NMF, as it imposes bounds on the product WH rather than on each of the low rank factors W and H. For very large input matrices, we extend our BMA algorithm to Block BMA, which can scale to a large number of processors. In applications such as HD video, where the input matrix to be factored is extremely large, distributed computation is inevitable and network communication becomes a major performance bottleneck. Towards this end, we propose a novel distributed Communication Avoiding NMF (CANMF) algorithm that communicates only the right low rank factor to its neighboring machine. Finally, we present a general distributed HPC-NMF framework that applies HPC techniques to communication-intensive NMF operations and is suitable for a broader class of NMF algorithms.
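As a baseline for the kind of computation being distributed, the sketch below runs single-machine NMF with the classical Lee-Seung multiplicative updates (illustrative only; Block BMA, CANMF, and the HPC-NMF framework restructure and distribute updates of this shape rather than use this code):

```python
import numpy as np

def nmf(A, k, n_iter=200, eps=1e-9):
    """Multiplicative-update NMF: find W >= 0 and H >= 0 with A ~= W @ H,
    minimizing the Frobenius-norm error."""
    m, n = A.shape
    rng = np.random.default_rng(0)
    W, H = rng.random((m, k)), rng.random((k, n))
    for _ in range(n_iter):
        H *= (W.T @ A) / (W.T @ W @ H + eps)  # update right factor
        W *= (A @ H.T) / (W @ H @ H.T + eps)  # update left factor
    return W, H
```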
Combining Representation Learning with Logic for Language Processing
The current state-of-the-art in many natural language processing and
automated knowledge base completion tasks is held by representation learning
methods which learn distributed vector representations of symbols via
gradient-based optimization. They require little or no hand-crafted features,
thus avoiding the need for most preprocessing steps and task-specific
assumptions. However, in many cases representation learning requires a large
amount of annotated training data to generalize well to unseen data. Such
labeled training data is provided by human annotators who often use formal
logic as the language for specifying annotations. This thesis investigates
different combinations of representation learning methods with logic for
reducing the need for annotated training data, and for improving
generalization. Comment: PhD Thesis, University College London, submitted and accepted in 201
Scalable optimization algorithms for recommender systems
Recommender systems have gained significant popularity and are widely used in many e-commerce applications. Predicting user preferences is a key step in providing high-quality recommendations. In practice, however, suggestions made to users cannot be based on user preferences in isolation; a good recommendation engine also needs to account for certain constraints. For instance, an online video rental service that suggests multimedia items (e.g., DVDs) to its customers should consider the availability of DVDs in stock to reduce customer waiting times for accepted recommendations. Moreover, every user should receive a small but sufficient number of suggestions that the user is likely to be interested in.
This thesis aims to develop and implement scalable optimization algorithms that can be used (among other applications) to generate recommendations satisfying certain objectives and constraints like the ones above. State-of-the-art approaches lack efficiency and/or scalability in coping with large real-world instances, which may involve millions of users and items. First, we study large-scale matrix completion in the context of collaborative filtering in recommender systems. For such problems, we propose a set of novel shared-nothing algorithms which are designed to run on a small cluster of commodity nodes and outperform alternative approaches in terms of efficiency, scalability, and memory footprint. Next, we view our recommendation task as a generalized matching problem, and propose the first distributed solution for solving such problems at scale. Our algorithm is designed to run on a small cluster of commodity nodes (or in a MapReduce environment) and has strong approximation guarantees. Our matching algorithm relies on linear programming. To this end, we present an efficient distributed approximation algorithm for mixed packing-covering linear programs, a simple but expressive subclass of linear programs. Our approximation algorithm requires a poly-logarithmic number of passes over the input, is simple, and is well-suited for parallel processing on GPUs, on shared-memory architectures, and on a small cluster of commodity nodes.
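For context, the matrix-completion subproblem studied first is commonly attacked with stochastic gradient descent over the observed entries; a minimal single-machine sketch follows (illustrative NumPy with hypothetical names; the thesis's shared-nothing algorithms parallelize updates of this form across a cluster):

```python
import numpy as np

def sgd_matrix_completion(ratings, n_users, n_items, k=10,
                          lr=0.01, reg=0.05, epochs=20):
    """SGD for matrix completion over observed (user, item, rating)
    triples, with L2 regularization on both factor matrices."""
    rng = np.random.default_rng(0)
    U = 0.1 * rng.standard_normal((n_users, k))
    V = 0.1 * rng.standard_normal((n_items, k))
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - U[u] @ V[i]   # prediction error on one entry
            u_row = U[u].copy()     # gradients use pre-update values
            U[u] += lr * (err * V[i] - reg * U[u])
            V[i] += lr * (err * u_row - reg * V[i])
    return U, V
```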
Scalable algorithms for latent variable models in machine learning
Latent variable modeling (LVM) is a popular approach in many machine learning applications, such as recommender systems and topic modeling, due to its ability to succinctly represent data, even in the presence of many missing entries. Existing learning methods for LVMs, while attractive, are infeasible for the large-scale datasets required in modern big data applications. In addition, such applications often come with various types of side information, such as text descriptions of items and the social network among users in a recommender system. In this thesis, we present scalable learning algorithms for a wide range of latent variable models, such as low-rank matrix factorization and latent Dirichlet allocation. We also develop simple but effective techniques to extend existing LVMs to exploit various types of side information and make better predictions in many machine learning applications such as recommender systems, multi-label learning, and high-dimensional time-series prediction. In addition, we propose a novel approach to the maximum inner product search problem to accelerate the prediction phase of many latent variable models.
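To illustrate the prediction-phase bottleneck that maximum inner product search addresses, the brute-force baseline below ranks every item factor against a user factor (a NumPy sketch of the exact O(nd) scan; the thesis's contribution is an approach that avoids exactly this cost):

```python
import numpy as np

def top_k_mips(query, items, k=10):
    """Exact maximum inner product search: return the indices of the k
    item vectors with the largest inner product against the query."""
    scores = items @ query                  # (n,) inner products
    top = np.argpartition(-scores, k)[:k]   # unordered top-k in O(n)
    return top[np.argsort(-scores[top])]    # sort the k winners by score
```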
- …