769 research outputs found
Graph Convolutional Neural Networks for Web-Scale Recommender Systems
Recent advancements in deep neural networks for graph-structured data have
led to state-of-the-art performance on recommender system benchmarks. However,
making these methods practical and scalable to web-scale recommendation tasks
with billions of items and hundreds of millions of users remains a challenge.
Here we describe a large-scale deep recommendation engine that we developed and
deployed at Pinterest. We develop a data-efficient Graph Convolutional Network
(GCN) algorithm PinSage, which combines efficient random walks and graph
convolutions to generate embeddings of nodes (i.e., items) that incorporate
both graph structure as well as node feature information. Compared to prior GCN
approaches, we develop a novel method based on highly efficient random walks to
structure the convolutions and design a novel training strategy that relies on
harder-and-harder training examples to improve robustness and convergence of
the model. We also develop an efficient MapReduce model inference algorithm to
generate embeddings using a trained model. We deploy PinSage at Pinterest and
train it on 7.5 billion examples on a graph with 3 billion nodes representing
pins and boards, and 18 billion edges. According to offline metrics, user
studies and A/B tests, PinSage generates higher-quality recommendations than
comparable deep learning and graph-based alternatives. To our knowledge, this
is the largest application of deep graph embeddings to date and paves the way
for a new generation of web-scale recommender systems based on graph
convolutional architectures.Comment: KDD 201
mARC: Memory by Association and Reinforcement of Contexts
This paper introduces the memory by Association and Reinforcement of Contexts
(mARC). mARC is a novel data modeling technology rooted in the second
quantization formulation of quantum mechanics. It is an all-purpose incremental
and unsupervised data storage and retrieval system which can be applied to all
types of signal or data, structured or unstructured, textual or not. mARC can
be applied to a wide range of information clas-sification and retrieval
problems like e-Discovery or contextual navigation. It can also for-mulated in
the artificial life framework a.k.a Conway "Game Of Life" Theory. In contrast
to Conway approach, the objects evolve in a massively multidimensional space.
In order to start evaluating the potential of mARC we have built a mARC-based
Internet search en-gine demonstrator with contextual functionality. We compare
the behavior of the mARC demonstrator with Google search both in terms of
performance and relevance. In the study we find that the mARC search engine
demonstrator outperforms Google search by an order of magnitude in response
time while providing more relevant results for some classes of queries
Recommended from our members
Parallelizing support vector machines for scalable image annotation
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University.Machine learning techniques have facilitated image retrieval by automatically classifying and annotating images with keywords. Among them Support Vector Machines (SVMs) are used extensively due to their generalization properties. However, SVM training is notably a computationally intensive process especially when the training dataset is large.
In this thesis distributed computing paradigms have been investigated to speed up SVM training, by partitioning a large training dataset into small data chunks and process each chunk in parallel utilizing the resources of a cluster of computers. A resource aware parallel SVM algorithm is introduced for large scale image annotation in parallel using a cluster of computers. A genetic algorithm based load balancing scheme is designed to optimize the performance of the algorithm in heterogeneous computing environments.
SVM was initially designed for binary classifications. However, most classification problems arising in domains such as image annotation usually involve more than two classes. A resource aware parallel multiclass SVM algorithm for large scale image annotation in parallel using a cluster of computers is introduced.
The combination of classifiers leads to substantial reduction of classification error in a wide range of applications. Among them SVM ensembles with bagging is shown to outperform a single SVM in terms of classification accuracy. However, SVM ensembles training are notably a computationally intensive process especially when the number replicated samples based on bootstrapping is large. A distributed SVM ensemble algorithm for image annotation is introduced which re-samples the training data based on bootstrapping and training SVM on each sample in parallel using a cluster of computers.
The above algorithms are evaluated in both experimental and simulation environments showing that the distributed SVM algorithm, distributed multiclass SVM algorithm, and distributed SVM ensemble algorithm, reduces the training time significantly while maintaining a high level of accuracy in classifications
Koneoppimiskehys petrokemianteollisuuden sovelluksille
Machine learning has many potentially useful applications in process industry, for example in process monitoring and control. Continuously accumulating process data and the recent development in software and hardware that enable more advanced machine learning, are fulfilling the prerequisites of developing and deploying process automation integrated machine learning applications which improve existing functionalities or even implement artificial intelligence.
In this master's thesis, a framework is designed and implemented on a proof-of-concept level, to enable easy acquisition of process data to be used with modern machine learning libraries, and to also enable scalable online deployment of the trained models. The literature part of the thesis concentrates on studying the current state and approaches for digital advisory systems for process operators, as a potential application to be developed on the machine learning framework.
The literature study shows that the approaches for process operators' decision support tools have shifted from rule-based and knowledge-based methods to machine learning. However, no standard methods can be concluded, and most of the use cases are quite application-specific.
In the developed machine learning framework, both commercial software and open source components with permissive licenses are used. Data is acquired over OPC UA and then processed in Python, which is currently almost the de facto standard language in data analytics. Microservice architecture with containerization is used in the online deployment, and in a qualitative evaluation, it proved to be a versatile and functional solution.Koneoppimisella voidaan osoittaa olevan useita hyödyllisiä käyttökohteita prosessiteollisuudessa, esimerkiksi prosessinohjaukseen liittyvissä sovelluksissa. Jatkuvasti kerääntyvä prosessidata ja toisaalta koneoppimiseen soveltuvien ohjelmistojen sekä myös laitteistojen viimeaikainen kehitys johtavat tilanteeseen, jossa prosessiautomaatioon liitettyjen koneoppimissovellusten avulla on mahdollista parantaa nykyisiä toiminnallisuuksia tai jopa toteuttaa tekoälysovelluksia.
Tässä diplomityössä suunniteltiin ja toteutettiin prototyypin tasolla koneoppimiskehys, jonka avulla on helppo käyttää prosessidataa yhdessä nykyaikaisten koneoppimiskirjastojen kanssa. Kehys mahdollistaa myös koneopittujen mallien skaalautuvan käyttöönoton. Diplomityön kirjallisuusosa keskittyy prosessioperaattoreille tarkoitettujen digitaalisten avustajajärjestelmien nykytilaan ja toteutustapoihin, avustajajärjestelmän tai sen päätöstukijärjestelmän ollessa yksi mahdollinen koneoppimiskehyksen päälle rakennettava ohjelma.
Kirjallisuustutkimuksen mukaan prosessioperaattorin päätöstukijärjestelmien taustalla olevat menetelmät ovat yhä useammin koneoppimiseen perustuvia, aiempien sääntö- ja tietämyskantoihin perustuvien menetelmien sijasta. Selkeitä yhdenmukaisia lähestymistapoja ei kuitenkaan ole helposti pääteltävissä kirjallisuuden perusteella. Lisäksi useimmat tapausesimerkit ovat sovellettavissa vain kyseisissä erikoistapauksissa.
Kehitetyssä koneoppimiskehyksessä on käytetty sekä kaupallisia että avoimen lähdekoodin komponentteja. Prosessidata haetaan OPC UA -protokollan avulla, ja sitä on mahdollista käsitellä Python-kielellä, josta on muodostunut lähes de facto -standardi data-analytiikassa. Kehyksen käyttöönottokomponentit perustuvat mikropalveluarkkitehtuuriin ja konttiteknologiaan, jotka osoittautuivat laadullisessa testauksessa monipuoliseksi ja toimivaksi toteutustavaksi
- …