7 research outputs found

    Tuning Word2vec for Large Scale Recommendation Systems

    Word2vec is a powerful machine learning tool that emerged from Natural Language Processing (NLP) and is now applied in multiple domains, including recommender systems, forecasting, and network analysis. As Word2vec is often used off the shelf, we address the question of whether the default hyperparameters are suitable for recommender systems. The answer is emphatically no. In this paper, we first elucidate the importance of hyperparameter optimization and show that unconstrained optimization yields an average 221% improvement in hit rate over the default parameters. However, unconstrained optimization leads to hyperparameter settings that are very expensive and not feasible for large scale recommendation tasks. To this end, we demonstrate 138% average improvement in hit rate with a runtime budget-constrained hyperparameter optimization. Furthermore, to make hyperparameter optimization applicable for large scale recommendation problems where the target dataset is too large to search over, we investigate generalizing hyperparameter settings from samples. We show that applying constrained hyperparameter optimization using only a 10% sample of the data still yields a 91% average improvement in hit rate over the default parameters when applied to the full datasets. Finally, we apply hyperparameters learned using our method of constrained optimization on a sample to the Who To Follow recommendation service at Twitter and are able to increase follow rates by 15%.
    Comment: 11 pages, 4 figures, Fourteenth ACM Conference on Recommender Systems

    Hatékony algoritmusok = Efficient algorithms

    With the partial support of the present grant, our group has achieved new results in several fields of computer science, including algebraic and symbolic computation, theoretical computer science, combinatorial optimization, database theory, data mining, and algorithms for the internet. Some of the highlights are:
    -- a description of Gröbner bases and related structures attached to finite sets of points, where the point sets have combinatorial significance,
    -- a comparison of some important models of quantum computation from the perspective of computing power, and the development of new quantum algorithms,
    -- publication of the monograph "Database structures" (in Hungarian), which won the Quality Prize of Akadémiai Kiadó,
    -- significant advances in several directions connected to searching the internet: we proposed new, efficient algorithms for ranking web pages according to personal preferences; we developed an algorithm for the automatic and highly reliable detection of web spam; and we built an experimental search engine,
    -- development and application of new, efficient algorithms for several data mining tasks; the most important application is a model of telecommunication customer behaviour, developed in a joint project with the renowned research group of Albert László Barabási. The New York Times, among others, reported on some of our findings.

    A fast apriori implementation

    The efficiency of frequent itemset mining algorithms is determined mainly by three factors: the way candidates are generated, the data structure that is used, and the implementation details. Most papers focus on the first factor, some describe the underlying data structures, but implementation details are almost always neglected. In this paper we show that the effect of implementation can be more important than the selection of the algorithm. Ideas that seem quite promising may turn out to be ineffective once we descend to the implementation level. We theoretically and experimentally analyze APRIORI, which is the most established algorithm for frequent itemset mining. Several implementations of the algorithm have been put forward in the last decade. Although they are implementations of the very same algorithm, they display large differences in running time and memory requirements. In this paper we describe an implementation of APRIORI that outperforms all implementations known to us. We analyze, theoretically and experimentally, the principal data structure of our solution. This data structure is the main factor in the efficiency of our implementation. Moreover, we present a simple modification of APRIORI that appears to be faster than the original algorithm.
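
    For orientation, the level-wise scheme that APRIORI follows (generate candidates, prune, count support) can be sketched as below. This is a minimal textbook-style illustration, not the implementation described in the abstract: it counts support by scanning plain Python sets rather than using the paper's specialized data structure, and the function name `apriori`, its parameters, and the toy transactions are assumptions made for this example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support count} for all frequent itemsets.

    Illustrative sketch only: support counting is a naive scan over sets,
    which is far slower than a tuned implementation.
    """
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count individual items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        prev = set(frequent)
        # Candidate generation: join frequent (k-1)-itemsets, then prune
        # any candidate that has an infrequent (k-1)-subset.
        candidates = set()
        for a in prev:
            for b in prev:
                union = a | b
                if len(union) == k and all(
                    frozenset(sub) in prev
                    for sub in combinations(union, k - 1)
                ):
                    candidates.add(union)

        # Support counting: one pass over the transactions per level.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1

        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1

    return result


if __name__ == "__main__":
    # Hypothetical toy data, chosen only to exercise the function.
    baskets = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
    for itemset, support in sorted(apriori(baskets, 3).items(), key=lambda kv: sorted(kv[0])):
        print(sorted(itemset), support)
```

    The inner support-counting loop is the part that dominates running time in this naive form, which is consistent with the abstract's point that data structure and implementation choices can matter more than the choice of algorithm.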