237 research outputs found

    Fast Matrix Factorization for Online Recommendation with Implicit Feedback

    This paper contributes improvements on both the effectiveness and efficiency of Matrix Factorization (MF) methods for implicit feedback. We highlight two critical issues of existing works. First, due to the large space of unobserved feedback, most existing works resort to assigning a uniform weight to the missing data to reduce computational complexity. However, such a uniform assumption is invalid in real-world settings. Second, most methods are also designed in an offline setting and fail to keep up with the dynamic nature of online data. We address the above two issues in learning MF models from implicit feedback. We first propose to weight the missing data based on item popularity, which is more effective and flexible than the uniform-weight assumption. However, such a non-uniform weighting poses an efficiency challenge in learning the model. To address this, we specifically design a new learning algorithm based on the element-wise Alternating Least Squares (eALS) technique, for efficiently optimizing an MF model with variably-weighted missing data. We exploit this efficiency to then seamlessly devise an incremental update strategy that instantly refreshes an MF model given new feedback. Through comprehensive experiments on two public datasets in both offline and online protocols, we show that our eALS method consistently outperforms state-of-the-art implicit MF methods. Our implementation is available at https://github.com/hexiangnan/sigir16-eals.
    Comment: 10 pages, 8 figures
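
    The popularity-based weighting of missing data described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name, the overall weight `c0`, and the smoothing exponent `alpha` are hypothetical, and the normalized power-of-frequency form is only one plausible way to make the weight grow with item popularity.

```python
def popularity_weights(interactions, c0=1.0, alpha=0.5):
    """Return one missing-data weight per item, growing with item popularity.

    interactions: list of rows (users), each a list of 0/1 flags per item.
    c0 (total weight) and alpha (smoothing exponent) are hypothetical knobs.
    """
    n_items = len(interactions[0])
    # Popularity of an item = number of users who interacted with it.
    counts = [sum(row[j] for row in interactions) for j in range(n_items)]
    smoothed = [c ** alpha for c in counts]
    total = sum(smoothed)
    if total == 0:
        # No feedback at all: fall back to the uniform weighting.
        return [c0 / n_items] * n_items
    return [c0 * s / total for s in smoothed]


# Toy data: 4 users, 3 items; item 0 is the most popular.
R = [
    [1, 0, 1],
    [1, 0, 0],
    [1, 1, 0],
    [1, 0, 0],
]
weights = popularity_weights(R, c0=1.0, alpha=1.0)
```

    With `alpha=0` every item with any value of popularity receives the same weight, which recovers the uniform assumption the abstract argues against.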

    Eighty years of food-web response to interannual variation in discharge recorded in river diatom frustules from an ocean sediment core.

    Little is known about the importance of food-web processes as controls of river primary production due to the paucity of both long-term studies and of depositional environments which would allow retrospective fossil analysis. To investigate how freshwater algal production in the Eel River, northern California, varied over eight decades, we quantified siliceous shells (frustules) of freshwater diatoms from a well-dated undisturbed sediment core in a nearshore marine environment. Abundances of freshwater diatom frustules exported to Eel Canyon sediment from 1988 to 2001 were positively correlated with annual biomass of Cladophora surveyed over these years in upper portions of the Eel basin. Over 28 years of contemporary field research, peak algal biomass was generally higher in summers following bankfull, bed-scouring winter floods. Field surveys and experiments suggested that bed-mobilizing floods scour away overwintering grazers, releasing algae from spring and early summer grazing. During wet years, growth conditions for algae could also be enhanced by increased nutrient loading from the watershed, or by sustained summer base flows. Total annual rainfall and frustule densities in laminae over a longer 83-year record were weakly and negatively correlated, however, suggesting that positive effects of floods on annual algal production were primarily mediated by "top-down" (consumer release) rather than "bottom-up" (growth promoting) controls.

    AutoGraph: Imperative-style Coding with Graph-based Performance

    There is a perceived trade-off between machine learning code that is easy to write, and machine learning code that is scalable or fast to execute. In machine learning, imperative style libraries like Autograd and PyTorch are easy to write, but suffer from high interpretive overhead and are not easily deployable in production or mobile settings. Graph-based libraries like TensorFlow and Theano benefit from whole-program optimization and can be deployed broadly, but make expressing complex models more cumbersome. We describe how the use of staged programming in Python, via source code transformation, offers a midpoint between these two library design patterns, capturing the benefits of both. A key insight is to delay all type-dependent decisions until runtime, via dynamic dispatch. We instantiate these principles in AutoGraph, a software system that improves the programming experience of the TensorFlow library, and demonstrate usability improvements with no loss in performance compared to native TensorFlow graphs. We also show that our system is backend agnostic, and demonstrate targeting an alternate IR with characteristics not found in TensorFlow graphs.

    World citation and collaboration networks: uncovering the role of geography in science

    Modern information and communication technologies, especially the Internet, have diminished the role of spatial distances and territorial boundaries on the access and transmissibility of information. This has enabled closer collaboration and internationalization among scientists. Nevertheless, geography remains an important factor affecting the dynamics of science. Here we present a systematic analysis of citation and collaboration networks between cities and countries, by assigning papers to the geographic locations of their authors' affiliations. The citation flows as well as the collaboration strengths between cities decrease with the distance between them and follow gravity laws. In addition, the total research impact of a country grows linearly with the amount of national funding for research & development. However, the average impact reveals a peculiar threshold effect: the scientific output of a country may reach an impact larger than the world average only if the country invests more than about 100,000 USD per researcher annually.
    Comment: Published version. 9 pages, 5 figures + Appendix. The world citation and collaboration networks at both city and country level are available at http://becs.aalto.fi/~rajkp/datasets.htm

    Modeling Methane Adsorption in Interpenetrating Porous Polymer Networks

    Porous polymer networks (PPNs) are a class of porous materials of particular interest in a variety of energy-related applications because of their stability, high surface areas, and gas uptake capacities. Computationally derived structures for five recently synthesized PPN frameworks, PPN-2, -3, -4, -5, and -6, were generated for various topologies, optimized using semiempirical electronic structure methods, and evaluated using classical grand-canonical Monte Carlo simulations. We show that a key factor in modeling the methane uptake performance of these materials is whether, and how, these material frameworks interpenetrate and demonstrate a computational approach for predicting the presence, degree, and nature of interpenetration in PPNs that enables the reproduction of experimental adsorption data. © 2013 American Chemical Society

    One-Pass Ranking Models for Low-Latency Product Recommendations

    Purchase logs collected in e-commerce platforms provide rich information about customer preferences. These logs can be leveraged to improve the quality of product recommendations by feeding them to machine-learned ranking models. However, a variety of deployment constraints limit the naïve applicability of machine learning to this problem. First, the amount and the dimensionality of the data make in-memory learning simply not possible. Second, the drift of customers’ preferences over time requires retraining the ranking model regularly with freshly collected data. This limits the time that is available for training to prohibitively short intervals. Third, ranking in real-time is necessary whenever the query complexity prevents us from caching the predictions. This constraint requires minimizing prediction time (or equiva

    HXE 108 - APPROACHES TO ENGLISH LITERATURE OCT 04.

    Recent years have witnessed a persistent interest in generating pseudo test collections, both for training and evaluation purposes. We describe a method for generating queries and relevance judgments for microblog search in an unsupervised way. Our starting point is this intuition: tweets with a hashtag are relevant to the topic covered by the hashtag and hence to a suitable query derived from the hashtag. Our baseline method selects all commonly used hashtags, and all associated tweets as relevance judgments; we then generate a query from these tweets. Next, we generate a timestamp for each query, allowing us to use temporal information in the training process. We then enrich the generation process with knowledge derived from an editorial test collection for microblog search. We use our pseudo test collections in two ways. First, we tune parameters of a variety of well-known retrieval methods on them. Correlations with parameter sweeps on an editorial test collection are high on average, with a large variance over retrieval algorithms. Second, we use the pseudo test collections as training sets in a learning to rank scenario. Performance close to training on an editorial test collection is achieved in many cases. Our results demonstrate the utility of tuning and training microblog search algorithms on automatically generated training material.
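
    The baseline idea above (commonly used hashtags act as topics, and tweets carrying them act as relevance judgments) can be sketched as follows. This is an illustrative sketch only: the function name, the `min_tweets` threshold standing in for "commonly used", and the toy tweets are all assumptions, and the query-generation and timestamp steps from the abstract are not reproduced.

```python
import re
from collections import defaultdict

# Matches a '#' followed by word characters; a simplifying assumption
# about what counts as a hashtag.
HASHTAG = re.compile(r"#(\w+)")


def build_pseudo_judgments(tweets, min_tweets=2):
    """Map each sufficiently common hashtag to the ids of tweets using it.

    tweets: iterable of (tweet_id, text) pairs.
    min_tweets: hypothetical cut-off for a hashtag to count as common.
    """
    by_tag = defaultdict(list)
    for tweet_id, text in tweets:
        # Lower-case and de-duplicate so a tweet is judged once per topic.
        for tag in sorted({t.lower() for t in HASHTAG.findall(text)}):
            by_tag[tag].append(tweet_id)
    return {tag: ids for tag, ids in by_tag.items() if len(ids) >= min_tweets}


tweets = [
    (1, "Great keynote at #SIGIR today"),
    (2, "Slides from my #sigir talk are up"),
    (3, "Rainy morning #coffee"),
]
judgments = build_pseudo_judgments(tweets)
```

    Here only `sigir` survives the threshold, so tweets 1 and 2 become its pseudo-relevant documents while the single `coffee` tweet is discarded.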

    Sublinear Algorithms for Approximating String Compressibility

    We raise the question of approximating the compressibility of a string with respect to a fixed compression scheme, in sublinear time. We study this question in detail for two popular lossless compression schemes: run-length encoding (RLE) and a variant of Lempel-Ziv (LZ77), and present sublinear algorithms for approximating compressibility with respect to both schemes. We also give several lower bounds that show that our algorithms for both schemes cannot be improved significantly. Our investigation of LZ77 yields results whose interest goes beyond the initial questions we set out to study. In particular, we prove combinatorial structural lemmas that relate the compressibility of a string with respect to LZ77 to the number of distinct short substrings contained in it (its ℓth subword complexity, for small ℓ). In addition, we show that approximating the compressibility with respect to LZ77 is related to approximating the support size of a distribution.
    National Science Foundation (U.S.) (Award CCF-1065125); National Science Foundation (U.S.) (Award CCF-0728645); Marie Curie International Reintegration Grant PIRG03-GA-2008-231077; Israel Science Foundation (Grant 1147/09); Israel Science Foundation (Grant 1675/09)
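
    To make the sublinear idea concrete for run-length encoding, here is a minimal sampling sketch. It is not the paper's algorithm: it relies only on the fact that the number of runs equals one plus the number of adjacent positions whose characters differ, and estimates that count from a random sample of positions. The function names, the `samples` budget, and the fixed `seed` are illustrative assumptions.

```python
import random


def exact_run_count(s):
    """Number of runs in s, computed with a full linear scan."""
    if not s:
        return 0
    return 1 + sum(1 for i in range(len(s) - 1) if s[i] != s[i + 1])


def estimate_run_count(s, samples=200, seed=0):
    """Estimate the number of runs by inspecting only `samples` positions."""
    n = len(s)
    if n < 2:
        return float(n)
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        i = rng.randrange(n - 1)  # random adjacent pair (i, i + 1)
        if s[i] != s[i + 1]:
            hits += 1
    # Scale the sampled boundary rate up to all n - 1 adjacent pairs.
    return 1 + (n - 1) * hits / samples


text = "a" * 500 + "b" * 500
exact = exact_run_count(text)
estimate = estimate_run_count(text, samples=200, seed=0)
```

    The estimate reads a fixed number of positions regardless of string length; its error is additive, so it is least informative on highly compressible strings with very few runs, such as `text` above.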