Fast Matrix Factorization for Online Recommendation with Implicit Feedback

Author: Levy O.
Ling G.
Marlin B. M.
Mikolov T.
Rendle S.
Sculley D.
Publication date: 16/08/2017

This paper contributes improvements on both the effectiveness and efficiency of Matrix Factorization (MF) methods for implicit feedback. We highlight two critical issues of existing works. First, due to the large space of unobserved feedback, most existing works resort to assigning a uniform weight to the missing data to reduce computational complexity. However, such a uniform assumption is invalid in real-world settings. Second, most methods are also designed in an offline setting and fail to keep up with the dynamic nature of online data. We address the above two issues in learning MF models from implicit feedback. We first propose to weight the missing data based on item popularity, which is more effective and flexible than the uniform-weight assumption. However, such non-uniform weighting poses an efficiency challenge in learning the model. To address this, we specifically design a new learning algorithm based on the element-wise Alternating Least Squares (eALS) technique, for efficiently optimizing an MF model with variably-weighted missing data. We then exploit this efficiency to devise an incremental update strategy that instantly refreshes an MF model given new feedback. Through comprehensive experiments on two public datasets in both offline and online protocols, we show that our eALS method consistently outperforms state-of-the-art implicit MF methods. Our implementation is available at https://github.com/hexiangnan/sigir16-eals.

Comment: 10 pages, 8 figures
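The popularity-based weighting of missing data described in the abstract above can be sketched as follows. The abstract does not give the exact formula, so the power-law form used here (weight proportional to interaction count raised to `alpha`, scaled by an overall budget `c0`) is an assumption, as are the function name and default values.

```python
def popularity_weights(interaction_counts, c0=1.0, alpha=0.5):
    """Return one missing-data weight per item.

    Assumed form: w_i = c0 * f_i**alpha / sum_j f_j**alpha, where f_i is the
    number of observed interactions with item i. alpha = 0 recovers the
    uniform weighting that the abstract criticizes.
    """
    powered = [count ** alpha for count in interaction_counts]
    total = sum(powered)
    if total == 0:
        # No interactions at all: nothing to distribute the weight over.
        return [0.0 for _ in powered]
    return [c0 * p / total for p in powered]


if __name__ == "__main__":
    # Three items with 100, 25 and 0 observed interactions.
    print(popularity_weights([100, 25, 0]))
```

Under this assumed form, a popular item that a user has not interacted with receives a larger weight, on the reasoning that the user was more likely to have seen it and passed it over.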

arXiv.org e-Print Archive

Crossref

Eighty years of food-web response to interannual variation in discharge recorded in river diatom frustules from an ocean sediment core.

Author: Drexler Tina M
Lowe Rex L
Nittrouer Charles A
Power Mary E
Sculley John B
Publication venue: eScholarship, University of California
Publication date: 05/09/2017

Little is known about the importance of food-web processes as controls of river primary production due to the paucity of both long-term studies and of depositional environments which would allow retrospective fossil analysis. To investigate how freshwater algal production in the Eel River, northern California, varied over eight decades, we quantified siliceous shells (frustules) of freshwater diatoms from a well-dated undisturbed sediment core in a nearshore marine environment. Abundances of freshwater diatom frustules exported to Eel Canyon sediment from 1988 to 2001 were positively correlated with annual biomass of Cladophora surveyed over these years in upper portions of the Eel basin. Over 28 years of contemporary field research, peak algal biomass was generally higher in summers following bankfull, bed-scouring winter floods. Field surveys and experiments suggested that bed-mobilizing floods scour away overwintering grazers, releasing algae from spring and early summer grazing. During wet years, growth conditions for algae could also be enhanced by increased nutrient loading from the watershed, or by sustained summer base flows. Total annual rainfall and frustule densities in laminae over a longer 83-year record were weakly and negatively correlated, however, suggesting that positive effects of floods on annual algal production were primarily mediated by "top-down" (consumer release) rather than "bottom-up" (growth promoting) controls.

Crossref

eScholarship - University of California

AutoGraph: Imperative-style Coding with Graph-based Performance

Author: Moldovan Dan
Decker James M
Wang Fei
Johnson Andrew A
Lee Brian K
Nado Zachary
Sculley D
Rompf Tiark
Wiltschko Alexander B
Publication date: 26/03/2019

There is a perceived trade-off between machine learning code that is easy to write, and machine learning code that is scalable or fast to execute. In machine learning, imperative-style libraries like Autograd and PyTorch are easy to write, but suffer from high interpretive overhead and are not easily deployable in production or mobile settings. Graph-based libraries like TensorFlow and Theano benefit from whole-program optimization and can be deployed broadly, but make expressing complex models more cumbersome. We describe how the use of staged programming in Python, via source code transformation, offers a midpoint between these two library design patterns, capturing the benefits of both. A key insight is to delay all type-dependent decisions until runtime, via dynamic dispatch. We instantiate these principles in AutoGraph, a software system that improves the programming experience of the TensorFlow library, and demonstrate usability improvements with no loss in performance compared to native TensorFlow graphs. We also show that our system is backend agnostic, and demonstrate targeting an alternate IR with characteristics not found in TensorFlow graphs.
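The idea of delaying type-dependent decisions until runtime via dynamic dispatch, mentioned in the abstract above, can be illustrated with a toy sketch. This is not AutoGraph's actual implementation or API: the `Node` class, the `if_stmt` helper, and the `pick` function are hypothetical names invented for this illustration, and the "graph" is just a nested Python object.

```python
class Node:
    """A symbolic value in a toy graph IR (hypothetical, not TensorFlow)."""

    def __init__(self, op, *inputs):
        self.op = op
        self.inputs = inputs

    def __repr__(self):
        return f"{self.op}({', '.join(map(repr, self.inputs))})"


def if_stmt(cond, then_fn, else_fn):
    """Functional form of `if`, dispatching on the runtime type of `cond`."""
    if isinstance(cond, Node):
        # Symbolic condition: stage both branches into a graph node.
        return Node("cond", cond, then_fn(), else_fn())
    # Ordinary Python value: run exactly one branch, as plain `if` would.
    return then_fn() if cond else else_fn()


def pick(x, is_positive):
    # What a source transformation might emit for:
    #     if is_positive: return x
    #     else: return 0
    return if_stmt(is_positive, lambda: x, lambda: 0)


if __name__ == "__main__":
    print(pick(5, True))                   # plain Python execution
    print(pick(Node("x"), Node("gt0")))    # staged into a toy graph
```

The same rewritten function serves both cases, which is the point of deferring the decision: the transformed source does not need to know ahead of time whether it will receive concrete or symbolic values.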

arXiv.org e-Print Archive

Registro Nacional de Trabajos de Investigación y Proyectos

Universidad Nacional de Educacion Enrique Guzmán y Valle: Repositorio UNE

World citation and collaboration networks: uncovering the role of geography in science

Author: A Agrawal
A Chandra
A Petersen
B Jones
D Liben-Nowell
D Sculley
F Havemann
F Radicchi
G Krings
J Adams
J Johnes
JD Adams
JS Katz
K Bhattacharya
K Börner
K Frenken
L Bettencourt
L Georghiou
M Barthélemy
M Gastner
M Rosvall
P Kaluza
R Lambiotte
R Sinnott
RK Pan
S Hennemann
S Lee
S Teasley
S Wuchty
T Banchoff
TA Finholt
TS Rosenblat
W Glänzel
X Gabaix
Y Okubo
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2012

Modern information and communication technologies, especially the Internet, have diminished the role of spatial distances and territorial boundaries on the access and transmissibility of information. This has enabled closer collaboration among scientists and greater internationalization. Nevertheless, geography remains an important factor affecting the dynamics of science. Here we present a systematic analysis of citation and collaboration networks between cities and countries, by assigning papers to the geographic locations of their authors' affiliations. The citation flows as well as the collaboration strengths between cities decrease with the distance between them and follow gravity laws. In addition, the total research impact of a country grows linearly with the amount of national funding for research & development. However, the average impact reveals a peculiar threshold effect: the scientific output of a country may reach an impact larger than the world average only if the country invests more than about 100,000 USD per researcher annually.

Comment: Published version. 9 pages, 5 figures + Appendix. The world citation and collaboration networks at both city and country level are available at http://becs.aalto.fi/~rajkp/datasets.htm

arXiv.org e-Print Archive

Crossref

PubMed Central

Aaltodoc Publication Archive


In-street wind direction variability in the vicinity of a busy intersection in central London

Author: A Dobre
A Robins
A Scaperdas
Adrian Dobre
AF Barlow
Ahmed A. Balogun
Alan G. Robins
Alison S. Tomlin
AS Tomlin
CR Wood
Curtis R. Wood
D Martin
Damien Martin
DE Shallcross
Dudley E. Shallcross
FF DePaul
GT Johnson
H Sugawara
I Eliasson
ID Longley
J Zamurs
Janet F. Barlow
Justin J. N. Lingard
JWD Boddy
L Soulhac
M Carpentieri
M Nielson
M Roth
MK Ahmad
MR Raupach
MW Rotach
NS Dixon
P Kastner-Klein
P Klein
RD Sculley
RE Britter
RJ Smalley
Robert J. Smalley
RW Macdonald
S Arnold
S Vardoulakis
Sam J. Arnold
SR Hanna
Stephen E. Belcher
TR Oke
X Wang
Y Nakamura
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/06/2010

We present results from fast-response wind measurements within and above a busy intersection between two street canyons (Marylebone Road and Gloucester Place) in Westminster, London taken as part of the DAPPLE (Dispersion of Air Pollution and Penetration into the Local Environment; www.dapple.org.uk) 2007 field campaign. The data reported here were collected using ultrasonic anemometers on the roof-top of a building adjacent to the intersection and at two heights on a pair of lamp-posts on opposite sides of the intersection. Site characteristics, data analysis and the variation of intersection flow with the above-roof wind direction (θref) are discussed. Evidence of both flow channelling and recirculation was identified within the canyon, only a few metres from the intersection for along-street and across-street roof-top winds respectively. Results also indicate that for oblique roof-top flows, the intersection flow is a complex combination of bifurcated channelled flows, recirculation and corner vortices. Asymmetries in local building geometry around the intersection and small changes in the background wind direction (changes in 15-min mean θref of 5–10 degrees) were also observed to have profound influences on the behaviour of intersection flow patterns. Consequently, short time-scale variability in the background flow direction can lead to highly scattered in-street mean flow angles, masking the true multi-modal features of the flow and thus further complicating modelling challenges.

Central Archive at the University of Reading

Crossref

Surrey Research Insight

Modeling Methane Adsorption in Interpenetrating Porous Polymer Networks

Author: Haranczyk M.
Martin R. L.
Sculley J. P.
Shahrak M. N.
Simon C. M.
Smit B.
Swisher J. A.
Zhou H. C.
Publication venue: 'American Chemical Society (ACS)'
Publication date: 14/08/2014

Porous polymer networks (PPNs) are a class of porous materials of particular interest in a variety of energy-related applications because of their stability, high surface areas, and gas uptake capacities. Computationally derived structures for five recently synthesized PPN frameworks, PPN-2, -3, -4, -5, and -6, were generated for various topologies, optimized using semiempirical electronic structure methods, and evaluated using classical grand-canonical Monte Carlo simulations. We show that a key factor in modeling the methane uptake performance of these materials is whether, and how, these material frameworks interpenetrate and demonstrate a computational approach for predicting the presence, degree, and nature of interpenetration in PPNs that enables the reproduction of experimental adsorption data. © 2013 American Chemical Society

Infoscience - École polytechnique fédérale de Lausanne

One-Pass Ranking Models for Low-Latency Product Recommendations

Author: Bottou L.
Burges C. J. C.
Büttcher S.
Croft B.
Herbrich R.
Hoffman M.
Hu C.
McMahan B. H.
Mohan A.
Negahban S.
Sculley D.
Snoek J.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 03/12/2015

Purchase logs collected in e-commerce platforms provide rich information about customer preferences. These logs can be leveraged to improve the quality of product recommendations by feeding them to machine-learned ranking models. However, a variety of deployment constraints limit the naïve applicability of machine learning to this problem. First, the amount and the dimensionality of the data make in-memory learning simply not possible. Second, the drift of customers' preferences over time requires retraining the ranking model regularly with freshly collected data. This limits the time that is available for training to prohibitively short intervals. Third, ranking in real-time is necessary whenever the query complexity prevents us from caching the predictions. This constraint requires minimizing prediction time (or equiva

CiteSeerX

Crossref

HXE 108 - APPROACHES TO ENGLISH LITERATURE OCT 04.

Author: Bron M.
Croft B.
Huurnink B.
Lee G. G.
Lin J.
Macdonald C.
Manning C. D.
Mohan A.
Robertson S.
Sculley D.
Tague J.
Publication date: 01/10/2004

Recent years have witnessed a persistent interest in generating pseudo test collections, both for training and evaluation purposes. We describe a method for generating queries and relevance judgments for microblog search in an unsupervised way. Our starting point is this intuition: tweets with a hashtag are relevant to the topic covered by the hashtag and hence to a suitable query derived from the hashtag. Our baseline method selects all commonly used hashtags, and all associated tweets as relevance judgments; we then generate a query from these tweets. Next, we generate a timestamp for each query, allowing us to use temporal information in the training process. We then enrich the generation process with knowledge derived from an editorial test collection for microblog search. We use our pseudo test collections in two ways. First, we tune parameters of a variety of well-known retrieval methods on them. Correlations with parameter sweeps on an editorial test collection are high on average, with a large variance over retrieval algorithms. Second, we use the pseudo test collections as training sets in a learning-to-rank scenario. Performance close to training on an editorial test collection is achieved in many cases. Our results demonstrate the utility of tuning and training microblog search algorithms on automatically generated training material.
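The baseline described in the abstract above (select commonly used hashtags, treat their tweets as relevant, derive a query from those tweets) can be sketched roughly as follows. The abstract does not say how "commonly used" is decided or how queries are generated, so the `min_tweets` threshold and the choice of the most frequent non-hashtag terms as the query are assumptions made purely for illustration, as is the function name.

```python
import re
from collections import Counter, defaultdict

HASHTAG = re.compile(r"#(\w+)")


def build_pseudo_collection(tweets, min_tweets=2, query_terms=2):
    """Map each common hashtag to a pseudo query and relevance judgments.

    tweets: iterable of (tweet_id, text) pairs.
    min_tweets: assumed cut-off for a hashtag to count as commonly used.
    query_terms: assumed number of frequent terms used as the query.
    """
    by_tag = defaultdict(list)
    for tweet_id, text in tweets:
        # Count each hashtag once per tweet, ignoring case.
        for tag in {t.lower() for t in HASHTAG.findall(text)}:
            by_tag[tag].append((tweet_id, text))

    collection = {}
    for tag, tagged in by_tag.items():
        if len(tagged) < min_tweets:
            continue
        counts = Counter()
        for _, text in tagged:
            # Drop hashtags, then keep alphabetic words longer than 3 letters.
            words = re.findall(r"[a-z]+", HASHTAG.sub(" ", text.lower()))
            counts.update(w for w in words if len(w) > 3)
        collection[tag] = {
            "query": [w for w, _ in counts.most_common(query_terms)],
            "relevant": [tweet_id for tweet_id, _ in tagged],
        }
    return collection


if __name__ == "__main__":
    sample = [
        (1, "Heavy flooding downtown #storm"),
        (2, "More flooding reported #Storm tonight"),
        (3, "Lunch was great #food"),
    ]
    print(build_pseudo_collection(sample, query_terms=1))
```

The timestamp generation and the enrichment with an editorial test collection mentioned in the abstract are not covered by this sketch.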

CiteSeerX

Crossref

Repository@USM

International Migration, Integration and Social Cohesion online publications

Sublinear Algorithms for Approximating String Compressibility

Author: A. Luca de
Adam Smith
C.K. Chui
D. Benedetto
D. Sculley
Dana Ron
E. Frank
E. Keogh
E. Lehman
E.J. Keogh
F. Levé
F.M.J. Willems
G. Cormode
H. Cai
I. Gheorghiciuc
I.H. Witten
J. Cleary
J. Shallit
J. Ziv
L. Ilie
L. Paninski
L. Pierce II
M. Brautbar
M. Charikar
M. Li
N. Ahmed
N. Alon
O. Keller
O.V. Kukushkina
R. Cilibrasi
Ronitt Rubinfeld
S. Janson
S. Raskhodnikova
Sofya Raskhodnikova
T. Batu
T. Cover
Z. Bar-Yossef
Z. Kása
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/03/2011

We raise the question of approximating the compressibility of a string with respect to a fixed compression scheme, in sublinear time. We study this question in detail for two popular lossless compression schemes: run-length encoding (RLE) and a variant of Lempel-Ziv (LZ77), and present sublinear algorithms for approximating compressibility with respect to both schemes. We also give several lower bounds that show that our algorithms for both schemes cannot be improved significantly. Our investigation of LZ77 yields results whose interest goes beyond the initial questions we set out to study. In particular, we prove combinatorial structural lemmas that relate the compressibility of a string with respect to LZ77 to the number of distinct short substrings contained in it (its ℓth subword complexity, for small ℓ). In addition, we show that approximating the compressibility with respect to LZ77 is related to approximating the support size of a distribution.

Funding: National Science Foundation (U.S.) (Award CCF-1065125); National Science Foundation (U.S.) (Award CCF-0728645); Marie Curie International Reintegration Grant PIRG03-GA-2008-231077; Israel Science Foundation (Grant 1147/09); Israel Science Foundation (Grant 1675/09)
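To make the notion of sublinear compressibility estimation for RLE more concrete, here is a minimal sampling sketch. It is not the paper's algorithm: it simplifies the RLE cost to the number of runs (ignoring the bits needed to encode run lengths) and estimates that count by probing random adjacent pairs, which yields an additive-error estimate. Function names and the sample count are illustrative assumptions.

```python
import random


def exact_run_count(s):
    """Number of maximal runs of equal characters (reads the whole string)."""
    if not s:
        return 0
    return 1 + sum(1 for i in range(len(s) - 1) if s[i] != s[i + 1])


def estimate_run_count(s, samples, rng):
    """Estimate the run count from `samples` random adjacent-pair probes.

    A run boundary is a position i with s[i] != s[i + 1]; the run count is
    one plus the number of boundaries. We estimate the boundary fraction by
    sampling, so only O(samples) characters of s are read.
    """
    n = len(s)
    if n == 0:
        return 0.0
    if n == 1:
        return 1.0
    hits = 0
    for _ in range(samples):
        i = rng.randrange(n - 1)  # uniform over the n - 1 adjacent pairs
        if s[i] != s[i + 1]:
            hits += 1
    return 1.0 + (n - 1) * hits / samples


if __name__ == "__main__":
    text = "a" * 400 + "b" * 300 + "ab" * 150
    rng = random.Random(0)
    print(exact_run_count(text), estimate_run_count(text, 200, rng))
```

With a fixed number of probes, the estimate's error scales with the string length times the sampling error in the boundary fraction, so strings with very few runs are estimated only coarsely under this simplification.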

CiteSeerX

DSpace@MIT

Crossref