Dynamic Matrix Factorization with Priors on Unknown Values
Advanced and effective collaborative filtering methods based on explicit
feedback assume that unknown ratings do not follow the same model as the
observed ones (not missing at random). In this work, we build on this
assumption, and introduce a novel dynamic matrix factorization framework that
allows setting an explicit prior on unknown values. When new ratings, users, or
items enter the system, we can update the factorization in time independent of
the size of data (number of users, items and ratings). Hence, we can quickly
recommend items even to very recent users. We test our methods on three large
datasets, including two very sparse ones, in static and dynamic conditions. In
each case, we outrank state-of-the-art matrix factorization methods that do not
use a prior on unknown ratings.
Comment: in the Proceedings of the 21st ACM SIGKDD Conference on Knowledge
Discovery and Data Mining 201
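To make the idea of a prior on unknown values concrete, the following is a minimal sketch and not the authors' algorithm: every unobserved (user, item) cell is pulled toward a fixed prior value with a small weight, while observed ratings get full weight. The function names, the weight `alpha`, the prior value 0.0, and all hyperparameters are assumptions chosen for illustration. This sketch visits every cell on each pass, so it does not reproduce the size-independent update time described in the abstract.

```python
import random

def factorize_with_prior(ratings, n_users, n_items, k=2, prior=0.0,
                         alpha=0.05, lam=0.02, lr=0.01, epochs=300, seed=0):
    """Hypothetical helper. ratings maps (user, item) -> observed value."""
    rng = random.Random(seed)
    P = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u in range(n_users):
            for i in range(n_items):
                observed = (u, i) in ratings
                # Unknown cells use the prior as target, with a small weight.
                target = ratings[(u, i)] if observed else prior
                weight = 1.0 if observed else alpha
                err = target - sum(P[u][f] * Q[i][f] for f in range(k))
                # SGD step on the weighted squared error with L2 shrinkage.
                for f in range(k):
                    pu, qi = P[u][f], Q[i][f]
                    P[u][f] += lr * (weight * err * qi - lam * pu)
                    Q[i][f] += lr * (weight * err * pu - lam * qi)
    return P, Q

def objective(ratings, P, Q, prior=0.0, alpha=0.05, lam=0.02):
    """Weighted squared error over all cells plus an L2 penalty."""
    total = lam * sum(x * x for row in P + Q for x in row)
    for u, pu in enumerate(P):
        for i, qi in enumerate(Q):
            observed = (u, i) in ratings
            target = ratings[(u, i)] if observed else prior
            weight = 1.0 if observed else alpha
            pred = sum(a * b for a, b in zip(pu, qi))
            total += weight * (target - pred) ** 2
    return total

if __name__ == "__main__":
    # Toy data, invented for this example only.
    toy = {(0, 0): 5.0, (0, 1): 3.0, (1, 0): 4.0, (1, 2): 2.0, (2, 2): 1.0}
    P, Q = factorize_with_prior(toy, n_users=3, n_items=3)
    print(objective(toy, P, Q))
```

Setting `alpha` to zero would recover a factorization that ignores unknown cells entirely, which is the kind of baseline the abstract compares against.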
GraphLab: A New Framework for Parallel Machine Learning
Designing and implementing efficient, provably correct parallel machine
learning (ML) algorithms is challenging. Existing high-level parallel
abstractions like MapReduce are insufficiently expressive while low-level tools
like MPI and Pthreads leave ML experts repeatedly solving the same design
challenges. By targeting common patterns in ML, we developed GraphLab, which
improves upon abstractions like MapReduce by compactly expressing asynchronous
iterative algorithms with sparse computational dependencies while ensuring data
consistency and achieving a high degree of parallel performance. We demonstrate
the expressiveness of the GraphLab framework by designing and implementing
parallel versions of belief propagation, Gibbs sampling, Co-EM, Lasso and
Compressed Sensing. We show that using GraphLab we can achieve excellent
parallel performance on large-scale real-world problems.
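The pattern the abstract describes, an iterative algorithm whose updates touch only a vertex and its neighbours and are rescheduled on demand, can be sketched as follows. This is a sequential, single-threaded illustration and not GraphLab's API: the function name, the queue-based scheduler, and the choice of PageRank as the update rule (it is not among the algorithms listed in the abstract) are all assumptions made for the example. It assumes every vertex appears as a key in `out_edges` and has at least one outgoing edge.

```python
from collections import deque

def run_dynamic_pagerank(out_edges, damping=0.85, tol=1e-6):
    """Hypothetical helper. out_edges maps vertex -> list of successors."""
    in_edges = {v: [] for v in out_edges}
    for src, dsts in out_edges.items():
        for dst in dsts:
            in_edges[dst].append(src)

    rank = {v: 1.0 for v in out_edges}
    queue = deque(out_edges)      # vertices waiting for an update
    scheduled = set(out_edges)    # avoids duplicate queue entries

    while queue:
        v = queue.popleft()
        scheduled.discard(v)
        # The update reads only v's in-neighbours (sparse dependency).
        new_rank = (1 - damping) + damping * sum(
            rank[u] / len(out_edges[u]) for u in in_edges[v])
        changed = abs(new_rank - rank[v]) > tol
        rank[v] = new_rank
        # Reschedule only the vertices whose input just changed.
        if changed:
            for w in out_edges[v]:
                if w not in scheduled:
                    scheduled.add(w)
                    queue.append(w)
    return rank

if __name__ == "__main__":
    graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
    print(run_dynamic_pagerank(graph))
```

In a parallel setting, several workers would pull from the queue at once, and the framework would need to prevent two workers from updating adjacent vertices simultaneously; that consistency machinery is what the sketch leaves out.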
Simulation and data processing of GOMOS measurements
In this paper the data simulation and data inversion studies for stellar occultation measurements are discussed. The specific application is the Global Ozone Monitoring by Occultation of Stars (GOMOS) instrument which has been proposed for the first European Platform, Polar Orbiting Earth Mission (POEM-1).
Distributed GraphLab: A Framework for Machine Learning in the Cloud
While high-level data parallel frameworks, like MapReduce, simplify the
design and implementation of large-scale data processing systems, they do not
naturally or efficiently support many important data mining and machine
learning algorithms and can lead to inefficient learning systems. To help fill
this critical void, we introduced the GraphLab abstraction which naturally
expresses asynchronous, dynamic, graph-parallel computation while ensuring data
consistency and achieving a high degree of parallel performance in the
shared-memory setting. In this paper, we extend the GraphLab framework to the
substantially more challenging distributed setting while preserving strong data
consistency guarantees. We develop graph based extensions to pipelined locking
and data versioning to reduce network congestion and mitigate the effect of
network latency. We also introduce fault tolerance to the GraphLab abstraction
using the classic Chandy-Lamport snapshot algorithm and demonstrate how it can
be easily implemented by exploiting the GraphLab abstraction itself. Finally,
we evaluate our distributed implementation of the GraphLab abstraction on a
large Amazon EC2 deployment and show 1-2 orders of magnitude performance gains
over Hadoop-based implementations.
Comment: VLDB201
Interplanetary Lyman line profiles: variations with solar activity cycle
Interplanetary Lyman alpha line profiles are derived from the SWAN H cell
data measurements. The measurements cover a 6-year period from solar minimum
(1996) to after the solar maximum of 2001. This allows us to study the
variations of the line profiles with solar activity. These line profiles were
used to derive line shifts and line widths in the interplanetary medium for
various angles of the LOS with the interstellar flow direction. The SWAN data
results were then compared to an interplanetary background upwind spectrum
obtained by STIS/HST in March 2001. We find that the LOS upwind velocity
associated with the mean line shift of the IP Lyman alpha line varies from 25.7 km/s
to 21.4 km/s from solar minimum to solar maximum. Most of this change is linked
with variations in the radiation pressure. LOS kinetic temperatures derived
from IP line widths do not vary monotonically with the upwind angle of the LOS.
This is not compatible with calculations of IP line profiles based on hot model
distributions of interplanetary hydrogen. We also find that the line profiles
get narrower during solar maximum. The results obtained on the line widths (LOS
temperature) show that the IP line is composed of two components scattered by
two hydrogen populations with different bulk velocities and temperature. This
is a clear signature of the heliospheric interface on the line profiles seen at
1 AU from the Sun.
Comment: 9 pages, 9 figures
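As a rough illustration of how a line shift maps to a line-of-sight velocity and how a line width maps to a kinetic temperature, the sketch below uses the standard non-relativistic Doppler relation and the width of a single thermal Gaussian. It is not the paper's analysis pipeline: the function names are hypothetical, the 20 km/s width in the usage example is invented, and the single-Gaussian assumption is a simplification that the two-component result in the abstract argues against.

```python
import math

C_KM_S = 299_792.458      # speed of light, km/s
LYA_REST_NM = 121.567     # hydrogen Lyman alpha rest wavelength, nm
K_B = 1.380649e-23        # Boltzmann constant, J/K
M_H = 1.6735575e-27       # mass of a hydrogen atom, kg

def velocity_from_shift(delta_nm, rest_nm=LYA_REST_NM):
    """Non-relativistic Doppler: v = c * delta_lambda / lambda_0 (km/s)."""
    return C_KM_S * delta_nm / rest_nm

def shift_from_velocity(v_km_s, rest_nm=LYA_REST_NM):
    """Inverse of velocity_from_shift; returns the shift in nm."""
    return rest_nm * v_km_s / C_KM_S

def temperature_from_fwhm(fwhm_km_s):
    """Kinetic temperature (K) of a purely thermal Gaussian line.

    Uses FWHM = 2 * sqrt(2 ln 2) * sqrt(k_B T / m_H) along the line of sight.
    """
    fwhm_m_s = fwhm_km_s * 1e3
    return M_H * fwhm_m_s ** 2 / (8 * math.log(2) * K_B)

if __name__ == "__main__":
    # Velocities quoted in the abstract for solar minimum and maximum.
    for v in (25.7, 21.4):
        print(v, "km/s ->", shift_from_velocity(v), "nm")
    # Hypothetical width, chosen only to exercise the function.
    print(temperature_from_fwhm(20.0), "K")
```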