    Dynamic Matrix Factorization with Priors on Unknown Values

    Advanced and effective collaborative filtering methods based on explicit feedback assume that unknown ratings do not follow the same model as the observed ones (not missing at random). In this work, we build on this assumption and introduce a novel dynamic matrix factorization framework that allows one to set an explicit prior on unknown values. When new ratings, users, or items enter the system, we can update the factorization in time independent of the size of the data (number of users, items, and ratings). Hence, we can quickly recommend items even to very recent users. We test our methods on three large datasets, including two very sparse ones, in static and dynamic conditions. In each case, we outrank state-of-the-art matrix factorization methods that do not use a prior on unknown ratings. Comment: in the Proceedings of 21st ACM SIGKDD Conference on Knowledge Discovery and Data Mining 201
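One common way to encode a prior on unknown values in a factorization objective is to treat every unobserved cell as if it held a prior rating r0, trusted with a small weight alpha, while observed ratings get weight 1. The sketch below fits such an objective with batch weighted alternating least squares in NumPy. The function names, the parameters (k, r0, alpha, lam), and the ALS solver are illustrative assumptions, not the paper's method; in particular it recomputes everything in batch and does not provide the size-independent incremental updates described in the abstract.

```python
import numpy as np


def weighted_loss(R, mask, P, Q, r0=0.0, alpha=0.1, lam=0.1):
    """Objective: weight-1 error on observed cells, weight-alpha pull toward r0 on unknown cells."""
    T = np.where(mask, R, r0)          # targets: real ratings, or the prior r0 where unknown
    W = np.where(mask, 1.0, alpha)     # per-cell confidence
    E = T - P @ Q.T
    return float(np.sum(W * E**2) + lam * (np.sum(P**2) + np.sum(Q**2)))


def factorize_with_prior(R, mask, k=2, r0=0.0, alpha=0.1, lam=0.1, iters=20, seed=0):
    """Hypothetical batch solver: weighted ALS minimizing weighted_loss.

    R: (n_users, n_items) ratings, meaningful only where mask is True.
    mask: boolean array, True where a rating was observed.
    """
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    P = 0.1 * rng.standard_normal((n_users, k))
    Q = 0.1 * rng.standard_normal((n_items, k))
    T = np.where(mask, R, r0)
    W = np.where(mask, 1.0, alpha)
    reg = lam * np.eye(k)
    for _ in range(iters):
        # Each row of P solves (Q^T W_u Q + lam I) p_u = Q^T W_u t_u exactly.
        for u in range(n_users):
            A = Q.T @ (W[u][:, None] * Q) + reg
            P[u] = np.linalg.solve(A, Q.T @ (W[u] * T[u]))
        # Same exact block update for each item factor.
        for i in range(n_items):
            A = P.T @ (W[:, i][:, None] * P) + reg
            Q[i] = np.linalg.solve(A, P.T @ (W[:, i] * T[:, i]))
    return P, Q
```

Predictions are `P @ Q.T`; unobserved cells are shrunk toward `r0`, and a larger `alpha` makes that pull stronger. Because each half-step solves its least-squares subproblem exactly, the objective cannot increase from one sweep to the next.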

    GraphLab: A New Framework for Parallel Machine Learning

    Designing and implementing efficient, provably correct parallel machine learning (ML) algorithms is challenging. Existing high-level parallel abstractions like MapReduce are insufficiently expressive, while low-level tools like MPI and Pthreads leave ML experts repeatedly solving the same design challenges. By targeting common patterns in ML, we developed GraphLab, which improves upon abstractions like MapReduce by compactly expressing asynchronous iterative algorithms with sparse computational dependencies while ensuring data consistency and achieving a high degree of parallel performance. We demonstrate the expressiveness of the GraphLab framework by designing and implementing parallel versions of belief propagation, Gibbs sampling, Co-EM, Lasso, and Compressed Sensing. We show that using GraphLab we can achieve excellent parallel performance on large-scale real-world problems.
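The idea of an asynchronous iterative algorithm with sparse dependencies can be illustrated with a single-threaded toy: an update function reads only a vertex's neighborhood, and a scheduler re-queues a vertex's dependents only when its value changed noticeably. The sketch below applies this to PageRank, which is a swapped-in example (it is not among the algorithms listed in the abstract). The names `dynamic_pagerank` and `tol` are hypothetical, and the code models neither GraphLab's API, its parallelism, nor its consistency guarantees.

```python
from collections import deque


def dynamic_pagerank(out_edges, damping=0.85, tol=1e-6):
    """Sequential sketch of dynamic scheduling (no real parallelism).

    out_edges: dict mapping each vertex to a list of its successors.
    Returns (rank dict, number of update-function calls).
    """
    vertices = list(out_edges)
    in_edges = {v: [] for v in vertices}
    for u, succs in out_edges.items():
        for v in succs:
            in_edges[v].append(u)
    rank = {v: 1.0 for v in vertices}
    queue = deque(vertices)      # scheduler: vertices waiting to be updated
    scheduled = set(vertices)
    updates = 0
    while queue:
        v = queue.popleft()
        scheduled.discard(v)
        # Update function: reads only v's scope (its in-neighbors' current ranks).
        new = (1 - damping) + damping * sum(rank[u] / len(out_edges[u]) for u in in_edges[v])
        change = abs(new - rank[v])
        rank[v] = new
        updates += 1
        # Dynamic scheduling: wake v's dependents only if v changed by more than tol.
        if change > tol:
            for w in out_edges[v]:
                if w not in scheduled:
                    scheduled.add(w)
                    queue.append(w)
    return rank, updates
```

For the three-vertex graph `{0: [1, 2], 1: [2], 2: [0]}` the ranks settle near 1.163, 0.644, and 1.192, and they sum to the number of vertices, as expected for this unnormalized PageRank form.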

    Simulation and data processing of GOMOS measurements

    In this paper, the data simulation and data inversion studies for stellar occultation measurements are discussed. The specific application is the Global Ozone Monitoring by Occultation of Stars (GOMOS) instrument, which has been proposed for the first European Platform, Polar Orbiting Earth Mission (POEM-1).

    Distributed GraphLab: A Framework for Machine Learning in the Cloud

    While high-level data-parallel frameworks like MapReduce simplify the design and implementation of large-scale data processing systems, they do not naturally or efficiently support many important data mining and machine learning algorithms and can lead to inefficient learning systems. To help fill this critical void, we introduced the GraphLab abstraction, which naturally expresses asynchronous, dynamic, graph-parallel computation while ensuring data consistency and achieving a high degree of parallel performance in the shared-memory setting. In this paper, we extend the GraphLab framework to the substantially more challenging distributed setting while preserving strong data consistency guarantees. We develop graph-based extensions to pipelined locking and data versioning to reduce network congestion and mitigate the effect of network latency. We also introduce fault tolerance to the GraphLab abstraction using the classic Chandy-Lamport snapshot algorithm and demonstrate how it can be easily implemented by exploiting the GraphLab abstraction itself. Finally, we evaluate our distributed implementation of the GraphLab abstraction on a large Amazon EC2 deployment and show 1-2 orders of magnitude performance gains over Hadoop-based implementations. Comment: VLDB201
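The Chandy-Lamport algorithm mentioned above records a consistent global state without stopping the system: the initiator saves its local state and sends a marker on every outgoing FIFO channel; a process receiving its first marker does the same, and each process records the messages that arrive on an incoming channel between its own recording and that channel's marker. The sketch below is a textbook toy (processes passing money over FIFO channels), not GraphLab's implementation; `run_snapshot_demo` and its parameters are hypothetical. Because money is only moved, never created, a consistent snapshot must account for exactly the initial total.

```python
import random
from collections import deque


def run_snapshot_demo(n=3, steps=200, seed=1, start=100):
    """Simulate Chandy-Lamport on n processes exchanging money over FIFO channels."""
    rng = random.Random(seed)
    balance = [start] * n
    peers = [[j for j in range(n) if j != i] for i in range(n)]
    chans = {(i, j): deque() for i in range(n) for j in range(n) if i != j}
    recorded = [None] * n                    # each process's recorded local state
    recording = [dict() for _ in range(n)]   # open incoming channels: src -> amounts seen
    channel_state = {}                       # (src, dst) -> amounts in flight at snapshot time

    def record(p):
        # Save local state, send a marker on every outgoing channel, start recording inputs.
        recorded[p] = balance[p]
        for q in peers[p]:
            chans[(p, q)].append(("marker", 0))
            recording[p][q] = []

    def deliver(p, src):
        kind, amt = chans[(src, p)].popleft()
        if kind == "marker":
            if recorded[p] is None:
                record(p)  # first marker: this channel's recorded state is empty
            channel_state[(src, p)] = recording[p].pop(src)
        else:
            balance[p] += amt
            if src in recording[p]:
                recording[p][src].append(amt)  # message was in flight across the cut

    for step in range(steps):
        if step == steps // 4:
            record(0)  # process 0 initiates the snapshot
        p = rng.randrange(n)
        if rng.random() < 0.5 and balance[p] > 0:
            q = rng.choice(peers[p])
            amt = rng.randint(1, balance[p])
            balance[p] -= amt
            chans[(p, q)].append(("transfer", amt))
        else:
            ready = [s for s in peers[p] if chans[(s, p)]]
            if ready:
                deliver(p, rng.choice(ready))

    # Drain every channel so all markers arrive and the snapshot completes.
    while any(chans.values()):
        for (src, dst), ch in chans.items():
            while ch:
                deliver(dst, src)

    snapshot_total = sum(recorded) + sum(sum(v) for v in channel_state.values())
    return snapshot_total, sum(balance)
```

With three processes starting at 100 each, both returned totals should be 300 for any seed.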

    Interplanetary Lyman α line profiles: variations with solar activity cycle

    Interplanetary (IP) Lyman alpha line profiles are derived from the SWAN H cell data measurements. The measurements cover a 6-year period from solar minimum (1996) to after the solar maximum of 2001. This allows us to study the variations of the line profiles with solar activity. These line profiles were used to derive line shifts and line widths in the interplanetary medium for various angles between the line of sight (LOS) and the interstellar flow direction. The SWAN data results were then compared to an interplanetary background upwind spectrum obtained by STIS/HST in March 2001. We find that the LOS upwind velocity associated with the mean line shift of the IP Lyman α line varies from 25.7 km/s to 21.4 km/s from solar minimum to solar maximum. Most of this change is linked with variations in the radiation pressure. LOS kinetic temperatures derived from IP line widths do not vary monotonically with the upwind angle of the LOS. This is not compatible with calculations of IP line profiles based on hot model distributions of interplanetary hydrogen. We also find that the line profiles get narrower during solar maximum. The results obtained on the line widths (LOS temperature) show that the IP line is composed of two components scattered by two hydrogen populations with different bulk velocities and temperatures. This is a clear signature of the heliospheric interface on the line profiles seen at 1 AU from the Sun. Comment: 9 pages, 9 figures
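The link between a measured line shift and a velocity, and between a line width and a kinetic temperature, follows from the non-relativistic Doppler relation Δλ = λ0·v/c and, for a purely thermal Gaussian profile, FWHM = (λ0/c)·sqrt(8 ln2 kT/m). The sketch below only applies these textbook relations for hydrogen at the Lyman α rest wavelength (taken here as about 121.567 nm); it ignores the instrument profile, the H cell response, and the two-population structure the abstract reports, so it is an illustration under those assumptions rather than the paper's analysis. The function names are hypothetical.

```python
import math

C = 299_792_458.0       # speed of light, m/s
K_B = 1.380649e-23      # Boltzmann constant, J/K
M_H = 1.6735575e-27     # mass of a hydrogen atom, kg
LYA_NM = 121.567        # Lyman alpha rest wavelength, nm (approximate)


def shift_nm(v_kms, lam0_nm=LYA_NM):
    """Non-relativistic Doppler shift in nm for a velocity in km/s."""
    return lam0_nm * v_kms * 1e3 / C


def velocity_kms(dlam_nm, lam0_nm=LYA_NM):
    """Inverse of shift_nm: line shift in nm to velocity in km/s."""
    return dlam_nm / lam0_nm * C / 1e3


def fwhm_from_temperature(t_k, lam0_nm=LYA_NM, mass=M_H):
    """FWHM in nm of a purely thermal Gaussian line at temperature t_k."""
    return lam0_nm / C * math.sqrt(8 * math.log(2) * K_B * t_k / mass)


def temperature_from_fwhm(fwhm_nm, lam0_nm=LYA_NM, mass=M_H):
    """Kinetic temperature in K implied by a purely thermal FWHM in nm."""
    return mass * (C * fwhm_nm / lam0_nm) ** 2 / (8 * math.log(2) * K_B)
```

For the velocities quoted in the abstract, `shift_nm(25.7)` is about 0.0104 nm and `shift_nm(21.4)` about 0.0087 nm, which indicates the wavelength resolution needed to see the solar-cycle change directly in the line position.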