79,717 research outputs found

    Evaluation of recommender systems in streaming environments

    Full text link
    Evaluation of recommender systems is typically done with finite datasets. This means that conventional evaluation methodologies are only applicable in offline experiments, where data and models are stationary. However, in real world systems, user feedback is continuously generated, at unpredictable rates. Given this setting, one important issue is how to evaluate algorithms in such a streaming data environment. In this paper we propose a prequential evaluation protocol for recommender systems, suitable for streaming data environments, but also applicable in stationary settings. Using this protocol we are able to monitor the evolution of algorithms' accuracy over time. Furthermore, we are able to perform reliable comparative assessments of algorithms by computing significance tests over a sliding window. We argue that besides being suitable for streaming data, prequential evaluation allows the detection of phenomena that would otherwise remain unnoticed in the evaluation of both offline and online recommender systems.Comment: Workshop on 'Recommender Systems Evaluation: Dimensions and Design' (REDD 2014), held in conjunction with RecSys 2014. October 10, 2014, Silicon Valley, United State

    A Bayesian Variable Selection Approach to Major League Baseball Hitting Metrics

    Full text link
    Numerous statistics have been proposed for the measure of offensive ability in major league baseball. While some of these measures may offer moderate predictive power in certain situations, it is unclear which simple offensive metrics are the most reliable or consistent. We address this issue with a Bayesian hierarchical model for variable selection to capture which offensive metrics are most predictive within players across time. Our sophisticated methodology allows for full estimation of the posterior distributions for our parameters and automatically adjusts for multiple testing, providing a distinct advantage over alternative approaches. We implement our model on a set of 50 different offensive metrics and discuss our results in the context of comparison to other variable selection techniques. We find that 33/50 metrics demonstrate signal. However, these metrics are highly correlated with one another and related to traditional notions of performance (e.g., plate discipline, power, and ability to make contact)

    Scalable aggregation predictive analytics: a query-driven machine learning approach

    Get PDF
    We introduce a predictive modeling solution that provides high quality predictive analytics over aggregation queries in Big Data environments. Our predictive methodology is generally applicable in environments in which large-scale data owners may or may not restrict access to their data and allow only aggregation operators like COUNT to be executed over their data. In this context, our methodology is based on historical queries and their answers to accurately predict ad-hoc queries’ answers. We focus on the widely used set-cardinality, i.e., COUNT, aggregation query, as COUNT is a fundamental operator for both internal data system optimizations and for aggregation-oriented data exploration and predictive analytics. We contribute a novel, query-driven Machine Learning (ML) model whose goals are to: (i) learn the query-answer space from past issued queries, (ii) associate the query space with local linear regression & associative function estimators, (iii) define query similarity, and (iv) predict the cardinality of the answer set of unseen incoming queries, referred to the Set Cardinality Prediction (SCP) problem. Our ML model incorporates incremental ML algorithms for ensuring high quality prediction results. The significance of contribution lies in that it (i) is the only query-driven solution applicable over general Big Data environments, which include restricted-access data, (ii) offers incremental learning adjusted for arriving ad-hoc queries, which is well suited for query-driven data exploration, and (iii) offers a performance (in terms of scalability, SCP accuracy, processing time, and memory requirements) that is superior to data-centric approaches. We provide a comprehensive performance evaluation of our model evaluating its sensitivity, scalability and efficiency for quality predictive analytics. In addition, we report on the development and incorporation of our ML model in Spark showing its superior performance compared to the Spark’s COUNT method

    Energy rating of a water pumping station using multivariate analysis

    Get PDF
    Among water management policies, the preservation and the saving of energy demand in water supply and treatment systems play key roles. When focusing on energy, the customary metric to determine the performance of water supply systems is linked to the definition of component-based energy indicators. This approach is unfit to account for interactions occurring among system elements or between the system and its environment. On the other hand, the development of information technology has led to the availability of increasing large amount of data, typically gathered from distributed sensor networks in so-called smart grids. In this context, data intensive methodologies address the possibility of using complex network modeling approaches, and advocate the issues related to the interpretation and analysis of large amount of data produced by smart sensor networks. In this perspective, the present work aims to use data intensive techniques in the energy analysis of a water management network. The purpose is to provide new metrics for the energy rating of the system and to be able to provide insights into the dynamics of its operations. The study applies neural network as a tool to predict energy demand, when using flowrate and vibration data as predictor variables
    corecore