79,717 research outputs found
Evaluation of recommender systems in streaming environments
Evaluation of recommender systems is typically done with finite datasets.
This means that conventional evaluation methodologies are only applicable in
offline experiments, where data and models are stationary. However, in real
world systems, user feedback is continuously generated, at unpredictable rates.
Given this setting, one important issue is how to evaluate algorithms in such a
streaming data environment. In this paper we propose a prequential evaluation
protocol for recommender systems, suitable for streaming data environments, but
also applicable in stationary settings. Using this protocol we are able to
monitor the evolution of algorithms' accuracy over time. Furthermore, we are
able to perform reliable comparative assessments of algorithms by computing
significance tests over a sliding window. We argue that besides being suitable
for streaming data, prequential evaluation allows the detection of phenomena
that would otherwise remain unnoticed in the evaluation of both offline and
online recommender systems.Comment: Workshop on 'Recommender Systems Evaluation: Dimensions and Design'
(REDD 2014), held in conjunction with RecSys 2014. October 10, 2014, Silicon
Valley, United State
A Bayesian Variable Selection Approach to Major League Baseball Hitting Metrics
Numerous statistics have been proposed for the measure of offensive ability
in major league baseball. While some of these measures may offer moderate
predictive power in certain situations, it is unclear which simple offensive
metrics are the most reliable or consistent. We address this issue with a
Bayesian hierarchical model for variable selection to capture which offensive
metrics are most predictive within players across time. Our sophisticated
methodology allows for full estimation of the posterior distributions for our
parameters and automatically adjusts for multiple testing, providing a distinct
advantage over alternative approaches. We implement our model on a set of 50
different offensive metrics and discuss our results in the context of
comparison to other variable selection techniques. We find that 33/50 metrics
demonstrate signal. However, these metrics are highly correlated with one
another and related to traditional notions of performance (e.g., plate
discipline, power, and ability to make contact)
Scalable aggregation predictive analytics: a query-driven machine learning approach
We introduce a predictive modeling solution that provides high quality predictive analytics over aggregation queries in Big Data environments. Our predictive methodology is generally applicable in environments in which large-scale data owners may or may not restrict access to their data and allow only aggregation operators like COUNT to be executed over their data. In this context, our methodology is based on historical queries and their answers to accurately predict ad-hoc queries’ answers. We focus on the widely used set-cardinality, i.e., COUNT, aggregation query, as COUNT is a fundamental operator for both internal data system optimizations and for aggregation-oriented data exploration and predictive analytics. We contribute a novel, query-driven Machine Learning (ML) model whose goals are to: (i) learn the query-answer space from past issued queries, (ii) associate the query space with local linear regression & associative function estimators, (iii) define query similarity, and (iv) predict the cardinality of the answer set of unseen incoming queries, referred to the Set Cardinality Prediction (SCP) problem. Our ML model incorporates incremental ML algorithms for ensuring high quality prediction results. The significance of contribution lies in that it (i) is the only query-driven solution applicable over general Big Data environments, which include restricted-access data, (ii) offers incremental learning adjusted for arriving ad-hoc queries, which is well suited for query-driven data exploration, and (iii) offers a performance (in terms of scalability, SCP accuracy, processing time, and memory requirements) that is superior to data-centric approaches. We provide a comprehensive performance evaluation of our model evaluating its sensitivity, scalability and efficiency for quality predictive analytics. In addition, we report on the development and incorporation of our ML model in Spark showing its superior performance compared to the Spark’s COUNT method
Energy rating of a water pumping station using multivariate analysis
Among water management policies, the preservation and the saving of energy demand in water supply and treatment systems play key roles. When focusing on energy, the customary metric to determine the performance of water supply systems is linked to the definition of component-based energy indicators. This approach is unfit to account for interactions occurring among system elements or between the system and its environment. On the other hand, the development of information technology has led to the availability of increasing large amount of data, typically gathered from distributed sensor networks in so-called smart grids. In this context, data intensive methodologies address the possibility of using complex network modeling approaches, and advocate the issues related to the interpretation and analysis of large amount of data produced by smart sensor networks.
In this perspective, the present work aims to use data intensive techniques in the energy analysis of a water management network.
The purpose is to provide new metrics for the energy rating of the system and to be able to provide insights into the dynamics of its operations. The study applies neural network as a tool to predict energy demand, when using flowrate and vibration data as predictor variables
- …