System Scoring Using Partial Prior Information
We introduce smoothing of retrieval effectiveness scores, which balances results from prior incomplete query sets against limited additional complete information, in order to obtain more refined system orderings than would be possible on the new queries alone.
SleepNet: Attention-Enhanced Robust Sleep Prediction using Dynamic Social Networks
Sleep behavior significantly impacts health and acts as an indicator of
physical and mental well-being. Monitoring and predicting sleep behavior with
ubiquitous sensors may therefore assist in both sleep management and tracking
of related health conditions. While sleep behavior depends on, and is reflected
in the physiology of a person, it is also impacted by external factors such as
digital media usage, social network contagion, and the surrounding weather. In
this work, we propose SleepNet, a system that exploits social contagion in
sleep behavior through graph networks and integrates it with physiological and
phone data extracted from ubiquitous mobile and wearable devices for predicting
next-day sleep duration labels. Our architecture overcomes the
limitations of large-scale graphs containing connections irrelevant to sleep
behavior by devising an attention mechanism. The extensive experimental
evaluation highlights the improvement provided by incorporating social networks
in the model. Additionally, we conduct robustness analysis to demonstrate the
system's performance in real-life conditions. The outcomes affirm the stability
of SleepNet against perturbations in input data. Further analyses emphasize the
significance of network topology in prediction performance, revealing that users
with higher eigenvalue centrality are more vulnerable to data perturbations.
Comment: Accepted for publication in Proceedings of the ACM on Interactive,
Mobile, Wearable and Ubiquitous Technologies (IMWUT), 8 (March 2024).
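The abstract relates robustness to a user's centrality in the social graph. As a rough illustration only, and assuming "eigenvalue centrality" refers to the standard eigenvector centrality, the sketch below computes it by power iteration on a small hypothetical graph. The function name, the example graph, and the identity shift are assumptions for illustration and are not taken from SleepNet.

```python
from typing import List


def eigenvector_centrality(adj: List[List[int]], iters: int = 500,
                           tol: float = 1e-12) -> List[float]:
    """Power iteration on (A + I) for an undirected, connected graph.

    Adding the identity leaves the eigenvectors unchanged but avoids the
    oscillation that plain power iteration shows on bipartite graphs.
    """
    n = len(adj)
    x = [1.0 / n] * n
    for _ in range(iters):
        # One multiplication by (A + I).
        nxt = [x[i] + sum(adj[i][j] * x[j] for j in range(n)) for i in range(n)]
        # Normalise to unit Euclidean length.
        norm = sum(v * v for v in nxt) ** 0.5
        nxt = [v / norm for v in nxt]
        if max(abs(a - b) for a, b in zip(nxt, x)) < tol:
            return nxt
        x = nxt
    return x


# Hypothetical graph: a triangle of users 0-1-2, plus user 3 linked only to user 0.
ADJ = [
    [0, 1, 1, 1],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
]
scores = eigenvector_centrality(ADJ)
most_central = max(range(len(scores)), key=scores.__getitem__)
least_central = min(range(len(scores)), key=scores.__getitem__)
```

In this toy graph the best-connected user (0) receives the highest score and the pendant user (3) the lowest. Under the abstract's finding, users like user 0 would be the ones most affected by perturbed input data.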
Towards Meaningful Statements in IR Evaluation. Mapping Evaluation Measures to Interval Scales
Recently, it was shown that most popular IR measures are not interval-scaled,
implying that decades of experimental IR research used potentially improper
methods, which may have produced questionable results. However, it was unclear
if and to what extent these findings apply to actual evaluations, and this
opened a debate in the community, with researchers taking opposing positions
on whether, and to what extent, this should be considered an issue.
In this paper, we first give an introduction to representational
measurement theory, explaining why certain operations and significance tests are
permissible only with scales of a certain level. For that, we introduce the
notion of meaningfulness specifying the conditions under which the truth (or
falsity) of a statement is invariant under permissible transformations of a
scale. Furthermore, we show how the recall base and the length of the run may
make comparison and aggregation across topics problematic.
Then we propose a straightforward and powerful approach for turning an
evaluation measure into an interval scale, and describe an experimental
evaluation of the differences between using the original measures and the
interval-scaled ones.
For all the measures considered, namely Precision, Recall, Average Precision,
(Normalized) Discounted Cumulative Gain, Rank-Biased Precision and Reciprocal
Rank, we observe substantial effects, both on the order of average values and
on the outcome of significance tests. For the latter, previously significant
differences turn out to be insignificant, while insignificant ones become
significant. The effect varies remarkably between the tests considered but
overall, on average, we observed a 25% change in the decision about which
systems are significantly different and which are not.
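For readers unfamiliar with the measures named above, the following is a minimal sketch of their usual textbook definitions for binary relevance. It is not the paper's interval-scale mapping, which the abstract does not specify. The function names, the example run, and the persistence value p = 0.5 are assumptions chosen for illustration; (Normalized) Discounted Cumulative Gain is omitted for brevity.

```python
from typing import Sequence


def precision(rels: Sequence[int]) -> float:
    """Fraction of retrieved documents that are relevant."""
    return sum(rels) / len(rels)


def recall(rels: Sequence[int], recall_base: int) -> float:
    """Fraction of all relevant documents (the recall base) that were retrieved."""
    return sum(rels) / recall_base


def reciprocal_rank(rels: Sequence[int]) -> float:
    """1 / rank of the first relevant document, or 0 if none is retrieved."""
    for rank, rel in enumerate(rels, start=1):
        if rel:
            return 1.0 / rank
    return 0.0


def average_precision(rels: Sequence[int], recall_base: int) -> float:
    """Sum of precision at each relevant rank, divided by the recall base."""
    hits, total = 0, 0.0
    for rank, rel in enumerate(rels, start=1):
        if rel:
            hits += 1
            total += hits / rank
    return total / recall_base


def rank_biased_precision(rels: Sequence[int], p: float) -> float:
    """RBP = (1 - p) * sum_i rel_i * p^(i - 1), with ranks i starting at 1."""
    return (1 - p) * sum(rel * p ** i for i, rel in enumerate(rels))


# Hypothetical run of length 5 (1 = relevant) for a topic with 4 relevant documents.
run = [1, 0, 1, 0, 0]
recall_base = 4

p_score = precision(run)                          # 2 / 5
r_score = recall(run, recall_base)                # 2 / 4
rr_score = reciprocal_rank(run)                   # 1 / 1
ap_score = average_precision(run, recall_base)    # (1/1 + 2/3) / 4
rbp_score = rank_biased_precision(run, p=0.5)     # 0.5 * (1 + 0.25)
```

Note that Recall and Average Precision depend on the recall base, and Precision on the run length, so the set of attainable values differs from topic to topic. This is the dependence the abstract identifies as making comparison and aggregation across topics problematic.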