31,149 research outputs found
From Group Recommendations to Group Formation
There has been significant recent interest in the area of group
recommendations, where, given groups of users of a recommender system, one
wants to recommend top-k items to a group that maximize the satisfaction of the
group members, according to a chosen semantics of group satisfaction. Examples
semantics of satisfaction of a recommended itemset to a group include the
so-called least misery (LM) and aggregate voting (AV). We consider the
complementary problem of how to form groups such that the users in the formed
groups are most satisfied with the suggested top-k recommendations. We assume
that the recommendations will be generated according to one of the two group
recommendation semantics - LM or AV. Rather than assuming groups are given, or
rely on ad hoc group formation dynamics, our framework allows a strategic
approach for forming groups of users in order to maximize satisfaction. We show
that the problem is NP-hard to solve optimally under both semantics.
Furthermore, we develop two efficient algorithms for group formation under LM
and show that they achieve bounded absolute error. We develop efficient
heuristic algorithms for group formation under AV. We validate our results and
demonstrate the scalability and effectiveness of our group formation algorithms
on two large real data sets.Comment: 14 pages, 22 figure
What's unusual in online disease outbreak news?
Background: Accurate and timely detection of public health events of
international concern is necessary to help support risk assessment and response
and save lives. Novel event-based methods that use the World Wide Web as a
signal source offer potential to extend health surveillance into areas where
traditional indicator networks are lacking. In this paper we address the issue
of systematically evaluating online health news to support automatic alerting
using daily disease-country counts text mined from real world data using
BioCaster. For 18 data sets produced by BioCaster, we compare 5 aberration
detection algorithms (EARS C2, C3, W2, F-statistic and EWMA) for performance
against expert moderated ProMED-mail postings. Results: We report sensitivity,
specificity, positive predictive value (PPV), negative predictive value (NPV),
mean alerts/100 days and F1, at 95% confidence interval (CI) for 287
ProMED-mail postings on 18 outbreaks across 14 countries over a 366 day period.
Results indicate that W2 had the best F1 with a slight benefit for day of week
effect over C2. In drill down analysis we indicate issues arising from the
granular choice of country-level modeling, sudden drops in reporting due to day
of week effects and reporting bias. Automatic alerting has been implemented in
BioCaster available from http://born.nii.ac.jp. Conclusions: Online health news
alerts have the potential to enhance manual analytical methods by increasing
throughput, timeliness and detection rates. Systematic evaluation of health
news aberrations is necessary to push forward our understanding of the complex
relationship between news report volumes and case numbers and to select the
best performing features and algorithms
Recommended from our members
Semantics-Space-Time Cube. A Conceptual Framework for Systematic Analysis of Texts in Space and Time
We propose an approach to analyzing data in which texts are associated with spatial and temporal references with the aim to understand how the text semantics vary over space and time. To represent the semantics, we apply probabilistic topic modeling. After extracting a set of topics and representing the texts by vectors of topic weights, we aggregate the data into a data cube with the dimensions corresponding to the set of topics, the set of spatial locations (e.g., regions), and the time divided into suitable intervals according to the scale of the planned analysis. Each cube cell corresponds to a combination (topic, location, time interval) and contains aggregate measures characterizing the subset of the texts concerning this topic and having the spatial and temporal references within these location and interval. Based on this structure, we systematically describe the space of analysis tasks on exploring the interrelationships among the three heterogeneous information facets, semantics, space, and time. We introduce the operations of projecting and slicing the cube, which are used to decompose complex tasks into simpler subtasks. We then present a design of a visual analytics system intended to support these subtasks. To reduce the complexity of the user interface, we apply the principles of structural, visual, and operational uniformity while respecting the specific properties of each facet. The aggregated data are represented in three parallel views corresponding to the three facets and providing different complementary perspectives on the data. The views have similar look-and-feel to the extent allowed by the facet specifics. Uniform interactive operations applicable to any view support establishing links between the facets. The uniformity principle is also applied in supporting the projecting and slicing operations on the data cube. We evaluate the feasibility and utility of the approach by applying it in two analysis scenarios using geolocated social media data for studying people's reactions to social and natural events of different spatial and temporal scales
Provenance for Aggregate Queries
We study in this paper provenance information for queries with aggregation.
Provenance information was studied in the context of various query languages
that do not allow for aggregation, and recent work has suggested to capture
provenance by annotating the different database tuples with elements of a
commutative semiring and propagating the annotations through query evaluation.
We show that aggregate queries pose novel challenges rendering this approach
inapplicable. Consequently, we propose a new approach, where we annotate with
provenance information not just tuples but also the individual values within
tuples, using provenance to describe the values computation. We realize this
approach in a concrete construction, first for "simple" queries where the
aggregation operator is the last one applied, and then for arbitrary (positive)
relational algebra queries with aggregation; the latter queries are shown to be
more challenging in this context. Finally, we use aggregation to encode queries
with difference, and study the semantics obtained for such queries on
provenance annotated databases
Towards cross-lingual alerting for bursty epidemic events
Background: Online news reports are increasingly becoming a source for event
based early warning systems that detect natural disasters. Harnessing the
massive volume of information available from multilingual newswire presents as
many challenges as opportunities due to the patterns of reporting complex
spatiotemporal events. Results: In this article we study the problem of
utilising correlated event reports across languages. We track the evolution of
16 disease outbreaks using 5 temporal aberration detection algorithms on
text-mined events classified according to disease and outbreak country. Using
ProMED reports as a silver standard, comparative analysis of news data for 13
languages over a 129 day trial period showed improved sensitivity, F1 and
timeliness across most models using cross-lingual events. We report a detailed
case study analysis for Cholera in Angola 2010 which highlights the challenges
faced in correlating news events with the silver standard. Conclusions: The
results show that automated health surveillance using multilingual text mining
has the potential to turn low value news into high value alerts if informed
choices are used to govern the selection of models and data sources. An
implementation of the C2 alerting algorithm using multilingual news is
available at the BioCaster portal http://born.nii.ac.jp/?page=globalroundup
Constraint-wish and satisfied-dissatisfied: an overview of two approaches for dealing with bipolar querying
In recent years, there has been an increasing interest in dealing with user preferences in flexible database querying, expressing both positive and negative information in a heterogeneous way. This is what is usually referred to as bipolar database querying. Different frameworks have been introduced to deal with such bipolarity. In this chapter, an overview of two approaches is given. The first approach is based on mandatory and desired requirements. Hereby the complement of a mandatory requirement can be considered as a specification of what is not desired at all. So, mandatory requirements indirectly contribute to negative information (expressing what the user does not want to retrieve), whereas desired requirements can be seen as positive information (expressing what the user prefers to retrieve). The second approach is directly based on positive requirements (expressing what the user wants to retrieve), and negative requirements (expressing what the user does not want to retrieve). Both approaches use pairs of satisfaction degrees as the underlying framework but have different semantics, and thus also different operators for criteria evaluation, ranking, aggregation, etc
- …