28,108 research outputs found
What's unusual in online disease outbreak news?
Background: Accurate and timely detection of public health events of
international concern is necessary to help support risk assessment and response
and save lives. Novel event-based methods that use the World Wide Web as a
signal source offer potential to extend health surveillance into areas where
traditional indicator networks are lacking. In this paper we address the issue
of systematically evaluating online health news to support automatic alerting
using daily disease-country counts text mined from real world data using
BioCaster. For 18 data sets produced by BioCaster, we compare 5 aberration
detection algorithms (EARS C2, C3, W2, F-statistic and EWMA) for performance
against expert moderated ProMED-mail postings. Results: We report sensitivity,
specificity, positive predictive value (PPV), negative predictive value (NPV),
mean alerts/100 days and F1, at 95% confidence interval (CI) for 287
ProMED-mail postings on 18 outbreaks across 14 countries over a 366 day period.
Results indicate that W2 had the best F1 with a slight benefit for day of week
effect over C2. In drill down analysis we indicate issues arising from the
granular choice of country-level modeling, sudden drops in reporting due to day
of week effects and reporting bias. Automatic alerting has been implemented in
BioCaster available from http://born.nii.ac.jp. Conclusions: Online health news
alerts have the potential to enhance manual analytical methods by increasing
throughput, timeliness and detection rates. Systematic evaluation of health
news aberrations is necessary to push forward our understanding of the complex
relationship between news report volumes and case numbers and to select the
best performing features and algorithms
Mining Frequent Itemsets Using Genetic Algorithm
In general frequent itemsets are generated from large data sets by applying
association rule mining algorithms like Apriori, Partition, Pincer-Search,
Incremental, Border algorithm etc., which take too much computer time to
compute all the frequent itemsets. By using Genetic Algorithm (GA) we can
improve the scenario. The major advantage of using GA in the discovery of
frequent itemsets is that they perform global search and its time complexity is
less compared to other algorithms as the genetic algorithm is based on the
greedy approach. The main aim of this paper is to find all the frequent
itemsets from given data sets using genetic algorithm
Predictive User Modeling with Actionable Attributes
Different machine learning techniques have been proposed and used for
modeling individual and group user needs, interests and preferences. In the
traditional predictive modeling instances are described by observable
variables, called attributes. The goal is to learn a model for predicting the
target variable for unseen instances. For example, for marketing purposes a
company consider profiling a new user based on her observed web browsing
behavior, referral keywords or other relevant information. In many real world
applications the values of some attributes are not only observable, but can be
actively decided by a decision maker. Furthermore, in some of such applications
the decision maker is interested not only to generate accurate predictions, but
to maximize the probability of the desired outcome. For example, a direct
marketing manager can choose which type of a special offer to send to a client
(actionable attribute), hoping that the right choice will result in a positive
response with a higher probability. We study how to learn to choose the value
of an actionable attribute in order to maximize the probability of a desired
outcome in predictive modeling. We emphasize that not all instances are equally
sensitive to changes in actions. Accurate choice of an action is critical for
those instances, which are on the borderline (e.g. users who do not have a
strong opinion one way or the other). We formulate three supervised learning
approaches for learning to select the value of an actionable attribute at an
instance level. We also introduce a focused training procedure which puts more
emphasis on the situations where varying the action is the most likely to take
the effect. The proof of concept experimental validation on two real-world case
studies in web analytics and e-learning domains highlights the potential of the
proposed approaches
Analysis of monotonicity properties of some rule interestingness measures
One of the crucial problems in the field of knowledge discovery is development of good interestingness measures for evaluation of the discovered patterns. In this paper, we consider quantitative, objective interestingness measures for "if..., then... " association rules. We focus on three popular interestingness measures, namely rule interest function of Piatetsky-Shapiro, gain measure of Fukuda et al., and dependency factor used by Pawlak. We verify whether they satisfy the valuable property M of monotonic dependency on the number of objects satisfying or not the premise or the conclusion of a rule, and property of hypothesis symmetry (HS). Moreover, analytically and through experiments we show an interesting relationship between those measures and two other commonly used measures of rule support and anti-support
- …