Efficient AUC Optimization for Information Ranking Applications
Adequate evaluation of an information retrieval system to estimate future
performance is a crucial task. Area under the ROC curve (AUC) is widely used to
evaluate the generalization of a retrieval system. However, the objective
function optimized in many retrieval systems is the error rate and not the AUC
value. This paper provides an efficient and effective non-linear approach to
optimize AUC using additive regression trees, with a special emphasis on the
use of multi-class AUC (MAUC) because multiple relevance levels are widely used
in many ranking applications. Compared to a conventional linear approach, the
performance of the non-linear approach is comparable on binary-relevance
benchmark datasets and is better on multi-relevance benchmark datasets. Comment: 12 pages
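To make the evaluation measure in the abstract above concrete, here is a minimal sketch of binary AUC computed as the fraction of (relevant, non-relevant) pairs that a scoring function orders correctly. It is not the paper's tree-based optimizer or its multi-class MAUC variant; the function name `pairwise_auc` and the toy data are assumptions made for illustration.

```python
from itertools import product


def pairwise_auc(scores, labels):
    """Binary AUC: share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    correct = 0.0
    for p, n in product(pos, neg):
        if p > n:
            correct += 1.0
        elif p == n:
            correct += 0.5
    return correct / (len(pos) * len(neg))


# Hypothetical toy ranking: one relevant item is scored below a non-relevant one.
example_auc = pairwise_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
```

Because this quantity depends only on the ordering of scores, a model can have a low error rate and still a mediocre AUC, which is the mismatch the abstract points to.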
On Making Good Games - Using Player Virtue Ethics and Gameplay Design Patterns to Identify Generally Desirable Gameplay Features
This paper uses a framework of player virtues to perform a
theoretical exploration of what is required to make a game
good. The choice of player virtues is based upon the view
that games can be seen as implements, and that these are
good if they support an intended use, and the intended use
of games is to support people to be good players. A collection of gameplay design patterns, identified through
their relation to the virtues, is presented to provide specific starting points for considering design options for this type of good game. Twenty-four patterns are identified as supporting the virtues, including RISK/REWARD, DYNAMIC ALLIANCES, GAME MASTERS, and PLAYER DECIDED RESULTS, and seven as countering three or more virtues, including ANALYSIS PARALYSIS, EARLY ELIMINATION, and GRINDING. The paper concludes by identifying limitations of the approach and by showing how it can be applied using other views of which features are preferable in games.
Learning what matters - Sampling interesting patterns
In the field of exploratory data mining, local structure in data can be
described by patterns and discovered by mining algorithms. Although many
solutions have been proposed to address the redundancy problems in pattern
mining, most of them either provide succinct pattern sets or take the interests
of the user into account, but not both. Consequently, the analyst has to invest
substantial effort in identifying those patterns that are relevant to her
specific interests and goals. To address this problem, we propose a novel
approach that combines pattern sampling with interactive data mining. In
particular, we introduce the LetSIP algorithm, which builds upon recent
advances in 1) weighted sampling in SAT and 2) learning to rank in interactive
pattern mining. Specifically, it exploits user feedback to directly learn the
parameters of the sampling distribution that represents the user's interests.
We compare the performance of the proposed algorithm to the state-of-the-art in
interactive pattern mining by emulating the interests of a user. The resulting
system allows efficient and interleaved learning and sampling, thus enabling
user-specific anytime data exploration. Finally, LetSIP demonstrates favourable
trade-offs concerning both quality-diversity and exploitation-exploration when
compared to existing methods. Comment: PAKDD 2017, extended version
An automatic critical care urine meter
Nowadays, patients admitted to critical care units have most of their physiological parameters measured automatically by sophisticated commercial monitoring devices. More often than not, these devices check whether the values of the parameters they measure lie within a pre-established range, and warn of deviations from this range by triggering alarms. Automating these measuring and supervising tasks not only relieves the healthcare staff of a considerable workload but also avoids human errors in repetitive and monotonous tasks. Arguably, the most relevant physiological parameter that is still measured and supervised manually by critical care unit staff is urine output (UO). In this paper we present a patent-pending device that provides continuous and accurate measurements of a patient's UO. The device uses capacitive sensors to continuously measure the height of the column of liquid accumulated in the two chambers that make up a plastic container. The first chamber, into which the urine flows, has a small volume. Once it has been filled, it overflows into a second, larger chamber. The first chamber provides accurate UO measurements for patients whose UO has to be closely supervised, while the second one avoids the need for frequent interventions by the nursing staff to empty the container.
Flexible constrained sampling with guarantees for pattern mining
Pattern sampling has been proposed as a potential solution to the infamous
pattern explosion. Instead of enumerating all patterns that satisfy the
constraints, individual patterns are sampled proportional to a given quality
measure. Several sampling algorithms have been proposed, but each of them has
its limitations when it comes to 1) flexibility in terms of quality measures
and constraints that can be used, and/or 2) guarantees with respect to sampling
accuracy. We therefore present Flexics, the first flexible pattern sampler that
supports a broad class of quality measures and constraints, while providing
strong guarantees regarding sampling accuracy. To achieve this, we leverage the
perspective on pattern mining as a constraint satisfaction problem and build
upon the latest advances in sampling solutions in SAT as well as existing
pattern mining algorithms. Furthermore, the proposed algorithm is applicable to
a variety of pattern languages, which allows us to introduce and tackle the
novel task of sampling sets of patterns. We introduce and empirically evaluate
two variants of Flexics: 1) a generic variant that addresses the well-known
itemset sampling task and the novel pattern set sampling task as well as a wide
range of expressive constraints within these tasks, and 2) a specialized
variant that exploits existing frequent itemset techniques to achieve
substantial speed-ups. Experiments show that Flexics is both accurate and
efficient, making it a useful tool for pattern-based data exploration. Comment: Accepted for publication in Data Mining & Knowledge Discovery journal
(ECML/PKDD 2017 journal track)
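As a rough illustration of what "sampled proportional to a given quality measure" means in the abstract above, the sketch below draws itemsets with probability proportional to their frequency. It is deliberately naive: it enumerates every candidate itemset, which is exactly what a sampler such as Flexics is designed to avoid, and it is not the Flexics algorithm. The names `frequency` and `sample_patterns`, the `min_freq` constraint, and the toy transactions are all assumptions.

```python
import random
from itertools import combinations


def frequency(itemset, transactions):
    """Quality measure used here: number of transactions containing the itemset."""
    return sum(1 for t in transactions if itemset <= t)


def sample_patterns(transactions, k, min_freq=1, seed=0):
    """Draw k itemsets with probability proportional to their frequency.

    Naive reference version: enumerates all non-empty itemsets, so it is only
    usable on tiny datasets. Each transaction must be a set of items.
    """
    items = sorted(set().union(*transactions))
    candidates = [
        frozenset(c)
        for r in range(1, len(items) + 1)
        for c in combinations(items, r)
    ]
    # Keep only the itemsets that satisfy the (hypothetical) frequency constraint.
    weighted = [(p, frequency(p, transactions)) for p in candidates]
    weighted = [(p, w) for p, w in weighted if w >= min_freq]
    if not weighted:
        return []
    patterns, weights = zip(*weighted)
    rng = random.Random(seed)
    return rng.choices(patterns, weights=weights, k=k)


# Toy data: only {"a"} reaches a frequency of 2, so every draw returns it.
toy = [{"a", "b"}, {"a"}, {"a", "c"}]
draws = sample_patterns(toy, k=5, min_freq=2)
```

Swapping `frequency` for another non-negative quality function changes the target distribution without changing the sampling step.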
Redundancy, Deduction Schemes, and Minimum-Size Bases for Association Rules
Association rules are among the most widely employed data analysis methods in
the field of Data Mining. An association rule is a form of partial implication
between two sets of binary variables. In the most common approach, association
rules are parameterized by a lower bound on their confidence, which is the
empirical conditional probability of their consequent given the antecedent,
and/or by some other parameter bounds such as "support" or deviation from
independence. We study here notions of redundancy among association rules from
a fundamental perspective. We see each transaction in a dataset as an
interpretation (or model) in the propositional logic sense, and consider
existing notions of redundancy, that is, of logical entailment, among
association rules, of the form "any dataset in which this first rule holds must
obey also that second rule, therefore the second is redundant". We discuss
several existing alternative definitions of redundancy between association
rules and provide new characterizations and relationships among them. We show
that the main alternatives we discuss correspond actually to just two variants,
which differ in the treatment of full-confidence implications. For each of
these two notions of redundancy, we provide a sound and complete deduction
calculus, and we show how to construct complete bases (that is,
axiomatizations) of absolutely minimum size in terms of the number of rules.
Finally, we explore an approach to redundancy with respect to several association
rules, and fully characterize its simplest case of two partial premises. Comment: LMCS accepted paper
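The confidence parameter described in the abstract above, the empirical conditional probability of the consequent given the antecedent, can be sketched as follows. This only illustrates the standard definitions of support and confidence; it does not implement the paper's redundancy notions or deduction calculi. The function names and the toy transactions are assumptions.

```python
def support(itemset, transactions):
    """Number of transactions that contain every item of the itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t))


def confidence(antecedent, consequent, transactions):
    """Empirical P(consequent | antecedent) for the rule antecedent -> consequent."""
    base = support(antecedent, transactions)
    if base == 0:
        raise ValueError("antecedent never occurs; confidence is undefined")
    joint = support(set(antecedent) | set(consequent), transactions)
    return joint / base


# Hypothetical toy dataset: bread occurs in 3 transactions, 2 of which contain milk.
baskets = [{"bread", "milk"}, {"bread"}, {"bread", "milk", "eggs"}, {"milk"}]
rule_conf = confidence({"bread"}, {"milk"}, baskets)
```

A rule is then reported when `confidence` meets the chosen lower bound, optionally combined with a bound on `support`.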
Family and identity. Catholic and non-Catholic intermarriage: attitudes to children, identity and sharing household responsibilities
New perspectives on the ecology of tree structure and tree communities through terrestrial laser scanning
Terrestrial laser scanning (TLS) opens up the possibility of describing the three-dimensional structures of trees in natural environments with unprecedented detail and accuracy. It is already being extensively applied to describe how ecosystem biomass and structure vary between sites, but can also facilitate major advances in developing and testing mechanistic theories of tree form and forest structure, thereby enabling us to understand why trees and forests have the biomass and three-dimensional structure they do. Here we focus on the ecological challenges and benefits of understanding tree form, and highlight some advances related to capturing and describing tree shape that are becoming possible with the advent of TLS. We present examples of ongoing work that applies, or could potentially apply, new TLS measurements to better understand the constraints on optimization of tree form. Theories of resource distribution networks, such as metabolic scaling theory, can be tested and further refined. TLS can also provide new approaches to the scaling of woody surface area and crown area, and thereby better quantify the metabolism of trees. Finally, we demonstrate how we can develop a more mechanistic understanding of the effects of avoidance of wind risk on tree form and maximum size. Over the next few years, TLS promises to deliver both major empirical and conceptual advances in the quantitative understanding of trees and tree-dominated ecosystems, leading to advances in understanding the ecology of why trees and ecosystems look and grow the way they do.
Features of the Formation of the Ethnic Composition of the Peasantry of the Steppe Pobuzhzhia
In this short paper we sketch a brief introduction to our Krimp algorithm. Moreover, we briefly discuss some of the large body of follow-up research. Pointers to the relevant papers are provided in the bibliography.
Algebraic Comparison of Partial Lists in Bioinformatics
The outcome of a functional genomics pipeline is usually a partial list of
genomic features, ranked by their relevance in modelling biological phenotype
in terms of a classification or regression model. Due to resampling protocols
or just within a meta-analysis comparison, instead of one list it is often the
case that sets of alternative feature lists (possibly of different lengths) are
obtained. Here we introduce a method, based on the algebraic theory of
symmetric groups, for studying the variability between lists ("list stability")
in the case of lists of unequal length. We provide algorithms evaluating
stability for lists embedded in the full feature set or just limited to the
features occurring in the partial lists. The method is demonstrated first on
synthetic data in a gene filtering task and then for finding gene profiles on a
recent prostate cancer dataset.
