    Efficient AUC Optimization for Information Ranking Applications

    Adequate evaluation of an information retrieval system to estimate future performance is a crucial task. Area under the ROC curve (AUC) is widely used to evaluate the generalization of a retrieval system. However, the objective function optimized in many retrieval systems is the error rate and not the AUC value. This paper provides an efficient and effective non-linear approach to optimize AUC using additive regression trees, with a special emphasis on the use of multi-class AUC (MAUC) because multiple relevance levels are widely used in many ranking applications. Compared to a conventional linear approach, the performance of the non-linear approach is comparable on binary-relevance benchmark datasets and is better on multi-relevance benchmark datasets. Comment: 12 pages
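To make concrete the quantity the abstract above refers to, here is a minimal sketch of binary AUC computed from its pairwise-ranking definition: the fraction of (relevant, non-relevant) pairs that the scores order correctly, with ties counted as half. The function name `pairwise_auc` and the sample data are illustrative assumptions. This shows only the evaluation measure, not the paper's tree-based optimization or its multi-class variant.

```python
def pairwise_auc(scores, labels):
    """Binary AUC as the fraction of correctly ordered (positive, negative) pairs.

    `scores` are model outputs (higher means more relevant) and `labels`
    are 0/1 relevance judgements. Ties between a positive and a negative
    count as half a correct pair.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")

    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


# Hypothetical scores for four documents, two of them relevant.
example_auc = pairwise_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
```

The double loop is quadratic in the number of examples, which is acceptable for a sketch; a sort-based computation would be the usual choice for larger rankings.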

    On Making Good Games - Using Player Virtue Ethics and Gameplay Design Patterns to Identify Generally Desirable Gameplay Features

    This paper uses a framework of player virtues to perform a theoretical exploration of what is required to make a game good. The choice of player virtues is based upon the view that games can be seen as implements, and that these are good if they support an intended use, and the intended use of games is to support people to be good players. A collection of gameplay design patterns, identified through their relation to the virtues, is presented to provide specific starting points for considering design options for this type of good games. 24 patterns are identified supporting the virtues, including RISK/REWARD, DYNAMIC ALLIANCES, GAME MASTERS, and PLAYER DECIDED RESULTS, as are 7 countering three or more virtues, including ANALYSIS PARALYSIS, EARLY ELIMINATION, and GRINDING. The paper concludes by identifying limitations of the approach as well as by showing how it can be applied using other views of what are preferable features in games.

    Learning what matters - Sampling interesting patterns

    In the field of exploratory data mining, local structure in data can be described by patterns and discovered by mining algorithms. Although many solutions have been proposed to address the redundancy problems in pattern mining, most of them either provide succinct pattern sets or take the interests of the user into account, but not both. Consequently, the analyst has to invest substantial effort in identifying those patterns that are relevant to her specific interests and goals. To address this problem, we propose a novel approach that combines pattern sampling with interactive data mining. In particular, we introduce the LetSIP algorithm, which builds upon recent advances in 1) weighted sampling in SAT and 2) learning to rank in interactive pattern mining. Specifically, it exploits user feedback to directly learn the parameters of the sampling distribution that represents the user's interests. We compare the performance of the proposed algorithm to the state-of-the-art in interactive pattern mining by emulating the interests of a user. The resulting system allows efficient and interleaved learning and sampling, and thus user-specific anytime data exploration. Finally, LetSIP demonstrates favourable trade-offs concerning both quality-diversity and exploitation-exploration when compared to existing methods. Comment: PAKDD 2017, extended version

    An automatic critical care urine meter

    Nowadays patients admitted to critical care units have most of their physiological parameters measured automatically by sophisticated commercial monitoring devices. More often than not, these devices supervise whether the values of the parameters they measure lie within a pre-established range, and issue warnings of deviations from this range by triggering alarms. The automation of measuring and supervising tasks not only relieves the healthcare staff of a considerable workload but also avoids human errors in these repetitive and monotonous tasks. Arguably, the most relevant physiological parameter that is still measured and supervised manually by critical care unit staff is urine output (UO). In this paper we present a patent-pending device that provides continuous and accurate measurements of a patient's UO. The device uses capacitive sensors to take continuous measurements of the height of the column of liquid accumulated in two chambers that make up a plastic container. The first chamber, into which the urine flows, has a small volume. Once it has been filled it overflows into a second, bigger chamber. The first chamber provides accurate UO measures of patients whose UO has to be closely supervised, while the second one avoids the need for frequent interventions by the nursing staff to empty the container.

    Flexible constrained sampling with guarantees for pattern mining

    Pattern sampling has been proposed as a potential solution to the infamous pattern explosion. Instead of enumerating all patterns that satisfy the constraints, individual patterns are sampled proportional to a given quality measure. Several sampling algorithms have been proposed, but each of them has its limitations when it comes to 1) flexibility in terms of quality measures and constraints that can be used, and/or 2) guarantees with respect to sampling accuracy. We therefore present Flexics, the first flexible pattern sampler that supports a broad class of quality measures and constraints, while providing strong guarantees regarding sampling accuracy. To achieve this, we leverage the perspective on pattern mining as a constraint satisfaction problem and build upon the latest advances in sampling solutions in SAT as well as existing pattern mining algorithms. Furthermore, the proposed algorithm is applicable to a variety of pattern languages, which allows us to introduce and tackle the novel task of sampling sets of patterns. We introduce and empirically evaluate two variants of Flexics: 1) a generic variant that addresses the well-known itemset sampling task and the novel pattern set sampling task as well as a wide range of expressive constraints within these tasks, and 2) a specialized variant that exploits existing frequent itemset techniques to achieve substantial speed-ups. Experiments show that Flexics is both accurate and efficient, making it a useful tool for pattern-based data exploration. Comment: Accepted for publication in Data Mining & Knowledge Discovery journal (ECML/PKDD 2017 journal track)
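The phrase "sampled proportional to a given quality measure" in the abstract above can be illustrated with a small sketch. It assumes the candidate patterns and their qualities are already available in a dictionary, which is exactly the enumeration that a real pattern sampler is designed to avoid, so this shows only the target distribution and not how Flexics works. The names `sampling_probabilities` and `sample_patterns` and the toy quality values are hypothetical.

```python
import random


def sampling_probabilities(qualities):
    """Map each pattern to quality / (sum of all qualities)."""
    total = sum(qualities.values())
    if total <= 0:
        raise ValueError("qualities must sum to a positive value")
    return {pattern: q / total for pattern, q in qualities.items()}


def sample_patterns(qualities, k, seed=None):
    """Draw k patterns with replacement, each with probability proportional to its quality."""
    rng = random.Random(seed)
    patterns = list(qualities)
    weights = [qualities[p] for p in patterns]
    return rng.choices(patterns, weights=weights, k=k)


# Toy example (assumption): itemsets written as strings, with frequency as the quality measure.
toy_qualities = {"ab": 3, "a": 1}
toy_probabilities = sampling_probabilities(toy_qualities)
toy_sample = sample_patterns(toy_qualities, k=5, seed=0)
```

With these toy values the pattern `"ab"` should be drawn about three times as often as `"a"` over many samples.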

    Redundancy, Deduction Schemes, and Minimum-Size Bases for Association Rules

    Association rules are among the most widely employed data analysis methods in the field of Data Mining. An association rule is a form of partial implication between two sets of binary variables. In the most common approach, association rules are parameterized by a lower bound on their confidence, which is the empirical conditional probability of their consequent given the antecedent, and/or by some other parameter bounds such as "support" or deviation from independence. We study here notions of redundancy among association rules from a fundamental perspective. We see each transaction in a dataset as an interpretation (or model) in the propositional logic sense, and consider existing notions of redundancy, that is, of logical entailment, among association rules, of the form "any dataset in which this first rule holds must obey also that second rule, therefore the second is redundant". We discuss several existing alternative definitions of redundancy between association rules and provide new characterizations and relationships among them. We show that the main alternatives we discuss correspond actually to just two variants, which differ in the treatment of full-confidence implications. For each of these two notions of redundancy, we provide a sound and complete deduction calculus, and we show how to construct complete bases (that is, axiomatizations) of absolutely minimum size in terms of the number of rules. We explore finally an approach to redundancy with respect to several association rules, and fully characterize its simplest case of two partial premises. Comment: LMCS accepted paper
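The two parameters named in the abstract above, confidence and support, can be sketched directly from their definitions. This is a minimal illustration assuming transactions are represented as Python sets of items; the function names and the toy dataset are hypothetical, and nothing here reflects the paper's deduction calculus or basis construction.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = frozenset(items)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Empirical conditional probability of the consequent given the antecedent."""
    antecedent = frozenset(antecedent)
    both = antecedent | frozenset(consequent)
    n_antecedent = sum(1 for t in transactions if antecedent <= t)
    if n_antecedent == 0:
        raise ValueError("antecedent never occurs, so confidence is undefined")
    n_both = sum(1 for t in transactions if both <= t)
    return n_both / n_antecedent


# Toy dataset (assumption): four transactions over items a, b, c.
toy_transactions = [{"a", "b"}, {"a"}, {"a", "b", "c"}, {"b"}]
rule_confidence = confidence(toy_transactions, {"a"}, {"b"})  # rule a -> b
rule_support = support(toy_transactions, {"a", "b"})
```

In the toy dataset, `a` occurs in three transactions and `b` accompanies it in two of them, so the rule `a -> b` is a partial implication rather than a full-confidence one.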

    New perspectives on the ecology of tree structure and tree communities through terrestrial laser scanning

    Terrestrial laser scanning (TLS) opens up the possibility of describing the three-dimensional structures of trees in natural environments with unprecedented detail and accuracy. It is already being extensively applied to describe how ecosystem biomass and structure vary between sites, but can also facilitate major advances in developing and testing mechanistic theories of tree form and forest structure, thereby enabling us to understand why trees and forests have the biomass and three-dimensional structure they do. Here we focus on the ecological challenges and benefits of understanding tree form, and highlight some advances related to capturing and describing tree shape that are becoming possible with the advent of TLS. We present examples of ongoing work that applies, or could potentially apply, new TLS measurements to better understand the constraints on optimization of tree form. Theories of resource distribution networks, such as metabolic scaling theory, can be tested and further refined. TLS can also provide new approaches to the scaling of woody surface area and crown area, and thereby better quantify the metabolism of trees. Finally, we demonstrate how we can develop a more mechanistic understanding of the effects of avoidance of wind risk on tree form and maximum size. Over the next few years, TLS promises to deliver both major empirical and conceptual advances in the quantitative understanding of trees and tree-dominated ecosystems, leading to advances in understanding the ecology of why trees and ecosystems look and grow the way they do.

    Features of the Formation of the Ethnic Composition of the Peasant Stratum of the Steppe Pobuzhzhia

    In this short paper we sketch a brief introduction to our Krimp algorithm. Moreover, we briefly discuss some of the large body of follow-up research. Pointers to the relevant papers are provided in the bibliography.

    Algebraic Comparison of Partial Lists in Bioinformatics

    The outcome of a functional genomics pipeline is usually a partial list of genomic features, ranked by their relevance in modelling biological phenotype in terms of a classification or regression model. Because of resampling protocols, or simply within a meta-analysis comparison, it is often the case that sets of alternative feature lists (possibly of different lengths) are obtained instead of a single list. Here we introduce a method, based on the algebraic theory of symmetric groups, for studying the variability between lists ("list stability") in the case of lists of unequal length. We provide algorithms evaluating stability for lists embedded in the full feature set or just limited to the features occurring in the partial lists. The method is demonstrated first on synthetic data in a gene filtering task and then for finding gene profiles on a recent prostate cancer dataset.