258 research outputs found

    Federated Learning from Small Datasets

    Get PDF
    Federated learning allows multiple parties to collaboratively train a joint model without sharing local data. This enables applications of machine learning in settings of inherently distributed, undisclosable data such as in the medical domain. In practice, joint training is usually achieved by aggregating local models, for which local training objectives have to be in expectation similar to the joint (global) objective. Often, however, local datasets are so small that local objectives differ greatly from the global objective, resulting in federated learning to fail. We propose a novel approach that intertwines model aggregations with permutations of local models. The permutations expose each local model to a daisy chain of local datasets resulting in more efficient training in data-sparse domains. This enables training on extremely small local datasets, such as patient data across hospitals, while retaining the training efficiency and privacy benefits of federated learning

    Efficiently Discovering Locally Exceptional yet Globally Representative Subgroups

    Get PDF
    Subgroup discovery is a local pattern mining technique to find interpretable descriptions of sub-populations that stand out on a given target variable. That is, these sub-populations are exceptional with regard to the global distribution. In this paper we argue that in many applications, such as scientific discovery, subgroups are only useful if they are additionally representative of the global distribution with regard to a control variable. That is, when the distribution of this control variable is the same, or almost the same, as over the whole data. We formalise this objective function and give an efficient algorithm to compute its tight optimistic estimator for the case of a numeric target and a binary control variable. This enables us to use the branch-and-bound framework to efficiently discover the top-kk subgroups that are both exceptional as well as representative. Experimental evaluation on a wide range of datasets shows that with this algorithm we discover meaningful representative patterns and are up to orders of magnitude faster in terms of node evaluations as well as time

    Label-Descriptive Patterns and their Application to Characterizing Classification Errors

    Get PDF
    State-of-the-art deep learning methods achieve human-like performance on many tasks, but make errors nevertheless. Characterizing these errors in easily interpretable terms gives insight into whether a model is prone to making systematic errors, but also gives a way to act and improve the model. In this paper we propose a method that allows us to do so for arbitrary classifiers by mining a small set of patterns that together succinctly describe the input data that is partitioned according to correctness of prediction. We show this is an instance of the more general label description problem, which we formulate in terms of the Minimum Description Length principle. To discover good pattern sets we propose the efficient and hyperparameter-free Premise algorithm, which through an extensive set of experiments we show on both synthetic and real-world data performs very well in practice; unlike existing solutions it ably recovers ground truth patterns, even on highly imbalanced data over many unique items, or where patterns are only weakly associated to labels. Through two real-world case studies we confirm that Premise gives clear and actionable insight into the systematic errors made by modern NLP classifiers

    Interactive and Iterative Discovery of Entity Network Subgraphs

    Get PDF
    Graph mining to extract interesting components has been studied in various guises, e.g., communities, dense subgraphs, cliques. However, most existing works are based on notions of frequency and connectivity and do not capture subjective interestingness from a user's viewpoint. Furthermore, existing approaches to mine graphs are not interactive and cannot incorporate user feedbacks in any natural manner. In this paper, we address these gaps by proposing a graph maximum entropy model to discover surprising connected subgraph patterns from entity graphs. This model is embedded in an interactive visualization framework to enable human-in-the-loop, model-guided data exploration. Using case studies on real datasets, we demonstrate how interactions between users and the maximum entropy model lead to faster and explainable conclusions

    Bilateral multiple coronary artery fistulae with angina pectoris and syncope

    Get PDF
    AbstractCoronary artery fistulae (CAF) are rare cardiac anomalies. They frequently arise from the right coronary artery (RCA) with fistulae originating from the left anterior descending artery (LAD) or from multiple arteries being less common. They do not usually cause symptoms and are incidentally diagnosed on routine cardiac imaging. We report a 70years old male patient presenting with chest pain and syncope during physical activity. Diagnostic coronary angiography revealed bilateral multiple CAF originating from both the LAD and RCA. As high blood flow output was recognized in these large vascular anomalies contributing to ‘steal phenomenon’ surgical intervention was planned. This manuscript aimed to present the case and review the current literature for the management and treatment of these coronary anomalies

    Особливості формування етнічного складу селянської верстви Степового Побужжя

    Get PDF
    In this short paper we sketch a brief introduction to our Krimp algorithm. Moreover, we briefly discuss some of the large body of follow up research. Pointers to the relevant papers are provided in the bibliography
    corecore