
    Multi-label classification using ensembles of pruned sets

    This paper presents a Pruned Sets method (PS) for multi-label classification. It is centred on the concept of treating sets of labels as single labels, which allows the classification process to inherently take correlations between labels into account. By pruning these sets, PS focuses only on the most important correlations, which reduces complexity and improves accuracy. By combining pruned sets in an ensemble scheme (EPS), new label sets can be formed to adapt to irregular or complex data. The results of an experimental evaluation on a variety of multi-label datasets show that both PS and EPS can achieve better performance and train much faster than other multi-label methods.
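    The core transformation is compact enough to sketch. The snippet below is a minimal illustration under our own naming (fit_pruned_sets and predict_pruned_sets are hypothetical helpers, not the authors' code): each observed label set becomes a single class, sets seen fewer than p times are pruned, and a standard multi-class learner is trained on what remains. The paper's PS method additionally reintroduces pruned examples via frequent subsets of their label sets, and EPS wraps this in an ensemble; both steps are omitted here for brevity.

```python
from collections import Counter

import numpy as np
from sklearn.linear_model import LogisticRegression


def fit_pruned_sets(X, Y, p=2):
    """Label-powerset training with infrequent label sets pruned away.

    X: (n_samples, n_features) feature matrix.
    Y: (n_samples, n_labels) binary label-indicator matrix.
    p: label sets observed fewer than p times are discarded (the pruning step).
    """
    labelsets = [tuple(row) for row in np.asarray(Y)]
    counts = Counter(labelsets)
    keep = [i for i, s in enumerate(labelsets) if counts[s] >= p]
    classes = sorted(set(labelsets[i] for i in keep))
    class_of = {s: k for k, s in enumerate(classes)}
    y_mc = np.array([class_of[labelsets[i]] for i in keep])
    clf = LogisticRegression(max_iter=1000).fit(np.asarray(X)[keep], y_mc)
    return clf, classes


def predict_pruned_sets(clf, classes, X):
    """Predict one retained label set per instance and map it back to a label vector."""
    return np.array([classes[k] for k in clf.predict(X)])
```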

    Efficient multi-label classification for evolving data streams

    Many real-world problems involve data that can be considered as multi-label data streams. Efficient methods exist for multi-label classification in non-streaming scenarios. However, learning in evolving streaming scenarios is more challenging, as the learners must be able to adapt to change using limited time and memory. This paper proposes a new experimental framework for studying multi-label evolving stream classification, and new efficient methods that combine the best practices in streaming scenarios with the best practices in multi-label classification. We present a Multi-label Hoeffding Tree with multi-label classifiers at the leaves as a base classifier. We obtain fast and accurate methods that are well suited to this challenging multi-label stream classification task. Using the new experimental framework, we test our methodology by performing an evaluation study on synthetic and real-world datasets. In comparison to well-known batch multi-label methods, we obtain encouraging results.
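    The test-then-train protocol at the centre of such an evaluation is easy to sketch. The loop below is our own illustration, not the paper's method: it uses one incremental binary learner per label (binary relevance) where the paper's Multi-label Hoeffding Tree with multi-label classifiers at the leaves would serve as the base model, and it reports exact-match accuracy as a stand-in for the fuller set of measures in the proposed framework.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier


def prequential_multilabel(stream, n_labels):
    """Test-then-train over (x, y) pairs, where y is a binary vector of length n_labels."""
    models = [SGDClassifier() for _ in range(n_labels)]
    seen, exact_matches = 0, 0
    for x, y in stream:
        x = np.asarray(x, dtype=float).reshape(1, -1)
        # Test first, with whatever the models have learned so far.
        y_hat = np.array([m.predict(x)[0] if hasattr(m, "coef_") else 0
                          for m in models])
        exact_matches += int(np.array_equal(y_hat, np.asarray(y)))
        seen += 1
        # Then train: one incremental update per label keeps time and memory bounded.
        for j, m in enumerate(models):
            m.partial_fit(x, [int(y[j])], classes=[0, 1])
    return exact_matches / max(seen, 1)
```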

    Evaluation methods and decision theory for classification of streaming data with temporal dependence

    Predictive modeling on data streams plays an important role in modern data analysis, where data arrives continuously and needs to be mined in real time. In the stream setting the data distribution often evolves over time, and models that update themselves during operation are becoming the state of the art. This paper formalizes a learning and evaluation scheme for such predictive models. We theoretically analyze the evaluation of classifiers on streaming data with temporal dependence. Our findings suggest that commonly accepted data stream classification measures, such as classification accuracy and the Kappa statistic, fail to diagnose cases of poor performance when temporal dependence is present, and therefore should not be used as sole performance indicators. Moreover, classification accuracy can be misleading if used as a proxy for evaluating change detectors on datasets with temporal dependence. We formulate the decision theory for streaming data classification with temporal dependence and develop a new evaluation methodology for data stream classification that takes temporal dependence into account. We propose a combined measure of classification performance that accounts for temporal dependence, and we recommend using it as the main performance measure in the classification of streaming data.
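    The point about misleading measures can be made concrete with a small sketch. The function below is our reconstruction of the general idea rather than the paper's exact formulation: alongside the usual Kappa statistic (accuracy normalised against a chance classifier), it computes an analogous statistic normalised against a no-change classifier that always repeats the previous label. On a stream with strong temporal dependence, the first can look respectable while the second reveals that the model adds little over simply persisting the last observed label.

```python
import numpy as np


def kappa_statistics(y_true, y_pred):
    """Kappa vs. a chance classifier, and a variant vs. a no-change (persistent) baseline."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    p0 = float(np.mean(y_true == y_pred))          # observed accuracy
    # Agreement expected by chance, from the marginal class distributions.
    labels = np.unique(np.concatenate([y_true, y_pred]))
    pc = float(sum(np.mean(y_true == c) * np.mean(y_pred == c) for c in labels))
    kappa = (p0 - pc) / (1.0 - pc)
    # Accuracy of a classifier that always repeats the previous observed label.
    p_persist = float(np.mean(y_true[1:] == y_true[:-1]))
    kappa_temporal = (p0 - p_persist) / (1.0 - p_persist)
    return kappa, kappa_temporal
```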

    Enzyme-linked immunosorbent assay for urinary albumin at low concentrations

    We describe an enzyme-linked immunosorbent assay (ELISA) for urinary albumin. It requires only commercially available reagents, can detect as little as 16 micrograms of albumin per liter, and gives analytical recovery ranging from 92 to 116%. The assay is simple, rapid, and inexpensive. Albumin excretion was 6.2 (SD 4.1) mg/24 h in healthy subjects (n = 40), 14.7 (SD 7.2) mg/24 h in albumin-test-strip-negative Type I diabetics (n = 11), and 19.7 (SD 16.2) mg/24 h in patients with essential hypertension (n = 12).

    Scalable Multi-label Classification

    Multi-label classification is relevant to many domains, such as text, image and other media, and bioinformatics. Researchers have long noticed that correlations exist between labels in multi-label data, and a variety of approaches, drawing inspiration from many spheres of machine learning, are able to model these correlations. However, real-world data sources are growing ever larger, and the multi-label task is particularly sensitive to this because of the complexity associated with multiple labels and the correlations between them. Consequently, many methods do not scale up to large problems. This thesis deals with scalable multi-label classification: methods that exhibit high predictive performance but are also able to scale up to larger problems. The first major contribution is the pruned sets method, which models label correlations directly for high predictive performance, but reduces overfitting and complexity relative to related methods by pruning and subsampling label sets, and can thus scale up to larger datasets. The second major contribution is the classifier chains method, which models correlations with a chain of binary classifiers; the use of binary models allows for scalability to even larger datasets. Pruned sets and classifier chains are robust with respect to both the variety and the scale of data they can deal with, and can be incorporated into other methods. In an ensemble scheme, these methods compete with state-of-the-art methods in terms of predictive performance and scale up to large datasets of hundreds of thousands of training examples. This thesis also places special emphasis on multi-label evaluation, introducing a new evaluation measure and studying threshold calibration. Using one of the largest and most varied collections of multi-label datasets in the literature, extensive experimental evaluation demonstrates the advantages of these methods in terms of predictive performance, computational efficiency, and scalability.
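    As a practical illustration of the chaining idea at scale, the snippet below sketches an ensemble of classifier chains using scikit-learn's ClassifierChain wrapper (our usage sketch, not the thesis implementation): each chain uses a random label order, and the ensemble averages per-label probabilities before thresholding, mirroring the ensemble scheme described above.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import ClassifierChain


def fit_ensemble_of_chains(X, Y, n_chains=10, seed=0):
    """Train n_chains classifier chains, each over a random label order."""
    chains = [ClassifierChain(LogisticRegression(max_iter=1000),
                              order="random", random_state=seed + i)
              for i in range(n_chains)]
    for chain in chains:
        chain.fit(X, Y)
    return chains


def predict_ensemble(chains, X, threshold=0.5):
    """Average the chains' per-label probabilities, then threshold into a label set."""
    votes = np.mean([chain.predict_proba(X) for chain in chains], axis=0)
    return (votes >= threshold).astype(int)
```

    The fixed 0.5 threshold is the naive choice; the threshold-calibration study mentioned in the abstract is concerned precisely with doing better than such a default.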

    MEKA: A multi-label/multi-target extension to WEKA

    Multi-label classification has rapidly attracted interest in the machine learning literature, and there are now a large number and considerable variety of methods for this type of learning. We present MEKA: an open-source Java framework based on the well-known WEKA library. MEKA provides interfaces to facilitate practical application, along with a wealth of multi-label classifiers, evaluation metrics, and tools for multi-label experiments and development. It supports multi-label and multi-target data, including in incremental and semi-supervised contexts.

    Genetic Modulation of Training and Transfer in Older Adults: BDNF Val66Met Polymorphism is Associated with Wider Useful Field of View

    Western society has an increasing proportion of older adults. Increasing age is associated with a general decrease in control over task-relevant mental processes. In the present study we investigated the possibility that successful transfer of game-based cognitive improvements to untrained tasks in elderly people is modulated by preexisting neurodevelopmental factors, such as genetic variability related to levels of brain-derived neurotrophic factor (BDNF), an important neuromodulator underlying cognitive processes. We trained participants, genotyped for the BDNF Val66Met polymorphism, on cognitive tasks developed to improve dynamic attention. Pre-training (baseline) and post-training measures of attentional processes (divided and selective attention) were acquired by means of the useful field of view task. As expected, Val/Val homozygous individuals showed larger beneficial transfer effects than Met carriers. Our findings support the idea that genetic predisposition modulates transfer effects.

    A 37 kb region upstream of brachyury comprising a notochord enhancer is essential for notochord and tail development

    The node-streak border region, comprising notochord progenitor cells (NPCs) at the posterior node and neuro-mesodermal progenitor cells (NMPs) in the adjacent epiblast, is the prime organizing center for axial elongation in mouse embryos. The T-box transcription factor brachyury (T) is essential for both formation of the notochord and maintenance of NMPs, and thus is a key regulator of trunk and tail development. The T promoter controlling T expression in NMPs and nascent mesoderm has been characterized in detail; however, control elements for T expression in the notochord have not yet been identified. We generated a series of deletion alleles by CRISPR/Cas9 genome editing in mESCs and analyzed their effects in mutant mouse embryos. We identified a 37 kb region upstream of T that is essential for notochord function and tailbud outgrowth. Within that region, we discovered a T-binding enhancer required for notochord cell specification and differentiation. Our data reveal a complex regulatory landscape controlling cell-type-specific expression and function of T in NMP/nascent mesoderm and node/notochord, allowing proper trunk and tail development.

    Classifier chains: A review and perspectives

    The family of methods collectively known as classifier chains has become a popular approach to multi-label learning problems. This approach involves chaining together off-the-shelf binary classifiers in a directed structure, such that individual label predictions become features for other classifiers. Such methods have proved flexible and effective, obtaining state-of-the-art empirical performance across many datasets and multi-label evaluation metrics. This performance has led to further studies of the underlying mechanism and its efficacy, and to investigations into how it could be improved. Over the past decade, numerous studies have explored the theoretical underpinnings of classifier chains, and many improvements have been made to the training and inference procedures, such that this method remains among the best options for multi-label learning. Given this past and ongoing interest, which covers a broad range of applications and research themes, the goal of this work is to provide a review of classifier chains, a survey of the techniques and extensions provided in the literature, and perspectives on the future of this approach in the domain of multi-label classification. We conclude positively, with a number of recommendations for researchers and practitioners, as well as an outline of key issues for future research.
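    The chaining mechanism itself fits in a few lines. The sketch below is illustrative only (a single greedy chain in label-index order, not any particular variant from the literature): each binary model is trained with the preceding true labels appended to the features, and at prediction time the preceding predicted labels are fed forward instead, which is exactly the "predictions become features" structure described above.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def fit_chain(X, Y):
    """One binary classifier per label; model j sees labels 0..j-1 as extra features."""
    X_aug = np.asarray(X, dtype=float)
    Y = np.asarray(Y)
    models = []
    for j in range(Y.shape[1]):
        models.append(LogisticRegression(max_iter=1000).fit(X_aug, Y[:, j]))
        X_aug = np.hstack([X_aug, Y[:, [j]]])      # training uses the true labels
    return models


def predict_chain(models, X):
    """Greedy inference: each predicted label is appended before predicting the next."""
    X_aug = np.asarray(X, dtype=float)
    preds = []
    for m in models:
        y_hat = m.predict(X_aug)
        preds.append(y_hat)
        X_aug = np.hstack([X_aug, y_hat.reshape(-1, 1)])
    return np.column_stack(preds)
```

    Much of the work surveyed in the review concerns improving on this baseline: better label orders, probabilistic or search-based inference over the chain, and ensembling.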

    Adaptive random forests for evolving data stream classification

    Random forests is currently one of the most widely used machine learning algorithms in the non-streaming (batch) setting. This preference is attributable to its high learning performance and its low demands with respect to input preparation and hyper-parameter tuning. However, in the challenging context of evolving data streams, there is no random forests algorithm that can be considered state-of-the-art in comparison to bagging- and boosting-based algorithms. In this work, we present the adaptive random forest (ARF) algorithm for classification of evolving data streams. In contrast to previous attempts to replicate random forests for data stream learning, ARF includes an effective resampling method and adaptive operators that can cope with different types of concept drift without complex optimizations for different datasets. We present experiments with a parallel implementation of ARF that shows no degradation in classification performance compared with a serial implementation, since the trees and adaptive operators are independent of one another. Finally, we compare ARF with state-of-the-art algorithms in a traditional test-then-train evaluation and in a novel delayed-labelling evaluation, and show that ARF is accurate and uses a feasible amount of resources.
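    The resampling idea behind such streaming ensembles can be illustrated briefly. The sketch below is a generic online-bagging loop under our own simplifications, not the ARF algorithm itself (no drift detectors, no per-tree random feature subsets, and plain linear learners instead of Hoeffding trees): each arriving example is weighted, for each ensemble member, by a Poisson draw, which approximates bootstrap resampling without storing the stream. The default λ = 6 follows the leveraging-bagging-style resampling associated with ARF, but treat it as a placeholder.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier


def online_bagging(stream, classes, n_models=10, lam=6.0, seed=0):
    """Poisson(lam)-weighted online resampling across an ensemble of incremental learners."""
    rng = np.random.default_rng(seed)
    models = [SGDClassifier() for _ in range(n_models)]
    for x, y in stream:
        x = np.asarray(x, dtype=float).reshape(1, -1)
        for m in models:
            k = int(rng.poisson(lam))      # how heavily this member "sees" the example
            if k > 0:
                m.partial_fit(x, [y], classes=classes, sample_weight=[k])
    return models


def predict_majority(models, x):
    """Majority vote over ensemble members."""
    votes = [m.predict(np.asarray(x, dtype=float).reshape(1, -1))[0] for m in models]
    values, counts = np.unique(votes, return_counts=True)
    return values[np.argmax(counts)]
```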