73 research outputs found

    Multilabel Classification with R Package mlr

    Full text link
    We implemented several multilabel classification algorithms in the machine learning package mlr. The implemented methods are binary relevance, classifier chains, nested stacking, dependent binary relevance and stacking, which can be used with any base learner that is accessible in mlr. Moreover, there is access to the multilabel classification versions of randomForestSRC and rFerns. All these methods can be easily compared by different implemented multilabel performance measures and resampling methods in the standardized mlr framework. In a benchmark experiment with several multilabel datasets, the performance of the different methods is evaluated.Comment: 18 pages, 2 figures, to be published in R Journal; reference correcte

    OpenML Benchmarking Suites

    Full text link
    Machine learning research depends on objectively interpretable, comparable, and reproducible algorithm benchmarks. Therefore, we advocate the use of curated, comprehensive suites of machine learning tasks to standardize the setup, execution, and reporting of benchmarks. We enable this through software tools that help to create and leverage these benchmarking suites. These are seamlessly integrated into the OpenML platform, and accessible through interfaces in Python, Java, and R. OpenML benchmarking suites are (a) easy to use through standardized data formats, APIs, and client libraries; (b) machine-readable, with extensive meta-information on the included datasets; and (c) allow benchmarks to be shared and reused in future studies. We also present a first, carefully curated and practical benchmarking suite for classification: the OpenML Curated Classification benchmarking suite 2018 (OpenML-CC18)

    On the value of popular crystallographic databases for machine learning prediction of space groups

    Get PDF
    Predicting crystal structure information is a challenging problem in materials science that clearly benefits from artificial intelligence approaches. The leading strategies in machine learning are notoriously data-hungry and although a handful of large crystallographic databases are currently available, their predictive quality has never been assessed. In this article, we have employed composition-driven machine learning models, as well as deep learning, to predict space groups from well known experimental and theoretical databases. The results generated by comprehensive testing indicate that data-abundant repositories such as COD (Crystallography Open Database) and OQMD (Open Quantum Materials Database) do not provide the best models even for heavily populated space groups. Classification models trained on databases such as the Pearson Crystal Database and ICSD (Inorganic Crystal Structure Database), and to a lesser extent the Materials Project, generally outperform their data-richer counterparts due to more balanced distributions of the representative classes. Experimental validation with novel high entropy compounds was used to confirm the predictive value of the different databases and showcase the scope of the machine learning approaches employed.publishedVersio

    Multi-Label Learning with Label Enhancement

    Full text link
    The task of multi-label learning is to predict a set of relevant labels for the unseen instance. Traditional multi-label learning algorithms treat each class label as a logical indicator of whether the corresponding label is relevant or irrelevant to the instance, i.e., +1 represents relevant to the instance and -1 represents irrelevant to the instance. Such label represented by -1 or +1 is called logical label. Logical label cannot reflect different label importance. However, for real-world multi-label learning problems, the importance of each possible label is generally different. For the real applications, it is difficult to obtain the label importance information directly. Thus we need a method to reconstruct the essential label importance from the logical multilabel data. To solve this problem, we assume that each multi-label instance is described by a vector of latent real-valued labels, which can reflect the importance of the corresponding labels. Such label is called numerical label. The process of reconstructing the numerical labels from the logical multi-label data via utilizing the logical label information and the topological structure in the feature space is called Label Enhancement. In this paper, we propose a novel multi-label learning framework called LEMLL, i.e., Label Enhanced Multi-Label Learning, which incorporates regression of the numerical labels and label enhancement into a unified framework. Extensive comparative studies validate that the performance of multi-label learning can be improved significantly with label enhancement and LEMLL can effectively reconstruct latent label importance information from logical multi-label data.Comment: ICDM 201

    Identifying Incident Casual Factors to Improve Aviation Transportation Safety: Proposing a Deep Learning Approach

    Get PDF
    Aviation is a complicated transportation system, and safety is of paramount importance because aircraft failure often involves casualties. Prevention is clearly the best strategy for aviation transportation safety. Learning from past incident data to prevent potential accidents from happening has proved to be a successful approach. To prevent potential safety hazards and make effective prevention plans, aviation safety experts identify primary and contributing factors from incident reports. However, safety experts’ review processes have become prohibitively expensive nowadays. The number of incident reports is increasing rapidly due to the acceleration of advances in information technologies and the growth of the commercial and private aviation transportation industries. Consequently, advanced text mining algorithms should be applied to help aviation safety experts facilitate the process of incident data extraction. This paper focuses on constructing deep-learning-based models to identify causal factors from incident reports. First, we prepare the data sets used for training, validation, and testing with approximately 200,000 qualified incident reports from the Aviation Safety Reporting System (ASRS). Then, we take an open-source natural language model, which is well trained with a large corpus of Wikipedia texts, as the baseline and fine-tune it with the texts in incident reports to make it more suited to our specific research task. Finally, we build and train an attention-based long short-term memory (LSTM) model to identify primary and contributing factors in each incident report. The solution we propose has multilabel capability and is automated and customizable, and it is more accurate and adaptable than traditional machine learning methods in extant research. This novel application of deep learning algorithms to the incident reporting system can efficiently improve aviation safety

    Multi-Instance Multilabel Learning with Weak-Label for Predicting Protein Function in Electricigens

    Get PDF
    Nature often brings several domains together to form multidomain and multifunctional proteins with a vast number of possibilities. In our previous study, we disclosed that the protein function prediction problem is naturally and inherently Multi-Instance Multilabel (MIML) learning tasks. Automated protein function prediction is typically implemented under the assumption that the functions of labeled proteins are complete; that is, there are no missing labels. In contrast, in practice just a subset of the functions of a protein are known, and whether this protein has other functions is unknown. It is evident that protein function prediction tasks suffer from weak-label problem; thus protein function prediction with incomplete annotation matches well with the MIML with weak-label learning framework. In this paper, we have applied the state-of-the-art MIML with weak-label learning algorithm MIMLwel for predicting protein functions in two typical real-world electricigens organisms which have been widely used in microbial fuel cells (MFCs) researches. Our experimental results validate the effectiveness of MIMLwel algorithm in predicting protein functions with incomplete annotation
    • …
    corecore