1,055 research outputs found

    NIPS - Not Even Wrong? A Systematic Review of Empirically Complete Demonstrations of Algorithmic Effectiveness in the Machine Learning and Artificial Intelligence Literature

    Get PDF
    Objective: To determine the completeness of argumentative steps necessary to conclude effectiveness of an algorithm in a sample of current ML/AI supervised learning literature. Data Sources: Papers published in the Neural Information Processing Systems (NeurIPS, n\'ee NIPS) journal where the official record showed a 2017 year of publication. Eligibility Criteria: Studies reporting a (semi-)supervised model, or pre-processing fused with (semi-)supervised models for tabular data. Study Appraisal: Three reviewers applied the assessment criteria to determine argumentative completeness. The criteria were split into three groups, including: experiments (e.g real and/or synthetic data), baselines (e.g uninformed and/or state-of-art) and quantitative comparison (e.g. performance quantifiers with confidence intervals and formal comparison of the algorithm against baselines). Results: Of the 121 eligible manuscripts (from the sample of 679 abstracts), 99\% used real-world data and 29\% used synthetic data. 91\% of manuscripts did not report an uninformed baseline and 55\% reported a state-of-art baseline. 32\% reported confidence intervals for performance but none provided references or exposition for how these were calculated. 3\% reported formal comparisons. Limitations: The use of one journal as the primary information source may not be representative of all ML/AI literature. However, the NeurIPS conference is recognised to be amongst the top tier concerning ML/AI studies, so it is reasonable to consider its corpus to be representative of high-quality research. Conclusion: Using the 2017 sample of the NeurIPS supervised learning corpus as an indicator for the quality and trustworthiness of current ML/AI research, it appears that complete argumentative chains in demonstrations of algorithmic effectiveness are rare

    Setting Parameters for Biological Models With ANIMO

    Get PDF
    ANIMO (Analysis of Networks with Interactive MOdeling) is a software for modeling biological networks, such as e.g. signaling, metabolic or gene networks. An ANIMO model is essentially the sum of a network topology and a number of interaction parameters. The topology describes the interactions between biological entities in form of a graph, while the parameters determine the speed of occurrence of such interactions. When a mismatch is observed between the behavior of an ANIMO model and experimental data, we want to update the model so that it explains the new data. In general, the topology of a model can be expanded with new (known or hypothetical) nodes, and enables it to match experimental data. However, the unrestrained addition of new parts to a model causes two problems: models can become too complex too fast, to the point of being intractable, and too many parts marked as "hypothetical" or "not known" make a model unrealistic. Even if changing the topology is normally the easier task, these problems push us to try a better parameter fit as a first step, and resort to modifying the model topology only as a last resource. In this paper we show the support added in ANIMO to ease the task of expanding the knowledge on biological networks, concentrating in particular on the parameter settings

    Autoencoder for clinical data analysis and classification : data imputation, dimensional reduction, and pattern recognition

    Get PDF
    Over the last decade, research has focused on machine learning and data mining to develop frameworks that can improve data analysis and output performance; to build accurate decision support systems that benefit from real-life datasets. This leads to the field of clinical data analysis, which has attracted a significant amount of interest in the computing, information systems, and medical fields. To create and develop models by machine learning algorithms, there is a need for a particular type of data for the existing algorithms to build an efficient model. Clinical datasets pose several issues that can affect the classification of the dataset: missing values, high dimensionality, and class imbalance. In order to build a framework for mining the data, it is necessary first to preprocess data, by eliminating patients’ records that have too many missing values, imputing missing values, addressing high dimensionality, and classifying the data for decision support.This thesis investigates a real clinical dataset to solve their challenges. Autoencoder is employed as a tool that can compress data mining methodology, by extracting features and classifying data in one model. The first step in data mining methodology is to impute missing values, so several imputation methods are analysed and employed. Then high dimensionality is demonstrated and used to discard irrelevant and redundant features, in order to improve prediction accuracy and reduce computational complexity. Class imbalance is manipulated to investigate the effect on feature selection algorithms and classification algorithms.The first stage of analysis is to investigate the role of the missing values. Results found that techniques based on class separation will outperform other techniques in predictive ability. The next stage is to investigate the high dimensionality and a class imbalance. However it was found a small set of features that can improve the classification performance, the balancing class does not affect the performance as much as imbalance class

    To Tolerate or To Impute Missing Values in V2X Communications Data?

    Get PDF
    Misbehavior detection is a critical task in vehicular ad hoc networks. However, state-of-the-art data-driven techniques for misbehavior detection are usually conducted through complete V2X communication data collected from simulated experiments. This thesis evaluates the main strategies for the treatment of missing values to find out the best match for misbehavior detection with incomplete V2X communication data. This thesis proposes three novel methods for imputing and tolerating missing data. The first two are novel imputation methods that are based on cooperative clustering and collaborative clustering. The latter is a missing-tolerant method that is an ensemble learning based on the random subspace selection and Dempster-Shafer fusion. The effectiveness of the proposed techniques is evaluated in the ground truth vehicular reference misbehavior data. Moreover, a multi-factor amputation framework has been developed to induce missingness over V2X communication data with different missing ratios, mechanisms, and distributions. This framework provides a comprehensive benchmark resembling missingness over V2X communication data. The proposed methods are compared with some missing-tolerant and imputation methods. The attained results over benchmark data are analyzed and indicated the winner treatments in each aspect
    • …
    corecore