16,962 research outputs found

    A System for Accessible Artificial Intelligence

    Full text link
    While artificial intelligence (AI) has become widespread, many commercial AI systems are not yet accessible to individual researchers nor the general public due to the deep knowledge of the systems required to use them. We believe that AI has matured to the point where it should be an accessible technology for everyone. We present an ongoing project whose ultimate goal is to deliver an open source, user-friendly AI system that is specialized for machine learning analysis of complex data in the biomedical and health care domains. We discuss how genetic programming can aid in this endeavor, and highlight specific examples where genetic programming has automated machine learning analyses in previous projects.Comment: 14 pages, 5 figures, submitted to Genetic Programming Theory and Practice 2017 worksho

    Classifier selection with permutation tests

    Get PDF
    This work presents a content-based recommender system for machine learning classifier algorithms. Given a new data set, a recommendation of what classifier is likely to perform best is made based on classifier performance over similar known data sets. This similarity is measured according to a data set characterization that includes several state-of-the-art metrics taking into account physical structure, statistics, and information theory. A novelty with respect to prior work is the use of a robust approach based on permutation tests to directly assess whether a given learning algorithm is able to exploit the attributes in a data set to predict class labels, and compare it to the more commonly used F-score metric for evaluating classifier performance. To evaluate our approach, we have conducted an extensive experimentation including 8 of the main machine learning classification methods with varying configurations and 65 binary data sets, leading to over 2331 experiments. Our results show that using the information from the permutation test clearly improves the quality of the recommendations.Peer ReviewedPostprint (author's final draft

    Predicting and Evaluating Software Model Growth in the Automotive Industry

    Full text link
    The size of a software artifact influences the software quality and impacts the development process. In industry, when software size exceeds certain thresholds, memory errors accumulate and development tools might not be able to cope anymore, resulting in a lengthy program start up times, failing builds, or memory problems at unpredictable times. Thus, foreseeing critical growth in software modules meets a high demand in industrial practice. Predicting the time when the size grows to the level where maintenance is needed prevents unexpected efforts and helps to spot problematic artifacts before they become critical. Although the amount of prediction approaches in literature is vast, it is unclear how well they fit with prerequisites and expectations from practice. In this paper, we perform an industrial case study at an automotive manufacturer to explore applicability and usability of prediction approaches in practice. In a first step, we collect the most relevant prediction approaches from literature, including both, approaches using statistics and machine learning. Furthermore, we elicit expectations towards predictions from practitioners using a survey and stakeholder workshops. At the same time, we measure software size of 48 software artifacts by mining four years of revision history, resulting in 4,547 data points. In the last step, we assess the applicability of state-of-the-art prediction approaches using the collected data by systematically analyzing how well they fulfill the practitioners' expectations. Our main contribution is a comparison of commonly used prediction approaches in a real world industrial setting while considering stakeholder expectations. We show that the approaches provide significantly different results regarding prediction accuracy and that the statistical approaches fit our data best

    Node Classification in Uncertain Graphs

    Full text link
    In many real applications that use and analyze networked data, the links in the network graph may be erroneous, or derived from probabilistic techniques. In such cases, the node classification problem can be challenging, since the unreliability of the links may affect the final results of the classification process. If the information about link reliability is not used explicitly, the classification accuracy in the underlying network may be affected adversely. In this paper, we focus on situations that require the analysis of the uncertainty that is present in the graph structure. We study the novel problem of node classification in uncertain graphs, by treating uncertainty as a first-class citizen. We propose two techniques based on a Bayes model and automatic parameter selection, and show that the incorporation of uncertainty in the classification process as a first-class citizen is beneficial. We experimentally evaluate the proposed approach using different real data sets, and study the behavior of the algorithms under different conditions. The results demonstrate the effectiveness and efficiency of our approach

    NeuroSVM: A Graphical User Interface for Identification of Liver Patients

    Full text link
    Diagnosis of liver infection at preliminary stage is important for better treatment. In todays scenario devices like sensors are used for detection of infections. Accurate classification techniques are required for automatic identification of disease samples. In this context, this study utilizes data mining approaches for classification of liver patients from healthy individuals. Four algorithms (Naive Bayes, Bagging, Random forest and SVM) were implemented for classification using R platform. Further to improve the accuracy of classification a hybrid NeuroSVM model was developed using SVM and feed-forward artificial neural network (ANN). The hybrid model was tested for its performance using statistical parameters like root mean square error (RMSE) and mean absolute percentage error (MAPE). The model resulted in a prediction accuracy of 98.83%. The results suggested that development of hybrid model improved the accuracy of prediction. To serve the medicinal community for prediction of liver disease among patients, a graphical user interface (GUI) has been developed using R. The GUI is deployed as a package in local repository of R platform for users to perform prediction.Comment: 9 pages, 6 figure

    A survey on utilization of data mining approaches for dermatological (skin) diseases prediction

    Get PDF
    Due to recent technology advances, large volumes of medical data is obtained. These data contain valuable information. Therefore data mining techniques can be used to extract useful patterns. This paper is intended to introduce data mining and its various techniques and a survey of the available literature on medical data mining. We emphasize mainly on the application of data mining on skin diseases. A categorization has been provided based on the different data mining techniques. The utility of the various data mining methodologies is highlighted. Generally association mining is suitable for extracting rules. It has been used especially in cancer diagnosis. Classification is a robust method in medical mining. In this paper, we have summarized the different uses of classification in dermatology. It is one of the most important methods for diagnosis of erythemato-squamous diseases. There are different methods like Neural Networks, Genetic Algorithms and fuzzy classifiaction in this topic. Clustering is a useful method in medical images mining. The purpose of clustering techniques is to find a structure for the given data by finding similarities between data according to data characteristics. Clustering has some applications in dermatology. Besides introducing different mining methods, we have investigated some challenges which exist in mining skin data
    corecore