2,993 research outputs found

    Hyperparameter Importance Across Datasets

    Full text link
    With the advent of automated machine learning, automated hyperparameter optimization methods are by now routinely used in data mining. However, this progress is not yet matched by equal progress on automatic analyses that yield information beyond performance-optimizing hyperparameter settings. In this work, we aim to answer the following two questions: Given an algorithm, what are generally its most important hyperparameters, and what are typically good values for these? We present methodology and a framework to answer these questions based on meta-learning across many datasets. We apply this methodology using the experimental meta-data available on OpenML to determine the most important hyperparameters of support vector machines, random forests and Adaboost, and to infer priors for all their hyperparameters. The results, obtained fully automatically, provide a quantitative basis to focus efforts in both manual algorithm design and in automated hyperparameter optimization. The conducted experiments confirm that the hyperparameters selected by the proposed method are indeed the most important ones and that the obtained priors also lead to statistically significant improvements in hyperparameter optimization.Comment: \c{opyright} 2018. Copyright is held by the owner/author(s). Publication rights licensed to ACM. This is the author's version of the work. It is posted here for your personal use, not for redistribution. The definitive Version of Record was published in Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Minin

    A crowdsourcing recommendation model for image annotations in cultural heritage platforms

    Get PDF
    Cultural heritage is one of many fields that has seen a significant digital transformation in the form of digitization and asset annotations for heritage preservation, inheritance, and dissemination. However, a lack of accurate and descriptive metadata in this field has an impact on the usability and discoverability of digital content, affecting cultural heritage platform visitors and resulting in an unsatisfactory user experience as well as limiting processing capabilities to add new functionalities. Over time, cultural heritage institutions were responsible for providing metadata for their collection items with the help of professionals, which is expensive and requires significant effort and time. In this sense, crowdsourcing can play a significant role in digital transformation or massive data processing, which can be useful for leveraging the crowd and enriching the metadata quality of digital cultural content. This paper focuses on a very important challenge faced by cultural heritage crowdsourcing platforms, which is how to attract users and make such activities enjoyable for them in order to achieve higher-quality annotations. One way to address this is to offer personalized interesting items based on each user preference, rather than making the user experience random and demanding. Thus, we present an image annotation recommendation system for users of cultural heritage platforms. The recommendation system design incorporates various technologies intending to help users in selecting the best matching images for annotations based on their interests and characteristics. Different classification methods were implemented to validate the accuracy of our work on Egyptian heritage.Agencia Estatal de Investigación | Ref. TIN2017-87604-RXunta de Galicia | Ref. ED431B 2020/3

    Respiration-Based COPD Detection Using UWB Radar Incorporation with Machine Learning

    Get PDF
    COPD is a progressive disease that may lead to death if not diagnosed and treated at an early stage. The examination of vital signs such as respiration rate is a promising approach for the detection of COPD. However, simultaneous consideration of the demographic and medical characteristics of patients is very important for better results. The objective of this research is to investigate the capability of UWB radar as a non-invasive approach to discriminate COPD patients from healthy subjects. The non-invasive approach is beneficial in pandemics such as the ongoing COVID-19 pandemic, where a safe distance between people needs to be maintained. The raw data are collected in a real environment (a hospital) non-invasively from a distance of 1.5 m. Respiration data are then extracted from the collected raw data using signal processing techniques. It was observed that the respiration rate of COPD patients alone is not enough for COPD patient detection. However, incorporating additional features such as age, gender, and smoking history with the respiration rate lead to robust performance. Different machine-learning classifiers, including Naïve Bayes, support vector machine, random forest, k nearest neighbor (KNN), Adaboost, and two deep-learning models—a convolutional neural network and a long short-term memory (LSTM) network—were utilized for COPD detection. Experimental results indicate that LSTM outperforms all employed models and obtained 93% accuracy. Performance comparison with existing studies corroborates the superior performance of the proposed approach

    Computationally intensive, distributed and decentralised machine learning: from theory to applications

    Get PDF
    Machine learning (ML) is currently one of the most important research fields, spanning computer science, statistics, pattern recognition, data mining, and predictive analytics. It plays a central role in automatic data processing and analysis in numerous research domains owing to widely distributed and geographically scattered data sources, powerful computing clouds, and high digitisation requirements. However, aspects such as the accuracy of methods, data privacy, and model explainability remain challenging and require additional research. Therefore, it is necessary to analyse centralised and distributed data processing architectures, and to create novel computationally intensive explainable and privacy-preserving ML methods, to investigate their properties, to propose distributed versions of prospective ML baseline methods, and to evaluate and apply these in various applications. This thesis addresses the theoretical and practical aspects of state-of-the-art ML methods. The contributions of this thesis are threefold. In Chapter 2, novel non-distributed, centralised, computationally intensive ML methods are proposed, their properties are investigated, and state-of-the-art ML methods are applied to real-world data from two domains, namely transportation and bioinformatics. Moreover, algorithms for ‘black-box’ model interpretability are presented. Decentralised ML methods are considered in Chapter 3. First, we investigate data processing as a preliminary step in data-driven, agent-based decision-making. Thereafter, we propose novel decentralised ML algorithms that are based on the collaboration of the local models of agents. Within this context, we consider various regression models. Finally, the explainability of multiagent decision-making is addressed. In Chapter 4, we investigate distributed centralised ML methods. We propose a distributed parallelisation algorithm for the semi-parametric and non-parametric regression types, and implement these in the computational environment and data structures of Apache SPARK. Scalability, speed-up, and goodness-of-fit experiments using real-world data demonstrate the excellent performance of the proposed methods. Moreover, the federated deep-learning approach enables us to address the data privacy challenges caused by processing of distributed private data sources to solve the travel-time prediction problem. Finally, we propose an explainability strategy to interpret the influence of the input variables on this federated deep-learning application. This thesis is based on the contribution made by 11 papers to the theoretical and practical aspects of state-of-the-art and proposed ML methods. We successfully address the stated challenges with various data processing architectures, validate the proposed approaches in diverse scenarios from the transportation and bioinformatics domains, and demonstrate their effectiveness in scalability, speed-up, and goodness-of-fit experiments with real-world data. However, substantial future research is required to address the stated challenges and to identify novel issues in ML. Thus, it is necessary to advance the theoretical part by creating novel ML methods and investigating their properties, as well as to contribute to the application part by using of the state-of-the-art ML methods and their combinations, and interpreting their results for different problem setting
    • …
    corecore