1,053 research outputs found

    Measuring the Quality of Machine Learning and Optimization Frameworks

    Get PDF
    Software frameworks are daily and extensively used in research, both for fundamental studies and applications. Researchers usually trust in the quality of these frameworks without any evidence that they are correctly build, indeed they could contain some defects that potentially could affect to thousands of already published and future papers. Considering the important role of these frameworks in the current state-of-the-art in research, their quality should be quantified to show the weaknesses and strengths of each software package. In this paper we study the main static quality properties, defined in the product quality model proposed by the ISO 25010 standard, of ten well-known frameworks. We provide a quality rating for each characteristic depending on the severity of the issues detected in the analysis. In addition, we propose an overall quality rating of 12 levels (ranging from A+ to D-) considering the ratings of all characteristics. As a result, we have data evidence to claim that the analysed frameworks are not in a good shape, because the best overall rating is just a C+ for Mahout framework, i.e., all packages need to go for a revision in the analysed features. Focusing on the characteristics individually, maintainability is by far the one which needs the biggest effort to fix the found defects. On the other hand, performance obtains the best average rating, a result which conforms to our expectations because frameworks’ authors used to take care about how fast their software runs.University of Malaga. Campus de Excelencia Internacional AndalucĂ­a Tech. We would like to say thank you to all authors of these frameworks that make research easier for all of us. This research has been partially funded by CELTIC C2017/2-2 in collaboration with companies EMERGYA and SECMOTIC with contracts #8.06/5.47.4997 and #8.06/5.47.4996. It has also been funded by the Spanish Ministry of Science and Innovation and /Junta de Andalucı́a/FEDER under contracts TIN2014-57341-R and TIN2017-88213-R, the network of smart cities CI-RTI (TIN2016-81766-REDT

    OpenML Benchmarking Suites

    Full text link
    Machine learning research depends on objectively interpretable, comparable, and reproducible algorithm benchmarks. Therefore, we advocate the use of curated, comprehensive suites of machine learning tasks to standardize the setup, execution, and reporting of benchmarks. We enable this through software tools that help to create and leverage these benchmarking suites. These are seamlessly integrated into the OpenML platform, and accessible through interfaces in Python, Java, and R. OpenML benchmarking suites are (a) easy to use through standardized data formats, APIs, and client libraries; (b) machine-readable, with extensive meta-information on the included datasets; and (c) allow benchmarks to be shared and reused in future studies. We also present a first, carefully curated and practical benchmarking suite for classification: the OpenML Curated Classification benchmarking suite 2018 (OpenML-CC18)

    Providing intelligent decision support systems with flexible data-intensive case-based reasoning

    Get PDF
    In this paper we present a flexible CBR shell for Data-Intensive Case-Based Reasoning Systems which is fully integrated in an Intelligent Data Analysis Tool entitled GESCONDA. The main subgoal of the developed tool is to create a CBR Shell where no fixed domain exists and where letting the expert/user creates (models) his/her own domain. From an abstract point of view, the definition of the CBR can be seen as a methodology composed by four phases and each phase offers different ways to be solved. Then, since the CBR shell is integrated in GESCONDA, it inherits all its functionalities which cover the whole knowledge discovery and data mining process and also, CBR can complement its phases with this functionality. As a result, GESCONDA becomes an intelligent decision support tool which encompasses a number of advantages including domain independence, incremental learning, platform independence and generality.Peer ReviewedPostprint (published version

    CUSBoost: Cluster-based Under-sampling with Boosting for Imbalanced Classification

    Full text link
    Class imbalance classification is a challenging research problem in data mining and machine learning, as most of the real-life datasets are often imbalanced in nature. Existing learning algorithms maximise the classification accuracy by correctly classifying the majority class, but misclassify the minority class. However, the minority class instances are representing the concept with greater interest than the majority class instances in real-life applications. Recently, several techniques based on sampling methods (under-sampling of the majority class and over-sampling the minority class), cost-sensitive learning methods, and ensemble learning have been used in the literature for classifying imbalanced datasets. In this paper, we introduce a new clustering-based under-sampling approach with boosting (AdaBoost) algorithm, called CUSBoost, for effective imbalanced classification. The proposed algorithm provides an alternative to RUSBoost (random under-sampling with AdaBoost) and SMOTEBoost (synthetic minority over-sampling with AdaBoost) algorithms. We evaluated the performance of CUSBoost algorithm with the state-of-the-art methods based on ensemble learning like AdaBoost, RUSBoost, SMOTEBoost on 13 imbalance binary and multi-class datasets with various imbalance ratios. The experimental results show that the CUSBoost is a promising and effective approach for dealing with highly imbalanced datasets.Comment: CSITSS-201

    OpenML: networked science in machine learning

    Full text link
    Many sciences have made significant breakthroughs by adopting online tools that help organize, structure and mine information that is too detailed to be printed in journals. In this paper, we introduce OpenML, a place for machine learning researchers to share and organize data in fine detail, so that they can work more effectively, be more visible, and collaborate with others to tackle harder problems. We discuss how OpenML relates to other examples of networked science and what benefits it brings for machine learning research, individual scientists, as well as students and practitioners.Comment: 12 pages, 10 figure
    • 

    corecore