Search CORE

37,993 research outputs found

Ontology of core data mining entities

Author: A Bernstein
A Golbraikh
A Karalic
B Smith
B Smith
B Smith
C Silla
C Vens
D Demšar
D Kocev
D Kocev
D Qi
D Young
DJ Hand
F Serban
G Madjarov
G Tsoumakas
GH Bakir
H Mannila
HP Kriegel
I Slavkov
J Vanschoren
K Button
Larisa Soldatova
LN Soldatova
M Courtot
M Ford
M Žáková
MA Avery
MA Avery
MF López
O Spjuth
P Robinson
Panče Panov
Q Yang
R Caruana
R Guha
R Guha
RD King
RD King
RR Brinkman
Sašo Džeroski
T Dietterich
V Podpečan
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 05/07/2014
Field of study

In this article, we present OntoDM-core, an ontology of core data mining entities. OntoDM-core defines themost essential datamining entities in a three-layered ontological structure comprising of a specification, an implementation and an application layer. It provides a representational framework for the description of mining structured data, and in addition provides taxonomies of datasets, data mining tasks, generalizations, data mining algorithms and constraints, based on the type of data. OntoDM-core is designed to support a wide range of applications/use cases, such as semantic annotation of data mining algorithms, datasets and results; annotation of QSAR studies in the context of drug discovery investigations; and disambiguation of terms in text mining. The ontology has been thoroughly assessed following the practices in ontology engineering, is fully interoperable with many domain resources and is easy to extend

Crossref

Brunel University Research Archive

Clustering as an example of optimizing arbitrarily chosen objective functions

Author: A. Dempster
A. Fraser
A. Jain
J. Dunn
M. Budka
R. Dubes
R. Duda
R. Hamming
R. Jenssen
R. Sibson
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2013
Field of study

This paper is a reflection upon a common practice of solving various types of learning problems by optimizing arbitrarily chosen criteria in the hope that they are well correlated with the criterion actually used for assessment of the results. This issue has been investigated using clustering as an example, hence a unified view of clustering as an optimization problem is first proposed, stemming from the belief that typical design choices in clustering, like the number of clusters or similarity measure can be, and often are suboptimal, also from the point of view of clustering quality measures later used for algorithm comparison and ranking. In order to illustrate our point we propose a generalized clustering framework and provide a proof-of-concept using standard benchmark datasets and two popular clustering methods for comparison

Crossref

Bournemouth University Research Online

What attracts vehicle consumers’ buying:A Saaty scale-based VIKOR (SSC-VIKOR) approach from after-sales textual perspective?

Author: He Yandong
Lim Ming
Pratap Saurabh
Zhou Fuli
Publication venue: 'Emerald'
Publication date: 01/01/2019
Field of study

Purpose: The increasingly booming e-commerce development has stimulated vehicle consumers to express individual reviews through online forum. The purpose of this paper is to probe into the vehicle consumer consumption behavior and make recommendations for potential consumers from textual comments viewpoint. Design/methodology/approach: A big data analytic-based approach is designed to discover vehicle consumer consumption behavior from online perspective. To reduce subjectivity of expert-based approaches, a parallel Naïve Bayes approach is designed to analyze the sentiment analysis, and the Saaty scale-based (SSC) scoring rule is employed to obtain specific sentimental value of attribute class, contributing to the multi-grade sentiment classification. To achieve the intelligent recommendation for potential vehicle customers, a novel SSC-VIKOR approach is developed to prioritize vehicle brand candidates from a big data analytical viewpoint. Findings: The big data analytics argue that “cost-effectiveness” characteristic is the most important factor that vehicle consumers care, and the data mining results enable automakers to better understand consumer consumption behavior. Research limitations/implications: The case study illustrates the effectiveness of the integrated method, contributing to much more precise operations management on marketing strategy, quality improvement and intelligent recommendation. Originality/value: Researches of consumer consumption behavior are usually based on survey-based methods, and mostly previous studies about comments analysis focus on binary analysis. The hybrid SSC-VIKOR approach is developed to fill the gap from the big data perspective

Coventry University Pure Portal

Enlighten

Recommended from our members

State-of-the-art on research and applications of machine learning in the building life cycle

Author: Hong T
Luo X
Wang Z
Zhang W
Publication venue: eScholarship, University of California
Publication date: 01/04/2020
Field of study

Fueled by big data, powerful and affordable computing resources, and advanced algorithms, machine learning has been explored and applied to buildings research for the past decades and has demonstrated its potential to enhance building performance. This study systematically surveyed how machine learning has been applied at different stages of building life cycle. By conducting a literature search on the Web of Knowledge platform, we found 9579 papers in this field and selected 153 papers for an in-depth review. The number of published papers is increasing year by year, with a focus on building design, operation, and control. However, no study was found using machine learning in building commissioning. There are successful pilot studies on fault detection and diagnosis of HVAC equipment and systems, load prediction, energy baseline estimate, load shape clustering, occupancy prediction, and learning occupant behaviors and energy use patterns. None of the existing studies were adopted broadly by the building industry, due to common challenges including (1) lack of large scale labeled data to train and validate the model, (2) lack of model transferability, which limits a model trained with one data-rich building to be used in another building with limited data, (3) lack of strong justification of costs and benefits of deploying machine learning, and (4) the performance might not be reliable and robust for the stated goals, as the method might work for some buildings but could not be generalized to others. Findings from the study can inform future machine learning research to improve occupant comfort, energy efficiency, demand flexibility, and resilience of buildings, as well as to inspire young researchers in the field to explore multidisciplinary approaches that integrate building science, computing science, data science, and social science

eScholarship - University of California

Hyperparameter Importance Across Datasets

Author: Bergstra J.
Bonilla E. V.
Brazdil P.
Demvsar J.
Feurer M.
Jamieson K.
Joachims T.
Klein A.
Li L.
Loshchilov I.
Sobol I. M.
van Rijn J. N.
van Rijn J. N.
Wistuba M.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 29/05/2018
Field of study

With the advent of automated machine learning, automated hyperparameter optimization methods are by now routinely used in data mining. However, this progress is not yet matched by equal progress on automatic analyses that yield information beyond performance-optimizing hyperparameter settings. In this work, we aim to answer the following two questions: Given an algorithm, what are generally its most important hyperparameters, and what are typically good values for these? We present methodology and a framework to answer these questions based on meta-learning across many datasets. We apply this methodology using the experimental meta-data available on OpenML to determine the most important hyperparameters of support vector machines, random forests and Adaboost, and to infer priors for all their hyperparameters. The results, obtained fully automatically, provide a quantitative basis to focus efforts in both manual algorithm design and in automated hyperparameter optimization. The conducted experiments confirm that the hyperparameters selected by the proposed method are indeed the most important ones and that the obtained priors also lead to statistically significant improvements in hyperparameter optimization.Comment: \c{opyright} 2018. Copyright is held by the owner/author(s). Publication rights licensed to ACM. This is the author's version of the work. It is posted here for your personal use, not for redistribution. The definitive Version of Record was published in Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Minin

arXiv.org e-Print Archive

Crossref