IPC: A Benchmark Data Set for Learning with Graph-Structured Data
Benchmark data sets are an indispensable ingredient of the evaluation of
graph-based machine learning methods. We release a new data set, compiled from
International Planning Competitions (IPC), for benchmarking graph
classification, regression, and related tasks. Apart from the graph
construction (based on AI planning problems) that is interesting in its own
right, the data set possesses distinctly different characteristics from
popularly used benchmarks. The data set, named IPC, consists of two
self-contained versions, grounded and lifted, both including graphs of large
and skewedly distributed sizes, posing substantial challenges for the
computation of graph models such as graph kernels and graph neural networks.
The graphs in this data set are directed and the lifted version is acyclic,
offering the opportunity of benchmarking specialized models for directed
(acyclic) structures. Moreover, the graph generator and the labeling are
computer programmed; thus, the data set may be extended easily if a larger
scale is desired. The data set is accessible from
\url{https://github.com/IBM/IPC-graph-data}.
Comment: ICML 2019 Workshop on Learning and Reasoning with Graph-Structured Data.
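Since the lifted version of the data set consists of directed acyclic graphs, acyclicity can be verified mechanically before feeding graphs to a DAG-specialized model. A minimal sketch using Kahn's topological sort follows; the adjacency-list representation and the example graphs are illustrative assumptions, not the data set's actual on-disk format:

```python
from collections import deque

def is_acyclic(adj):
    """Kahn's algorithm: a directed graph is acyclic iff a complete
    topological order exists, i.e. every node is dequeued exactly once."""
    indegree = {u: 0 for u in adj}
    for u in adj:
        for v in adj[u]:
            indegree[v] += 1
    queue = deque(u for u in adj if indegree[u] == 0)
    seen = 0
    while queue:
        u = queue.popleft()
        seen += 1
        for v in adj[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    return seen == len(adj)

# Illustrative graphs (not taken from the IPC data set).
dag = {"a": ["b", "c"], "b": ["c"], "c": []}
cyclic = {"a": ["b"], "b": ["a"]}
```

The check runs in time linear in the number of nodes and edges, which matters for the large, skewedly sized graphs the abstract describes.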
Benchmarking Graph Neural Networks
Graph neural networks (GNNs) have become the standard toolkit for analyzing
and learning from data on graphs. As the field grows, it becomes critical to
identify key architectures and validate new ideas that generalize to larger,
more complex datasets. Unfortunately, it has been increasingly difficult to
gauge the effectiveness of new models in the absence of a standardized
benchmark with consistent experimental settings. In this paper, we introduce a
reproducible GNN benchmarking framework, with the facility for researchers to
add new models conveniently for arbitrary datasets. We demonstrate the
usefulness of our framework by presenting a principled investigation into the
recent Weisfeiler-Lehman GNNs (WL-GNNs) compared to message passing-based graph
convolutional networks (GCNs) for a variety of graph tasks, i.e. graph
regression/classification and node/link prediction, with medium-scale datasets.
Comment: Benchmarking framework on GitHub at
https://github.com/graphdeeplearning/benchmarking-gnn
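The comparison the benchmark draws between WL-GNNs and message-passing GCNs rests on the basic propagation step of the latter. The sketch below shows one message-passing round in pure Python, with mean aggregation over the closed neighborhood and no learned weight matrix or nonlinearity; the toy graph and features are illustrative assumptions, not part of the benchmark:

```python
def message_passing_step(adj, features):
    """One round of mean-aggregation message passing: each node's new
    feature is the average of its own feature and its neighbors'
    features (a GCN-style propagation, stripped of learned weights)."""
    new_features = {}
    for u, h in features.items():
        neighborhood = [h] + [features[v] for v in adj[u]]
        dim = len(h)
        new_features[u] = [
            sum(vec[i] for vec in neighborhood) / len(neighborhood)
            for i in range(dim)
        ]
    return new_features

# Toy undirected graph: a triangle (0, 1, 2) plus a pendant node 3.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
feats = {0: [1.0], 1: [0.0], 2: [0.0], 3: [0.0]}
out = message_passing_step(adj, feats)
```

Stacking such rounds is what bounds the expressive power of message-passing GCNs at the 1-Weisfeiler-Lehman test, the limitation WL-GNNs are designed to overcome.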
Random forest versus logistic regression: A large-scale benchmark experiment
BACKGROUND AND GOAL The Random Forest (RF) algorithm for regression and classification has gained considerable popularity since its introduction in 2001. Meanwhile, it has grown into a standard classification approach competing with logistic regression (LR) in many innovation-friendly scientific fields.
RESULTS In this context, we present a large-scale benchmarking experiment based on 243 real datasets comparing the prediction performance of the original version of RF with default parameters and LR as binary classification tools. Most importantly, the design of our benchmark experiment is inspired by clinical trial methodology, thus avoiding common pitfalls and major sources of bias.
CONCLUSION RF performed better than LR according to the considered accuracy measure in approximately 69% of the datasets. The mean difference between RF and LR was 0.029 (95% CI: 0.022, 0.038) for the accuracy, 0.041 (95% CI: 0.031, 0.053) for the Area Under the Curve, and -0.027 (95% CI: -0.034, -0.021) for the Brier score, all measures thus suggesting a significantly better performance of RF. As a side result of our benchmarking experiment, we observed that the results were noticeably dependent on the inclusion criteria used to select the example datasets, emphasizing the importance of clear statements regarding this dataset selection process. We also stress that neutral studies similar to ours, based on a large number of datasets and carefully designed, will be necessary in the future to evaluate further variants, implementations, or parameters of random forests which may yield improved accuracy compared to the original version with default values.
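The three measures reported above (accuracy, Area Under the Curve, and Brier score) can each be sketched in a few lines of pure Python. The label vector and predicted probabilities below are hypothetical, not taken from the study:

```python
def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of correct predictions at a 0.5 probability cut-off."""
    preds = [1 if p >= threshold else 0 for p in y_prob]
    return sum(p == y for p, y in zip(preds, y_true)) / len(y_true)

def brier_score(y_true, y_prob):
    """Mean squared error between predicted probability and outcome.
    Lower is better, which is why the RF-minus-LR Brier difference
    reported above is negative when RF wins."""
    return sum((p - y) ** 2 for p, y in zip(y_prob, y_true)) / len(y_true)

def auc(y_true, y_prob):
    """Probability that a randomly chosen positive is scored above a
    randomly chosen negative; ties count one half."""
    pos = [p for p, y in zip(y_prob, y_true) if y == 1]
    neg = [p for p, y in zip(y_prob, y_true) if y == 0]
    wins = sum(1.0 if pp > pn else 0.5 if pp == pn else 0.0
               for pp in pos for pn in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical test labels and predicted probabilities for six cases.
y = [1, 1, 1, 0, 0, 0]
probs = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]
```

Note the differing orientations: accuracy and AUC reward larger values while the Brier score rewards smaller ones, so benchmark deltas between classifiers must be read with the sign convention in mind.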
PMLB: A Large Benchmark Suite for Machine Learning Evaluation and Comparison
The selection, development, or comparison of machine learning methods in data
mining can be a difficult task based on the target problem and goals of a
particular study. Numerous publicly available real-world and simulated
benchmark datasets have emerged from different sources, but their organization
and adoption as standards have been inconsistent. As such, selecting and
curating specific benchmarks remains an unnecessary burden on machine learning
practitioners and data scientists. The present study introduces an accessible,
curated, and developing public benchmark resource to facilitate identification
of the strengths and weaknesses of different machine learning methodologies. We
compare meta-features among the current set of benchmark datasets in this
resource to characterize the diversity of available data. Finally, we apply a
number of established machine learning methods to the entire benchmark suite
and analyze how datasets and algorithms cluster in terms of performance. This
work is an important first step towards understanding the limitations of
popular benchmarking suites and developing a resource that connects existing
benchmarking standards to more diverse and efficient standards in the future.
Comment: 14 pages, 5 figures, submitted for review to JML
Benchmark of machine learning methods for classification of a Sentinel-2 image
Thanks mainly to ESA and USGS, a large number of free images of the Earth is readily available nowadays. One of the main goals of
remote sensing is to label images according to a set of semantic categories, i.e. image classification. This is a very challenging issue,
since land cover of a specific class may present large spatial and spectral variability, and objects may appear at different scales and
orientations.
In this study, we report the results of benchmarking 9 machine learning algorithms tested for accuracy and speed in training and
classification of land-cover classes in a Sentinel-2 dataset. The following machine learning methods (MLM) have been tested: linear
discriminant analysis, k-nearest neighbour, random forests, support vector machines, multi-layered perceptron, multi-layered
perceptron ensemble, ctree, boosting, and logarithmic regression. The validation is carried out using a control dataset which consists of an
independent classification into 11 land-cover classes of an area of about 60 km², obtained by manual visual interpretation of high-resolution
images (20 cm ground sampling distance) by experts. In this study, five out of the eleven classes are used, since the others have too few
samples (pixels) for the testing and validation subsets. The classes used are the following: (i) urban, (ii) sowable areas, (iii) water, (iv) tree
plantations, and (v) grasslands.
Validation is carried out using three different approaches: (i) using pixels from the training dataset (train), (ii) using pixels from the
training dataset and applying cross-validation with the k-fold method (kfold), and (iii) using all pixels from the control dataset. Five
accuracy indices are calculated for the comparison between the values predicted with each model and the control values over three sets of
data: the training dataset (train), the whole control dataset (full), and k-fold cross-validation (kfold) with ten folds. Results from
validation of predictions on the whole dataset (full) show the random forests method with the highest values, the kappa index ranging from
0.55 to 0.42 with the largest and smallest numbers of training pixels, respectively. The two neural networks (multi-layered perceptron and its
ensemble) and the support vector machines, with the default radial basis function kernel, follow closely with comparable
performance.
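Two ingredients of the validation described above, the kappa index and k-fold splitting, can be sketched briefly. The contiguous unshuffled folds and the toy label vectors below are simplifying assumptions for illustration, not the study's actual procedure (which should shuffle or stratify):

```python
def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: agreement between predicted and reference labels,
    corrected for the chance agreement implied by the marginals."""
    n = len(y_true)
    labels = set(y_true) | set(y_pred)
    p_observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    p_expected = sum(
        (y_true.count(c) / n) * (y_pred.count(c) / n) for c in labels
    )
    return (p_observed - p_expected) / (1 - p_expected)

def kfold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds of near-equal size
    (no shuffling, for brevity)."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

# Toy reference and predicted labels (hypothetical, two classes).
kappa = cohen_kappa([0, 0, 1, 1], [0, 0, 1, 0])
folds = kfold_indices(10, 3)
```

Kappa values like the 0.42 to 0.55 range reported above therefore already discount the agreement a random classifier would achieve from the class proportions alone.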