Combined optimization of feature selection and algorithm parameters in machine learning of language
Comparative machine learning experiments have become an important methodology in empirical approaches to natural language processing (i) to investigate which machine learning algorithms have the 'right bias' to solve specific natural language processing tasks, and (ii) to investigate which sources of information add to accuracy in a learning approach. Using automatic word sense disambiguation as an example task, we show that with the methodology currently used in comparative machine learning experiments, the results may often not be reliable because of the role of, and the interaction between, feature selection and algorithm parameter optimization. We propose genetic algorithms as a practical approach to achieve both higher accuracy within a single approach and more reliable comparisons.
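The joint search over feature subsets and algorithm parameters that this abstract describes can be sketched with a toy genetic algorithm: each individual encodes a binary feature mask plus one integer algorithm parameter, and a synthetic fitness function stands in for the cross-validated accuracy a real experiment would use. Everything below (the fitness function, the parameter range, the mutation rates) is illustrative, not the authors' actual setup.

```python
import random

def fitness(individual, relevant=(0, 2, 5), best_k=3):
    """Toy stand-in for cross-validated accuracy: rewards selecting the
    'relevant' features, penalizes noise features and a bad parameter k."""
    mask, k = individual
    hits = sum(mask[i] for i in relevant)
    noise = sum(mask) - hits
    return hits - 0.5 * noise - 0.1 * abs(k - best_k)

def evolve(n_features=8, pop_size=30, generations=40, seed=0):
    rng = random.Random(seed)
    def random_ind():
        return ([rng.randint(0, 1) for _ in range(n_features)],
                rng.randint(1, 10))
    pop = [random_ind() for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[: pop_size // 2]          # elitist selection
        children = []
        while len(survivors) + len(children) < pop_size:
            pa, pb = rng.sample(survivors, 2)
            cut = rng.randrange(1, n_features)
            mask = pa[0][:cut] + pb[0][cut:]       # one-point crossover
            k = rng.choice((pa[1], pb[1]))
            if rng.random() < 0.2:                 # mutate one feature bit
                i = rng.randrange(n_features)
                mask[i] ^= 1
            if rng.random() < 0.2:                 # mutate the parameter
                k = max(1, min(10, k + rng.choice((-1, 1))))
            children.append((mask, k))
        pop = survivors + children
    return max(pop, key=fitness)

best = evolve()
```

The key point of the abstract survives even in this toy: because the mask and the parameter evolve together, the search can trade off feature choices against parameter settings, which separate, sequential tuning cannot.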
ExplaiNE: An Approach for Explaining Network Embedding-based Link Predictions
Networks are powerful data structures, but are challenging to work with for
conventional machine learning methods. Network Embedding (NE) methods attempt
to resolve this by learning vector representations for the nodes, for
subsequent use in downstream machine learning tasks.
Link Prediction (LP) is one such downstream machine learning task that is an
important use case and popular benchmark for NE methods. Unfortunately, while
NE methods perform exceedingly well at this task, they are lacking in
transparency as compared to simpler LP approaches.
We introduce ExplaiNE, an approach to offer counterfactual explanations for
NE-based LP methods, by identifying existing links in the network that explain
the predicted links. ExplaiNE is applicable to a broad class of NE algorithms.
An extensive empirical evaluation for the NE method 'Conditional Network
Embedding' in particular demonstrates its accuracy and scalability.
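The counterfactual idea behind this kind of explanation can be illustrated with a brute-force sketch: score a candidate link, then measure how much that score drops when each existing link is removed. Note the simplifications: a common-neighbour count stands in for an embedding-based link score, and links are removed exhaustively rather than via the paper's gradient-based approximation; `link_score` and `explain` are hypothetical names.

```python
def link_score(adj, u, v):
    """Stand-in link-prediction score: number of common neighbours.
    A real NE method would use embedding similarity instead."""
    return len(adj[u] & adj[v])

def explain(adj, u, v):
    """Rank existing links by how much their removal lowers the predicted
    score of (u, v) -- the counterfactual idea described above."""
    base = link_score(adj, u, v)
    edges = {tuple(sorted((a, b))) for a in adj for b in adj[a]}
    impact = {}
    for a, b in edges:
        adj[a].discard(b); adj[b].discard(a)      # remove the link
        impact[(a, b)] = base - link_score(adj, u, v)
        adj[a].add(b); adj[b].add(a)              # restore the link
    return sorted(impact.items(), key=lambda kv: -kv[1])

# Toy graph: 0 and 3 share neighbours 1 and 2; the edge (1, 4) is irrelevant.
adj = {0: {1, 2}, 1: {0, 3, 4}, 2: {0, 3}, 3: {1, 2}, 4: {1}}
ranking = explain(adj, 0, 3)
```

In the toy graph, the links on paths between 0 and 3 each reduce the score by one when removed, while removing (1, 4) changes nothing, so the ranking isolates exactly the links that "explain" the prediction.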
Lasso based feature selection for malaria risk exposure prediction
In the life sciences, experts generally rely on empirical knowledge to recode
variables, choose interactions, and perform selection with classical
approaches. The aim of this work is to build an automatic learning algorithm
for variable selection, in order to determine whether experts can be assisted
in their decisions or simply replaced by the machine, and whether their
knowledge and results can be improved. Under some conditions, the Lasso method
can detect the optimal subset of variables for estimation and prediction. In
this paper, we propose a novel approach that automatically uses all available
variables and all their interactions. By a double cross-validation combined
with the Lasso, we select the best subset of variables, and then perform
predictions with a GLM through a simple cross-validation. The algorithm
ensures the stability and consistency of the estimators.

Comment: in Petra Perner (ed.), Machine Learning and Data Mining in Pattern
Recognition (proceedings of the 11th International Conference, MLDM 2015),
Jul 2015, Hamburg, Germany. Ibai publishing, 2015.
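The selection stage of such a pipeline can be sketched with a minimal coordinate-descent Lasso on synthetic data. This is a stand-in only: the paper's double cross-validation, interaction terms, and GLM prediction stage are omitted, and the penalty `lam` is fixed here rather than chosen by cross-validation.

```python
import numpy as np

def lasso(X, y, lam, n_iter=200):
    """Coordinate-descent Lasso: min_w (1/2n)||y - Xw||^2 + lam*||w||_1."""
    n, p = X.shape
    w = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            r = y - X @ w + X[:, j] * w[j]            # partial residual
            rho = X[:, j] @ r / n
            # soft-thresholding update for coordinate j
            w[j] = np.sign(rho) * max(abs(rho) - lam, 0) / col_sq[j]
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
beta = np.zeros(10)
beta[[0, 3]] = [2.0, -1.5]                # only two truly relevant predictors
y = X @ beta + 0.1 * rng.normal(size=200)

w = lasso(X, y, lam=0.1)
selected = np.flatnonzero(np.abs(w) > 1e-3)
```

The L1 penalty drives the coefficients of the eight irrelevant predictors exactly to zero, so `selected` recovers the two true variables; in the paper's setting this subset would then be handed to a GLM, with both stages wrapped in cross-validation.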
Hyperparameter optimization with approximate gradient
Most models in machine learning contain at least one hyperparameter to
control for model complexity. Choosing an appropriate set of hyperparameters is
both crucial in terms of model accuracy and computationally challenging. In
this work we propose an algorithm for the optimization of continuous
hyperparameters using inexact gradient information. An advantage of this method
is that hyperparameters can be updated before model parameters have fully
converged. We also give sufficient conditions for the global convergence of
this method, based on regularity conditions of the involved functions and
summability of errors. Finally, we validate the empirical performance of this
method on the estimation of regularization constants of L2-regularized logistic
regression and kernel ridge regression. Empirical benchmarks indicate that our
approach is highly competitive with respect to state-of-the-art methods.

Comment: Proceedings of the International Conference on Machine Learning (ICML).
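For L2-regularized least squares the hypergradient is available in closed form by implicit differentiation, which makes the idea easy to sketch. One caveat: this toy uses the exact gradient at the fully converged inner solution, whereas the abstract's point is precisely that an approximate gradient from a not-yet-converged inner solver suffices. All names and the synthetic data are illustrative.

```python
import numpy as np

def ridge_w(X, y, lam):
    """Closed-form ridge solution w(lam) = (X'X + lam*I)^{-1} X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def hypergradient(Xtr, ytr, Xval, yval, lam):
    """d/dlam of the validation loss 0.5*||Xval w(lam) - yval||^2, using
    implicit differentiation: dw/dlam = -(X'X + lam*I)^{-1} w."""
    p = Xtr.shape[1]
    A = Xtr.T @ Xtr + lam * np.eye(p)
    w = np.linalg.solve(A, Xtr.T @ ytr)
    g = Xval.T @ (Xval @ w - yval)        # dL/dw on the validation set
    dw = -np.linalg.solve(A, w)           # dw/dlam
    return float(g @ dw)

rng = np.random.default_rng(1)
Xtr = rng.normal(size=(80, 5)); Xval = rng.normal(size=(40, 5))
beta = rng.normal(size=5)
ytr = Xtr @ beta + 0.5 * rng.normal(size=80)
yval = Xval @ beta + 0.5 * rng.normal(size=40)

log_lam = 0.0                             # optimize log(lam) to keep lam > 0
for _ in range(100):
    lam = float(np.exp(log_lam))
    grad = hypergradient(Xtr, ytr, Xval, yval, lam)
    log_lam -= 0.1 * grad * lam           # chain rule: dL/dlog_lam = grad*lam
```

Gradient steps on the regularization constant steadily reduce the validation loss; the paper's contribution is showing that the expensive inner solve producing w(lam) need not be run to convergence before each such step.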