25,541 research outputs found
A General Family of Penalties for Combining Differing Types of Penalties in Generalized Structured Models
Penalized estimation has become an established tool for regularization and model selection in regression models.
A variety of penalties with specific features are available
and effective algorithms for specific penalties have been proposed.
But not much is available to fit models that call for a combination of different penalties.
When modeling rent data, which will be considered as an example, various types of predictors call for a combination of a Ridge, a grouped Lasso and a Lasso-type penalty within one model.
Algorithms that can deal with such problems, are in demand.
We propose to approximate penalties that are (semi-)norms of scalar linear transformations of the coefficient vector in generalized structured models.
The penalty is very general such that the Lasso, the fused Lasso, the Ridge, the smoothly clipped absolute deviation penalty (SCAD), the elastic net and many more penalties are embedded.
The approximation allows to combine all these penalties within one model.
The computation is based on conventional penalized iteratively re-weighted least squares (PIRLS) algorithms and hence, easy to implement.
Moreover, new penalties can be incorporated quickly.
The approach is also extended to penalties with vector based arguments; that is, to penalties with norms of linear transformations of the coefficient vector.
Some illustrative examples and the model for the Munich rent data show promising results
Air Quality Prediction in Smart Cities Using Machine Learning Technologies Based on Sensor Data: A Review
The influence of machine learning technologies is rapidly increasing and penetrating almost in every field, and air pollution prediction is not being excluded from those fields. This paper covers the revision of the studies related to air pollution prediction using machine learning algorithms based on sensor data in the context of smart cities. Using the most popular databases and executing the corresponding filtration, the most relevant papers were selected. After thorough reviewing those papers, the main features were extracted, which served as a base to link and compare them to each other. As a result, we can conclude that: (1) instead of using simple machine learning techniques, currently, the authors apply advanced and sophisticated techniques, (2) China was the leading country in terms of a case study, (3) Particulate matter with diameter equal to 2.5 micrometers was the main prediction target, (4) in 41% of the publications the authors carried out the prediction for the next day, (5) 66% of the studies used data had an hourly rate, (6) 49% of the papers used open data and since 2016 it had a tendency to increase, and (7) for efficient air quality prediction it is important to consider the external factors such as weather conditions, spatial characteristics, and temporal features
Parametric Local Metric Learning for Nearest Neighbor Classification
We study the problem of learning local metrics for nearest neighbor
classification. Most previous works on local metric learning learn a number of
local unrelated metrics. While this "independence" approach delivers an
increased flexibility its downside is the considerable risk of overfitting. We
present a new parametric local metric learning method in which we learn a
smooth metric matrix function over the data manifold. Using an approximation
error bound of the metric matrix function we learn local metrics as linear
combinations of basis metrics defined on anchor points over different regions
of the instance space. We constrain the metric matrix function by imposing on
the linear combinations manifold regularization which makes the learned metric
matrix function vary smoothly along the geodesics of the data manifold. Our
metric learning method has excellent performance both in terms of predictive
power and scalability. We experimented with several large-scale classification
problems, tens of thousands of instances, and compared it with several state of
the art metric learning methods, both global and local, as well as to SVM with
automatic kernel selection, all of which it outperforms in a significant
manner
- …