780 research outputs found
Multiple imputation and selection of ordinal level 2 predictors in multilevel models. An analysis of the relationship between student ratings and teacher beliefs and practices
The paper is motivated by the analysis of the relationship between student ratings
and teacher practices and beliefs, which are measured via a set of binary and
ordinal items collected by a specific survey with nearly half of the
respondents missing. The analysis, which is based on a two-level random effect model,
must face two issues about the items measuring teacher practices and beliefs: (i)
these items are level 2 predictors severely affected by missingness; (ii) there is
redundancy in both the number of items and the number of categories of their
measurement scale. We tackle the first issue by considering a multiple imputation
strategy based on information at both level 1 and level 2. For the second
issue, we consider regularization techniques for ordinal predictors, also
accounting for the multilevel data structure. The proposed solution combines
existing methods in an original way to solve the specific problem at hand, but it
is generally applicable to settings requiring the selection of predictors affected by
missing values. The results obtained with the final model point out that some teacher
practices and beliefs are significantly related to ratings of teacher
ability to motivate students.
Comment: Presented at the 12th International Multilevel Conference, held
April 9-10, 2019, Utrecht
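The pooling step of multiple imputation (Rubin's rules) can be sketched in a few lines of numpy. This is a deliberately minimal illustration on simulated data: the hot-deck draw from the observed marginal distribution is a hypothetical stand-in for the paper's level 1 / level 2 imputation model, and the attenuated slope it produces is precisely why imputation quality matters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated predictor with roughly 50% missingness (hypothetical data).
n = 200
x_true = rng.normal(size=n)
y = 1.5 * x_true + rng.normal(scale=0.5, size=n)
x = x_true.copy()
x[rng.random(n) < 0.5] = np.nan

M = 20  # number of imputed data sets
obs = x[~np.isnan(x)]
estimates, variances = [], []
for _ in range(M):
    x_imp = x.copy()
    # Naive hot-deck imputation: draw missing values from the observed
    # marginal distribution (ignores y, so it attenuates the slope).
    x_imp[np.isnan(x_imp)] = rng.choice(obs, size=np.isnan(x_imp).sum())
    X = np.column_stack([np.ones(n), x_imp])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    sigma2 = np.sum((y - X @ beta) ** 2) / (n - 2)
    estimates.append(beta[1])
    variances.append(sigma2 * np.linalg.inv(X.T @ X)[1, 1])

# Rubin's rules: pooled estimate, within- and between-imputation variance.
q_bar = np.mean(estimates)
u_bar = np.mean(variances)
b = np.var(estimates, ddof=1)
total_var = u_bar + (1 + 1 / M) * b
```

The between-imputation component `b` inflates the pooled variance, which is the point of creating M > 1 completed data sets rather than a single imputation.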
Locally Regularized Linear Regression in the Valuation of Real Estate
Regression methods are used for the valuation of real estate in the comparative approach. The basis for the valuation is a data set of similar properties, for which sales transactions were concluded within a short period of time. Large and standardized databases, which meet the requirements of the Polish Financial Supervision Authority, are created in Poland and used by the banks involved in mortgage lending, for example. We assume that in the case of large data sets of transactions, it is more advantageous to build local regression models than a global model. Additionally, we propose local feature selection via regularization. The empirical research carried out on three data sets from the real estate market confirmed the effectiveness of this approach. We paid special attention to the assessment of model quality, using cross-validation to estimate the residual standard error.
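The local approach the abstract describes — fit a regularized model only on the most similar transactions — can be sketched with a plain coordinate-descent lasso in numpy. The data, the neighbourhood size k, and the penalty λ below are all hypothetical choices, not the paper's.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate-descent lasso; columns of X are assumed standardized."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            r = y - X @ beta + X[:, j] * beta[j]   # partial residual
            rho = X[:, j] @ r / n
            z = (X[:, j] ** 2).sum() / n
            beta[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / z
    return beta

def local_lasso_predict(X, y, x0, k=100, lam=0.1):
    """Fit a lasso on the k transactions nearest to the query x0."""
    d = np.linalg.norm(X - x0, axis=1)
    idx = np.argsort(d)[:k]
    Xl, yl = X[idx], y[idx]
    mu, sd = Xl.mean(0), Xl.std(0) + 1e-12
    beta = lasso_cd((Xl - mu) / sd, yl - yl.mean(), lam)
    return yl.mean() + ((x0 - mu) / sd) @ beta, beta

# Hypothetical market: price driven by a single one of five attributes.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 5))
y = 3.0 * X[:, 0] + 0.1 * rng.normal(size=300)
pred, beta = local_lasso_predict(X, y, X[0])
```

The ℓ1 penalty zeroes out the locally irrelevant attributes, which is the "local feature selection via regularization" idea in miniature.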
lassopack: Model selection and prediction with regularized regression in Stata
This article introduces lassopack, a suite of programs for regularized
regression in Stata. lassopack implements lasso, square-root lasso, elastic
net, ridge regression, adaptive lasso and post-estimation OLS. The methods are
suitable for the high-dimensional setting where the number of predictors
may be large and possibly greater than the number of observations. We
offer three different approaches for selecting the penalization (`tuning')
parameters: information criteria (implemented in lasso2), K-fold
cross-validation and h-step ahead rolling cross-validation for cross-section,
panel and time-series data (cvlasso), and theory-driven (`rigorous')
penalization for the lasso and square-root lasso for cross-section and panel
data (rlasso). We discuss the theoretical framework and practical
considerations for each approach. We also present Monte Carlo results to
compare the performance of the penalization approaches.
Comment: 52 pages, 6 figures, 6 tables; submitted to Stata Journal; for more
information see https://statalasso.github.io
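lassopack's cvlasso selects the penalty level by cross-validation. Since the package itself is Stata, here is a language-neutral numpy sketch of the same idea — K-fold cross-validation over a penalty grid — using closed-form ridge for brevity; the grid, data, and fold count are hypothetical.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge solution (X'X + lam*I)^{-1} X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def cv_select(X, y, lams, k=5, seed=0):
    """Pick the penalty with the lowest k-fold cross-validated MSE."""
    idx = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(idx, k)
    cv_err = []
    for lam in lams:
        mse = 0.0
        for f in folds:
            train = np.setdiff1d(idx, f)
            b = ridge_fit(X[train], y[train], lam)
            mse += np.mean((y[f] - X[f] @ b) ** 2)
        cv_err.append(mse / k)
    return lams[int(np.argmin(cv_err))], cv_err

# High-dimensional setting: more predictors than observations (p > n).
rng = np.random.default_rng(0)
n, p = 50, 100
X = rng.normal(size=(n, p))
beta = np.zeros(p); beta[:5] = 2.0      # sparse truth
y = X @ beta + rng.normal(size=n)
lams = [0.01, 0.1, 1.0, 10.0, 100.0, 1000.0]
best, cv_err = cv_select(X, y, lams)
```

The theory-driven ("rigorous") penalization in rlasso replaces this data-driven search with a plug-in penalty level; the CV loop above corresponds only to the cvlasso approach.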
A General Framework for Fast Stagewise Algorithms
Forward stagewise regression follows a very simple strategy for constructing
a sequence of sparse regression estimates: it starts with all coefficients
equal to zero, and iteratively updates the coefficient (by a small amount
ε) of the variable that achieves the maximal absolute inner product
with the current residual. This procedure has an interesting connection to the
lasso: under some conditions, it is known that the sequence of forward
stagewise estimates exactly coincides with the lasso path, as the step size
ε goes to zero. Furthermore, essentially the same equivalence holds
outside of least squares regression, with the minimization of a differentiable
convex loss function subject to an ℓ1 norm constraint (the stagewise
algorithm now updates the coefficient corresponding to the maximal absolute
component of the gradient).
Even when they do not match their ℓ1-constrained analogues, stagewise
estimates provide a useful approximation, and are computationally appealing.
Their success in sparse modeling motivates the question: can a simple,
effective strategy like forward stagewise be applied more broadly in other
regularization settings, beyond the ℓ1 norm and sparsity? The current
paper is an attempt to do just this. We present a general framework for
stagewise estimation, which yields fast algorithms for problems such as
group-structured learning, matrix completion, image denoising, and more.
Comment: 56 pages, 15 figures
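The update rule described in the abstract is simple enough to sketch in a few lines of numpy. This is the plain least-squares version only, on hypothetical simulated data; the step size ε and stopping tolerance are illustrative choices.

```python
import numpy as np

def forward_stagewise(X, y, eps=0.01, tol=0.02, max_steps=100_000):
    """Forward stagewise regression: start at zero and repeatedly nudge,
    by +/- eps, the coefficient of the variable with the largest absolute
    inner product with the current residual."""
    n, p = X.shape
    beta = np.zeros(p)
    r = y.astype(float).copy()
    for _ in range(max_steps):
        c = X.T @ r / n                 # correlations with the residual
        j = int(np.argmax(np.abs(c)))
        if abs(c[j]) < tol:             # residual roughly orthogonal: stop
            break
        step = eps * np.sign(c[j])
        beta[j] += step
        r -= step * X[:, j]
    return beta

# Sparse example: only two of ten predictors matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.1 * rng.normal(size=200)
beta = forward_stagewise(X, y)
```

Stopping early (a larger `tol`) leaves the estimate shrunken toward zero, which is the sense in which the sequence of stagewise iterates traces out an approximate lasso path as ε shrinks.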
Analysis of overfitting in the regularized Cox model
The Cox proportional hazards model is ubiquitous in the analysis of
time-to-event data. However, when the data dimension p is comparable to the
sample size N, maximum likelihood estimates for its regression parameters are
known to be biased or break down entirely due to overfitting. This prompted the
introduction of the so-called regularized Cox model. In this paper we use the
replica method from statistical physics to investigate the relationship between
the true and inferred regression parameters in regularized multivariate Cox
regression with L2 regularization, in the regime where both p and N are large
but with p/N ~ O(1). We thereby generalize a recent study from maximum
likelihood to maximum a posteriori inference. We also establish a relationship
between the optimal regularization parameter and p/N, allowing for
straightforward overfitting corrections in time-to-event analysis.
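The model in question — Cox partial likelihood with an L2 (ridge) penalty — can be sketched and fitted by plain gradient ascent in numpy. This is a minimal illustration on hypothetical simulated data with no tied event times and no censoring, not the paper's replica-method analysis.

```python
import numpy as np

def cox_l2_fit(X, time, event, lam, lr=0.1, n_iter=2000):
    """Ridge-penalized Cox regression by gradient ascent on
    (1/n) * log partial likelihood - (lam/2) * ||beta||^2."""
    n, p = X.shape
    order = np.argsort(-time)   # decreasing time: risk set = prefix
    Xs, ev = X[order], event[order]
    beta = np.zeros(p)
    for _ in range(n_iter):
        w = np.exp(Xs @ beta)
        cum_w = np.cumsum(w)                     # sum of w over risk set
        cum_wx = np.cumsum(w[:, None] * Xs, 0)   # weighted X over risk set
        # Score of the log partial likelihood, summed over events only.
        grad = (ev[:, None] * (Xs - cum_wx / cum_w[:, None])).sum(0)
        beta += lr * (grad / n - lam * beta)     # penalized ascent step
    return beta

# Proportional-hazards simulation: exponential times with rate exp(x.beta).
rng = np.random.default_rng(0)
n, p = 500, 3
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, -1.0, 0.0])
t = rng.exponential(np.exp(-X @ beta_true))
event = np.ones(n)
beta_hat = cox_l2_fit(X, t, event, lam=0.01)
```

With the small penalty used here the fit stays close to the truth; increasing `lam` shrinks the estimates toward zero, which is the bias-variance trade-off the abstract's optimal-penalty result addresses for p/N of order one.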
Generative Adversarial Networks (GANs): Challenges, Solutions, and Future Directions
Generative Adversarial Networks (GANs) are a novel class of deep generative
models that has recently gained significant attention. GANs learn complex and
high-dimensional distributions implicitly over images, audio, and other data.
However, there exist major challenges in the training of GANs, i.e., mode
collapse, non-convergence and instability, due to inappropriate design of
network architecture, use of objective function and selection of optimization
algorithm. Recently, to address these challenges, several solutions for better
design and optimization of GANs have been investigated based on techniques of
re-engineered network architectures, new objective functions and alternative
optimization algorithms. To the best of our knowledge, there is no existing
survey that has particularly focused on broad and systematic developments of
these solutions. In this study, we perform a comprehensive survey of the
advancements in GANs design and optimization solutions proposed to handle GANs
challenges. We first identify key research issues within each design and
optimization technique and then propose a new taxonomy to structure solutions
by key research issues. In accordance with the taxonomy, we provide a detailed
discussion on different GANs variants proposed within each solution and their
relationships. Finally, based on the insights gained, we present the promising
research directions in this rapidly growing field.
Comment: 42 pages, 13 figures, tables