Multidimensional Scaling on Multiple Input Distance Matrices
Multidimensional Scaling (MDS) is a classic technique that seeks vectorial
representations for data points, given the pairwise distances between them.
However, in recent years, data are often collected from diverse sources or
have multiple heterogeneous representations. To the best of our knowledge, how
to perform multidimensional scaling on multiple input distance matrices remains
an open problem. In
this paper, we first define this new task formally. Then, we propose a new
algorithm called Multi-View Multidimensional Scaling (MVMDS) by considering
each input distance matrix as one view. Our algorithm is able to learn the
weights of views (i.e., distance matrices) automatically by exploring the
consensus information and complementary nature of views. Experimental results
on synthetic as well as real datasets demonstrate the effectiveness of MVMDS.
We hope that our work encourages wider adoption in the many domains where
MDS is needed.
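For context, the single-view building block (classical MDS) and a naive equal-weight multi-view baseline can be sketched as follows. This is illustrative only: MVMDS learns the view weights automatically, which this sketch does not.

```python
import numpy as np

def classical_mds(D, dim=2):
    # Classical MDS: double-center the squared distance matrix,
    # B = -1/2 * J D^2 J, then embed via the top eigenvectors of B.
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    vals, vecs = np.linalg.eigh(B)              # ascending order
    idx = np.argsort(vals)[::-1][:dim]          # take the top `dim`
    scale = np.sqrt(np.maximum(vals[idx], 0.0))
    return vecs[:, idx] * scale

def multiview_mds(distance_matrices, weights=None, dim=2):
    # Naive multi-view baseline: embed a (fixed) weighted average of the
    # input distance matrices. MVMDS instead learns these weights from
    # the consensus/complementary structure of the views.
    k = len(distance_matrices)
    w = np.full(k, 1.0 / k) if weights is None else np.asarray(weights, float)
    w = w / w.sum()
    D = sum(wi * Di for wi, Di in zip(w, distance_matrices))
    return classical_mds(D, dim)
```

For Euclidean distances of points that truly live in `dim` dimensions, the embedding recovers the pairwise distances exactly (up to rotation and reflection).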
LOCK-IN EFFECTS OF EU R&D SPENDING ON REGIONAL GROWTH. A NON-PARAMETRIC AND SEMI-PARAMETRIC CONDITIONAL QUANTILE REGRESSIONS APPROACH
The purpose of this paper is twofold. First, we study the allocation of European Union (EU) expenditure on Research and Development (R&D) across European regions. Second, we focus on the effects of this variable on regional per capita GDP levels and on regional growth rates. Using non-parametric and semi-parametric conditional quantiles, we find empirical evidence of different effects of R&D expenditure across conditional quantiles of the per capita income distribution and of the growth-rate distribution. Moreover, we find a "lock-in effect" of R&D spending. A positive relation between growth rates and this component of EU expenditure is estimated for regions with higher growth rates, with these regions tending towards a higher, common growth rate as R&D expenditure increases. Conversely, slow-growth regions seem to approach a common but lower growth rate. The estimates of the relationship between per capita regional GDP and R&D spending confirm these findings.
EU Budget, R&D Expenditure, Growth Rates, Conditional Quantiles.
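The conditional-quantile machinery this paper relies on is built on the pinball (check) loss; a minimal sketch of that loss follows. This is for intuition only and does not implement the authors' non-parametric or semi-parametric estimators.

```python
import numpy as np

def pinball_loss(y, yhat, tau):
    # Pinball (check) loss underlying quantile estimation: residuals are
    # weighted tau when the prediction is too low and (1 - tau) when it
    # is too high, so its minimizer is the tau-th quantile.
    r = np.asarray(y, float) - yhat
    return np.mean(np.maximum(tau * r, (tau - 1.0) * r))
```

Minimizing this loss over a constant prediction recovers the sample tau-quantile; conditional quantile regression replaces the constant with a function of the covariates.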
Monotone deep Boltzmann machines
Deep Boltzmann machines (DBMs), one of the first ``deep'' learning methods
ever studied, are multi-layered probabilistic models governed by a pairwise
energy function that describes the likelihood of all variables/nodes in the
network. In practice, DBMs are often constrained, i.e., via the
\emph{restricted} Boltzmann machine (RBM) architecture (which does not permit
intra-layer connections), in order to allow for more efficient inference. In
this work, we revisit the generic DBM approach, and ask the question: are there
other possible restrictions to their design that would enable efficient
(approximate) inference? In particular, we develop a new class of restricted
model, the monotone DBM, which allows for arbitrary self-connection in each
layer, but restricts the \emph{weights} in a manner that guarantees the
existence and global uniqueness of a mean-field fixed point. To do this, we
leverage tools from the recently-proposed monotone Deep Equilibrium model and
show that a particular choice of activation results in a fixed-point iteration
that gives a variational mean-field solution. While this approach is still
largely conceptual, it is the first architecture that allows for efficient
approximate inference in fully-general weight structures for DBMs. We apply
this approach to simple deep convolutional Boltzmann architectures and
demonstrate that it allows for tasks such as the joint completion and
classification of images, within a single deep probabilistic setting, while
avoiding the pitfalls of mean-field inference in traditional RBMs.
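For intuition, a damped mean-field fixed-point iteration for a generic pairwise binary model looks like the following. This is illustrative only: it does not implement the paper's monotone parameterization, and for general weights such iterations need not converge to a unique fixed point, which is exactly what the monotone weight restriction is designed to guarantee.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mean_field(W, b, iters=200, damping=0.5):
    # Damped mean-field updates m_i <- sigmoid(b_i + sum_j W_ij m_j)
    # for a pairwise binary model with symmetric weights W and biases b.
    # With a small-norm W this converges; in general it can oscillate.
    m = np.full(len(b), 0.5)
    for _ in range(iters):
        m_new = sigmoid(b + W @ m)
        m = damping * m + (1.0 - damping) * m_new
    return m
```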
RIFLE: Robust Inference from Low Order Marginals
The ubiquity of missing values in real-world datasets poses a challenge for
statistical inference and can prevent similar datasets from being analyzed in
the same study, precluding many existing datasets from being used for new
analyses. While an extensive collection of packages and algorithms has been
developed for data imputation, the overwhelming majority perform poorly when
many values are missing and the sample size is small, which are unfortunately
common characteristics of empirical data. Such low-accuracy estimates
adversely affect the performance of downstream statistical models. We develop a
statistical inference framework for predicting the target variable without
imputing missing values. Our framework, RIFLE (Robust InFerence via Low-order
moment Estimations), estimates low-order moments with corresponding confidence
intervals to learn a distributionally robust model. We specialize our framework
to linear regression and normal discriminant analysis, and we provide
convergence and performance guarantees. This framework can also be adapted to
impute missing data. In numerical experiments, we compare RIFLE with
state-of-the-art approaches (including MICE, Amelia, MissForest, KNN-imputer,
MIDA, and Mean Imputer). Our experiments demonstrate that RIFLE outperforms
other benchmark algorithms when the percentage of missing values is high and/or
when the number of data points is relatively small. RIFLE is publicly
available.
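The underlying idea, fitting a predictor from estimated low-order moments rather than from imputed values, can be sketched with a naive plug-in version. The names and details below are illustrative assumptions, not RIFLE's actual distributionally robust optimization, which additionally accounts for confidence intervals around the moment estimates.

```python
import numpy as np

def pairwise_moments(X):
    # Estimate the mean vector and second-moment matrix from the
    # pairwise-complete entries of X (NaN = missing), with no imputation.
    mask = ~np.isnan(X)
    Xz = np.where(mask, X, 0.0)
    counts = mask.T.astype(float) @ mask.astype(float)
    mean = Xz.sum(axis=0) / mask.sum(axis=0)
    second = (Xz.T @ Xz) / counts
    return mean, second

def moment_regression(X, y):
    # Plug-in linear regression (through the origin) using only the
    # estimated moments E[x x^T] and E[x y]: solve E[x x^T] w = E[x y].
    Z = np.column_stack([X, y])
    _, S = pairwise_moments(Z)
    A, b = S[:-1, :-1], S[:-1, -1]
    return np.linalg.solve(A, b)
```

With complete data this reduces to ordinary least squares; with missing values it uses each pairwise-complete subsample, which is the moment information a robust method like RIFLE builds on.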
Multiple imputation with compatibility for high-dimensional data
Multiple Imputation (MI) is challenging in high-dimensional settings. An imputation model built on a selected subset of predictors can be incompatible with the analysis model, leading to inconsistent and biased estimates. Although full compatibility may not be achievable in such cases, one can still obtain consistent and unbiased estimates using a semi-compatible imputation model. We propose to relax the lasso penalty so as to select a large set of variables (at most n). A substantive model that also uses a formal variable selection procedure in high-dimensional structures is then expected to be nested in this imputation model, making the imputation model semi-compatible with high probability. Likelihood estimates can be unstable and can face convergence issues as the number of variables becomes nearly as large as the sample size. To address these issues, we further propose using a ridge penalty to obtain the posterior distribution of the parameters given the observed data. The proposed technique is compared with standard MI software and MI techniques available for high-dimensional data in simulation studies and on a real-life dataset. Our results demonstrate the superiority of the proposed approach over existing MI approaches while addressing the compatibility issue.
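The ridge step used to stabilize the parameters when the number of predictors approaches the sample size can be sketched as a generic ridge estimator; this is an illustrative assumption, not the paper's full Bayesian posterior draw.

```python
import numpy as np

def ridge_coefficients(X, y, lam=1.0):
    # Ridge estimate (X'X + lam*I)^{-1} X'y. The penalty lam > 0 keeps
    # X'X + lam*I invertible and well-conditioned even when the number
    # of predictors is nearly as large as the sample size, avoiding the
    # instability and convergence problems of plain likelihood fitting.
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
```

Larger values of `lam` shrink the coefficients further toward zero, trading bias for the stability needed in p ≈ n regimes.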
On some approximately balanced combinatorial cooperative games
A model of taxation for cooperative n-person games is introduced where proper coalitions are taxed proportionally to their value. Games with a non-empty core under taxation at rate ɛ are called ɛ-balanced. Sharp bounds on ɛ in matching games on (not necessarily bipartite) graphs are established. Upper and lower bounds on the smallest ɛ in bin packing games are derived, and Euclidean random TSP games are shown to be, with high probability, ɛ-balanced for ɛ ≈ 0.06.