Search Result Clustering via Randomized Partitioning of Query-Induced Subgraphs
In this paper, we present an approach to search result clustering based on
partitioning of the underlying link graph. We define the notion of a
"query-induced subgraph" and formulate search result clustering as the problem
of efficiently partitioning a given subgraph into topic-related clusters. We
also propose a novel algorithm for approximate partitioning of such a graph,
which yields cluster quality comparable to that obtained by deterministic
algorithms while requiring less computation time, making it suitable for
practical implementations. Finally, we present a practical clustering search
engine developed as part of this research and use it to evaluate the
real-world performance of the proposed concepts.
Comment: 16th Telecommunications Forum TELFOR 200
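As a rough illustration of the idea, and not the paper's actual algorithm, randomized partitioning of a small link subgraph can be sketched as region growing from randomly chosen seed nodes. The function name and the toy graph below are hypothetical:

```python
import random
from collections import deque

def randomized_partition(adj, k, seed=0):
    """Partition the node set of a (query-induced) subgraph into k clusters
    by growing regions from k randomly chosen seed nodes via BFS flood fill.
    Illustrative randomized scheme only, not the paper's exact algorithm."""
    rng = random.Random(seed)
    nodes = list(adj)
    seeds = rng.sample(nodes, k)
    label = {s: i for i, s in enumerate(seeds)}
    queue = deque(seeds)
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in label:          # first region to reach v claims it
                label[v] = label[u]
                queue.append(v)
    # nodes unreachable from every seed get a fallback cluster of their own
    for v in nodes:
        label.setdefault(v, k)
    return label

# toy query-induced subgraph: two densely linked page groups joined by one edge
adj = {
    "a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"],
    "d": ["c", "e"], "e": ["d", "f"], "f": ["e", "d"],
}
clusters = randomized_partition(adj, k=2, seed=1)
```

Because each region claims nodes on a first-come basis, a single BFS pass over the subgraph suffices, which is what makes randomized schemes of this kind cheaper than deterministic spectral or min-cut partitioning.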
Robustness in sparse linear models: relative efficiency based on robust approximate message passing
Understanding efficiency in high-dimensional linear models is a longstanding
problem of interest. Classical work with smaller-dimensional problems dating
back to Huber and Bickel has illustrated the benefits of efficient loss
functions. When the number of parameters p is of the same order as the sample
size n, an efficiency pattern different from the one of Huber was recently
established. In this work, we consider the effects of model selection on the
estimation efficiency of penalized methods. In particular, we explore whether
sparsity results in new efficiency patterns when p > n. In the interest of
deriving the asymptotic mean squared error for regularized M-estimators, we
use the powerful framework of approximate message passing. We propose a
novel, robust and sparse approximate message passing algorithm (RAMP) that is
adaptive to the error distribution. Our algorithm includes many non-quadratic
and non-differentiable loss functions. We derive its asymptotic mean squared
error and show its convergence, while allowing p, n, s → ∞, with s/n → κ and
p/n → δ. We identify new patterns of relative efficiency regarding a number
of penalized estimators, when p is much larger than n. We show that the
classical information bound is no longer reachable, even for light-tailed
error distributions. We show that the penalized least absolute deviation
estimator dominates the penalized least squares estimator in cases of
heavy-tailed distributions. We observe this pattern for all choices of the
number of non-zero parameters s, both s < n and s > n. In non-penalized
problems, where s = p, the opposite regime holds. Therefore, we discover that
the presence of model selection significantly changes the efficiency
patterns.
Comment: 49 pages, 10 figures
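The recursion underlying approximate message passing can be sketched for the quadratic-loss case; RAMP, as described in the abstract, additionally passes the residual through an effective score function of a robust loss (e.g. Huber), so the code below shows only the shared structure, not the paper's algorithm, and the threshold schedule is a common heuristic rather than the paper's calibrated choice:

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of the l1 penalty (elementwise soft thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def amp_lasso(A, y, alpha=2.0, iters=50):
    """Bare-bones AMP for l1-penalized least squares: thresholding step plus
    residual update with an Onsager correction term. Sketch only."""
    n, p = A.shape
    x = np.zeros(p)
    z = y.copy()
    for _ in range(iters):
        tau = alpha * np.sqrt(np.mean(z ** 2))      # adaptive threshold level
        x_new = soft_threshold(x + A.T @ z, tau)
        # Onsager correction: (p/n) times the fraction of active coordinates
        onsager = (p / n) * np.mean(np.abs(x_new) > 0)
        z = y - A @ x_new + onsager * z
        x = x_new
    return x

rng = np.random.default_rng(0)
n, p, s = 200, 400, 10
A = rng.standard_normal((n, p)) / np.sqrt(n)        # normalized design
x_true = np.zeros(p)
x_true[:s] = 3.0
y = A @ x_true + 0.1 * rng.standard_normal(n)
x_hat = amp_lasso(A, y)
```

The Onsager term is what distinguishes AMP from plain iterative soft thresholding: it keeps the effective noise in the pseudo-data x + Aᵀz approximately Gaussian across iterations, which is what makes the asymptotic mean squared error analysis tractable.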
Breaking the curse of dimensionality in regression
Models with many signals, high-dimensional models, often impose structures on
the signal strengths. The common assumption is that only a few signals are
strong and most of the signals are zero or (collectively) close to zero.
However, such a requirement might not be valid in many real-life
applications. In this article, we are interested in conducting large-scale
inference in models that might have signals of mixed strengths. The key
challenge is that the signals that are not under testing might be
collectively non-negligible (although individually small) and cannot be
accurately learned. This article develops a new class of tests that arise
from a moment-matching formulation. A virtue of these moment-matching
statistics is their ability to borrow strength across features, adapt to the
sparsity size, and adjust for testing a growing number of hypotheses. The
GRoup-level Inference of Parameter (GRIP) test harvests effective sparsity
structures with hypothesis formulation for an efficient multiple testing
procedure. Simulated data showcase that GRIP's error control is far better
than that of the alternative methods. We develop a minimax theory
demonstrating the optimality of GRIP for a broad range of models, including
those where the model is a mixture of sparse and high-dimensional dense
signals.
Comment: 51 pages
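A generic second-moment matching statistic conveys the flavor of borrowing strength across a group of features; the construction below is a standard textbook device and is not claimed to be the GRIP statistic itself:

```python
import numpy as np

def moment_matching_group_test(Z):
    """Group-level test from marginal z-scores Z (one per feature in the
    group). Under H0 each z_j is approximately N(0, 1), so sum(z_j^2 - 1)
    has mean 0 and variance 2*m; the standardized statistic is compared to
    N(0, 1). Illustrative moment matching only, not the paper's exact test."""
    m = Z.size
    return (Z ** 2 - 1.0).sum() / np.sqrt(2.0 * m)

rng = np.random.default_rng(0)
z_null = rng.standard_normal(500)                        # no signal anywhere
shift = np.r_[np.full(20, 2.5), np.zeros(480)]           # a few weak-ish signals
z_alt = z_null + shift
t_null = moment_matching_group_test(z_null)
t_alt = moment_matching_group_test(z_alt)
```

Because the statistic aggregates squared z-scores rather than testing each feature separately, many individually sub-threshold signals can still push it past the null distribution, which is the sense in which such tests "borrow strength across features".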
Historical development of judo
This is an Accepted Manuscript of a book chapter published by Routledge in The Science of Judo on 14 June 2018, available online: https://www.routledge.com/The-Science-of-Judo/Callan/p/book/9780815349136

Judo has its roots in the pre-history and mythology of Japan. Legend has it that the origin of the Imperial line is the result of a hand-to-hand wrestling match between two gods, when Takemikazuchi threw Takeminakata (Ashkenazi, 2008), whilst Nomi no Sukune is regarded as the creator of the earliest form of sumo, following a famous match in 23 BC at the request of Emperor Suinin (Guttmann & Thompson, 2001), and Mifune refers to him as the very founder of judo (Mifune, 1956). Early sumo was known as sumai (to struggle). Sumai, applied to combat, became known as kumiuchi (grappling in armour) (Levinson & Christensen, 1996). Kumiuchi is still seen today within Koshiki no Kata, which is required to be demonstrated for a Kōdōkan promotion to 8th Dan.

Through the Muromachi and Sengoku periods of Japanese history (1333-1568), combat systems involving archery, swordsmanship and spearmanship were developed as the various clans battled with each other (Nippon-Budōkan, 2009). The feudal warrior class, samurai or bushi, trained in several martial arts, the collective term for which was bugei. The samurai or bushi culture and lifestyle was known as bushido. However, the introduction of the musket gun in 1543 changed warfare and led to armour becoming lighter. This meant that there were greater possibilities for movement in combat once the warrior was unarmed (Hoare, 2009).

Peer reviewed. Final Accepted Version.
Synthetic learner: model-free inference on treatments over time
Understanding the effect of a particular treatment or policy pertains to many
areas of interest -- ranging from political economics and marketing to
health care and personalized treatment studies. In this paper, we develop a
non-parametric, model-free test for detecting the effects of treatment over
time that extends the widely used Synthetic Control tests. The test is built
on counterfactual predictions arising from many learning algorithms. In the
Neyman-Rubin potential outcome framework with possible carry-over effects, we
show that the proposed test is asymptotically consistent for stationary,
beta-mixing processes. We do not assume that the class of learners
necessarily captures the correct model. We also discuss estimates of the
average treatment effect, and we provide regret bounds on the predictive
performance. To the best of our knowledge, this is the first set of results
that allows, for example, any random forest to be useful for provably valid
statistical inference in the Synthetic Control setting. In experiments, we
show that our Synthetic Learner is substantially more powerful than classical
methods based on Synthetic Control or Difference-in-Differences, especially
in the presence of non-linear outcome models.
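The counterfactual-prediction idea can be sketched in a few lines: fit any learner on pre-treatment data to predict the treated unit from donor units, then compare post-treatment prediction errors to pre-treatment ones. The function names are hypothetical, and the least-squares learner is only a placeholder for whatever learner (e.g. a random forest) one plugs in; the paper's actual test statistic and inference procedure differ:

```python
import numpy as np

def synthetic_learner_stat(Y_donors, y_treated, T0, fit):
    """Fit a learner on periods before T0, predict the treated unit from the
    donors, and compare post- vs pre-treatment prediction errors. Sketch of
    the counterfactual-prediction idea, not the paper's exact test."""
    predict = fit(Y_donors[:T0], y_treated[:T0])
    resid = y_treated - predict(Y_donors)
    pre_mse = np.mean(resid[:T0] ** 2)
    post_mse = np.mean(resid[T0:] ** 2)
    return post_mse / pre_mse        # a large ratio suggests a treatment effect

def least_squares_learner(X, y):
    # placeholder learner: ordinary least squares on the donor outcomes
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return lambda Xnew: Xnew @ coef

rng = np.random.default_rng(0)
T, T0, J = 120, 80, 5                              # periods, treatment time, donors
Y_donors = rng.standard_normal((T, J)) + 1.0
weights = np.array([0.4, 0.3, 0.1, 0.1, 0.1])
y_treated = Y_donors @ weights + 0.1 * rng.standard_normal(T)
y_treated[T0:] += 2.0                              # treatment effect after T0
ratio = synthetic_learner_stat(Y_donors, y_treated, T0, least_squares_learner)
```

In practice the ratio would be calibrated against a null distribution (e.g. by permuting blocks of pre-treatment periods) rather than read off directly; the point of the sketch is only that the learner is treated as a black box.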
High-dimensional semi-supervised learning: in search for optimal inference of the mean
We provide a high-dimensional semi-supervised inference framework focused on
the mean and variance of the response. Our data comprise an extensive set of
observations of the covariate vectors alone and a much smaller set of labeled
observations where we observe both the response and the covariates. We allow
the dimension of the covariates to be much larger than the sample size and
impose weak conditions on the statistical form of the data. We provide new
estimators of the mean and variance of the response that extend some of the
recent results established in low-dimensional models. In particular, at times
we do not require consistent estimation of the functional form of the data.
Together with estimation of the population mean and variance, we provide
their asymptotic distributions and confidence intervals, where we showcase
gains in efficiency compared to the sample mean and variance. Our procedure,
with minor modifications, is then shown to make important contributions to
inference about average treatment effects. We also investigate the robustness
of estimation and coverage, and showcase the widespread applicability and
generality of the proposed method.
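The basic semi-supervised mean estimator this line of work builds on is easy to sketch: fit a working regression on the labeled data, average its predictions over the large unlabeled pool, and debias with the labeled residuals. The debiasing term keeps the estimator consistent even when the working model is misspecified. The sketch below uses a linear working model without sample splitting and is not the paper's exact high-dimensional procedure:

```python
import numpy as np

def semi_supervised_mean(X_lab, y_lab, X_unl):
    """Estimate E[y] as  mean(f(X_unl)) + mean(y_lab - f(X_lab))  for a
    working regression f fitted on the labeled sample. Illustrative sketch
    with a low-dimensional linear f, not the paper's estimator."""
    design = lambda X: np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design(X_lab), y_lab, rcond=None)
    f = lambda X: design(X) @ coef
    # prediction average over the big unlabeled pool, plus residual debiasing
    return f(X_unl).mean() + (y_lab - f(X_lab)).mean()

rng = np.random.default_rng(0)
n, N, p = 100, 10000, 3                      # labeled size << unlabeled size
X_unl = rng.standard_normal((N, p))
X_lab = rng.standard_normal((n, p))
y_lab = X_lab @ np.array([1.0, 0.5, 0.0]) + 2.0 + rng.standard_normal(n)
mu_hat = semi_supervised_mean(X_lab, y_lab, X_unl)
```

The efficiency gain over the plain sample mean of y comes from the unlabeled pool pinning down the covariate distribution: the better f explains the response, the more variance the first term removes.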