Fast and Robust Rank Aggregation against Model Misspecification
In rank aggregation, preferences from different users are summarized into a
total order under the assumption that the data are homogeneous. In practice
this assumption often fails, so model misspecification arises, and rank
aggregation methods account for it by incorporating noise models. However,
these methods all rely on specific noise-model assumptions and cannot handle
the agnostic noise encountered in the real world. In this paper, we propose CoarsenRank, which
rectifies the underlying data distribution directly and aligns it to the
homogeneous data assumption without involving any noise model. To this end, we
define a neighborhood of the data distribution over which Bayesian inference of
CoarsenRank is performed, and therefore the resultant posterior enjoys
robustness against model misspecification. Further, we derive a tractable
closed-form solution for CoarsenRank, making it computationally efficient.
Experiments on real-world datasets show that CoarsenRank is fast and robust,
achieving consistent improvement over baseline methods.
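The neighborhood construction described in this abstract matches the general recipe of coarsened Bayesian inference. As an illustration only (the exact divergence and prior used by CoarsenRank may differ), conditioning on a relative-entropy neighborhood of the empirical distribution, with an exponential prior on the neighborhood radius, yields approximately a tempered likelihood that admits closed-form updates for conjugate models:

```latex
% Condition on a neighborhood of the empirical distribution \hat{P}_n
% rather than on the data themselves:
\pi\big(\theta \mid d(\hat{P}_n, P_\theta) < \epsilon\big)
  \;\propto\; \pi(\theta)\,
  \Pr\big(d(\hat{P}_n, P_\theta) < \epsilon \mid \theta\big).
% With d = relative entropy and \epsilon \sim \mathrm{Exp}(\alpha),
% this is (approximately) proportional to a power/tempered posterior:
\pi(\theta \mid x_{1:n})
  \;\propto\; \pi(\theta) \prod_{i=1}^{n} p(x_i \mid \theta)^{\zeta},
\qquad \zeta = \frac{\alpha}{\alpha + n}.
```

Tempering with $\zeta < 1$ flattens the likelihood, which is what buys robustness: the posterior becomes less sensitive to data that the assumed model cannot fit exactly.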
Sanitized Clustering against Confounding Bias
Real-world datasets inevitably contain biases that arise from different
sources or conditions during data collection. Consequently, such inconsistency
itself acts as a confounding factor that disturbs the cluster analysis.
Existing methods eliminate the biases by projecting the data onto the orthogonal
complement of the subspace spanned by the confounding factor before
clustering. In these methods, the clustering factor of interest and the
confounding factor are treated coarsely in the raw feature space, where the
correlation between the data and the confounding factor is assumed to be linear
for the sake of convenient solutions. These approaches are thus limited in
scope, as the data in real applications are usually complex and non-linearly correlated with the
confounding factor. This paper presents a new clustering framework named
Sanitized Clustering Against confounding Bias (SCAB), which removes the
confounding factor in the semantic latent space of complex data through a
non-linear dependence measure. To be specific, we eliminate the bias
information in the latent space by minimizing the mutual information between
the confounding factor and the latent representation delivered by a Variational
Auto-Encoder (VAE). Meanwhile, a clustering module is introduced to cluster
over the purified latent representations. Extensive experiments on complex
datasets demonstrate that our SCAB achieves a significant gain in clustering
performance by removing the confounding bias. The code is available at
\url{https://github.com/EvaFlower/SCAB}.
Comment: Machine Learning, in press.
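As a minimal sketch of the kind of non-linear dependence penalty described above, assume HSIC (the Hilbert-Schmidt Independence Criterion) as the measure; the actual SCAB objective, network, and estimator may differ, and the helpers `rbf_gram` and `hsic` below are illustrative names, not the paper's API. The penalty grows when latent codes `z` and confounder `c` are statistically dependent, so minimizing it alongside the VAE loss pushes bias information out of the latent space:

```python
import numpy as np

def rbf_gram(x, sigma=1.0):
    # RBF kernel Gram matrix from pairwise squared distances.
    sq = np.sum(x ** 2, axis=1, keepdims=True)
    d2 = sq + sq.T - 2.0 * x @ x.T
    return np.exp(-d2 / (2.0 * sigma ** 2))

def hsic(z, c, sigma=1.0):
    # Biased HSIC estimate trace(K H L H) / (n - 1)^2.  For characteristic
    # kernels the population quantity is zero iff z and c are independent,
    # so it serves as a non-linear dependence penalty.
    n = z.shape[0]
    K, L = rbf_gram(z, sigma), rbf_gram(c, sigma)
    H = np.eye(n) - np.ones((n, n)) / n     # centering matrix
    return float(np.trace(K @ H @ L @ H)) / (n - 1) ** 2

rng = np.random.default_rng(0)
z = rng.normal(size=(200, 4))                        # latent codes
c_dep = z[:, :1] + 0.1 * rng.normal(size=(200, 1))   # confounder entangled with z
c_ind = rng.normal(size=(200, 1))                    # confounder independent of z
assert hsic(z, c_dep) > hsic(z, c_ind)  # penalty is larger under dependence
```

In SCAB-style training such a penalty would be added, with some trade-off weight, to the VAE reconstruction and clustering objectives over the latent representation.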
Coarse-to-Fine Contrastive Learning on Graphs
Inspired by the impressive success of contrastive learning (CL), a variety of
graph augmentation strategies have been employed to learn node representations
in a self-supervised manner. Existing methods construct the contrastive samples
by adding perturbations to the graph structure or node attributes. Although
impressive results are achieved, these methods are largely blind to the wealth
of prior information the perturbations imply: as the degree of perturbation
applied to the original graph increases, 1) the similarity between the original
graph and the generated augmented graph gradually decreases; 2) the
discrimination between all nodes within each augmented view gradually
increases. In this paper, we argue that both kinds of prior information can be
incorporated (differently) into
the contrastive learning paradigm following our general ranking framework. In
particular, we first interpret CL as a special case of learning to rank (L2R),
which inspires us to leverage the ranking order among positive augmented views.
Meanwhile, we introduce a self-ranking paradigm to ensure that the
discriminative information among different nodes is maintained and is less
affected by perturbations of different degrees. Experimental results on
various benchmark datasets verify the effectiveness of our algorithm compared
with both supervised and unsupervised models.
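The ranking order among positive augmented views can be illustrated with a toy pairwise hinge loss; this is an illustrative stand-in, not the paper's exact objective, and `ranking_loss` is a hypothetical helper. Views generated with stronger perturbations should be less similar to the anchor than views with weaker ones, and inversions of that order are penalized:

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def ranking_loss(anchor, views, margin=0.0):
    # views are ordered from weakest to strongest perturbation; penalize any
    # pair in which a more-perturbed view is MORE similar to the anchor.
    sims = [cosine(anchor, v) for v in views]
    return sum(max(0.0, margin + sims[j] - sims[i])
               for i in range(len(sims)) for j in range(i + 1, len(sims)))

h = np.array([1.0, 0.0])                      # anchor embedding
u = np.array([0.0, 1.0])                      # perturbation direction
views = [h + s * u for s in (0.2, 0.6, 1.2)]  # cosine to h drops as s grows
assert ranking_loss(h, views) == 0.0          # correct order: no inversions
assert ranking_loss(h, views[::-1]) > 0.0     # reversed order is penalized
```

In an actual learn-to-rank contrastive setup the similarities would come from learned node representations of augmented graphs, and the loss would be minimized jointly with the usual contrastive terms.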
Earning Extra Performance from Restrictive Feedbacks
Many machine learning applications encounter a situation where model
providers are required to further refine a previously trained model so as to
satisfy the specific needs of local users. This problem reduces to the
standard model tuning paradigm if the target data can be permissibly fed to the
model. However, it is rather difficult in a wide range of practical cases where
the target data are not shared with model providers, although some evaluations
of the model are commonly accessible. In this paper, we formally set up a
challenge named \emph{Earning eXtra PerformancE from restriCTive feEDbacks}
(EXPECTED) to describe this form of model tuning problem. Concretely, EXPECTED admits a
model provider to access the operational performance of the candidate model
multiple times via feedback from a local user (or a group of users). The goal
of the model provider is to eventually deliver a satisfactory model to the
local user(s) by utilizing this feedback. Unlike existing model tuning methods,
where the target data is always ready for calculating model gradients, the
model providers in EXPECTED only see some feedback, which could be as simple as
scalars such as inference accuracy or usage rate. To enable tuning in this
restrictive circumstance, we propose to characterize the geometry of the model
performance with regard to model parameters through exploring the parameters'
distribution. In particular, for deep models, whose parameters are distributed
across multiple layers, we further tailor a more query-efficient algorithm
that conducts layerwise tuning, devoting more queries to the layers that yield
larger gains. Our theoretical analyses justify the proposed
algorithms from the aspects of both efficacy and efficiency. Extensive
experiments on different applications demonstrate that our work forges a sound
solution to the EXPECTED problem.
Comment: Accepted by IEEE TPAMI in April 2023.
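Tuning from scalar feedback alone can be illustrated by searching the parameters' distribution with a basic evolution-strategies loop; this is a generic sketch in the spirit described above, not the authors' algorithm, and `tune_from_feedback` is a hypothetical name. Candidates are sampled around the current parameters, each is scored by the (black-box) user feedback, and the estimate moves toward higher-scoring samples:

```python
import numpy as np

def tune_from_feedback(theta0, score_fn, steps=200, pop=20, sigma=0.1, lr=0.05, seed=0):
    """Black-box tuning from scalar feedback only (no gradients from data).

    Samples Gaussian perturbations of the current parameters, queries the
    scalar score of each candidate, and takes a step along the resulting
    evolution-strategies gradient estimate.
    """
    rng = np.random.default_rng(seed)
    theta = np.array(theta0, dtype=float)
    for _ in range(steps):
        eps = rng.normal(size=(pop, theta.size))
        scores = np.array([score_fn(theta + sigma * e) for e in eps])
        adv = scores - scores.mean()               # baseline subtraction
        theta += lr / (pop * sigma) * eps.T @ adv  # move toward better samples
    return theta

# Toy "user feedback": a scalar score maximized at theta = [1, -2].
target = np.array([1.0, -2.0])
feedback = lambda th: -float(np.sum((th - target) ** 2))
theta = tune_from_feedback(np.zeros(2), feedback)
assert np.allclose(theta, target, atol=0.1)
```

Each loop iteration costs `pop` feedback queries, which is why query efficiency (e.g. tuning layerwise and only where it pays off) matters in the EXPECTED setting.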