Prioritized Metric Structures and Embedding
Metric data structures (distance oracles, distance labeling schemes, routing
schemes) and low-distortion embeddings provide a powerful algorithmic
methodology, which has been successfully applied for approximation algorithms
\cite{llr}, online algorithms \cite{BBMN11}, distributed algorithms
\cite{KKMPT12} and for computing sparsifiers \cite{ST04}. However, this
methodology appears to have a limitation: the worst-case performance inherently
depends on the cardinality of the metric, and one could not specify in advance
which vertices/points should enjoy a better service (i.e., stretch/distortion,
label size/dimension) than that given by the worst-case guarantee.
In this paper we alleviate this limitation by devising a suite of {\em
prioritized} metric data structures and embeddings. We show that given a
priority ranking $(x_1, x_2, \ldots, x_n)$ of the graph vertices (respectively,
metric points), one can devise a metric data structure (respectively, embedding)
in which the stretch (resp., distortion) incurred by any pair containing a
vertex $x_j$ will depend on the rank $j$ of the vertex. We also show that other
important parameters, such as the label size and (in some sense) the dimension,
may depend only on $j$. In some of our metric data structures (resp.,
embeddings) we achieve both prioritized stretch (resp., distortion) and label
size (resp., dimension) {\em simultaneously}. The worst-case performance of our
metric data structures and embeddings is typically asymptotically no worse
than that of their non-prioritized counterparts.
Comment: To appear at STOC 201
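To make the prioritized guarantee concrete, here is a toy Python sketch (my illustration, not the paper's construction): an exact distance labeling where the $j$-th ranked point stores its distances to all higher-priority points, so its label size is $j-1$ regardless of the total number of points, and any pair's distance is read off the lower-priority point's label.

```python
# Toy prioritized exact distance labeling (illustrative only): x_j's
# label stores d(x_j, x_1), ..., d(x_j, x_{j-1}), so the label size of
# the j-th ranked point is j - 1, and every query is answered exactly.

def build_labels(points, dist):
    """points: priority-ordered list; dist(a, b): metric distance."""
    return {j: [dist(points[j], points[i]) for i in range(j)]
            for j in range(len(points))}

def query(labels, i, j):
    """Exact distance between the i-th and j-th ranked points."""
    if i == j:
        return 0.0
    lo, hi = min(i, j), max(i, j)
    return labels[hi][lo]  # the lower-priority point stores the distance

# Example on a line metric, points listed in priority order.
pts = [0.0, 7.0, 3.0, 10.0]
labels = build_labels(pts, lambda a, b: abs(a - b))
assert query(labels, 0, 1) == 7.0
assert query(labels, 2, 3) == 7.0
assert len(labels[3]) == 3  # label size of the rank-4 point is 3
```

High-priority points thus get tiny labels, at the cost of large labels for low-priority points; the paper's results achieve far better trade-offs with approximate stretch.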
Labelings vs. Embeddings: On Distributed Representations of Distances
We investigate for which metric spaces the performance of distance labeling
and of $\ell_\infty$-embeddings differ, and how significant this difference can
be. Recall that a distance labeling is a distributed representation of
distances in a metric space $(X,d)$, where each point $x \in X$ is assigned a
succinct label $l(x)$, such that the distance between any two points
$x, y \in X$ can be approximated given only their labels $l(x), l(y)$. A highly
structured special case is an embedding into $\ell_\infty$, where each point
$x \in X$ is assigned a vector $f(x)$ such that $\|f(x) - f(y)\|_\infty$ is
approximately $d(x,y)$. The performance of a distance labeling or an
$\ell_\infty$-embedding is measured via its distortion and its
label-size/dimension.
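A classical fact connecting the two notions: every finite metric embeds isometrically into $\ell_\infty^n$ via the Fréchet embedding $f(x) = (d(x, x_1), \ldots, d(x, x_n))$, since the triangle inequality gives $\max_i |d(x,x_i) - d(y,x_i)| = d(x,y)$. A quick numerical check of this identity (function names are mine):

```python
# Frechet embedding: map each point x to the vector of its distances to
# all points. For any finite metric this embeds isometrically into
# l_infinity: max_i |f(x)_i - f(y)_i| = d(x, y) by the triangle inequality.
import itertools

def frechet_embed(points, dist):
    return {x: [dist(x, p) for p in points] for x in points}

def linf(u, v):
    return max(abs(a - b) for a, b in zip(u, v))

# Check on a small metric: shortest-path distances on the 4-cycle C_4.
pts = [0, 1, 2, 3]
d = lambda a, b: min(abs(a - b), 4 - abs(a - b))
f = frechet_embed(pts, d)
for x, y in itertools.combinations(pts, 2):
    assert linf(f[x], f[y]) == d(x, y)
```

The catch, of course, is the dimension $n$; the interesting question studied here is what distortion and dimension/label-size trade-offs are achievable below that trivial bound.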
We also study the analogous question for the prioritized versions of these
two measures. Here, a priority order $(x_1, \ldots, x_n)$ of the point set is
given, and higher-priority points should have shorter labels. Formally, a
distance labeling has prioritized label-size $\alpha(\cdot)$ if every $x_j$ has
label size at most $\alpha(j)$. Similarly, an embedding $f : X \to \ell_\infty$
has prioritized dimension $\alpha(\cdot)$ if $f(x_j)$ is non-zero only in the
first $\alpha(j)$ coordinates. In addition, we compare these prioritized
measures to their classical (worst-case) versions.
We answer these questions in several scenarios, uncovering a surprisingly
diverse range of behaviors. First, in some cases labelings and embeddings have
very similar worst-case performance, but in other cases there is a huge
disparity. However, in the prioritized setting, we most often find a strict
separation between the performance of labelings and embeddings. And finally,
when comparing the classical and prioritized settings, we find that a
worst-case bound for label size often ``translates'' to a prioritized one, but
we also find a surprising exception to this rule.
Applications of Nonlinear Optimization
We apply an interior point algorithm to two nonlinear optimization problems and achieve improved results. We also devise an approximate convex functional alternative for use in one of the problems and estimate its accuracy.
The first problem is maximum variance unfolding in machine learning. The traditional method to solve this problem is to convert it to a semi-definite optimization problem by defining a kernel matrix. We obtain better unfolding and higher speeds with the interior point algorithm on the original non-convex problem for data with fewer than 10,000 points.
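For reference, the semi-definite reformulation mentioned above optimizes over the kernel (Gram) matrix $K$ of the unfolded points; this is the textbook formulation of maximum variance unfolding, not the thesis's interior-point variant:
\begin{align*}
\max_{K \succeq 0} \quad & \operatorname{tr}(K)
  && \text{(maximize total variance)}\\
\text{s.t.} \quad & \textstyle\sum_{i,j} K_{ij} = 0
  && \text{(center the embedding)}\\
& K_{ii} - 2K_{ij} + K_{jj} = \|x_i - x_j\|^2
  && \text{for all neighbor pairs } (i,j).
\end{align*}
The unfolded coordinates are then read off from the top eigenvectors of the optimal $K$, whereas the interior-point approach works directly on the non-convex problem in the coordinates themselves.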
The second problem is a multi-objective dose optimization for intensity modulated radiotherapy, whose goals are to achieve high radiation dose on tumors while sparing normal tissues. Due to tumor motions and patient set-up errors, a robust optimization against motion uncertainties is required to deliver a clinically acceptable treatment plan. The traditional method, to irradiate an enlargement of the tumor region, is very conservative and leads to possibly high radiation dose on sensitive structures. We use a new robust optimization model within the framework of goal programming that consists of multiple optimization steps based on prescription priorities. One metric is defined for each structure of interest. A final robustness optimization step then minimizes the variance of all the goal metrics with respect to the motion probability space, and pushes the mean values of these metrics toward a desired value as well. We show similar high dose coverage on example tumors with reduced dose on sensitive structures.
One clinically important metric for a radiation dose distribution, one that describes tumor control probability or normal tissue complication probability, is Dx, the minimum dose value on the hottest x% of a structure. It is not mathematically well-behaved, which impedes its use in optimization. We approximate Dx with a linear function of two generalized equivalent uniform dose metrics, also known as lp norms, requiring that the approximation be concave so that its maximization becomes a convex problem. Results with cross validation on a sampling of radiation therapy plans show that the error of this approximation is less than 1 Gy for the most commonly used range of x values, 80 to 95.
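The generalized equivalent uniform dose of a dose vector is the power mean $\mathrm{gEUD}_p(d) = (\frac{1}{n}\sum_i d_i^p)^{1/p}$, an $l_p$-type norm of the dose distribution. A minimal sketch of the quantities involved follows; the surrogate's weights are hypothetical placeholders standing in for the regression fit described above:

```python
import math

def geud(dose, p):
    """Generalized equivalent uniform dose: power mean of the dose vector."""
    return (sum(d ** p for d in dose) / len(dose)) ** (1.0 / p)

def dx(dose, x):
    """D_x: minimum dose received by the hottest x% of the structure."""
    hot = sorted(dose, reverse=True)
    k = max(1, math.ceil(len(dose) * x / 100.0))
    return hot[k - 1]

def dx_surrogate(dose, p1, p2, w1, w2, b):
    """Linear combination of two gEUD values approximating D_x.
    The weights w1, w2, b are placeholders here; in the thesis they are
    fitted subject to the surrogate being concave in the dose."""
    return w1 * geud(dose, p1) + w2 * geud(dose, p2) + b

dose = [60.0, 62.0, 65.0, 70.0, 55.0]   # doses in Gy per voxel
assert dx(dose, 80) == 60.0             # hottest 80% = {70, 65, 62, 60}
assert abs(geud(dose, 1) - 62.4) < 1e-9  # p = 1 recovers the mean dose
```

Note that D_x itself is piecewise constant in the sorted dose values (hence badly behaved for gradient-based optimization), while each gEUD is smooth, which is what makes the surrogate attractive.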
Enhancing drug and cell line representations via contrastive learning for improved anti-cancer drug prioritization
Due to cancer's complex nature and variable response to therapy, precision
oncology informed by omics sequence analysis has become the current standard of
care. However, the amount of data produced for each patient makes it difficult
to quickly identify the best treatment regimen. Moreover, limited data
availability has hindered computational methods' abilities to learn patterns
associated with effective drug-cell line pairs. In this work, we propose the
use of contrastive learning to improve learned drug and cell line
representations by preserving relationship structures associated with drug
mechanism of action and cell line cancer types. In addition to achieving
enhanced performance relative to a state-of-the-art method, we find that
classifiers using our learned representations exhibit a more balanced reliance
on drug- and cell line-derived features when making predictions. This
facilitates more personalized drug prioritizations that are informed by signals
related to drug resistance.
Comment: 60 pages, 4 figures, 4 tables, 11 supplementary tables, 1
supplementary note, submitted to Nature Communication
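One common way to realize this idea is a supervised-contrastive loss: embeddings that share a label (e.g. the same drug mechanism of action, or the same cell line cancer type) are pulled together, and all others are pushed apart. The sketch below follows the generic supervised-contrastive recipe and is not necessarily the paper's exact loss:

```python
import numpy as np

def supcon_loss(z, labels, tau=0.1):
    """Supervised contrastive loss sketch: for each anchor, positives are
    the other samples with the same label (e.g. shared drug mechanism of
    action); all remaining samples act as negatives."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # unit-normalize
    sim = z @ z.T / tau                                # cosine / temperature
    n = len(labels)
    loss, count = 0.0, 0
    for i in range(n):
        pos = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not pos:
            continue  # anchors without positives contribute nothing
        others = [j for j in range(n) if j != i]
        log_z = np.log(np.sum(np.exp(sim[i, others])))
        loss += np.mean([log_z - sim[i, j] for j in pos])
        count += 1
    return loss / count

rng = np.random.default_rng(0)
z = rng.normal(size=(6, 8))
labels = [0, 0, 1, 1, 2, 2]     # e.g. three mechanism-of-action groups
assert supcon_loss(z, labels) > 0.0
```

Minimizing this loss rewards embeddings where same-group pairs dominate the softmax over all pairs, which is exactly the "preserving relationship structures" behavior the abstract describes.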
Stochastic Attraction-Repulsion Embedding for Large Scale Image Localization
This paper tackles the problem of large-scale image-based localization (IBL)
where the spatial location of a query image is determined by finding out the
most similar reference images in a large database. A critical task in solving
this problem is to learn a discriminative image representation that captures
information relevant to localization. We propose a novel representation
learning method with higher location-discriminating power. It
provides the following contributions: 1) we represent a place (location) as a
set of exemplar images depicting the same landmarks and aim to maximize
similarities among intra-place images while minimizing similarities among
inter-place images; 2) we model a similarity measure as a probability
distribution on L_2-metric distances between intra-place and inter-place image
representations; 3) we propose a new Stochastic Attraction and Repulsion
Embedding (SARE) loss function minimizing the KL divergence between the learned
and the actual probability distributions; 4) we give theoretical comparisons
between SARE, triplet ranking, and contrastive losses, providing
gradient-based insight into why SARE performs better. Our SARE loss is easy to
implement
and pluggable to any CNN. Experiments show that our proposed method improves
the localization performance on standard benchmarks by a large margin.
Demonstrating the broad applicability of our method, we obtained the third
place out of 209 teams in the 2018 Google Landmark Retrieval Challenge. Our
code and model are available at https://github.com/Liumouliu/deepIBL.
Comment: ICC
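With a single positive per query, the target distribution in the KL divergence is one-hot, so a SARE-style loss reduces to a cross-entropy over a softmax of negative squared L2 distances. The sketch below is my reading of that construction, not the authors' reference code:

```python
import numpy as np

def sare_loss(q, pos, negs):
    """SARE-style loss sketch: softmax over negative squared L2 distances
    from the query to {positive} + negatives; with a one-hot target on
    the positive, the KL divergence reduces to cross-entropy."""
    cands = np.vstack([pos[None, :], negs])
    d2 = np.sum((cands - q) ** 2, axis=1)  # squared L2 distances
    logits = -d2                           # closer candidate => higher score
    logp = logits - np.log(np.sum(np.exp(logits)))
    return -logp[0]                        # -log P(positive | query)

rng = np.random.default_rng(1)
q = rng.normal(size=4)
pos_near = q + 0.1 * rng.normal(size=4)
pos_far = q + 2.0 * rng.normal(size=4)
negs = rng.normal(size=(5, 4))
# Pulling the positive toward the query lowers the loss (attraction);
# the negatives' contribution saturates as they move away (repulsion).
assert sare_loss(q, pos_near, negs) < sare_loss(q, pos_far, negs)
```

Because the loss is a proper probability objective rather than a fixed margin, its gradient on each negative scales with that negative's softmax mass, which is the gradient behavior the comparison with triplet and contrastive losses analyzes.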