
### Asymmetry Helps: Eigenvalue and Eigenvector Analyses of Asymmetrically Perturbed Low-Rank Matrices

This paper is concerned with the interplay between statistical asymmetry and
spectral methods. Suppose we are interested in estimating a rank-1 and
symmetric matrix $\mathbf{M}^{\star}\in \mathbb{R}^{n\times n}$, yet only a
randomly perturbed version $\mathbf{M}$ is observed. The noise matrix
$\mathbf{M}-\mathbf{M}^{\star}$ is composed of zero-mean independent (but not
necessarily homoscedastic) entries and is, therefore, not symmetric in general.
This might arise, for example, when we have two independent samples for each
entry of $\mathbf{M}^{\star}$ and arrange them into an {\em asymmetric} data
matrix $\mathbf{M}$. The aim is to estimate the leading eigenvalue and
eigenvector of $\mathbf{M}^{\star}$. We demonstrate that the leading eigenvalue
of the data matrix $\mathbf{M}$ can be $O(\sqrt{n})$ times more accurate --- up
to some log factor --- than its (unadjusted) leading singular value in
eigenvalue estimation. Further, the perturbation of any linear form of the
leading eigenvector of $\mathbf{M}$ --- say, entrywise eigenvector perturbation
--- is provably well-controlled. This eigen-decomposition approach is fully
adaptive to heteroscedasticity of noise without the need of careful bias
correction or any prior knowledge about the noise variance. We also provide
partial theory for the more general rank-$r$ case. The takeaway message is
this: arranging the data samples in an asymmetric manner and performing
eigen-decomposition could sometimes be beneficial.

Comment: accepted to Annals of Statistics, 2020. 37 pages.
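
The eigenvalue-versus-singular-value claim in this abstract can be illustrated with a small simulation. The following is a minimal sketch, not the paper's experiment: it assumes homoscedastic Gaussian noise (the abstract allows heteroscedastic entries), and the function name `compare_estimates` and the values of `n`, `lam`, `sigma`, and `seed` are hypothetical choices made only for illustration.

```python
import numpy as np


def compare_estimates(n=400, lam=1.0, sigma=0.0125, seed=0):
    """Return (eigenvalue error, singular value error) for one random draw.

    Assumption: i.i.d. Gaussian noise with standard deviation `sigma`;
    all parameter values are illustrative, not taken from the paper.
    """
    rng = np.random.default_rng(seed)

    # Rank-1 symmetric ground truth M_star = lam * u u^T with a unit vector u.
    u = rng.standard_normal(n)
    u /= np.linalg.norm(u)
    M_star = lam * np.outer(u, u)

    # Independent zero-mean noise entries; the matrix is NOT symmetrized,
    # so the observed M is asymmetric.
    H = sigma * rng.standard_normal((n, n))
    M = M_star + H

    # Leading eigenvalue: the eigenvalue of largest modulus (real part kept).
    eigvals = np.linalg.eigvals(M)
    lead_eig = eigvals[np.argmax(np.abs(eigvals))].real

    # Unadjusted leading singular value of the same data matrix.
    lead_sv = np.linalg.svd(M, compute_uv=False)[0]

    return abs(lead_eig - lam), abs(lead_sv - lam)


if __name__ == "__main__":
    eig_err, sv_err = compare_estimates()
    print(f"eigenvalue error:     {eig_err:.4f}")
    print(f"singular value error: {sv_err:.4f}")
```

In runs of this kind the singular value tends to carry an upward bias of order $n\sigma^2/\lambda$, while the leading eigenvalue of the asymmetric matrix does not, which is consistent with the abstract's remark that no bias correction is needed.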

### Trip Prediction by Leveraging Trip Histories from Neighboring Users

We propose a novel approach for trip prediction by analyzing users' trip
histories. We augment users' (self-) trip histories by adding 'similar' trips
from other users, which could be informative and useful for predicting future
trips for a given user. This also helps to cope with noisy or sparse trip
histories, where the self-history by itself does not provide a reliable
prediction of future trips. We show empirical evidence that by enriching the
users' trip histories with additional trips, one can reduce the prediction
error by 15%-40%, evaluated on multiple subsets of the Nancy2012 dataset. This
real-world dataset is collected from public transportation ticket validations
in the city of Nancy, France. Our prediction tool is a central component of a
trip simulator system designed to analyze the functionality of public
transportation in the city of Nancy.

### Information Recovery from Pairwise Measurements

A variety of information processing tasks in practice involve recovering $n$
objects from single-shot graph-based measurements, particularly those taken
over the edges of some measurement graph $\mathcal{G}$. This paper concerns the
situation where each object takes a value in a group of $M$ different values,
and where one is interested in recovering all these values based on observations
of certain pairwise relations over $\mathcal{G}$. The imperfection of
measurements presents two major challenges for information recovery: 1)
$\textit{inaccuracy}$: a (dominant) portion $1-p$ of measurements are
corrupted; 2) $\textit{incompleteness}$: a significant fraction of pairs are
unobservable, i.e. $\mathcal{G}$ can be highly sparse.
Under a natural random outlier model, we characterize the $\textit{minimax
recovery rate}$, that is, the critical threshold of non-corruption rate $p$
below which exact information recovery is infeasible. This accommodates a very
general class of pairwise relations. For various homogeneous random graph
models (e.g. Erdős-Rényi random graphs, random geometric graphs, small world
graphs), the minimax recovery rate depends almost exclusively on the edge
sparsity of the measurement graph $\mathcal{G}$ irrespective of other graphical
metrics. This fundamental limit decays with the group size $M$ at a square root
rate before entering a connectivity-limited regime. Under the Erdős-Rényi
random graph, a tractable combinatorial algorithm is proposed to approach the
limit for large $M$ ($M=n^{\Omega(1)}$), while order-optimal recovery is
enabled by semidefinite programs in the small $M$ regime.
The extended (and most up-to-date) version of this work can be found at
(http://arxiv.org/abs/1504.01369).

Comment: This version is no longer updated; please find the latest version
at (arXiv:1504.01369).
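
To make the measurement model in this abstract concrete, the sketch below simulates pairwise differences $x_i - x_j \bmod M$ on an Erdős-Rényi graph under a random outlier model and recovers the values with a naive two-hop majority vote. This baseline is an assumption introduced purely for illustration; it is neither the combinatorial algorithm nor the semidefinite program mentioned in the abstract. The function names and the parameter values (`n`, `M`, `p`, `edge_prob`, `seed`) are hypothetical, and the choice of modular differences as the pairwise relation is likewise an assumption.

```python
import numpy as np


def simulate_measurements(n, M, p, edge_prob, rng):
    """Draw hidden values x and noisy differences y[i, j] ~ x_i - x_j (mod M)."""
    x = rng.integers(0, M, size=n)

    # Erdős-Rényi measurement graph: symmetric adjacency, no self-loops.
    upper = np.triu(rng.random((n, n)) < edge_prob, k=1)
    adj = upper | upper.T

    # Random outlier model: each measurement is kept with probability p,
    # otherwise replaced by a uniformly random group element.
    truth = (x[:, None] - x[None, :]) % M
    outlier = rng.integers(0, M, size=(n, n))
    keep = rng.random((n, n)) < p
    y_up = np.triu(np.where(keep, truth, outlier), k=1)

    # Enforce y[j, i] = -y[i, j] (mod M) so each pair has one measurement.
    y = y_up + np.triu((M - y_up) % M, k=1).T
    return x, adj, y


def two_hop_vote(y, adj, M):
    """Estimate x_i - x_0 (mod M) by majority vote over paths i -> j -> 0."""
    n = y.shape[0]
    est = np.zeros(n, dtype=int)
    for i in range(1, n):
        # Intermediate nodes j connected to both i and the reference node 0.
        mids = np.flatnonzero(adj[i] & adj[:, 0])
        votes = (y[i, mids] + y[mids, 0]) % M
        est[i] = np.argmax(np.bincount(votes, minlength=M))
    return est


def run_demo(n=300, M=8, p=0.7, edge_prob=0.5, seed=0):
    """Return the fraction of nodes whose offset from node 0 is recovered."""
    rng = np.random.default_rng(seed)
    x, adj, y = simulate_measurements(n, M, p, edge_prob, rng)
    est = two_hop_vote(y, adj, M)
    # Values are only identifiable up to a global shift, so compare offsets.
    return float(np.mean(est == (x - x[0]) % M))


if __name__ == "__main__":
    print(f"fraction recovered: {run_demo():.3f}")
```

Since only differences are observed, the hidden values are identifiable only up to a global shift, which is why the sketch scores recovery of offsets relative to a reference node.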

### When Shopbots Meet Emails: Implications for Price Competition on the Internet

The Internet has dramatically reduced search costs for customers through tools such as shopbots. The conventional wisdom is that this reduction in search costs will increase price competition leading to a decline in prices and profits for online firms. In this paper, we provide an argument for why in contrast to conventional wisdom, competition may be reduced and prices may rise as consumer search costs for prices fall. Our argument has particular appeal in the context of the Internet, where email targeting and the ability to track and record customer behavior are institutional features that facilitate cost effective targeted pricing by firms. We show that such targeted pricing can serve as an effective counterweight to keep average prices high despite the downward pressure on prices due to low search costs. Surprisingly, we find that the effectiveness of targeting itself improves as search costs fall; therefore prices and profits can increase as search costs fall. The intuition for our argument is as follows: Consider a market where consumers are heterogeneous in their loyalty as well as their cost per unit time to search. In the brick and mortar world, it takes consumers a very large amount of time to search across multiple firms. Therefore few customers will search in equilibrium because the gains from search will be relatively small compared to the cost of search. In such a market, a firm will not be able to distinguish whether its customers bought from it due to their high loyalty or due to their unwillingness to search for low prices because of the high search cost. On the Internet, the amount of time to search across multiple stores is minimal (say zero). Now irrespective of their opportunity cost of time, all consumers can search because the time to search is negligible. If in spite of this, a consumer does not search in this environment, she is revealing that her loyalty to the firm that she buys from is very high. 
The key insight is that as search becomes easy for everyone, then lack of search indicates strong customer loyalty and thus can be used as a proxy to segment the market into loyal and price sensitive segments. Thanks to email technology, firms can selectively set differential prices to different customers, i.e. a high price to the loyal segment and a low price to the price sensitive segment, at relatively low cost. The increased competition due to price transparency caused by low search costs can thus be offset by the ability of firms to price discriminate between their loyal (price insensitive) customers and their price sensitive customers. In fact, we find that it can reduce the extent of competition among the firms and raise their profits. Most surprisingly, the positive effect of targeting on prices improves when search costs fall, because firms can learn more about the differences in customer loyalty, thus improving the effectiveness of targeted pricing. The effectiveness of targeted pricing however is moderated by the extent of opt-in by customers who give their permission for firms to contact them directly by email. Our analysis offers interesting strategic insights for managers about how to address the competitive problems associated with low search costs on the Internet: (1) It suggests that firms should invest in better technologies for personalization and targeted pricing so as to prevent the Internet from becoming a competitive minefield that destroys firm profitability. In fact we show that low search costs can facilitate better price personalization and can thus aid in improving the effectiveness of targeted pricing efforts. (2) The analysis also offers guidelines for online customer acquisition efforts. The critical issue for competitive advantage is not in increasing market share per se, but in increasing the loyalty of customers. 
While a larger share of very loyal customers reduces competitive intensity, surprisingly a larger share of customers who are not very loyal can be a competitive disadvantage. In order for customer acquisition to be profitable, it should be accompanied by a superior product or service that can ensure high loyalty. (3) Investing in online privacy initiatives that assure consumers that their private information will not be abused other than to offer them "deals" is worthwhile. Such assurances will encourage consumers to opt into firm mailing lists. This facilitates successful targeting which in turn ameliorates the competitive threats due to low search costs on the Internet. (4) When the overwhelming majority of customers are satisfied with online privacy, the remaining privacy conscious customers who are not willing to pay a higher price to maintain their privacy will be left out of the market. While this may be of some concern to privacy advocates, it is interesting that total consumer welfare can be higher even if some consumers are left out of the market. Our analysis captures the competitive implications of the interaction between two institutions facilitated by the Internet: Shopbots and Emails. But the research question addressed is more fundamental: What is the nature of competition in an environment with low costs for both consumer search and firm-to-consumer personalized communications? The strategic insights obtained in the paper may be beneficially applied even to offline businesses that can replicate such an environment. For example, offline firms could have websites on which they post prices allowing for easy price comparisons. They could also use tools such as frequency programs to create addressable databases that enable them to communicate with customers by direct mail and email (as many airlines and stores do).

### Scalable Semidefinite Relaxation for Maximum A Posterior Estimation

Maximum a posteriori (MAP) inference over discrete Markov random fields is a
fundamental task spanning a wide spectrum of real-world applications, which is
known to be NP-hard for general graphs. In this paper, we propose a novel
semidefinite relaxation formulation (referred to as SDR) to estimate the MAP
assignment. Algorithmically, we develop an accelerated variant of the
alternating direction method of multipliers (referred to as SDPAD-LR) that can
effectively exploit the special structure of the new relaxation. Encouragingly,
the proposed procedure allows solving SDR for large-scale problems, e.g.,
problems on a grid graph comprising hundreds of thousands of variables with
multiple states per node. Compared with prior SDP solvers, SDPAD-LR is capable
of attaining comparable accuracy while exhibiting remarkably improved
scalability, in contrast to the commonly held belief that semidefinite
relaxation can only be applied to small-scale MRF problems. We have evaluated
the performance of SDR on various benchmark datasets including OPENGM2 and PIC
in terms of both the quality of the solutions and computation time.
Experimental results demonstrate that for a broad class of problems, SDPAD-LR
outperforms state-of-the-art algorithms in producing better MAP assignment in
an efficient manner.

Comment: accepted to International Conference on Machine Learning (ICML 2014)

### On the Minimax Capacity Loss under Sub-Nyquist Universal Sampling

This paper investigates the information rate loss in analog channels when the
sampler is designed to operate independently of the instantaneous channel
occupancy. Specifically, a multiband linear time-invariant Gaussian channel
under universal sub-Nyquist sampling is considered. The entire channel
bandwidth is divided into $n$ subbands of equal bandwidth. At each time only
$k$ constant-gain subbands are active, where the instantaneous subband
occupancy is not known at the receiver and the sampler. We study the
information loss through a capacity loss metric, that is, the capacity gap
caused by the lack of instantaneous subband occupancy information. We
characterize the minimax capacity loss for the entire sub-Nyquist rate regime,
provided that the number $n$ of subbands and the SNR are both large. The
minimax limits depend almost solely on the band sparsity factor and the
undersampling factor, modulo some residual terms that vanish as $n$ and SNR
grow. Our results highlight the power of randomized sampling methods (i.e. the
samplers that consist of random periodic modulation and low-pass filters),
which are able to approach the minimax capacity loss with exponentially high
probability.

Comment: accepted to IEEE Transactions on Information Theory. It has been
presented in part at the IEEE International Symposium on Information Theory
(ISIT) 201
