Exact Matrix Completion via Convex Optimization
We consider a problem of considerable practical interest: the recovery of a
data matrix from a sampling of its entries. Suppose that we observe m entries
selected uniformly at random from a matrix M. Can we complete the matrix and
recover the entries that we have not seen?
We show that one can perfectly recover most low-rank matrices from what
appears to be an incomplete set of entries. We prove that if the number m of
sampled entries obeys m >= C n^{1.2} r log n for some positive numerical
constant C, then with very high probability, most n by n matrices of rank r can
be perfectly recovered by solving a simple convex optimization program. This
program finds the matrix with minimum nuclear norm that fits the data. The
condition above assumes that the rank is not too large. However, if one
replaces the 1.2 exponent with 1.25, then the result holds for all values of
the rank. Similar results hold for arbitrary rectangular matrices as well. Our
results are connected with the recent literature on compressed sensing, and
show that objects other than signals and images can be perfectly reconstructed
from very limited information.
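As an illustration of the convex program described above, the following is a minimal sketch of nuclear-norm minimization using cvxpy; the data, dimensions, and sampling rate are illustrative, and the paper's theorem concerns when this program recovers M exactly.

```python
# Minimal sketch: complete a low-rank matrix by nuclear-norm minimization.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n, r = 40, 2
M = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))  # rank-r ground truth
mask = (rng.random((n, n)) < 0.5).astype(float)                # observed entries

X = cp.Variable((n, n))
# Find the minimum-nuclear-norm matrix that agrees with M on the sample.
problem = cp.Problem(cp.Minimize(cp.normNuc(X)),
                     [cp.multiply(mask, X) == mask * M])
problem.solve()
print("relative error:", np.linalg.norm(X.value - M) / np.linalg.norm(M))
```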
Knowledge-aware Complementary Product Representation Learning
Learning product representations that reflect complementary relationships
plays a central role in e-commerce recommender systems. In the absence of the
product relationship graph that existing methods rely on, complementary
relationships must be detected directly from noisy and sparse customer
purchase activities. Furthermore, unlike simple relationships such as
similarity, complementariness is asymmetric and non-transitive. Standard
representation learning emphasizes a single set of embeddings, which is
problematic for modelling these properties of complementariness. We propose
using knowledge-aware learning with dual product embedding to solve the above
challenges. We encode contextual knowledge into product representation by
multi-task learning, to alleviate the sparsity issue. By explicitly modelling
with user bias terms, we separate the noise of customer-specific preferences
from the complementariness. Furthermore, we adopt the dual embedding framework
to capture the intrinsic properties of complementariness and provide geometric
interpretation motivated by the classic separating hyperplane theory. Finally,
we propose a Bayesian network structure that unifies all the components, which
also subsumes several popular models as special cases. The proposed method
compares favourably to state-of-the-art methods in downstream classification and
recommendation tasks. We also develop an implementation that scales efficiently
to a dataset with millions of items and customers.
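For intuition on the dual-embedding idea above, here is a minimal illustrative sketch (names, dimensions, and initialization are ours, not the paper's): each product gets a separate "source" vector u and "target" vector v, so the score for "b complements a" is the inner product of u_a and v_b, which is naturally asymmetric, while a per-user bias absorbs customer-specific preference.

```python
# Illustrative sketch of dual product embeddings with a user bias term.
import numpy as np

rng = np.random.default_rng(0)
n_products, n_users, dim = 1000, 500, 32
U = 0.1 * rng.standard_normal((n_products, dim))  # "source" embeddings
V = 0.1 * rng.standard_normal((n_products, dim))  # "target" embeddings
user_bias = 0.1 * rng.standard_normal(n_users)    # customer-specific offsets

def complementariness(a, b, user=None):
    """Score how well product b complements product a (asymmetric in a, b)."""
    s = U[a] @ V[b]                   # dual embeddings: s(a->b) != s(b->a)
    if user is not None:
        s += user_bias[user]          # separates personal preference from the signal
    return s

print(complementariness(3, 7, user=42), complementariness(7, 3, user=42))
```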
On the Temperature Dependence of Point-Defect-Mediated Luminescence in Silicon
We present a model of the temperature dependence of point-defect-mediated luminescence in silicon, derived from basic semiconductor physics and based on the kinetics of bound exciton formation. The model provides a good fit to data for W line electroluminescence and G line photoluminescence in silicon. Strategies are discussed for extending luminescence to room temperature.
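The abstract does not reproduce the model's equations. For orientation only, the thermal quenching of bound-exciton luminescence is commonly fit with an Arrhenius-type expression of the form below; this is a generic textbook form, not necessarily the exact model of the paper.

```latex
I(T) = \frac{I_0}{1 + C\, T^{3/2} \exp\left(-E_B / k_B T\right)}
```

Here $E_B$ is the binding energy of the bound exciton and the $T^{3/2}$ prefactor comes from the band density of states.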
Insulator-to-Metal Transition in Selenium-Hyperdoped Silicon: Observation and Origin
Hyperdoping has emerged as a promising method for designing semiconductors
with unique optical and electronic properties, although such properties
currently lack a clear microscopic explanation. Combining computational and
experimental evidence, we probe the origin of sub-band gap optical absorption
and metallicity in Se-hyperdoped Si. We show that sub-band gap absorption
arises from direct defect-to-conduction band transitions rather than free
carrier absorption. Density functional theory predicts the Se-induced
insulator-to-metal transition arises from merging of defect and conduction
bands, at a concentration in excellent agreement with experiment. Quantum Monte
Carlo calculations confirm the critical concentration, demonstrate that
correlation is important to describing the transition accurately, and suggest
that it is a classic impurity-driven Mott transition.
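For context on the closing claim, the classic Mott criterion relates the critical dopant concentration $n_c$ of an impurity-driven insulator-to-metal transition to the effective Bohr radius $a_H^*$ of the dopant state:

```latex
n_c^{1/3}\, a_H^* \approx 0.26
```

The paper's computed critical Se concentration is validated against experiment; the criterion above is the standard reference point for calling such a transition a Mott transition.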
Probabilistic Bag-Of-Hyperlinks Model for Entity Linking
Many fundamental problems in natural language processing rely on determining
what entities appear in a given text. Commonly referred to as entity linking,
this step is a fundamental component of many NLP tasks such as text
understanding, automatic summarization, semantic search or machine translation.
Name ambiguity, word polysemy, context dependencies and a heavy-tailed
distribution of entities contribute to the complexity of this problem.
We here propose a probabilistic approach that makes use of an effective
graphical model to perform collective entity disambiguation. Input mentions
(i.e., linkable token spans) are disambiguated jointly across an entire
document by combining a document-level prior of entity co-occurrences with
local information captured from mentions and their surrounding context. The
model is based on simple sufficient statistics extracted from data, thus
relying on few parameters to be learned.
Our method does not require extensive feature engineering, nor an expensive
training procedure. We use loopy belief propagation to perform approximate
inference. The low complexity of our model makes this step sufficiently fast
for real-time usage. We demonstrate the accuracy of our approach on a wide
range of benchmark datasets, showing that it matches, and in many cases
outperforms, existing state-of-the-art methods.
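A minimal sketch of the collective-disambiguation step: one variable per mention whose states are candidate entities, unary scores from the mention's context, pairwise potentials from entity co-occurrence statistics, and sum-product loopy belief propagation over the fully connected model. All scores below are random placeholders, not the paper's statistics.

```python
# Toy loopy belief propagation for joint entity disambiguation.
import numpy as np

rng = np.random.default_rng(0)
n_mentions, n_candidates = 4, 5
unary = rng.random((n_mentions, n_candidates))             # mention-context evidence
pairwise = rng.random((n_candidates, n_candidates)) + 0.1  # co-occurrence prior
pairwise = (pairwise + pairwise.T) / 2                     # symmetric potential

# messages[i, j] is the message from mention i to mention j over j's states.
messages = np.ones((n_mentions, n_mentions, n_candidates))
for _ in range(20):                        # fixed sweeps; loopy BP is approximate
    new = np.ones_like(messages)
    for i in range(n_mentions):
        for j in range(n_mentions):
            if i == j:
                continue
            # Belief at i from all senders except j, times the unary.
            b = unary[i] * np.prod(messages[:, i, :], axis=0) / messages[j, i]
            m = b @ pairwise               # sum-product: marginalize i's states
            new[i, j] = m / m.sum()        # normalize for numerical stability
    messages = new

beliefs = unary * np.prod(messages, axis=0)
print("chosen entity index per mention:", beliefs.argmax(axis=1))
```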
A Cost-based Optimizer for Gradient Descent Optimization
As the use of machine learning (ML) permeates into diverse application
domains, there is an urgent need to support a declarative framework for ML.
Ideally, a user will specify an ML task in a high-level and easy-to-use
language and the framework will invoke the appropriate algorithms and system
configurations to execute it. An important observation towards designing such a
framework is that many ML tasks can be expressed as mathematical optimization
problems, which take a specific form. Furthermore, these optimization problems
can be efficiently solved using variations of the gradient descent (GD)
algorithm. Thus, to decouple a user specification of an ML task from its
execution, a key component is a GD optimizer. We propose a cost-based GD
optimizer that selects the best GD plan for a given ML task. To build our
optimizer, we introduce a set of abstract operators for expressing GD
algorithms and propose a novel approach to estimate the number of iterations a
GD algorithm requires to converge. Extensive experiments on real and synthetic
datasets show that our optimizer not only chooses the best GD plan but also
allows for optimizations that achieve orders of magnitude performance speed-up.
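As a sketch of the cost-based idea, the optimizer can be thought of as estimating, for each candidate GD plan, the expected iterations to converge times the cost per iteration, and picking the cheapest; the iteration counts below are illustrative stand-ins, not the paper's estimator.

```python
# Toy cost model: pick the GD plan with the smallest estimated total cost.
def choose_gd_plan(n_examples, n_features, batch_size):
    plans = {
        # plan: (rough iteration estimate, examples touched per iteration)
        "batch":      (100,     n_examples),
        "mini-batch": (1_000,   batch_size),
        "stochastic": (100_000, 1),
    }
    cost = {name: iters * touched * n_features
            for name, (iters, touched) in plans.items()}
    return min(cost, key=cost.get), cost

best, costs = choose_gd_plan(n_examples=1_000_000, n_features=100, batch_size=128)
print(best, costs)
```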
HOGWILD!: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent
Stochastic Gradient Descent (SGD) is a popular algorithm that can achieve
state-of-the-art performance on a variety of machine learning tasks. Several
researchers have recently proposed schemes to parallelize SGD, but all require
performance-destroying memory locking and synchronization. This work aims to
show using novel theoretical analysis, algorithms, and implementation that SGD
can be implemented without any locking. We present an update scheme called
HOGWILD! which allows processors access to shared memory with the possibility
of overwriting each other's work. We show that when the associated optimization
problem is sparse, meaning most gradient updates only modify small parts of the
decision variable, then HOGWILD! achieves a nearly optimal rate of convergence.
We demonstrate experimentally that HOGWILD! outperforms alternative schemes
that use locking by an order of magnitude.
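The update scheme itself is simple to sketch: each thread repeatedly samples an example, computes the gradient on that example's sparse support, and writes the update into shared parameters with no lock. The toy problem below is ours; Python threads only illustrate the scheme (the GIL limits actual parallel speedup).

```python
# Illustrative HOGWILD!-style lock-free SGD on a sparse least-squares toy.
import numpy as np
import threading

d = 1000
w = np.zeros(d)                                    # shared, unsynchronized parameters

def worker(seed, steps=2000, lr=0.05):
    rng = np.random.default_rng(seed)
    for _ in range(steps):
        idx = rng.choice(d, size=5, replace=False) # sparse support of one example
        x = rng.standard_normal(5)
        y = x.sum()                                # target under true weights == 1
        grad = (w[idx] @ x - y) * x                # gradient touches only the support
        w[idx] -= lr * grad                        # racy, lock-free write

threads = [threading.Thread(target=worker, args=(s,)) for s in range(4)]
for t in threads: t.start()
for t in threads: t.join()
print("mean learned weight (should approach 1):", w.mean())
```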
DeepWalk: Online Learning of Social Representations
We present DeepWalk, a novel approach for learning latent representations of
vertices in a network. These latent representations encode social relations in
a continuous vector space, which is easily exploited by statistical models.
DeepWalk generalizes recent advancements in language modeling and unsupervised
feature learning (or deep learning) from sequences of words to graphs. DeepWalk
uses local information obtained from truncated random walks to learn latent
representations by treating walks as the equivalent of sentences. We
demonstrate DeepWalk's latent representations on several multi-label network
classification tasks for social networks such as BlogCatalog, Flickr, and
YouTube. Our results show that DeepWalk outperforms challenging baselines which
are allowed a global view of the network, especially in the presence of missing
information. DeepWalk's representations can provide scores up to 10%
higher than competing methods when labeled data is sparse. In some experiments,
DeepWalk's representations are able to outperform all baseline methods while
using 60% less training data. DeepWalk is also scalable. It is an online
learning algorithm which builds useful incremental results, and is trivially
parallelizable. These qualities make it suitable for a broad class of real
world applications such as network classification and anomaly detection.
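The pipeline is easy to sketch: generate truncated random walks from each vertex, then train a skip-gram model treating walks as sentences. The graph and hyperparameters below are illustrative, using networkx and gensim rather than the authors' implementation.

```python
# Illustrative DeepWalk pipeline: random walks + skip-gram over the walks.
import random
import networkx as nx
from gensim.models import Word2Vec

G = nx.karate_club_graph()                 # stand-in graph for illustration

def random_walk(graph, start, length=40):
    walk = [start]
    while len(walk) < length:
        nbrs = list(graph.neighbors(walk[-1]))
        if not nbrs:
            break
        walk.append(random.choice(nbrs))
    return [str(v) for v in walk]          # word2vec expects string tokens

walks = [random_walk(G, v) for _ in range(10) for v in G.nodes()]
model = Word2Vec(walks, vector_size=64, window=5, min_count=0, sg=1, workers=2)
vec = model.wv["0"]                        # latent representation of vertex 0
```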