DMFSGD: A Decentralized Matrix Factorization Algorithm for Network Distance Prediction
The knowledge of end-to-end network distances is essential to many Internet
applications. As active probing of all pairwise distances is infeasible in
large-scale networks, a natural idea is to measure a few pairs and to predict
the other ones without actually measuring them. This paper formulates the
distance prediction problem as matrix completion where unknown entries of an
incomplete matrix of pairwise distances are to be predicted. The problem is
solvable because strong correlations among network distances exist and cause
the constructed distance matrix to be low rank. The new formulation circumvents
the well-known drawbacks of existing approaches based on Euclidean embedding.
A new algorithm, so-called Decentralized Matrix Factorization by Stochastic
Gradient Descent (DMFSGD), is proposed to solve the network distance prediction
problem. By letting network nodes exchange messages with each other, the
algorithm is fully decentralized and only requires each node to collect and to
process local measurements, with neither explicit matrix constructions nor
special nodes such as landmarks or central servers. In addition, we
comprehensively compare matrix factorization with Euclidean embedding to
demonstrate the suitability of the former for network distance prediction. We
further study the incorporation of a robust loss function and of
non-negativity constraints.
Extensive experiments on various publicly-available datasets of network delays
show not only the scalability and the accuracy of our approach but also its
usability in real Internet applications.
Comment: submitted to IEEE/ACM Transactions on Networking on Nov. 201
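The abstract describes completing a low-rank distance matrix by stochastic gradient descent on its factors. The sketch below is a minimal, centralized illustration of that factorization step only, not the authors' decentralized message-passing protocol; all parameter names (rank, lr, reg, epochs) are illustrative assumptions.

```python
import numpy as np

def mf_sgd(observed, n, rank=2, lr=0.05, reg=1e-3, epochs=300, seed=0):
    """Sketch: complete an n x n distance matrix from observed entries by
    stochastic gradient descent on a low-rank factorization D ~= X @ Y.T.

    observed: dict mapping (i, j) -> measured distance d_ij.
    """
    rng = np.random.default_rng(seed)
    X = 0.1 * rng.standard_normal((n, rank))
    Y = 0.1 * rng.standard_normal((n, rank))
    pairs = list(observed.items())
    for _ in range(epochs):
        for k in rng.permutation(len(pairs)):
            (i, j), d = pairs[k]
            err = X[i] @ Y[j] - d            # prediction error on one entry
            gx = err * Y[j] + reg * X[i]     # regularized gradients
            gy = err * X[i] + reg * Y[j]
            X[i] -= lr * gx
            Y[j] -= lr * gy
    return X, Y                              # predict d_ij as X[i] @ Y[j]
```

In the decentralized setting of the paper, node i would hold only its own rows X[i], Y[i] and update them from locally measured distances, which is exactly the per-entry update above.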
A Semantic Collaboration Method Based on Uniform Knowledge Graph
The Semantic Internet of Things extends both the Internet of Things and the Semantic Web, and aims to build an interoperable collaborative system that resolves the heterogeneity problems of the Internet of Things. Because the Semantic Internet of Things combines the characteristics of the Internet of Things with those of the Semantic Web, the corresponding semantic data exhibits many new features. In this study, we analyze the characteristics of this semantic data and propose the concept of a uniform knowledge graph, which is better suited to the environment of the Semantic Internet of Things. We then design a semantic collaboration method based on the uniform knowledge graph. It uses the uniform knowledge graph as the form of knowledge organization and representation, and provides a data basis for semantic collaboration by constructing semantic links that relate different data sets, thereby achieving semantic collaboration in the Semantic Internet of Things. Our experiments show that the proposed method analyzes and understands the semantics of user requirements better and provides more satisfactory outcomes.
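The abstract's central operation, constructing semantic links that relate entities across heterogeneous data sets, can be illustrated with a tiny sketch. The procedure and the matcher below are hypothetical illustrations, not the paper's method; the paper does not specify its linking algorithm.

```python
def link_graphs(triples_a, triples_b, same_entity):
    """Illustrative sketch: merge two triple sets into one uniform graph and
    add sameAs links between entities judged equivalent by same_entity
    (a hypothetical matcher). Triples are (subject, predicate, object)."""
    graph = set(triples_a) | set(triples_b)
    ents_a = {s for s, _, _ in triples_a} | {o for _, _, o in triples_a}
    ents_b = {s for s, _, _ in triples_b} | {o for _, _, o in triples_b}
    for a in ents_a:
        for b in ents_b:
            if a != b and same_entity(a, b):
                graph.add((a, "sameAs", b))  # semantic link across data sets
    return graph
```

A matcher could be as simple as comparing normalized labels, or as rich as an ontology-based similarity; the merged graph then serves as the shared basis for collaboration.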
Interpretable statistics for complex modelling: quantile and topological learning
As the complexity of our data has increased exponentially over the last decades, so has our
need for interpretable features. This thesis revolves around two paradigms for approaching
this quest for insights.
In the first part we focus on parametric models, where the problem of interpretability
can be seen as one of “parametrization selection”. We introduce a quantile-centric
parametrization and show the advantages of our proposal in the context of regression,
where it allows us to bridge the gap between classical generalized linear (mixed)
models and the increasingly popular quantile methods.
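The quantile methods mentioned here minimize the pinball (check) loss rather than squared error. As a minimal sketch of the idea (not the thesis's parametrization), a straight line for quantile level tau can be fit by subgradient descent; the step size and iteration count below are illustrative assumptions.

```python
import numpy as np

def quantile_line(x, y, tau, lr=0.05, epochs=3000):
    """Fit y ~ a*x + b for quantile level tau by subgradient descent
    on the pinball loss (a minimal illustrative sketch)."""
    a = b = 0.0
    for _ in range(epochs):
        r = y - (a * x + b)                    # residuals
        g = np.where(r > 0, -tau, 1.0 - tau)   # pinball subgradient w.r.t. prediction
        a -= lr * np.mean(g * x)
        b -= lr * np.mean(g)
    return a, b
```

With tau = 0.5 this recovers the conditional median line; varying tau traces out the other conditional quantiles.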
The second part of the thesis, concerned with topological learning, tackles the
problem from a non-parametric perspective. As topology can be thought of as a way
of characterizing data in terms of their connectivity structure, it allows us to represent
complex and possibly high-dimensional data through a few features, such as the number of
connected components, loops and voids. We illustrate how the emerging branch of
statistics devoted to recovering topological structures in data, Topological Data
Analysis, can be exploited for both exploratory and inferential purposes, with a special
emphasis on kernels that preserve the topological information in the data.
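The connected-component counts mentioned above correspond to 0-dimensional persistent homology, which can be computed with a union-find pass over edges sorted by length (a Kruskal-style filtration). The sketch below illustrates only this 0-dimensional case, not the thesis's kernels.

```python
import numpy as np
from itertools import combinations

def zero_dim_persistence(points):
    """0-dimensional persistent homology of a point cloud (sketch).

    Every point is born at filtration value 0; a connected component dies
    when the growing distance threshold merges it into another (union-find
    over edges in increasing length order). Returns finite death times."""
    n = len(points)
    parent = list(range(n))

    def find(i):                       # union-find with path compression
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted(
        (float(np.linalg.norm(points[i] - points[j])), i, j)
        for i, j in combinations(range(n), 2)
    )
    deaths = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:                   # two components merge: one dies at d
            parent[rj] = ri
            deaths.append(d)
    return sorted(deaths)
```

Long-lived bars (large death times) indicate well-separated clusters; short-lived ones are typically noise, and it is these bar lengths that topological kernels summarize.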
Finally, we show with an application how these two approaches can borrow strength
from one another in the identification and description of brain activity through fMRI
data from the ABIDE project.
Domain Generalization by Marginal Transfer Learning
In the problem of domain generalization (DG), there are labeled training data
sets from several related prediction problems, and the goal is to make accurate
predictions on future unlabeled data sets that are not known to the learner.
This problem arises in several applications where data distributions fluctuate
because of environmental, technical, or other sources of variation. We
introduce a formal framework for DG, and argue that it can be viewed as a kind
of supervised learning problem by augmenting the original feature space with
the marginal distribution of feature vectors. While our framework has several
connections to conventional analysis of supervised learning algorithms, several
unique aspects of DG require new methods of analysis.
This work lays the learning-theoretic foundations of domain generalization,
building on our earlier conference paper where the problem of DG was introduced
(Blanchard et al., 2011). We present two formal models of data generation,
corresponding notions of risk, and distribution-free generalization error
analysis. By focusing our attention on kernel methods, we also provide more
quantitative results and a universally consistent algorithm. An efficient
implementation is provided for this algorithm, which is experimentally compared
to a pooling strategy on one synthetic and three real-world data sets.
Eigendecompositions of Transfer Operators in Reproducing Kernel Hilbert Spaces
Transfer operators such as the Perron--Frobenius or Koopman operator play an
important role in the global analysis of complex dynamical systems. The
eigenfunctions of these operators can be used to detect metastable sets, to
project the dynamics onto the dominant slow processes, or to separate
superimposed signals. We extend transfer operator theory to reproducing kernel
Hilbert spaces and show that these operators are related to Hilbert space
representations of conditional distributions, known as conditional mean
embeddings in the machine learning community. Moreover, numerical methods to
compute empirical estimates of these embeddings are akin to data-driven methods
for the approximation of transfer operators such as extended dynamic mode
decomposition and its variants. One main benefit of the presented kernel-based
approaches is that these methods can be applied to any domain where a
similarity measure given by a kernel is available. We illustrate the results
with the aid of guiding examples and highlight potential applications in
molecular dynamics as well as video and text data analysis.
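Extended dynamic mode decomposition, mentioned above as a data-driven approximation of transfer operators, reduces to a least-squares problem over a feature dictionary. The following is a minimal sketch under the assumption of snapshot pairs (x_t, x_{t+1}) and a user-supplied dictionary psi; it is not the paper's RKHS construction.

```python
import numpy as np

def edmd(X, Y, psi):
    """EDMD sketch: approximate the Koopman operator on the span of a
    feature dictionary psi from snapshot pairs.

    X, Y: arrays of shape (m, d) with Y[i] the successor state of X[i].
    psi:  maps an (m, d) array to an (m, k) feature matrix.
    Returns the k x k Koopman matrix estimate and its eigenvalues."""
    PX, PY = psi(X), psi(Y)
    # Least-squares solution of PX @ K ~= PY (Galerkin projection)
    K, *_ = np.linalg.lstsq(PX, PY, rcond=None)
    return K, np.linalg.eigvals(K)
```

For a linear system with the identity dictionary this recovers the system matrix exactly; richer dictionaries (monomials, radial basis functions, or kernels as in the paper) capture nonlinear dynamics, with the leading eigenfunctions revealing metastable sets and slow processes.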