LIPIcs, Volume 251, ITCS 2023, Complete Volume
Mining Butterflies in Streaming Graphs
This thesis introduces two main-memory systems sGrapp and sGradd for performing the fundamental analytic tasks of biclique counting and concept drift detection over a streaming graph. A data-driven heuristic is used to architect the systems. To this end, initially, the growth patterns of bipartite streaming graphs are mined and the emergence principles of streaming motifs are discovered. Next, the discovered principles are (a) explained by a graph generator called sGrow; and (b) utilized to establish the requirements for efficient, effective, explainable, and interpretable management and processing of streams. sGrow is used to benchmark stream analytics, particularly in the case of concept drift detection.
sGrow displays robust realization of streaming growth patterns independent of initial conditions, scale and temporal characteristics, and model configurations. Extensive evaluations confirm the simultaneous effectiveness and efficiency of sGrapp and sGradd. sGrapp achieves a mean absolute percentage error of up to 0.05/0.14 for the cumulative butterfly count in streaming graphs with uniform/non-uniform temporal distribution, and a processing throughput of 1.5 million data records per second. The throughput and estimation error of sGrapp are 160x higher and 0.02x lower than baselines. sGradd demonstrates improving performance over time, achieves a zero false detection rate both when there is no drift and when drift has already been detected, and detects sequential drifts within zero to a few seconds of their occurrence, regardless of drift intervals.
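To make the quantities in this abstract concrete, the following is a minimal sketch of an exact butterfly count and of the mean absolute percentage error (MAPE) used to score an estimator. It is not the sGrapp algorithm, which approximates the count over a stream. The function names `count_butterflies` and `mape` are hypothetical, and the MAPE is assumed to be reported as a fraction rather than a percentage.

```python
from collections import defaultdict
from itertools import combinations


def count_butterflies(edges):
    """Exact number of butterflies (2x2 bicliques) in a bipartite edge list.

    Each edge is a (left_vertex, right_vertex) pair. Two left vertices that
    share c right neighbours close c * (c - 1) / 2 butterflies.
    """
    neighbors = defaultdict(set)  # left vertex -> set of right neighbours
    for left, right in edges:
        neighbors[left].add(right)

    total = 0
    for a, b in combinations(neighbors, 2):
        common = len(neighbors[a] & neighbors[b])
        total += common * (common - 1) // 2
    return total


def mape(true_values, estimates):
    """Mean absolute percentage error, returned as a fraction (assumption)."""
    errors = [abs(t - e) / abs(t) for t, e in zip(true_values, estimates)]
    return sum(errors) / len(errors)


if __name__ == "__main__":
    # Complete bipartite graph K_{3,3}, used purely as illustrative data.
    k33 = [(f"u{i}", f"v{j}") for i in range(3) for j in range(3)]
    print(count_butterflies(k33))
    print(mape([10, 20], [9, 22]))
```

The exact count enumerates all pairs of left vertices, so it is quadratic in the number of left vertices; this cost is what streaming approximations aim to avoid.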
Curvature corrected tangent space-based approximation of manifold-valued data
When generalizing schemes for real-valued data approximation or decomposition
to data living in Riemannian manifolds, tangent space-based schemes are very
attractive for the simple reason that these spaces are linear. An open
challenge is to do this in such a way that the generalized scheme is applicable
to general Riemannian manifolds, is global-geometry aware and is
computationally feasible. Existing schemes have been unable to account for all
three of these key factors at the same time.
In this work, we take a systematic approach to developing a framework that is
able to account for all three factors. First, we will restrict ourselves to the
-- still general -- class of symmetric Riemannian manifolds and show how
curvature affects general manifold-valued tensor approximation schemes. Next,
we show how the latter observations can be used in a general strategy for
developing approximation schemes that are also global-geometry aware. Finally,
having taken general applicability and global-geometry awareness into account,
we restrict ourselves once more in a case study on low-rank approximation. Here
we show how computational feasibility can be achieved and propose the
curvature-corrected truncated higher-order singular value decomposition
(CC-tHOSVD), whose performance is subsequently tested in numerical experiments
with both synthetic and real data living in symmetric Riemannian manifolds with
both positive and negative curvature.
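For orientation, the following is a minimal sketch of the classical truncated higher-order singular value decomposition for ordinary real-valued tensors, using NumPy. It is the Euclidean baseline only and includes no curvature correction, so it is not the CC-tHOSVD proposed in the paper. The helper names `unfold`, `mode_product`, `truncated_hosvd` and `reconstruct` are hypothetical.

```python
import numpy as np


def unfold(tensor, mode):
    """Matricize a tensor so that the given mode indexes the rows."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def mode_product(tensor, matrix, mode):
    """Multiply a tensor by a matrix of shape (m, n_mode) along one mode."""
    moved = np.moveaxis(tensor, mode, -1)   # bring the mode to the last axis
    result = moved @ matrix.T               # contract it with the matrix
    return np.moveaxis(result, -1, mode)    # put the new axis back in place


def truncated_hosvd(tensor, ranks):
    """Return the core tensor and one orthonormal factor matrix per mode."""
    factors = []
    for mode, rank in enumerate(ranks):
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u[:, :rank])         # leading left singular vectors
    core = tensor
    for mode, u in enumerate(factors):
        core = mode_product(core, u.T, mode)
    return core, factors


def reconstruct(core, factors):
    """Map the core tensor back to the original space."""
    out = core
    for mode, u in enumerate(factors):
        out = mode_product(out, u, mode)
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.standard_normal((4, 5, 6))   # illustrative random tensor
    core, factors = truncated_hosvd(data, ranks=(2, 3, 2))
    approx = reconstruct(core, factors)
    print(core.shape, np.linalg.norm(data - approx) / np.linalg.norm(data))
```

In the manifold-valued setting described in the abstract, a scheme of this kind would be applied in a tangent space, where the data are represented by ordinary vectors.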
Building Neural Networks on Matrix Manifolds: A Gyrovector Space Approach
Matrix manifolds, such as manifolds of Symmetric Positive Definite (SPD)
matrices and Grassmann manifolds, appear in many applications. Recently, by
applying the theory of gyrogroups and gyrovector spaces, a powerful
framework for studying hyperbolic geometry, some works have attempted to build
principled generalizations of Euclidean neural networks on matrix manifolds.
However, due to the lack of many concepts in gyrovector spaces for the
considered manifolds, e.g., the inner product and gyroangles, techniques and
mathematical tools provided by these works are still limited compared to those
developed for studying hyperbolic geometry. In this paper, we generalize some
notions in gyrovector spaces for SPD and Grassmann manifolds, and propose new
models and layers for building neural networks on these manifolds. We show the
effectiveness of our approach in two applications, i.e., human action
recognition and knowledge graph completion.
Flag Aggregator: Scalable Distributed Training under Failures and Augmented Losses using Convex Optimization
Modern ML applications increasingly rely on complex deep learning models and
large datasets. There has been an exponential growth in the amount of
computation needed to train the largest models. Therefore, to scale computation
and data, these models are inevitably trained in a distributed manner in
clusters of nodes, and their updates are aggregated before being applied to the
model. However, a distributed setup is prone to Byzantine failures of
individual nodes, components, and software. With data augmentation added to
these settings, there is a critical need for robust and efficient aggregation
systems. We define the quality of workers as reconstruction ratios,
and formulate aggregation as a Maximum Likelihood Estimation procedure using
Beta densities. We show that the regularized form of the log-likelihood with
respect to the subspace can be approximately solved using an iterative least
squares solver, and provide convergence guarantees using recent convex optimization landscape
results. Our empirical findings demonstrate that our approach significantly
enhances the robustness of state-of-the-art Byzantine resilient aggregators. We
evaluate our method in a distributed setup with a parameter server, and show
simultaneous improvements in communication efficiency and accuracy across
various tasks. The code is publicly available at
https://github.com/hamidralmasi/FlagAggregato
Contributions in functional data analysis and functional-analytic statistics
Functional data analysis is the study of statistical algorithms applied when the observed data are a collection of functions. Since this type of data is becoming cheaper and easier to collect, there is an increased need to develop statistical tools to handle it. The first part of this thesis focuses on deriving distances between distributions over function spaces and applying these to two-sample testing, goodness-of-fit testing and sample quality assessment. This presents a wide range of contributions, since currently there exist either very few methods or none at all to tackle these problems for functional data. The second part of this thesis applies the functional-analytic perspective to two statistical algorithms. This is a perspective where functions are viewed as living in specific function spaces and the toolbox of functional analysis is applied to identify and prove properties of the algorithms. The two algorithms are variational Gaussian processes, used widely throughout machine learning for function modelling with large observation data sets, and functional statistical depth, used widely as a means to evaluate outliers and perform testing for functional data sets. The results presented contribute a taxonomy of the variational Gaussian process methodology and multiple new results in the theory of functional depth, including the open problem of providing a depth which characterises distributions on function spaces.
Data analysis with merge trees
Today’s data are increasingly complex, and classical statistical techniques need increasingly refined mathematical tools to model and investigate them. Paradigmatic situations are data which need to be considered up to some kind of transformation, and all those circumstances in which the analyst needs to define a general concept of shape. Topological Data Analysis (TDA) is a field which is fundamentally contributing to such challenges by extracting topological information from data with a plethora of interpretable and computationally accessible pipelines. We contribute to this field by developing a series of novel tools, techniques and applications to work with a particular topological summary called a merge tree. To analyze sets of merge trees we introduce a novel metric structure along with an algorithm to compute it, define a framework to compare different functions defined on merge trees, and investigate the metric space obtained with the aforementioned metric. Different geometric and topological properties of the space of merge trees are established, with the aim of obtaining a deeper understanding of such trees. To showcase the effectiveness of the proposed metric, we develop an application in the field of Functional Data Analysis, working with functions up to homeomorphic reparametrization, and in the field of radiomics, where each patient is represented via a clustering dendrogram.
A Riemannian Optimization Approach to Clustering Problems
This paper considers the optimization problem in the form of $\min_{X \in \mathcal{F}_v} f(X) + \lambda \|X\|_1$, where $f$ is smooth, $\mathcal{F}_v = \{X \in \mathbb{R}^{n \times q} : X^T X = I_q, v \in \mathrm{span}(X)\}$, and
$v$ is a given positive vector. The clustering models including but not limited
to the models used by $k$-means, community detection, and normalized cut can be
reformulated as such optimization problems. It is proven that the domain $\mathcal{F}_v$
forms a compact embedded submanifold of $\mathbb{R}^{n \times q}$ and optimization-related tools including a family of computationally
efficient retractions and an orthonormal basis of any normal space of $\mathcal{F}_v$
are derived. An inexact accelerated Riemannian proximal
gradient method that allows adaptive step size is proposed and its global
convergence is established. Numerical experiments on community detection in
networks and normalized cut for image segmentation are used to demonstrate the
performance of the proposed method.