
    LIPIcs, Volume 251, ITCS 2023, Complete Volume


    Mining Butterflies in Streaming Graphs

    This thesis introduces two main-memory systems, sGrapp and sGradd, for performing the fundamental analytic tasks of biclique counting and concept drift detection over a streaming graph. A data-driven heuristic is used to architect the systems. To this end, the growth patterns of bipartite streaming graphs are first mined and the emergence principles of streaming motifs are discovered. Next, the discovered principles are (a) explained by a graph generator called sGrow; and (b) utilized to establish the requirements for efficient, effective, explainable, and interpretable management and processing of streams. sGrow is used to benchmark stream analytics, particularly in the case of concept drift detection. sGrow displays robust realization of streaming growth patterns independent of initial conditions, scale and temporal characteristics, and model configurations. Extensive evaluations confirm the simultaneous effectiveness and efficiency of sGrapp and sGradd. sGrapp achieves a mean absolute percentage error of at most 0.05/0.14 for the cumulative butterfly count in streaming graphs with uniform/non-uniform temporal distribution and a processing throughput of 1.5 million data records per second. The throughput of sGrapp is 160x higher, and its estimation error 0.02x lower, than those of the baselines. sGradd demonstrates improving performance over time, achieves zero false detection rates when there is no drift and when a drift has already been detected, and detects sequential drifts within zero to a few seconds of their occurrence, regardless of drift intervals.
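    For concreteness, the sketch below implements an exact incremental butterfly ((2,2)-biclique) counter over a bipartite edge stream, i.e. the baseline quantity that sGrapp approximates. It is not the sGrapp algorithm itself (which uses window-based estimation); the class and variable names are illustrative.

```python
from collections import defaultdict

class ExactButterflyCounter:
    """Exact incremental counter of butterflies ((2,2)-bicliques) in a bipartite edge stream."""

    def __init__(self):
        self.adj_left = defaultdict(set)   # left vertex  -> set of right neighbours
        self.adj_right = defaultdict(set)  # right vertex -> set of left neighbours
        self.total = 0                     # cumulative butterfly count

    def insert(self, u, v):
        """Insert edge (u, v), u on the left and v on the right; return the new total."""
        if v in self.adj_left[u]:          # ignore duplicate edges
            return self.total
        # Each existing left neighbour u2 of v closes one butterfly per common
        # right neighbour of u and u2 (v itself is not yet a neighbour of u).
        for u2 in self.adj_right[v]:
            self.total += len(self.adj_left[u] & self.adj_left[u2])
        self.adj_left[u].add(v)
        self.adj_right[v].add(u)
        return self.total


if __name__ == "__main__":
    counter = ExactButterflyCounter()
    stream = [("a", 1), ("a", 2), ("b", 1), ("b", 2), ("c", 2)]
    for u, v in stream:
        print((u, v), "->", counter.insert(u, v))   # final total is 1: the butterfly {a, b} x {1, 2}
```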

    Curvature corrected tangent space-based approximation of manifold-valued data

    When generalizing schemes for real-valued data approximation or decomposition to data living in Riemannian manifolds, tangent space-based schemes are very attractive for the simple reason that these spaces are linear. An open challenge is to do this in such a way that the generalized scheme is applicable to general Riemannian manifolds, is global-geometry aware and is computationally feasible. Existing schemes have been unable to account for all three of these key factors at the same time. In this work, we take a systematic approach to developing a framework that is able to account for all three factors. First, we will restrict ourselves to the -- still general -- class of symmetric Riemannian manifolds and show how curvature affects general manifold-valued tensor approximation schemes. Next, we show how the latter observations can be used in a general strategy for developing approximation schemes that are also global-geometry aware. Finally, with general applicability and global-geometry awareness accounted for, we restrict ourselves once more in a case study on low-rank approximation. Here we show how computational feasibility can be achieved and propose the curvature-corrected truncated higher-order singular value decomposition (CC-tHOSVD), whose performance is subsequently tested in numerical experiments with both synthetic and real data living in symmetric Riemannian manifolds with both positive and negative curvature.
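    As background for what the curvature correction improves upon, the sketch below shows the plain tangent-space approximation pipeline on the unit sphere (a simple symmetric manifold): lift the data with the log map at a reference point, take a truncated SVD in the tangent space, and map back with the exp map. It deliberately omits the curvature correction that defines CC-tHOSVD, and all function names are illustrative.

```python
import numpy as np

def log_map(p, q):
    """Log map on the unit sphere: tangent vector at p pointing towards q."""
    cos_t = np.clip(np.dot(p, q), -1.0, 1.0)
    theta = np.arccos(cos_t)
    if theta < 1e-12:
        return np.zeros_like(p)
    u = q - cos_t * p
    return theta * u / np.linalg.norm(u)

def exp_map(p, v):
    """Exp map on the unit sphere: follow the tangent vector v from p."""
    t = np.linalg.norm(v)
    if t < 1e-12:
        return p.copy()
    return np.cos(t) * p + np.sin(t) * v / t

def tangent_lowrank_approx(points, rank):
    """Naive tangent-space low-rank approximation (no curvature correction)."""
    ref = points.mean(axis=0)
    ref = ref / np.linalg.norm(ref)                     # reference point on the sphere
    V = np.stack([log_map(ref, q) for q in points])     # lift data to the tangent space at ref
    U, s, Vt = np.linalg.svd(V, full_matrices=False)
    V_r = (U[:, :rank] * s[:rank]) @ Vt[:rank]          # rank-r approximation in the tangent space
    return np.stack([exp_map(ref, v) for v in V_r])     # map back to the manifold

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
X /= np.linalg.norm(X, axis=1, keepdims=True)           # 50 points on the 3-sphere
X_hat = tangent_lowrank_approx(X, rank=2)
print("max approximation error:", np.abs(X - X_hat).max())
```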

    Building Neural Networks on Matrix Manifolds: A Gyrovector Space Approach

    Matrix manifolds, such as manifolds of Symmetric Positive Definite (SPD) matrices and Grassmann manifolds, appear in many applications. Recently, by applying the theory of gyrogroups and gyrovector spaces, which is a powerful framework for studying hyperbolic geometry, some works have attempted to build principled generalizations of Euclidean neural networks on matrix manifolds. However, due to the lack of many concepts in gyrovector spaces for the considered manifolds, e.g., the inner product and gyroangles, the techniques and mathematical tools provided by these works are still limited compared to those developed for studying hyperbolic geometry. In this paper, we generalize some notions in gyrovector spaces for SPD and Grassmann manifolds, and propose new models and layers for building neural networks on these manifolds. We show the effectiveness of our approach in two applications, namely human action recognition and knowledge graph completion.
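    As a concrete reference for the gyrovector formalism the paper builds on, the snippet below evaluates the classical Möbius gyroaddition on the Poincaré ball, the hyperbolic setting mentioned in the abstract; it does not implement the SPD or Grassmann constructions proposed in the paper, and the function name is illustrative.

```python
import numpy as np

def mobius_add(u, v, c=1.0):
    """Moebius gyroaddition on the Poincare ball of curvature -c (Ungar's formula)."""
    uv = np.dot(u, v)
    u2 = np.dot(u, u)
    v2 = np.dot(v, v)
    num = (1 + 2 * c * uv + c * v2) * u + (1 - c * u2) * v
    den = 1 + 2 * c * uv + c * c * u2 * v2
    return num / den

u = np.array([0.1, 0.2])
v = np.array([-0.3, 0.05])
print(mobius_add(u, v))   # result stays inside the unit ball
print(mobius_add(v, u))   # gyroaddition is not commutative in general
```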

    Flag Aggregator: Scalable Distributed Training under Failures and Augmented Losses using Convex Optimization

    Modern ML applications increasingly rely on complex deep learning models and large datasets. There has been an exponential growth in the amount of computation needed to train the largest models. Therefore, to scale computation and data, these models are inevitably trained in a distributed manner in clusters of nodes, and their updates are aggregated before being applied to the model. However, a distributed setup is prone to Byzantine failures of individual nodes, components, and software. With data augmentation added to these settings, there is a critical need for robust and efficient aggregation systems. We define the quality of workers as reconstruction ratios $\in (0,1]$, and formulate aggregation as a Maximum Likelihood Estimation procedure using Beta densities. We show that the regularized form of the log-likelihood with respect to the subspace can be approximately solved using an iterative least-squares solver, and we provide convergence guarantees using recent convex optimization landscape results. Our empirical findings demonstrate that our approach significantly enhances the robustness of state-of-the-art Byzantine-resilient aggregators. We evaluate our method in a distributed setup with a parameter server, and show simultaneous improvements in communication efficiency and accuracy across various tasks. The code is publicly available at https://github.com/hamidralmasi/FlagAggregator.
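    The sketch below is one simplified reading of "worker quality as a reconstruction ratio in (0, 1]": each worker update is scored by how well a dominant low-dimensional subspace of all updates reconstructs its direction, and the scores are used as aggregation weights. It is not the paper's Beta-density maximum-likelihood formulation or its iterative least-squares solver, and all names are illustrative.

```python
import numpy as np

def reconstruction_ratio_aggregate(updates, k=1):
    """Aggregate worker updates weighted by rank-k subspace reconstruction ratios.

    A simplified illustration only; not the Flag Aggregator's MLE procedure.
    """
    G = np.stack(updates, axis=1)                         # (dim, n_workers)
    D = G / np.linalg.norm(G, axis=0, keepdims=True)      # unit-norm update directions
    U, _, _ = np.linalg.svd(D, full_matrices=False)
    P = U[:, :k]                                          # dominant rank-k subspace
    ratios = np.linalg.norm(P.T @ D, axis=0)              # in (0, 1]: how well each direction is captured
    weights = ratios / ratios.sum()
    return G @ weights, ratios

rng = np.random.default_rng(1)
honest = [np.ones(10) + 0.05 * rng.normal(size=10) for _ in range(8)]
byzantine = [rng.normal(size=10) for _ in range(2)]       # updates pointing in arbitrary directions
agg, ratios = reconstruction_ratio_aggregate(honest + byzantine, k=1)
print(np.round(ratios, 2))                                # honest workers score near 1, the outliers lower
```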

    Contributions in functional data analysis and functional-analytic statistics

    Functional data analysis is the study of statistical algorithms applied when the observed data are a collection of functions. Since this type of data is becoming cheaper and easier to collect, there is an increased need to develop statistical tools to handle it. The first part of this thesis focuses on deriving distances between distributions over function spaces and applying these to two-sample testing, goodness-of-fit testing and sample quality assessment. This presents a wide range of contributions, since currently there exist either very few or no methods at all to tackle these problems for functional data. The second part of this thesis adopts the functional-analytic perspective on two statistical algorithms. This is a perspective where functions are viewed as living in specific function spaces and the toolbox of functional analysis is applied to identify and prove properties of the algorithms. The two algorithms are variational Gaussian processes, used widely throughout machine learning for function modelling with large observation data sets, and functional statistical depth, used widely as a means to evaluate outliers and perform testing for functional data sets. The results presented contribute a taxonomy of the variational Gaussian process methodology and multiple new results in the theory of functional depth, including the open problem of providing a depth which characterises distributions on function spaces.
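    As one elementary instance of a distance between distributions over function spaces, the sketch below computes an unbiased maximum mean discrepancy (MMD) estimate between two samples of functions discretized on a common grid. The Gaussian kernel on the discretized L2 distance and the bandwidth are assumptions for illustration, not the thesis's specific constructions.

```python
import numpy as np

def mmd2_unbiased(X, Y, bandwidth=1.0):
    """Unbiased squared MMD between two samples of functions on a shared grid.

    X, Y: arrays of shape (n_functions, n_grid_points).
    """
    def gram(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).mean(axis=2)   # squared L2 distance on the grid
        return np.exp(-d2 / (2 * bandwidth ** 2))

    Kxx, Kyy, Kxy = gram(X, X), gram(Y, Y), gram(X, Y)
    n, m = len(X), len(Y)
    np.fill_diagonal(Kxx, 0.0)
    np.fill_diagonal(Kyy, 0.0)
    return Kxx.sum() / (n * (n - 1)) + Kyy.sum() / (m * (m - 1)) - 2 * Kxy.mean()

t = np.linspace(0, 1, 100)
rng = np.random.default_rng(0)
X = np.sin(2 * np.pi * t) + 0.1 * rng.normal(size=(40, 100))        # sample from one functional law
Y = np.sin(2 * np.pi * t + 0.3) + 0.1 * rng.normal(size=(40, 100))  # sample from a shifted law
print("MMD^2 estimate:", mmd2_unbiased(X, Y))                       # larger values suggest different laws
```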

    Data analysis with merge trees

    Today’s data are increasingly complex, and classical statistical techniques need increasingly refined mathematical tools to model and investigate them. Paradigmatic situations are represented by data which need to be considered up to some kind of transformation, and by all those circumstances in which the analyst needs to define a general concept of shape. Topological Data Analysis (TDA) is a field which is fundamentally contributing to such challenges by extracting topological information from data with a plethora of interpretable and computationally accessible pipelines. We contribute to this field by developing a series of novel tools, techniques and applications to work with a particular topological summary called the merge tree. To analyze sets of merge trees, we introduce a novel metric structure along with an algorithm to compute it, define a framework to compare different functions defined on merge trees, and investigate the metric space obtained with the aforementioned metric. Different geometric and topological properties of the space of merge trees are established, with the aim of obtaining a deeper understanding of such trees. To showcase the effectiveness of the proposed metric, we develop an application in the field of Functional Data Analysis, working with functions up to homeomorphic reparametrization, and in the field of radiomics, where each patient is represented via a clustering dendrogram.
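    To make the merge tree summary concrete, the sketch below computes the sublevel-set merge tree of a function sampled on a path graph using a union-find sweep over the values. It is unrelated to the thesis's metric and algorithms, and the event encoding is an illustrative choice.

```python
def merge_tree_1d(values):
    """Sublevel-set merge tree of a function sampled on a path graph.

    Returns merge events (height, birth_1, birth_2): at 'height', the branches
    born at the two listed minima merge (the lower birth survives).
    """
    n = len(values)
    parent = list(range(n))
    birth = [None] * n

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    active = [False] * n
    events = []
    for i in sorted(range(n), key=lambda j: values[j]):   # sweep vertices by increasing value
        active[i] = True
        roots = {find(j) for j in (i - 1, i + 1) if 0 <= j < n and active[j]}
        if not roots:
            birth[i] = values[i]                          # local minimum: a new branch is born
        elif len(roots) == 1:
            parent[i] = roots.pop()                       # regular vertex: attach to the existing component
        else:
            r1, r2 = sorted(roots, key=lambda r: birth[r])
            events.append((values[i], birth[r1], birth[r2]))  # two branches merge here
            parent[r2] = r1
            parent[i] = r1
    return events

f = [3.0, 1.0, 2.5, 0.5, 2.0, 1.5, 4.0]
print(merge_tree_1d(f))   # [(2.0, 0.5, 1.5), (2.5, 0.5, 1.0)]
```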

    A Riemannian Optimization Approach to Clustering Problems

    This paper considers optimization problems of the form $\min_{X \in \mathcal{F}_v} f(X) + \lambda \|X\|_1$, where $f$ is smooth, $\mathcal{F}_v = \{X \in \mathbb{R}^{n \times q} : X^T X = I_q, v \in \mathrm{span}(X)\}$, and $v$ is a given positive vector. Clustering models, including but not limited to those used by $k$-means, community detection, and normalized cut, can be reformulated as such optimization problems. It is proven that the domain $\mathcal{F}_v$ forms a compact embedded submanifold of $\mathbb{R}^{n \times q}$, and optimization-related tools, including a family of computationally efficient retractions and an orthonormal basis of any normal space of $\mathcal{F}_v$, are derived. An inexact accelerated Riemannian proximal gradient method that allows an adaptive step size is proposed and its global convergence is established. Numerical experiments on community detection in networks and normalized cut for image segmentation are used to demonstrate the performance of the proposed method.
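    As a reference point for the retraction machinery, the snippet below implements the standard QR-based retraction on the Stiefel manifold $\{X : X^T X = I_q\}$. The paper's feasible set $\mathcal{F}_v$ additionally requires $v \in \mathrm{span}(X)$ and comes with its own family of retractions, which this sketch does not reproduce; all names are illustrative.

```python
import numpy as np

def qr_retraction(X, V):
    """QR-based retraction on the Stiefel manifold: map a tangent perturbation V at X back onto it."""
    Q, R = np.linalg.qr(X + V)
    # Fix the sign ambiguity of QR (positive diagonal of R) so the retraction is well defined.
    Q = Q * np.sign(np.sign(np.diag(R)) + 0.5)
    return Q

rng = np.random.default_rng(0)
n, q = 6, 2
X, _ = np.linalg.qr(rng.normal(size=(n, q)))        # a point on the Stiefel manifold St(6, 2)
A = rng.normal(size=(n, q))
V = A - X @ (X.T @ A + A.T @ X) / 2                 # project A onto the tangent space at X
Y = qr_retraction(X, 0.1 * V)
print(np.allclose(Y.T @ Y, np.eye(q)))              # True: Y has orthonormal columns again
```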