Exploiting Structural Properties in the Analysis of High-dimensional Dynamical Systems
The physical and cyber domains with which we interact are filled with high-dimensional dynamical systems. In machine learning, for instance, the evolution of overparametrized neural networks can be seen as a dynamical system. In networked systems, numerous agents or nodes dynamically interact with each other. A deep understanding of these systems can enable us to predict their behavior, identify potential pitfalls, and devise effective solutions for optimal outcomes. In this dissertation, we will discuss two classes of high-dimensional dynamical systems with specific structural properties that aid in understanding their dynamic behavior.
In the first scenario, we consider the training dynamics of multi-layer neural networks. The high dimensionality comes from overparametrization: a typical network has a large depth and hidden-layer width. We are interested in the following questions regarding convergence: Do the network weights converge to an equilibrium point corresponding to a global minimum of the training loss, and if so, how fast? The key to answering these questions is the symmetry of the weights, a critical property induced by the multi-layer architecture. This symmetry leads to a set of time-invariant quantities, called the weight imbalance, that restrict the training trajectory to a low-dimensional manifold determined by the weight initialization. A tailored convergence analysis is developed over this low-dimensional manifold, yielding improved rate bounds for several multi-layer network models studied in the literature and novel characterizations of the effect of weight imbalance on the convergence rate.
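The conserved quantity is easy to verify numerically. The following minimal sketch (our illustration, for a two-layer linear network, the simplest case, rather than the dissertation's full multi-layer models) checks that the imbalance matrix W1 W1^T - W2^T W2 stays essentially constant along the gradient-descent trajectory; all names and dimensions are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
d, h, n = 5, 8, 100
X = rng.normal(size=(n, d))
Y = X @ rng.normal(size=(d, 1))          # targets from a linear teacher
W1 = rng.normal(size=(h, d)) * 0.1
W2 = rng.normal(size=(1, h)) * 0.1

def imbalance(W1, W2):
    # D = W1 W1^T - W2^T W2 is exactly conserved under gradient flow
    return W1 @ W1.T - W2.T @ W2

D0 = imbalance(W1, W2)
lr = 1e-3
for _ in range(5000):
    R = X @ W1.T @ W2.T - Y              # residuals, shape (n, 1)
    g2 = R.T @ (X @ W1.T) / n            # dL/dW2 for L = ||R||^2 / (2n)
    g1 = W2.T @ R.T @ X / n              # dL/dW1
    W1 -= lr * g1
    W2 -= lr * g2

# discrete steps perturb D only at second order in lr, so the drift
# is small and shrinks with the step size
drift = np.linalg.norm(imbalance(W1, W2) - D0)
print(f"imbalance drift after training: {drift:.2e}")
```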
In the second scenario, we consider large-scale networked systems with multiple weakly-connected groups. Such a multi-cluster structure leads to a time-scale separation between the fast intra-group interactions, driven by the high intra-group connectivity, and the slow inter-group oscillations, driven by the weak inter-group connections. We develop a novel frequency-domain network coherence analysis that captures both the coherent behavior within each group and the dynamical interaction between groups, leading to a structure-preserving model-reduction methodology for large-scale dynamic networks with multiple clusters under general node-dynamics assumptions.
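The time-scale separation this analysis exploits shows up directly in the spectrum of the graph Laplacian: a network with k weakly-connected clusters has k small eigenvalues (slow inter-group modes) well separated from the rest (fast intra-group modes). A minimal sketch, with hypothetical connection probabilities:

```python
import numpy as np

rng = np.random.default_rng(1)
k, m = 3, 20                                    # 3 clusters of 20 nodes each
n = k * m
A = (rng.random((n, n)) < 0.02).astype(float)   # sparse inter-group links
for c in range(k):                              # dense intra-group links
    idx = slice(c * m, (c + 1) * m)
    A[idx, idx] = (rng.random((m, m)) < 0.8).astype(float)
A = np.triu(A, 1); A = A + A.T                  # symmetric, no self-loops
L = np.diag(A.sum(1)) - A                       # graph Laplacian
lam = np.sort(np.linalg.eigvalsh(L))
print("slow (inter-group) eigenvalues:", np.round(lam[:k], 3))
print("fast (intra-group) eigenvalues start at:", round(lam[k], 3))
```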
LIPIcs, Volume 251, ITCS 2023, Complete Volume
Game-theoretic statistics and safe anytime-valid inference
Safe anytime-valid inference (SAVI) provides measures of statistical evidence
and certainty -- e-processes for testing and confidence sequences for
estimation -- that remain valid at all stopping times, accommodating continuous
monitoring and analysis of accumulating data and optional stopping or
continuation for any reason. These measures crucially rely on test martingales,
which are nonnegative martingales starting at one. Since a test martingale is
the wealth process of a player in a betting game, SAVI centrally employs
game-theoretic intuition, language and mathematics. We summarize the SAVI goals
and philosophy, and report recent advances in testing composite hypotheses and
estimating functionals in nonparametric settings.
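As a concrete illustration of a test martingale, consider betting against the null hypothesis that a coin is fair. The sketch below uses a fixed bet fraction for simplicity (practical e-process constructions adapt the bets to the data); the stopping rule it implements is valid at any data-dependent time by Ville's inequality.

```python
import numpy as np

rng = np.random.default_rng(2)

def betting_martingale(xs, bet=0.1):
    """Wealth of a bettor against H0: P(x = 1) = 1/2, starting at 1.

    Each round, stake a fraction `bet` of wealth on heads; the payoff
    multiplier 1 + bet*(2x - 1) has mean 1 under H0, so the wealth
    process is a nonnegative test martingale. bet = 0.1 is the
    Kelly-optimal fraction if the truth is P(x = 1) = 0.55.
    """
    w, path = 1.0, []
    for x in xs:
        w *= 1.0 + bet * (2 * x - 1)
        path.append(w)
    return np.array(path)

xs = (rng.random(2000) < 0.55).astype(int)   # data actually biased to heads
wealth = betting_martingale(xs)
# Ville's inequality: under H0, P(sup_t W_t >= 1/alpha) <= alpha, so we
# may reject at level 0.05 the first time wealth exceeds 20 -- at any
# stopping time, with no correction for continuous monitoring.
crossed = int(np.argmax(wealth >= 20)) if (wealth >= 20).any() else None
print("wealth crosses 20 at round:", crossed)
```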
Adaptive Robotic Information Gathering via Non-Stationary Gaussian Processes
Robotic Information Gathering (RIG) is a foundational research topic that
answers how a robot (team) collects informative data to efficiently build an
accurate model of an unknown target function under robot embodiment
constraints. RIG has many applications, including but not limited to autonomous
exploration and mapping, 3D reconstruction or inspection, search and rescue,
and environmental monitoring. A RIG system relies on a probabilistic model's
prediction uncertainty to identify critical areas for informative data
collection. Gaussian Processes (GPs) with stationary kernels have been widely
adopted for spatial modeling. However, real-world spatial data is typically
non-stationary -- different locations do not have the same degree of
variability. As a result, the prediction uncertainty does not accurately reveal
prediction error, limiting the success of RIG algorithms. We propose a family
of non-stationary kernels named Attentive Kernel (AK), which is simple, robust,
and can extend any existing kernel to a non-stationary one. We evaluate the new
kernel in elevation mapping tasks, where AK provides better accuracy and
uncertainty quantification over the commonly used stationary kernels and the
leading non-stationary kernels. The improved uncertainty quantification guides
the downstream informative planner to collect more valuable data around the
high-error area, further increasing prediction accuracy. A field experiment
demonstrates that the proposed method can guide an Autonomous Surface Vehicle
(ASV) to prioritize data collection in locations with significant spatial
variations, enabling the model to characterize salient environmental features.
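For intuition about what a non-stationary kernel does, the sketch below implements not the paper's Attentive Kernel but the classic Gibbs construction, which extends the stationary RBF kernel by letting the lengthscale vary with the input; the lengthscale field here is a hypothetical stand-in.

```python
import numpy as np

def gibbs_kernel(x1, x2, lengthscale_fn):
    """Gibbs non-stationary RBF: the lengthscale varies with the input.

    Not the Attentive Kernel -- just a classic construction showing
    how a stationary RBF extends to a valid non-stationary kernel.
    """
    l1 = lengthscale_fn(x1)[:, None]          # (n1, 1)
    l2 = lengthscale_fn(x2)[None, :]          # (1, n2)
    s = l1**2 + l2**2
    prefac = np.sqrt(2.0 * l1 * l2 / s)
    d2 = (x1[:, None] - x2[None, :])**2
    return prefac * np.exp(-d2 / s)

# hypothetical lengthscale field: short scales (high variability) near
# x = 0, long scales (smooth terrain) farther away
ls = lambda x: 0.1 + 0.5 * np.abs(x)
X = np.linspace(-2, 2, 200)
K = gibbs_kernel(X, X, ls)
print(K.shape, bool(np.all(np.linalg.eigvalsh(K) > -1e-8)))  # PSD check
```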
Spatio-Temporal Wildfire Prediction using Multi-Modal Data
Due to severe societal and environmental impacts, wildfire prediction using
multi-modal sensing data has become a highly sought-after data-analytical tool
by various stakeholders (such as state governments and power utility companies)
to achieve a more informed understanding of wildfire activities and plan
preventive measures. A desirable algorithm should precisely predict fire risk
and magnitude for a location in real time. In this paper, we develop a flexible
spatio-temporal wildfire prediction framework using multi-modal time series
data. We first predict the wildfire risk (the chance of a wildfire event) in real time, accounting for historical events via discrete mutually exciting point process models. Then we develop a wildfire magnitude prediction
set method based on the flexible distribution-free time-series conformal
prediction (CP) approach. Theoretically, we prove a risk model parameter
recovery guarantee, as well as coverage and set size guarantees for the CP
sets. Through extensive real-data experiments with wildfire data in California,
we demonstrate the effectiveness of our methods, as well as their flexibility
and scalability in large regions.
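The magnitude-prediction step can be illustrated with the basic split-conformal recipe it builds on. The sketch below is the generic i.i.d. version, not the paper's time-series CP variant (which additionally handles temporal dependence); the model and data are hypothetical stand-ins.

```python
import numpy as np

def split_conformal_interval(model, X_cal, y_cal, x_new, alpha=0.1):
    """Distribution-free prediction interval via split conformal.

    Calibrate absolute residuals on held-out data, then inflate the
    point prediction by their (1 - alpha) empirical quantile.
    """
    residuals = np.abs(y_cal - model(X_cal))
    n = len(residuals)
    # finite-sample-valid quantile level: ceil((n + 1)(1 - alpha)) / n
    q = np.quantile(residuals, min(1.0, np.ceil((n + 1) * (1 - alpha)) / n))
    pred = model(x_new)
    return pred - q, pred + q

# toy usage with a hypothetical fitted magnitude model
model = lambda X: 2.0 * X[..., 0]                 # stand-in predictor
rng = np.random.default_rng(3)
X_cal = rng.random((500, 1))
y_cal = 2.0 * X_cal[:, 0] + rng.normal(0, 0.3, 500)
lo, hi = split_conformal_interval(model, X_cal, y_cal, np.array([[0.7]]))
print(f"90% prediction interval: [{lo[0]:.2f}, {hi[0]:.2f}]")
```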
When Deep Learning Meets Polyhedral Theory: A Survey
In the past decade, deep learning became the prevalent methodology for
predictive modeling thanks to the remarkable accuracy of deep neural networks
in tasks such as computer vision and natural language processing. Meanwhile,
the structure of neural networks converged back to simpler representations
based on piecewise constant and piecewise linear functions such as the
Rectified Linear Unit (ReLU), which became the most commonly used type of
activation function in neural networks. That made certain types of network
structure, such as the typical fully-connected feedforward
neural network, amenable to analysis through polyhedral theory
and to the application of methodologies such as Linear Programming (LP) and
Mixed-Integer Linear Programming (MILP) for a variety of purposes. In this
paper, we survey the main topics emerging from this fast-paced area of work,
which bring a fresh perspective to understanding neural networks in more detail
as well as to applying linear optimization techniques to train, verify, and
reduce the size of such networks.
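The basic object enabling MILP-based analysis is the mixed-integer encoding of a single ReLU. One standard big-M formulation (assuming known finite pre-activation bounds L < 0 < U, which such methods typically obtain by bound propagation) is:

```latex
% Big-M MILP encoding of y = max(0, w^T x + b), given bounds
% L <= w^T x + b <= U with L < 0 < U; z is a binary phase indicator.
\begin{align*}
  y &\ge w^\top x + b, \\
  y &\le w^\top x + b - L(1 - z), \\
  y &\le U z, \\
  y &\ge 0, \qquad z \in \{0, 1\}.
\end{align*}
```

When z = 1 the constraints force y = w^T x + b (the active phase), and when z = 0 they force y = 0 (the inactive phase); composing this encoding over all neurons yields an exact MILP representation of the network's input-output map, which underlies the verification and compression applications the survey covers.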
AI: Limits and Prospects of Artificial Intelligence
The emergence of artificial intelligence has triggered enthusiasm and promise of boundless opportunities as much as uncertainty about its limits. The contributions to this volume explore the limits of AI, describe the necessary conditions for its functionality, reveal its attendant technical and social problems, and present some existing and potential solutions. At the same time, the contributors highlight the societal and attending economic hopes and fears, utopias and dystopias that are associated with the current and future development of artificial intelligence.
Adaptive novelty detection with false discovery rate guarantee
This paper studies the semi-supervised novelty detection problem where a set
of "typical" measurements is available to the researcher. Motivated by recent
advances in multiple testing and conformal inference, we propose AdaDetect, a
flexible method that is able to wrap around any probabilistic classification
algorithm and control the false discovery rate (FDR) on detected novelties in
finite samples without any distributional assumption other than
exchangeability. In contrast to classical FDR-controlling procedures that are
often committed to a pre-specified p-value function, AdaDetect learns the
transformation in a data-adaptive manner to focus the power on the directions
that distinguish between inliers and outliers. Inspired by the multiple testing
literature, we further propose variants of AdaDetect that are adaptive to the
proportion of nulls while maintaining the finite-sample FDR control. The
methods are illustrated on synthetic datasets and real-world datasets,
including an application in astrophysics.
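The mechanics can be sketched with a fixed score function: convert scores into conformal p-values using the held-out typical sample, then apply Benjamini-Hochberg. (AdaDetect's contribution is learning the score data-adaptively; the version below, with hypothetical Gaussian scores, fixes it for brevity.)

```python
import numpy as np

def conformal_pvalues(scores_test, scores_null):
    """Empirical p-values of test scores against held-out null scores.

    Valid under exchangeability alone; higher score = more anomalous.
    """
    m = len(scores_null)
    # p_i = (1 + #{null scores >= test score_i}) / (m + 1)
    return (1 + (scores_null[None, :] >= scores_test[:, None]).sum(1)) / (m + 1)

def benjamini_hochberg(pvals, alpha=0.1):
    """Return indices rejected by the BH procedure at level alpha."""
    n = len(pvals)
    order = np.argsort(pvals)
    below = pvals[order] <= alpha * np.arange(1, n + 1) / n
    k = np.nonzero(below)[0].max() + 1 if below.any() else 0
    return order[:k]

rng = np.random.default_rng(5)
scores_null = rng.normal(0, 1, 1000)            # scores of "typical" data
scores_test = np.r_[rng.normal(0, 1, 90),       # 90 inliers ...
                    rng.normal(3, 1, 10)]       # ... and 10 novelties
pv = conformal_pvalues(scores_test, scores_null)
print("detected novelties:", benjamini_hochberg(pv))
```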
On the View-and-Channel Aggregation Gain in Integrated Sensing and Edge AI
Sensing and edge artificial intelligence (AI) are two key features of the
sixth-generation (6G) mobile networks. Their natural integration, termed
integrated sensing and edge AI (ISEA), is envisioned to automate wide-ranging
Internet-of-Things (IoT) applications. To achieve a high sensing accuracy,
multi-view features are uploaded to an edge server for aggregation and
inference using an AI model. The view aggregation is realized efficiently using
over-the-air computing (AirComp), which also aggregates channels to suppress
channel noise. At its nascent stage, ISEA still lacks a characterization of the
fundamental performance gains from view-and-channel aggregation, which
motivates this work. Our framework leverages a well-established distribution
model of multi-view sensing data in which the classic Gaussian-mixture model is
modified by adding subspace matrices to represent individual sensor
observation perspectives. Based on this model, we study the end-to-end (E2E)
sensing (inference) uncertainty, a popular measure of inference accuracy, of
the ISEA system via a novel approach that involves designing a scaling-tight
uncertainty surrogate function, a global discriminant gain, the distribution of
the receive signal-to-noise ratio (SNR), and a channel-induced discriminant loss. We prove
that the E2E sensing uncertainty diminishes at an exponential rate as the
number of views/sensors grows, where the rate is proportional to global
discriminant gain. Given channel distortion, we further show that the
exponential scaling remains, with a reduced decay rate related to the
channel-induced discriminant loss. Furthermore, we benchmark AirComp against equally
fast, traditional analog orthogonal access, which reveals a sensing-accuracy
crossing point between the schemes, leading to the proposal of adaptive
access-mode switching. Last, the insights from our framework are validated by
experiments using a real-world dataset.
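The exponential decay of sensing uncertainty in the number of views can be previewed with a drastically simplified Monte Carlo experiment (our illustration, not the paper's subspace model): averaging V independent noisy views of a binary class signal raises the effective SNR linearly in V, so the error probability Q(sqrt(V) * mu / sigma) falls at an exponential rate.

```python
import numpy as np
from scipy.stats import norm

# Two classes with means +/- mu observed by V sensors in i.i.d. noise;
# the fused decision averages the views and takes the sign, so the
# error is Phi(-sqrt(V) * mu / sigma) -- exponentially small in V.
rng = np.random.default_rng(6)
mu, sigma, trials = 1.0, 1.0, 200_000
for V in (1, 2, 4, 8):
    views = mu + sigma * rng.normal(size=(trials, V))   # class "+" samples
    err_mc = np.mean(views.mean(axis=1) < 0)            # decide by sign
    err_th = norm.cdf(-np.sqrt(V) * mu / sigma)         # Gaussian Q-function
    print(f"V={V}  error  mc={err_mc:.5f}  theory={err_th:.5f}")
```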