Ergodic Randomized Algorithms and Dynamics over Networks
Algorithms and dynamics over networks often involve randomization, and
randomization may result in oscillating dynamics which fail to converge in a
deterministic sense. In this paper, we observe this undesired feature in three
applications, in which the dynamics is the randomized asynchronous counterpart
of a well-behaved synchronous one. These three applications are network
localization, PageRank computation, and opinion dynamics. Motivated by their
formal similarity, we show the following general fact, under the assumptions of
independence across time and linearity of the updates: if the expected
dynamics is stable and converges to the same limit as the original synchronous
dynamics, then the oscillations are ergodic and the desired limit can be
locally recovered via time-averaging.
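As a toy illustration of this ergodicity phenomenon (a sketch with invented scalar maps, gains, and sample size, not the paper's setting), consider a randomized affine iteration whose individual trajectories keep oscillating, yet whose expected map is a stable contraction; the running time-average then recovers the fixed point of the expected dynamics.

```python
import random

# Two affine maps applied at random with equal probability.
# f1 alone is expanding (gain 1.2), so raw iterates oscillate and never
# settle; the *expected* map x -> 0.7*x + 1 is a contraction with fixed
# point 1 / (1 - 0.7) = 10/3.
def f1(x):
    return 1.2 * x + 1.0

def f2(x):
    return 0.2 * x + 1.0

def ergodic_average(steps, seed=0):
    rng = random.Random(seed)
    x, total = 0.0, 0.0
    for _ in range(steps):
        x = f1(x) if rng.random() < 0.5 else f2(x)
        total += x
    return total / steps  # Cesaro (time) average of the trajectory

time_avg = ergodic_average(200_000)
```

The raw sequence never converges in a deterministic sense, but `time_avg` settles near 10/3, the limit of the averaged dynamics, mirroring the local time-averaging recovery described in the abstract.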
A Stochastic System Model for PageRank: Parameter Estimation and Adaptive Control
A key feature of modern web search engines is the ability to display relevant and reputable pages near the top of the list of query results. The PageRank algorithm provides one way of achieving such a useful hierarchical indexing by assigning a measure of relative importance, called the PageRank value, to each webpage. PageRank is motivated by the inherently hypertextual structure of the World Wide Web; specifically, the idea that pages with more incoming hyperlinks should be considered more popular and that popular pages should rank highly in search results, all other factors being equal. We begin by overviewing the original PageRank algorithm and discussing subsequent developments in the mathematical theory of PageRank. We focus on important contributions to improving the quality of rankings via topic-dependent or "personalized" PageRank, as well as techniques for improving the efficiency of PageRank computation based on Monte Carlo methods, extrapolation and adaptive methods, and aggregation methods.

We next present a model for PageRank whose dynamics are described by a controlled stochastic system that depends on an unknown parameter. The fact that the value of the parameter is unknown implies that the system is unknown. We establish strong consistency of a least squares estimator for the parameter. Furthermore, motivated by recent work on distributed randomized methods for PageRank computation, we show that the least squares estimator remains strongly consistent within a distributed framework. Finally, we consider the problem of controlling the stochastic system model for PageRank. Under various cost criteria, we use the least squares estimates of the unknown parameter to iteratively construct an adaptive control policy whose performance, according to the long-run average cost, is equivalent to that of the optimal stationary control that would be used if we had knowledge of the true value of the parameter.
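The original algorithm overviewed above fits in a few lines. The following minimal power-iteration sketch (the 4-page toy web and parameter choices are invented for illustration) computes PageRank values as the stationary distribution of the "random surfer" chain:

```python
def pagerank(links, d=0.85, tol=1e-10):
    """Power iteration for PageRank.

    links: dict mapping each page to the list of pages it links to.
    d:     damping factor (probability of following a hyperlink).
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    while True:
        new = {p: (1.0 - d) / n for p in pages}  # teleportation mass
        for p, outs in links.items():
            if outs:  # distribute rank over outgoing links
                share = d * rank[p] / len(outs)
                for q in outs:
                    new[q] += share
            else:     # dangling page: spread its rank uniformly
                for q in pages:
                    new[q] += d * rank[p] / n
        if sum(abs(new[p] - rank[p]) for p in pages) < tol:
            return new
        rank = new

# Toy 4-page web: page C has the most incoming links.
web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
pr = pagerank(web)
```

Consistent with the intuition in the abstract, the page with the most incoming hyperlinks (C) receives the highest PageRank value, and the ranks sum to one.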
This research lays a foundation for future work in a number of areas, including testing the estimation and control procedures on real data or larger-scale simulation models, considering more general parameter estimation methods such as weighted least squares, and introducing other types of control policies.
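The flavor of the least-squares consistency result described above can be seen in a one-dimensional sketch (the scalar system and all numbers below are made up for illustration and are not the dissertation's PageRank model): estimate the unknown gain θ of a stable stochastic system x_{k+1} = θ x_k + w_k from a single observed trajectory.

```python
import random

def simulate_and_estimate(theta, steps, seed=0):
    """Simulate x_{k+1} = theta * x_k + w_k with w_k ~ N(0, 1), and return
    the least squares estimate  theta_hat = sum(x_k x_{k+1}) / sum(x_k^2)."""
    rng = random.Random(seed)
    x = 0.0
    num = den = 0.0
    for _ in range(steps):
        x_next = theta * x + rng.gauss(0.0, 1.0)
        num += x * x_next   # accumulate cross term
        den += x * x        # accumulate regressor energy
        x = x_next
    return num / den

theta_hat = simulate_and_estimate(0.85, 50_000)
```

As the trajectory length grows, `theta_hat` converges to the true θ almost surely; this is the strong consistency property that the dissertation establishes for its richer, controlled PageRank model.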
A Comprehensive Bibliometric Analysis on Social Network Anonymization: Current Approaches and Future Directions
In recent decades, social network anonymization has become a crucial research
field due to its pivotal role in preserving users' privacy. However, the high
diversity of approaches introduced in relevant studies poses a challenge to
gaining a profound understanding of the field. In response to this, the current
study presents an exhaustive and well-structured bibliometric analysis of the
social network anonymization field. To begin our research, related studies from
the period 2007-2022 were collected from the Scopus database and then
pre-processed. Following this, VOSviewer was used to visualize the network
of authors' keywords. Subsequently, extensive statistical and network analyses
were performed to identify the most prominent keywords and trending topics.
Additionally, the application of co-word analysis through SciMAT and the
Alluvial diagram allowed us to explore the themes of social network
anonymization and scrutinize their evolution over time. These analyses
culminated in an innovative taxonomy of the existing approaches and
anticipation of potential trends in this domain. To the best of our knowledge,
this is the first bibliometric analysis in the social network anonymization
field, which offers a deeper understanding of the current state and an
insightful roadmap for future research in this domain.
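At its core, the co-word analysis used above counts how often pairs of author keywords co-occur across papers; a minimal sketch (the keyword lists are invented for illustration and are not the study's Scopus data):

```python
from collections import Counter
from itertools import combinations

# Each paper contributes one set of author keywords (toy data).
papers = [
    {"anonymization", "privacy", "social network"},
    {"privacy", "k-anonymity", "social network"},
    {"graph", "anonymization", "privacy"},
]

# Co-word network: edge weight = number of papers in which the
# two keywords appear together.
cooc = Counter()
for kws in papers:
    for a, b in combinations(sorted(kws), 2):
        cooc[(a, b)] += 1

strongest = cooc.most_common(1)[0]  # most frequently co-occurring pair
```

Tools such as VOSviewer and SciMAT build on exactly this kind of weighted co-occurrence network, then cluster and visualize it to reveal themes and their evolution.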
Resiliency in numerical algorithm design for extreme scale simulations
This work is based on the seminar titled ‘Resiliency in Numerical Algorithm Design for Extreme Scale Simulations’ held March 1–6, 2020, at Schloss Dagstuhl, which was attended by all the authors. Advanced supercomputing is characterized by very high computation speeds at the cost of an enormous amount of resources and costs. A typical large-scale computation running for 48 h on a system consuming 20 MW, as predicted for exascale systems, would consume a million kWh, corresponding to about 100k Euro in energy cost for executing 10^23 floating-point operations. It is clearly unacceptable to lose the whole computation if any of the several million parallel processes fails during the execution. Moreover, if a single operation suffers from a bit-flip error, should the whole computation be declared invalid? What about the notion of reproducibility itself: should this core paradigm of science be revised and refined for results that are obtained by large-scale simulation?

Naive versions of conventional resilience techniques will not scale to the exascale regime: with a main memory footprint of tens of Petabytes, synchronously writing checkpoint data all the way to background storage at frequent intervals will create intolerable overheads in runtime and energy consumption. Forecasts show that the mean time between failures could be lower than the time to recover from such a checkpoint, so that large calculations at scale might not make any progress if robust alternatives are not investigated. More advanced resilience techniques must be devised. The key may lie in exploiting both advanced system features and specific application knowledge. Research will face two essential questions: (1) what are the reliability requirements for a particular computation, and (2) how do we best design the algorithms and software to meet these requirements?
While the analysis of use cases can help understand the particular reliability requirements, the construction of remedies is currently wide open. One avenue would be to refine and improve on system- or application-level checkpointing and rollback strategies in case an error is detected. Developers might use fault notification interfaces and flexible runtime systems to respond to node failures in an application-dependent fashion. Novel numerical algorithms or more stochastic computational approaches may be required to meet accuracy requirements in the face of undetectable soft errors. These ideas constituted an essential topic of the seminar. The goal of this Dagstuhl Seminar was to bring together a diverse group of scientists with expertise in exascale computing to discuss novel ways to make applications resilient against detected and undetected faults. In particular, participants explored the role that algorithms and applications play in the holistic approach needed to tackle this challenge. This article gathers a broad range of perspectives on the role of algorithms, applications and systems in achieving resilience for extreme scale simulations. The ultimate goal is to spark novel ideas and encourage the development of concrete solutions for achieving such resilience holistically.

Peer reviewed. Article signed by 36 authors: Emmanuel Agullo, Mirco Altenbernd, Hartwig Anzt, Leonardo Bautista-Gomez, Tommaso Benacchio, Luca Bonaventura, Hans-Joachim Bungartz, Sanjay Chatterjee, Florina M. Ciorba, Nathan DeBardeleben, Daniel Drzisga, Sebastian Eibl, Christian Engelmann, Wilfried N. Gansterer, Luc Giraud, Dominik Göddeke, Marco Heisig, Fabienne Jezequel, Nils Kohl, Xiaoye Sherry Li, Romain Lion, Miriam Mehl, Paul Mycek, Michael Obersteiner, Enrique S. Quintana-Ortí, Francesco Rizzi, Ulrich Rüde, Martin Schulz, Fred Fung, Robert Speck, Linda Stals, Keita Teranishi, Samuel Thibault, Dominik Thönnes, Andreas Wagner and Barbara Wohlmuth. Postprint (author's final draft).
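The checkpoint-overhead trade-off described above is often quantified with Young's first-order approximation for the optimal checkpoint interval, τ ≈ sqrt(2 δ M), where δ is the time to write one checkpoint and M is the mean time between failures. A quick sketch (the δ and M values below are hypothetical, chosen only to make the arithmetic concrete):

```python
import math

def young_interval(checkpoint_cost_s, mtbf_s):
    """Young's approximation for the optimal time between checkpoints."""
    return math.sqrt(2.0 * checkpoint_cost_s * mtbf_s)

# Hypothetical numbers: 5 minutes to write one checkpoint,
# 6 hours mean time between failures.
tau = young_interval(300.0, 6 * 3600.0)   # optimal interval in seconds

# Fraction of runtime spent writing checkpoints at this interval:
overhead = 300.0 / tau
```

With these numbers the sweet spot is one checkpoint per hour at roughly 8% runtime overhead; if M shrinks toward the recovery time itself, as the forecasts above warn, no interval allows forward progress, which is exactly why more advanced resilience techniques are needed.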
Complex Quantum Networks: a Topical Review
These are exciting times for quantum physics as new quantum technologies are
expected to soon transform computing at an unprecedented level. Simultaneously,
network science is flourishing, providing an ideal mathematical and computational
framework to capture the complexity of large interacting systems. Here we
provide a comprehensive and timely review of the rising field of complex
quantum networks. On one side, this subject is key to harnessing the potential of
complex networks in order to provide design principles that boost and enhance
quantum algorithms and quantum technologies. On the other side, this subject can
provide a new generation of quantum algorithms to infer significant complex
network properties. The field features fundamental research questions as
diverse as designing networks to shape Hamiltonians and their corresponding
phase diagram, taming the complexity of many-body quantum systems with network
theory, revealing how quantum physics and quantum algorithms can predict novel
network properties and phase transitions, and studying the interplay between
architecture, topology and performance in quantum communication networks. Our
review covers all of these multifaceted aspects in a self-contained
presentation aimed both at network-curious quantum physicists and at
quantum-curious network theorists. We provide a framework that unifies the
field of quantum complex networks along four main research lines:
network-generalized, quantum-applied, quantum-generalized and quantum-enhanced.
Finally we draw attention to the connections between these research lines,
which can lead to new opportunities and new discoveries at the interface
between quantum physics and network science.
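One concrete object at this interface is the continuous-time quantum walk, in which a network's adjacency matrix serves directly as the Hamiltonian. A minimal sketch (the 4-node cycle graph is chosen purely for illustration) evolves a walker under U(t) = exp(-iHt) and checks that probability is conserved:

```python
import numpy as np

# Adjacency matrix of a 4-node cycle graph, used as the Hamiltonian H.
H = np.array([
    [0, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 0, 1, 0],
], dtype=float)

def evolve(psi0, t):
    """Continuous-time quantum walk: psi(t) = exp(-i H t) psi(0),
    computed via the eigendecomposition of the symmetric H."""
    w, v = np.linalg.eigh(H)
    U = v @ np.diag(np.exp(-1j * w * t)) @ v.conj().T
    return U @ psi0

psi0 = np.array([1, 0, 0, 0], dtype=complex)  # walker starts on node 0
psi = evolve(psi0, t=1.0)
probs = np.abs(psi) ** 2                      # occupation probabilities
```

The unitary evolution preserves total probability, and the cycle's mirror symmetry makes nodes 1 and 3 equally likely, a small example of network topology shaping quantum dynamics.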
Domain-Specific Optimization For Machine Learning System
The machine learning (ML) system has been an indispensable part of the ML ecosystem in recent years. The rapid growth of ML brings new system challenges, such as the need to handle larger-scale data and computation, requirements for higher execution performance, and lower resource usage, stimulating the demand for improved ML systems. General-purpose system optimization is widely used but brings limited benefits, because ML applications vary in execution behavior depending on their algorithms, input data, and configurations. It is difficult to perform comprehensive ML system optimizations without application-specific information. Therefore, domain-specific optimization, a method that optimizes particular types of ML applications based on their unique characteristics, is necessary for advanced ML systems.

This dissertation performs domain-specific system optimizations for three important classes of ML applications: graph-based, SGD-based, and Python-based applications. For SGD-based applications, this dissertation proposes a lossy compression scheme for constructing application checkpoints, called LC-Checkpoint. LC-Checkpoint aims to simultaneously maximize the compression rate of checkpoints and reduce the recovery cost of SGD-based training processes. Extensive experiments show that LC-Checkpoint achieves a high compression rate with a lower recovery cost than a state-of-the-art algorithm. For kernel regression applications, this dissertation designs and implements parallel software that targets million-scale datasets. The software is evaluated on two million-scale downstream applications (an equity return forecasting problem on a US stock dataset, and an image classification problem on the ImageNet dataset) to demonstrate its efficacy and efficiency.
For graph-based applications, this dissertation introduces ATMem, a runtime framework that optimizes application data placement on heterogeneous memory systems. ATMem aims to maximize utilization of the small-capacity fast memory by placing on it only the critical data regions that yield the highest performance gains. Experimental results show that ATMem achieves significant speedup over a baseline that places all data in large-capacity slow memory, while placing only a small portion of the data in fast memory. A future research direction is to adapt ML algorithms to software systems and architectures, binding the design of ML algorithms tightly to the implementation of ML systems, in order to achieve optimal solutions for ML applications.
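The general idea behind lossy checkpoint compression of the kind LC-Checkpoint represents can be illustrated with a much simpler scheme (this top-k delta sparsification, and every name and parameter in it, is invented for illustration and is not the dissertation's algorithm): between consecutive checkpoints, store only the largest-magnitude weight deltas.

```python
import numpy as np

def compress_delta(prev, curr, keep_ratio=0.1):
    """Keep only the largest-magnitude entries of (curr - prev)."""
    delta = curr - prev
    k = max(1, int(keep_ratio * delta.size))
    idx = np.argsort(np.abs(delta))[-k:]   # indices of the top-k deltas
    return idx, delta[idx]                 # sparse checkpoint delta

def restore(prev, idx, vals):
    """Approximately rebuild the current checkpoint from the sparse delta."""
    approx = prev.copy()
    approx[idx] += vals
    return approx

rng = np.random.default_rng(0)
w_prev = rng.normal(size=1000)
w_curr = w_prev + rng.normal(scale=0.01, size=1000)  # small SGD step

idx, vals = compress_delta(w_prev, w_curr)
w_restored = restore(w_prev, idx, vals)
```

Recovery then resumes training from an approximate checkpoint whose error is bounded by the discarded small deltas; the research challenge, as the abstract notes, is balancing this compression rate against the extra recovery cost it induces.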
Open Problems in (Hyper)Graph Decomposition
Large networks are useful in a wide range of applications. Sometimes problem
instances are composed of billions of entities. Decomposing and analyzing these
structures helps us gain new insights about our surroundings. Even if the final
application concerns a different problem (such as traversal, finding paths,
trees, and flows), decomposing large graphs is often an important subproblem
for complexity reduction or parallelization. This report is a summary of
discussions that happened at Dagstuhl seminar 23331 on "Recent Trends in Graph
Decomposition" and presents currently open problems and future directions in
the area of (hyper)graph decomposition.
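As a tiny illustration of the decomposition subproblem itself (the 6-cycle graph and the greedy BFS-growing heuristic are chosen purely for this example; real partitioners are far more sophisticated), a 2-way partition can be grown from a seed vertex until one block holds half the vertices:

```python
from collections import deque

def bfs_bisect(adj, seed):
    """Grow one block from `seed` by BFS until it holds half the
    vertices; the remaining vertices form the second block."""
    target = len(adj) // 2
    block = {seed}
    queue = deque([seed])
    while queue and len(block) < target:
        u = queue.popleft()
        for v in adj[u]:
            if v not in block and len(block) < target:
                block.add(v)
                queue.append(v)
    return block, set(adj) - block

def cut_size(adj, block):
    """Number of edges crossing the partition."""
    return sum(1 for u in block for v in adj[u] if v not in block)

# 6-cycle: 0-1-2-3-4-5-0
adj = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
left, right = bfs_bisect(adj, seed=0)
```

On the cycle this heuristic happens to find an optimal cut of two edges; on irregular billion-entity instances, finding balanced blocks with small cuts is exactly the hard problem the seminar's open questions revolve around.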