91 research outputs found
Regular decomposition of large graphs and other structures: scalability and robustness towards missing data
A method for compression of large graphs and matrices to a block structure is
further developed. Szemer\'edi's regularity lemma is used as a generic
motivation of the significance of stochastic block models. Another ingredient
of the method is Rissanen's minimum description length principle (MDL). We
continue our previous work on the subject, considering cases of missing data
and the scaling of algorithms to extremely large graphs. In this way it
would be possible to find the large-scale structure of huge graphs of a
certain type using only a tiny part of the graph information, and to obtain a
compact representation of such graphs that is useful in computations and
visualization.
Comment: Accepted for publication in: Fourth International Workshop on High
Performance Big Graph Data Management, Analysis, and Mining, December 11,
2017, Boston, U.S.
On the stability of two-chunk file-sharing systems
We consider five different peer-to-peer file sharing systems with two chunks,
with the aim of finding chunk selection algorithms that have provably stable
performance with any input rate and assuming non-altruistic peers who leave the
system immediately after downloading the second chunk. We show that many
algorithms that first looked promising lead to unstable or oscillating
behavior. However, we end up with a system with desirable properties. Most of
our rigorous results concern the corresponding deterministic large system
limits, but in the two simplest cases we also provide proofs for the
stochastic systems.
Comment: 19 pages, 7 figures
Regular Decomposition of Large Graphs: Foundation of a Sampling Approach to Stochastic Block Model Fitting
We analyze the performance of regular decomposition, a method for compression of large and dense graphs. This method is inspired by Szemerédi’s regularity lemma (SRL), a generic structural result on large and dense graphs. In our method, a stochastic block model (SBM) is used as the model in maximum likelihood fitting to find a regular structure similar to the one predicted by SRL. Another ingredient of our method is Rissanen’s minimum description length principle (MDL). We consider scaling the algorithms to extremely large graphs by sampling a small subgraph. We continue our previous work on the subject by proving some claims that were previously found experimentally. Our theoretical setting does not assume that the graph is generated from an SBM. The task is to find an SBM that is optimal for modeling the given graph in the sense of MDL. This setting matches real-life situations in which no random generative model is appropriate. Our aim is to show that regular decomposition is a viable and robust method for large graphs emerging, say, in the Big Data area.
Peer reviewed
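As a rough illustration of the MDL idea in this abstract, the following minimal sketch scores a given block partition of a small undirected graph by a two-part code length in bits. It is not the authors' algorithm: the function names, the simplified coding scheme (block labels, plus one edge count per block pair, plus Bernoulli entropy for the edges), and the toy graph are all assumptions made for illustration only.

```python
import math
from itertools import combinations


def binary_entropy(p):
    """Entropy in bits of a Bernoulli(p) variable; 0 at the endpoints."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def description_length(edges, labels, k):
    """Simplified two-part code length (bits) of a simple undirected graph.

    edges  : list of (u, v) node pairs with u != v
    labels : labels[i] in range(k) is the block of node i
    k      : number of blocks
    The coding scheme is an illustrative assumption, not the paper's.
    """
    n = len(labels)
    sizes = [0] * k
    for block in labels:
        sizes[block] += 1

    # Count edges for every unordered pair of blocks.
    edge_count = {}
    for u, v in edges:
        a, b = sorted((labels[u], labels[v]))
        edge_count[(a, b)] = edge_count.get((a, b), 0) + 1

    # Model part: one block label per node.
    bits = n * math.log2(k) if k > 1 else 0.0

    for a in range(k):
        for b in range(a, k):
            if a == b:
                pairs = sizes[a] * (sizes[a] - 1) // 2
            else:
                pairs = sizes[a] * sizes[b]
            if pairs == 0:
                continue
            count = edge_count.get((a, b), 0)
            # Model part: the edge count of this block pair, in 0..pairs.
            bits += math.log2(pairs + 1)
            # Data part: the edges given the block density.
            bits += pairs * binary_entropy(count / pairs)
    return bits


# Toy graph (hypothetical): two disjoint 4-node cliques.
edges = list(combinations(range(4), 2)) + list(combinations(range(4, 8), 2))

one_block = description_length(edges, [0] * 8, 1)
two_blocks = description_length(edges, [0, 0, 0, 0, 1, 1, 1, 1], 2)
print(f"one block: {one_block:.2f} bits, two blocks: {two_blocks:.2f} bits")
```

On this toy graph the two-block partition yields the shorter code, which is the sense in which MDL prefers one block model over another. A full method would also search over partitions and over the number of blocks, which this sketch does not attempt.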
Record linkage of population-based cohort data from minors with national register data : A scoping review and comparative legal analysis of four European countries
Funding Information: We would like to acknowledge Evert-Ben van Veen from the MLC Foundation, Dagelijkse Groenmarkt 2, 2513 AL Den Haag, the Netherlands. The country-specific results on the Netherlands were based on his contribution.
Publisher Copyright: © 2021 Doetsch JN et al.
Background: The GDPR was implemented to build an overarching framework for personal data protection across the EU/EEA. Linkage of data directly collected from cohort participants, potentially serving as a prominent tool for health research, must respect data protection rules and privacy rights. Our objective was to investigate the legal possibilities for linking cohort data of minors with routinely collected education and health data, comparing EU/EEA member states.
Methods: A comparative legal analysis and scoping review were conducted of openly accessible published laws and regulations in EUR-Lex and national law databases on the GDPR's implementation in Portugal, Finland, Norway, and the Netherlands, and on the associated national regulations concerning record linkage for health research that had been implemented up to April 30, 2021.
Results: The GDPR does not ensure total uniformity in data protection legislation across member states, as it offers flexibility for national legislation. Exceptions permitting the processing of personal data, e.g., public interest and scientific research, must be laid down in EU/EEA or national law. Differences in national interpretation caused obstacles in cross-national research and record linkage: Portugal requires written consent and ethical approval; Finland allows linkage mostly without consent through the national Social and Health Data Permit Authority; Norway allows linkage when it is based on a regional ethics committee's approval and adequate information technology safeguarding confidentiality; the Netherlands mainly bases linkage on the opt-out system and a Data Protection Impact Assessment.
Conclusions: Though the GDPR is the most important legal framework, the execution of national legislation matters most when linking cohort data with routinely collected health and education data. As national interpretation varies, legal intervention balancing the individual right to informational self-determination and the public good is urgently needed for health research. More harmonization across the EU/EEA could be helpful but should not be detrimental in those member states which have already opened leeway for registries and research for the public good without explicit consent.
Peer reviewed
- …