1,174 research outputs found
Signal Confidence Limits from a Neural Network Data Analysis
This paper deals with a situation of some importance for the analysis of
experimental data via Neural Network (NN) or similar devices: Let $N$ data be
given, such that $N = N_s + N_b$, where $N_s$ is the number of signals, $N_b$ the
number of background events, both unknown. Assume that a NN has been trained,
such that it will tag signals with efficiency $F_s$ ($0 < F_s < 1$), and background
data with $F_b$ ($0 < F_b < 1$). Applying the NN yields $N_Y$ tagged events. We
demonstrate that the knowledge of $N_Y$ is sufficient to calculate confidence
bounds for the signal likelihood, which have the same statistical
interpretation as the Clopper-Pearson bounds for the well-studied case of
direct signal observation. Subsequently, we discuss rigorous bounds for the
a-posteriori distribution function of the signal probability, as well as for
the (closely related) likelihood that there are $N_s$ signals in the data. We
compare them with results obtained by starting off with a maximum entropy type
assumption for the a-priori likelihood that there are $N_s$ signals in the data
and applying Bayes' theorem. Difficulties are encountered with the latter
method.
Comment: 17 pages, 10 eps figures, LaTeX, major revisions due to referee
report
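The Clopper-Pearson construction mentioned in the abstract above can be illustrated with a minimal sketch. It assumes (this is an illustrative simplification, not necessarily the paper's derivation) that each event is independently a signal with probability p, so the tagged count is binomial with success probability q = f_s * p + f_b * (1 - p), and that f_s > f_b. The function names and the parameter names `f_s`, `f_b` are hypothetical.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _solve(f, target, increasing):
    """Bisection for f(p) = target on [0, 1], where f is monotone in p."""
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if (f(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k, n, alpha=0.05):
    """Two-sided Clopper-Pearson interval for a binomial proportion."""
    # Lower bound: the p at which P(X >= k) equals alpha / 2.
    lower = 0.0 if k == 0 else _solve(
        lambda p: 1 - binom_cdf(k - 1, n, p), alpha / 2, increasing=True)
    # Upper bound: the p at which P(X <= k) equals alpha / 2.
    upper = 1.0 if k == n else _solve(
        lambda p: binom_cdf(k, n, p), alpha / 2, increasing=False)
    return lower, upper


def signal_bounds(n_tagged, n_total, f_s, f_b, alpha=0.05):
    """Map bounds on the tagging probability q to bounds on the signal
    probability p, using the ASSUMED model q = f_s * p + f_b * (1 - p)."""
    if not f_s > f_b:
        raise ValueError("this sketch assumes f_s > f_b")
    q_lo, q_hi = clopper_pearson(n_tagged, n_total, alpha)

    def to_p(q):
        # Invert the linear relation and clip to the valid range [0, 1].
        return min(1.0, max(0.0, (q - f_b) / (f_s - f_b)))

    return to_p(q_lo), to_p(q_hi)


if __name__ == "__main__":
    print(signal_bounds(n_tagged=30, n_total=100, f_s=0.9, f_b=0.1))
```

With f_s = 1 and f_b = 0 the sketch reduces to the ordinary Clopper-Pearson interval for direct signal observation.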
Sampling random graph homomorphisms and applications to network data analysis
A graph homomorphism is a map between two graphs that preserves adjacency
relations. We consider the problem of sampling a random graph homomorphism from
a graph $F$ into a large network $\mathcal{G}$. We propose two complementary
MCMC algorithms for sampling random graph homomorphisms and establish bounds
on their mixing times and concentration of their time averages. Based on our
sampling algorithms, we propose a novel framework for network data analysis
that circumvents some of the drawbacks in methods based on independent and
neighborhood sampling. Various time averages of the MCMC trajectory give us
various computable observables, including well-known ones such as homomorphism
density and average clustering coefficient and their generalizations.
Furthermore, we show that these network observables are stable with respect to
a suitably renormalized cut distance between networks. We provide various
examples and simulations demonstrating our framework through synthetic
networks. We also apply our framework for network clustering and classification
problems using the Facebook100 dataset and Word Adjacency Networks of a set of
classic novels.
Comment: 51 pages, 33 figures, 2 tables
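The idea of sampling homomorphisms by MCMC and reading off observables as time averages can be sketched as follows. This is a minimal single-site (Glauber-type) update for an unweighted, undirected graph without weights on nodes or edges, which is a simplifying assumption; the abstract does not spell out the paper's algorithms, and all names here (`glauber_step`, `time_average`, the adjacency dictionary `adj`) are hypothetical.

```python
import random


def is_homomorphism(x, motif_edges, adj):
    """True if every motif edge (i, j) is mapped onto an edge of the graph."""
    return all(x[j] in adj[x[i]] for i, j in motif_edges)


def glauber_step(x, motif_nbrs, adj, rng):
    """Resample the image of one uniformly chosen motif node, uniformly
    among the graph vertices that keep x a homomorphism."""
    i = rng.randrange(len(x))
    allowed = set(adj)  # start from all graph vertices
    for j in motif_nbrs[i]:
        allowed &= adj[x[j]]  # must be adjacent to each neighbour's image
    # 'allowed' is never empty: the current x[i] always belongs to it.
    x[i] = rng.choice(sorted(allowed))


def time_average(observable, x0, motif_edges, adj, steps, seed=0):
    """Average an observable along the trajectory started at x0,
    which must itself be a homomorphism."""
    rng = random.Random(seed)
    motif_nbrs = {i: set() for i in range(len(x0))}
    for i, j in motif_edges:
        motif_nbrs[i].add(j)
        motif_nbrs[j].add(i)
    x = list(x0)
    total = 0.0
    for _ in range(steps):
        glauber_step(x, motif_nbrs, adj, rng)
        total += observable(x)
    return total / steps


if __name__ == "__main__":
    # Toy graph: complete graph on 4 vertices (no self-loops).
    adj = {v: {u for u in range(4) if u != v} for v in range(4)}
    path = [(0, 1), (1, 2)]  # motif: path on three nodes

    def closes(x):
        # Do the two endpoints of the path land on adjacent vertices?
        return float(x[2] in adj[x[0]])

    print(time_average(closes, [0, 1, 2], path, adj, steps=20000))
```

The observable used in the usage example, whether the endpoints of a three-node path are mapped to adjacent vertices, is a clustering-type quantity; other observables can be passed in the same way.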
New approaches in network data analysis
This thesis introduces two extensions to statistical approaches improving modeling and estimation in the field of network data analysis. The first contributing publication focuses on cross-sectional networks based on Markov graphs, whereas the second takes the evolution of networks with dynamical structure into account.
Analyzing network data is challenging in terms of modeling and computation due to large and dependent data sets. The dissertation starts with an overview of network data in general and gives an introduction to the well-known model framework of exponential random graph models with its dependence assumptions, estimation routines, challenges, and solution approaches. At the end of the introduction, the main ideas of dynamic network models, the profile likelihood approach for multivariate counting processes for network data, and the analogy between the Cox proportional hazards model and the Poisson model with semiparametric estimation are presented.
The first part of this work proposes an extension for sampling Markov graphs, a subclass of exponential random graph models, in parallel to reduce computation time in simulation-based routines. The estimation of network models, especially of large networks, is demanding and requires Markov chain Monte Carlo simulations. This publication recommends exploiting the conditional independence structure in networks to make use of parallel draws. This idea is applied to a large ego network of Facebook friendships, where an additional log transformation of network statistics accounts for degeneracy problems. This extension is implemented in the open source R package pergm, available on GitHub, and the thesis gives a short introduction to its main functionality.
The second part of this work focuses on dynamic networks. In comparison to the cross-sectional networks of the first part, the analysis of longitudinal network data concentrates on modeling changes in relations. To this end, a profile likelihood approach to model time-stamped event data is combined with a semiparametric approach including covariates built from network history. This flexible semiparametric approach is applicable to large networks because standard software can be used for estimation, owing to the analogy between the Cox proportional hazards model and the Poisson model with an artificial data structure. This extended method is applied to patent collaboration data of patents submitted jointly by inventors with German residency between 2000 and 2013. Based on penalized smoothing techniques, we include time-dependent network statistics and exogenous covariates to capture internal and external effects.
Social network data analysis for event detection
Cities concentrate enough Social Network (SN) activity to empower rich models. We present an approach to event discovery based on the information provided by three SNs, minimizing the data properties used in order to maximize the total amount of usable data. We build a model of normal city behavior, which we use to detect abnormal situations (events). After collecting half a year of data, we show examples of the detected events and introduce some applications.
Peer Reviewed. Postprint (published version)
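The abstract above does not say how the model of normal behavior is built, so the following is only a hedged sketch of one common baseline: keep a mean and standard deviation of activity counts per slot (for example an area and hour of the week) and flag observations that lie far above it. The z-score rule, the slot keys, the threshold and the `min_std` floor are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean, pstdev


def fit_baseline(history):
    """history: iterable of (slot, count) pairs, where slot is any hashable
    key such as (area, hour_of_week). Returns {slot: (mean, std)}."""
    by_slot = defaultdict(list)
    for slot, count in history:
        by_slot[slot].append(count)
    return {slot: (mean(c), pstdev(c)) for slot, c in by_slot.items()}


def detect_events(baseline, observations, threshold=3.0, min_std=1.0):
    """Return (slot, count, z) for observations far above the normal level.
    min_std guards against division by a zero or tiny standard deviation."""
    events = []
    for slot, count in observations:
        if slot not in baseline:
            continue  # no notion of "normal" for this slot yet
        mu, sigma = baseline[slot]
        z = (count - mu) / max(sigma, min_std)
        if z > threshold:
            events.append((slot, count, z))
    return events


if __name__ == "__main__":
    slot = ("centre", 20)  # hypothetical area name and hour
    history = [(slot, c) for c in [10, 12, 11, 9, 13]]
    baseline = fit_baseline(history)
    print(detect_events(baseline, [(slot, 12), (slot, 40)]))
```

This sketch only flags unusually high activity; unusually low activity could be flagged by testing the absolute value of the z-score instead.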
New approaches in statistical network data analysis
This cumulative dissertation is dedicated to the statistical analysis of network data. The general approach of combining network science with statistical methodology has become very popular in recent years. An important reason for this development lies in the ability of statistical network data analysis to provide a means to model and quantify interdependencies of complex systems.
A network can be comprehended as a structure consisting of nodes and edges. The nodes represent general entities that are related via the edges. Depending on the research question at hand, it is either of interest to analyze the dependence structure among the nodes or the distribution of the edges given the nodes. This thesis consists of six contributed manuscripts that are concerned with the latter. Based on statistical models, edges in different dynamic and weighted networks are investigated or reconstructed.
To put the contributing articles in a general context, the thesis starts with an introductory chapter. In this introduction, central concepts and models from statistical network data analysis are explained. Besides giving an overview of the available methodology, the advantages and drawbacks of the models are given, supplemented with a discussion of potential extensions and modifications.
In terms of content, the articles can be divided into two projects. One project is focused on the statistical analysis of international arms trade networks. Two articles are devoted to the global exchange of major conventional weapons with a focus on the dynamic structure of the system and the volume traded. A third article explores latent patterns in the international trade system of small arms and ammunition. Additionally, the arms trade data is used in a survey paper that is concerned with dynamic network models. The second project concerns the reconstruction of financial networks from their marginals and includes two articles.
All contributing articles are attached in the form in which they were published as preprints. For publications in scientific journals, the respective sources are given. Additionally, the contributions of all authors are included. All computations were done with the statistical software R and the corresponding code is available from GitHub.
This cumulative dissertation deals with the statistical analysis of network data. The general approach of conceptualizing interdependent systems as networks and then analyzing them with statistical methodology has gained considerable relevance in recent years. In particular, the flexibility of the methodology, together with the possibility of modeling complex dependence structures, has contributed to its popularity.
A network is a system composed of nodes and edges. The nodes are general entities that are connected to one another by the edges. Depending on the research question, the interest lies either in the dependencies between the nodes or in the distribution of the edges given the nodes. This thesis takes up the second approach with a total of six articles. With the help of statistical models, the edges in various binary and weighted networks are analyzed or reconstructed.
To give the thesis a general context, the attached articles are preceded by a framing chapter. It covers central concepts and models of statistical network analysis. Both the advantages and the disadvantages of the models are discussed, and potential extensions and modifications are described.
The articles contained in this dissertation can be roughly divided into two projects. One project focuses on the statistical modeling of the international arms trade. Two articles examine the global exchange of major conventional weapons, analyzing both the dynamic structure and the volume of arms traded. A further article is devoted to the latent structures in the international trade in small arms and ammunition. In addition, the arms trade data are used in a survey article on dynamic network models. The second project, spread over two articles, deals with the reconstruction of financial networks based on the marginal sums of network matrices.
All articles attached to this dissertation are in the form in which they were published as preprints. For publications in scientific journals, the respective source is given. In addition, the contribution of the respective author is stated before each article. All analyses were carried out with the statistical software R. The corresponding code is available via GitHub.
Diluting the Scalability Boundaries: Exploring the Use of Disaggregated Architectures for High-Level Network Data Analysis
Traditional data centers are designed with a rigid architecture of
fit-for-purpose servers that provision resources beyond the average workload in
order to deal with occasional peaks of data. Heterogeneous data centers are
pushing towards more cost-efficient architectures with better resource
provisioning. In this paper we study the feasibility of using disaggregated
architectures for intensive data applications, in contrast to the monolithic
approach of server-oriented architectures. Particularly, we have tested a
proactive network analysis system in which the workload demands are highly
variable. In the context of the dReDBox disaggregated architecture, the results
show that the overhead caused by using remote memory resources is significant,
between 66% and 80%, but we have also observed that the memory usage is one
order of magnitude higher for the stress case with respect to average
workloads. Therefore, dimensioning memory for the worst case in conventional
systems will result in a notable waste of resources. Finally, we found that,
for the selected use case, parallelism is limited by memory. Therefore, using a
disaggregated architecture will allow for increased parallelism, which, at the
same time, will mitigate the overhead caused by remote memory.
Comment: 8 pages, 6 figures, 2 tables, 32 references. Pre-print. The paper
will be presented during the IEEE International Conference on High
Performance Computing and Communications in Bangkok, Thailand. 18 - 20
December, 2017. To be published in the conference proceedings
NOESIS: A Framework for Complex Network Data Analysis
Network data mining has attracted a lot of attention since a large number of real-world problems have to deal with complex
network data. In this paper, we present NOESIS, an open-source framework for network-based data mining. NOESIS features a
large number of techniques and methods for the analysis of structural network properties, network visualization, community
detection, link scoring, and link prediction. The proposed framework has been designed following solid design principles and
exploits parallel computing using structured parallel programming. NOESIS also provides a stand-alone graphical user interface
allowing the use of advanced software analysis techniques to users without prior programming experience. This framework is
available under a BSD open-source software license.
The NOESIS project was partially supported by the Spanish
Ministry of Economy and the European Regional Development
Fund (FEDER), under grant TIN2012-36951, and the
Spanish Ministry of Education under the program "Ayudas
para contratos predoctorales para la formación de doctores
2013" (predoctoral grant BES-2013-064699)