1,174 research outputs found

    Signal Confidence Limits from a Neural Network Data Analysis

    This paper deals with a situation of some importance for the analysis of experimental data via Neural Network (NN) or similar devices: Let $N$ data be given, such that $N = N_s + N_b$, where $N_s$ is the number of signals and $N_b$ the number of background events, both unknown. Assume that a NN has been trained such that it will tag signals with efficiency $F_s$ $(0 < F_s < 1)$ and background data with $F_b$ $(0 < F_b < 1)$. Applying the NN yields $N^Y$ tagged events. We demonstrate that the knowledge of $N^Y$ is sufficient to calculate confidence bounds for the signal likelihood, which have the same statistical interpretation as the Clopper-Pearson bounds for the well-studied case of direct signal observation. Subsequently, we discuss rigorous bounds for the a-posteriori distribution function of the signal probability, as well as for the (closely related) likelihood that there are $N_s$ signals in the data. We compare them with results obtained by starting off with a maximum entropy type assumption for the a-priori likelihood that there are $N_s$ signals in the data and applying the Bayesian theorem. Difficulties are encountered with the latter method. Comment: 17 pages, 10 eps figures, LaTeX, major revisions due to referee report
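The following is a minimal sketch of how bounds of this kind could be computed. It is not the paper's derivation. It assumes that each event is a signal with probability $p$, so that an event is tagged with probability $q = p F_s + (1 - p) F_b$ and $N^Y$ is binomial in $N$ and $q$. It also assumes $F_b < F_s$. The function names (`clopper_pearson`, `signal_fraction_bounds`) are hypothetical and chosen for illustration only.

```python
from math import comb


def binom_cdf(k, n, q):
    """P(X <= k) for X ~ Binomial(n, q)."""
    return sum(comb(n, i) * q**i * (1 - q) ** (n - i) for i in range(k + 1))


def _bisect(f, lo=0.0, hi=1.0, iters=100):
    """Root of a function that decreases from positive to negative on [lo, hi]."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided (1 - alpha) interval for a binomial proportion."""
    # Lower bound: the q at which P(X >= k | q) equals alpha / 2.
    lower = 0.0 if k == 0 else _bisect(
        lambda q: alpha / 2 - (1 - binom_cdf(k - 1, n, q)))
    # Upper bound: the q at which P(X <= k | q) equals alpha / 2.
    upper = 1.0 if k == n else _bisect(
        lambda q: binom_cdf(k, n, q) - alpha / 2)
    return lower, upper


def signal_fraction_bounds(n_tagged, n, f_s, f_b, alpha=0.05):
    """Map the interval for the tag rate q to an interval for p.

    Assumption (not from the source): q = p * f_s + (1 - p) * f_b
    with f_b < f_s, so p = (q - f_b) / (f_s - f_b), clipped to [0, 1].
    """
    if not 0 < f_b < f_s < 1:
        raise ValueError("this sketch requires 0 < f_b < f_s < 1")
    q_lo, q_hi = clopper_pearson(n_tagged, n, alpha)

    def to_p(q):
        return min(1.0, max(0.0, (q - f_b) / (f_s - f_b)))

    return to_p(q_lo), to_p(q_hi)


if __name__ == "__main__":
    # Illustrative numbers only: 50 of 100 events tagged.
    print(signal_fraction_bounds(50, 100, f_s=0.9, f_b=0.1))
```

Because the map from $q$ to $p$ is monotone under the stated assumption, the interval for $q$ carries its coverage over to $p$; the clipping to $[0, 1]$ only removes values that no signal probability can take.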

    Sampling random graph homomorphisms and applications to network data analysis

    A graph homomorphism is a map between two graphs that preserves adjacency relations. We consider the problem of sampling a random graph homomorphism from a graph $F$ into a large network $\mathcal{G}$. We propose two complementary MCMC algorithms for sampling random graph homomorphisms and establish bounds on their mixing times and on the concentration of their time averages. Based on our sampling algorithms, we propose a novel framework for network data analysis that circumvents some of the drawbacks of methods based on independent and neighborhood sampling. Various time averages of the MCMC trajectory give us various computable observables, including well-known ones such as homomorphism density and average clustering coefficient, as well as their generalizations. Furthermore, we show that these network observables are stable with respect to a suitably renormalized cut distance between networks. We provide various examples and simulations demonstrating our framework through synthetic networks. We also apply our framework to network clustering and classification problems using the Facebook100 dataset and Word Adjacency Networks of a set of classic novels. Comment: 51 pages, 33 figures, 2 tables
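As a rough illustration of the idea, the sketch below runs a Glauber-type chain on homomorphisms from a small graph $F$ into an unweighted, undirected graph and averages an observable along the trajectory. It is an assumption-laden toy, not the authors' algorithms: the update rule (resample the image of one node of $F$ uniformly among the compatible nodes), the adjacency-dict data layout, and the names `glauber_step` and `time_average` are all chosen here for illustration, and the chain is not guaranteed to be irreducible on every target graph.

```python
import random


def is_homomorphism(f_edges, g_adj, x):
    """True if every edge (u, v) of F is mapped to an edge of G."""
    return all(x[v] in g_adj[x[u]] for u, v in f_edges)


def glauber_step(f_adj, g_adj, x, rng):
    """Resample the image of one uniformly chosen node of F, in place.

    f_adj: dict node -> list of neighbours in F (nodes are 0..k-1)
    g_adj: dict node -> set of neighbours in G (undirected)
    x:     list with x[i] = image of node i; must be a homomorphism
    """
    i = rng.randrange(len(x))
    # Nodes of G adjacent to the images of all F-neighbours of i.
    candidates = [a for a in g_adj
                  if all(a in g_adj[x[j]] for j in f_adj[i])]
    # The current image is always a candidate, so the list is non-empty.
    x[i] = rng.choice(candidates)


def time_average(f_adj, g_adj, x, observable, steps, rng):
    """Average observable(x) over `steps` Glauber updates."""
    total = 0.0
    for _ in range(steps):
        glauber_step(f_adj, g_adj, x, rng)
        total += observable(x)
    return total / steps


if __name__ == "__main__":
    # F is a path 0 - 1 - 2; G is a triangle with a pendant node.
    f_adj = {0: [1], 1: [0, 2], 2: [1]}
    g_adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
    x = [0, 1, 2]  # a valid starting homomorphism
    rng = random.Random(0)
    # Illustrative observable: are the images of the path's endpoints adjacent?
    closed = lambda y: float(y[2] in g_adj[y[0]])
    print(time_average(f_adj, g_adj, x, closed, 10_000, rng))
```

The endpoint-adjacency indicator is only one example of an observable; any function of the current map can be averaged the same way.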

    New approaches in network data analysis

    This thesis introduces two extensions to statistical approaches that improve modeling and estimation in the field of network data analysis. The first contributing publication focuses on cross-sectional networks based on Markov graphs, whereas the second takes the evolution of networks with dynamical structure into account. Analyzing network data is challenging in terms of modeling and computation because the data sets are large and dependent. The dissertation starts with an overview of network data in general and gives an introduction to the well-known model framework of exponential random graph models with its dependence assumptions, estimation routines, challenges, and solution approaches. At the end of the introduction, the main ideas of dynamic network models, the profile likelihood approach for multivariate counting processes for network data, and the analogy between the Cox proportional hazards model and the Poisson model with semiparametric estimation are presented. The first part of this work proposes an extension for sampling Markov graphs, a subclass of exponential random graph models, in parallel to reduce computation time in simulation-based routines. The estimation of network models, especially for large networks, is demanding and requires Markov chain Monte Carlo simulations. This publication recommends exploiting the conditional independence structure in networks to make use of parallel draws. This idea is applied to a large ego network of Facebook friendships, where an additional log transformation of network statistics accounts for degeneracy problems. This extension is implemented in the open source R package pergm, available on GitHub; a short introduction to its main functionality is given in the thesis. The second part of this work focuses on dynamic networks. In comparison to the cross-sectional networks of the first part, the development and application of longitudinal network data concentrates on modeling changes of relations. Therefore, a profile likelihood approach to model time-stamped event data is combined with a semiparametric approach including covariates built from network history. This flexible semiparametric approach is applicable to large networks because standard software can be used for estimation, owing to the analogy between the Cox proportional hazards model and the Poisson model with an artificial data structure. This extended method is applied to patent collaboration data on patents submitted jointly by inventors with German residency between 2000 and 2013. Based on penalized smoothing techniques, we include time-dependent network statistics and exogenous covariates to capture internal and external effects.

    Social network data analysis for event detection

    Cities concentrate enough Social Network (SN) activity to empower rich models. We present an approach to event discovery based on the information provided by three SNs, minimizing the data properties used in order to maximize the total amount of usable data. We build a model of the normal city behavior, which we use to detect abnormal situations (events). After collecting half a year of data, we show examples of the events detected and introduce some applications. Peer Reviewed. Postprint (published version)

    New approaches in statistical network data analysis

    This cumulative dissertation is dedicated to the statistical analysis of network data. The general approach of combining network science with statistical methodology has become very popular in recent years. An important reason for this development lies in the ability of statistical network data analysis to provide a means to model and quantify interdependencies of complex systems. A network can be comprehended as a structure consisting of nodes and edges. The nodes represent general entities that are related via the edges. Depending on the research question at hand, it is of interest to analyze either the dependence structure among the nodes or the distribution of the edges given the nodes. This thesis consists of six contributed manuscripts that are concerned with the latter. Based on statistical models, edges in different dynamic and weighted networks are investigated or reconstructed. To put the contributing articles in a general context, the thesis starts with an introductory chapter. In this introduction, central concepts and models from statistical network data analysis are explained. Besides giving an overview of the available methodology, the advantages and drawbacks of the models are given, supplemented with a discussion of potential extensions and modifications. In terms of content, the articles can be divided into two projects. One project is focused on the statistical analysis of international arms trade networks. Two articles are devoted to the global exchange of major conventional weapons with a focus on the dynamic structure of the system and the volume traded. A third article explores latent patterns in the international trade system of small arms and ammunition. Additionally, the arms trade data is used in a survey paper that is concerned with dynamic network models. The second project regards the reconstruction of financial networks from their marginals and includes two articles. All contributing articles are attached in the form in which they were published as preprints. For publications in scientific journals, the respective sources are given. Additionally, the contributions of all authors are included. All computations were done with the statistical software R and the corresponding code is available from GitHub.
    This cumulative dissertation deals with the statistical analysis of network data. The general approach of conceptualizing interdependent systems as networks and then analyzing them with statistical methodology has gained considerably in relevance in recent years. In particular, the flexibility of the methodology, together with the ability to model complex dependence structures, has contributed to its popularity. A network is a system composed of nodes and edges. The nodes are general entities that are connected to one another by the edges. Depending on the research question, interest lies either in the dependencies among the nodes or in the distribution of the edges given the nodes. With six articles in total, this thesis takes the second approach. Using statistical models, the edges in various binary and weighted networks are analyzed or reconstructed. To give the work a general context, the attached articles are preceded by a framing chapter. It covers central concepts and models of statistical network analysis, discusses both the advantages and the disadvantages of the models, and describes potential extensions and modifications. The articles contained in this dissertation can be roughly divided into two projects. One project focuses on the statistical modeling of the international arms trade. Two articles examine the global exchange of major conventional weapons, analyzing both the dynamic structure and the volume of weapons traded. A further article is devoted to the latent structures in the international trade in small arms and ammunition. In addition, the arms trade data are used in a survey article on dynamic network models. The second project, spread over two articles, deals with the reconstruction of financial networks based on the marginal sums of network matrices. All articles attached to this dissertation appear in the form in which they were published as preprints. For publications in scientific journals, the respective source is given. In addition, the contribution of the respective author is stated before each article. All analyses were carried out with the statistical software R. The corresponding code is available via GitHub.

    Diluting the Scalability Boundaries: Exploring the Use of Disaggregated Architectures for High-Level Network Data Analysis

    Traditional data centers are designed with a rigid architecture of fit-for-purpose servers that provision resources beyond the average workload in order to deal with occasional peaks of data. Heterogeneous data centers are pushing towards more cost-efficient architectures with better resource provisioning. In this paper we study the feasibility of using disaggregated architectures for intensive data applications, in contrast to the monolithic approach of server-oriented architectures. In particular, we have tested a proactive network analysis system in which the workload demands are highly variable. In the context of the dReDBox disaggregated architecture, the results show that the overhead caused by using remote memory resources is significant, between 66% and 80%, but we have also observed that the memory usage is one order of magnitude higher for the stress case than for average workloads. Therefore, dimensioning memory for the worst case in conventional systems will result in a notable waste of resources. Finally, we found that, for the selected use case, parallelism is limited by memory. Therefore, using a disaggregated architecture will allow for increased parallelism, which, at the same time, will mitigate the overhead caused by remote memory. Comment: 8 pages, 6 figures, 2 tables, 32 references. Pre-print. The paper will be presented during the IEEE International Conference on High Performance Computing and Communications in Bangkok, Thailand, 18-20 December 2017. To be published in the conference proceedings

    NOESIS: A Framework for Complex Network Data Analysis

    Network data mining has attracted a lot of attention since a large number of real-world problems have to deal with complex network data. In this paper, we present NOESIS, an open-source framework for network-based data mining. NOESIS features a large number of techniques and methods for the analysis of structural network properties, network visualization, community detection, link scoring, and link prediction. The proposed framework has been designed following solid design principles and exploits parallel computing using structured parallel programming. NOESIS also provides a stand-alone graphical user interface allowing the use of advanced software analysis techniques to users without prior programming experience. This framework is available under a BSD open-source software license. The NOESIS project was partially supported by the Spanish Ministry of Economy and the European Regional Development Fund (FEDER), under grant TIN2012–36951, and the Spanish Ministry of Education under the program “Ayudas para contratos predoctorales para la formación de doctores 2013” (predoctoral grant BES–2013–064699).