28 research outputs found

    A Bayesian Method for Matching Two Similar Graphs without Seeds

    Approximate graph matching (AGM) refers to the problem of mapping the vertices of two structurally similar graphs, which has applications in social networks, computer vision, chemistry, and biology. Given its computational cost, AGM has mostly been limited to either small graphs (e.g., tens or hundreds of nodes), or to large graphs in combination with side information beyond the graph structure (e.g., a seed set of pre-mapped node pairs). In this paper, we cast AGM in a Bayesian framework based on a clean definition of the probability of correctly mapping two nodes, which leads to a polynomial time algorithm that does not require side information. Node features such as degree and distances to other nodes are used as fingerprints. The algorithm proceeds in rounds, such that the most likely pairs are mapped first; these pairs subsequently generate additional features in the fingerprints of other nodes. We evaluate our method on real social networks and show that it achieves a very low matching error provided the two graphs are sufficiently similar. We also evaluate our method on random graph models to characterize its behavior under various levels of node clustering.
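The round-based matching described in the abstract can be illustrated with a toy sketch. This is not the paper's implementation; the fingerprint definition, the scoring by absolute feature difference, and all function names here are invented for illustration:

```python
from collections import deque

def bfs_dists(adj, src):
    # Unweighted shortest-path distances from src (BFS).
    dist = {src: 0}
    q = deque([src])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

def fingerprint(adj, node, anchors):
    # Node features: degree plus distances to already-matched anchors.
    d = bfs_dists(adj, node)
    return [len(adj[node])] + [d.get(a, 10**6) for a in anchors]

def match(adj1, adj2, rounds):
    # Each round maps the best-scoring unmatched pair; matched pairs
    # become anchors that enrich the fingerprints of later rounds.
    mapping = {}
    for _ in range(rounds):
        a1 = list(mapping)
        a2 = [mapping[a] for a in a1]
        best = None
        for u in adj1:
            if u in mapping:
                continue
            fu = fingerprint(adj1, u, a1)
            for v in adj2:
                if v in mapping.values():
                    continue
                score = sum(abs(x - y)
                            for x, y in zip(fu, fingerprint(adj2, v, a2)))
                if best is None or score < best[0]:
                    best = (score, u, v)
        if best:
            mapping[best[1]] = best[2]
    return mapping

# Matching a 4-node path graph against an identical copy.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
mapping = match(adj, adj, rounds=4)
```

On identical graphs the greedy rounds recover a consistent vertex correspondence; real instances would need noise-tolerant scoring and probabilistic acceptance, which is where the paper's Bayesian machinery comes in.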

    Novel approaches to anonymity and privacy in decentralized, open settings

    The Internet has undergone dramatic changes in the last two decades, evolving from a mere communication network into a global multimedia platform on which billions of users actively exchange information. While this transformation has brought tremendous benefits to society, it has also created new threats to online privacy with which existing technology is failing to keep pace. In this dissertation, we present the results of two lines of research that developed two novel approaches to anonymity and privacy in decentralized, open settings. First, we examine the issue of attribute and identity disclosure in open settings and develop the novel notion of (k,d)-anonymity for open settings, which we extensively study and validate experimentally. Furthermore, we investigate the relationship between anonymity and linkability using the notion of (k,d)-anonymity and show that, in contrast to the traditional closed setting, anonymity within one online community does not necessarily imply unlinkability across different online communities in the decentralized, open setting. Second, we consider the transitive diffusion of information that is shared in social networks and spread through pairwise interactions of users connected in this social network. We develop the novel approach of exposure minimization to control the diffusion of information within an open network, allowing the owner to minimize its exposure by suitably choosing whom they share their information with. We implement our algorithms and investigate the practical limitations of user-side exposure minimization in large social networks.
At their core, both of these approaches present a departure from the provable privacy guarantees that we can achieve in closed settings and a step towards sound assessments of privacy risks in decentralized, open settings.
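The exposure-minimization idea can be made concrete with a toy diffusion model. This is an illustrative sketch, not the dissertation's algorithm: the independent-cascade forwarding model, the uniform forwarding probability, and all names are assumptions:

```python
import random

def expected_exposure(adj, p, seeds, trials=2000):
    # Monte-Carlo estimate of the expected number of users an item
    # eventually reaches when `seeds` receive it and every recipient
    # forwards it along each social link independently with
    # probability p (an independent-cascade style diffusion).
    rng = random.Random(0)
    total = 0
    for _ in range(trials):
        reached = set(seeds)
        frontier = list(seeds)
        while frontier:
            u = frontier.pop()
            for v in adj[u]:
                if v not in reached and rng.random() < p:
                    reached.add(v)
                    frontier.append(v)
        total += len(reached)
    return total / trials

# Star network: user 0 is a hub connected to users 1..5.
adj = {0: [1, 2, 3, 4, 5], **{i: [0] for i in range(1, 6)}}
exposure_hub = expected_exposure(adj, 0.5, [0])   # share with the hub
exposure_leaf = expected_exposure(adj, 0.5, [1])  # share with one leaf
```

Sharing with the hub yields a higher expected exposure than sharing with a leaf, so an exposure-minimizing owner would prefer the leaf; choosing the initial share set to minimize this quantity is the optimization problem the dissertation studies.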

    Privacy and Dynamics of Social Networks (PhD Thesis: pre-print)

    Over the past decade, investigations in different fields have focused on studying and understanding real networks, ranging from biological to social to technological. These networks, called complex networks, exhibit common topological features, such as a heavy-tailed degree distribution and the small world effect. In this thesis we address two interesting aspects of complex, and more specifically, social networks: (1) users' privacy, and the vulnerability of a network to user identification, and (2) dynamics, or the evolution of the network over time. For this purpose, we base our contributions on a central tool in the study of graphs and complex networks: graph sampling. We conjecture that each network snapshot can be treated as a sample from an underlying network. Using this, a sampling process can be viewed as a way to observe dynamic networks, and to model the similarity of two correlated graphs by assuming that the graphs are samples from an underlying generator graph. We take the thesis in two directions. For the first, we focus on the privacy problem in social networks. There have been hot debates on the extent to which the release of anonymized information to the public can leak personally identifiable information (PII). Recent works have shown methods that are able to infer true user identities, under certain conditions and by relying on side information. Our approach to this problem relies on the graph structure, where we investigate the feasibility of de-anonymizing an unlabeled social network by using the structural similarity to an auxiliary network. We propose a model where the two partially overlapping networks of interest are considered samples of an underlying graph. Using such a model, first, we propose a theoretical framework for the de-anonymization problem, we obtain minimal conditions under which de-anonymization is feasible, and we establish a threshold on the similarity of the two networks above which anonymity could be lost. 
Then, we propose a novel algorithm based on a Bayesian framework, which is capable of matching two graphs of thousands of nodes with no side information other than the network structures. Our method has several potential applications, e.g., inferring user identities in an anonymized network by using a similar public network, cross-referencing dictionaries of different languages, correlating data from different domains, etc. We also introduce a novel privacy-preserving mechanism for social recommender systems, where users can receive accurate recommendations while hiding their profiles from an untrusted recommender server. For the second direction of this work, we focus on models for network growth, more specifically for network densification, by using a sampling process. The densification phenomenon has been recently observed in various real networks, and we argue that it can be explained simply through the way we observe (sample) the networks. We introduce a process of sampling the edges of a fixed graph, which results in the super-linear growth of edges versus nodes and the increase of the average degree as the network evolves.
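The edge-sampling explanation of densification is easy to demonstrate in a toy experiment. This is an illustrative sketch rather than the thesis's model; the choice of a complete underlying graph and the function names are assumptions:

```python
import random

def observation_curve(edges, step):
    # Observe the edges of a FIXED underlying graph in random order
    # and record (distinct nodes seen, edges seen) at checkpoints.
    rng = random.Random(1)
    edges = edges[:]
    rng.shuffle(edges)
    seen, curve = set(), []
    for i, (u, v) in enumerate(edges, 1):
        seen.update((u, v))
        if i % step == 0:
            curve.append((len(seen), i))
    return curve

# Fixed underlying graph: the complete graph on 40 nodes (780 edges).
n = 40
all_edges = [(u, v) for u in range(n) for v in range(u + 1, n)]
curve = observation_curve(all_edges, step=100)
avg_degree = [2 * e / v for v, e in curve]
# avg_degree rises at every checkpoint: the observed network appears
# to densify even though the underlying graph never changes.
```

The node count saturates quickly while observed edges keep accumulating, so edges grow super-linearly in nodes: apparent densification produced purely by the sampling process.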

    LIPIcs, Volume 248, ISAAC 2022, Complete Volume

    LIPIcs, Volume 248, ISAAC 2022, Complete Volume.

    SWKM 2008: Social Web and Knowledge Management, Proceedings: CEUR Workshop Proceedings


    A treatment of stereochemistry in computer aided organic synthesis

    This thesis describes the author’s contributions to a new stereochemical processing module constructed for the ARChem retrosynthesis program. The purpose of the module is to add the ability to perform enantioselective and diastereoselective retrosynthetic disconnections and generate appropriate precursor molecules. The module uses evidence based rules generated from a large database of literature reactions. Chapter 1 provides an introduction and critical review of the published body of work for computer aided synthesis design. The role of computer perception of key structural features (rings, functional groups, etc.) and the construction and use of reaction transforms for generating precursors is discussed. Emphasis is also given to the application of strategies in retrosynthetic analysis. The availability of large reaction databases has enabled a new generation of retrosynthesis design programs to be developed that use automatically generated transforms assembled from published reactions. A brief description of the transform generation method employed by ARChem is given. Chapter 2 describes the algorithms devised by the author for handling the computer recognition and representation of the stereochemical features found in molecule and reaction scheme diagrams. The approach is generalised and uses flexible recognition patterns to transform information found in chemical diagrams into concise stereo descriptors for computer processing. An algorithm for efficiently comparing and classifying pairs of stereo descriptors is described. This algorithm is central for solving the stereochemical constraints in a variety of substructure matching problems addressed in chapter 3. The concise representation of reactions and transform rules as hyperstructure graphs is described. Chapter 3 is concerned with the efficient and reliable detection of stereochemical symmetry in molecules, reactions, and rules.
A novel symmetry perception algorithm, based on a constraint satisfaction problem (CSP) solver, is described. The use of a CSP solver to implement an isomorph‐free matching algorithm for stereochemical substructure matching is detailed. The prime function of this algorithm is to seek out unique retron locations in target molecules and then to generate precursor molecules without duplications due to symmetry. Novel algorithms for classifying asymmetric, pseudo‐asymmetric and symmetric stereocentres; meso, centro, and C2 symmetric molecules; and the stereotopicity of trigonal (sp2) centres are described. Chapter 4 introduces and formalises the annotated structural language used to create both retrosynthetic rules and the patterns used for functional group recognition. A novel functional group recognition package is described along with its use to detect important electronic features such as electron‐withdrawing or donating groups and leaving groups. The functional groups and electronic features are used as constraints in retron rules to improve transform relevance. Chapter 5 details the approach taken to design detailed stereoselective and substrate controlled transforms from organised hierarchies of rules. The rules employ a rich set of constraint annotations that concisely describe the keying retrons. The application of the transforms for collating evidence based scoring parameters from published reaction examples is described. A survey of available reaction databases and the techniques for mining stereoselective reactions is demonstrated. A data mining tool was developed for finding the best reputable stereoselective reaction types for coding as transforms. For various reasons it was not possible during the research period to fully integrate this work with the ARChem program. Instead, Chapter 6 introduces a novel one‐step retrosynthesis module to test the developed transforms.
The retrosynthesis algorithms use the organisation of the transform rule hierarchy to efficiently locate the best retron matches using all applicable stereoselective transforms. This module was tested using a small set of selected target molecules, and the generated routes were ranked using a series of measured parameters including: stereocentre clearance and bond cleavage; example reputation; estimated stereoselectivity with reliability; and evidence of tolerated functional groups. In addition, a method for detecting regioselectivity issues is presented. This work presents a number of algorithms using common set and graph theory operations and notations. Appendix A lists the set theory symbols and meanings. Appendix B summarises and defines the common graph theory terminology used throughout this thesis.
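The role of graph automorphisms in symmetry perception can be sketched with a brute-force toy. In the actual module a CSP solver replaces this exhaustive enumeration; the function names, graph encoding, and example molecule here are invented for illustration:

```python
from itertools import permutations

def atom_orbits(adj, labels):
    # Exhaustive symmetry perception for a tiny molecular graph:
    # enumerate atom permutations, keep those that preserve both
    # bonds and atom labels (the automorphisms), and collect each
    # atom's orbit. Symmetry-equivalent atoms share an orbit.
    nodes = sorted(adj)
    bonds = {frozenset((u, v)) for u in adj for v in adj[u]}
    orbit = {u: {u} for u in nodes}
    for perm in permutations(nodes):
        m = dict(zip(nodes, perm))
        if any(labels[u] != labels[m[u]] for u in nodes):
            continue
        if {frozenset(m[x] for x in b) for b in bonds} != bonds:
            continue
        for u in nodes:
            orbit[u].add(m[u])
    return {frozenset(s) for s in orbit.values()}

# Propane skeleton (C-C-C): the two terminal carbons are equivalent.
adj = {0: [1], 1: [0, 2], 2: [1]}
classes = atom_orbits(adj, {0: "C", 1: "C", 2: "C"})
```

Knowing which atoms are equivalent is what lets a retrosynthesis engine avoid generating duplicate precursors from symmetric retron placements; relabelling one terminal atom (say, to "O") breaks the symmetry and every atom falls into its own orbit.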

    Computational Approaches to Drug Profiling and Drug-Protein Interactions

    Despite substantial increases in R&D spending within the pharmaceutical industry, de novo drug design has become a time-consuming endeavour. High attrition rates led to a long period of stagnation in drug approvals. Due to the extreme costs associated with introducing a drug to the market, locating and understanding the reasons for clinical failure is key to future productivity. As part of this PhD, three main contributions were made in this respect. First, the web platform LigNFam enables users to interactively explore similarity relationships between ‘drug like’ molecules and the proteins they bind. Secondly, two deep-learning-based binding site comparison tools were developed, competing with the state-of-the-art over benchmark datasets. The models have the ability to predict off-target interactions and potential candidates for target-based drug repurposing. Finally, the open-source ScaffoldGraph software was presented for the analysis of hierarchical scaffold relationships and has already been used in multiple projects, including integration into a virtual screening pipeline to increase the tractability of ultra-large screening experiments. Together, and with existing tools, the contributions made will aid in the understanding of drug-protein relationships, particularly in the fields of off-target prediction and drug repurposing, helping to design better drugs faster.

    Topology Reconstruction of Dynamical Networks via Constrained Lyapunov Equations

    The network structure (or topology) of a dynamical network is often unavailable or uncertain. Hence, we consider the problem of network reconstruction. Network reconstruction aims at inferring the topology of a dynamical network using measurements obtained from the network. In this technical note we define the notion of solvability of the network reconstruction problem. Subsequently, we provide necessary and sufficient conditions under which the network reconstruction problem is solvable. Finally, using constrained Lyapunov equations, we establish novel network reconstruction algorithms, applicable to general dynamical networks. We also provide specialized algorithms for specific network dynamics, such as the well-known consensus and adjacency dynamics.
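To make the reconstruction problem concrete, here is a toy sketch that recovers the state matrix of a simulated consensus network from trajectory data by plain least squares. It illustrates the problem only, not the constrained-Lyapunov-equation method of the note; the graph, step size, and all variable names are assumptions:

```python
import numpy as np

# Ground-truth consensus dynamics xdot = -L x on a 3-node path graph.
L = np.array([[ 1., -1.,  0.],
              [-1.,  2., -1.],
              [ 0., -1.,  1.]])
A_true = -L

dt, steps = 1e-3, 400
rng = np.random.default_rng(0)
X = rng.standard_normal((3, 5))      # 5 runs with random initial states
traj = [X]
for _ in range(steps):
    X = X + dt * (A_true @ X)        # forward-Euler simulation
    traj.append(X)

S = np.hstack(traj[:-1])             # stacked state samples
D = np.hstack([(b - a) / dt          # finite-difference derivatives
               for a, b in zip(traj, traj[1:])])
A_est = D @ np.linalg.pinv(S)        # least-squares fit of xdot = A x
```

With rich enough excitation the fit recovers -L, and the recovered off-diagonal pattern reveals the topology. The paper's contribution is precisely the harder setting where such direct fits are not available and the reconstruction must be posed via constrained Lyapunov equations.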