214 research outputs found

    Practical Attacks Against Graph-based Clustering

    Graph modeling allows numerous security problems to be tackled in a general way; however, little work has been done to understand the ability of such models to withstand adversarial attacks. We design and evaluate two novel graph attacks against a state-of-the-art network-level, graph-based detection system. Our work highlights areas in adversarial machine learning that have not yet been addressed, specifically: graph-based clustering techniques, and a global feature space where realistic attackers without perfect knowledge must be accounted for (by the defenders) in order to be practical. Even though less informed attackers can evade graph clustering with low cost, we show that some practical defenses are possible. Comment: ACM CCS 201

    Hubs of knowledge: using the functional link structure in Biozon to mine for biologically significant entities

    BACKGROUND: Existing biological databases support a variety of queries such as keyword or definition search. However, they do not provide any measure of relevance for the instances reported, and result sets are usually sorted arbitrarily. RESULTS: We describe a system that builds upon the complex infrastructure of the Biozon database and applies methods similar to those of Google to rank documents that match queries. We explore different prominence models and study the spectral properties of the corresponding data graphs. We evaluate the information content of principal and non-principal eigenspaces, and test various scoring functions which combine contributions from multiple eigenspaces. We also test the effect of similarity data and other variations which are unique to the biological knowledge domain on the quality of the results. Query result sets are assessed using a probabilistic approach that measures the significance of coherence between directly connected nodes in the data graph. This model allows us, for the first time, to compare different prominence models quantitatively and effectively and to observe unique trends. CONCLUSION: Our tests show that the ranked query results outperform unsorted results with respect to our significance measure and the top ranked entities are typically linked to many other biological entities. Our study resulted in a working ranking system of biological entities that was integrated into Biozon at
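
    The abstract above refers to "methods similar to those of Google" for ranking query matches. As a rough illustration only, the following is a minimal PageRank-style power iteration on a toy link graph. The node names, the damping factor of 0.85, and the function name `pagerank` are assumptions made for this sketch; they are not taken from Biozon, and the system described above also uses non-principal eigenspaces, which this sketch does not cover.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power iteration for a PageRank-style prominence score.

    links maps each node to the list of nodes it points to.
    All names and parameter values here are illustrative assumptions.
    """
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing links is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not links.get(v))
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for src, targets in links.items():
            if targets:
                share = damping * rank[src] / len(targets)
                for t in targets:
                    new_rank[t] += share
        rank = new_rank
    return rank


# Hypothetical toy graph: three proteins linked to one family entity.
toy_links = {
    "protein_A": ["family_X"],
    "protein_B": ["family_X"],
    "protein_C": ["family_X", "protein_A"],
    "family_X": ["protein_A"],
}
scores = pagerank(toy_links)
ranked = sorted(scores, key=scores.get, reverse=True)
```

    In this toy graph the heavily linked entity ends up first in `ranked`, which mirrors the abstract's observation that top-ranked entities tend to be linked to many other entities.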

    Distributed Graph Storage And Querying System

    Graph databases offer an efficient way to store and access inter-connected data. However, to query large graphs that no longer fit in memory, it becomes necessary to make multiple trips to the storage device to filter and gather data based on the query. But I/O accesses are expensive operations that immensely slow down query response time and prevent us from fully exploiting the graph-specific benefits that graph databases offer. The storage models of most existing graph database systems view graphs as indivisible structures and hence do not allow a hierarchical layering of the graph. This adversely affects query performance for large graphs, as there is no way to filter the graph at a higher level without actually accessing the entire information from the disk. Distributing the storage and processing is one way to extract better performance. But current distributed solutions to this problem are not entirely effective, again due to the indivisible representation of graphs adopted in the storage format. This causes unnecessary latency due to increased inter-processor communication. In this dissertation, we propose an optimized distributed graph storage system for scalable and faster querying of big graph data. We start with our unique physical storage model, in which the graph is decomposed into three different levels of abstraction, each with a different storage hierarchy. We use a hybrid storage model to store the most critical component and restrict the I/O trips to only when absolutely necessary. This lets us actively make use of multi-level filters while querying, without the need for comprehensive indexes. Our results show that our system outperforms established graph databases for several classes of queries. We show that this separation also eases the difficulties in distributing graph data, and we go on to propose a more efficient distributed model for querying general-purpose graph data using the Spark framework

    Contact replacement for NMR resonance assignment

    Motivation: Complementing its traditional role in structural studies of proteins, nuclear magnetic resonance (NMR) spectroscopy is playing an increasingly important role in functional studies. NMR dynamics experiments characterize motions involved in target recognition, ligand binding, etc., while NMR chemical shift perturbation experiments identify and localize protein–protein and protein–ligand interactions. The key bottleneck in these studies is to determine the backbone resonance assignment, which allows spectral peaks to be mapped to specific atoms. This article develops a novel approach to address that bottleneck, exploiting an available X-ray structure or homology model to assign the entire backbone from a set of relatively fast and cheap NMR experiments

    Graph based Anomaly Detection and Description: A Survey

    Detecting anomalies in data is a vital task, with numerous high-impact applications in areas such as security, finance, health care, and law enforcement. While numerous techniques have been developed in past years for spotting outliers and anomalies in unstructured collections of multi-dimensional points, with graph data becoming ubiquitous, techniques for structured graph data have been of focus recently. As objects in graphs have long-range correlations, a suite of novel technology has been developed for anomaly detection in graph data. This survey aims to provide a general, comprehensive, and structured overview of the state-of-the-art methods for anomaly detection in data represented as graphs. As a key contribution, we give a general framework for the algorithms categorized under various settings: unsupervised vs. (semi-)supervised approaches, for static vs. dynamic graphs, for attributed vs. plain graphs. We highlight the effectiveness, scalability, generality, and robustness aspects of the methods. What is more, we stress the importance of anomaly attribution and highlight the major techniques that facilitate digging out the root cause, or the ‘why’, of the detected anomalies for further analysis and sense-making. Finally, we present several real-world applications of graph-based anomaly detection in diverse domains, including financial, auction, computer traffic, and social networks. We conclude our survey with a discussion on open theoretical and practical challenges in the field

    Privacy and spectral analysis of social network randomization

    Social networks are of significant importance in various application domains. Understanding the general properties of real social networks has gained much attention due to the proliferation of networked data. Many applications of networks such as anonymous web browsing and data publishing require relationship anonymity due to the sensitive, stigmatizing, or confidential nature of the relationship. One general approach for this problem is to randomize the edges in true networks, and only release the randomized networks for data analysis. Our research focuses on the development of randomization techniques such that the released networks can preserve data utility while preserving data privacy. Data privacy refers to the sensitive information in the network data. The released network data after a simple randomization could incur various disclosures including identity disclosure, link disclosure and attribute disclosure. Data utility refers to the information, features, and patterns contained in the network data. Many important features may not be preserved in the released network data after a simple randomization. In this dissertation, we develop advanced randomization techniques to better preserve data utility of the network data while still preserving data privacy. Specifically we develop two advanced randomization strategies that can preserve the spectral properties of the network or can preserve the real features (e.g., modularity) of the network. We quantify to what extent various randomization techniques can protect data privacy when attackers use different attacks or have different background knowledge. To measure the data utility, we also develop a consistent spectral framework to measure the non-randomness (importance) of the edges, nodes, and the overall graph. Exploiting the spectral space of network topology, we further develop fraud detection techniques for various collaborative attacks in social networks. Extensive theoretical analysis and empirical evaluations are conducted to demonstrate the efficacy of our developed techniques
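
    To make the idea of a "simple randomization" concrete, here is a minimal sketch of one common baseline: delete k existing edges at random and add k edges between previously unconnected node pairs, so the edge count is unchanged. This is an assumed illustration of the baseline only, not the spectrum-preserving or feature-preserving strategies the dissertation develops; the function name `rand_add_del` and the toy graph are hypothetical.

```python
import random
from itertools import combinations


def rand_add_del(edges, nodes, k, seed=None):
    """Randomize an undirected graph by deleting k edges and adding k non-edges.

    Illustrative baseline only; names and parameters are assumptions.
    """
    rng = random.Random(seed)
    current = {frozenset(e) for e in edges}
    all_pairs = {frozenset(p) for p in combinations(nodes, 2)}
    # Sort before sampling so results are reproducible for a given seed.
    non_edges = sorted(all_pairs - current, key=sorted)
    existing = sorted(current, key=sorted)
    k = min(k, len(existing), len(non_edges))
    removed = rng.sample(existing, k)
    added = rng.sample(non_edges, k)
    return (current - set(removed)) | set(added)


# Hypothetical toy network: a path on six nodes.
toy_nodes = list(range(6))
toy_edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
released = rand_add_del(toy_edges, toy_nodes, k=2, seed=7)
```

    The released edge set differs from the true one in exactly 2k pairs, which is the kind of perturbation whose effect on utility (for example, spectral properties) and on link disclosure the abstract sets out to quantify.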

    Unweaving complex reactivity: graph-based tools to handle chemical reaction networks

    The molecular-level insights gathered through "in silico" studies have become an essential asset for the elucidation and understanding of complex reaction mechanisms. Indeed, the applicability of computational chemistry has widened substantially due to the vast increase in computational power over the last decades. As a result, not only have the accuracy of the applied methods and the size of the target systems increased, but also the level of detail attained in the mechanistic description. However, deeper descriptions of chemical systems, which most often rely on automation techniques that make it easy to explore larger parts of chemical space, come at the cost of greater complexity, rendering the results much harder to interpret. Throughout this Thesis, we have proposed, developed and tested a collection of tools for processing this kind of complex chemical reaction network (CRN), in order to provide new insights into reactive and catalytic processes. All of these tools employ graphs to model the target CRNs, so that the methods of Graph Theory (e.g. path searches, isomorphisms...) can be used in a chemical context. The tools discussed include amk-tools, a framework for the interactive visualization of automatically discovered reaction networks; gTOFfee, for the application of the energy span model to compute the turnover frequency of computationally characterized catalytic cycles; and OntoRXN, an ontology for the semantic description of CRNs that integrates network topology and calculation information in a single, highly structured entity
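
    As an illustration of the energy span model mentioned above, the sketch below uses the commonly cited approximation TOF ≈ (k_B T / h) · exp(−δE / RT), where the energy span δE is the largest value of T_i − I_j (plus the reaction free energy ΔG_r when transition state i precedes intermediate j in the cycle). It is a minimal sketch under those assumptions and does not reproduce gTOFfee's actual interface; the function names and the toy energy profile are hypothetical.

```python
import math

K_B = 1.380649e-23     # Boltzmann constant, J/K
H = 6.62607015e-34     # Planck constant, J*s
R = 8.314462618e-3     # gas constant, kJ/(mol*K)


def energy_span(intermediates, transition_states, delta_g_r):
    """Return (span, tdts_index, tdi_index) for one catalytic cycle.

    Energies are free energies in kJ/mol. transition_states[i] is assumed to
    connect intermediates[i] to the next intermediate, and the last transition
    state closes the cycle. delta_g_r is the reaction free energy of the cycle.
    """
    best = None
    for i, ts in enumerate(transition_states):
        for j, inter in enumerate(intermediates):
            # A TS located before the intermediate belongs to the next turnover,
            # so the cycle's reaction free energy is added.
            span = ts - inter if i >= j else ts - inter + delta_g_r
            if best is None or span > best[0]:
                best = (span, i, j)
    return best


def turnover_frequency(span_kj_mol, temperature=298.15):
    """Energy-span estimate of the turnover frequency in 1/s."""
    return (K_B * temperature / H) * math.exp(-span_kj_mol / (R * temperature))


# Hypothetical three-step cycle (kJ/mol), exergonic by 50 kJ/mol overall.
toy_intermediates = [0.0, -20.0, -35.0]
toy_transition_states = [40.0, 30.0, 25.0]
span, tdts, tdi = energy_span(toy_intermediates, toy_transition_states, -50.0)
tof = turnover_frequency(span)
```

    Here `tdts` and `tdi` index the TOF-determining transition state and intermediate. Real cycles with branching need the graph-based treatment the abstract describes; this sketch handles a single linear cycle only.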

    Bayesian methods for small molecule identification

    Confident identification of small molecules remains a major challenge in untargeted metabolomics, natural product research and related fields. Liquid chromatography-tandem mass spectrometry is a predominant technique for the high-throughput analysis of small molecules and can detect thousands of different compounds in a biological sample. The automated interpretation of the resulting tandem mass spectra is highly non-trivial, and many studies are limited to re-discovering known compounds by searching mass spectra in spectral reference libraries. But these libraries are vastly incomplete and a large portion of measured compounds remains unidentified. This constitutes a major bottleneck in the comprehensive, high-throughput analysis of metabolomics data. In this thesis, we present two computational methods that address different steps in the identification process of small molecules from tandem mass spectra. ZODIAC is a novel method for de novo (that is, database-independent) molecular formula annotation in complete datasets. It exploits similarities of compounds co-occurring in a sample to find the most likely molecular formula for each individual compound. ZODIAC improves on the currently best-performing method, SIRIUS, on one dataset by 16.5-fold. We show that de novo molecular formula annotation is not just a theoretical advantage: we discover multiple novel molecular formulas absent from PubChem, one of the biggest structure databases. Furthermore, we introduce a novel scoring for CSI:FingerID, a state-of-the-art method for searching tandem mass spectra in a structure database. This scoring models dependencies between different molecular properties in a predicted molecular fingerprint via Bayesian networks. This problem has the unusual property that the marginal probabilities differ for each predicted query fingerprint. Thus, we need to apply Bayesian networks in a novel, non-standard fashion. Modeling dependencies improves on the currently best scoring

    Graph algorithms for NMR resonance assignment and cross-link experiment planning

    The study of three-dimensional protein structures produces insights into protein function at the molecular level. Graphs provide a natural representation of protein structures and associated experimental data, and enable the development of graph algorithms to analyze the structures and data. This thesis develops such graph representations and algorithms for two novel applications: structure-based NMR resonance assignment and disulfide cross-link experiment planning for protein fold determination. The first application seeks to identify correspondences between spectral peaks in NMR data and backbone atoms in a structure (from X-ray crystallography or homology modeling), by computing correspondences between a contact graph representing the structure and an analogous but very noisy and ambiguous graph representing the data. The assignment then supports further NMR studies of protein dynamics and protein-ligand interactions. A hierarchical grow-and-match algorithm was developed for smaller assignment problems, ensuring completeness of assignment, while a random graph approach was developed for larger problems, provably determining unique matches in polynomial time with high probability. Test results show that our algorithms are robust to typical levels of structural variation, noise, and missing data, and achieve very good overall assignment accuracy. The second application aims to rapidly determine the overall organization of secondary structure elements of a target protein by probing it with a set of planned disulfide cross-links. A set of informative pairs of secondary structure elements is selected from graphs representing topologies of predicted structure models. For each pair in this "fingerprint", a set of informative disulfide probes is selected from graphs representing residue proximity in the models. Information-theoretic planning algorithms were developed to maximize information gain while minimizing experimental complexity, and Bayes error plan assessment frameworks were developed to characterize the probability of making correct decisions given experimental data. Evaluation of the approach on a number of structure prediction case studies shows that the optimized plans have low risk of error while testing only a very small portion of the quadratic number of possible cross-link candidates

    Digital Forensics Event Graph Reconstruction

    Ontological data representation and data normalization can provide a structured way to correlate digital artifacts. This can reduce the amount of data that a forensics examiner needs to process in order to understand the sequence of events that happened on the system. However, ontology processing suffers from large disk consumption and a high computational cost. This paper presents Property Graph Event Reconstruction (PGER), a novel data normalization and event correlation system that leverages a native graph database to improve the speed of queries common in ontological data. PGER reduces the processing time of event correlation grammars and maintains accuracy over a relational database storage format