
    Graph Path Discovery for Large-Scale Biomedical Linked Data

    Ph.D. thesis, Seoul National University Graduate School, Department of Dental Science, February 2017. Advisor: Hong-Gee Kim.
    A drug can give rise to an adverse effect when combined with another particular drug. Understanding the underlying causes of such adverse effects is crucial both for researchers developing new drugs and for clinicians prescribing medicine. Most existing approaches attempt to identify a set of target genes for which drugs are most effective, which provides insufficient information about these causes in terms of biological dynamics. Drugs should instead be considered as participants in activating a sequence of pathways that lead to particular effects. I believe that the causes can be better understood through such linked pathways. Therefore, the purpose of this thesis is to develop algorithms and tools that can be used to discover a sequence of pathways activated by a particular drug combination. Furthermore, these algorithms must scale to massive biomedical Linked Data, because up-to-date results of biomedical research are increasingly published as Linked Data. My hypothesis is that, for a drug combination, adverse effects may occur when one drug up-regulates particular pathways in one direction and the other drug down-regulates the same pathways in the opposite direction. In this regard, the problem of revealing the causes of adverse effects of drug combinations is cast as the problem of discovering paths through a sequence of linked pathways that begin and end at the genes targeted by the given drugs. Accordingly, scalable graph path discovery and matching algorithms are devised to run in a distributed computing environment. A pathway graph model is defined to integrate diverse biomedical datasets, and a visualization tool is implemented to provide biomedical researchers and clinicians with intuitive interfaces for revealing the causes of the adverse effects. An algorithm for shortest graph path discovery is proposed.
An existing relational database approach to shortest graph path discovery is adapted to a distributed computing framework, namely Spark. A 2-hop reachability index is exploited to prune non-reachable paths during discovery. A vertex re-labeling technique is proposed to reduce the size of the 2-hop reachability index. Experimental results show that the proposed approach successfully handles a large graph that previous studies failed to process. The discovered shortest graph path can be transformed into a graph path query to find other, similar graph paths. To achieve this, a MapReduce algorithm for graph path matching based on multi-way joins is proposed. A signature encoding technique is devised to prune intermediate data that is not relevant to the given query. Experiments on RDF (Resource Description Framework) datasets show that the proposed approach processes SPARQL queries faster than state-of-the-art approaches. To apply these algorithms to the problem of drug combinations causing adverse effects, a novel pathway graph model is proposed. In particular, a pathway relationship model is described in which directed links between pathways are established using protein–protein interactions and up/down-regulations between genes. A prototype system based on a visualization framework is implemented and applied to a pathway graph built from several biomedical Linked Data sources (e.g., Reactome, KEGG, BioGRID, and STRING). A list of candidate drug combinations is obtained with the proposed system and compared with known drug-drug combinations available in DrugBank. In summary, this thesis proposes a scalable graph path discovery solution. Distributed computing frameworks and several index structures are exploited to handle massive graphs efficiently. A pathway graph model is defined, and a prototype system for biomedical researchers is implemented to apply the algorithms to the problem of drug combinations causing adverse effects.
In future work, the solution will be generalized to address the temporal organization of signaling pathways, thereby enabling the causes of adverse effects of drug combinations to be better understood.

    Contents:
    I. Introduction
      1.1 Background and Motivation
      1.2 Contributions
        1.2.1 Shortest Graph Path Discovery based on Reachability Index
        1.2.2 Graph Path Matching based on Signature Encoding
        1.2.3 Application to Biomedical Linked Data
      1.3 Thesis Organization
    II. Preliminaries and Related Work
      2.1 Graph
      2.2 Graph Path
      2.3 Acyclic Transformation
      2.4 Reachability
      2.5 Distributed Computing Frameworks
      2.6 RDF & SPARQL
      2.7 SPARQL Processing Engines
    III. Shortest Graph Path Discovery based on Reachability Index
      3.1 Introduction
      3.2 Space Reduction of Reachability Index
        3.2.1 Introduction
        3.2.2 Related Work
        3.2.3 The Proposed Approach
        3.2.4 Theoretical Analysis
        3.2.5 Experimental Results
        3.2.6 Conclusion and Future Work
      3.3 Shortest Path Discovery
        3.3.1 Introduction
        3.3.2 FEM
        3.3.3 FEM-SR
        3.3.4 Theoretical Analysis
        3.3.5 Experimental Results
        3.3.6 Federated Shortest Path Discovery
      3.4 Conclusion
    IV. Graph Path Matching based on Signature Encoding
      4.1 Introduction
      4.2 Related Work
      4.3 Limitations of MapReduce-based SPARQL engines
      4.4 SigMR
      4.5 Index Structure
        4.5.1 Encoding Joined Triples
      4.6 Index Building
      4.7 Query Processing
      4.8 Theoretical Analysis
        4.8.1 Cost Model
        4.8.2 Correctness
      4.9 Experiments
        4.9.1 Index Building Time and Space Requirements
        4.9.2 Query Execution Time
        4.9.3 Effect of Signature Encoding
        4.9.4 Effect of the Size of Join Matrix
      4.10 Conclusion
    V. Application to Biomedical Linked Data
      5.1 Introduction
      5.2 Related Work
      5.3 Data Model
      5.4 CyHadoop
      5.5 Scenario
      5.6 Preliminary Results
      5.7 Future Directions
    VI. Conclusion
    References
    Appendix
    Abstract (in Korean)
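The pruning role of a 2-hop reachability index in shortest path discovery can be sketched in Python as follows. This is a minimal, single-machine sketch under stated assumptions: the graph is directed and unweighted, given as an edge list over vertices 0..n-1; the labels are built with pruned landmark labeling, a known way to obtain a 2-hop cover that is swapped in here and is not necessarily the construction used in the thesis; the hub order (highest degree first) and all function names are hypothetical; the thesis's vertex re-labeling and its Spark-based execution are not reproduced. A vertex u reaches v exactly when the out-label of u and the in-label of v share a hub, so a breadth-first search toward a target can skip every vertex that cannot reach it.

```python
from collections import deque


def _adjacency(n, edges):
    succ = [[] for _ in range(n)]
    pred = [[] for _ in range(n)]
    for u, v in edges:
        succ[u].append(v)
        pred[v].append(u)
    return succ, pred


def build_2hop_labels(n, edges):
    """Build a 2-hop reachability cover with pruned landmark labeling.

    Returns (l_out, l_in) such that u reaches v iff l_out[u] and l_in[v] intersect.
    """
    succ, pred = _adjacency(n, edges)
    # Hypothetical ordering: high-degree vertices first, so they act as hubs.
    order = sorted(range(n), key=lambda x: -(len(succ[x]) + len(pred[x])))
    l_out = [set() for _ in range(n)]
    l_in = [set() for _ in range(n)]

    def covered(u, v):
        return not l_out[u].isdisjoint(l_in[v])

    for hub in order:
        # Forward BFS adds hub to l_in of every vertex it reaches;
        # backward BFS adds hub to l_out of every vertex that reaches it.
        for adj, labels, forward in ((succ, l_in, True), (pred, l_out, False)):
            seen, queue = {hub}, deque([hub])
            while queue:
                x = queue.popleft()
                already = covered(hub, x) if forward else covered(x, hub)
                if already:
                    continue  # pruned: an earlier hub already certifies this pair
                labels[x].add(hub)
                for y in adj[x]:
                    if y not in seen:
                        seen.add(y)
                        queue.append(y)
    return l_out, l_in


def shortest_path(n, edges, source, target, l_out, l_in):
    """Unweighted shortest path that expands only vertices able to reach target."""
    succ, _ = _adjacency(n, edges)

    def can_reach_target(x):
        return not l_out[x].isdisjoint(l_in[target])

    if not can_reach_target(source):
        return None  # answered by the index alone, no search needed
    parent = {source: None}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        if x == target:
            path = []
            while x is not None:
                path.append(x)
                x = parent[x]
            return path[::-1]
        for y in succ[x]:
            if y not in parent and can_reach_target(y):
                parent[y] = x
                queue.append(y)
    return None


edges = [(0, 1), (1, 2), (2, 3), (0, 4), (4, 3), (5, 0), (6, 5)]
l_out, l_in = build_2hop_labels(7, edges)
print(shortest_path(7, edges, 6, 3, l_out, l_in))
```

Skipping vertices that cannot reach the target never changes the result, because every vertex on a shortest path to the target can, by definition, reach it; on this small example the search returns the path 6, 5, 0, 4, 3.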

    Adaptive Constraint Solving for Information Flow Analysis

    In program analysis, unknown properties of terms are typically represented symbolically as variables. Bound constraints on these variables can then specify multiple optimisation goals for computer programs and find application in areas such as type theory, security, alias analysis and resource reasoning. Resolution of bound constraints is a problem steeped in graph theory: interdependencies between the variables are represented as a constraint graph. Additionally, constants are introduced into the system as concrete bounds over these variables, and the constants themselves are ordered over a lattice which is, once again, represented as a graph. Despite graph algorithms being central to bound constraint solving, most approaches to program optimisation that use bound constraint solving have treated their graph-theoretic foundations as a black box. Little has been done to investigate the computational costs or to design efficient graph algorithms for constraint resolution. Emerging examples of these lattices and bound constraint graphs, particularly from the domain of language-based security, show that these graphs and lattices are structurally diverse and can be arbitrarily large. Therefore, there is a pressing need to investigate the graph-theoretic foundations of bound constraint solving. In this thesis, we investigate the computational costs of bound constraint solving from a graph-theoretic perspective for Information Flow Analysis (IFA); IFA is a subfield of language-based security which verifies whether the confidentiality and integrity of classified information are preserved as it is manipulated by a program. We present a novel framework based on graph decomposition for solving the (atomic) bound constraint problem for IFA. Our approach enables us to abstract away from connections between individual vertices to those between sets of vertices in both the constraint graph and an accompanying security lattice which defines the ordering over constants.
Thereby, we are able to achieve significant speedups compared to state-of-the-art graph algorithms applied to bound constraint solving. More importantly, our algorithms adapt seamlessly to the structure of the constraint graph and the lattice. The computational cost of our approach is a function of the latent scope of decomposition in the constraint graph and the lattice; therefore, we enjoy the fastest runtime for every point in the structure-spectrum of these graphs and lattices. While the techniques in this dissertation are developed with IFA in mind, they can be extended to other applications of the bound constraint problem, such as type inference and program analysis frameworks which use annotated type systems, where constants are ordered over a lattice.
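To illustrate what resolving atomic bound constraints involves, here is a hedged Python sketch of the classic worklist (fixed-point) propagation that computes the least solution. It is a standard baseline, not the decomposition-based framework proposed in the thesis. As a simplifying assumption, security labels are sets of hypothetical category names ordered by inclusion (a powerset lattice, where join is union); an arbitrary security lattice would need its own join and order. The function and parameter names are hypothetical.

```python
from collections import defaultdict, deque


def solve_bounds(variables, lower, flows, upper):
    """Least solution of atomic bound constraints over a powerset lattice.

    lower: (const, var) pairs meaning const ⊑ var
    flows: (x, y) pairs meaning x ⊑ y (an edge of the constraint graph)
    upper: (var, const) pairs meaning var ⊑ const, checked after propagation
    Labels are frozensets; ⊑ is set inclusion and join is union (an assumption).
    """
    solution = {v: frozenset() for v in variables}  # start every variable at bottom
    for const, var in lower:
        solution[var] = solution[var] | const
    succ = defaultdict(list)
    for x, y in flows:
        succ[x].append(y)
    worklist = deque(variables)
    while worklist:
        x = worklist.popleft()
        for y in succ[x]:
            joined = solution[y] | solution[x]
            if joined != solution[y]:  # y's value grew: propagate from y again
                solution[y] = joined
                worklist.append(y)
    # The least solution satisfies every upper bound iff the system is satisfiable.
    violations = [(var, const) for var, const in upper if not solution[var] <= const]
    return solution, violations


LOW, HIGH = frozenset(), frozenset({"secret"})
solution, violations = solve_bounds(
    ["a", "b", "c", "d"],
    lower=[(HIGH, "a")],
    flows=[("a", "b"), ("b", "c")],
    upper=[("c", LOW), ("d", LOW)],
)
print(violations)
```

Here `c` must stay public but receives secret data through `a ⊑ b ⊑ c`, so its upper bound is reported as violated, while `d` is unaffected. The loop terminates because values only grow in a finite lattice; the thesis's point is that the cost of such propagation depends heavily on the structure of the constraint graph and the lattice.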

    Proceedings of the 8th Cologne-Twente Workshop on Graphs and Combinatorial Optimization

    The Cologne-Twente Workshop (CTW) on Graphs and Combinatorial Optimization started off as a series of workshops organized bi-annually by either Köln University or Twente University. As its importance grew over time, it re-centered its geographical focus by including northern Italy (CTW04 in Menaggio, on Lake Como, and CTW08 in Gargnano, on Lake Garda). This year, CTW (in its eighth edition) will be staged in France for the first time: more precisely, in the heart of Paris, at the Conservatoire National des Arts et Métiers (CNAM), from 2 to 4 June 2009, by a mixed organizing committee with members from LIX, Ecole Polytechnique and CEDRIC, CNAM.

    Fast Routing Table Construction Using Small Messages

    We describe a distributed randomized algorithm that computes approximate distances and routes that approximate shortest paths. Let $n$ denote the number of nodes in the graph, and let $\mathrm{HD}$ denote the hop diameter of the graph, i.e., the diameter of the graph when all edges are considered to have unit weight. Given $0 < \varepsilon \le 1/2$, our algorithm runs in $\tilde{O}(n^{1/2+\varepsilon} + \mathrm{HD})$ communication rounds using messages of $O(\log n)$ bits and guarantees a stretch of $O(\varepsilon^{-1} \log \varepsilon^{-1})$ with high probability. This is the first distributed algorithm approximating weighted shortest paths that uses small messages and runs in $\tilde{o}(n)$ time (in graphs where $\mathrm{HD} \in \tilde{o}(n)$). The time complexity nearly matches the lower bounds of $\tilde{\Omega}(\sqrt{n} + \mathrm{HD})$ in the small-messages model that hold for stateless routing (where routing decisions do not depend on the traversed path) as well as for approximating the weighted diameter. Our scheme replaces the original identifiers of the nodes by labels of size $O(\log \varepsilon^{-1} \log n)$. We show that no algorithm that keeps the original identifiers and runs for $\tilde{o}(n)$ rounds can achieve a polylogarithmic approximation ratio. Variations of our techniques yield a number of fast distributed approximation algorithms solving related problems using small messages. Specifically, we present algorithms that run in $\tilde{O}(n^{1/2+\varepsilon} + \mathrm{HD})$ rounds for a given $0 < \varepsilon \le 1/2$, and solve, with high probability, the following problems:
    - $O(\varepsilon^{-1})$-approximation for the Generalized Steiner Forest (the running time in this case has an additive $\tilde{O}(t^{1+2\varepsilon})$ term, where $t$ is the number of terminals);
    - $O(\varepsilon^{-2})$-approximation of weighted distances, using node labels of size $O(\varepsilon^{-1} \log n)$ and $\tilde{O}(n^{\varepsilon})$ bits of memory per node;
    - $O(\varepsilon^{-1})$-approximation of the weighted diameter;
    - $O(\varepsilon^{-3})$-approximate shortest paths using the labels $1, \dots, n$.
    Comment: 40 pages, 2 figures, extended abstract submitted to STOC'1
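To make the round-complexity discussion concrete, here is a hedged Python simulation of the textbook synchronous distributed Bellman-Ford algorithm. It is not the algorithm of the paper but the natural small-message baseline: each message carries a single distance value (so O(log n) bits, assuming polynomially bounded integer weights), and the result is exact (stretch 1), yet the number of rounds grows with the number of hops on shortest weighted paths, which can approach n - 1 even when HD is small. The function name and edge format are hypothetical.

```python
import math


def distributed_bellman_ford(n, weighted_edges, source):
    """Simulate synchronous distributed Bellman-Ford on an undirected network.

    In every round, each node with a finite estimate sends that one number to
    its neighbours. Returns (dist, rounds), where rounds counts the rounds in
    which at least one estimate improved.
    """
    nbrs = [[] for _ in range(n)]
    for u, v, w in weighted_edges:  # non-negative weights assumed
        nbrs[u].append((v, w))
        nbrs[v].append((u, w))
    dist = [math.inf] * n
    dist[source] = 0
    rounds = 0
    while True:
        new = dist[:]  # all messages of a round use the previous round's estimates
        for u in range(n):
            if dist[u] == math.inf:
                continue
            for v, w in nbrs[u]:
                new[v] = min(new[v], dist[u] + w)
        if new == dist:
            return dist, rounds
        dist, rounds = new, rounds + 1


edges = [(0, 1, 1), (1, 2, 1), (2, 3, 1), (3, 4, 1), (0, 2, 10), (0, 3, 10), (0, 4, 10)]
print(distributed_bellman_ford(5, edges, 0))
```

In this example, the unit-weight path 0-1-2-3-4 plus weight-10 shortcuts from node 0 give a hop diameter of 2, yet the estimates only stabilize at [0, 1, 2, 3, 4] after 4 rounds, because each improvement travels one hop per round along the light path. Algorithms such as the one described above avoid this hop-count dependence at the price of approximate distances.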

    ClouDiA: a deployment advisor for public clouds

    An increasing number of distributed data-driven applications are moving into shared public clouds. By sharing resources and operating at scale, public clouds promise higher utilization and lower costs than private clusters. To achieve high utilization, however, cloud providers inevitably allocate virtual machine instances non-contiguously, i.e., instances of a given application may end up in physically distant machines in the cloud. This allocation strategy can lead to large differences in average latency between instances. For a large class of applications, this difference can result in significant performance degradation, unless care is taken in how application components are mapped to instances. In this paper, we propose ClouDiA, a general deployment advisor that selects application node deployments minimizing either (i) the largest latency between application nodes, or (ii) the longest critical path among all application nodes. ClouDiA employs mixed-integer programming and constraint programming techniques to efficiently search the space of possible mappings of application nodes to instances. Through experiments with synthetic and real applications in Amazon EC2, we show that our techniques yield a 15% to 55% reduction in time-to-solution or service response time, without any need for modifying application code.
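Objective (i) can be made concrete with a brute-force Python sketch. It stands in for ClouDiA's mixed-integer and constraint programming search, which it does not reproduce; it assumes a measured instance-to-instance latency matrix and a list of communicating application-node pairs, and its names are hypothetical. Exhaustive enumeration is only feasible for a handful of nodes, since the number of injective mappings grows factorially, which is why a solver-based search is needed at realistic scale.

```python
from itertools import permutations


def best_deployment(app_edges, num_app_nodes, latency):
    """Map application nodes 0..num_app_nodes-1 to distinct instances so that the
    largest latency over communicating pairs is minimized (exhaustive search)."""
    num_instances = len(latency)
    best_cost, best_map = float("inf"), None
    # Each permutation assigns application node a to instance mapping[a].
    for mapping in permutations(range(num_instances), num_app_nodes):
        cost = max((latency[mapping[a]][mapping[b]] for a, b in app_edges), default=0)
        if cost < best_cost:
            best_cost, best_map = cost, mapping
    return best_map, best_cost


latency = [
    [0, 1, 5, 5],
    [1, 0, 5, 5],
    [5, 5, 0, 2],
    [5, 5, 2, 0],
]
mapping, cost = best_deployment([(0, 1), (2, 3)], 4, latency)
print(mapping, cost)
```

Here instances 0-1 and 2-3 form two close pairs (latency 1 and 2) that are far from each other (latency 5); an application whose nodes communicate as pairs (0, 1) and (2, 3) is best placed one pair per cluster, giving a largest latency of 2 instead of 5.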

    Subject Index Volumes 1–200


    Optimizing and Reoptimizing: tackling static and dynamic combinatorial problems

    As suggested by the title, this thesis addresses both static and dynamic problems of Operations Research, either by designing new procedures or by adapting well-known algorithmic schemes. Specifically, the first part of the thesis is devoted to three variants of the widely studied Shortest Path Problem, one of which is defined on dynamic graphs. First, the Reoptimization of Shortest Paths in the case of multiple and generic cost changes is addressed with an exact algorithm, whose performance is compared with Dijkstra's label-setting procedure in order to determine which approach should be preferred. Second, the k-Color Shortest Path Problem is tackled. It is a recent problem, defined on an edge-constrained graph, for which a Dynamic Programming algorithm is proposed here; its performance is compared with the state-of-the-art solution approach, namely a Branch & Bound procedure. Finally, the Resource Constrained Clustered Shortest Path Tree Problem is presented. It is a newly defined problem for which both a mathematical model and a Branch & Price procedure are detailed here. Moreover, the performance of this solution approach is compared with that of the CPLEX solver. The first part of the thesis also discusses Path Planning in Urban Air Mobility, considering both the definition of the Free-Space Maps and the computation of the trajectories. For the former, three different but correlated discretization methods are described; for the latter, the resulting shortest path problems are solved in two steps, offline and online. In addition, it is checked whether the reoptimization algorithm can be used in the online step. In the second part of this thesis, the recently studied Additive Manufacturing Machine Scheduling Problem with non-identical machines is presented.
Specifically, a Reinforcement Learning Iterated Local Search meta-heuristic featuring a Q-learning Variable Neighbourhood Search is described to solve this problem, and its performance is compared with that of the CPLEX solver. It is worth mentioning that, for each of the proposed approaches, thorough experimentation is performed, and each chapter includes a detailed analysis of the results in order to appraise the performance of the method and to detect its limits.
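For the k-Color Shortest Path Problem mentioned above, here is a hedged Python sketch, assuming the usual definition in which every edge has a non-negative cost and a color, and a feasible path may use edges of at most k distinct colors. It runs Dijkstra's label-setting procedure on an expanded state space of (node, set of colors used so far); this is a swapped-in exact method, not the Dynamic Programming algorithm of the thesis, and its state space can grow exponentially with k. Function and parameter names are hypothetical.

```python
import heapq
from itertools import count


def k_color_shortest_path(edges, source, target, k):
    """Minimum cost of a source->target path using edges of at most k distinct colors.

    edges: directed (u, v, cost, color) tuples with non-negative costs.
    Returns None when no such path exists.
    """
    adj = {}
    for u, v, cost, color in edges:
        adj.setdefault(u, []).append((v, cost, color))
    tie = count()  # tie-breaker so the heap never compares states directly
    start = (source, frozenset())
    dist = {start: 0}
    heap = [(0, next(tie), start)]
    while heap:
        d, _, state = heapq.heappop(heap)
        if d > dist[state]:
            continue  # stale heap entry
        node, colors = state
        if node == target:
            return d  # label setting: the first settled target state is optimal
        for v, cost, color in adj.get(node, []):
            used = colors | {color}
            if len(used) > k:
                continue  # this extension would exceed the color budget
            nxt = (v, used)
            if d + cost < dist.get(nxt, float("inf")):
                dist[nxt] = d + cost
                heapq.heappush(heap, (d + cost, next(tie), nxt))
    return None


edges = [(0, 1, 1, "red"), (1, 3, 1, "blue"), (0, 2, 2, "red"), (2, 3, 2, "red")]
print(k_color_shortest_path(edges, 0, 3, 2), k_color_shortest_path(edges, 0, 3, 1))
```

With these edges, the cheapest path from 0 to 3 costs 2 when k = 2 but 4 when k = 1, since the cheaper route needs both red and blue edges. Keeping the color set in the state is what makes the label-setting argument valid: two partial paths reaching the same node with different color sets are not interchangeable.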

    A q-SDH-based Graph Signature Scheme on Full-Domain Messages with Efficient Protocols

    A graph signature scheme is a digital signature scheme that allows a recipient to obtain a signature on a graph and subsequently prove properties thereof in zero-knowledge proofs of knowledge. While graph signatures are known to be expressive enough to encode statements from NP languages, one of their main uses is in topology certification and confidentiality-preserving security assurance. In this paper, we present an efficient and provably secure graph signature scheme in the standard model with a tight reduction. Based on the MoniPoly attribute-based credential system, this new graph signature scheme offers zero-knowledge proofs of possession of the signature itself, as well as confidentiality-preserving show proofs on logical statements such as the existence of vertices, graph connectivity, or isolation.