
    Composition of Biochemical Networks using Domain Knowledge

    Graph composition arises in a variety of practical settings. In drug development, for instance, understanding possible drug interactions requires merging known networks and examining the topological variants arising from such composition. Similarly, the design of sensor nets may use existing network infrastructures, and the superposition of one network on another can help with network design and optimisation. The problem of network composition has not received much attention in algorithm and database research. Here, we work with biological networks encoded in the Systems Biology Markup Language (SBML), which is based on XML syntax. We focus on XML merging and examine the algorithmic and performance challenges we encountered in our work, along with possible solutions to the graph merge problem. We show that our XML graph merge solution performs well in practice and improves on existing toolsets. This leads us to future work and a plan of research that aims to implement graph-merging primitives using domain knowledge to perform composition and decomposition on specific graphs in the biological domain.
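    To make the merge primitive concrete, here is a minimal sketch of node-identity graph composition in Python using networkx. This is not the paper's SBML/XML implementation; the pathway names and the use of nx.compose are illustrative assumptions.

```python
import networkx as nx

# Two toy pathway fragments: nodes are species, edges are reaction steps.
# Names are hypothetical placeholders, not taken from the paper.
g1 = nx.DiGraph([("glucose", "G6P"), ("G6P", "F6P")])
g2 = nx.DiGraph([("G6P", "6PG"), ("6PG", "Ru5P")])

# Merge on shared node identity: species appearing in both networks
# (here "G6P") become the junction points of the composed graph.
merged = nx.compose(g1, g2)

print(sorted(merged.nodes()))  # union of species
print(sorted(merged.edges()))  # union of reaction steps
```

    In an SBML setting, the hard part the abstract alludes to is deciding when two XML elements denote the same biological entity; domain knowledge supplies that identity relation.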

    ON LEARNING COMPOSABLE AND DECOMPOSABLE GENERATIVE MODELS USING PRIOR INFORMATION

    Within the field of machine learning, supervised learning has gained much success recently, and the research focus is moving towards unsupervised learning. A generative model is a powerful approach to unsupervised learning that models the data distribution. Deep generative models such as generative adversarial networks (GANs) can generate high-quality samples for various applications. However, these generative models are not easy to understand. While it is easy to generate samples from these models, the breadth of the samples that can be generated is difficult to ascertain. Further, most existing models are trained from scratch and do not take advantage of the compositional nature of the data. To address these deficiencies, I propose a composition and decomposition framework for generative models. This framework includes three types of components: part generators, a composition operation, and a decomposition operation. In the framework, a generative model can have multiple part generators that generate different parts of a sample independently. What a part generator should generate is explicitly defined by users. This explicit "division of responsibility" provides more modularity to the whole system. As in software design, this modular modeling makes each module (part generator) more reusable and allows users to build increasingly complex generative models from simpler ones. The composition operation composes the parts from the part generators into a whole sample, whereas the decomposition operation is the inverse of composition. However, given only the composed data, the components of the framework are not necessarily identifiable. Inspired by other signal decomposition methods, we incorporate prior information into the model to solve this problem. We show that we can identify all of the components by incorporating prior information about one or more of them. Furthermore, we show both theoretically and experimentally how much prior information is needed to identify the components of the model. Concerning applications of this framework, we apply it to sparse dictionary learning (SDL) and offer our dictionary learning method, MOLDL. With MOLDL, we can easily include prior information about part generators; thus, we learn a generative model that yields a better signal decomposition operation. The experiments show that our method decomposes ion mass signals more accurately than other signal decomposition methods. Further, we apply the framework to generative adversarial networks (GANs). Our composition/decomposition GAN learns foreground and background part generators that are responsible for different parts of the data. The resulting generators are easier to control and understand. Also, we show both theoretically and experimentally how much prior information is needed to identify the different components of the framework. Specifically, we show that we can learn a reasonable part generator given only the composed data and the composition operation. Moreover, we show that the composable generators have better performance than their non-composable counterparts. Lastly, we propose two use cases showing that transfer learning is feasible under this framework.
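    A minimal sketch of the framework's three component types, with all names and the alpha-compositing operation chosen purely for illustration (the thesis's part generators are trained networks, not these toy functions):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical part generators: each is responsible for one part of a sample.
def foreground_generator(z):
    fg = z[:16].reshape(4, 4)            # foreground pixels
    mask = (fg > 0.5).astype(float)      # where the foreground lives
    return fg, mask

def background_generator(z):
    return z[16:].reshape(4, 4)          # background pixels

# Composition operation: alpha-composite the foreground over the background.
def compose(fg, mask, bg):
    return mask * fg + (1.0 - mask) * bg

# Decomposition operation: the inverse of composition.  It is identifiable
# here only because the mask (prior information about one component) is known.
def decompose(x, mask):
    return mask * x, (1.0 - mask) * x

z = rng.uniform(size=32)
fg, mask = foreground_generator(z)
bg = background_generator(z)
x = compose(fg, mask, bg)
fg_rec, bg_rec = decompose(x, mask)
assert np.allclose(mask * fg, fg_rec)  # masked foreground is recovered
```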

    Evaluating Overfit and Underfit in Models of Network Community Structure

    A common data mining task on networks is community detection, which seeks an unsupervised decomposition of a network into structural groups based on statistical regularities in the network's connectivity. Although many methods exist, the No Free Lunch theorem for community detection implies that each makes some kind of tradeoff, and no algorithm can be optimal on all inputs. Thus, different algorithms will over- or underfit on different inputs, finding more, fewer, or simply different communities than is optimal, and evaluation methods that use a metadata partition as ground truth will produce misleading conclusions about general accuracy. Here, we present a broad evaluation of over- and underfitting in community detection, comparing the behavior of 16 state-of-the-art community detection algorithms on a novel and structurally diverse corpus of 406 real-world networks. We find that (i) algorithms vary widely both in the number of communities they find and in their corresponding composition, given the same input, (ii) algorithms can be clustered into distinct high-level groups based on similarities of their outputs on real-world networks, and (iii) these differences induce wide variation in accuracy on link prediction and link description tasks. We introduce a new diagnostic for evaluating overfitting and underfitting in practice, and use it to roughly divide community detection methods into general and specialized learning algorithms. Across methods and inputs, Bayesian techniques based on the stochastic block model and a minimum description length approach to regularization represent the best general learning approach, but can be outperformed under specific circumstances. These results introduce both a theoretically principled approach to evaluating over- and underfitting in models of network community structure and a realistic benchmark by which new methods may be evaluated and compared.
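    Finding (i) is easy to reproduce in miniature: two off-the-shelf methods run on the same graph can return different numbers of communities. A small sketch using networkx built-ins (not the paper's 16-algorithm, 406-network benchmark):

```python
import networkx as nx
from networkx.algorithms import community

G = nx.karate_club_graph()  # stand-in for one corpus network

# Same input, two algorithms: the partitions typically differ in both
# the number of communities and their composition.
greedy = community.greedy_modularity_communities(G)
labelprop = list(community.label_propagation_communities(G))

print("greedy modularity:", len(greedy), "communities")
print("label propagation:", len(labelprop), "communities")
```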

    Computational Complexity of Atomic Chemical Reaction Networks

    Informally, a chemical reaction network is "atomic" if each reaction may be interpreted as the rearrangement of indivisible units of matter. There are several reasonable definitions formalizing this idea. We investigate the computational complexity of deciding whether a given network is atomic according to each of these definitions. Our first definition, primitive atomic, which requires each reaction to preserve the total number of atoms, is shown to be equivalent to mass conservation. Since it is known that it can be decided in polynomial time whether a given chemical reaction network is mass-conserving, this equivalence gives an efficient algorithm to decide primitive atomicity. Another definition, subset atomic, further requires that all atoms are species. We show that deciding whether a given network is subset atomic is in NP, and that the problem "is a network subset atomic with respect to a given atom set" is strongly NP-complete. A third definition, reachably atomic, studied by Adleman, Gopalkrishnan et al., further requires that each species has a sequence of reactions splitting it into its constituent atoms. We show that there is a polynomial-time algorithm to decide whether a given network is reachably atomic, improving upon the result of Adleman et al. that the problem is decidable. We show that the reachability problem for reachably atomic networks is PSPACE-complete. Finally, we demonstrate equivalence relationships between our definitions and some special cases of another existing definition of atomicity due to Gnacadja.
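    The primitive-atomic/mass-conservation equivalence suggests a simple decision procedure: a network conserves mass iff some strictly positive mass vector lies in the left kernel of its stoichiometric matrix, which is a linear feasibility problem. A sketch under that reading (the toy network and the scipy formulation are our own, not the paper's):

```python
import numpy as np
from scipy.optimize import linprog

# Net stoichiometric matrix S (species x reactions) for a toy network:
#   r1: X2 + Y2 -> 2 XY      r2: 2 XY -> X2 + Y2
S = np.array([[-1,  1],    # X2
              [-1,  1],    # Y2
              [ 2, -2]])   # XY

# Mass conservation: find m > 0 with m^T S = 0.  Scaling lets us demand
# m >= 1 instead of m > 0, so plain LP feasibility decides the property.
n = S.shape[0]
res = linprog(c=np.zeros(n), A_eq=S.T, b_eq=np.zeros(S.shape[1]),
              bounds=[(1, None)] * n)

print("mass-conserving (primitive atomic):", res.status == 0)
```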

    Automatic Mechanism Generation for Pyrolysis of Di-Tert-Butyl Sulfide

    The automated Reaction Mechanism Generator (RMG), using rate parameters derived from ab initio CCSD(T) calculations, is used to build reaction networks for the thermal decomposition of di-tert-butyl sulfide. Simulation results were compared with data from pyrolysis experiments with and without the addition of a cyclohexene inhibitor. Purely free-radical chemistry did not properly explain the reactivity of di-tert-butyl sulfide, as previous experimental work showed that the sulfide decomposed via first-order kinetics both in the presence and in the absence of the radical inhibitor. The concerted unimolecular decomposition of di-tert-butyl sulfide to form isobutene and tert-butyl thiol was found to be a key reaction in both cases, as it explained the first-order sulfide decomposition. The computer-generated kinetic model's predictions quantitatively match most of the experimental data, but the model is apparently missing pathways for the radical-induced decomposition of thiols to form elemental sulfur. Cyclohexene has a significant effect on the composition of the radical pool, and this leads to dramatic changes in the resulting product distribution.
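    The first-order signature mentioned above is straightforward to check numerically: for d[S]/dt = -k[S], ln[S] falls linearly in t with slope -k regardless of the initial loading. A small sketch with an illustrative rate constant (not a value from the paper):

```python
import numpy as np

k = 2.0e-3                        # 1/s, hypothetical rate constant
t = np.linspace(0.0, 2000.0, 50)  # time grid in seconds

# First-order decay [S](t) = [S]0 * exp(-k t): the fitted slope of
# ln[S] vs t recovers -k independently of the initial concentration,
# which is the hallmark used to diagnose first-order kinetics.
for s0 in (0.5, 1.0, 2.0):
    s = s0 * np.exp(-k * t)
    slope = np.polyfit(t, np.log(s), 1)[0]
    print(f"[S]0 = {s0}: slope = {slope:.2e} (expect {-k:.2e})")
```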

    Detection of the elite structure in a virtual multiplex social system by means of a generalized K-core

    Elites are subgroups of individuals within a society that have the ability and means to influence, lead, govern, and shape societies. Members of elites are often well-connected individuals, which enables them to impose their influence on many and to quickly gather, process, and spread information. Here we argue that elites are not only composed of highly connected individuals, but also of intermediaries connecting hubs to form a cohesive and structured elite subgroup at the core of a social network. For this purpose we present a generalization of the K-core algorithm that makes it possible to identify a social core composed of well-connected hubs together with their 'connectors'. We show the validity of the idea in the framework of a virtual world defined by a massive multiplayer online game, for which we have complete information on various social networks. Exploiting this multiplex structure, we find that the hubs of the generalized K-core identify those individuals who are high social performers in terms of a series of indicators available in the game. In addition, using a combined strategy that involves the generalized K-core and the recently introduced M-core, the elites of the different 'nations' present in the game are perfectly identified as modules of the generalized K-core. Interesting sudden shifts in the composition of the elite cores are observed at deep levels. We show that elite detection with the traditional K-core is not possible in a reliable way. The proposed method might be useful in a series of more general applications, such as community detection.
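    For orientation, the classical K-core that the paper generalizes can be computed directly in networkx; the generalization (retaining low-degree 'connectors' between hubs) is the paper's contribution and is not shown here:

```python
import networkx as nx

G = nx.barabasi_albert_graph(500, 3, seed=1)  # hub-dominated toy network

# Standard K-core peeling: repeatedly delete nodes of degree < k.
core_number = nx.core_number(G)   # deepest core each node survives into
k_max = max(core_number.values())
innermost = nx.k_core(G, k=k_max)

print(f"deepest core: k = {k_max}, {innermost.number_of_nodes()} nodes")
```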

    A design model for Open Distributed Processing systems

    This paper proposes design concepts that allow the conception, understanding and development of complex technical structures for open distributed systems. The proposed concepts are related to, and partially motivated by, present work on Open Distributed Processing (ODP). As opposed to the current ODP approach, the concepts are aimed at supporting a design trajectory with several related abstraction levels. Simple examples are used to illustrate the proposed concepts.