Composition of Biochemical Networks using Domain Knowledge
Graph composition has a variety of practical applications. In drug development, for instance, understanding possible drug interactions requires merging known networks and examining topological variants arising from such composition. Similarly, the design of sensor nets may build on existing network infrastructure, and the superposition of one network on another can help with network design and optimisation. The problem of network composition has received little attention in algorithm and database research. Here, we work with biological networks encoded in the Systems Biology Markup Language (SBML), which is based on XML syntax. We focus on XML merging and examine the algorithmic and performance challenges we encountered in our work, along with possible solutions to the graph merge problem. We show that our XML graph merge solution performs well in practice and improves on existing toolsets. This leads us to future work directions and a research plan aiming to implement graph merging primitives that use domain knowledge to perform composition and decomposition on specific graphs in the biological domain.
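The superposition step described above can be sketched in a few lines. This is an illustrative merge over adjacency dicts, not the authors' SBML toolset; the species names are hypothetical:

```python
def merge_graphs(g1, g2):
    """Merge two graphs given as adjacency dicts {node: set(neighbors)}.

    Nodes with the same identifier (e.g. matching SBML species IDs) are
    unified; edge sets are taken as the union. A minimal sketch of graph
    superposition, not the paper's actual merge tool.
    """
    merged = {}
    for g in (g1, g2):
        for node, nbrs in g.items():
            merged.setdefault(node, set()).update(nbrs)
    return merged

# Two hypothetical pathway fragments sharing the species "ATP"
a = {"glucose": {"ATP"}, "ATP": set()}
b = {"ATP": {"ADP"}, "ADP": set()}
m = merge_graphs(a, b)
```

Topological variants arising from the composition (here, the new path glucose -> ATP -> ADP) can then be examined on the merged graph.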
ON LEARNING COMPOSABLE AND DECOMPOSABLE GENERATIVE MODELS USING PRIOR INFORMATION
Within the field of machine learning, supervised learning has gained much success recently, and the research focus is moving towards unsupervised learning. A generative model is a powerful approach to unsupervised learning that models the data distribution. Deep generative models, such as generative adversarial networks (GANs), can generate high-quality samples for various applications. However, these generative models are not easy to understand: while it is easy to generate samples from them, the breadth of the samples that can be generated is difficult to ascertain. Further, most existing models are trained from scratch and do not take advantage of the compositional nature of the data. To address these deficiencies, I propose a composition and decomposition framework for generative models. This framework includes three types of components: part generators, a composition operation, and a decomposition operation. In the framework, a generative model can have multiple part generators that generate different parts of a sample independently. What a part generator should generate is explicitly defined by users. This explicit "division of responsibility" provides more modularity to the whole system. As in software design, this modular modeling makes each module (part generator) more reusable and allows users to build increasingly complex generative models from simpler ones. The composition operation composes the parts from the part generators into a whole sample, whereas the decomposition operation is the inverse of composition. However, given only the composed data, the components of the framework are not necessarily identifiable. Inspired by other signal decomposition methods, we incorporate prior information into the model to solve this problem. We show that we can identify all of the components by incorporating prior information about one or more of them.
Furthermore, we show both theoretically and experimentally how much prior information is needed to identify the components of the model. Concerning applications of this framework, we apply it to sparse dictionary learning (SDL) and present our dictionary learning method, MOLDL. With MOLDL, we can easily include prior information about part generators; thus, we learn a generative model that yields a better signal decomposition operation. The experiments show our method decomposes ion mass signals more accurately than other signal decomposition methods. Further, we apply the framework to generative adversarial networks (GANs). Our composition/decomposition GAN learns foreground and background part generators that are responsible for different parts of the data. The resulting generators are easier to control and understand. We also show, both theoretically and experimentally, how much prior information is needed to identify the different components of the framework. Specifically, we show that we can learn a reasonable part generator given only the composed data and the composition operation. Moreover, we show that the composable generators have better performance than their non-composable counterparts. Lastly, we propose two use cases showing that transfer learning is feasible under this framework.
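The part-generator/composition/decomposition structure can be made concrete with a toy sketch. This assumes a mask-based composition (foreground pasted over background, as in image models); it is illustrative only, not the thesis's actual GAN architecture, and all generators and values are hypothetical:

```python
def background_gen():
    # Hypothetical part generator: a flat background "image" (list of floats)
    return [0.1] * 4

def foreground_gen():
    # Hypothetical part generator: a sparse foreground plus its binary mask
    mask = [0, 1, 1, 0]
    fg = [0.0, 0.9, 0.8, 0.0]
    return fg, mask

def compose(fg, mask, bg):
    # Composition operation: take the foreground where the mask is set,
    # otherwise the background
    return [f if m else b for f, m, b in zip(fg, mask, bg)]

def decompose(x, mask):
    # Decomposition operation (inverse of composition) under a known mask:
    # recover the foreground part from the composed sample
    return [xi if m else 0.0 for xi, m in zip(x, mask)]

fg, mask = foreground_gen()
bg = background_gen()
x = compose(fg, mask, bg)          # composed sample
fg_rec = decompose(x, mask)        # recovered foreground part
```

Each part generator is responsible for one part of the sample, so a generator can be swapped or reused independently, which is the modularity argument made above.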
Evaluating Overfit and Underfit in Models of Network Community Structure
A common data mining task on networks is community detection, which seeks an
unsupervised decomposition of a network into structural groups based on
statistical regularities in the network's connectivity. Although many methods
exist, the No Free Lunch theorem for community detection implies that each
makes some kind of tradeoff, and no algorithm can be optimal on all inputs.
Thus, different algorithms will over- or underfit on different inputs, finding
more, fewer, or just different communities than is optimal, and evaluation
methods that use a metadata partition as a ground truth will produce misleading
conclusions about general accuracy. Here, we present a broad evaluation of over- and underfitting in community detection, comparing the behavior of 16
state-of-the-art community detection algorithms on a novel and structurally
diverse corpus of 406 real-world networks. We find that (i) algorithms vary
widely both in the number of communities they find and in their corresponding
composition, given the same input, (ii) algorithms can be clustered into
distinct high-level groups based on similarities of their outputs on real-world
networks, and (iii) these differences induce wide variation in accuracy on link
prediction and link description tasks. We introduce a new diagnostic for
evaluating overfitting and underfitting in practice, and use it to roughly
divide community detection methods into general and specialized learning
algorithms. Across methods and inputs, Bayesian techniques based on the
stochastic block model and a minimum description length approach to
regularization represent the best general learning approach, but can be
outperformed under specific circumstances. These results introduce both a
theoretically principled approach to evaluating over- and underfitting in models
of network community structure and a realistic benchmark by which new methods
may be evaluated and compared.
Comment: 22 pages, 13 figures, 3 tables
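The link-prediction evaluation idea above (a partition overfits or underfits to the extent that within-community membership fails to predict held-out edges) can be sketched with a crude scorer. This is illustrative only, not the paper's diagnostic; the partition and edge sets are hypothetical:

```python
def link_prediction_score(partition, held_out_edges, non_edges):
    """Score a community partition by link prediction: a pair is 'predicted'
    to be an edge when both endpoints share a community. Returns the fraction
    of (held-out edge, non-edge) pairs where the held-out edge is ranked
    higher, counting ties as half (a crude AUC). `partition` maps
    node -> community id."""
    def same(u, v):
        return 1.0 if partition[u] == partition[v] else 0.0
    wins = ties = 0
    for e in held_out_edges:
        for ne in non_edges:
            se, sn = same(*e), same(*ne)
            if se > sn:
                wins += 1
            elif se == sn:
                ties += 1
    total = len(held_out_edges) * len(non_edges)
    return (wins + 0.5 * ties) / total

# Hypothetical partition of 4 nodes into two communities
part = {1: 0, 2: 0, 3: 1, 4: 1}
score = link_prediction_score(part, [(1, 2)], [(1, 3), (2, 4)])
```

A partition that lumps every node into one community scores 0.5 here (no better than chance), which is the kind of underfitting signal the diagnostic above formalizes.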
Warming-induced permafrost thaw exacerbates tundra soil carbon decomposition mediated by microbial community.
BACKGROUND: It is well known that global warming affects high-latitude tundra underlain with permafrost. This raises the serious concern that decomposition of the soil organic carbon (SOC) previously stored in this region, which accounts for about 50% of the world's SOC storage, will cause a positive feedback that accelerates climate warming. We have previously shown that short-term warming (1.5 years) stimulates rapid, microbe-mediated decomposition of tundra soil carbon without affecting the composition of the soil microbial community (based on a depth of 42684 sequence reads of 16S rRNA gene amplicons per 3 g of soil sample). RESULTS: We show that longer-term (5 years) experimental winter warming at the same site altered microbial communities (p < 0.040). Thaw depth correlated most strongly with community assembly and interaction networks, implying that warming-accelerated tundra thaw fundamentally restructured the microbial communities. Both carbon decomposition and methanogenesis genes increased in relative abundance under warming, and their functional structures strongly correlated (R2 > 0.725, p < 0.001) with ecosystem respiration or CH4 flux. CONCLUSIONS: Our results demonstrate that microbial responses associated with carbon cycling could lead to positive feedbacks that accelerate SOC decomposition in tundra regions, which is alarming because SOC loss is unlikely to subside owing to changes in microbial community composition.
Computational Complexity of Atomic Chemical Reaction Networks
Informally, a chemical reaction network is "atomic" if each reaction may be
interpreted as the rearrangement of indivisible units of matter. There are
several reasonable definitions formalizing this idea. We investigate the
computational complexity of deciding whether a given network is atomic
according to each of these definitions.
Our first definition, primitive atomic, which requires each reaction to
preserve the total number of atoms, is shown to be equivalent to mass
conservation. Since it is known that it can be decided in polynomial time
whether a given chemical reaction network is mass-conserving, the equivalence
gives an efficient algorithm to decide primitive atomicity.
Another definition, subset atomic, further requires that all atoms are
species. We show that deciding whether a given network is subset atomic is in
, and the problem "is a network subset atomic with respect to a
given atom set" is strongly -.
A third definition, reachably atomic, studied by Adleman, Gopalkrishnan et
al., further requires that each species has a sequence of reactions splitting
it into its constituent atoms. We show that there is a to decide whether a given network is reachably atomic, improving
upon the result of Adleman et al. that the problem is . We
show that the reachability problem for reachably atomic networks is
-.
Finally, we demonstrate equivalence relationships between our definitions and
some special cases of another existing definition of atomicity due to Gnacadja.
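The equivalence between primitive atomicity and mass conservation suggests a concrete check: a network is mass-conserving if some assignment of positive masses to species balances every reaction. Deciding whether such an assignment exists is a linear-programming feasibility question (consistent with the polynomial-time result cited above); the sketch below only verifies a given candidate mass vector, and the toy reaction is illustrative, not from the paper:

```python
def conserves_mass(reactions, mass):
    """Verify mass conservation for a candidate mass assignment.

    `mass` maps each species to a positive mass; each reaction is a pair
    (reactants, products) of dicts species -> stoichiometric coefficient.
    Returns True iff every reaction preserves total mass. (Deciding whether
    *some* positive mass vector exists is an LP feasibility problem; this
    sketch only checks one candidate.)
    """
    def total(side):
        return sum(coef * mass[s] for s, coef in side.items())
    return all(abs(total(r) - total(p)) < 1e-9 for r, p in reactions)

# Hypothetical toy network: 2 H2 + O2 -> 2 H2O, with molecular masses
rxns = [({"H2": 2, "O2": 1}, {"H2O": 2})]
good_mass = {"H2": 2.0, "O2": 32.0, "H2O": 18.0}
```

Under `good_mass` the single reaction balances (4 + 32 = 36 on both sides); perturbing any species' mass breaks conservation, which the check detects.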
Automatic Mechanism Generation for Pyrolysis of Di-Tert-Butyl Sulfide
The automated Reaction Mechanism Generator (RMG), using rate parameters derived from ab initio CCSD(T) calculations, is used to build reaction networks for the thermal decomposition of di-tert-butyl sulfide. Simulation results were compared with data from pyrolysis experiments with and without the addition of a cyclohexene inhibitor. Purely free-radical chemistry did not properly explain the reactivity of di-tert-butyl sulfide, as previous experimental work showed that the sulfide decomposed via first-order kinetics both in the presence and in the absence of the radical inhibitor. The concerted unimolecular decomposition of di-tert-butyl sulfide to form isobutene and tert-butyl thiol was found to be a key reaction in both cases, as it explained the first-order sulfide decomposition. The computer-generated kinetic model predictions quantitatively match most of the experimental data, but the model is apparently missing pathways for radical-induced decomposition of thiols to form elemental sulfur. Cyclohexene has a significant effect on the composition of the radical pool, and this leads to dramatic changes in the resulting product distribution.
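The first-order kinetics invoked above follow the integrated rate law c(t) = c0 * exp(-k t), so the decay depends only on the rate constant, not on the radical pool. A minimal numerical sketch (the rate constant and half-life are illustrative values, not the paper's measured parameters):

```python
import math

def first_order_conc(c0, k, t):
    # Integrated first-order rate law: c(t) = c0 * exp(-k * t)
    return c0 * math.exp(-k * t)

def rate_constant_from_half_life(t_half):
    # For first-order decay, k = ln(2) / t_half
    return math.log(2) / t_half

# Hypothetical half-life of 10 time units for the sulfide
k = rate_constant_from_half_life(10.0)
```

After one half-life the concentration halves regardless of the initial amount, which is the signature used above to rule out purely free-radical (non-first-order) chemistry.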
Detection of the elite structure in a virtual multiplex social system by means of a generalized k-core
Elites are subgroups of individuals within a society that have the ability
and means to influence, lead, govern, and shape societies. Members of elites
are often well connected individuals, which enables them to impose their
influence to many and to quickly gather, process, and spread information. Here
we argue that elites are not only composed of highly connected individuals, but
also of intermediaries connecting hubs to form a cohesive and structured
elite-subgroup at the core of a social network. For this purpose we present a
generalization of the k-core algorithm that allows one to identify a social core
that is composed of well-connected hubs together with their 'connectors'. We
show the validity of the idea in the framework of a virtual world defined by a
massive multiplayer online game, on which we have complete information of
various social networks. Exploiting this multiplex structure, we find that the
hubs of the generalized k-core identify those individuals that are high
social performers in terms of a series of indicators that are available in the
game. In addition, using a combined strategy which involves the generalized
k-core and the recently introduced -core, the elites of the different
'nations' present in the game are perfectly identified as modules of the
generalized k-core. Interesting sudden shifts in the composition of the elite
cores are observed at deep levels. We show that elite detection with the
traditional k-core is not possible in a reliable way. The proposed method
might be useful in a series of more general applications, such as community
detection.
Comment: 13 figures, 3 tables, 19 pages. Accepted for publication in PLoS ONE
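The classical k-core underlying the generalization above is computed by iterative peeling: repeatedly delete nodes of degree below k until none remain. A minimal sketch (the generalized variant that additionally retains low-degree 'connectors' between hubs is not reproduced here; the toy graph is hypothetical):

```python
def k_core(adj, k):
    """Return the node set of the k-core of an undirected graph.

    adj: {node: set(neighbors)}. Standard peeling: repeatedly remove nodes
    with degree < k, updating neighbor sets, until the graph stabilizes.
    This is the classical algorithm the abstract's generalization builds on.
    """
    adj = {u: set(vs) for u, vs in adj.items()}  # work on a copy
    changed = True
    while changed:
        changed = False
        for u in list(adj):
            if u in adj and len(adj[u]) < k:
                for v in adj[u]:
                    adj[v].discard(u)
                del adj[u]
                changed = True
    return set(adj)

# Hypothetical toy graph: a triangle (1,2,3) with a pendant node 4
toy = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
core2 = k_core(toy, 2)
```

Peeling with k=2 strips the pendant node and keeps the triangle; as the abstract notes, such degree-only peeling discards exactly the low-degree 'connectors' the generalized version is designed to retain.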
A design model for Open Distributed Processing systems
This paper proposes design concepts that allow the conception, understanding and development of complex technical structures for open distributed systems. The proposed concepts are related to, and partially motivated by, present work on Open Distributed Processing (ODP). In contrast to the current ODP approach, the concepts aim to support a design trajectory with several related abstraction levels. Simple examples are used to illustrate the proposed concepts.