    Distributed Knowledge Discovery in Large Scale Peer-to-Peer Networks

    Explosive growth in the availability of various kinds of data in distributed locations has created an unprecedented opportunity to develop distributed knowledge discovery (DKD) techniques. DKD embraces the growing trend of merging computation with communication by performing distributed data analysis and modeling with minimal communication of data. Most current state-of-the-art DKD systems lack scalability, robustness and adaptability because they depend on a centralized model for building the knowledge discovery model. Peer-to-peer networks offer a more scalable and fault-tolerant computing platform for building distributed knowledge discovery models than client-server platforms. Algorithms and communication protocols have been developed for file search and discovery services in peer-to-peer networks. File search algorithms are concerned with identifying a peer and discovering a file on that peer, so most current peer-to-peer file search networks act as directory services. Distributed knowledge discovery differs from file search, however, and raises new issues and challenges. Algorithms and communication protocols for knowledge discovery must ensure that every peer in the network discovers the correct knowledge discovery model, as if it had been given the combined database. Algorithms and communication protocols for DKD are therefore mainly a matter of distributed computing. The distributed computations are entirely asynchronous, impose very little communication overhead, transparently tolerate network topology changes and peer failures, and quickly adjust to changes in the data as they occur. Another important aspect of distributed computation in a peer-to-peer network is that most communication between peers is local, i.e. the knowledge discovery model is learned at each peer using information gathered from a very small neighborhood whose size is independent of the size of the peer-to-peer network. Because peer-to-peer environments impose hard constraints on data and computation, the challenge is to show that useful information can still be extracted from the distributed data effectively and dependably. Implementing a distributed algorithm in an asynchronous, decentralized environment is the hardest challenge. DKD in a peer-to-peer network raises issues related to the impracticality of global communication and global synchronization, on-the-fly data updates, lack of control, accuracy of computation, the need to share resources with other applications, and frequent failure and recovery of resources. We propose a methodology based on novel distributed algorithms and communication protocols to perform DKD in a peer-to-peer network, and we investigate the performance of these algorithms and protocols through analysis and simulations.
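    The abstract does not spell out its algorithms, but the idea of peers learning a global quantity through purely local, asynchronous exchanges can be illustrated with pairwise gossip averaging, a standard technique swapped in here for illustration and not the authors' protocol. In this minimal Python sketch, the ring topology, the peer values, the fixed round budget, and the function name `gossip_average` are all hypothetical. Each step lets one random peer average its estimate with one random neighbor, which preserves the network-wide total and drives every estimate toward the global mean.

```python
import random


def gossip_average(values, neighbors, rounds=2000, seed=0):
    """Pairwise gossip averaging over a peer graph (illustrative sketch).

    values: initial local value held by each peer (index = peer id)
    neighbors: dict mapping each peer id to a list of neighboring peer ids
    rounds: number of random pairwise exchanges (a fixed budget, assumed here)
    Each exchange touches only two neighbors, and averaging a pair leaves the
    network-wide sum unchanged, so all estimates approach the global mean.
    """
    rng = random.Random(seed)
    estimates = list(values)
    peers = list(range(len(values)))
    for _ in range(rounds):
        i = rng.choice(peers)          # a peer wakes up asynchronously
        j = rng.choice(neighbors[i])   # and contacts one of its neighbors
        avg = (estimates[i] + estimates[j]) / 2.0
        estimates[i] = estimates[j] = avg
    return estimates


# Hypothetical example: 8 peers on a ring, each holding a local count.
n = 8
ring = {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}
local_counts = [3, 7, 1, 9, 4, 6, 2, 8]
estimates = gossip_average(local_counts, ring)
print([round(e, 3) for e in estimates])  # eight values of 5.0, the global mean
```

Because each exchange involves only two neighboring peers, no peer ever needs the combined database, which mirrors the locality requirement stated in the abstract.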

    High-precision high-coverage functional inference from integrated data sources

    Background: Information obtained from diverse data sources can be combined in a principled manner using various machine learning methods to increase the reliability and range of knowledge about protein function. The result is a weighted functional linkage network (FLN) in which linked neighbors share at least one function with high probability. Precision, however, is low. Aiming to provide precise functional annotation for as many proteins as possible, we explore and propose a two-step framework for functional annotation: (1) construction of a high-coverage and reliable FLN via machine learning techniques, and (2) development of a decision rule for the constructed FLN to optimize functional annotation.

    Results: We first apply this framework to Saccharomyces cerevisiae. In the first step, we demonstrate that four commonly used machine learning methods, Linear SVM, Linear Discriminant Analysis, Naïve Bayes, and Neural Network, all combine heterogeneous data to produce reliable and high-coverage FLNs, in which the linkage weight estimates the functional coupling of linked proteins more accurately than individual data sources alone. In the second step, empirical tuning of an adjustable decision rule on the constructed FLN reveals that basing annotation on maximum edge weight results in the most precise annotation at high coverage. At low coverage, all rules evaluated perform comparably; above approximately 50% coverage, however, they diverge rapidly. At full coverage, the maximum weight decision rule still has a precision of approximately 70%, whereas for the other methods precision ranges from slightly more than 30% down to 3%. In addition, a scoring scheme to estimate the precision of individual predictions is provided. Finally, robustness tests indicate that the framework can be successfully applied to less studied organisms.

    Conclusion: We provide a general two-step function-annotation framework and show that high-coverage, high-precision annotations can be achieved by constructing a high-coverage and reliable FLN via data integration and then applying a maximum weight decision rule.
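    A maximum weight decision rule of the kind described in this abstract can be sketched as follows: each unannotated protein inherits the functions of the annotated neighbor joined by the heaviest FLN edge, and that edge weight serves as a confidence score. This is a minimal Python illustration under stated assumptions: the function name `max_weight_annotate`, the protein names, the weights, and the `min_weight` cutoff (used here to trade coverage for precision) are hypothetical, and the paper's own scoring scheme may differ.

```python
from collections import defaultdict


def max_weight_annotate(edges, annotations, min_weight=0.0):
    """Assign each unannotated protein the functions of its heaviest-linked
    annotated neighbor in a weighted functional linkage network (FLN).

    edges: iterable of (protein_a, protein_b, weight) tuples
    annotations: dict mapping annotated proteins to sets of function labels
    min_weight: hypothetical cutoff; raising it lowers coverage
    Returns a dict mapping protein -> (predicted functions, edge weight).
    """
    adj = defaultdict(list)
    for a, b, w in edges:
        adj[a].append((b, w))
        adj[b].append((a, w))

    predictions = {}
    for protein, neighbors in adj.items():
        if protein in annotations:
            continue  # already annotated, nothing to predict
        candidates = [(w, n) for n, w in neighbors if n in annotations]
        if not candidates:
            continue  # no annotated neighbor, so no prediction
        best_weight, best_neighbor = max(candidates)
        if best_weight >= min_weight:
            predictions[protein] = (set(annotations[best_neighbor]), best_weight)
    return predictions


# Hypothetical toy network: P2 and P3 are annotated, P1 and P4 are not.
edges = [("P1", "P2", 0.9), ("P1", "P3", 0.4), ("P4", "P3", 0.7), ("P4", "P2", 0.2)]
annotations = {"P2": {"transport"}, "P3": {"metabolism"}}

print(max_weight_annotate(edges, annotations))
# {'P1': ({'transport'}, 0.9), 'P4': ({'metabolism'}, 0.7)}
print(max_weight_annotate(edges, annotations, min_weight=0.8))
# {'P1': ({'transport'}, 0.9)}
```

Raising `min_weight` leaves more proteins unannotated, which is one simple way to move along the coverage/precision trade-off the abstract discusses.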

    Flexible Variational Bayes based on a Copula of a Mixture of Normals

    Variational Bayes methods approximate the posterior density by a family of tractable distributions and use optimisation to estimate the unknown parameters of the approximation. Variational approximation is useful when exact inference is intractable or very costly. Our article develops a flexible variational approximation based on a copula of a mixture of normals, implemented using the natural gradient and a variance reduction method. The efficacy of the approach is illustrated by using simulated and real datasets to approximate multimodal, skewed and heavy-tailed posterior distributions, including an application to Bayesian deep feedforward neural network regression models. Each example shows that the proposed variational approximation is much more accurate than the corresponding Gaussian copula and mixture-of-normals variational approximations.

    Comment: 39 pages

    Simultaneous Genome-Wide Inference of Physical, Genetic, Regulatory, and Functional Pathway Components

    Biomolecular pathways are built from diverse types of pairwise interactions, ranging from physical protein-protein interactions and modifications to indirect regulatory relationships. One goal of systems biology is to bridge three aspects of this complexity: the growing body of high-throughput data assaying these interactions; the specific interactions in which individual genes participate; and the genome-wide patterns of interactions in a system of interest. Here, we describe methodology for simultaneously predicting specific types of biomolecular interactions using high-throughput genomic data. This results in a comprehensive compendium of whole-genome networks for yeast, derived from ∼3,500 experimental conditions and describing 30 interaction types, which range from general (e.g. physical or regulatory) to specific (e.g. phosphorylation or transcriptional regulation). We used these networks to investigate molecular pathways in carbon metabolism and cellular transport, proposing a novel connection between glycogen breakdown and glucose utilization that is supported by recent publications. Additionally, 14 specific predicted interactions in DNA topological change and protein biosynthesis were experimentally validated. We analyzed the systems-level network features within all interactomes, verifying the presence of small-world properties and enrichment for recurring network motifs. This compendium of physical, synthetic, regulatory, and functional interaction networks is publicly available through an interactive web interface at http://function.princeton.edu/bioweaver/ for investigators to use in future research.

    Separating what is evaluated from what is selected in artificial evolution

    In artificial evolution, selection and evaluation are separate and distinct steps. The relationship is quite different in natural evolution, where fitness (corresponding to evaluation) is a direct consequence of selection rather than a precursor to it. This thesis presents a new way of thinking about artificial evolution that separates evaluation and selection and consequently opens up the space of potential evolutionary algorithms beyond the limitations imposed by ignoring this distinction. In Part I of the thesis we explore how varying the level of evaluation and selection affects evolution. Using novel genetic algorithms (GAs), we show how group-level evaluation allows evolution to find solutions to problems that require niching or a division of labour amongst component parts, something that cannot be accomplished using a standard GA. One inspiration for testing GAs with group-level evaluation was recent research into bacterial evolution, which shows that in bacterial colonies it is very difficult to distinguish between the individual and the group because of the symbiotic relationships between different bacteria. We find that, depending on the task, it sometimes makes sense to select individuals, while in other cases simply selecting groups is the best choice. Finally, we present a method for evolving the group size in these GAs, which avoids the need to know the optimal division of labour ahead of time. In Part II we move away from studying the relationship between evaluation and selection to show how our novel view of evolution can be used to develop GAs that implement horizontal gene transfer, again inspired by bacterial evolution. By testing these GAs on a variety of tasks, we show that this promiscuous gene swapping is often beneficial to evolution because it can reduce the probability of the population getting stuck on a sub-optimal solution. The thesis demonstrates the benefits of looking at artificial evolution in terms of both evaluation and selection when developing algorithms, and thus gives the GA community a new context in which to choose algorithms appropriate to different tasks.
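    To make the horizontal gene transfer idea concrete, here is a minimal Python sketch of a generational GA on the OneMax bit-counting problem in which offspring may copy a contiguous gene segment from a randomly chosen donor in the population. The task, the truncation selection scheme, the segment-copy form of transfer, the function name `hgt_ga`, and all parameter values (including `p_transfer`) are assumptions for illustration, not the algorithms developed in the thesis. Evaluation (scoring) and selection (choosing parents) are kept as separate steps in the code.

```python
import random


def onemax(bits):
    """Evaluation: fitness is the number of 1-bits."""
    return sum(bits)


def hgt_ga(n_bits=30, pop_size=20, generations=200, p_transfer=0.3, seed=1):
    """Illustrative GA with a hypothetical segment-copy gene transfer operator."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        # Evaluation step: score every individual.
        scores = [onemax(ind) for ind in pop]
        # Selection step: keep the better half as parents (truncation selection).
        ranked = sorted(range(len(pop)), key=lambda i: scores[i], reverse=True)
        n_parents = max(1, pop_size // 2)
        parents = [pop[i] for i in ranked[:n_parents]]
        children = []
        for j in range(pop_size - n_parents):
            child = parents[j % n_parents][:]
            # Horizontal gene transfer: overwrite a random segment with the
            # same segment taken from a random donor in the current population.
            if rng.random() < p_transfer:
                donor = rng.choice(pop)
                start = rng.randrange(n_bits)
                end = rng.randrange(start, n_bits) + 1
                child[start:end] = donor[start:end]
            # Mutation: flip one randomly chosen bit.
            k = rng.randrange(n_bits)
            child[k] ^= 1
            children.append(child)
        # Parents survive unchanged, so the best fitness never decreases.
        pop = parents + children
    return max(pop, key=onemax)


best = hgt_ga()
print(onemax(best), "of", len(best), "bits set")
```

Setting `p_transfer=0.0` turns the transfer operator off, which gives a simple baseline for comparing runs with and without gene swapping.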

    Estimation of distribution algorithms in logistics: Analysis, design, and application

    This thesis considers the analysis, design and application of Estimation of Distribution Algorithms (EDAs) in logistics. It addresses continuous nonlinear optimization problems (standard test problems and stochastic transportation problems) as well as location problems, strategic safety stock placement problems and lot-sizing problems. The thesis adds to the existing literature by proposing theoretical advances for continuous EDAs and practical applications of discrete EDAs. It should therefore be of interest to researchers in evolutionary computation, as well as to practitioners who need efficient algorithms for the above-mentioned problems.
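    For readers new to EDAs, the basic loop (sample a population from a probability model, select the best candidates, re-estimate the model from them) can be sketched with a continuous univariate Gaussian EDA in the style of UMDA_c. This minimal Python example is a generic textbook variant chosen for illustration, not one of the algorithms from the thesis; the sphere test function, the function name `gaussian_umda`, the starting distribution, and all parameter values are assumptions.

```python
import random
import statistics


def sphere(x):
    """Test objective to minimize; optimum 0 at the origin."""
    return sum(v * v for v in x)


def gaussian_umda(f, dim=5, pop_size=100, n_select=30, generations=60, seed=0):
    """Illustrative univariate Gaussian EDA (minimization)."""
    rng = random.Random(seed)
    # Initial model: one independent normal per coordinate (assumed values).
    mu = [5.0] * dim
    sigma = [3.0] * dim
    best = None
    for _ in range(generations):
        # 1. Sample a population from the current model.
        pop = [[rng.gauss(mu[d], sigma[d]) for d in range(dim)] for _ in range(pop_size)]
        # 2. Truncation selection: keep the n_select lowest-cost points.
        pop.sort(key=f)
        elite = pop[:n_select]
        if best is None or f(elite[0]) < f(best):
            best = elite[0]
        # 3. Re-estimate each coordinate's mean and standard deviation
        #    from the selected points (with a tiny floor on sigma).
        mu = [statistics.fmean(p[d] for p in elite) for d in range(dim)]
        sigma = [max(statistics.pstdev([p[d] for p in elite]), 1e-12) for d in range(dim)]
    return best


best = gaussian_umda(sphere)
print(round(sphere(best), 6))
```

This simplest variant models each coordinate independently and can let its variances shrink too quickly, so it may stall short of the optimum; practical continuous EDAs often add safeguards such as variance scaling.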