
    Jerarca: Efficient Analysis of Complex Networks Using Hierarchical Clustering

    Background: How to extract useful information from complex biological networks is a major goal in many fields, especially in genomics and proteomics. We have shown in several works that iterative hierarchical clustering, as implemented in the UVCluster program, is a powerful tool for analyzing many of those networks. However, the computation time required to perform UVCluster analyses imposed significant limitations on its use. Methodology/Principal Findings: We describe the suite Jerarca, designed to efficiently convert networks of interacting units into dendrograms by means of iterative hierarchical clustering. Jerarca is divided into three main sections. First, weighted distances among units are computed using up to three different approaches: a more efficient version of UVCluster and two new, related algorithms called RCluster and SCluster. Second, Jerarca builds dendrograms based on those distances, using well-known phylogenetic algorithms such as UPGMA or Neighbor-Joining. Finally, Jerarca provides optimal partitions of the trees using statistical criteria based on the distribution of intra- and intercluster connections. Outputs compatible with the phylogenetic software MEGA and the Cytoscape package are generated, allowing the results to be easily visualized. Conclusions/Significance: The four main advantages of Jerarca with respect to UVCluster are: 1) improved speed of a novel UVCluster algorithm; 2) additional, alternative strategies to perform iterative hierarchical clustering; 3) automatic evaluation of the optimal tree partitions; and 4) generation of outputs compatible with MEGA and Cytoscape.
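    The dendrogram-building stage can be sketched with standard tools. The snippet below is a minimal illustration, not Jerarca itself: it assumes plain shortest-path distances as a stand-in for the UVCluster/RCluster/SCluster weighted distances, then applies UPGMA (average linkage) and cuts the tree into a fixed number of clusters.

```python
# Minimal sketch: network -> distance matrix -> UPGMA dendrogram -> partition.
# Shortest-path distances stand in for UVCluster-style weighted distances.
import networkx as nx
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

G = nx.karate_club_graph()                 # example interaction network
D = nx.floyd_warshall_numpy(G)             # all-pairs shortest-path distances
Z = linkage(squareform(D, checks=False), method="average")  # UPGMA
labels = fcluster(Z, t=4, criterion="maxclust")             # cut into 4 clusters
print(labels)
```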

    Bayesian stochastic blockmodeling

    This chapter provides a self-contained introduction to the use of Bayesian inference to extract large-scale modular structures from network data, based on the stochastic blockmodel (SBM), as well as its degree-corrected and overlapping generalizations. We focus on nonparametric formulations that allow their inference in a manner that prevents overfitting, and enables model selection. We discuss aspects of the choice of priors, in particular how to avoid underfitting via increased Bayesian hierarchies, and we contrast the task of sampling network partitions from the posterior distribution with finding the single point estimate that maximizes it, while describing efficient algorithms to perform either one. We also show how inferring the SBM can be used to predict missing and spurious links, and shed light on the fundamental limitations of the detectability of modular structures in networks. Comment: 44 pages, 16 figures. Code is freely available as part of graph-tool at https://graph-tool.skewed.de . See also the HOWTO at https://graph-tool.skewed.de/static/doc/demos/inference/inference.htm
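    Since the abstract points to graph-tool, here is a minimal sketch of obtaining the SBM point estimate it describes; the example dataset and the drawing call are illustrative assumptions, not prescribed by the chapter.

```python
# Minimal sketch: fit an SBM point estimate with graph-tool.
import graph_tool.all as gt

g = gt.collection.data["football"]      # example network shipped with graph-tool
state = gt.minimize_blockmodel_dl(g)    # point estimate minimizing description length
print(state.entropy())                  # description length of the fitted model
b = state.get_blocks()                  # inferred group membership per node
state.draw(output="sbm-fit.png")        # visualize the partition
```

    Sampling partitions from the posterior, rather than taking this single point estimate, is the contrast the chapter draws; graph-tool also exposes MCMC routines for that purpose.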

    Survey of data mining approaches to user modeling for adaptive hypermedia

    The ability of an adaptive hypermedia system to create tailored environments depends mainly on the amount and accuracy of the information stored in each user model. Some of the difficulties that user modeling faces are the amount of data available to create user models, the adequacy of that data, the noise within it, and the necessity of capturing the imprecise nature of human behavior. Data mining and machine learning techniques have the ability to handle large amounts of data and to process uncertainty. These characteristics make them suitable for the automatic generation of user models that simulate human decision making. This paper surveys different data mining techniques that can be used to efficiently and accurately capture user behavior. The paper also presents guidelines that show which techniques may be used more efficiently according to the task implemented by the application.
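    As a concrete illustration of one commonly surveyed technique, the sketch below clusters user sessions with k-means to derive coarse user stereotypes; the features and data are hypothetical, not taken from the paper.

```python
# Illustrative sketch: k-means clustering of user sessions into stereotypes.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# One row per session: [pages visited, mean dwell time (s), help requests]
sessions = np.array([
    [12,  35.0, 0],
    [ 3, 210.0, 4],
    [15,  28.0, 1],
    [ 2, 180.0, 5],
])
X = StandardScaler().fit_transform(sessions)   # put features on a common scale
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(model.labels_)   # cluster ids usable as coarse stereotypes by the adaptation engine
```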

    Computational strategies for dissecting the high-dimensional complexity of adaptive immune repertoires

    The adaptive immune system recognizes antigens via an immense array of antigen-binding antibodies and T-cell receptors, the immune repertoire. The interrogation of immune repertoires is of high relevance for understanding the adaptive immune response in disease and infection (e.g., autoimmunity, cancer, HIV). Adaptive immune receptor repertoire sequencing (AIRR-seq) has driven the quantitative and molecular-level profiling of immune repertoires, thereby revealing the high-dimensional complexity of the immune receptor sequence landscape. Several methods for the computational and statistical analysis of large-scale AIRR-seq data have been developed to resolve immune repertoire complexity in order to understand the dynamics of adaptive immunity. Here, we review the current research on (i) diversity, (ii) clustering and network, (iii) phylogenetic, and (iv) machine learning methods applied to dissect, quantify, and compare the architecture, evolution, and specificity of immune repertoires. We summarize outstanding questions in computational immunology and propose future directions for systems immunology towards coupling AIRR-seq with the computational discovery of immunotherapeutics, vaccines, and immunodiagnostics. Comment: 27 pages, 2 figures
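    Of the four method families, diversity analysis is the most formula-driven; the sketch below computes a repertoire diversity profile via Hill numbers, a framework commonly used in this literature. The clone counts are made up for illustration.

```python
# Minimal sketch: Hill-number diversity profile of a clonal repertoire.
import numpy as np

def hill_diversity(counts, q):
    """Hill number of order q for a vector of clone counts."""
    p = np.asarray(counts, dtype=float)
    p = p[p > 0] / p.sum()
    if np.isclose(q, 1.0):                  # q = 1: exponential of Shannon entropy
        return np.exp(-np.sum(p * np.log(p)))
    return np.sum(p ** q) ** (1.0 / (1.0 - q))

clone_counts = [500, 120, 60, 10, 5, 5, 1, 1]   # hypothetical clonal abundances
for q in (0, 1, 2):   # q = 0 richness; q = 1 Shannon; q = 2 Simpson-like
    print(q, hill_diversity(clone_counts, q))
```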

    System of Terrain Analysis, Energy Estimation and Path Planning for Planetary Exploration by Robot Teams

    NASA’s long-term plans involve a return to manned Moon missions, and eventually sending humans to Mars. The focus of this project is the use of autonomous mobile robotics to enhance these endeavors. This research details the creation of a system of terrain classification, energy-of-traversal estimation, and low-cost path planning for teams of inexpensive and potentially expendable robots. The first stage of this project was the creation of a model which estimates the energy requirements of traversing varying terrain types for a six-wheel rocker-bogie rover. The wheel/soil interaction model uses Shibly’s modified Bekker equations and incorporates a new simplified rocker-bogie model for estimating wheel loads. In all but a single trial, the relative energy requirements for each soil type were correctly predicted by the model. A path planner for complete coverage intended to minimize energy consumption was designed and tested. It accepts as input terrain maps detailing the energy consumption required to move to each adjacent location. Exploration is performed via a cost function which determines the robot’s next move. This system was successfully tested for multiple robots by means of a shared exploration map. At peak efficiency, the energy consumed by our path planner was only 56% of that used by the best-case back-and-forth coverage pattern. After performing a sensitivity analysis of Shibly’s equations to determine which soil parameters most affected energy consumption, a neural network terrain classifier was designed and tested. The terrain classifier defines all traversable terrain as one of three soil types and then assigns an assumed set of soil parameters. The classifier performed well overall, but had some difficulty distinguishing large rocks from sand. This work presents a system which successfully classifies terrain imagery into one of three soil types, assesses the energy requirements of terrain traversal for these soil types, and plans efficient paths of complete coverage for the imaged area. While there are further efforts that can be made in all areas, the work achieves its stated goals.
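    The cost-function exploration step described above can be sketched as a greedy choice over neighboring cells of an energy map; the grid values and the greedy policy here are illustrative assumptions, not the thesis's actual planner.

```python
# Minimal sketch: greedy coverage over an energy grid via a cost function.
import numpy as np

energy = np.array([[1, 4, 2],
                   [2, 9, 1],
                   [3, 1, 1]], dtype=float)   # energy to enter each cell

visited = np.zeros_like(energy, dtype=bool)
pos = (0, 0)
visited[pos] = True
path = [pos]

while not visited.all():
    r, c = pos
    neighbors = [(r + dr, c + dc) for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
                 if 0 <= r + dr < energy.shape[0] and 0 <= c + dc < energy.shape[1]]
    unvisited = [n for n in neighbors if not visited[n]]
    if not unvisited:       # dead end; a real planner would re-plan a route
        break
    pos = min(unvisited, key=lambda n: energy[n])   # cheapest next move
    visited[pos] = True
    path.append(pos)

print(path)
```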

    Computational fluids domain reduction to a simplified fluid network

    The primary goal of this project is to demonstrate the practical use of data mining algorithms to cluster a solved steady-state computational fluid dynamics (CFD) flow domain into a simplified lumped-parameter network. A commercial-quality code, “cfdMine”, was created using a volume-weighted k-means clustering that can cluster a 20-million-cell CFD domain on a single CPU in several hours or less. Additionally, agglomeration and k-means with the Mahalanobis distance were added as optional post-processing steps to further enhance the separation of the clusters. The resultant nodal network is considered a reduced-order model and can be solved transiently at minimal computational cost. The reduced-order network is then instantiated in the commercial thermal solver MuSES to perform transient conjugate heat transfer, with convection predicted by the lumped network (based on the steady-state CFD). When inserting the lumped nodal network into a MuSES model, the potential for developing a “localized heat transfer coefficient” is shown to be an improvement over existing techniques. The clustering was also found to provide a new flow visualization technique. Finally, fixing clusters near equipment demonstrates a new capability to track temperatures near specific objects (such as equipment in vehicles).
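    The core volume-weighted k-means step maps directly onto standard tooling; the sketch below uses scikit-learn's per-sample weights as cell volumes. The feature layout and the synthetic data are illustrative assumptions, not the cfdMine implementation.

```python
# Minimal sketch: volume-weighted k-means over CFD cell data.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
n_cells = 10_000
features = rng.normal(size=(n_cells, 6))       # e.g., (x, y, z, u, v, w) per cell
cell_volume = rng.uniform(0.1, 1.0, n_cells)   # finite-volume cell sizes

# sample_weight makes large cells count proportionally more, so the
# clusters approximate volume-weighted flow regions (lumped nodes).
km = KMeans(n_clusters=50, n_init=10, random_state=0)
labels = km.fit_predict(features, sample_weight=cell_volume)
print(np.bincount(labels)[:10])                # cell counts of the first lumped nodes
```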

    Modelling communities and populations: An introduction to computational social science

    In sociology, interest in modelling has not yet become widespread. However, the methodology has been gaining increased attention, in parallel with its growing popularity in economics and other social sciences, notably psychology and political science, and with the growing volume of social data being measured and collected. In this paper, we present representative computational methodologies from both data-driven (“black box”) and rule-based (“per analogy”) approaches. We show how to build simple models, and discuss both the greatest successes and the major limitations of modelling societies. We claim that the end goal of computational tools in sociology is to provide meaningful analyses and calculations that allow causal statements in sociological explanation and support decisions of great importance for society.
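    As a flavor of the rule-based (“per analogy”) approach, the sketch below runs a tiny Schelling-style segregation model, a classic introductory agent-based simulation; it is illustrative only and not a model from the paper.

```python
# Minimal sketch: Schelling-style segregation on a small grid.
import numpy as np

rng = np.random.default_rng(0)
grid = rng.choice([0, 1, 2], size=(20, 20), p=[0.2, 0.4, 0.4])  # 0 = empty cell

def unhappy(grid, threshold=0.5):
    """Agents with fewer than `threshold` like neighbors want to move."""
    out = []
    for r, c in zip(*np.nonzero(grid)):
        block = grid[max(r - 1, 0):r + 2, max(c - 1, 0):c + 2]
        same = np.count_nonzero(block == grid[r, c]) - 1   # exclude self
        total = np.count_nonzero(block) - 1
        if total and same / total < threshold:
            out.append((r, c))
    return out

for _ in range(50):                          # relocation rounds
    movers = unhappy(grid)
    if not movers:
        break
    for r, c in movers:
        empties = np.argwhere(grid == 0)
        er, ec = empties[rng.integers(len(empties))]
        grid[er, ec], grid[r, c] = grid[r, c], 0
print(len(unhappy(grid)), "unhappy agents remain")
```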