31,357 research outputs found

    Maximal information component analysis: a novel non-linear network analysis method.

    Background: Network construction and analysis algorithms provide scientists with the ability to sift through high-throughput biological outputs, such as transcription microarrays, for small groups of genes (modules) that are relevant for further research. Most of these algorithms ignore the important role of non-linear interactions in the data, and the ability of genes to operate in multiple functional groups at once, despite clear evidence for both of these phenomena in observed biological systems. Results: We have created a novel co-expression network analysis algorithm that incorporates both of these principles by combining the information-theoretic association measure of the maximal information coefficient (MIC) with an Interaction Component Model. We evaluate the performance of this approach on two datasets collected from a large panel of mice, one from macrophages and the other from liver, by comparing the two methods on module entropy, Gene Ontology (GO) enrichment, and scale-free topology (SFT) fit. By these criteria, our algorithm outperforms a widely used co-expression analysis method, weighted gene co-expression network analysis (WGCNA), on the macrophage data, while returning comparable results on the liver dataset. We demonstrate that the macrophage data has more non-linear interactions than the liver dataset, which may explain the increased performance of our method, termed Maximal Information Component Analysis (MICA), in that case. Conclusions: In making our network algorithm more accurately reflect known biological principles, we are able to generate modules with improved relevance, particularly in networks with confounding factors such as gene by environment interactions.

    Delay Parameter Selection in Permutation Entropy Using Topological Data Analysis

    Permutation Entropy (PE) is a powerful tool for quantifying the predictability of a sequence, which includes measuring the regularity of a time series. Despite its successful application in a variety of scientific domains, PE requires a judicious choice of the delay parameter τ. Another parameter of interest in PE is the motif dimension n; typically n is selected between 4 and 8, with 5 or 6 giving optimal results for the majority of systems. Therefore, in this work we focus solely on choosing the delay parameter. Selecting τ is often accomplished using trial and error guided by the expertise of domain scientists. However, in this paper, we show that persistent homology, the flagship tool of the Topological Data Analysis (TDA) toolset, provides an approach for the automatic selection of τ. We evaluate the successful identification of a suitable τ from our TDA-based approach by comparing our results to a variety of examples in published literature.
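To make the roles of the delay τ and the motif dimension n concrete, here is a minimal sketch of the standard permutation entropy computation. It is not the paper's code and does not implement the TDA-based selection of τ; the function name `permutation_entropy`, its parameters, and the tie-breaking rule (equal values ordered by position) are assumptions made for illustration.

```python
import math
from collections import Counter


def permutation_entropy(series, n=3, tau=1, normalize=True):
    """Permutation entropy of `series` for motif dimension n and delay tau.

    Hypothetical helper for illustration; ties are broken by position.
    """
    num_windows = len(series) - (n - 1) * tau
    if num_windows <= 0:
        raise ValueError("series too short for the chosen n and tau")

    counts = Counter()
    for i in range(num_windows):
        # Sample n points spaced tau apart, starting at index i.
        window = [series[i + j * tau] for j in range(n)]
        # Ordinal pattern (motif): indices of the window sorted by value.
        pattern = tuple(sorted(range(n), key=lambda k: window[k]))
        counts[pattern] += 1

    # Shannon entropy (in bits) of the motif distribution.
    h = sum(-(c / num_windows) * math.log2(c / num_windows)
            for c in counts.values())
    if normalize:
        # Divide by the maximum possible entropy, log2(n!).
        h /= math.log2(math.factorial(n))
    return h


if __name__ == "__main__":
    alternating = [0, 1, 0, 1, 0, 1, 0, 1, 0]
    print(permutation_entropy(alternating, n=2, tau=1))
    print(permutation_entropy(alternating, n=2, tau=2))
```

On the alternating series, τ = 1 sees rising and falling motifs equally often, while τ = 2 samples only equal values and so sees a single motif. This illustrates why the result depends strongly on the choice of delay.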

    CATHEDRAL: A Fast and Effective Algorithm to Predict Folds and Domain Boundaries from Multidomain Protein Structures

    We present CATHEDRAL, an iterative protocol for determining the location of previously observed protein folds in novel multidomain protein structures. CATHEDRAL builds on the features of a fast secondary-structure–based method (using graph theory) to locate known folds within a multidomain context and a residue-based, double-dynamic programming algorithm, which is used to align members of the target fold groups against the query protein structure to identify the closest relative and assign domain boundaries. To increase the fidelity of the assignments, a support vector machine is used to provide an optimal scoring scheme. Once a domain is verified, it is excised, and the search protocol is repeated in an iterative fashion until all recognisable domains have been identified. We have performed an initial benchmark of CATHEDRAL against other publicly available structure comparison methods using a consensus dataset of domains derived from the CATH and SCOP domain classifications. CATHEDRAL shows superior performance in fold recognition and alignment accuracy when compared with many equivalent methods. If a novel multidomain structure contains a known fold, CATHEDRAL will locate it in 90% of cases, with <1% false positives. For nearly 80% of assigned domains in a manually validated test set, the boundaries were correctly delineated within a tolerance of ten residues. For the remaining cases, previously classified domains were very remotely related to the query chain so that embellishments to the core of the fold caused significant differences in domain sizes and manual refinement of the boundaries was necessary. To put this performance in context, a well-established sequence method based on hidden Markov models was only able to detect 65% of domains, with 33% of the subsequent boundaries assigned within ten residues. 
    Since, on average, 50% of newly determined protein structures contain more than one domain unit, and typically 90% or more of these domains are already classified in CATH, CATHEDRAL will considerably facilitate the automation of protein structure classification.

    Models of incremental concept formation

    Given a set of observations, humans acquire concepts that organize those observations and use them in classifying future experiences. This type of concept formation can occur in the absence of a tutor, and it can take place despite irrelevant and incomplete information. A reasonable model of such human concept learning should be both incremental and capable of handling the kind of complex experiences that people encounter in the real world. In this paper, we review three previous models of incremental concept formation and then present CLASSIT, a model that extends these earlier systems. All of the models integrate the processes of recognition and learning, and all can be viewed as carrying out search through the space of possible concept hierarchies. In an attempt to show that CLASSIT is a robust concept formation system, we also present some empirical studies of its behavior under a variety of conditions.

    Neural Network and Bioinformatic Methods for Predicting HIV-1 Protease Inhibitor Resistance

    This article presents a new method for predicting viral resistance to seven protease inhibitors from the HIV-1 genotype, and for identifying the positions in the protease gene at which the specific nature of the mutation affects resistance. The neural network Analog ARTMAP predicts protease inhibitor resistance from viral genotypes. A feature selection method detects genetic positions that contribute to resistance both alone and through interactions with other positions. This method has identified positions 35, 37, 62, and 77, where traditional feature selection methods have not detected a contribution to resistance. At several positions in the protease gene, mutations confer differing degrees of resistance, depending on the specific amino acid to which the sequence has mutated. To find these positions, an Amino Acid Space is introduced to represent genes in a vector space that captures the functional similarity between amino acid pairs. Feature selection identifies several new positions, including 36, 37, and 43, with amino acid-specific contributions to resistance. Analog ARTMAP networks applied to inputs that represent specific amino acids at these positions perform better than networks that use only mutation locations. Funding: Air Force Office of Scientific Research (F49620-01-1-0423); National Geospatial-Intelligence Agency (NMA 201-01-1-2016); National Science Foundation (SBE-0354378); Office of Naval Research (N00014-01-1-0624).