252 research outputs found

    Bayesian refinement of protein functional site matching

    Background: Matching functional sites is a key problem for the understanding of protein function and evolution. The commonly used graph-theoretic approach, and other related approaches, require adjustment of a matching distance threshold a priori according to the noise in atomic positions. This is difficult to pre-determine when matching sites related by varying evolutionary distances and crystallographic precision. Furthermore, the graph method is sometimes unable to identify alternative but important solutions in the neighbourhood of the distance-based solution because of strict distance constraints. We consider the Bayesian approach to improve graph-based solutions. In principle this approach applies to other methods with strict distance matching constraints. In contrast to combinatorial formulations, the Bayesian method can flexibly incorporate all types of prior information on specific binding sites (e.g. amino acid types). Results: We present a new meta-algorithm for matching protein functional sites (active sites and ligand binding sites) based on an initial graph matching followed by refinement using a Markov chain Monte Carlo (MCMC) procedure. This procedure is an innovative extension to our recent work. The method accounts for the 3-dimensional structure of the site as well as the physico-chemical properties of the constituent amino acids. The MCMC procedure can lead to a significant increase in the number of significant matches compared to the graph method, as measured independently by rigorously derived p-values. Conclusion: The MCMC refinement step is able to significantly improve graph-based matches. We apply the method to matching NAD(P)(H) binding sites within single Rossmann fold families, between different families in the same superfamily, and in different folds.
Within families, sites are often well conserved, but there are examples where significant shape-based matches do not retain similar amino acid chemistry, indicating that even within families the same ligand may be bound using substantially different physico-chemistry. We also show that the procedure finds significant matches between binding sites for the same co-factor in different families and different folds.

    Linear Time Parameterized Algorithms via Skew-Symmetric Multicuts

    A skew-symmetric graph $(D=(V,A),\sigma)$ is a directed graph $D$ with an involution $\sigma$ on the set of vertices and arcs. In this paper, we introduce a separation problem, $d$-Skew-Symmetric Multicut, where we are given a skew-symmetric graph $D$, a family $\mathcal{T}$ of $d$-sized subsets of vertices and an integer $k$. The objective is to decide if there is a set $X\subseteq A$ of $k$ arcs such that every set $J$ in the family has a vertex $v$ such that $v$ and $\sigma(v)$ are in different connected components of $D'=(V,A\setminus (X\cup \sigma(X)))$. In this paper, we give an algorithm for this problem which runs in time $O((4d)^{k}(m+n+\ell))$, where $m$ is the number of arcs in the graph, $n$ the number of vertices and $\ell$ the length of the family given in the input. Using our algorithm, we show that Almost 2-SAT has an algorithm with running time $O(4^k k^4 \ell)$ and we obtain algorithms for Odd Cycle Transversal and Edge Bipartization which run in time $O(4^k k^4 (m+n))$ and $O(4^k k^5 (m+n))$ respectively. This resolves an open problem posed by Reed, Smith and Vetta [Operations Research Letters, 2003] and improves upon the earlier almost linear time algorithm of Kawarabayashi and Reed [SODA, 2010]. We also show that Deletion q-Horn Backdoor Set Detection is a special case of 3-Skew-Symmetric Multicut, giving us an algorithm for Deletion q-Horn Backdoor Set Detection which runs in time $O(12^k k^5 \ell)$. This gives the first fixed-parameter tractable algorithm for this problem, answering a question posed in a paper by a superset of the authors [STACS, 2013]. Using this result, we get an algorithm for Satisfiability which runs in time $O(12^k k^5 \ell)$, where $k$ is the size of the smallest q-Horn deletion backdoor set, with $\ell$ being the length of the input formula
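
    To make the Odd Cycle Transversal problem named in the abstract above concrete, here is a minimal Python sketch of its definition: find a smallest vertex set whose removal leaves a bipartite graph. It is an exponential brute-force search for illustration only, not the linear-time parameterized algorithm the abstract describes. The function names and the edge-list representation are assumptions made for this example.

```python
from collections import deque
from itertools import combinations


def is_bipartite_after_removal(n, edges, removed):
    """Return True if deleting `removed` from the graph on vertices 0..n-1
    leaves a bipartite graph (checked by BFS 2-colouring)."""
    removed = set(removed)
    adj = {v: [] for v in range(n) if v not in removed}
    for u, v in edges:
        if u in removed or v in removed:
            continue
        if u == v:
            return False  # a surviving self-loop is an odd cycle
        adj[u].append(v)
        adj[v].append(u)

    color = {}
    for start in adj:
        if start in color:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in color:
                    color[w] = 1 - color[u]
                    queue.append(w)
                elif color[w] == color[u]:
                    return False  # same colour on both ends: odd cycle
    return True


def smallest_odd_cycle_transversal(n, edges):
    """Brute force (hypothetical helper): try vertex sets by increasing size
    and return the first one whose removal makes the graph bipartite."""
    for k in range(n + 1):
        for candidate in combinations(range(n), k):
            if is_bipartite_after_removal(n, edges, candidate):
                return set(candidate)


five_cycle = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
print(smallest_odd_cycle_transversal(5, five_cycle))
```

    On a 5-cycle one vertex must be removed, while an even cycle needs none; the parameter $k$ in the abstract's running times corresponds to the size of the set this search returns.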

    Innovative Algorithms and Evaluation Methods for Biological Motif Finding

    Biological motifs are defined as overly recurring sub-patterns in biological systems. Sequence motifs and network motifs are examples of biological motifs. Due to the wide range of applications, many algorithms and computational tools have been developed for efficient search for biological motifs. As a result, there are more computationally derived motifs than experimentally validated motifs, and how to validate the biological significance of the ‘candidate motifs’ becomes an important question. Some sequence motifs are verified by their structural similarities or their functional roles in DNA or protein sequences, and stored in databases. However, the biological role of network motifs has not yet been validated, and currently no databases exist for this purpose. In this thesis, we focus not only on the computational efficiency but also on the biological meanings of the motifs. We provide an efficient way to incorporate biological information into clustering analysis methods: for example, a sparse nonnegative matrix factorization (SNMF) method is used with Chou-Fasman parameters for protein motif finding. Biological network motifs are searched for by various clustering algorithms with Gene Ontology (GO) information. Experimental results show that the algorithms perform better than existing algorithms by producing a larger number of high-quality biological motifs. In addition, we apply biological network motifs to the discovery of essential proteins. Essential proteins are defined as a minimum set of proteins which are vital for development to a fertile adult and for cellular life in an organism. We design a new centrality algorithm with biological network motifs, named MCGO, and score proteins in a protein-protein interaction (PPI) network to find essential proteins. MCGO is also combined with other centrality measures to predict essential proteins using machine learning techniques.
This thesis makes three contributions to the study of biological motifs: 1) clustering analysis is used efficiently and biological information is easily integrated with the analysis; 2) we focus more on the biological meanings of motifs by adding biological knowledge to the algorithms and by suggesting biologically related evaluation methods; 3) biological network motifs are successfully applied to a practical application, the prediction of essential proteins

    Montana Kaimin, December 1, 1999

    Student newspaper of the University of Montana, Missoula. https://scholarworks.umt.edu/studentnewspaper/10294/thumbnail.jp

    Exact Dimensionality Selection for Bayesian PCA

    We present a Bayesian model selection approach to estimate the intrinsic dimensionality of a high-dimensional dataset. To this end, we introduce a novel formulation of the probabilistic principal component analysis model based on a normal-gamma prior distribution. In this context, we exhibit a closed-form expression of the marginal likelihood which allows us to infer an optimal number of components. We also propose a heuristic based on the expected shape of the marginal likelihood curve in order to choose the hyperparameters. In non-asymptotic frameworks, we show on simulated data that this exact dimensionality selection approach is competitive with both Bayesian and frequentist state-of-the-art methods
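
    As a rough illustration of choosing the number of components by scoring each candidate dimension, the sketch below uses a different and simpler criterion than the abstract above: the standard maximum-likelihood log-likelihood of probabilistic PCA combined with a BIC penalty. It does not implement the normal-gamma prior or the closed-form marginal likelihood, which the abstract does not spell out. The function names and the choice to pass in precomputed covariance eigenvalues are assumptions made for this example.

```python
import math


def ppca_log_likelihood(eigvals, n_samples, q):
    """Maximised PPCA log-likelihood for q components, given the (positive)
    eigenvalues of the sample covariance matrix and the sample count."""
    d = len(eigvals)
    lam = sorted(eigvals, reverse=True)
    # ML noise variance: average of the d - q discarded eigenvalues.
    sigma2 = sum(lam[q:]) / (d - q)
    return -0.5 * n_samples * (
        d * math.log(2 * math.pi)
        + sum(math.log(l) for l in lam[:q])
        + (d - q) * math.log(sigma2)
        + d
    )


def select_dimension_bic(eigvals, n_samples):
    """Return the q in 0..d-1 with the smallest BIC (a stand-in criterion,
    not the exact marginal likelihood from the abstract)."""
    d = len(eigvals)
    best_q, best_bic = None, float("inf")
    for q in range(d):
        # Free parameters: loadings up to rotation, noise variance, mean.
        n_params = d * q - q * (q - 1) // 2 + 1 + d
        log_lik = ppca_log_likelihood(eigvals, n_samples, q)
        bic = -2.0 * log_lik + n_params * math.log(n_samples)
        if bic < best_bic:
            best_q, best_bic = q, bic
    return best_q


# Two dominant directions plus isotropic noise.
print(select_dimension_bic([5.0, 4.0, 0.1, 0.1, 0.1, 0.1], n_samples=1000))
```

    With two large eigenvalues and four equal small ones, the log-likelihood stops improving after two components, so the penalty term selects a dimension of 2.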

    Collusion and Collective Action in the Patent System: A Proposal for Patent Bounties

    Persistent commentary contends that the Patent Office is issuing patents that appropriate public domain concepts at an alarming frequency. Complaints of low patent quality enjoy growing resonance with regard to business methods, computer software, and other inventions for which patents were not traditionally sought. In this article, Professor Jay Thomas explains how the judiciary's lenient view of patentable subject matter and utility standards, along with miserly congressional funding policies, have rendered the Patent Office an increasingly porous agency. Professor Thomas next reviews existing proposals for improving patent quality, including the conventional wisdom that adoption of an opposition system will contribute meaningfully to the solution of our patent quality problem. Exploring the political economy of patent challenges, Professor Thomas reasons that oppositions do little to solve collective action problems, the possibility of collusion between the prior art holder and patentee, and the existence of the first inventor defense. Professor Thomas instead proposes that the Patent Office recruit members of the public to act as private patent examiners. By awarding prior art informants with a bounty assessed against applicants, the Patent Office can restore order to the patent system and reduce its social costs

    Separating Auxiliary Arity Hierarchy of First-Order Incremental Evaluation Using (3+1)-ary Input Relations

    Presents a first-order incremental evaluation system (FOIES) that uses first-order queries to maintain a database view defined by a non-first-order query. Reduction of the arity of queries to understand the power of FOIES; use of a key lemma for proving a query which encodes the multiple parity problem

    Washington University Record, December 9, 1999

    https://digitalcommons.wustl.edu/record/1848/thumbnail.jp