Search CORE

A methodology for determining amino-acid substitution matrices from set covers

Author: A. Bahr
A.D. McLachlan
D.F. Feng
G. Vogt
G.H. Gonnet
J. Setubal
J.D. Blake
J.K.M. Rao
M. Gribskov
M.F. Sagot
R.B. Russell
R.E. Green
R.F. Smith
S. Henikoff
S.A. Benner
T. Müller
T.P. Li
W.S.J. Valdar
Publication venue: Springer Science and Business Media LLC
Publication date: 05/04/2005

We introduce a new methodology for determining amino-acid substitution matrices for use in protein alignment. The methodology is based on a pre-existing set cover on the set of residues and on the undirected graph that describes residue exchangeability given the set cover. For fixed functional forms that specify how to obtain edge weights from the set cover and, from those, substitution-matrix elements from weighted distances on the graph, the resulting substitution matrix can be evaluated against a known set of reference alignments for given gap costs. Finding the appropriate functional forms and gap costs can then be formulated as an optimization problem that seeks to maximize the performance of the substitution matrix on the reference alignment set. We give computational results on the BAliBASE suite using a genetic algorithm for optimization. Our results indicate that it is possible to obtain substitution matrices whose performance is either comparable to or better than that of several others, depending on the particular scenario under consideration.

arXiv.org e-Print Archive

Crossref
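The pipeline in the abstract above (set cover, exchangeability graph, weighted distances, matrix elements) can be illustrated with a small Python sketch. The toy set cover, the edge-weight rule (size of the shared set minus one) and the score rule (a maximum score minus the shortest-path distance) are hypothetical stand-ins chosen only for illustration; in the paper, these functional forms and the gap costs are what the genetic algorithm optimizes, and none of the names below come from the paper.

```python
import heapq
from itertools import combinations

# Hypothetical toy set cover on a few residues (one-letter codes).
# Illustrative only; it is not a set cover from the paper.
SET_COVER = [
    {"I", "L", "V", "M"},
    {"F", "Y", "W"},
    {"K", "R", "H"},
    {"D", "E"},
    {"M", "F"},
]


def build_graph(cover):
    """Undirected exchangeability graph: two residues are joined if some set
    contains both. Assumed edge weight: (size of the shared set) - 1, so pairs
    that share a small, specific set are closer; the lightest edge is kept."""
    graph = {r: {} for s in cover for r in s}
    for s in cover:
        w = float(len(s) - 1)
        for a, b in combinations(sorted(s), 2):
            graph[a][b] = min(graph[a].get(b, float("inf")), w)
            graph[b][a] = graph[a][b]
    return graph


def shortest_paths(graph, source):
    """Dijkstra's algorithm: weighted distances from `source` to every reachable residue."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def substitution_matrix(cover, max_score=4.0, penalty=-4.0):
    """Assumed score rule: max_score - distance for reachable pairs,
    and a flat `penalty` when no path connects the two residues."""
    graph = build_graph(cover)
    residues = sorted(graph)
    matrix = {}
    for a in residues:
        dist = shortest_paths(graph, a)
        for b in residues:
            matrix[a, b] = max_score - dist[b] if b in dist else penalty
    return matrix


m = substitution_matrix(SET_COVER)
print(m["D", "E"], m["I", "F"], m["D", "K"])  # 3.0 0.0 -4.0
```

Because the graph is undirected, the resulting matrix is symmetric. Swapping in other edge-weight or score functions, and scoring each candidate against reference alignments, is where an optimizer such as the paper's genetic algorithm would come in.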

A Haystack Heuristic for Autoimmune Disease Biomarker Discovery Using Next-Gen Immune Repertoire Sequencing Data.

Author: Apeltsin Leonard
Sirota Marina
von Büdingen H-Christian
Wang Shengzhi
Publication venue: eScholarship, University of California
Publication date: 01/07/2017

Large-scale DNA sequencing of immunological repertoires offers an opportunity for the discovery of novel biomarkers for autoimmune disease. Available bioinformatics techniques, however, are not adequately suited to identifying possible biomarker candidates within large immunosequencing datasets because of unsatisfactory scalability and sensitivity. Here, we present the Haystack Heuristic, an algorithm designed to computationally extract disease-associated motifs from next-generation-sequenced repertoires by contrasting disease and healthy subjects. This technique employs a local-search, graph-theoretic approach to discover novel motifs in patient data. We apply the Haystack Heuristic to nine million B-cell receptor sequences obtained from nearly 100 individuals to identify a new motif that is significantly associated with multiple sclerosis. Our results demonstrate the effectiveness of the Haystack Heuristic in identifying possible biomarker candidates from high-throughput sequencing data, and the approach could be generalized to other datasets.

eScholarship - University of California

Emergence of hidden phases of methylammonium lead-iodide (CH₃NH₃PbI₃) upon compression

Author: Amsler Maximilian
Boziki Ariadni
Flores-Livas José A.
Goedecker Stefan
Rothlisberger Ursula
Tomerini Daniele
Publication venue: American Physical Society (APS)
Publication date: 01/01/2018

We perform a thorough structural search with the minima hopping method (MHM) to explore low-energy structures of methylammonium lead iodide. By combining the MHM with a force field, we efficiently screen vast portions of the configurational space with large simulation cells containing up to 96 atoms. Our search reveals two structures of methylammonium lead iodide perovskite (MAPI) that, according to density functional calculations, are substantially lower in energy than the well-studied, experimentally observed low-temperature Pnma orthorhombic phase. Neither structure has yet been reported in the literature for MAPI, but our results show that both could emerge as thermodynamically stable phases via compression at low temperatures. In terms of electronic properties, the two phases exhibit larger band gaps than the standard perovskite-type structures. Hence, pressure-induced phase selection at technologically achievable pressures (i.e., via thin-film strain) is a route towards the synthesis of several MAPI polymorphs with variable band gaps.

arXiv.org e-Print Archive

edoc

Resistance gene enrichment sequencing (RenSeq) enables reannotation of the NB-LRR gene family from sequenced plant genomes and rapid mapping of resistance loci in segregating populations

Author: Albert
Altschul
Andolfo
Bailey
Bakker
Blankenberg
Brommonschenkel
Cock
Cronn
Dangl
Ernst
Fiume
Giardine
Goecks
Hodges
Ishibashi
Jaccoud
Jones
Jupe
Kent
Kurtz
Lanfermeijer
Li
Loveland
Lozano
Maclean
Meyers
Milligan
Ori
Park
Parla
Potato Genome Sequencing Consortium
Rauscher
Saintenac
Schornack
Schulze-Lefert
Tai
The Arabidopsis Genome Initiative
Tomato Genome Consortium
Weigel
Yandell
Zerbino
Zhang
Śliwka
Publication venue: Wiley
Publication date: 13/08/2013

RenSeq is an NB-LRR (nucleotide-binding site leucine-rich repeat) gene-targeted resistance gene enrichment and sequencing method that enables discovery and annotation of pathogen resistance gene family members in plant genome sequences. We successfully applied RenSeq to the sequenced potato Solanum tuberosum clone DM and increased the number of identified NB-LRRs from 438 to 755. The majority of these identified R gene loci reside in poorly annotated or previously unannotated regions of the genome. Sequence and positional details on the 12 chromosomes have been established for 704 NB-LRRs and can be accessed through a genome browser that we provide. We compared these NB-LRR genes with the corresponding oligonucleotide baits of highest sequence similarity and demonstrated that ~80% sequence identity is sufficient for enrichment. Analysis of the sequenced tomato S. lycopersicum ‘Heinz 1706’ extended the NB-LRR complement to 394 loci. We further describe a methodology that applies RenSeq to rapidly identify molecular markers that co-segregate with a pathogen resistance trait of interest. In two independent segregating populations involving the wild Solanum species S. berthaultii (Rpi-ber2) and S. ruiz-ceballosii (Rpi-rzc1), we successfully applied RenSeq to identify markers that co-segregate with resistance to the late blight pathogen Phytophthora infestans. These SNP identification workflows were designed as easy-to-adapt Galaxy pipelines.

Crossref

University of Strathclyde Institutional Repository

PubMed Central

University of Dundee Online Publications

University of East Anglia digital repository
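The RenSeq abstract reports that about 80% sequence identity between a bait and its target is sufficient for enrichment. A minimal Python sketch of such a threshold check follows, assuming the bait and target are already aligned to equal length with '-' marking gaps; the helper names and the choice to skip gapped positions are hypothetical and are not part of the published RenSeq workflow.

```python
def percent_identity(bait: str, target: str) -> float:
    """Fraction of identical positions between two pre-aligned, equal-length
    sequences. Positions where either sequence has a gap ('-') are skipped
    (an assumption; other identity definitions count gaps differently)."""
    if len(bait) != len(target):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [
        (a, b)
        for a, b in zip(bait.upper(), target.upper())
        if a != "-" and b != "-"
    ]
    if not pairs:
        return 0.0
    return sum(a == b for a, b in pairs) / len(pairs)


def likely_enriched(bait: str, target: str, threshold: float = 0.80) -> bool:
    """Hypothetical check: is identity at or above the ~80% level reported as sufficient?"""
    return percent_identity(bait, target) >= threshold


print(likely_enriched("ACGTACGTAC", "ACGTACGTAA"))  # True (90% identity)
print(likely_enriched("ACGTACGTAC", "ACGAACGAAA"))  # False (70% identity)
```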

ConSole: using modularity of contact maps to locate solenoid domains in protein structures.

Author: Godzik Adam
Hrabe Thomas
Publication venue: eScholarship, University of California
Publication date: 01/01/2014

Background: Periodic proteins, characterized by the presence of multiple repeats of short motifs, form an interesting and seldom-studied group. Because their sequences are often extremely divergent, such motifs are detected and analyzed more reliably at the structural level. Yet few algorithms have been developed for detecting and analyzing the structures of periodic proteins.

Results: ConSole recognizes modularity in protein contact maps, allowing precise identification of repeats in solenoid protein structures, an important subgroup of periodic proteins. Tests on benchmarks show that ConSole has higher recognition accuracy than Raphael, the only other publicly available solenoid structure detection tool. As a next step of ConSole analysis, we show how detecting solenoid repeats in structures can be used to improve sequence recognition of these motifs and to detect subtle irregularities of repeat lengths in three solenoid protein families.

Conclusions: The ConSole algorithm provides a fast and accurate tool to recognize solenoid protein structures as a whole and to identify individual solenoid repeat units within a structure. ConSole is available as a web-based, interactive server and for download at http://console.sanfordburnham.org

Springer - Publisher Connector

PubMed Central

eScholarship - University of California
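ConSole operates on protein contact maps. As a rough illustration of that input only, the Python sketch below builds a binary residue contact map from C-alpha coordinates and counts contacts at a fixed sequence offset. The 8 Å cutoff and both function names are assumptions, and ConSole's own modularity analysis is not reproduced here. In solenoid structures, stacked repeats tend to produce contact stripes parallel to the main diagonal at an offset near the repeat length, which is the kind of pattern an offset count can expose.

```python
import math


def contact_map(coords, cutoff=8.0):
    """Binary contact map: residues i != j are in contact when their C-alpha
    atoms lie within `cutoff` angstroms (8 A is a common, assumed choice).
    `coords` is a list of (x, y, z) tuples, one per residue."""
    n = len(coords)
    return [
        [i != j and math.dist(coords[i], coords[j]) <= cutoff for j in range(n)]
        for i in range(n)
    ]


def contacts_at_offset(cmap, k):
    """Number of contacts between residue i and residue i + k (one diagonal stripe)."""
    return sum(cmap[i][i + k] for i in range(len(cmap) - k))


# Toy coordinates: three residues 3.8 A apart, then one far away.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (20.0, 0.0, 0.0)]
cmap = contact_map(coords)
print(contacts_at_offset(cmap, 1), contacts_at_offset(cmap, 2))  # 2 1
```

On a real solenoid, one would compute the map from a structure file and inspect offsets well away from the main diagonal; `math.dist` requires Python 3.8 or later.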