40,537 research outputs found
A methodology for determining amino-acid substitution matrices from set covers
We introduce a new methodology for the determination of amino-acid
substitution matrices for use in the alignment of proteins. The new methodology
is based on a pre-existing set cover on the set of residues and on the
undirected graph that describes residue exchangeability given the set cover.
For fixed functional forms indicating how to obtain edge weights from the set
cover and, after that, substitution-matrix elements from weighted distances on
the graph, the resulting substitution matrix can be checked for performance
against some known set of reference alignments and for given gap costs. Finding
the appropriate functional forms and gap costs can then be formulated as an
optimization problem that seeks to maximize the performance of the substitution
matrix on the reference alignment set. We give computational results on the
BAliBASE suite using a genetic algorithm for optimization. Our results indicate
that it is possible to obtain substitution matrices whose performance is either
comparable to or surpasses that of several others, depending on the particular
scenario under consideration
Recommended from our members
A Haystack Heuristic for Autoimmune Disease Biomarker Discovery Using Next-Gen Immune Repertoire Sequencing Data.
Large-scale DNA sequencing of immunological repertoires offers an opportunity for the discovery of novel biomarkers for autoimmune disease. Available bioinformatics techniques however, are not adequately suited for elucidating possible biomarker candidates from within large immunosequencing datasets due to unsatisfactory scalability and sensitivity. Here, we present the Haystack Heuristic, an algorithm customized to computationally extract disease-associated motifs from next-generation-sequenced repertoires by contrasting disease and healthy subjects. This technique employs a local-search graph-theory approach to discover novel motifs in patient data. We apply the Haystack Heuristic to nine million B-cell receptor sequences obtained from nearly 100 individuals in order to elucidate a new motif that is significantly associated with multiple sclerosis. Our results demonstrate the effectiveness of the Haystack Heuristic in computing possible biomarker candidates from high throughput sequencing data and could be generalized to other datasets
Emergence of hidden phases of methylammonium lead-iodide (CHNHPbI) upon compression
We perform a thorough structural search with the minima hopping method (MHM)
to explore low-energy structures of methylammonium lead iodide. By combining
the MHM with a forcefield, we efficiently screen vast portions of the
configurational space with large simulation cells containing up to 96 atoms.
Our search reveals two structures of methylammonium iodide perovskite (MAPI)
that are substantially lower in energy than the well-studied experimentally
observed low-temperature orthorhombic phase according to density
functional calculations. Both structures have not yet been reported in the
literature for MAPI, but our results show that they could emerge as
thermodynamically stable phases via compression at low temperatures. In terms
of the electronic properties, the two phases exhibit larger band gaps than the
standard perovskite-type structures. Hence, pressure induced phase selection at
technologically achievable pressures (i.e., via thin-film strain) is a route
towards the synthesis of several MAPI polymorph with variable band gaps
Resistance gene enrichment sequencing (RenSeq) enables reannotation of the NB-LRR gene family from sequenced plant genomes and rapid mapping of resistance loci in segregating populations
RenSeq is a NB-LRR (nucleotide binding-site leucine-rich repeat) gene-targeted, Resistance gene enrichment and sequencing method that enables discovery and annotation of pathogen resistance gene family members in plant genome sequences. We successfully applied RenSeq to the sequenced potato Solanum tuberosum clone DM, and increased the number of identified NB-LRRs from 438 to 755. The majority of these identified R gene loci reside in poorly or previously unannotated regions of the genome. Sequence and positional details on the 12 chromosomes have been established for 704 NB-LRRs and can be accessed through a genome browser that we provide. We compared these NB-LRR genes and the corresponding oligonucleotide baits with the highest sequence similarity and demonstrated that ~80% sequence identity is sufficient for enrichment. Analysis of the sequenced tomato S. lycopersicum ‘Heinz 1706’ extended the NB-LRR complement to 394 loci. We further describe a methodology that applies RenSeq to rapidly identify molecular markers that co-segregate with a pathogen resistance trait of interest. In two independent segregating populations involving the wild Solanum species S. berthaultii (Rpi-ber2) and S. ruiz-ceballosii (Rpi-rzc1), we were able to apply RenSeq successfully to identify markers that co-segregate with resistance towards the late blight pathogen Phytophthora infestans. These SNP identification workflows were designed as easy-to-adapt Galaxy pipelines
ConSole: using modularity of contact maps to locate solenoid domains in protein structures.
BackgroundPeriodic proteins, characterized by the presence of multiple repeats of short motifs, form an interesting and seldom-studied group. Due to often extreme divergence in sequence, detection and analysis of such motifs is performed more reliably on the structural level. Yet, few algorithms have been developed for the detection and analysis of structures of periodic proteins.ResultsConSole recognizes modularity in protein contact maps, allowing for precise identification of repeats in solenoid protein structures, an important subgroup of periodic proteins. Tests on benchmarks show that ConSole has higher recognition accuracy as compared to Raphael, the only other publicly available solenoid structure detection tool. As a next step of ConSole analysis, we show how detection of solenoid repeats in structures can be used to improve sequence recognition of these motifs and to detect subtle irregularities of repeat lengths in three solenoid protein families.ConclusionsThe ConSole algorithm provides a fast and accurate tool to recognize solenoid protein structures as a whole and to identify individual solenoid repeat units from a structure. ConSole is available as a web-based, interactive server and is available for download at http://console.sanfordburnham.org
- …