65,316 research outputs found

    MRFalign: Protein Homology Detection through Alignment of Markov Random Fields

    Sequence-based protein homology detection has been extensively studied and so far the most sensitive method is based upon comparison of protein sequence profiles, which are derived from multiple sequence alignment (MSA) of sequence homologs in a protein family. A sequence profile is usually represented as a position-specific scoring matrix (PSSM) or an HMM (Hidden Markov Model) and accordingly PSSM-PSSM or HMM-HMM comparison is used for homolog detection. This paper presents a new homology detection method MRFalign, consisting of three key components: 1) a Markov Random Fields (MRF) representation of a protein family; 2) a scoring function measuring similarity of two MRFs; and 3) an efficient ADMM (Alternating Direction Method of Multipliers) algorithm aligning two MRFs. Compared to an HMM, which can only model very short-range residue correlation, MRFs can model long-range residue interaction patterns and thus encode information for the global 3D structure of a protein family. Consequently, MRF-MRF comparison for remote homology detection should be much more sensitive than HMM-HMM or PSSM-PSSM comparison. Experiments confirm that MRFalign outperforms several popular HMM or PSSM-based methods in terms of both alignment accuracy and remote homology detection and that MRFalign works particularly well for mainly beta proteins. For example, tested on the benchmark SCOP40 (8353 proteins) for homology detection, PSSM-PSSM and HMM-HMM succeed on 48% and 52% of proteins, respectively, at superfamily level, and on 15% and 27% of proteins, respectively, at fold level. In contrast, MRFalign succeeds on 57.3% and 42.5% of proteins at superfamily and fold level, respectively. This study implies that long-range residue interaction patterns are very helpful for sequence-based homology detection. The software is available for download at http://raptorx.uchicago.edu/download/. Comment: Accepted by both RECOMB 2014 and PLOS Computational Biology

    Pairwise alignment incorporating dipeptide covariation

    Motivation: Standard algorithms for pairwise protein sequence alignment make the simplifying assumption that amino acid substitutions at neighboring sites are uncorrelated. This assumption allows implementation of fast algorithms for pairwise sequence alignment, but it ignores information that could conceivably increase the power of remote homolog detection. We examine the validity of this assumption by constructing extended substitution matrices that encapsulate the observed correlations between neighboring sites, by developing an efficient and rigorous algorithm for pairwise protein sequence alignment that incorporates these local substitution correlations, and by assessing the ability of this algorithm to detect remote homologies. Results: Our analysis indicates that local correlations between substitutions are not strong on average. Furthermore, incorporating local substitution correlations into pairwise alignment did not lead to a statistically significant improvement in remote homology detection. Therefore, the standard assumption that individual residues within protein sequences evolve independently of neighboring positions appears to be an efficient and appropriate approximation.

    Alignment of helical membrane protein sequences using AlignMe

    Few sequence alignment methods have been designed specifically for integral membrane proteins, even though these important proteins have distinct evolutionary and structural properties that might affect their alignments. Existing approaches typically consider membrane-related information either by using membrane-specific substitution matrices or by assigning distinct penalties for gap creation in transmembrane and non-transmembrane regions. Here, we ask whether favoring matching of predicted transmembrane segments within a standard dynamic programming algorithm can improve the accuracy of pairwise membrane protein sequence alignments. We tested various strategies using a specifically designed program called AlignMe. An updated set of homologous membrane protein structures, called HOMEP2, was used as a reference for optimizing the gap penalties. The best of the membrane-protein optimized approaches were then tested on an independent reference set of membrane protein sequence alignments from the BAliBASE collection. When secondary structure (S) matching was combined with evolutionary information (using a position-specific substitution matrix (P)), in an approach we called AlignMePS, the resultant pairwise alignments were typically among the most accurate over a broad range of sequence similarities when compared to available methods. Matching transmembrane predictions (T), in addition to evolutionary information, and secondary-structure predictions, in an approach called AlignMePST, generally reduces the accuracy of the alignments of closely-related proteins in the BAliBASE set relative to AlignMePS, but may be useful in cases of extremely distantly related proteins for which sequence information is less informative. The open source AlignMe code is available at https://sourceforge.net/projects/alignme/, and at http://www.forrestlab.org, along with an online server and the HOMEP2 data set.
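The idea of favoring matches between predicted transmembrane segments inside a standard dynamic programming alignment can be illustrated with a minimal sketch. This is not AlignMe's scoring scheme: the function name `align`, the per-residue label strings `tm_a` and `tm_b` (with `"M"` marking a predicted transmembrane position), the linear gap penalty, and all score values are assumptions chosen only for illustration.

```python
def align(seq_a, seq_b, tm_a, tm_b, match=2, mismatch=-1, gap=-2, tm_bonus=1):
    """Global (Needleman-Wunsch) alignment with a linear gap penalty.

    Hypothetical scoring: identity scores plus a bonus whenever both
    aligned positions are labelled 'M' (predicted transmembrane).
    Returns (score, aligned_a, aligned_b).
    """
    n, m = len(seq_a), len(seq_b)

    def pair_score(i, j):
        s = match if seq_a[i] == seq_b[j] else mismatch
        if tm_a[i] == "M" and tm_b[j] == "M":
            s += tm_bonus  # reward pairing two predicted membrane residues
        return s

    # score[i][j] = best score for aligning seq_a[:i] with seq_b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(i - 1, j - 1),  # pair residues
                score[i - 1][j] + gap,                           # gap in seq_b
                score[i][j - 1] + gap,                           # gap in seq_a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(i - 1, j - 1):
            out_a.append(seq_a[i - 1])
            out_b.append(seq_b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(seq_a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(seq_b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))
```

Calling `align("ACD", "ACD", "MMM", "MMM")` scores each identical pair as `match + tm_bonus`, whereas the same sequences with non-membrane labels receive only `match` per position. A real implementation would replace the identity scores with a substitution matrix or profile and would likely use affine, region-specific gap penalties.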

    Simulating Light-Weight Personalised Recommender Systems in Learning Networks: A Case for Pedagogy-Oriented and Rating-Based Hybrid Recommendation Strategies

    Recommender systems for e-learning demand specific pedagogy-oriented and hybrid recommendation strategies. Current systems are often based on time-consuming, top-down information provisioning combined with intensive data-mining collaborative filtering approaches. However, such systems do not seem appropriate for Learning Networks where distributed information can often not be identified beforehand. Providing sound way-finding support for lifelong learners in Learning Networks requires dedicated personalised recommender systems (PRS) that offer the learners customised advice on which learning actions or programs to study next. Such systems should also be practically feasible and be developed with minimized effort. Currently, such so-called light-weight PRS systems are scarcely available. This study shows that simulation studies can support the analysis and optimisation of PRS requirements prior to starting the costly process of their development, and practical implementation (including testing and revision) during field experiments in real-life learning situations. This simulation study confirms that providing recommendations leads towards more effective, more satisfied, and faster goal achievement. Furthermore, this study reveals that a light-weight hybrid PRS-system based on ratings is a good alternative for an ontology-based system, in particular for low-level goal achievement. Finally, it is found that rating-based light-weight hybrid PRS-systems enable more effective, more satisfied, and faster goal attainment than peer-based light-weight hybrid PRS-systems (incorporating collaborative techniques without rating). Keywords: Recommendation Strategy; Simulation Study; Way-Finding; Collaborative Filtering; Rating

    Tackling Rapid Radiations With Targeted Sequencing

    In phylogenetic studies across angiosperms, at various taxonomic levels, polytomies have persisted despite efforts to resolve them by increasing sampling of taxa and loci. The large amount of genomic data now available and statistical tools to analyze them provide unprecedented power for phylogenetic inference. Targeted sequencing has emerged as a strong tool for estimating species trees in the face of rapid radiations, lineage sorting, and introgression. Evolutionary relationships in Cyperaceae have been studied mostly using Sanger sequencing until recently. Despite ample taxon sampling, relationships in many genera remain poorly understood, hampered by diversification rates that outpace mutation rates in the loci used. The C4 Cyperus clade of the genus Cyperus has been particularly difficult to resolve. Previous studies based on a limited set of markers resolved relationships among Cyperus species using the C3 photosynthetic pathway, but not among C4 Cyperus clade taxa. We test the ability of two targeted sequencing kits to resolve relationships in the C4 Cyperus clade, the universal Angiosperms-353 kit and a Cyperaceae-specific kit. Sequences of the targeted loci were recovered from data generated with both kits and used to investigate overlap in data between kits and relative efficiency of the general and custom approaches. The power to resolve shallow-level relationships was tested using a summary species tree method and a concatenated maximum likelihood approach. High resolution and support are obtained using both approaches, but high levels of missing data disproportionately impact the latter. Targeted sequencing provides new insights into the evolution of morphology in the C4 Cyperus clade, demonstrating for example that the former segregate genus Alinula is polyphyletic despite its seeming morphological integrity. 
    An unexpected result is that the Cyperus margaritaceus-Cyperus niveus complex comprises a clade separate from and sister to the core C4 Cyperus clade. Our results demonstrate that data generated with a family-specific kit do not necessarily have more power than those obtained with a universal kit, but that data generated with different targeted sequencing kits can often be merged for downstream analyses. Moreover, our study contributes to the growing consensus that targeted sequencing data are a powerful tool in resolving rapid radiations. España Ministry of Economy and Competitiveness (project CGL2016-77401-P)

    Improved energy resolution for VHE gamma-ray astronomy with systems of Cherenkov telescopes

    We present analysis techniques to improve the energy resolution of stereoscopic systems of imaging atmospheric Cherenkov telescopes, using the HEGRA telescope system as an example. The techniques include (i) the determination of the height of the shower maximum, which is then taken into account in the energy determination, and (ii) the determination of the location of the shower core with the additional constraint that the direction of the gamma rays is known a priori. This constraint can be applied for gamma-ray point sources, and results in a significant improvement in the localization of the shower core, which translates into better energy resolution. Combining both techniques, the HEGRA telescopes reach an energy resolution between 9% and 12%, over the entire energy range from 1 TeV to almost 100 TeV. Options for further improvements of the energy resolution are discussed. Comment: 13 pages, 7 figures, LaTeX. Astroparticle Physics, in press

    Eigenvector Synchronization, Graph Rigidity and the Molecule Problem

    The graph realization problem has received a great deal of attention in recent years, due to its importance in applications such as wireless sensor networks and structural biology. In this paper, we extend previous work and propose the 3D-ASAP algorithm for the graph realization problem in $\mathbb{R}^3$, given a sparse and noisy set of distance measurements. 3D-ASAP is a divide and conquer, non-incremental and non-iterative algorithm, which integrates local distance information into a global structure determination. Our approach starts with identifying, for every node, a subgraph of its 1-hop neighborhood graph, which can be accurately embedded in its own coordinate system. In the noise-free case, the computed coordinates of the sensors in each patch must agree with their global positioning up to some unknown rigid motion, that is, up to translation, rotation and possibly reflection. In other words, to every patch there corresponds an element of the Euclidean group Euc(3) of rigid transformations in $\mathbb{R}^3$, and the goal is to estimate the group elements that will properly align all the patches in a globally consistent way. Furthermore, 3D-ASAP successfully incorporates information specific to the molecule problem in structural biology, in particular information on known substructures and their orientation. In addition, we also propose 3D-SP-ASAP, a faster version of 3D-ASAP, which uses a spectral partitioning algorithm as a preprocessing step for dividing the initial graph into smaller subgraphs. Our extensive numerical simulations show that 3D-ASAP and 3D-SP-ASAP are very robust to high levels of noise in the measured distances and to sparse connectivity in the measurement graph, and compare favorably to similar state-of-the-art localization algorithms. Comment: 49 pages, 8 figures
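The noise-free consistency condition described in this abstract can be written out explicitly. The notation below is an assumption introduced here for illustration and is not taken from the paper: $p_k$ is the unknown global position of node $k$, $x_k^{(i)}$ is its computed coordinate within patch $P_i$, and $(O_i, t_i)$ is the rigid motion attached to that patch.

```latex
% Each patch agrees with the global positions up to a rigid motion
% (O_i orthogonal, so rotation and possibly reflection; t_i a translation):
p_k = O_i \, x_k^{(i)} + t_i,
\qquad O_i \in \mathrm{O}(3), \quad t_i \in \mathbb{R}^3,
\quad k \in P_i .

% Hence, for every node shared by two patches, the two embeddings
% must map to the same global point:
O_i \, x_k^{(i)} + t_i = O_j \, x_k^{(j)} + t_j,
\qquad k \in P_i \cap P_j .
```

With noisy distances these equalities hold only approximately, so estimating the group elements $(O_i, t_i)$ amounts to finding the set of rigid motions that makes the overlapping patches agree as closely as possible.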