58 research outputs found

    A Simplified Model for Fatigue Load Calculations of Small Wind-Turbines with Vertical Axis of Rotation

    Wind turbines with a vertical axis of rotation (VAWT) have recently attracted renewed interest. In this paper we summarize results from [2], in which our new aerodynamic model [1] was applied to an existing 50 kW machine. The presentation is organized as follows: First, we present our simplified model, which was formulated in close analogy to the rules of IEC 61400-2 used for horizontal-axis wind turbines, including some rigid-body extensions. Second, we briefly discuss GAROS, the only aeroelastic code for VAWTs known to us. Finally, we present fatigue-load results from both computations for a 50 kW (140 m² swept area) prototype. As a main result, we see that the differences between our simplified model (a rigid-body model with most equations coming from engineering mechanics) and full aeroelastic modelling seem to be largest for the beam system supporting the blades on the shaf

    Who Watches the Watchmen? An Appraisal of Benchmarks for Multiple Sequence Alignment

    Multiple sequence alignment (MSA) is a fundamental and ubiquitous technique in bioinformatics used to infer related residues among biological sequences. Thus alignment accuracy is crucial to a vast range of analyses, often in ways difficult to assess in those analyses. To compare the performance of different aligners and help detect systematic errors in alignments, a number of benchmarking strategies have been pursued. Here we present an overview of the main strategies--based on simulation, consistency, protein structure, and phylogeny--and discuss their different advantages and associated risks. We outline a set of desirable characteristics for effective benchmarking, and evaluate each strategy in light of them. We conclude that there is currently no universally applicable means of benchmarking MSA, and that developers and users of alignment tools should base their choice of benchmark on the context of application--with a keen awareness of the assumptions underlying each benchmarking strategy.

    Detecting Remote Evolutionary Relationships among Proteins by Large-Scale Semantic Embedding

    Virtually every molecular biologist has searched a protein or DNA sequence database to find sequences that are evolutionarily related to a given query. Pairwise sequence comparison methods (i.e., measures of similarity between query and target sequences) provide the engine for sequence database search and have been the subject of 30 years of computational research. For the difficult problem of detecting remote evolutionary relationships between protein sequences, the most successful pairwise comparison methods involve building local models (e.g., profile hidden Markov models) of protein sequences. However, recent work in massive data domains like web search and natural language processing demonstrates the advantage of exploiting the global structure of the data space. Motivated by this work, we present a large-scale algorithm called ProtEmbed, which learns an embedding of protein sequences into a low-dimensional “semantic space.” Evolutionarily related proteins are embedded in close proximity, and additional pieces of evidence, such as 3D structural similarity or class labels, can be incorporated into the learning process. We find that ProtEmbed achieves superior accuracy to widely used pairwise sequence methods like PSI-BLAST and HHSearch for remote homology detection; it also outperforms our previous RankProp algorithm, which incorporates global structure in the form of a protein similarity network. Finally, the ProtEmbed embedding space can be visualized, both at the global level and local to a given query, yielding intuition about the structure of protein sequence space.

    Alignathon: A competitive assessment of whole-genome alignment methods

    © 2014 Earl et al. Multiple sequence alignments (MSAs) are a prerequisite for a wide variety of evolutionary analyses. Published assessments and benchmark data sets for protein and, to a lesser extent, global nucleotide MSAs are available, but less effort has been made to establish benchmarks in the more general problem of whole-genome alignment (WGA). Using the same model as the successful Assemblathon competitions, we organized a competitive evaluation in which teams submitted their alignments and then assessments were performed collectively after all the submissions were received. Three data sets were used: Two were simulated and based on primate and mammalian phylogenies, and one comprised 20 real fly genomes. In total, 35 submissions were assessed, submitted by 10 teams using 12 different alignment pipelines. We found agreement between independent simulation-based and statistical assessments, indicating that there are substantial accuracy differences between contemporary alignment tools. We saw considerable differences in the alignment quality of differently annotated regions and found that few tools aligned the duplications analyzed. We found that many tools worked well at shorter evolutionary distances, but fewer performed competitively at longer distances. We provide all data sets, submissions, and assessment programs for further study and provide, as a resource for future benchmarking, a convenient repository of code and data for reproducing the simulation assessments.

    Murasaki: A Fast, Parallelizable Algorithm to Find Anchors from Multiple Genomes

    BACKGROUND: With the number of available genome sequences increasing rapidly, the magnitude of sequence data required for multiple-genome analyses is a challenging problem. When large-scale rearrangements break the collinearity of gene orders among genomes, genome comparison algorithms must first identify sets of short well-conserved sequences present in each genome, termed anchors. Previously, anchor identification among multiple genomes has been achieved using pairwise alignment tools like BLASTZ through progressive alignment tools like TBA, but the computational requirements for sequence comparisons of multiple genomes quickly become a limiting factor as the number and scale of genomes grow. METHODOLOGY/PRINCIPAL FINDINGS: Our algorithm, named Murasaki, makes it possible to identify anchors within multiple large sequences on the scale of several hundred megabases in a few minutes using a single CPU. Two advanced features of Murasaki are (1) adaptive hash function generation, which enables efficient use of arbitrary mismatch patterns (spaced seeds) and therefore the comparison of multiple mammalian genomes in a practical amount of computation time, and (2) parallelizable execution that decreases the required wall-clock and CPU times. Murasaki can perform a sensitive anchoring of eight mammalian genomes (human, chimp, rhesus, orangutan, mouse, rat, dog, and cow) in 21 hours CPU time (42 minutes wall time). This is the first single-pass in-core anchoring of multiple mammalian genomes. We evaluated Murasaki by comparing it with the genome alignment programs BLASTZ and TBA. We show that Murasaki can anchor multiple genomes in near linear time, compared to the quadratic time requirements of BLASTZ and TBA, while improving overall accuracy.
CONCLUSIONS/SIGNIFICANCE: Murasaki provides an open source platform to take advantage of long patterns, cluster computing, and novel hash algorithms to produce accurate anchors across multiple genomes with computational efficiency significantly greater than existing methods. Murasaki is available under GPL at http://murasaki.sourceforge.net

    Cystinosin, MPDU1, SWEETs and KDELR Belong to a Well-Defined Protein Family with Putative Function of Cargo Receptors Involved in Vesicle Trafficking

    Classification of proteins into families based on remote homology often helps prediction of their biological function. Here we describe prediction of protein cargo receptors involved in vesicle formation and protein trafficking. Hidden Markov model profile-to-profile searches in protein databases using endoplasmic reticulum lumen protein retaining receptors (KDEL, Erd2) as query reveal a large and diverse family of proteins with seven transmembrane helices and common topology and, most likely, similar function. Their coding genes exist in all eukaryota and in several prokaryota. Some are responsible for metabolic diseases (cystinosis, congenital disorder of glycosylation), others are candidate genes for genetic disorders (cleft lip and palate, certain forms of cancer) or solute uptake and efflux (SWEETs) and many have not yet been assigned a function. Comparison with the properties of KDEL receptors suggests that the family members could be involved in protein trafficking and serve as cargo receptors. This prediction sheds new light on a range of biologically, medically and agronomically important proteins and could open the way to discovering the function of many genes not yet annotated. Experimental testing is suggested

    Improving the Alignment Quality of Consistency Based Aligners with an Evaluation Function Using Synonymous Protein Words

    Most sequence alignment tools can successfully align protein sequences with higher levels of sequence identity. The accuracy of corresponding structure alignment, however, decreases rapidly when considering distantly related sequences (<20% identity). In this range of identity, alignments optimized so as to maximize sequence similarity are often inaccurate from a structural point of view. Over the last two decades, most multiple protein aligners have been optimized for their capacity to reproduce structure-based alignments while using sequence information. Methods currently available differ essentially in the similarity measurement between aligned residues using substitution matrices, Fourier transform, sophisticated profile-profile functions, or consistency-based approaches, more recently

    Domain Evolution of Vertebrate Blood Coagulation Cascade Proteins

    Vertebrate blood coagulation is controlled by a cascade containing more than 20 proteins. The cascade proteins are found in the blood in their zymogen forms and when the cascade is triggered by tissue damage, zymogens are activated and in turn activate their downstream proteins by serine protease activity. In this study, we examined proteomes of 21 chordates, of which 18 are vertebrates, to reveal the modular evolution of the blood coagulation cascade. Additionally, two Arthropoda species were used to compare domain arrangements of the proteins belonging to the hemolymph clotting and the blood coagulation cascades. Within the vertebrate coagulation protein set, almost half of the studied proteins are shared with jawless vertebrates. Domain similarity analyses revealed that there are multiple possible evolutionary trajectories for each coagulation protein. During the evolution of higher vertebrate clades, gene and genome duplications led to the formation of other coagulation cascade proteins