
    RepSeq-A database of amino acid repeats present in lower eukaryotic pathogens

    BACKGROUND Amino acid repeat-containing proteins have a broad range of functions and their identification is of relevance to many experimental biologists. In human-infective protozoan parasites (such as the Kinetoplastid and Plasmodium species), they are implicated in immune evasion and have been shown to influence virulence and pathogenicity. RepSeq (http://repseq.gugbe.com) is a new database of amino acid repeat-containing proteins found in lower eukaryotic pathogens. The RepSeq database is accessed via a web-based application which also provides links to related online tools and databases for further analyses. RESULTS The RepSeq algorithm typically identifies more than 98% of repeat-containing proteins and is capable of identifying both perfect and mismatch repeats. The proportion of proteins that contain repeat elements varies greatly between different families and even species (3 - 35% of the total protein content). The most common motif type is the Sequence Repeat Region (SRR) - a repeated motif containing multiple different amino acid types. Proteins containing Single Amino Acid Repeats (SAARs) and Di-Peptide Repeats (DPRs) typically account for 0.5 - 1.0% of the total protein number. Notable exceptions are P. falciparum and D. discoideum, in which 33.67% and 34.28% respectively of the predicted proteomes consist of repeat-containing proteins. These numbers are due to large insertions of low complexity single and multi-codon repeat regions. CONCLUSION The RepSeq database provides a repository for repeat-containing proteins found in parasitic protozoa. The database allows for both individual and cross-species proteome analyses and also allows users to upload sequences of interest for analysis by the RepSeq algorithm. Identification of repeat-containing proteins provides researchers with a defined subset of proteins which can be analysed by expression profiling and functional characterisation, thereby facilitating study of pathogenicity and virulence factors in the parasitic protozoa. While primarily designed for kinetoplastid work, the RepSeq algorithm and database retain full functionality when used to analyse other species.
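
    To make the SAAR and DPR categories above concrete, the following is a minimal sketch of how perfect single amino acid and di-peptide repeats could be located in a protein sequence. It is not the RepSeq algorithm, which also handles mismatch repeats and SRRs; the function names and the thresholds `min_length` and `min_copies` are assumptions chosen for illustration.

```python
import re


def find_single_residue_repeats(sequence, min_length=5):
    """Return (residue, start, run_length) for each perfect run of one residue.

    min_length is a hypothetical threshold, not a value taken from RepSeq.
    """
    pattern = re.compile(r"(.)\1{%d,}" % (min_length - 1))
    return [(m.group(1), m.start(), len(m.group(0)))
            for m in pattern.finditer(sequence)]


def find_dipeptide_repeats(sequence, min_copies=3):
    """Return (unit, start, copies) for each perfect tandem di-peptide repeat.

    The two residues of the unit must differ, so a single-residue run is
    not reported a second time as a di-peptide repeat.
    """
    pattern = re.compile(r"((.)(?!\2).)\1{%d,}" % (min_copies - 1))
    return [(m.group(1), m.start(), len(m.group(0)) // 2)
            for m in pattern.finditer(sequence)]


if __name__ == "__main__":
    protein = "MKQQQQQQLNANANANAT"  # made-up sequence for demonstration
    print(find_single_residue_repeats(protein))
    print(find_dipeptide_repeats(protein))
```

    Regular expressions with backreferences only catch exact repeats; tolerating mismatches would need an alignment-style or scoring-based scan instead.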

    Construction and Random Generation of Hypergraphs with Prescribed Degree and Dimension Sequences

    We propose algorithms for construction and random generation of hypergraphs without loops and with prescribed degree and dimension sequences. The objective is to provide a starting point for as well as an alternative to Markov chain Monte Carlo approaches. Our algorithms leverage the transposition of properties and algorithms devised for matrices constituted of zeros and ones with prescribed row- and column-sums to hypergraphs. The construction algorithm extends the applicability of Markov chain Monte Carlo approaches when the initial hypergraph is not provided. The random generation algorithm allows the development of a self-normalised importance sampling estimator for hypergraph properties such as the average clustering coefficient. We prove the correctness of the proposed algorithms. We also prove that the random generation algorithm generates any hypergraph following the prescribed degree and dimension sequences with a non-zero probability. We empirically and comparatively evaluate the effectiveness and efficiency of the random generation algorithm. Experiments show that the random generation algorithm provides stable and accurate estimates of average clustering coefficient, and also demonstrates a better effective sample size in comparison with the Markov chain Monte Carlo approaches. Comment: 21 pages, 3 figures
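
    The self-normalised importance sampling estimator and the effective sample size mentioned above can be sketched generically as follows. This is the textbook form of both quantities, not the authors' implementation; it assumes each sampled hypergraph comes with a property value (for example its average clustering coefficient) and an unnormalised weight (for example the inverse of the probability with which the sampler produced it, when the target is uniform).

```python
def self_normalised_estimate(values, weights):
    """Weighted average sum(w_i * f_i) / sum(w_i).

    Dividing by the sum of the weights means the weights only need to be
    known up to a common constant factor.
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return sum(w * v for w, v in zip(weights, values)) / total


def effective_sample_size(weights):
    """Kish effective sample size: (sum w_i)^2 / sum(w_i^2)."""
    return sum(weights) ** 2 / sum(w * w for w in weights)


if __name__ == "__main__":
    # Made-up property values and weights for three sampled objects.
    values = [0.20, 0.35, 0.50]
    weights = [2.0, 1.0, 1.0]
    print(self_normalised_estimate(values, weights))
    print(effective_sample_size(weights))
```

    With equal weights the effective sample size equals the number of samples; the more uneven the weights, the smaller it becomes.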

    Skittle: A 2-Dimensional Genome Visualization Tool

    BACKGROUND It is increasingly evident that there are multiple and overlapping patterns within the genome, and that these patterns contain different types of information - regarding both genome function and genome history. In order to discover additional genomic patterns which may have biological significance, novel strategies are required. To partially address this need, we introduce a new data visualization tool entitled Skittle. RESULTS This program first creates a 2-dimensional nucleotide display by assigning four colors to the four nucleotides, and then text-wraps to a user adjustable width. This nucleotide display is accompanied by a "repeat map" which comprehensively displays all local repeating units, based upon analysis of all possible local alignments. Skittle includes a smooth-zooming interface which allows the user to analyze genomic patterns at any scale. Skittle is especially useful in identifying and analyzing tandem repeats, including repeats not normally detectable by other methods. However, Skittle is also more generally useful for analysis of any genomic data, allowing users to correlate published annotations and observable visual patterns, and allowing for sequence and construct quality control. CONCLUSIONS Preliminary observations using Skittle reveal intriguing genomic patterns not otherwise obvious, including structured variations inside tandem repeats. The striking visual patterns revealed by Skittle appear to be useful for hypothesis development, and have already led the authors to theorize that imperfect tandem repeats could act as information carriers, and may form tertiary structures within the interphase nucleus.
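
    The text-wrapping idea described above can be illustrated with a small sketch: when a sequence is wrapped at a width equal to the period of a tandem repeat, identical bases stack into vertical columns. The code below is not Skittle's implementation or its repeat map; the function names and the identity score are assumptions introduced only to show why the wrap width matters.

```python
def wrap_sequence(sequence, width):
    """Split a nucleotide string into rows of `width` characters."""
    if width <= 0:
        raise ValueError("width must be positive")
    return [sequence[i:i + width] for i in range(0, len(sequence), width)]


def row_to_row_identity(sequence, width):
    """Fraction of positions whose base equals the base one row below.

    In the wrapped display, position i sits directly above position
    i + width, so a score near 1.0 suggests a repeat of that period.
    """
    pairs = len(sequence) - width
    if pairs <= 0:
        return 0.0
    matches = sum(1 for i in range(pairs) if sequence[i] == sequence[i + width])
    return matches / pairs


if __name__ == "__main__":
    dna = "ACGTACGTACGT"  # made-up tandem repeat with period 4
    for row in wrap_sequence(dna, 4):
        print(row)
    print(row_to_row_identity(dna, 4), row_to_row_identity(dna, 3))
```

    Scanning a range of widths and picking those with high identity is one simple way to guess a repeat period before looking at the display.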

    XSTREAM: A practical algorithm for identification and architecture modeling of tandem repeats in protein sequences

    BACKGROUND Biological sequence repeats arranged in tandem patterns are widespread in DNA and proteins. While many software tools have been designed to detect DNA tandem repeats (TRs), useful algorithms for identifying protein TRs with varied levels of degeneracy are still needed. RESULTS To address limitations of current repeat identification methods, and to provide an efficient and flexible algorithm for the detection and analysis of TRs in protein sequences, we designed and implemented a new computational method called XSTREAM. Running time tests confirm the practicality of XSTREAM for analyses of multi-genome datasets. Each of the key capabilities of XSTREAM (e.g., merging, nesting, long-period detection, and TR architecture modeling) is demonstrated using anecdotal examples, and the utility of XSTREAM for identifying TR proteins was validated using data from a recently published paper. CONCLUSION We show that XSTREAM is a practical and valuable tool for TR detection in protein and nucleotide sequences at the multi-genome scale, and an effective tool for modeling TR domains with diverse architectures and varied levels of degeneracy. Because of these useful features, XSTREAM has significant potential for the discovery of naturally-evolved modular proteins with applications for engineering novel biostructural and biomimetic materials, and identifying new vaccine and diagnostic targets.

    How to Get the Most out of Your Curation Effort

    Large-scale annotation efforts typically involve several experts who may disagree with each other. We propose an approach for modeling disagreements among experts that assigns each annotation a confidence value (i.e., the posterior probability that it is correct). It computes a certainty level for individual annotations, given annotator-specific parameters estimated from data. We developed two probabilistic models for performing this analysis, compared these models using computer simulation, and tested each model's actual performance, based on a large data set generated by human annotators specifically for this study. We show that even in the worst-case scenario, when all annotators disagree, our approach allows us to significantly increase the probability of choosing the correct annotation. Along with this publication we make publicly available a corpus of 10,000 sentences annotated according to several cardinal dimensions that we have introduced in earlier work. The 10,000 sentences were all 3-fold annotated by a group of eight experts, while a 1,000-sentence subset was further 5-fold annotated by five new experts. While the presented data represent a specialized curation task, our modeling approach is general; most data annotation studies could benefit from our methodology.
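
    As an illustration of what a posterior confidence value for an annotation can look like, here is a minimal sketch for a binary label. It is not either of the two models developed in the paper: it assumes, purely for illustration, that each annotator is independently correct with a known accuracy and that errors are symmetric. The function name, the `prior` parameter, and the numbers are hypothetical.

```python
def posterior_label_is_one(votes, accuracies, prior=0.5):
    """Posterior probability that the true binary label is 1.

    votes[j] is annotator j's label (0 or 1) and accuracies[j] is the
    assumed probability that annotator j reports the true label.
    """
    if len(votes) != len(accuracies):
        raise ValueError("one accuracy per vote is required")
    like_one = prior          # joint weight of "true label is 1"
    like_zero = 1.0 - prior   # joint weight of "true label is 0"
    for vote, acc in zip(votes, accuracies):
        like_one *= acc if vote == 1 else 1.0 - acc
        like_zero *= acc if vote == 0 else 1.0 - acc
    return like_one / (like_one + like_zero)


if __name__ == "__main__":
    # Two annotators say 1 and one says 0; accuracies are made up.
    print(posterior_label_is_one([1, 1, 0], [0.8, 0.8, 0.8]))
```

    Under these assumptions a dissenting vote from a highly accurate annotator can outweigh several votes from less reliable ones, which is the intuition behind weighting annotators by estimated parameters rather than taking a simple majority.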

    Religion's Role in Promoting Health and Reducing Risk Among American Youth

    Although past research has long documented religion's salutary impact on adult health-related behaviors and outcomes, relatively little research has examined the relationship between religion and adolescent health. This study uses large, nationally representative samples of high school seniors to examine the relationship between religion and behavioral predictors of adolescent morbidity and mortality. Relative to their peers, religious youth are less likely to engage in behaviors that compromise their health (e.g., carrying weapons, getting into fights, drinking and driving) and are more likely to behave in ways that enhance their health (e.g., proper nutrition, exercise, and rest). Multivariate analyses suggest that these relationships persist even after controlling for demographic factors, and trend analyses reveal that they have existed over time. Particularly important is the finding that religious seniors have been relatively unaffected by past and recent increases in marijuana use. Peer Reviewed. http://deepblue.lib.umich.edu/bitstream/2027.42/66995/2/10.1177_109019819802500604.pd

    Midgut Microbiota of the Malaria Mosquito Vector Anopheles gambiae and Interactions with Plasmodium falciparum Infection

    The susceptibility of Anopheles mosquitoes to Plasmodium infections relies on complex interactions between the insect vector and the malaria parasite. A number of studies have shown that the mosquito innate immune responses play an important role in controlling the malaria infection and that the strength of parasite clearance is under genetic control, but little is known about the influence of environmental factors on the transmission success. We present here evidence that the composition of the vector gut microbiota is one of the major components that determine the outcome of mosquito infections. A. gambiae mosquitoes collected in natural breeding sites from Cameroon were experimentally challenged with a wild P. falciparum isolate, and their gut bacterial content was submitted for pyrosequencing analysis. The meta-taxogenomic approach revealed a broader richness of the midgut bacterial flora than previously described. Unexpectedly, the majority of bacterial species were found in only a small proportion of mosquitoes, and only 20 genera were shared by 80% of individuals. We show that observed differences in gut bacterial flora of adult mosquitoes are a result of breeding in distinct sites, suggesting that the native aquatic source where larvae were grown determines the composition of the midgut microbiota. Importantly, the abundance of Enterobacteriaceae in the mosquito midgut correlates significantly with the Plasmodium infection status. This striking relationship highlights the role of natural gut environment in parasite transmission. Deciphering microbe-pathogen interactions offers new perspectives to control disease transmission. Institut de Recherche pour le Developpement (IRD); French Agence Nationale pour la Recherche [ANR-11-BSV7-009-01]; European Community [242095, 223601]