
    Strength in numbers: achieving greater accuracy in MHC-I binding prediction by combining the results from multiple prediction tools

    BACKGROUND: Peptides derived from endogenous antigens can bind to MHC class I molecules. Those which bind with high affinity can invoke a CD8(+) immune response, resulting in the destruction of infected cells. Much work in immunoinformatics has involved the algorithmic prediction of peptide binding affinity to various MHC-I alleles. A number of tools for MHC-I binding prediction have been developed, many of which are available on the web. RESULTS: We hypothesize that peptides predicted by a number of tools are more likely to bind than those predicted by just one tool, and that the likelihood of a particular peptide being a binder is related to the number of tools that predict it, as well as the accuracy of those tools. To this end, we have built and tested a heuristic-based method of making MHC-binding predictions by combining the results from multiple tools. The predictive performance of each individual tool is first ascertained. These performance data are used to derive weights such that the predictions of tools with better accuracy are given greater credence. The combined tool was evaluated using ten-fold cross-validation and was found to significantly outperform the individual tools when a high specificity threshold is used. It performs comparably to the best-performing individual tools at lower specificity thresholds. Finally, it also outperforms the combination of the tools resulting from linear discriminant analysis. CONCLUSION: A heuristic-based method of combining the results of the individual tools better facilitates the scanning of large proteomes for potential epitopes, yielding more actual high-affinity binders while reporting very few false positives.
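
    The abstract does not give the exact weighting formula; the following minimal Python sketch illustrates the general heuristic, with tool names, weights, and the decision threshold all invented for illustration:

        # Minimal sketch of the weighted-voting heuristic described above.
        # Tool names, weights, and the threshold are illustrative placeholders,
        # not values from the paper.

        def combined_score(predictions, weights):
            """Sum the weights of the tools that call the peptide a binder.

            predictions: dict mapping tool name -> True/False (binder call)
            weights: dict mapping tool name -> accuracy-derived weight
            """
            return sum(weights[tool] for tool, is_binder in predictions.items() if is_binder)

        # Hypothetical weights derived from each tool's measured accuracy.
        weights = {"toolA": 0.9, "toolB": 0.7, "toolC": 0.5}
        calls = {"toolA": True, "toolB": True, "toolC": False}

        THRESHOLD = 1.2  # raising this trades sensitivity for specificity
        if combined_score(calls, weights) >= THRESHOLD:
            print("predicted binder")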

    Determination of the minimum number of microarray experiments for discovery of gene expression patterns

    BACKGROUND: One type of DNA microarray experiment is discovery of gene expression patterns for a cell line undergoing a biological process over a series of time points. Two important issues with such an experiment are the number of time points, and the interval between them. In the absence of biological knowledge regarding appropriate values, it is natural to question whether the behaviour of progressively generated data may by itself determine a threshold beyond which further microarray experiments do not contribute to pattern discovery. Additionally, such a threshold implies a minimum number of microarray experiments, which is important given the cost of these experiments. RESULTS: We have developed a method for determining the minimum number of microarray experiments (i.e. time points) for temporal gene expression, assuming that the span between time points is given and the hierarchical clustering technique is used for gene expression pattern discovery. The key idea is a similarity measure for two clusterings which is expressed as a function of the data for progressive time points. While the experiments are underway, this function is evaluated. When the function reaches its maximum, it indicates that the set of experiments has reached a saturated state. Therefore, further experiments do not contribute to the discrimination of patterns. CONCLUSION: The method has been verified with two previously published gene expression datasets. For both experiments, the number of time points determined with our method is smaller than in the published experiments. It is noted that the overall approach is applicable to other clustering techniques.
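
    As a rough illustration of the stopping rule, the Python sketch below clusters synthetic data on progressively longer prefixes of the time series and compares consecutive clusterings. The adjusted Rand index stands in for the paper's similarity measure, which is not specified in the abstract; the data and cluster count are made up:

        # Cluster the genes using the first t and first t+1 time points,
        # compare the two clusterings, and stop once the similarity peaks.
        import numpy as np
        from scipy.cluster.hierarchy import linkage, fcluster
        from sklearn.metrics import adjusted_rand_score

        rng = np.random.default_rng(0)
        expr = rng.normal(size=(200, 10))  # 200 genes x 10 time points (synthetic)
        K = 5                              # number of clusters to extract

        def cluster_labels(data, k):
            return fcluster(linkage(data, method="average"), t=k, criterion="maxclust")

        for t in range(2, expr.shape[1]):
            sim = adjusted_rand_score(cluster_labels(expr[:, :t], K),
                                      cluster_labels(expr[:, :t + 1], K))
            print(f"time points {t} vs {t + 1}: similarity = {sim:.3f}")
            # In practice, data collection would stop once `sim` reaches its maximum.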

    A better sequence-read simulator program for metagenomics

    BACKGROUND: There are many programs available for generating simulated whole-genome shotgun sequence reads. The data generated by many of these programs follow predefined models, which limits their use to the authors' original intentions. For example, many models assume that read lengths follow a uniform or normal distribution. Other programs generate models from actual sequencing data, but are limited to reads from single-genome studies. To our knowledge, there are no programs that allow a user to generate simulated data following non-parametric read-length distributions and quality profiles based on empirically-derived information from metagenomics sequencing data. RESULTS: We present BEAR (Better Emulation for Artificial Reads), a program that uses a machine-learning approach to generate reads with lengths and quality values that closely match empirically-derived distributions. BEAR can emulate reads from various sequencing platforms, including Illumina, 454, and Ion Torrent. BEAR requires minimal user input, as it automatically determines appropriate parameter settings from user-supplied data. BEAR also uses a unique method for deriving run-specific error rates, and extracts useful statistics from the metagenomic data itself, such as quality-error models. Many existing simulators are specific to a particular sequencing technology; however, BEAR is not restricted in this way. Because of its flexibility, BEAR is particularly useful for emulating the behaviour of technologies like Ion Torrent, for which no dedicated sequencing simulators are currently available. BEAR is also the first metagenomic sequencing simulator program that automates the process of generating abundances, which can be an arduous task. CONCLUSIONS: BEAR is useful for evaluating data processing tools in genomics. It has many advantages over existing comparable software, such as generating more realistic reads and being independent of sequencing technology, and has features particularly useful for metagenomics work.
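
    The core non-parametric idea can be sketched in Python as sampling read lengths directly from an empirical histogram rather than from a uniform or normal model; the counts below are invented, and the sketch is not BEAR's actual implementation:

        # Sample simulated read lengths from an empirical length distribution
        # instead of assuming a parametric model.
        import random

        # Hypothetical empirical histogram: read length -> observed count.
        length_counts = {98: 120, 99: 340, 100: 900, 101: 310, 150: 45}

        lengths = list(length_counts)
        weights = list(length_counts.values())

        def sample_read_lengths(n):
            """Draw n read lengths with probability proportional to observed counts."""
            return random.choices(lengths, weights=weights, k=n)

        print(sample_read_lengths(10))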

    Comparing the Similarity of Different Groups of Bacteria to the Human Proteome

    Numerous aspects of the relationship between bacteria and humans have been investigated. One aspect that has recently received attention is sequence overlap at the proteomic level. However, there has not yet been a study that comprehensively characterizes the level of sequence overlap between bacteria and humans, especially as it relates to bacterial characteristics like pathogenicity, G-C content, and proteome size. In this study, we began by performing a general characterization of the range of bacteria-human similarity at the proteomic level, and identified characteristics of the most- and least-similar bacterial species. We then examined the relationship between proteomic similarity and numerous other variables. While pathogens and nonpathogens had comparable similarity to the human proteome, pathogens causing chronic infections were found to be more similar to the human proteome than those causing acute infections. Although no general correspondence between a bacterium’s proteome size and its similarity to the human proteome was noted, no bacteria with small proteomes had high similarity to the human proteome. Finally, we discovered an interesting relationship between similarity and a bacterium’s G-C content. While the relationship between bacteria and humans has been studied from many angles, their proteomic similarity still needs to be examined in more detail. This paper sheds further light on this relationship, particularly with respect to immunity and pathogenicity.
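
    The abstract does not state the exact similarity metric. One plausible summary, sketched below in Python, averages each bacterial protein's best percent identity against the human proteome from tabular BLASTP output; the file name and the choice of metric are assumptions, not the paper's method:

        # Summarize proteome-level similarity as the mean best-hit percent
        # identity, read from BLASTP tabular output (-outfmt 6), in which
        # column 1 is the query ID and column 3 is percent identity.
        from collections import defaultdict

        best_identity = defaultdict(float)
        with open("bacterium_vs_human.blastp.tsv") as handle:  # placeholder path
            for line in handle:
                fields = line.rstrip("\n").split("\t")
                query, pident = fields[0], float(fields[2])
                best_identity[query] = max(best_identity[query], pident)

        if best_identity:
            score = sum(best_identity.values()) / len(best_identity)
            print(f"mean best-hit identity to human proteome: {score:.1f}%")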

    The oligodeoxynucleotide sequences corresponding to never-expressed peptide motifs are mainly located in the non-coding strand

    BACKGROUND: We study the usage of specific peptide platforms in protein composition. Using the pentapeptide as a unit of length, we find that in the universal proteome many pentapeptides are heavily repeated (even thousands of times), whereas some are quite rare, and a small number do not appear at all. To understand the physico-chemical-biological basis underlying peptide usage at the proteomic level, in this study we analyse the energetic costs for the synthesis of rare and never-expressed versus frequent pentapeptides. In addition, we explore residue bulkiness, hydrophobicity, and codon number as factors able to modulate specific peptide frequencies. Then, the possible influence of amino acid composition is investigated in zero- and high-frequency pentapeptide sets by analysing the frequencies of the corresponding inverse-sequence pentapeptides. As a final step, we analyse the pentadecamer oligodeoxynucleotide sequences corresponding to the never-expressed pentapeptides. RESULTS: We find that only DNA context-dependent constraints (such as oligodeoxynucleotide sequence location in the minus strand, introns, pseudogenes, frameshifts, etc.) provide a coherent mechanistic platform to explain the occurrence of never-expressed versus frequent pentapeptides in the protein world. CONCLUSIONS: This study is of importance in cell biology. Indeed, the rarity (or lack of expression) of specific 5-mer peptide modules implies the rarity (or lack of expression) of the corresponding n-mer peptide sequences (with n > 5), so possibly modulating protein compositional trends. Moreover, the data might further our understanding of the role exerted by rare pentapeptide modules as critical biological effectors in protein-protein interactions.
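
    A pentapeptide census of the kind described can be sketched in Python as follows; the toy sequences stand in for a real proteome, and the enumeration of never-expressed 5-mers is shown only schematically:

        # Count every 5-mer across a set of protein sequences and enumerate
        # the ones that never occur.
        from collections import Counter
        from itertools import product

        AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

        proteins = ["MKVLAACDEF", "GHIKLMNPQR"]  # placeholder sequences

        counts = Counter(seq[i:i + 5] for seq in proteins for i in range(len(seq) - 4))

        all_pentapeptides = ("".join(p) for p in product(AMINO_ACIDS, repeat=5))
        never_expressed = (p for p in all_pentapeptides if p not in counts)

        print("distinct observed 5-mers:", len(counts))
        print("example never-expressed 5-mer:", next(never_expressed))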

    Evaluation of a Hydrogel-Based Diagnostic Approach for the Point-of-Care Based Detection of Neisseria gonorrhoeae

    Eleven primer pairs were developed for the identification of Neisseria gonorrhoeae. The sensitivity and specificity of these primers were evaluated by real-time PCR (RT-PCR) melt-curve analyses with DNA from 145 N. gonorrhoeae isolates and 40 other Neisseria or non-Neisseria species. Three primer pairs were further evaluated in a hydrogel-based RT-PCR detection platform, using DNA extracted from 50 N. gonorrhoeae cultures. We observed 100% sensitivity and specificity in the hydrogel assay, confirming its potential as a point-of-care test (POCT) for N. gonorrhoeae diagnosis.
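
    For reference, the two reported metrics are defined below in Python with illustrative confusion counts (not the study's data); 100% sensitivity and specificity correspond to zero false negatives and zero false positives:

        # Standard definitions of the two reported performance metrics.
        def sensitivity(tp, fn):
            return tp / (tp + fn)   # fraction of true N. gonorrhoeae samples detected

        def specificity(tn, fp):
            return tn / (tn + fp)   # fraction of non-gonococcal samples correctly rejected

        # Hypothetical counts: 100% on both metrics means fn == 0 and fp == 0.
        print(sensitivity(tp=50, fn=0), specificity(tn=40, fp=0))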

    Synthetic lethal interactions of DEAD/H-box helicases as targets for cancer therapy

    DEAD/H-box helicases are implicated in virtually every aspect of RNA metabolism, including transcription, pre-mRNA splicing, ribosome biogenesis, nuclear export, translation initiation, RNA degradation, and mRNA editing. Most of these helicases are upregulated in various cancers, and mutations in some of them are associated with several malignancies. Lately, synthetic lethality (SL) and synthetic dosage lethality (SDL) approaches, in which genetic interactions of cancer-related genes are exploited as therapeutic targets, are emerging as a leading area of cancer research. Several DEAD/H-box helicases, including DDX3, DDX9 (Dbp9), DDX10 (Dbp4), DDX11 (ChlR1), and DDX41 (Sacy-1), have been subjected to SL analyses in humans and different model organisms. It remains to be explored whether SDL can be utilized to identify druggable targets in DEAD/H-box helicase-overexpressing cancers. In this review, we analyze gene expression data of a subset of DEAD/H-box helicases in multiple cancer types and discuss how their SL/SDL interactions can be used for therapeutic purposes. We also summarize the latest developments in clinical applications, apart from discussing some of the challenges in drug discovery in the context of targeting DEAD/H-box helicases.
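
    As a toy Python illustration of the SDL rationale discussed above (not an analysis from the review), one might flag helicases whose tumour expression greatly exceeds normal expression as SDL entry points; all values and the cutoff are invented:

        # Flag overexpressed helicases as candidate anchors for synthetic
        # dosage lethality, where overexpression of one gene makes cells
        # dependent on (and hence vulnerable through) its interaction partner.
        tumour = {"DDX3": 9.1, "DDX10": 4.2, "DDX11": 8.7, "DDX41": 3.9}
        normal = {"DDX3": 4.0, "DDX10": 4.1, "DDX11": 3.5, "DDX41": 4.2}

        FOLD_CHANGE_CUTOFF = 2.0
        sdl_candidates = [gene for gene in tumour
                          if tumour[gene] / normal[gene] >= FOLD_CHANGE_CUTOFF]
        print("SDL candidate helicases:", sdl_candidates)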