90 research outputs found

    Automated DNA Motif Discovery

    Ensembl's human non-coding and protein-coding genes are used to automatically find DNA pattern motifs. Genetic programming uses the Backus-Naur form (BNF) grammar for regular expressions (RE) to ensure that the generated strings are legal. The evolved motif suggests that Thymine followed by one or more Adenines, etc., early in a transcript indicates a non-protein-coding gene. Keywords: pseudogene, short and microRNAs, non-coding transcripts, systems biology, machine learning, bioinformatics, motif, regular expression, strongly typed genetic programming, context-free grammar. Comment: 12 pages, 2 figures
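To make the kind of motif described in this abstract concrete, here is a minimal Python sketch. It is an illustration only: the regular expression `TA+` is a literal reading of "Thymine followed by one or more Adenines", not the motif actually evolved in the paper (the abstract does not give it), and the `window` cutoff for "early in the transcript" is a hypothetical parameter.

```python
import re

# Illustrative motif only: "T followed by one or more A" as a regular
# expression. The motif evolved in the paper is not stated in the abstract.
MOTIF = re.compile(r"TA+")


def motif_early(transcript: str, window: int = 60) -> bool:
    """Return True if the motif occurs within the first `window` bases.

    `window` is a hypothetical cutoff standing in for "early in the transcript".
    """
    return MOTIF.search(transcript.upper()[:window]) is not None


if __name__ == "__main__":
    print(motif_early("GGCTAAAGC"))      # motif occurs near the start
    print(motif_early("GGCGCGCC" * 10))  # sequence contains no T at all
```

A classifier built on such a motif would flag a transcript as a candidate non-coding gene when `motif_early` returns True; how the paper scores or combines matches is not described here.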

    Evolving DNA motifs to predict GeneChip probe performance

    Background: Affymetrix High Density Oligonucleotide Arrays (HDONA) simultaneously measure the expression of thousands of genes using millions of probes. We use correlations between measurements for the same gene across 6685 human tissue samples from NCBI's GEO database to indicate the quality of individual HG-U133A probes. Low correlation indicates a poor probe. Results: Regular expressions can be automatically created from a Backus-Naur form (BNF) context-free grammar using strongly typed genetic programming. Conclusion: The automatically produced motif is better at predicting poor DNA sequences than an existing human-generated RE, suggesting that runs of Cytosine, runs of Guanine, and mixtures of the two should all be avoided. © 2009 Langdon and Harrison; licensee BioMed Central Ltd.
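The idea of rating a probe by how well it agrees with other probes for the same gene can be sketched as follows. This is a rough illustration under stated assumptions, not the authors' procedure: scoring a probe by its mean Pearson correlation with the other probes is an assumed choice, and the function names and toy data are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def probe_scores(probes):
    """Score each probe by its mean correlation with the other probes.

    `probes` maps a probe name to its measurements across samples; all probes
    are assumed to target the same gene, and at least two are required.
    """
    scores = {}
    for name, values in probes.items():
        others = [pearson(values, v) for other, v in probes.items() if other != name]
        scores[name] = sum(others) / len(others)
    return scores


if __name__ == "__main__":
    # Toy data: p1 and p2 agree perfectly; p3 disagrees with both.
    toy = {"p1": [1, 2, 3, 4, 5], "p2": [2, 4, 6, 8, 10], "p3": [5, 1, 4, 2, 3]}
    scores = probe_scores(toy)
    print(min(scores, key=scores.get))  # name of the lowest-scoring probe
```

In this toy example the probe with the lowest score is the one a correlation-based screen would treat as poor.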

    Comparative analysis of nuclear localization signal (NLS) prediction methods

    Aim. Comparative analysis of six state-of-the-art nuclear localization signal (NLS) prediction methods (PSORT II, NucPred, cNLSMapper, NLStradamus, NucImport and seqNLS). Methods. Each program was tested for correct predictions using a dataset of 155 experimentally determined NLSs and for false positives using a dataset of 155 transmembrane proteins, which putatively lack an NLS. Results. The most suitable NLS predictors were found to be NucPred, NLStradamus and seqNLS; these programs gave the highest ratio of correct to wrong predictions among the tested programs. However, even the best results obtained by these programs amounted to only about 45% correct predictions. Conclusion. The identification of novel NLSs by predictors still requires experimental verification.
    Aim. Identifying nuclear localization signals (NLSs) in the amino acid sequence of a protein by experimental methods remains a costly and time-consuming process, so computational methods of NLS prediction have recently become popular. Methods. In this article we carried out a comparative analysis of the prediction reliability of six different programs (PSORT II, NucPred, cNLSMapper, NLStradamus, NucImport and SeqNLS). For each algorithm we estimated the proportion of true-positive predictions on a set of 155 experimentally determined NLSs from 128 human proteins, and the proportion of false-positive predictions on a set of 155 human transmembrane proteins that presumably lack an NLS. Results. The largest number of correctly predicted NLSs at the lowest proportion of false positives was obtained with three programs: NucPred, NLStradamus and seqNLS. However, even at the highest level of reliability, these algorithms correctly predict no more than 45% of experimentally determined NLSs. Conclusions. The use of any NLS prediction algorithm requires experimental verification of the results obtained.

    Prediction of nuclear proteins using SVM and HMM models

    Background: The nucleus, a highly organized organelle, plays an important role in cellular homeostasis. Nuclear proteins are crucial for chromosomal maintenance/segregation, gene expression, RNA processing/export, and many other processes. Several methods have been developed for predicting nuclear proteins in the past. The aim of the present study is to develop a new method for predicting nuclear proteins with higher accuracy. Results: All modules were trained and tested on a non-redundant dataset and evaluated using five-fold cross-validation. Firstly, Support Vector Machine (SVM) based modules were developed using amino acid and dipeptide compositions and achieved Matthews correlation coefficients (MCC) of 0.59 and 0.61, respectively. Secondly, we developed SVM modules using split amino acid compositions (SAAC) and achieved a maximum MCC of 0.66. Thirdly, a hidden Markov model (HMM) based module/profile was developed for searching exclusively nuclear and non-nuclear domains in a protein. Finally, a hybrid module was developed by combining the SVM module and the HMM profile and achieved an MCC of 0.87 with an accuracy of 94.61%. This method performs better than the existing methods when evaluated on blind/independent datasets. Our method estimated 31.51%, 21.89%, 26.31%, 25.72% and 24.95% of the proteins as nuclear proteins in the Saccharomyces cerevisiae, Caenorhabditis elegans, Drosophila melanogaster, mouse and human proteomes, respectively. Based on the above modules, we have developed a web server, NpPred, for predicting nuclear proteins (http://www.imtech.res.in/raghava/nppred/). Conclusion: This study describes a highly accurate method for predicting nuclear proteins. An SVM module has been developed for the first time using SAAC for predicting nuclear proteins, where the amino acid compositions of the N-terminus and of the remaining protein were computed separately. In addition, our study is the first report in which exclusively nuclear and non-nuclear domains have been identified and used for predicting nuclear proteins. The performance of the method improved further when both approaches were combined.
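Two quantities in this abstract lend themselves to a short worked example: the split amino acid composition (composition of the N-terminus and of the rest of the protein, computed separately) and the Matthews correlation coefficient. The Python sketch below is illustrative only: the N-terminal length `n_term=25` is an assumed value that the abstract does not state, and the function names are hypothetical. The MCC uses the standard definition, (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).

```python
from math import sqrt

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(seq):
    """Fraction of each standard amino acid in `seq` (all zeros if empty)."""
    if not seq:
        return [0.0] * len(AMINO_ACIDS)
    return [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]


def split_composition(protein, n_term=25):
    """40-value feature vector: N-terminal composition, then the remainder's.

    `n_term` is a hypothetical split point; the abstract does not give one.
    """
    protein = protein.upper()
    return composition(protein[:n_term]) + composition(protein[n_term:])


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    features = split_composition("MKKRA" * 20)
    print(len(features))        # 20 values per segment, two segments
    print(mcc(50, 50, 0, 0))    # perfect agreement between labels and predictions
```

Feature vectors of this kind would then be passed to an SVM; the kernel and parameters used in the paper are not given in the abstract.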

    Functional characterization of 8-oxoguanine DNA glycosylase of Trypanosoma cruzi

    The oxidative lesion 8-oxoguanine (8-oxoG) is removed during base excision repair (BER) by the 8-oxoguanine DNA glycosylase 1 (Ogg1). This lesion can erroneously pair with adenine, and the excision of this damaged base by Ogg1 enables the insertion of a guanine and prevents DNA mutation. In this report, we identified and characterized Ogg1 from the protozoan parasite Trypanosoma cruzi (TcOgg1), the causative agent of Chagas disease. Like most living organisms, T. cruzi is susceptible to oxidative stress; hence DNA repair is essential for its survival and for successful infection. We verified that the TcOGG1 gene encodes an 8-oxoG DNA glycosylase by complementing an Ogg1-defective Saccharomyces cerevisiae strain. Heterologous expression of TcOGG1 restored the mutation frequency of the yeast mutant ogg1-/- (CD138) to wild-type levels. We also demonstrate that overexpression of TcOGG1 increases T. cruzi sensitivity to hydrogen peroxide (H2O2). Analysis of DNA lesions using quantitative PCR suggests that the increased susceptibility of TcOGG1 overexpressors to H2O2 could be a consequence of uncoupled BER at abasic sites and/or strand breaks generated after TcOgg1 removes 8-oxoG, which are not rapidly repaired by the subsequent BER enzymes. This hypothesis is supported by the observation that TcOGG1 overexpressors have reduced levels of 8-oxoG both in the nucleus and in the parasite mitochondrion. The localization of TcOgg1 was examined in parasites transfected with a TcOgg1-GFP fusion, which confirmed that this enzyme is present in both organelles. Taken together, our data indicate that T. cruzi has a functional Ogg1 ortholog that participates in nuclear and mitochondrial BER. © 2012 Furtado et al.

    Identification and Comparative Analysis of Subolesin/Akirin Ortholog from Ornithodoros turicata Ticks

    Background: Subolesin is an evolutionarily conserved molecule in diverse arthropod species that plays an important role in the regulation of genes involved in immune responses, blood digestion, reproduction and development. In this study, we identified a subolesin ortholog from the soft tick Ornithodoros turicata, the vector of the relapsing fever spirochete in the United States. Methods: Uninfected fed or unfed O. turicata ticks were used throughout this study. The subolesin mRNA was amplified by reverse transcription polymerase chain reaction (RT-PCR) and sequenced. Quantitative real-time PCR (QRT-PCR) was performed to evaluate subolesin mRNA levels at different O. turicata developmental stages and in salivary gland and gut tissues. Bioinformatics and comparative analyses were performed to predict potential post-translational modifications in the O. turicata subolesin amino acid sequence. Results: Our study reveals that O. turicata subolesin gene expression is developmentally regulated, with adult ticks expressing significantly higher levels than larval or nymphal ticks. Expression of subolesin was evident in both unfed and fed ticks and in the salivary glands and midgut tissues. The expression of subolesin transcripts varied in fed ticks, with peak levels at day 14 post-feeding. Phylogenetic analysis revealed that O. turicata subolesin shows a high degree of sequence conservation with subolesins from other soft and hard ticks. Bioinformatics and comparative analyses predicted that O. turicata subolesin carries three Protein kinase C and one Casein kinase II phosphorylation sites. However, no myristoylation or glycosylation sites were evident in the O. turicata subolesin sequence. Conclusion: Our study provides important insights for recognizing subolesin as a conserved potential candidate for the development of a broad-spectrum anti-vector vaccine to control not only ticks but also several other arthropods that transmit diseases to humans and animals.