
    Interpol: An R package for preprocessing of protein sequences

    Abstract

    Background: Most machine learning techniques currently applied in the literature require input data of fixed dimensionality. Real input data frequently violate this requirement. DNA and protein sequences, for example, often differ in length because of insertions and deletions. Moreover, numerical encoding of amino acids often improves classification and regression performance compared with the commonly used sparse encoding.

    Results: The software "Interpol" encodes amino acid sequences as numerical descriptor vectors using a database of currently 532 descriptors (mainly from AAindex). It normalizes sequences to uniform length with one of five linear or non-linear interpolation algorithms. Interpol is distributed as an open-source, platform-independent R package. It is typically used to preprocess amino acid sequences for classification or regression.

    Conclusions: Interpol widens the range of machine learning methods that can be applied to biological sequences, and in many cases it will improve their classification and regression performance.
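As a rough illustration of the descriptor-encoding step, here is a minimal Python sketch that maps each residue to a single numeric value. It assumes one example descriptor, the Kyte-Doolittle hydropathy scale (AAindex entry KYTJ820101). The function name `encode` is hypothetical and is not part of Interpol's R API. Interpol itself offers 532 descriptors, not this one hard-coded table.

```python
# Kyte-Doolittle hydropathy scale (AAindex entry KYTJ820101), used here only
# as an illustrative per-residue descriptor.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode(sequence: str, scale: dict = KYTE_DOOLITTLE) -> list:
    """Hypothetical helper: replace each residue by its descriptor value."""
    residues = sequence.upper()
    unknown = set(residues) - set(scale)
    if unknown:
        # Non-standard letters (e.g. 'X', 'B') have no value on this scale.
        raise ValueError(f"no descriptor value for residues: {sorted(unknown)}")
    return [scale[r] for r in residues]


if __name__ == "__main__":
    print(encode("MKV"))  # [1.9, -3.9, 4.2]
```

The output vector is only as long as the input sequence. Sequences of different lengths still produce vectors of different lengths, which is why Interpol pairs encoding with length normalization.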

    Improved Bevirimat resistance prediction by combination of structural and sequence-based classifiers

    Abstract

    Background: Maturation inhibitors such as Bevirimat are a new class of antiretroviral drugs that hamper the cleavage of HIV-1 proteins into their functionally active forms. They bind to these precursor proteins and inhibit their cleavage by the HIV-1 protease, resulting in non-functional virus particles. However, mutations in this region can confer resistance to Bevirimat. Highly specific and accurate tools for predicting resistance to maturation inhibitors can help identify patients who might benefit from these new drugs.

    Results: We tested several methods to improve Bevirimat resistance prediction in HIV-1. Combining structural and sequence-based information in classifier ensembles led to accurate and reliable predictions. We were also able to identify computationally the regions most crucial for Bevirimat resistance. These regions agree with experimental results from other studies.

    Conclusions: Our analysis demonstrated that machine learning techniques can predict HIV-1 resistance to maturation inhibitors such as Bevirimat. New maturation inhibitors are already under development and might expand the arsenal of antiretroviral drugs in the future. Accurate prediction tools are therefore valuable for enabling personalized therapy.
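The abstract does not say how the structural and sequence-based classifiers are combined. One common option is soft voting. This sketch assumes that approach: each classifier outputs a probability of resistance, and the ensemble averages those probabilities and applies a threshold. The function `ensemble_predict`, the 0.5 threshold and the example probabilities are all hypothetical.

```python
from statistics import mean


def ensemble_predict(probabilities: list, threshold: float = 0.5) -> tuple:
    """Soft voting: average the resistance probabilities of several classifiers.

    Returns the averaged score and whether it reaches the decision threshold.
    """
    if not probabilities:
        raise ValueError("need at least one classifier output")
    if any(not 0.0 <= p <= 1.0 for p in probabilities):
        raise ValueError("probabilities must lie in [0, 1]")
    score = mean(probabilities)
    return score, score >= threshold


if __name__ == "__main__":
    # Hypothetical outputs of a structure-based and a sequence-based classifier.
    print(ensemble_predict([0.75, 0.5]))  # (0.625, True)
    print(ensemble_predict([0.25, 0.5]))  # (0.375, False)
```

Averaging lets a confident classifier outweigh an uncertain one. Weighted averaging or majority voting over hard labels are common alternatives.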

    Machine learning on normalized protein sequences

    Abstract

    Background: Machine learning techniques have been widely applied to biological sequences. Examples include predicting HIV-1 drug resistance from the sequences of drug target proteins and predicting the functional classes of proteins. Deletions and insertions are frequent in biological sequences, so a major limitation of current methods is their inability to handle varying sequence lengths.

    Findings: We propose normalizing sequences to uniform length. To this end, we tested one linear and four non-linear interpolation methods for normalizing sequence lengths in 19 classification datasets. Classification tasks included prediction of HIV-1 drug resistance from drug target sequences and sequence-based prediction of protein function. We applied random forests to classify sequences into "positive" and "negative" samples. Statistical tests showed that linear interpolation outperformed the non-linear interpolation methods on most of the analyzed datasets. In a few cases, non-linear methods had a small but significant advantage. Compared with other published methods, our prediction scheme improved prediction accuracy by up to 14%.

    Conclusions: Machine learning on sequences normalized by simple linear interpolation gave better, or at least competitive, results compared with state-of-the-art procedures. It is thus a promising alternative to existing methods, especially for protein sequences of variable length.
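The abstract does not spell out the linear interpolation scheme. This minimal sketch assumes evenly spaced resampling, where the first and last input values map onto the first and last output positions. Interpol's own handling of endpoints may differ. The name `linear_normalize` is hypothetical, and the input is a list of per-residue descriptor values.

```python
def linear_normalize(values: list, target_length: int) -> list:
    """Resample a numeric sequence to target_length points by linear interpolation.

    Output point j sits at fractional input position
    x = j * (n - 1) / (target_length - 1), and its value is interpolated
    between the two input values that bracket x.
    """
    n = len(values)
    if n == 0 or target_length < 1:
        raise ValueError("need a non-empty input and target_length >= 1")
    if n == 1 or target_length == 1:
        # Degenerate cases: nothing to interpolate between, or a single output
        # point (this sketch simply keeps the first value).
        return [float(values[0])] * target_length

    result = []
    for j in range(target_length):
        # Multiply before dividing so the last point lands exactly on n - 1.
        x = j * (n - 1) / (target_length - 1)
        left = min(int(x), n - 2)  # index of the left neighbour
        frac = x - left            # distance past the left neighbour, in [0, 1]
        result.append(values[left] + frac * (values[left + 1] - values[left]))
    return result


if __name__ == "__main__":
    print(linear_normalize([0.0, 10.0], 5))                 # [0.0, 2.5, 5.0, 7.5, 10.0]
    print(linear_normalize([1.0, 2.0, 3.0, 4.0, 5.0], 3))   # [1.0, 3.0, 5.0]
```

Shorter sequences are stretched and longer ones compressed. Every sequence therefore ends up with the same number of features, which is what a random forest or any other fixed-dimensional learner needs.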

    Determination of viral tropism by genotypic and phenotypic assays in Brazilian patients infected with HIV-1

    The clinical application of CCR5 antagonists first requires determining which coreceptor the infecting viral strain uses. Bioinformatics programs that predict coreceptor usage could provide an alternative method for screening candidates for treatment with CCR5 antagonists, particularly in countries with limited financial resources. The present study therefore aimed to identify the best bioinformatics-based approach for determining HIV-1 coreceptor usage in clinical practice. Proviral DNA sequences and Trofile results from 99 HIV-1-infected subjects under clinical monitoring were analyzed. Based on the Trofile results, the viral variants present were 81.1% R5, 21.4% R5X4 and 1.8% X4. Tropism determination by Geno2pheno[coreceptor] analysis with a false-positive rate of 10% performed best in this sample. R5 and X4 strains were found at frequencies of 78.5% and 28.4%, respectively, and phenotypic and genotypic results were 78.6% concordant. Further studies are needed to clarify how genetic diversity among virus strains affects bioinformatics-driven approaches to determining tropism. This strategy could be useful for screening patients in developing countries. However, some limitations remain that restrict the wider application of coreceptor usage tests in clinical practice.

    Financial support: FAPESP (08/58138-0; 08/51265-6; 10/00222-5); FFM; CNPq.
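To make the reported concordance figure concrete, this sketch converts false-positive-rate (FPR) values into binary tropism calls and measures agreement with phenotypic calls. It assumes, as is usual for Geno2pheno[coreceptor], that a lower FPR indicates a more X4-like sequence. A sample is therefore called X4-capable when its FPR is at or below the 10% cutoff; the boundary handling is an assumption. The sketch also collapses Trofile R5X4 (dual/mixed) results into the X4-capable class. The function names and the toy data are hypothetical and do not come from the study.

```python
def call_tropism(fpr_percent: float, cutoff_percent: float = 10.0) -> str:
    """Binary genotypic call from a Geno2pheno[coreceptor]-style FPR value.

    Assumption: a sample is called 'X4' (X4-capable) when its FPR is at or
    below the cutoff, otherwise 'R5'.
    """
    return "X4" if fpr_percent <= cutoff_percent else "R5"


def concordance(genotypic: list, phenotypic: list) -> float:
    """Percentage of samples on which the two assays agree."""
    if not genotypic or len(genotypic) != len(phenotypic):
        raise ValueError("need two non-empty call lists of equal length")
    agree = sum(g == p for g, p in zip(genotypic, phenotypic))
    return 100.0 * agree / len(genotypic)


if __name__ == "__main__":
    # Hypothetical toy data, not from the study.
    fprs = [45.2, 3.1, 12.0, 8.7]
    trofile = ["R5", "R5X4", "R5", "R5"]
    # Dual/mixed (R5X4) virus is X4-capable, so it is grouped with X4 here.
    phenotypic = ["R5" if t == "R5" else "X4" for t in trofile]
    genotypic = [call_tropism(f) for f in fprs]
    print(genotypic)                           # ['R5', 'X4', 'R5', 'X4']
    print(concordance(genotypic, phenotypic))  # 75.0
```

Raising the FPR cutoff calls more samples X4-capable. This trades specificity for sensitivity, which is why the choice of cutoff affects concordance with Trofile.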