6 research outputs found

    Metaheuristics applied to solving the DNA fragment assembly problem (Metaheurísticas aplicadas a la resolución del problema de ensamblado de fragmentos de ADN)

    In recent decades, major advances in molecular biology and the underlying genetic technology have driven enormous growth in the volume and variety of information generated by a vast scientific community. Genome and proteome sequencing, gene identification, and gene expression profiling, among other areas of genetics, have created and continue to create the need to draw on experts from other sciences such as mathematics, computer science, physics, and biology in order to obtain better results in less time. Bioinformatics is therefore an interdisciplinary field dedicated to developing techniques for analyzing genetic sequences, identifying and predicting molecular structures, extracting features from microarray data, and so on.
    Track: Agents and Intelligent Systems. Red de Universidades con Carreras en Informática (RedUNCI)

    Advanced metaheuristic techniques applied to solving bioinformatics problems (Técnicas metaheurísticas avanzadas aplicadas a la resolución de problemas bioinformáticos)

    The aim of this line of research is to study and solve problems in bioinformatics using intelligent methods. In particular, our work focuses on solving genome sequencing problems by designing and implementing new metaheuristic techniques, both trajectory-based and population-based. We also consider hybridizing and/or distributing these methods, depending on the complexity of the problem at hand.
    Track: Agents and Intelligent Systems. Red de Universidades con Carreras en Informática (RedUNCI)
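
To make the idea of a trajectory-based metaheuristic concrete, the following is a minimal sketch, not the authors' method. It assumes a common formulation of the fragment assembly problem in which a candidate solution is an ordering of the fragments and the score is the total suffix-prefix overlap between neighbours. The function names (`overlap`, `fitness`, `local_search`) and the swap-based neighbourhood are illustrative assumptions.

```python
import random


def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def fitness(order, fragments):
    """Sum of overlaps between consecutive fragments in the given order."""
    return sum(
        overlap(fragments[order[i]], fragments[order[i + 1]])
        for i in range(len(order) - 1)
    )


def local_search(fragments, iterations=2000, seed=0):
    """Hill climbing over fragment orderings (hypothetical illustration).

    Each step swaps two positions and keeps the swap only if the score
    does not get worse. Requires at least two fragments.
    """
    rng = random.Random(seed)
    order = list(range(len(fragments)))
    rng.shuffle(order)
    best = fitness(order, fragments)
    for _ in range(iterations):
        i, j = rng.sample(range(len(order)), 2)
        order[i], order[j] = order[j], order[i]
        score = fitness(order, fragments)
        if score >= best:
            best = score  # accept improving or equal moves
        else:
            order[i], order[j] = order[j], order[i]  # undo the swap
    return order, best


if __name__ == "__main__":
    frags = ["ACGT", "GTTA", "TAGG"]
    print(local_search(frags))
```

A population-based variant (for example, a genetic algorithm) would keep many orderings at once and recombine them, while a hybrid would typically apply a local search like this one to the individuals of that population.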

    Assessing the behavior of machine learning methods to predict the activity of antimicrobial peptides

    This study demonstrates the importance of obtaining statistically stable results when using machine learning methods to predict the activity of antimicrobial peptides in cases where, due to the cost and complexity of the chemical processes involved, datasets are particularly small (fewer than a few hundred instances). As in other fields with similar problems, small datasets lead to large variability in the performance of predictive models, hindering any attempt to transfer them to lab practice. Rather than targeting good peak performance obtained from very particular experimental setups, as reported in related literature, we focused on characterizing the behavior of the machine learning methods, as a preliminary step to obtain reproducible results across experimental setups, and, ultimately, good performance. We propose a methodology that integrates feature learning (autoencoders) and selection methods (genetic algorithms) through the exhaustive use of performance metrics (permutation tests and bootstrapping), which provide stronger statistical evidence to support investment decisions with the lab resources at hand. We show evidence for the usefulness of 1) the extensive use of computational resources, and 2) adopting a wider range of metrics than those reported in the literature to assess method performance. This approach allowed us to guide our quest for finding suitable machine learning methods, and to obtain results comparable to those in the literature with strong statistical stability.
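
The two resampling tools named in this abstract can be sketched as follows. This is a simplified illustration under stated assumptions, not the study's pipeline: it scores a fixed set of predictions with plain accuracy, whereas a full permutation test would usually retrain the model on permuted labels. The names `accuracy`, `bootstrap_ci`, and `permutation_p_value` are hypothetical.

```python
import random


def accuracy(y_true, y_pred):
    """Fraction of positions where prediction and label agree."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def bootstrap_ci(y_true, y_pred, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for accuracy (assumed metric)."""
    rng = random.Random(seed)
    n = len(y_true)
    scores = []
    for _ in range(n_resamples):
        # Resample instances with replacement and rescore.
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(
            accuracy([y_true[i] for i in idx], [y_pred[i] for i in idx])
        )
    scores.sort()
    lower = scores[int((alpha / 2) * n_resamples)]
    upper = scores[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


def permutation_p_value(y_true, y_pred, n_permutations=1000, seed=0):
    """Share of label shuffles scoring at least as well as the real labels."""
    rng = random.Random(seed)
    observed = accuracy(y_true, y_pred)
    shuffled = list(y_true)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if accuracy(shuffled, y_pred) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_permutations + 1)


if __name__ == "__main__":
    y_true = [0, 1] * 20
    y_pred = [0, 1] * 18 + [1, 0] * 2
    print(bootstrap_ci(y_true, y_pred))
    print(permutation_p_value(y_true, y_pred))
```

On datasets of only a few hundred instances, a wide bootstrap interval or a large permutation p-value is a signal that an apparently good score may not be reproducible.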

    SCPRED: Accurate prediction of protein structural class for sequences of twilight-zone similarity with predicting sequences

    Background: Protein structure prediction methods provide accurate results when a homologous protein is predicted, while poorer predictions are obtained in the absence of homologous templates. However, some protein chains that share twilight-zone pairwise identity can form similar folds, and thus determining structural similarity without sequence similarity would be desirable for structure prediction. The folding type of a protein or its domain is defined as the structural class. Current structural class prediction methods that predict the four structural classes defined in SCOP provide up to 63% accuracy for datasets in which the sequence identity of any pair of sequences belongs to the twilight zone. We propose the SCPRED method, which improves prediction accuracy for sequences that share twilight-zone pairwise similarity with the sequences used for the prediction.

    Results: SCPRED uses a support vector machine classifier that takes several custom-designed features as its input to predict the structural classes. Based on an extensive design that considers over 2300 index-, composition- and physicochemical properties-based features along with features based on the predicted secondary structure and content, the classifier's input includes 8 features based on information extracted from the secondary structure predicted with PSI-PRED and one feature computed from the sequence. Tests performed with datasets of 1673 protein chains, in which any pair of sequences shares twilight-zone similarity, show that SCPRED obtains 80.3% accuracy when predicting the four SCOP-defined structural classes, which is superior to over a dozen recent competing methods based on support vector machine, logistic regression, and ensemble-of-classifiers predictors.

    Conclusion: SCPRED can accurately find similar structures for sequences that share low identity with the sequences used for the prediction. The high predictive accuracy achieved by SCPRED is attributed to the design of the features, which are capable of separating the structural classes in spite of their low dimensionality. We also demonstrate that SCPRED's predictions can be successfully used as a post-processing filter to improve the performance of modern fold classification methods.
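
The abstract does not list SCPRED's nine features, so the sketch below only illustrates the general kind of low-dimensional descriptor that can be computed from a predicted secondary structure string. The three-state alphabet (`H` helix, `E` strand, `C` coil), the function name `secondary_structure_features`, and the chosen features are all assumptions for illustration. They are not the published feature set.

```python
from itertools import groupby


def secondary_structure_features(ss: str) -> dict:
    """Content and longest-segment features from a 3-state string.

    `ss` is assumed to use H (helix), E (strand), and C (coil), one
    character per residue, as a secondary structure predictor might emit.
    """
    n = len(ss)
    if n == 0:
        raise ValueError("empty secondary structure string")

    # Fraction of residues in each state.
    feats = {f"content_{state}": ss.count(state) / n for state in "HEC"}

    # Longest uninterrupted helix and strand segments, normalized by length.
    longest = {"H": 0, "E": 0}
    for state, run in groupby(ss):
        if state in longest:
            longest[state] = max(longest[state], len(list(run)))
    feats["longest_H_norm"] = longest["H"] / n
    feats["longest_E_norm"] = longest["E"] / n
    return feats


if __name__ == "__main__":
    print(secondary_structure_features("CCHHHHCCEE"))
```

A feature vector like this could then be passed to any support vector machine implementation. The point of the sketch is only that a handful of such values can summarize a whole chain.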

    Prediction of host-pathogen protein interactions using computational methods (Konak-patojen protein etkileşiminin hesaplamalı yöntemler ile tahmini)

    Made available in full text in accordance with the "Law on Amendments to the Higher Education Law and Certain Laws and Decree-Laws" published in Official Gazette No. 30352 of 06.03.2018, and the "Directive on the Collection, Organization, and Opening to Access of Graduate Theses in Electronic Form" of 18.06.2018.

    Knowledge of inter-species pathogen-host protein interactions is vital for developing strategies to diagnose and treat infectious diseases. Because experimental methods for detecting interactions are costly and time-consuming, computational methods that model protein-protein interactions play an important role in this field. In addition to shortening detection time and reducing cost, computational methods are used to check interactions that experimental methods have detected incorrectly. Data sparsity, data insufficiency, and the lack of a verified negative dataset are common problems of the computational methods used to predict pathogen-host protein interactions. The aim of this study is to increase the accuracy of pathogen-host interaction prediction and to overcome the drawbacks caused by insufficient data. To this end, an extended network model and a location-based encoding method are proposed. The extended network model was inspired by the hypothesis that, where sufficient inter-species interaction data is lacking, integrating pathogen-host interactions with the intra-species interactions of the pathogen and host proteins improves prediction accuracy. Location-based encoding is a feature extraction method that encodes the amino acid sequence of proteins. The features used are one of the factors that affect the performance of machine learning algorithms in pathogen-host interaction prediction. In biological databases, amino acid sequence information is the most abundant data available for proteins. A robust feature extraction method based solely on the amino acid sequence will increase the accuracy of pathogen-host interaction prediction. Using amino acid sequence information also makes it easier to extract feature vectors for all known interactions. In this thesis, PROSES (Protein Sequence-based Encoding System), a freely accessible web-based tool with a user-friendly interface, was developed for researchers working on protein encoding and protein interaction prediction. The software is especially useful for people without programming knowledge. PROSES is currently available at http://proses.yalova.edu.tr on the Yalova University web server.