362 research outputs found

    Highly Accurate Fragment Library for Protein Fold Recognition

    Proteins play a crucial role in living organisms, as they perform many vital tasks in every living cell. Knowledge of protein folding has a deep impact on understanding the heterogeneity and molecular functions of proteins. Such information leads to crucial advances in drug design and disease understanding. Fold recognition is a key step in the protein structure discovery process, especially when traditional computational methods fail to yield convincing structural homologies. In this work, we present a new protein fold recognition approach using machine learning and data mining methodologies. First, we identify a protein structural fragment library (Frag-K) composed of a set of backbone fragments ranging from 4 to 20 residues as the structural "keywords" that can effectively distinguish between major protein folds. We first apply randomized spectral clustering and random forest algorithms to construct representative and sensitive protein fragment libraries from a large set of high-quality, non-homologous protein structures available in the PDB. We analyze the impact of clustering cut-offs on the performance of the fragment libraries. Then, the Frag-K fragments are employed as structural features to classify protein structures into the major protein folds defined by SCOP (Structural Classification of Proteins). Our results show that a structural dictionary with ~400 4- to 20-residue Frag-K fragments is capable of classifying major SCOP folds with high accuracy. Then, based on Frag-K, we design a novel deep learning architecture, DeepFrag-k, which identifies fold-discriminative features to improve the accuracy of protein fold recognition. DeepFrag-k is composed of two stages: the first stage employs a multimodal Deep Belief Network (DBN) to predict the potential structural fragments for a given sequence, represented as a fragment vector, and the second stage uses a deep convolutional neural network (CNN) to classify the fragment vectors into the corresponding folds. Our results show that DeepFrag-k yields 92.98% accuracy in predicting the top-100 most popular fragments, which can be used to generate discriminative fragment feature vectors to improve protein fold recognition.
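    The "fragment vector" that links the two stages can be pictured with a minimal sketch. The abstract does not specify the encoding, so everything below is an assumption for illustration: the function name `fragment_vector`, the use of integer fragment IDs, and the choice of normalized counts over the library are hypothetical and are not taken from DeepFrag-k itself.

```python
from collections import Counter


def fragment_vector(predicted_fragment_ids, library_size):
    """Encode a protein as normalized counts over a fragment library.

    Hypothetical encoding: predicted_fragment_ids holds one library index
    per sequence window; the result has one entry per library fragment.
    """
    for frag_id in predicted_fragment_ids:
        if not 0 <= frag_id < library_size:
            raise ValueError(f"fragment id {frag_id} is outside the library")
    total = len(predicted_fragment_ids)
    if total == 0:
        return [0.0] * library_size
    counts = Counter(predicted_fragment_ids)
    return [counts.get(i, 0) / total for i in range(library_size)]


# Toy input: four sequence windows mapped onto a five-fragment library.
vec = fragment_vector([0, 2, 2, 3], library_size=5)
print(vec)
```

    A fixed-length vector of this kind is what a downstream classifier would consume; a real pipeline would likely keep positional information as well, which this sketch deliberately omits.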

    Computational Design of Stable and Soluble Biocatalysts

    Natural enzymes are delicate biomolecules possessing only marginal thermodynamic stability. Poorly stable, misfolded, and aggregated proteins lead to huge economic losses in the biotechnology and biopharmaceutical industries. Consequently, there is a need to design optimized protein sequences that maximize stability, solubility, and activity over a wide range of temperatures and pH values in buffers of different composition and in the presence of organic cosolvents. This has created great interest in using computational methods to enhance biocatalysts' robustness and solubility. Suitable methods include (i) energy calculations, (ii) machine learning, (iii) phylogenetic analyses, and (iv) combinations of these approaches. We have witnessed impressive progress in the design of stable enzymes over the last two decades, but predictions of protein solubility and expressibility are scarce. Stabilizing mutations can be predicted accurately using available force fields, and the number of sequences available for phylogenetic analyses is growing. In addition, complex computational workflows are being implemented in intuitive web tools, enhancing the quality of protein stability predictions. Conversely, solubility predictors are limited by the lack of robust and balanced experimental data, an inadequate understanding of fundamental principles of protein aggregation, and a dearth of structural information on folding intermediates. Here we summarize recent progress in the development of computational tools for predicting protein stability and solubility, critically assess their strengths and weaknesses, and identify apparent gaps in data and knowledge. We also present perspectives on the computational design of stable and soluble biocatalysts.

    The AddACO: A bio-inspired modified version of the ant colony optimization algorithm to solve travel salesman problems

    The Traveling Salesman Problem (TSP) consists in finding the minimal-length closed tour that connects all the nodes of a given graph. We propose to solve this combinatorial optimization problem with the AddACO algorithm, a version of the Ant Colony Optimization method characterized by a modified probabilistic law underlying the exploratory movement of the artificial insects. In particular, the ant decision rule is here a convex linear combination of competing behavioral stimuli and therefore has an additive form (hence the name of our algorithm), rather than the canonical multiplicative one. The AddACO is intended to address two conceptual shortcomings of classical ACO methods: (i) the population of artificial insects is in principle allowed to simultaneously minimize/maximize all migratory guidance cues (which is implausible from a biological/ecological point of view) and (ii) a given edge of the graph has zero probability of being explored if at least one of the movement traits is equal to zero there, i.e., regardless of the intensity of the others (this in principle reduces the exploratory potential of the ant colony). Three variants of our method are then specified: AddACO-V1, which includes pheromone trail and visibility as insect decision variables, and AddACO-V2 and AddACO-V3, which add random effects and inertia, respectively, to the two classical migratory stimuli. The three versions of our algorithm are tested on benchmark medium-scale TSP instances in order to assess their performance and to find their optimal parameter settings. The best-performing variant is finally applied to large-scale TSPs, compared to the naive Ant-Cycle Ant System proposed by Dorigo and colleagues, and evaluated in terms of solution quality, computational time, and convergence speed. The aim is to show that the proposed transition probability, besides its conceptual advantages, is competitive from a performance perspective, i.e., that it does not reduce the exploratory capacity of the ant population with respect to the canonical one (at least for the selected TSPs). A theoretical study of the asymptotic behavior of the AddACO is given in the appendix of the work, whose concluding section contains some hints for further improvements of our algorithm, also with a view to its application to other optimization problems.
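    The difference between a multiplicative and an additive decision rule can be illustrated with a small sketch. The abstract does not state the AddACO formula, so the additive rule below is only an assumed form: each stimulus is normalized over the candidate edges and the two are mixed with a hypothetical weight `lam`. The multiplicative rule follows the familiar Ant System pattern of pheromone raised to `alpha` times visibility raised to `beta`; the function names and parameter values are illustrative.

```python
def multiplicative_rule(tau, eta, alpha=1.0, beta=2.0):
    """Canonical-style rule: probability proportional to tau^alpha * eta^beta."""
    weights = [(t ** alpha) * (e ** beta) for t, e in zip(tau, eta)]
    total = sum(weights)
    return [w / total for w in weights]


def additive_rule(tau, eta, lam=0.5):
    """Assumed additive rule: convex combination of the normalized stimuli."""
    tau_total = sum(tau)
    eta_total = sum(eta)
    return [
        lam * t / tau_total + (1.0 - lam) * e / eta_total
        for t, e in zip(tau, eta)
    ]


# Three candidate edges; the first has no pheromone but the best visibility.
tau = [0.0, 2.0, 2.0]   # pheromone trail on each edge
eta = [1.0, 0.5, 0.5]   # visibility, e.g. the inverse of the edge length

p_mult = multiplicative_rule(tau, eta)
p_add = additive_rule(tau, eta)
print(p_mult)  # the pheromone-free edge can never be chosen
print(p_add)   # the same edge keeps a positive probability
```

    Under these assumptions the multiplicative rule assigns the first edge a probability of exactly zero, which is shortcoming (ii) in the abstract, while the additive rule still lets the ants explore it.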

    Explicit Building Block Multiobjective Evolutionary Computation: Methods and Applications

    This dissertation presents principles, techniques, and performance of evolutionary computation optimization methods. Concentration is on concepts, design formulation, and prescription for multiobjective problem solving and explicit building block (BB) multiobjective evolutionary algorithms (MOEAs). Current state-of-the-art explicit BB MOEAs are addressed in the innovative design, execution, and testing of a new explicit BB MOEA. Evolutionary computation concepts examined are algorithm convergence, population diversity and sizing, genotype and phenotype partitioning, archiving, BB concepts, parallel evolutionary algorithm (EA) models, robustness, visualization of the evolutionary process, and performance in terms of effectiveness and efficiency. The main result of this research is the development of a more robust algorithm where MOEA concepts are implicitly employed. Testing shows that the new MOEA can be more effective and efficient than previous state-of-the-art explicit BB MOEAs for selected test suite multiobjective optimization problems (MOPs) and U.S. Air Force applications. Other contributions include the extension of explicit BB definitions to clarify the meanings of good single-objective and multiobjective BBs. A new visualization technique is developed for viewing genotype, phenotype, and the evolutionary process in finding Pareto front vectors while tracking the size of the BBs. The visualization technique is the result of a BB tracing mechanism integrated into the new MOEA that enables one to determine the required BB sizes and assign an approximate epistasis level for solving a particular problem. The culmination of this research is explicit BB state-of-the-art MOEA technology based on the MOEA design, BB classifier type assessment, solution evolution visualization, and insight into MOEA test metric validation and usage as applied to test suite, deception, bioinformatics, unmanned vehicle flight pattern, and digital symbol set design MOPs.

    Pharmacogenetic modeling of human cytochrome P450 2D6; On the force of variation in inducing toxicity

    Understanding the way in which drugs are metabolized by CYP2D6, and hence the underlying mechanisms that define potential toxicity, is crucial to avoid adverse reactions. The high occurrence of CYP2D6 polymorphs increases the complexity of the toxicity assessment of a drug candidate and should be tackled from the early drug discovery phase onward. The research described in this PhD thesis has been performed to provide novel fundamental insights regarding the metabolic activity of wild-type CYP2D6 and several polymorphs using various state-of-the-art in silico techniques. The results of the CYP2D6-focused studies enhance our knowledge of the enzyme's particularities and can be used to accelerate the development of CYP2D6 modeling tools with more accurate and reliable predictions.

    Recombinant expression of insoluble enzymes in Escherichia coli: a systematic review of experimental design and its manufacturing implications.

    Recombinant enzyme expression in Escherichia coli is one of the most popular methods to produce bulk concentrations of protein product. However, this method is often limited by the inadvertent formation of inclusion bodies. Our analysis systematically reviews literature from 2010 to 2021 and details the methods and strategies researchers have utilized for expression of difficult-to-express (DtE), industrially relevant recombinant enzymes in E. coli expression strains. Our review identifies an absence of a coherent strategy, with disparate practices being used to promote solubility. We discuss the potential to approach recombinant expression systematically, with the aid of modern bioinformatics, modelling, and 'omics'-based systems-level analysis techniques, to provide a structured, holistic approach. Our analysis also identifies potential gaps in the methods used to report metadata in publications and the impact on the reproducibility and growth of the research in this field.

    Passive Micromixers

    Micro-total analysis systems and lab-on-a-chip platforms are widely used for sample preparation and analysis, drug delivery, and biological and chemical syntheses. A micromixer is an important component in these applications. Rapid and efficient mixing is a challenging task in the design and development of micromixers. The flow in micromixers is laminar, and thus the mixing is primarily dominated by diffusion. Recently, diverse techniques have been developed to promote mixing by enlarging the interfacial area between the fluids or by increasing the residence time of fluids in the micromixer. Based on their mixing mechanism, micromixers are classified into two types: active and passive. Passive micromixers are easy to fabricate and generally use geometry modification to cause chaotic advection or lamination to promote the mixing of the fluid samples, unlike active micromixers, which use moving parts or some external agitation/energy for the mixing. Many researchers have studied various geometries to design efficient passive micromixers. Recently, numerical optimization techniques based on computational fluid dynamics analysis have proven to be efficient tools in the design of micromixers. The current Special Issue covers new mechanisms, design, numerical and/or experimental mixing analysis, and design optimization of various passive micromixers.

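    Esnek atölye tipi hücre çizelgeleme problemleri için çok amaçlı matematiksel model ve genetik algoritma ile çözüm önerisi [A multi-objective mathematical model and a genetic algorithm solution proposal for flexible job shop cell scheduling problems]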

    Opened to full-text access pursuant to the "Law on Amendments to the Higher Education Law and Certain Laws and Decree Laws", published in Official Gazette No. 30352 of 06.03.2018, and the "Directive on the Collection, Organization, and Opening to Access of Graduate Theses in Electronic Form" of 18.06.2018. In today's highly competitive business environment, customers want to buy a variety of products of higher quality at lower cost. Manufacturing firms require a high degree of product variety and small manufacturing lot sizes to meet this varied demand. Product variety in manufacturing causes many problems, such as lengthy setup and transportation times and complex scheduling. Whereas conventional manufacturing systems are not flexible enough to respond to such changes, Cellular Manufacturing Systems have characteristics that can meet these needs. In addition, in most real-life manufacturing problems, some or all operations of a part can be processed on more than one machine, and operations may sometimes visit a machine or work center more than once. This option adds flexibility to the system, but running such a complex production system successfully also requires using resources effectively. This study proposes a mathematical model and a solution approach for the flexible job shop scheduling problem in a cellular manufacturing environment, taking into account exceptional parts, intercellular moves, intercellular transportation times, sequence-dependent family setup times, and recirculation. To the best of our knowledge, this is the first attempt to use a multi-objective mathematical model and a meta-heuristic approach for the Flexible Job Shop Cell Scheduling Problem (FJCSP). Moreover, because the FJCSP in real-life applications requires many conflicting objectives to be considered, the scalarization method adopted is highly important for both practical application and theoretical research. The proposed mixed-integer nonlinear mathematical model can solve small and medium-sized problems. Because large problems cannot be solved with nonlinear models in reasonable time, a Genetic Algorithm (GA) meta-heuristic that uses the multi-objective mathematical model with conic scalarization is proposed. An experimental design was carried out to determine the best combination of the parameters that affect the GA's ability to reach an optimal or near-optimal solution. For this thesis, a case study was conducted at the Tülomsaş Locomotive and Engine Factory in Eskişehir. The case was solved with both the conic scalarization GA approach and the weighted-sum scalarization GA approach, using six different sets of objective weights. For five of these weight sets, the multi-objective conic scalarization GA approach reached more dominant results. In addition, the proposed multi-objective model was shown to produce suitable solutions in reasonable time for real-world problems.
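    The two scalarizations compared in the abstract can be contrasted on a toy example. The abstract gives no formulas, so the sketch below assumes a commonly cited form of conic scalarization, the weighted sum of deviations from a reference point plus `alpha` times the sum of their absolute values, with `alpha` smaller than every weight. The candidate points, the reference point, the weights, and the function names are all illustrative and are not taken from the thesis.

```python
def weighted_sum(f, weights):
    """Weighted-sum scalarization of an objective vector (minimization)."""
    return sum(w * fi for w, fi in zip(weights, f))


def conic(f, weights, alpha, reference):
    """Assumed conic scalarization: weighted deviations from a reference
    point plus alpha times the L1 norm of those deviations."""
    if not 0.0 <= alpha < min(weights):
        raise ValueError("alpha must satisfy 0 <= alpha < min(weights)")
    deviations = [fi - ai for fi, ai in zip(f, reference)]
    linear_part = sum(w * d for w, d in zip(weights, deviations))
    return linear_part + alpha * sum(abs(d) for d in deviations)


# Three mutually non-dominated solutions of a two-objective minimization
# problem. C lies in a non-convex part of the front (0.6 + 0.6 > 1).
candidates = {"A": (0.0, 1.0), "B": (1.0, 0.0), "C": (0.6, 0.6)}
weights = (0.5, 0.5)

best_ws = min(candidates, key=lambda k: weighted_sum(candidates[k], weights))
best_conic = min(
    candidates,
    key=lambda k: conic(candidates[k], weights, alpha=0.4, reference=(0.5, 0.5)),
)
print(best_ws, best_conic)
```

    With equal weights the weighted sum can only return one of the extreme points, whereas the assumed conic form selects the balanced solution C. This is the kind of behavior that motivates using conic scalarization when the Pareto front is not convex.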

    Mining of soluble enzymes from genomic databases

    Enzymes are proteins that accelerate chemical reactions, which makes them attractive targets for both pharmaceutical and industrial applications. Enzyme function is usually mediated by several essential amino acids, which form the active site where the chemical reaction takes place. In this work, two integrated bioinformatics tools for mining and rational selection of novel soluble enzymes, EnzymeMiner and SoluProt, are presented. EnzymeMiner takes one or more enzyme sequences as input, along with a list of essential residues, and searches the protein database. The list of essential amino acids is used to increase the probability that a hit has an enzymatic function similar to that of the input enzyme. The EnzymeMiner output is a set of annotated database hits. To facilitate the selection of a small number of promising candidates for experimental validation, EnzymeMiner integrates taxonomic, environmental, and protein domain annotations. The main prioritization criterion is solubility predicted by the second tool presented, SoluProt. SoluProt is a machine-learning method for predicting soluble heterologous protein expression in Escherichia coli. The input is a protein sequence, and the output is the probability that the protein will be expressed in soluble form. SoluProt uses a gradient boosting machine and was trained on a dataset derived from the TargetTrack database. When evaluated against a balanced independent test set derived from the NESG database, SoluProt achieved an accuracy of 58.5% and an AUC of 0.62, slightly exceeding those of the other existing solubility prediction tools. Both EnzymeMiner and SoluProt are frequently used by the protein engineering community to find novel soluble biocatalysts for chemical reactions. These have great potential to decrease the energy consumption and environmental burden of many industrial processes.
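    The two metrics reported for SoluProt, accuracy and AUC, can be computed from predicted probabilities as in the following sketch. The labels and scores are invented toy values, not SoluProt outputs, and the 0.5 decision threshold and function names are assumptions for illustration. AUC is computed here as the fraction of (soluble, insoluble) pairs that the scores rank correctly, which equals the area under the ROC curve.

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of examples whose thresholded score matches the label."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random
    negative; ties count as half a correct ranking."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Balanced toy set: 1 = soluble, 0 = insoluble (illustrative values only).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2]

acc = accuracy(labels, scores)
auc = roc_auc(labels, scores)
print(round(acc, 3), round(auc, 3))
```

    On a balanced test set a predictor with no skill would score about 50% accuracy and an AUC of 0.5, which is the baseline against which the reported 58.5% and 0.62 should be read.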