846 research outputs found

    The interplay of descriptor-based computational analysis with pharmacophore modeling builds the basis for a novel classification scheme for feruloyl esterases

    One of the most intriguing groups of enzymes, the feruloyl esterases (FAEs), is ubiquitous in both simple and complex organisms. FAEs have gained importance in the biofuel, medicine and food industries because they act on a large range of substrates, cleaving ester bonds and synthesizing high-added-value molecules through esterification and transesterification reactions. During the past two decades extensive studies have been carried out on the production and partial characterization of FAEs from fungi, while much less is known about FAEs of bacterial or plant origin. Initial classification studies on FAEs were restricted to sequence similarity and substrate specificity on just four model substrates and considered only a handful of FAEs belonging to the fungal kingdom. This study centers on the descriptor-based classification and structural analysis of experimentally verified and putative FAEs; nevertheless, the framework presented here is applicable to every poorly characterized enzyme family. A total of 365 FAE-related sequences of fungal, bacterial and plant origin were collected and clustered into distinct groups using Self Organizing Maps followed by k-means clustering, based on amino acid composition and physico-chemical composition descriptors derived from the respective amino acid sequences. A Support Vector Machine model was subsequently constructed for the classification of new FAEs into the pre-assigned clusters. The model successfully recognized 98.2% of the training sequences and all the sequences of the blind test. The underlying functionality of the 12 proposed FAE families was validated against a combination of prediction tools and published experimental data. Another important aspect of the present work involves the development of pharmacophore models for the new FAE families for which sufficient information on known substrates existed. 
Knowing the pharmacophoric features of a small molecule that are essential for binding to the members of a certain family opens a window of opportunities for tailored applications of FAEs.
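
As a rough illustration of the kind of sequence-derived descriptor the abstract mentions, the sketch below computes an amino acid composition vector (the fraction of each of the 20 standard residues). The function name, the residue ordering and the example sequence are assumptions made for illustration; the abstract does not specify how the authors computed their descriptors.

```python
from collections import Counter

# The 20 standard amino acids in one-letter code (ordering is an arbitrary choice).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence: str) -> list[float]:
    """Return the fraction of each standard amino acid in the sequence.

    Non-standard characters (e.g. X, gaps) are ignored. Hypothetical helper,
    not taken from the study.
    """
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


# Made-up 10-residue peptide, used only to exercise the function.
vec = amino_acid_composition("MKTAYIAKQR")
print(dict(zip(AMINO_ACIDS, vec)))
```

Vectors of this kind, one per sequence, could then be passed to a clustering or classification step such as those named in the abstract.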

    Many-objectives optimization: a machine learning approach for reducing the number of objectives

    Solving real-world multi-objective optimization problems using Multi-Objective Optimization Algorithms becomes difficult when the number of objectives is high, since the types of algorithms generally used to solve these problems are based on the concept of non-dominance, which ceases to work as the number of objectives grows. This problem is known as the curse of dimensionality. Simultaneously, the existence of many objectives, a characteristic of practical optimization problems, makes choosing a solution to the problem very difficult. Different approaches are used in the literature to reduce the number of objectives required for optimization. This work proposes a machine learning methodology, designated FS-OPA, to tackle this problem. The proposed methodology was assessed using DTLZ benchmark problems suggested in the literature and compared with similar algorithms, showing good performance. Finally, the methodology was applied to a difficult real problem in polymer processing, showing its effectiveness. The proposed algorithm has some advantages when compared with a similar machine-learning-based algorithm in the literature (NL-MVU-PCA), namely the possibility of establishing variable–variable and objective–variable relations (not only objective–objective), and the elimination of the need to define or choose a kernel or to optimize algorithm parameters. The collaboration with the decision maker(s) (DM) allows explainable solutions to be obtained. This research was funded by POR Norte under the PhD Grant PRT/BD/152192/2021. 
The authors also acknowledge the funding by FEDER funds through the COMPETE 2020 Programme and National Funds through FCT (Portuguese Foundation for Science and Technology) under the projects UIDB/05256/2020 and UIDP/05256/2020, the Center for Mathematical Sciences Applied to Industry (CeMEAI) and the support from the São Paulo Research Foundation (FAPESP grant No 2013/07375-0), the Center for Artificial Intelligence (C4AI-USP), the support from the São Paulo Research Foundation (FAPESP grant No 2019/07665-4) and the IBM Corporation.
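
The concept of non-dominance that the abstract refers to can be sketched as follows. This is a generic, textbook-style Pareto dominance check assuming all objectives are minimized; the function names and the sample points are illustrative and are not part of FS-OPA.

```python
from typing import Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if `a` dominates `b`, assuming every objective is minimized:
    `a` is no worse in all objectives and strictly better in at least one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def non_dominated(points: list[tuple[float, ...]]) -> list[tuple[float, ...]]:
    """Return the points that no other point dominates (the Pareto set)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Made-up two-objective example.
points = [(1, 5), (2, 2), (3, 3), (5, 1)]
print(non_dominated(points))
```

In this example only `(3, 3)` is dominated, by `(2, 2)`. With more objectives, a point needs to lose on every one of them to be dominated, so a growing share of any population tends to be mutually non-dominated, which is the difficulty the abstract describes.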

    Multi-objective optimization of single screw polymer extrusion based on artificial intelligence

    The performance of the single screw polymer extrusion process depends on the definition of the best set of design variables, including operating conditions and/or geometrical parameters, which can be seen as a multi-objective optimization problem where several objectives and constraints must be satisfied simultaneously. The most efficient way to solve this problem consists in linking a modelling routine with optimization algorithms able to deal with multiple objectives, for example, those based on a population of solutions. This implies that the modelling routine must be run several times and, thus, the computation time can become high, since such routines rely on sophisticated numerical methods to obtain reliable results [1]. The aim of this work is to present an alternative based on the use of Artificial Intelligence (AI) techniques in order to reduce the number of modelling evaluations required during the optimization process. This analysis is based on a data analysis technique, named DAMICORE, able to define important interrelations between all variables related to extrusion and, then, optimize the process [2,3,4]. For that purpose, the results obtained for three practical examples will be presented and discussed. These case studies include the optimization of screw geometrical parameters, the barrel grooves section and a rotational barrel segment. It will be shown that the results obtained, taking into consideration the design variables, the objectives and the constraints defined, are in agreement with the expected thermomechanical behaviour of the process.

    Molecular identification of a cyclodextrin glycosyltransferase-producing microorganism and phylogenetic assessment of enzymatic activities

    Cyclodextrin glycosyltransferases (CGTases) are important enzymes in the biotechnology field because they catalyze starch conversion into cyclodextrins and linear oligosaccharides, which are used in food, pharmaceutical and cosmetic industries. The CGTases are classified according to their product specificity in α-, β-, α/β- and γ-CGTases. As molecular markers are the preferred tool for bacterial identification, we employed six molecular markers (16S rRNA, dnaK, gyrB, recA, rpoB and tufA) to test the identification of a CGTase-producing bacterial strain (DF 9R) in a phylogenetic context. In addition, we assessed the phylogenetic relationship of CGTases along bacterial evolution. The results obtained here allowed us to identify the strain DF 9R as Paenibacillus barengoltzii, and to unveil a complex origin for CGTase types during archaeal and bacterial evolution. We postulate that the α-CGTase activity represents the ancestral type, and that the γ-activity may have derived from β-CGTases.
    Affiliation: Caminata Landriel, Soledad. Universidad Nacional de Luján. Departamento de Ciencias Básicas. Area de Química Biológica; Argentina
    Affiliation: Castillo de las Mercedes Julieta. Universidad Nacional de Luján. Departamento de Ciencias Básicas. Area de Química Biológica; Argentina
    Affiliation: Taboga, Oscar Alberto. Instituto Nacional de Tecnología Agropecuaria. Centro de Investigación en Ciencias Veterinarias y Agronómicas. Instituto de Biotecnología; Argentina
    Affiliation: Ferrarotti, Susana Alicia. Universidad Nacional de Luján. Departamento de Ciencias Básicas. Area de Química Biológica; Argentina
    Affiliation: Gottlieb, Alexandra Marina. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Departamento de Ecología, Genética y Evolución. Laboratorio de Citogenética y Evolución; Argentina
    Affiliation: Costa, Hernán. Universidad Nacional de Luján. Departamento de Ciencias Básicas. Area de Química Biológica; Argentina. Universidad Nacional de Luján. Instituto de Ecología y Desarrollo Sustentable. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Parque Centenario. Instituto de Ecología y Desarrollo Sustentable; Argentina

    Application of artificial intelligence techniques in the optimization of single screw polymer extrusion

    As with most real optimization problems, polymer processing technologies can be seen as multi-objective optimization problems. Due to the high computation times required by the numerical modelling routines usually available to calculate the values of the objective function as a function of the decision variables, it is necessary to develop alternative optimization methodologies able to reduce the number of solutions to be evaluated when compared with the techniques normally employed, such as evolutionary algorithms. Therefore, this work proposes the use of artificial intelligence, based on a data analysis technique designated DAMICORE, to surpass those limitations. An example from single screw polymer extrusion is used to illustrate the efficient use of the proposed methodology. This research was partially funded by NAWA-Narodowa Agencja Wymiany Akademickiej, under grant PPN/ULM/2020/1/00125 and European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie Grant Agreement No 734205–H2020-MSCA-RISE-2016. The authors also acknowledge the funding by FEDER funds through the COMPETE 2020 Programme and National Funds through FCT (Portuguese Foundation for Science and Technology) under the projects UIDB/05256/2020 and UIDP/05256/2020, the Center for Mathematical Sciences Applied to Industry (CeMEAI) and the support from the São Paulo Research Foundation (FAPESP grant No 2013/07375-0), the Center for Artificial Intelligence (C4AI-USP), the support from the São Paulo Research Foundation (FAPESP grant No 2019/07665-4) and the IBM Corporation.

    Use of data analysis techniques for multi-objective optimization of real problems: decision making

    Most, if not all, real optimization problems can be seen as multi-objective, since several objectives are to be satisfied concurrently and are often conflicting. Also, due to the high computation times usually required by the numerical modelling routines available to calculate the values of the objective function as a function of the decision variables, it is necessary to develop alternative optimization methodologies able to reduce the number of solutions to be evaluated when compared with the procedures typically employed, such as evolutionary algorithms. Moreover, in a multi-objective environment, it is also necessary at the end of the optimization process to select a single solution from the pool of optimal non-dominated solutions obtained. Real industrial processes can be characterized by different types of data that can significantly influence their performance. For example, in the industrial process studied here, polymer processing, variables related to the operating conditions of the machine, the polymer properties and the system geometry affect its operation, since the thermomechanical environment developed allows obtaining mathematical relationships between these design variables and the objectives to be accomplished. This enables direct process optimization, using those routines to evaluate the solutions proposed by the optimization algorithms. However, this routine must be run several times, implying high computation times due to the sophistication of the numerical codes. This work aims to apply Artificial Intelligence based on a data analysis technique, designated DAMICORE, to surpass those limitations, improve the optimization process and help the selection of the best-equilibrated solution at the end. An example from single screw polymer extrusion is used to illustrate the efficient use of the proposed methodology, with a focus on decision making. 
Solving Multi-Objective Optimization Problems (MOOP) requires some interaction with a decision maker (DM), for example, an expert in the field. The aim is to use data analysis techniques to reduce and improve the quality of those interactions, which can be done by integrating optimization methodologies with data analysis tools, i.e., the use of data to drive the optimization. At least two different possibilities can be applied in data-driven optimization: i) replacing the original method of calculating the objectives by a metamodel or surrogate, and ii) helping the computer decide on the best solutions to the problem. The aim here is to use the DAMICORE framework to facilitate the optimization, taking into account the limitations/characteristics referred to above. The DAMICORE structure is based on the estimation of distances with compression algorithms, using the Normalized Compression Distance (NCD). Then, a Feature Sensitivity Optimization based on Phylogram Analysis (FS-OPA) is used to find the set of principal features related to the real problem environment. The present study focuses on two levels of learning, which will be used to study an industrial case study using real data: First-level learning – the aim is to find clusters of variables sharing information, designated clades, each representing the set of variables with important interactions. The result of this level is a table with a list of variables, with one cluster per row. Second-level learning – the application of FS-OPA allows the estimation of the contribution of each clade of variables to the objectives, which is made by determining the distance from the clades of objectives (oclades) to each variable clade (vclade) using the phylogram obtained. These distances are an estimation of the influence of a clade on improving an objective. 
The results of this level are two different matrices, one with the phylogram distances from vclades to oclades and the second with the relative phylogram distances from each variable to each objective. From a practical point of view, the application of this method to the data of each population of solutions previously obtained during the multi-objective optimization using evolutionary algorithms allows capturing the interactions between the decision variables and the objectives and, in the end, selecting the objectives most important to the process. Therefore, the multi-dimensional space that results from the six objectives of the problem solved can be reduced, which helps the decision maker easily select the solution to be applied in real practice. The results obtained for this practical example are in agreement with the expected thermomechanical behaviour of the process, which demonstrates that AI techniques can be useful in solving practical engineering problems.
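
The Normalized Compression Distance named in the abstract has a standard definition, NCD(x, y) = (C(xy) − min(C(x), C(y))) / max(C(x), C(y)), where C(·) is the compressed size. A minimal sketch follows, assuming `zlib` as the compressor; the abstract does not say which compressor DAMICORE uses, and the helper names and sample inputs are illustrative only.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Size in bytes of `data` after zlib compression (assumed compressor)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized Compression Distance between two byte strings."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


# Made-up inputs: one highly repetitive string and one with varied content.
repetitive = b"abc" * 150
varied = bytes(range(256)) * 2

print(ncd(repetitive, repetitive))  # small: the second copy adds little information
print(ncd(repetitive, varied))      # larger: the two inputs share little structure
```

Pairwise distances of this kind can be collected into a distance matrix, from which a tree (phylogram) over variables and objectives could be built, as the abstract outlines.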

    ViCTree: an automated framework for taxonomic classification from protein sequences

    Motivation: The increasing rate of submission of genetic sequences into public databases is providing a growing resource for classifying the organisms that these sequences represent. To aid viral classification, we have developed ViCTree, which automatically integrates the relevant sets of sequences in NCBI GenBank and transforms them into an interactive maximum likelihood phylogenetic tree that can be updated automatically. ViCTree incorporates ViCTreeView, which is a JavaScript-based visualisation tool that enables the tree to be explored interactively in the context of pairwise distance data. Results: To demonstrate utility, ViCTree was applied to subfamily Densovirinae of family Parvoviridae. This led to the identification of six new species of insect virus. Availability: ViCTree is open-source and can be run on any Linux- or Unix-based computer or cluster. A tutorial, the documentation and the source code are available under a GPL3 license, and can be accessed at http://bioinformatics.cvr.ac.uk/victree_web/

    Software for optimization of SNP and PCR-RFLP genotyping to discriminate many genomes with the fewest assays

    BACKGROUND: Microbial forensics is important in tracking the source of a pathogen, whether the disease is a naturally occurring outbreak or part of a criminal investigation. RESULTS: A method and SPR Opt (SNP and PCR-RFLP Optimization) software to perform a comprehensive, whole-genome analysis to forensically discriminate multiple sequences is presented. Tools for the optimization of forensic typing using Single Nucleotide Polymorphism (SNP) and PCR-Restriction Fragment Length Polymorphism (PCR-RFLP) analyses across multiple isolate sequences of a species are described. The PCR-RFLP analysis includes prediction and selection of optimal primers and restriction enzymes to enable maximum isolate discrimination based on sequence information. SPR Opt calculates all SNP or PCR-RFLP variations present in the sequences, groups them into haplotypes according to their co-segregation across those sequences, and performs combinatoric analyses to determine which sets of haplotypes provide maximal discrimination among all the input sequences. Those set combinations requiring that membership in the fewest haplotypes be queried (i.e. the fewest assays be performed) are found. These analyses highlight variable regions based on existing sequence data. These markers may be heterogeneous among unsequenced isolates as well, and thus may be useful for characterizing the relationships among unsequenced as well as sequenced isolates. The predictions are multi-locus. Analyses of mumps and SARS viruses are summarized. Phylogenetic trees created based on SNPs, PCR-RFLPs, and full genomes are compared for SARS virus, illustrating that purported phylogenies based only on SNP or PCR-RFLP variations do not match those based on multiple sequence alignment of the full genomes. CONCLUSION: This is the first software to optimize the selection of forensic markers to maximize information gained from the fewest assays, accepting whole or partial genome sequence data as input. 
As more sequence data become available for multiple strains and isolates of a species, automated, computational approaches such as those described here will be essential to make sense of large amounts of information, and to guide and optimize efforts in the laboratory. The software and source code for SPR Opt are publicly available and free for non-profit use at
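
The idea of finding the fewest assays that still discriminate every input sequence can be sketched as an exhaustive search over marker subsets of increasing size. This is a simplified illustration, not SPR Opt's actual algorithm; the function name, the data layout and the toy profiles are assumptions.

```python
from itertools import combinations


def smallest_discriminating_sets(
    profiles: dict[str, tuple[int, ...]],
) -> list[tuple[int, ...]]:
    """Return all smallest sets of marker indices under which every isolate
    has a distinct profile. `profiles` maps an isolate name to its marker
    states (all tuples the same length). Hypothetical helper."""
    n_markers = len(next(iter(profiles.values())))
    for k in range(1, n_markers + 1):
        hits = [
            combo
            for combo in combinations(range(n_markers), k)
            # The subset discriminates if no two isolates share a sub-profile.
            if len({tuple(p[i] for i in combo) for p in profiles.values()})
            == len(profiles)
        ]
        if hits:
            return hits
    return []  # some isolates are identical across all markers


# Toy data: four isolates scored at three markers; marker 2 is uninformative.
profiles = {"A": (0, 0, 1), "B": (0, 1, 1), "C": (1, 0, 1), "D": (1, 1, 1)}
print(smallest_discriminating_sets(profiles))
```

Here markers 0 and 1 together separate all four isolates, so two assays suffice. Exhaustive search grows combinatorially with the number of markers, so real tools would need grouping (such as the haplotype grouping the abstract describes) or heuristics to stay tractable.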

    Host specificity and coevolution of Flavobacteriaceae endosymbionts within the siphonous green seaweed Bryopsis

    The siphonous green seaweed Bryopsis harbors complex intracellular bacterial communities. Previous studies demonstrated that certain species form close, obligate associations with Flavobacteriaceae. A predominant imprint of host evolutionary history on the presence of these bacteria suggests a highly specialized association. In this study we elaborate on previous results by expanding the taxon sampling and testing for host–symbiont coevolution. To this end, we optimized a PCR protocol to directly and specifically amplify Flavobacteriaceae endosymbiont 16S rRNA gene sequences, which allowed us to screen a large number of algal samples without the need for cultivation or surface sterilization. We analyzed 146 Bryopsis samples, and 92 additional samples belonging to the Bryopsidales and other orders within the class Ulvophyceae. Results indicate that the Flavobacteriaceae endosymbionts are restricted to Bryopsis, and only occur within specific, warm-temperate and tropical clades of the genus. Statistical analyses (AMOVA) demonstrate a significant non-random host–symbiont association. Comparison of bacterial 16S rRNA and Bryopsis rbcL phylogenies, however, reveals complex host–symbiont evolutionary associations, whereby closely related hosts predominantly harbor genetically similar endosymbionts. Bacterial genotypes are rarely confined to a single Bryopsis species and most Bryopsis species harbored several Flavobacteriaceae, obscuring a clear pattern of coevolution.